@slow test RFC #55
Conversation
Thanks for this @driazati, this is a great, concise, and meaningful improvement to TVM. I've put a few notes, but they're mostly either wording or straightforward questions.

My personal opinion here is that CI is broken often enough on main that we need to improve that before we can really complain about differences between tests. Alongside this, we still have PRs landing without test coverage, so until we can accurately define what we do and don't test, it seems better to remove friction from contributions, hopefully driving us to ask for more and better tests 😸

We should also be prioritising a release schedule rather than treating TVM as ever-green, which limits our ability to make larger changes between releases; with scheduled releases, a broken main shouldn't impact end users. This change actually moves us closer to this.

TLDR: Supportive + minor comments 😸
[future-possibilities]: #future-possibilities

- Better communication in Jenkins job pages of which tests ran, which did not, and why
- Different levels of tests. `main` is the most frequent step, but longer running tests could be moved out to nightly or even release level testing (though this makes debugging failures more difficult).
One of the concerns raised by @areusch is the change in coverage from removing slow tests; it would be good to investigate the difference in coverage in both Python and C++ to help guide us in re-introducing tests.
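One lightweight way to do that investigation on the Python side would be to run the suite twice under coverage.py (once in full, once with slow tests skipped) and diff the JSON reports. The helper below is a hypothetical sketch, not part of the RFC; it only assumes coverage.py's documented JSON report layout (`files` -> per-file `summary` -> `covered_lines`), and the file paths are made up:

```python
import json


def coverage_delta(full_report, fast_report):
    """Per-file count of lines covered by the full run but not the fast run.

    Both arguments are parsed coverage.py JSON reports (e.g. produced with
    `coverage json -o report.json`). Files absent from the fast report
    count as entirely uncovered there.
    """
    delta = {}
    for path, data in full_report.get("files", {}).items():
        full_lines = data["summary"]["covered_lines"]
        fast_lines = (
            fast_report.get("files", {})
            .get(path, {"summary": {"covered_lines": 0}})
        )["summary"]["covered_lines"]
        if full_lines > fast_lines:
            delta[path] = full_lines - fast_lines
    return delta


# Example with synthetic reports: one file loses 40 covered lines when
# slow tests are skipped, another is unaffected.
full = {"files": {"tvm/pass_a.py": {"summary": {"covered_lines": 100}},
                  "tvm/pass_b.py": {"summary": {"covered_lines": 50}}}}
fast = {"files": {"tvm/pass_a.py": {"summary": {"covered_lines": 60}},
                  "tvm/pass_b.py": {"summary": {"covered_lines": 50}}}}
print(coverage_delta(full, fast))  # {'tvm/pass_a.py': 40}
```

Files that show a large drop would be natural candidates for keeping (or re-introducing) tests on the PR path.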
Right, I think that's where flipping the proposal from the pre-RFC from "disable by time cutoff" -> "start with the slowest tests, manually investigate, and mark them `@slow`" will help out, since we should have a better idea of what's going on with the individual tests that way. Coverage data + proper test reports would help here too; we currently have https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/2414/tests but it's broken (it doesn't report failures), and we don't have any tooling to scan across those vertically (i.e. from one commit to the next) to see changes.
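For reference, the shape of such a marker is simple. Here is a minimal sketch using stdlib `unittest` for self-containedness; the RFC's actual implementation would presumably be a pytest marker, and the `RUN_SLOW_TESTS` environment variable name here is an assumption, not TVM's real one:

```python
import os
import unittest


def slow(test_item):
    """Hypothetical @slow marker: skip unless RUN_SLOW_TESTS=1 is set,
    so marked tests run post-merge or nightly, but not on every PR."""
    return unittest.skipUnless(
        os.environ.get("RUN_SLOW_TESTS") == "1",
        "slow test: set RUN_SLOW_TESTS=1 to run",
    )(test_item)


class ExampleTests(unittest.TestCase):
    @slow
    def test_whole_model(self):
        # Stands in for an expensive end-to-end test.
        self.assertEqual(sum(range(1_000_000)), 499999500000)

    def test_fast_unit(self):
        # Always runs, including on PRs.
        self.assertEqual(1 + 1, 2)
```

With the env var unset (the PR case), the runner reports `test_whole_model` as skipped rather than failed, so the skip is visible in the test report.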
I just re-read this, and I think that explicitly stating the roll-out strategy as a manual approach to test investigation would clarify this for me - i.e. there's no automation involved, even though you've demonstrated we could automate it.
Added a bit about implementation in the guide section
rfcs/0055-slow-tests.md
Outdated
Often TVM's tests are running much more work than they actually intend to test in more of an integration test than a unit test. Replacing these types of test with a framework that makes it easier to test TVM passes and functionality in smaller chunks is related but orthogonal to this work. It still has the same issue (coverage
How is coverage still reduced if we put in place the better testing practices?
It's a bit of both, I suppose. If we take an integration test that runs a whole model to test 1 pass and use infrastructure to test just that 1 pass, we're making the test more narrow in what it's trying to achieve, but in the end it's flexing fewer parts of the system (even if those parts 99% of the time don't matter for the tests).
The theory, at least, is that each `Pass` is then tested exhaustively, with some integration tests covering interactions between `Pass`es? Therefore test coverage should overall be better, as we can test more of a single `Pass` and be more specific about `Pass` interactions?
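To illustrate that trade-off outside of TVM's actual `Pass` API (everything below is a toy stand-in, not TVM code): a narrow unit test exercises one rewrite directly and pinpoints failures, while an integration-style test only reaches the rewrite through a full pipeline, flexing more of the system per assertion.

```python
def fold_add_zero(expr):
    """Toy 'pass': simplify ('add', x, 0) -> x in a tuple-based expression tree."""
    if isinstance(expr, tuple) and expr[0] == "add":
        lhs = fold_add_zero(expr[1])
        rhs = fold_add_zero(expr[2])
        if rhs == 0:
            return lhs
        return ("add", lhs, rhs)
    return expr


def run_pipeline(expr, passes):
    """Toy pipeline: apply each pass in order."""
    for p in passes:
        expr = p(expr)
    return expr


# Narrow unit test: targets the pass directly; a failure implicates only it.
def test_fold_add_zero_unit():
    assert fold_add_zero(("add", "x", 0)) == "x"
    assert fold_add_zero(("mul", "x", 0)) == ("mul", "x", 0)  # untouched


# Integration-style test: the pass is exercised only via the pipeline,
# so a failure implicates everything the pipeline runs.
def test_pipeline_integration():
    assert run_pipeline(("add", ("add", "y", 0), 0), [fold_add_zero]) == "y"
```

The narrow test gives exhaustive, attributable coverage of the one rewrite; the pipeline test is what catches interactions between passes.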
Yeah, I see what you mean; it all comes down to implementation/how it's used, but either way it's pretty immaterial to this RFC, so I removed that language.
Also see discussion in the pre-RFC: https://discuss.tvm.apache.org/t/rfc-ci-skip-slow-tests-on-prs/11910 @areusch @Mousius
Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com>
@Mousius can you take another look when you have a min?
Thanks for the clarification @driazati, you're clear to land 😸!