Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@slow test RFC #55

Merged
merged 6 commits into from
Feb 9, 2022
Merged

@slow test RFC #55

merged 6 commits into from
Feb 9, 2022

Conversation

driazati
Copy link
Member

Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @driazati, this is a great concise and meaningful improvement to TVM. I've put a few notes but they're mostly either wording or straight-forward questions.

My personal opinion here is that CI broken often enough on main we need to improve that before we can really complain about differences between tests. Alongside this, we still have PRs landing without test coverage, so until we can accurately define what we do and don't test, it seems better to remove friction from contributions - hopefully driving us to ask for more and better tests 😸

We should also be prioritising a release schedule rather than treating TVM as ever-green which limits our ability to make larger changes between releases as a broken main shouldn't impact end users. This change actually moves us closer to this.

TLDR: Supportive + minor comments 😸

[future-possibilities]: #future-possibilities

- Better communication in Jenkins job pages of which tests ran, which did not, and why
- Different levels of tests. `main` is the most frequent step, but longer running tests could be moved out to nightly or even release level testing (though this makes debugging failures more difficult).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the concerns raised by @areusch is the change in coverage by removing slow tests, it would be good to investigate the difference in coverage both in Python and C++ to help guide us in re-introducing tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think that's where flipping the proposal from the pre-RFC from "disable by time cutoff" -> "start with the slowest tests, manually investigate and @slow" will help out, since we should have a better idea what's going on with the individual tests that way. Coverage data + proper test reports help out here too, we currently have https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/2414/tests but it's broken (doesn't report failures) and we don't have any tooling to scan across those vertically (i.e. from one commit to the next) to see changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just re-read this again, and I think that explicitly stating the roll out strategy as a manual approach of test investigation would clarify this to me - i.e. there's no automation involved albeit you've demonstrated we could automate it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a bit about implementation in the guide section

rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
Often TVM's tests are running much more work than they actually intend to test in
more of an integration test than a unit test. Replacing these types of test with
a framework that makes it easier to test TVM passes and functionality in smaller
chunks is related but orthogonal to this work. It still has the same issue (coverage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is coverage still reduced if we put in place the better testing practices?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit of both I suppose, if we take an integration test that is running a whole model to test 1 pass and use infrastructure to test just that 1 pass we're making the test more narrow in what its trying to achieve, but in the end it's flexing fewer parts of the system (even if those parts 99% of the time don't matter for the tests).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The theory at least, is that then each Pass is tested exhaustively with some integration tests covering interactions between Passes? Therefore test coverage should overall be better as we can test more of a single Pass and be more specific about Pass interactions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see what you mean, it all comes down to implementation/how it's used, but either way it's pretty immaterial to this RFC so I removed that language

rfcs/0055-slow-tests.md Outdated Show resolved Hide resolved
driazati and others added 4 commits January 31, 2022 13:57
Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com>
Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com>
Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks pretty good to me @driazati , thanks for the writeup! I'll defer to @Mousius for approval since he had more comments here.

rfcs/0055-slow-tests.md Show resolved Hide resolved
@areusch
Copy link
Contributor

areusch commented Feb 2, 2022

@Mousius can you take another look when you have a min?

@driazati driazati requested a review from Mousius February 2, 2022 21:28
[future-possibilities]: #future-possibilities

- Better communication in Jenkins job pages of which tests ran, which did not, and why
- Different levels of tests. `main` is the most frequent step, but longer running tests could be moved out to nightly or even release level testing (though this makes debugging failures more difficult).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just re-read this again, and I think that explicitly stating the roll out strategy as a manual approach of test investigation would clarify this to me - i.e. there's no automation involved albeit you've demonstrated we could automate it.

@driazati driazati requested a review from Mousius February 8, 2022 18:30
Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification @driazati, you're clear to land 😸!

@Mousius Mousius merged commit 9b6203a into apache:main Feb 9, 2022
@denise-k denise-k added this to Q1 2022 in Apache TVM CI & Testing Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants