[Pass] Profiling TVM compiler passes #7500
Example output:
After a first pass, this looks mostly good to me. Any idea how much overhead this adds to running the passes?
Is the percentage after the pass relative to the global scope or to the parent?
I'll do some basic measurements, but it should be negligible. With a runtime flag, the worst case when profiling is disabled would be about two additional boolean comparisons per pass invocation.
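To make the overhead estimate concrete, here is a minimal Python sketch of a flag-guarded pass wrapper, assuming a module-level boolean switch. The names `set_profiling`, `run_pass`, and `PROFILES` are hypothetical illustrations, not TVM's API. When profiling is off, the only extra work per invocation is the two checks marked in the comments.

```python
import time

# Hypothetical module-level switch; profiling is off by default.
_profiling_enabled = False
# Hypothetical store of (pass name, elapsed seconds) records.
PROFILES = []


def set_profiling(enabled):
    """Turn pass profiling on or off (hypothetical helper)."""
    global _profiling_enabled
    _profiling_enabled = enabled


def run_pass(name, pass_fn, module):
    """Apply pass_fn to module, timing it only when profiling is enabled."""
    # Extra check 1: only read the clock when profiling is on.
    start = time.perf_counter() if _profiling_enabled else None
    result = pass_fn(module)
    # Extra check 2: only record a profile if a start time was taken.
    if start is not None:
        PROFILES.append((name, time.perf_counter() - start))
    return result
```

Checking `start is not None` rather than re-reading the flag keeps the record consistent even if profiling is toggled while a pass is running.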
Parent; that made the most sense to me, but I can see global scope also being useful (maybe more so, actually). This would be easy to change. Something I think people will generally want is a way to export the profiler data across the FFI to enable more flexible Python-based analysis (rather than, e.g., having to parse stdout or a string), although I'm not sure how the data should be represented. Edit: I could probably just patch up the
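The two percentage conventions differ only in the denominator: the parent's duration versus the root's duration. A minimal sketch, assuming each profile node stores a pass name, its inclusive duration (sub-passes included), and its child nodes; the `Profile` class and `percentages` function are hypothetical, not the profiler's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    duration: float  # inclusive seconds, sub-passes included
    children: list = field(default_factory=list)


def percentages(root):
    """Return (name, % of parent, % of global total) for every node below root."""
    total = root.duration

    def walk(node):
        for child in node.children:
            # Relative to the enclosing pass.
            parent_pct = 100.0 * child.duration / node.duration if node.duration else 0.0
            # Relative to the whole profiled run.
            global_pct = 100.0 * child.duration / total if total else 0.0
            yield child.name, parent_pct, global_pct
            yield from walk(child)

    return list(walk(root))
```

For top-level passes the two numbers coincide; they diverge only for nested passes, which is where the choice matters.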
Looks mostly good!
I did a few compilation runs on a large model that takes ~50 seconds and didn't notice any large performance hits; any difference seemed to be hidden by normal compilation variance. In any case, I'm disabling profiling by default and have added an API to enable/disable it as needed. I think this should be good to go for now.
I updated the printer to also show the time spent in each pass itself (excluding sub-passes), along with its percentage of the total time. I described the exact formatting in the Python docstring, but I'm open to changing it. Also, long term I'm planning to expose the profiling data through the Object FFI so that users can customize output and analysis, but I'll do that in a separate PR. Here's the new example output:
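A pass's self time (excluding sub-passes) is its inclusive time minus the inclusive time of its direct sub-passes, and the "percentage of the total" divides by the root's duration. A hedged sketch over an assumed timing tree; the `Node` class and `self_times` function are hypothetical and not the PR's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total: float  # inclusive seconds, sub-passes included
    children: list = field(default_factory=list)


def self_times(root):
    """Return (name, self seconds, % of root total) for every node, in pre-order."""
    rows = []
    grand_total = root.total

    def walk(node):
        # Time not accounted for by any direct sub-pass.
        self_time = node.total - sum(c.total for c in node.children)
        pct = 100.0 * self_time / grand_total if grand_total else 0.0
        rows.append((node.name, self_time, pct))
        for child in node.children:
            walk(child)

    walk(root)
    return rows
```

Returning a list rather than a dict keyed by pass name keeps separate entries when the same pass runs more than once.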
Thanks for the nice work. Could you add a unit test so that people can easily see how to use the profiler?
Thanks. How should I do this? I can add something like
Yeah, this should be fine. Is it possible to assert that the passes being observed are printed?
Hmm, I think I'll change the API to return a String rather than printing to stdout. This should make things more flexible, and I'll be able to check the output for the passes in the unit test. I will do this tomorrow, thanks!
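Returning a string instead of printing is what makes the output checkable in a test. A minimal sketch, assuming a flat list of (pass name, seconds) records; `render_profiles` is a hypothetical helper, and its one-line-per-pass format is an assumption rather than the format described in the PR's docstring.

```python
def render_profiles(records):
    """Format (name, seconds) records as one line per pass and return the text."""
    lines = []
    for name, seconds in records:
        # Milliseconds with fixed precision keep the output easy to scan.
        lines.append(f"{name}: {seconds * 1e3:.3f}ms")
    return "\n".join(lines)


def test_profile_output_mentions_passes():
    # Illustrative timings; a real test would run actual passes first.
    text = render_profiles([("FoldConstant", 0.002), ("FuseOps", 0.005)])
    assert "FoldConstant" in text
    assert "FuseOps" in text
```

Asserting on pass names rather than exact timings keeps such a test stable across machines.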
Bump @zhiics, sorry for the delay. This should be good to merge now.
* basic pass profiler prototype
* allow enable/disable of pass profiling
* lint
* add example pass profiler usage as test
* render pass profiles to String instead of stdout
This is a basic prototype, looking to have some discussion about the design and what API people would like. This profiler handles nested passes.
cc @tkonolige @masahi @mbrookhart @jroesch
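One common way to handle nested passes is a stack: entering a pass pushes a new node that becomes a child of whichever pass is currently running, and leaving it records the elapsed time. A minimal Python sketch of that idea, with hypothetical names (`PassTimer`, `profile_pass`); the PR's actual implementation may differ.

```python
import time
from contextlib import contextmanager


class PassTimer:
    """Collects nested pass timings as a tree of dicts (hypothetical)."""

    def __init__(self):
        self.root = {"name": "<root>", "seconds": 0.0, "children": []}
        self.stack = [self.root]

    @contextmanager
    def profile_pass(self, name):
        node = {"name": name, "seconds": 0.0, "children": []}
        # The pass currently on top of the stack is this pass's parent.
        self.stack[-1]["children"].append(node)
        self.stack.append(node)
        start = time.perf_counter()
        try:
            yield node
        finally:
            # Record inclusive time and pop even if the pass raised.
            node["seconds"] = time.perf_counter() - start
            self.stack.pop()
```

The `try`/`finally` keeps the stack balanced when a pass throws, so later timings are still attached to the correct parent.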