
[Pass] Profiling TVM compiler passes #7500

Merged: 5 commits merged into apache:main on Mar 3, 2021

Conversation

@altanh (Contributor) commented Feb 22, 2021

This is a basic prototype; I'm looking to have some discussion about the design and what API people would like. The profiler handles nested passes.

cc @tkonolige @masahi @mbrookhart @jroesch

@altanh (Contributor Author) commented Feb 22, 2021

example output:

tests/python/frontend/pytorch/test_object_detection.py sequential: 16636769us (94.70%)
	RemoveUnusedFunctions: 747us (0.00%)
	ToBasicBlockNormalForm: 16075us (0.10%)
	sequential: 179031us (1.08%)
		InferType: 38727us (21.63%)
		Legalize: 50240us (28.06%)
			InferType: 39085us (77.80%)
		InferType: 39895us (22.28%)
		Legalize: 50149us (28.01%)
			InferType: 38553us (76.88%)
	InferType: 40402us (0.24%)
	Legalize: 67676us (0.41%)
		InferType: 39490us (58.35%)
	EtaExpand: 10162us (0.06%)
	InferType: 40605us (0.24%)
	SimplifyInference: 49953us (0.30%)
		InferType: 39228us (78.53%)
	InferType: 40696us (0.24%)
	EliminateCommonSubexpr: 117125us (0.70%)
		InferType: 36413us (31.09%)
	InferType: 36998us (0.22%)
	SimplifyExpr: 424215us (2.55%)
		InferType: 73673us (17.37%)
		InferType: 74374us (17.53%)
		InferType: 74723us (17.61%)
		InferType: 75850us (17.88%)
		InferType: 37884us (8.93%)
	InlinePrimitives: 58960us (0.35%)
		Inline: 10235us (17.36%)
		DeadCodeElimination: 48721us (82.63%)
			InferType: 37255us (76.47%)
	FoldConstant: 3839048us (23.08%)
		sequential: 1074us (0.03%)
			InferType: 710us (66.11%)
			FuseOps: 158us (14.71%)
				InferType: 88us (55.70%)
			ToANormalForm: 61us (5.68%)
			InferType: 137us (12.76%)
			...

@mbrookhart (Contributor) commented

After a first pass, this looks mostly good to me. Any idea how much overhead this brings to running the passes?

@tkonolige (Contributor) commented

Is the percentage after the pass relative to the global scope or to the parent?

@altanh (Contributor Author) commented Feb 22, 2021

> After a first pass, this looks mostly good to me. Any idea how much overhead this brings to running the passes?

I'll do some basic measurements, but it should be pretty negligible. With a runtime flag, the worst case would be something like 2 additional boolean comparisons per pass invocation when profiling is disabled.

@altanh (Contributor Author) commented Feb 22, 2021

> Is the percentage after the pass relative to the global scope or to the parent?

Parent; that made the most sense to me, but I can see the global scope also being useful (maybe more so, actually). This would be easy to change. Something that I think people will generally want is a way of exporting the profiler data across the FFI to enable more flexible Python-based data analysis (rather than, e.g., having to parse stdout or a string), although I'm not sure how the data should be represented.

edit: I could probably just patch up the PassProfile object to use TVM's FFI Object and ObjectRef interface, but I'm not sure how the C++ chrono types would be handled.

@tkonolige (Contributor) left a comment

Looks mostly good!

Review comments on src/ir/transform.cc (outdated, resolved)
@altanh (Contributor Author) commented Feb 24, 2021

> After a first pass, this looks mostly good to me. Any idea how much overhead this brings to running the passes?

I did a few compilation runs on a large model that takes ~50 seconds and didn't notice any large performance hit; it seemed to be mostly hidden by normal compilation variance. In any case, profiling is disabled by default, and I added an API to enable/disable it as needed. I think this should be good to go for now.
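
For illustration, here is a minimal sketch of what the enable/disable workflow might look like from Python. The entry-point names (`enable_pass_profiling`, `disable_pass_profiling`) are assumptions made for this sketch and may not match the final API:

```python
import tvm
from tvm import relay

# Hypothetical toggle names, assumed for illustration; see the PR diff for the real API.
tvm.transform.enable_pass_profiling()

# Build a tiny Relay module and run a couple of passes so the profiler records something.
x = relay.var("x", shape=(1, 16))
mod = tvm.IRModule.from_expr(relay.nn.relu(x))

seq = tvm.transform.Sequential(
    [relay.transform.InferType(), relay.transform.SimplifyInference()]
)
mod = seq(mod)

tvm.transform.disable_pass_profiling()
```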

@altanh marked this pull request as ready for review on February 24, 2021, 19:44
@altanh (Contributor Author) commented Feb 24, 2021

I updated the printer to additionally show the time spent in the pass itself (excluding sub-passes), along with the percentage relative to the total time. I described the exact formatting in the Python docstring, but I'm open to changing it. Also, long term I'm planning on exposing the profiling data through the Object FFI so that users can customize output/analysis, but I'll do that in a separate PR.

Here's new example output:

InferType: 242us [242us] (0.03%; 0.03%)
InferType: 278us [278us] (0.04%; 0.04%)
InferType: 2501us [2501us] (0.34%; 0.34%)
sequential: 1us [1us] (0.00%; 0.00%)
sequential: 678773us [90us] (91.82%; 91.82%)
	RemoveUnusedFunctions: 92us [92us] (0.01%; 0.01%)
	ToBasicBlockNormalForm: 1219us [1219us] (0.16%; 0.18%)
	sequential: 11724us [12us] (1.59%; 1.73%)
		InferType: 2573us [2573us] (0.35%; 21.95%)
		Legalize: 3059us [738us] (0.41%; 26.09%)
			InferType: 2322us [2322us] (0.31%; 75.89%)
		InferType: 2619us [2619us] (0.35%; 22.34%)
		Legalize: 3460us [865us] (0.47%; 29.51%)
			InferType: 2595us [2595us] (0.35%; 75.00%)
	InferType: 2783us [2783us] (0.38%; 0.41%)
	Legalize: 4525us [2064us] (0.61%; 0.67%)
		InferType: 2461us [2461us] (0.33%; 54.38%)
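
To make the layout concrete, here is how the last two lines above decompose. This is my reading of the format (total time, self-time in brackets, then percent of the overall total and percent of the parent), inferred from the numbers and the description above:

```python
# Reading the last two lines of the output:
#   Legalize: 4525us [2064us] (0.61%; 0.67%)
#       InferType: 2461us [2461us] (0.33%; 54.38%)
legalize_total = 4525
child_infer_type = 2461
parent_sequential = 678773            # the enclosing top-level "sequential" pass
overall_total = 678773 / 0.9182       # top-level sequential is 91.82% of the overall total

print(legalize_total - child_infer_type)          # 2064us -> the bracketed self-time
print(100 * legalize_total / overall_total)       # ~0.61  -> first percentage (of overall total)
print(100 * legalize_total / parent_sequential)   # ~0.67  -> second percentage (of parent)
```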

@altanh changed the title from "[WIP][Pass] Profiling TVM compiler passes" to "[Pass] Profiling TVM compiler passes" on Feb 24, 2021
@zhiics (Member) left a comment

Thanks for the nice work. Could you add a unit test so that people can easily see how to use the profiler?

@altanh (Contributor Author) commented Feb 25, 2021

> Thanks for the nice work. Could you add a unit test so that people can easily see how to use the profiler?

Thanks! How should I do this? I can add something like tests/python/relay/test_pass_profiler.py that just shows how to use it, but there isn't really a "correctness" property I can check. I'll go ahead and do this, but let me know if you have something more specific in mind.

@zhiics (Member) commented Feb 25, 2021

> > Thanks for the nice work. Could you add a unit test so that people can easily see how to use the profiler?

> Thanks! How should I do this? I can add something like tests/python/relay/test_pass_profiler.py that just shows how to use it, but there isn't really a "correctness" property I can check. I'll go ahead and do this, but let me know if you have something more specific in mind.

Yeah, this should be fine. Is it possible to assert that the passes being observed are printed?

@altanh (Contributor Author) commented Feb 25, 2021

> Yeah, this should be fine. Is it possible to assert that the passes being observed are printed?

Hmm, I think I'll change the API to return a String rather than printing to stdout. This should make things more flexible, and I'll be able to check the output for the passes in the unit test. I will do this tomorrow, thanks!
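
A rough sketch of what such a test might look like, assuming a render-to-string entry point; the function names used here (`enable_pass_profiling`, `render_pass_profiles`, `disable_pass_profiling`) are placeholders for illustration, not necessarily the final API:

```python
import tvm
from tvm import relay


def test_pass_profiler():
    x = relay.var("x", shape=(1, 16))
    mod = tvm.IRModule.from_expr(relay.nn.relu(x))

    tvm.transform.enable_pass_profiling()           # placeholder name
    mod = relay.transform.InferType()(mod)
    mod = relay.transform.SimplifyInference()(mod)
    report = tvm.transform.render_pass_profiles()   # placeholder: assumed to return the rendered String
    tvm.transform.disable_pass_profiling()          # placeholder name

    # There is no numeric "correct" answer to check, but the observed passes should show up.
    assert "InferType" in report
    assert "SimplifyInference" in report
```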

@altanh (Contributor Author) commented Mar 1, 2021

bump @zhiics, sorry for the delay

This should be good for merging now

@jroesch merged commit 3a02e0b into apache:main on Mar 3, 2021
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
* basic pass profiler prototype

* allow enable/disable of pass profiling

* lint

* add example pass profiler usage as test

* render pass profiles to String instead of stdout
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
* basic pass profiler prototype

* allow enable/disable of pass profiling

* lint

* add example pass profiler usage as test

* render pass profiles to String instead of stdout