[Dlight] Fix Customized Dynamic Shape Sampling #15466
zxybazh wants to merge 10 commits into apache:unity
Any update?
Thanks for checking in! The issues are all fixed, and the demo is updated in mlc-ai/dlight-bench#6. Ready for review. After this PR it should be really easy to conduct PrimFunc-level benchmarking and customize dynamic shapes. I'm creating a notebook to demonstrate all the use cases and will post it in this thread.
The Colab notebook mentioned above can be found here: https://colab.research.google.com/drive/1wof8KvUnAaqoXI7HAcItlOquB4cPfiny?usp=sharing
    else "static",
    "Time(us)": median * 1e6,
    "Std(us)": std * 1e6,
    "Memory(GB/s)": total_input_bytes / median / 1024**3,
One potential future work item is to show the HW maximum bandwidth. This would be helpful to understand the gap from the theoretical upper bound.
Thanks for the tip! I think it makes sense to show the maximum bandwidth if the workload is memory-bound. On the other hand, this is only a rough throughput calculation, so it may not reflect the compute FLOPS; we could also show the peak FLOPS of the hardware for comparison.
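To make the suggestion concrete, here is a minimal sketch of how the achieved bandwidth could be compared against a hardware peak. The first function reuses the formula from the "Memory(GB/s)" column in the diff; the helper names, the peak value, and the sample numbers are hypothetical and not part of this PR.

```python
def achieved_bandwidth(total_input_bytes: int, median_sec: float) -> float:
    """Same formula as the "Memory(GB/s)" column: bytes / seconds / 1024**3."""
    return total_input_bytes / median_sec / 1024**3


def bandwidth_utilization(achieved: float, peak: float) -> float:
    """Fraction of the (assumed) hardware peak bandwidth that was reached."""
    if peak <= 0:
        raise ValueError("peak bandwidth must be positive")
    return achieved / peak


# Hypothetical measurement: 2 GiB of inputs read in a median time of 4 ms.
total_input_bytes = 2 * 1024**3
median_sec = 0.004

# Placeholder peak; a real value would come from the device's spec sheet.
peak_bandwidth = 1000.0

achieved = achieved_bandwidth(total_input_bytes, median_sec)
utilization = bandwidth_utilization(achieved, peak_bandwidth)
print(f"achieved: {achieved:.1f} GB/s, utilization: {utilization:.1%}")
```

Because the column only counts input bytes, the utilization is a rough lower estimate for kernels that also write large outputs.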
This PR is a follow-up to #15322 to better support model-level workload benchmarking, as in mlc-ai/dlight-bench#6. It introduces some minor fixes to enable easier dynamic shape sampling for model-level workload benchmarking. For example, a dynamic shape sampling function can take sample_idx and sample_num as input to generate a better selection of shapes. It also makes life easier by converting all dynamic variables to str when parsing from PrimFuncs.
Other than that, this PR includes a few minor changes, including type annotations and support for customizing the columns in the benchmarking output.
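As an illustration of why sample_idx and sample_num are useful inputs, the sketch below spreads the sampled values evenly over a range instead of drawing them at random. The function name, its signature, and the low/high bounds are assumptions made for this example; the actual callback signature expected by the benchmarking utilities may differ.

```python
from typing import Dict, List


def sample_dynamic_shapes(
    dym_vars: List[str],
    sample_idx: int,
    sample_num: int,
    low: int = 1,
    high: int = 1024,
) -> Dict[str, int]:
    """Map each dynamic variable name (a str) to a concrete size.

    Sample `sample_idx` out of `sample_num` picks the value at that position
    on an evenly spaced grid over [low, high], so the set of samples covers
    the whole range.
    """
    if sample_num <= 0:
        raise ValueError("sample_num must be positive")
    if not 0 <= sample_idx < sample_num:
        raise ValueError("sample_idx must be in [0, sample_num)")
    if sample_num == 1:
        value = high
    else:
        value = low + (high - low) * sample_idx // (sample_num - 1)
    return {name: value for name in dym_vars}


# Four samples for a single dynamic variable "n".
for idx in range(4):
    print(sample_dynamic_shapes(["n"], idx, 4))
```

Keying the result by variable name as a plain str matches the PR's change of converting dynamic variables to str when parsing from PrimFuncs.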