[Hexagon] Support template-free meta schedule tuning #12854

Merged
merged 4 commits into from Oct 3, 2022


Member

@masahi masahi commented Sep 21, 2022

Building on #12845, this PR adds initial support for template-free auto tuning on Hexagon.

Test cases demonstrate:

  • Auto-scheduler style, template-free tuning for fp16 conv2d in NHWC layout.
  • vrmpy auto tensorization for TE int8 dense (weight pre-packed), achieving 440 GOPs on SD888.
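
For reference, a throughput figure like the one above is normally derived from the operation count of the dense workload and the measured latency. A minimal sketch, assuming each multiply-accumulate counts as two operations; the helper name `dense_gops` and the shapes and latency in the usage line are hypothetical, not the ones measured in this PR:

```python
def dense_gops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in giga-operations per second for an (m, k) x (n, k) dense."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    ops = 2 * m * n * k  # assumption: one multiply + one add per (i, j, k) triple
    return ops / seconds / 1e9


# Hypothetical shape and latency, for illustration only.
print(f"{dense_gops(128, 768, 768, 1e-3):.1f} GOPs")
```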

Known issues:

  • Due to the issue explained in #12706 ([Node][Metaschedule] Allow ignoring NDArray raw data in StructuralEqual / Hash), link-params = True, which Hexagon requires, causes identical workloads to be tuned as distinct tasks. As a result, e2e tuning is very slow without the changes from #12706.
  • Tuning nn.dense essentially requires the metaschedule RewriteLayout postproc: I found that the memory access pattern of nn.dense, C[i, j] += A[i, k] * B[j, k], with the j axis vectorized, performs terribly on Hexagon. However, the implementation of RewriteLayout is completely incompatible with link-params = True. Until we fix this, we cannot enable RewriteLayout for Hexagon, so tuning nn.dense (and nn.batch_matmul) is not supported for now.
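
To make the nn.dense access pattern concrete, here is a pure-Python sketch (no TVM required). It shows that with B stored row-major as (N, K), consecutive j lanes of B[j, k] sit K elements apart, and that repacking the weights into blocks makes each group of lanes contiguous without changing the result. The blocked layout [N // jb][K // kb][jb][kb] and all function names are assumptions for illustration; this is not necessarily the layout RewriteLayout produces or the packing used in the test.

```python
def dense_ref(A, B):
    """C[i][j] = sum_k A[i][k] * B[j][k], with B stored as (N, K)."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B] for row_a in A]


def lane_offsets(K, k, lanes):
    """Flat row-major offsets of B[j, k] for j = 0..lanes-1: the stride is K."""
    return [j * K + k for j in range(lanes)]


def pack_weight(B, jb, kb):
    """Repack (N, K) weights as [N // jb][K // kb][jb][kb] (hypothetical layout)."""
    N, K = len(B), len(B[0])
    assert N % jb == 0 and K % kb == 0, "sketch assumes divisible shapes"
    return [
        [
            [[B[jo * jb + ji][ko * kb + ki] for ki in range(kb)] for ji in range(jb)]
            for ko in range(K // kb)
        ]
        for jo in range(N // jb)
    ]


def dense_packed(A, Bp, jb, kb):
    """Same result as dense_ref, reading weights from the packed layout."""
    M = len(A)
    N = len(Bp) * jb
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for jo, tile_row in enumerate(Bp):
            for ko, tile in enumerate(tile_row):
                for ji in range(jb):      # lanes: adjacent in the packed tile
                    for ki in range(kb):  # short reduction within one lane
                        C[i][jo * jb + ji] += A[i][ko * kb + ki] * tile[ji][ki]
    return C


A = [[i + k for k in range(8)] for i in range(2)]        # M=2, K=8
B = [[(j * k) % 5 for k in range(8)] for j in range(4)]  # N=4, K=8
print(lane_offsets(K=8, k=0, lanes=4))  # strided offsets in the unpacked layout
print(dense_packed(A, pack_weight(B, jb=2, kb=4), jb=2, kb=4) == dense_ref(A, B))
```

Because the weights here are constants that link-params = True embeds in the module, any such repacking has to be applied to the linked data as well, which is where the incompatibility described above comes from.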

cc @kparzysz-quic @junrushao @tmoreau89

"relay.FuseOps.link_params": link_params,
"relay.backend.use_meta_schedule": True,
"relay.backend.tir_converter": "default",
}
Member Author

See

if pass_config is None:
pass_config = {
"relay.backend.use_meta_schedule": True,
"relay.backend.tir_converter": tir_converter,
}
for why this change is necessary. We only need to pass the relay.FuseOps.link_params config; the other entries are there for compatibility with the existing code.
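
The effect of the quoted default can be sketched without TVM. The helper `resolve_pass_config` below is hypothetical and only mirrors the `if pass_config is None` logic shown above; the config keys are the ones from this PR. It illustrates that the defaults apply only when no config is passed, so a caller who wants to set link_params has to restate the other two entries:

```python
def resolve_pass_config(tir_converter="default", pass_config=None):
    """Hypothetical stand-in: defaults are used only when pass_config is None."""
    if pass_config is None:
        pass_config = {
            "relay.backend.use_meta_schedule": True,
            "relay.backend.tir_converter": tir_converter,
        }
    return pass_config


# Passing only link_params replaces the defaults entirely.
only_link = resolve_pass_config(pass_config={"relay.FuseOps.link_params": True})
print("relay.backend.use_meta_schedule" in only_link)

# So the caller restates the defaults alongside link_params.
full = resolve_pass_config(
    pass_config={
        "relay.FuseOps.link_params": True,
        "relay.backend.use_meta_schedule": True,
        "relay.backend.tir_converter": "default",
    }
)
print(sorted(full))
```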

@masahi
Member Author

masahi commented Sep 21, 2022

@tvm-bot rerun

postproc.RewriteTensorize(vectorize_init_loop=True),
]

if True:
Contributor

Is this a leftover from something?

Member Author

@masahi masahi Sep 30, 2022

I intentionally left it so that people can experiment with both the then and else paths. The else path just compiles and runs the best schedule found in my experiment, which reproduces the 440 GOPs performance.

@tmoreau89 tmoreau89 merged commit fa17da2 into apache:main Oct 3, 2022
@tmoreau89
Contributor

Thanks @masahi @kparzysz-quic, the PR has been merged!

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [Metaschedule] Support template-free tuning on Hexagon

* enable multi threading

* update tests

* black
3 participants