
Enable const folding transformation for QNN operations #9164

Closed

Conversation

@apeskov (Contributor) commented Sep 30, 2021

Currently the sequence const -> qnn.quantize is not treated as a constant subgraph. The suggestion is to allow the FoldConstant pass to replace this pattern with a single int8 constant tensor.

Reason: some BYOC runtimes may require weights to be constant tensors. The FoldConstant limitation pointed out above may break the applicability of such BYOC runtimes.
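
For illustration, a minimal sketch (not taken from the PR's test code) of the pattern and the folding being proposed:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.relay import transform

# const -> qnn.quantize: a float32 constant weight quantized inside the graph
weight_fp32 = relay.const(np.random.rand(16, 16).astype("float32"))
quantized = relay.qnn.op.quantize(weight_fp32,
                                  relay.const(0.05, "float32"),  # output_scale
                                  relay.const(0, "int32"),       # output_zero_point
                                  out_dtype="int8")
mod = tvm.IRModule.from_expr(relay.Function([], quantized))

# With this change, FoldConstant should collapse the pattern into a single int8 constant.
mod = transform.FoldConstant()(mod)
```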

Signed-off-by: Alexander Peskov <peskovnn@gmail.com>

@manupak (Contributor) left a comment

Thanks @apeskov !

It is an interesting change.

However, looking at the test case, I am struggling to understand why this cannot be achieved by running Legalize before the FoldConstant pass. Would you be able to share some motivation?

func = relay.Function([x], add)
return func

zz = run_opt_pass(before(), transform.FoldConstant())

How is this different from running Legalize followed by FoldConstant?
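
For reference, a rough sketch of what I mean (my own illustration; `mod` is assumed to be an IRModule holding the const -> qnn.quantize graph from the test case):

```python
from tvm.relay import transform

# assumption: `mod` already contains the const -> qnn.quantize pattern from the test
mod = transform.Legalize()(mod)      # lowers qnn.quantize into plain Relay ops
mod = transform.FoldConstant()(mod)  # then folds the now-constant subgraph
```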

@masahi (Member) commented Oct 7, 2021

@apeskov Please see PR #9135. I understand why you want to do this, namely to constant-fold quantize(weight_fp32) in a QNN graph. Returning float32 weights from the PyTorch frontend and relying on Relay constant folding to recover quantized weights was my design mistake. Now you can obtain quantized weights directly from the frontend (we quantize at the numpy level).
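
Roughly speaking (my own sketch, not the exact code from #9135), quantizing at the numpy level means the frontend does something like this instead of emitting qnn.quantize(weight_fp32) into the graph:

```python
import numpy as np

def quantize_weight(w_fp32, scale, zero_point):
    # affine quantization performed eagerly on the numpy array
    q = np.round(w_fp32 / scale) + zero_point
    return np.clip(q, -128, 127).astype("int8")

w_int8 = quantize_weight(np.random.rand(16, 16).astype("float32"), 0.05, 0)
```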

@manupa-arm Running lowering before const fold is not acceptable when we want to keep the rest of the QNN graph intact (for BYOC), while selectively lowering constant subgraphs and evaluating them.

@manupak (Contributor) commented Oct 7, 2021

@masahi What is stopping us from running the legalization on an IRModule containing just the external function? i.e. in relay.ext.<codegen>

@masahi (Member) commented Oct 7, 2021

I'm thinking of use cases in BYOC where we want to pattern match against QNN ops, in which case we don't want to run QNN legalization. Not sure if this answers your question @manupa-arm

In my previous job, I directly took QNN subgraphs and sent them to an external codegen. I believe ethos-N does something similar. We had to develop constant folding on the external codegen side.
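
As a rough sketch of that kind of flow (the pattern and target names here are hypothetical, and `mod` is an assumed pre-existing QNN module), matching qnn.conv2d directly for partitioning is exactly why legalization can't run first:

```python
from tvm import relay
from tvm.relay.dataflow_pattern import is_op, wildcard

def qnn_conv2d_pattern():
    # qnn.conv2d(data, weight, input_zero_point, kernel_zero_point,
    #            input_scale, kernel_scale)
    return is_op("qnn.conv2d")(wildcard(), wildcard(), wildcard(),
                               wildcard(), wildcard(), wildcard())

patterns = [("my_codegen.qnn_conv2d", qnn_conv2d_pattern())]
mod = relay.transform.MergeComposite(patterns)(mod)   # `mod` assumed to hold the QNN graph
mod = relay.transform.AnnotateTarget("my_codegen")(mod)
mod = relay.transform.PartitionGraph()(mod)
```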

@manupak (Contributor) commented Oct 7, 2021

Hmmm, I was not suggesting to run the legalization before the partitioning.

We could identify the patterns with QNN ops and then partition out the external function.
I believe the requirement to do legalization + constant folding comes post-partitioning.

So there are two places we could do this:

  1. post-PartitionGraph pass, in the partition_for_* function
  2. relay.ext.<codegen>, which gets the partitioned function passed in via the relay.build process

I believe this particular requirement is to mutate only the external function and not 'main' (thus we could do it in 2) but not in 1)). Therefore, why can't we achieve the same effect by running the two passes -- legalization + constant folding -- in 2)?
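
A rough sketch of what I mean by 2) (the codegen name here is hypothetical, not code from this PR):

```python
import tvm
from tvm import relay

@tvm.register_func("relay.ext.my_codegen")  # hypothetical BYOC codegen hook
def my_codegen(func):
    # build an IRModule containing only the partitioned external function
    mod = tvm.IRModule.from_expr(func)
    mod = relay.transform.Legalize()(mod)      # affects only this function
    mod = relay.transform.FoldConstant()(mod)
    # ... translate mod["main"] into the external runtime.Module and return it ...
```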

@masahi (Member) commented Oct 7, 2021

Hmm interesting, I never thought about doing constant folding on partitioned functions. My use cases have always been doing constant folding on 'main', before partitioning. For example, that was the case in the PyTorch frontend before #9135, which always produced something like qnn.quantize(const_weight_fp32). The other case is QNN produced by the FakeQuantizationToInteger pass, which also generates many qnn.quantize ops with constant weights.

In 2), if we run legalization on partitioned functions, wouldn't that decompose all QNN ops? I couldn't easily extract qparams anymore, for example. I needed to retain QNN ops all the way until I translated them to the external IR, so running legalization was never an option for me. I did wish we could selectively lower only the const-foldable QNN subgraphs. Maybe I'm missing something.

@manupak (Contributor) commented Oct 7, 2021

> In 2), if we run legalization on partitioned functions, wouldn't that decompose all QNN ops?

I think the constant folding pass is supposed to work on the IRModule (with the external function). Therefore, everything in the IRModule will be affected. However, we could create IRModules containing only what is in scope for the transformation.

> I needed to retain QNN ops all the way until I translated them to the external IR, so running legalization was never an option for me. I did wish we could selectively lower only the const-foldable QNN subgraphs. Maybe I'm missing something.

It is about the granularity at which one would do further partitioning. Today, I think we need to do further partitioning to achieve this. However, whether we want annotations to block constant folding seems like an interesting but orthogonal conversation to this one.

Within the scope of the changes in this PR, I feel it does the same thing (destroys QNN info in the process of constant folding). However, we could control what we pass into the constant folding pass.

@masahi (Member) commented Jan 9, 2022

@apeskov please update or close this PR

@masahi (Member) commented Jan 19, 2022

@apeskov Please update or close this PR

@areusch (Contributor) commented Apr 8, 2022

closing due to inactivity, feel free to reopen
