[Unity][MSC][Tracking Issue] Introduction to Multi-System Compiler #15233

Closed
29 tasks done
Archermmt opened this issue Jul 5, 2023 · 7 comments
Labels
needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type:rfc-tracking RFC progress tracking. Ref: https://github.com/apache/tvm-rfcs

Comments

@Archermmt
Contributor

Archermmt commented Jul 5, 2023

  • [M0] Build MSCGraph core parts. Enable translation between Relay, Relax and MSCGraph without losing information.
    • [M0.1] Passes for set name and layout for expressions (src/contrib/msc/transform)
    • [M0.2] MSCGraph core (src/contrib/msc/core/ir/graph && python/tvm/contrib/msc/core/ir/graph)
    • [M0.3] MSCGraph Builder (src/contrib/msc/core/ir/graph_builder)
    • [M0.4] Codegen (src/contrib/msc/core/codegen, src/contrib/msc/framework/tvm/codegen)
    • [M0.5] Translation test (relax/relay test && related helper modules in python)
  • [M1] Finish RuntimeManager for relax and torch, so that a compiling process can be tested based on MSCGraph.
    • [M1.1] Add translate && codegen for torch
    • [M1.2] Add translate && codegen for tensorflow
    • [M1.3] Add codegen for tensorrt
    • [M1.4] Add Runner and test with relax
    • [M1.5] Add Runner and test with torch
    • [M1.6] Add Runner and test with tensorflow
    • [M1.7] Add Runner and test with tensorrt
  • [M2] Use msc.runtime.Manager to manage the compiling pipeline && tools.
    • [M2.1] Add Manager for compile pipeline
    • [M2.2] Add pruner for model pruning
    • [M2.3] Add tracker for tracking layer data
    • [M2.4] Add quantizer for model quantization
  • [M3] Add MSCGym, enable auto compression. Add distiller, enable knowledge distillation.
    • [M3.1] Add distiller for model distillation
    • [M3.2] Add gym for pruning and quantization, enable auto prune/quantize
  • [M4] Add plugin builder, enable plugin wrap in different frameworks.
    • [M4.1] Add plugin && plugin_builder, enable build and test in different frameworks.
    • [M4.2] Enable plugin with manager, test plugins in compile pipeline.
  • [M5] [Optional] Add MSCWrapper as compression toolchain.
    • [M5.1] Build wrapper to support compression
    • [M5.2] Enable quantize && prune with gym by wrapper
    • [M5.3] Support torch.dynamo for dynamic models

cc @quic-sanirudh

@Archermmt Archermmt added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type:rfc-tracking RFC progress tracking. Ref: https://github.com/apache/tvm-rfcs labels Jul 5, 2023
@Archermmt
Contributor Author

TODO: add tests for M0.2 after M0.3

@Archermmt
Contributor Author

Discussion on translating relay to relax without losing information: https://discuss.tvm.apache.org/t/msc-translate-relay-to-relax-without-loss-info/15650

@Lunderberg
Contributor

I'm somewhat concerned about the relay -> python codegen -> relax code path used in tvm.contrib.msc.framework.torch.frontend.translate.from_torch when via_relax=False. This is a duplication of the serialization/parsing used in TVMScript (tvm.script), and can cause CI failures (e.g. #15783) due to this duplication.

While I agree with the need for an operator-level conversion from relay to relax, I think it should be done by extending the existing relax.testing.relay_translator.from_relay converter rather than having an additional python code-generator.

@Archermmt
Contributor Author

Archermmt commented Oct 13, 2023

@Lunderberg sorry for the late reply.... I've checked the failures. It seems the tril/triu methods have been changed; I'll fix them in later PRs.

As for the reasons for building a duplicate "relay -> relax" converter:

  1. An operator-level conversion is needed, as you said. This is essential when developers want to use relay-based features (like me, testing tensorflow).
  2. Using relay also has some problems when optimizing the model, especially for quantization, pruning, parameter reuse and training. The real process from relay in test_translate_torch.py is relay -> MSCGraph -> relax, where MSCGraph is the basic DAG structure used in model compression. The via_relax=False path only shows an example of using MSC with relax and relay; it is not meant to be a converter between relay and relax. When the final solution for the "operator-level conversion from relay to relax" is done, I will change the relay-relax method accordingly.

Thanks for watching!

@Lunderberg
Contributor

@Archermmt No worries, and I've been slow responding as well.

After thinking on it, I think my primary concern is in the method used for the MSCGraph -> relax conversion, which is done by first producing a python string, then calling exec on the generated string. This makes it very difficult to tell where an error has been introduced, as any errors in this process are thrown at runtime while executing the generated string.

Instead of generating a string to use the Python API, I think the MSC to Relax conversion should instead be done by directly calling the C++ APIs. This would expose any errors during the C++ compilation, rather than delaying them until runtime.
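To make the concern concrete, here is a minimal toy sketch of the two paths, written in plain Python. Nothing in it is a TVM or MSC API: `KNOWN_OPS`, `build_eager`, `generate_source` and `load_generated` are hypothetical names invented for illustration only.

```python
# Toy illustration only: none of these names are TVM or MSC APIs.

KNOWN_OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def build_eager(op_name, a, b):
    """Converter-style path: the op is validated while the graph is built."""
    if op_name not in KNOWN_OPS:
        raise ValueError(f"unknown op {op_name!r} rejected at build time")
    return KNOWN_OPS[op_name](a, b)


def generate_source(op_name):
    """Codegen-style path: emit Python source as a string, with no validation."""
    return f"def model(a, b):\n    return ops[{op_name!r}](a, b)\n"


def load_generated(source):
    """Execute the generated string and return the function it defines."""
    namespace = {"ops": KNOWN_OPS}
    exec(source, namespace)
    return namespace["model"]
```

With an unsupported op such as "sub", `build_eager` fails immediately at the call that introduced the bad op. On the string path, both `generate_source("sub")` and `load_generated(...)` succeed, and the `KeyError` only appears when the loaded `model` is finally called.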

@Archermmt
Contributor Author

@Lunderberg Emmm.... I've also thought about which method is better: 1. Convert in C++ to enable eager error detection; 2. Convert by string generation to enable independent loading. Both have advantages and disadvantages.

The first method (let's call it the converter, in either C++ or python), like relax.builder, can check and normalize the op while building the graph, but that limits the deployment possibilities. For example, if I need to compare the results between an old version of tvm without relax and the new unity version (which may be a real task for me....), I have to spend a lot of time setting up environments and dumping test data with the converter solution. And MSC is designed not only for converting to relax, but also to torch/torch2, tensorflow/tf2, tensorrt, and so on. Considering that models are dispatched to different frameworks and environments, the converter may not be a good solution.

The second method (let's call it string generation), like the cutlass codegen, first generates strings and then processes them into a kernel/model/engine. That means the codegen process cannot check or normalize ops, which may lead to lazy error detection. However, the strings can be turned into script/C++ files and loaded in any environment. This method separates codegen from loading, which is essential for fast model release, especially on the cloud (where different environments and frameworks are used).
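As a rough sketch of what "separating codegen from loading" means, the toy below generates a script as text, writes it to disk, and loads it back using only the Python standard library. It is not how MSC does it: `generate_script`, `save_script`, `load_script`, `roundtrip` and the file name `generated_model.py` are all hypothetical, chosen only for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path


def generate_script(scale):
    """Codegen step: return the text of a standalone Python module."""
    return (
        "def run(x):\n"
        f"    return [v * {scale} for v in x]\n"
    )


def save_script(source, folder):
    """Write the generated text to disk so it can be shipped elsewhere."""
    path = Path(folder) / "generated_model.py"
    path.write_text(source)
    return path


def load_script(path):
    """Loading step: import the file; the generator itself is not needed."""
    spec = importlib.util.spec_from_file_location("generated_model", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def roundtrip(scale, data):
    """Generate, save, load and run within a temporary directory."""
    with tempfile.TemporaryDirectory() as folder:
        path = save_script(generate_script(scale), folder)
        return load_script(path).run(data)
```

The point of the split is that `load_script` depends only on the saved file, so in principle the loading side could live in a different environment from the one that ran `generate_script`.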

And as mentioned in the RFC: https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251
MSC currently targets model optimization problems based on relax. That means the codegen part should be able to use features of different frameworks, such as training, weight reusing/reloading, distributed systems, and so on. Currently I only have experience "describing" these features in python with string generation (not that good at C++ -_-).

To partially solve the error detection problem, the codegen in MSC generates not only the model but also a unit test. Using the unit test, developers can locate and solve problems efficiently.
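The idea of emitting a test next to the model can be sketched as follows. This is a toy under stated assumptions, not the actual MSC codegen: `generate_model_and_test` and `run_generated_test` are hypothetical helpers, and the "model" is just a multiplication by a constant weight.

```python
def generate_model_and_test(weight, sample_input):
    """Return source text holding both a model and a test for it."""
    # Reference value computed at codegen time and baked into the test.
    expected = weight * sample_input
    return (
        "def model(x):\n"
        f"    return {weight!r} * x\n"
        "\n"
        "def test_model():\n"
        f"    assert model({sample_input!r}) == {expected!r}\n"
    )


def run_generated_test(source):
    """Load the generated text, run its test, and return the model."""
    namespace = {}
    exec(source, namespace)
    namespace["test_model"]()  # raises AssertionError on a mismatch
    return namespace["model"]
```

If the generated model text is later altered or mis-generated, running the bundled test flags the mismatch at load time instead of during a later inference call.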

I think we can leave this part, enabling a C++ converter for MSC, as a todo. After the main target is reached, I'll consider building a converter, or maybe directly using relax as the core IR.

junrushao pushed a commit that referenced this issue Dec 31, 2023
This is a pull request for MSC (Multi-System Compiler)
RFC: https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251/5
Tracking issue: #15233

This PR changes the test workspace to a random workspace, which fixes the workspace conflict bug.