[Unity][MSC][Tracking Issue] Introduction to Multi-System Compiler #15233

Archermmt · 2023-07-05T02:24:34Z

Archermmt · 2023-07-05T02:25:10Z

Introduction @ https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251

Archermmt · 2023-08-18T03:48:43Z

TODO: add tests for M0.2 after M0.3

Archermmt · 2023-09-09T12:39:25Z

Discussion on translating relay to relax without losing information: https://discuss.tvm.apache.org/t/msc-translate-relay-to-relax-without-loss-info/15650

Lunderberg · 2023-09-26T18:28:21Z

I'm somewhat concerned about the relay -> python codegen -> relax code path used in tvm.contrib.msc.framework.torch.frontend.translate.from_torch when via_relax=False. This is a duplication of the serialization/parsing used in TVMScript (tvm.script), and can cause CI failures (e.g. #15783) due to this duplication.

While I agree with the need for an operator-level conversion from relay to relax, I think it should be done through extending the existing relax.testing.relay_translator.from_relay converter rather than having an additional python code-generator.

Archermmt · 2023-10-13T05:29:48Z

@Lunderberg Sorry for the late reply. I've checked the failures; it seems the tril/triu methods have been changed. I'll fix them in later PRs.

Here are the reasons for building a duplicate "relay -> relax" converter:

An operator-level conversion is needed, as you said. This is essential when developers want to use relay-based features (like me, testing tensorflow).
Using relay also has some problems when optimizing the model, especially in quantization, pruning, parameter reuse and training. The real process in test_translate_torch.py from relay is: relay -> MSCGraph -> relax, where MSCGraph is the basic DAG structure in model compression. The via_relax=False path only shows an example of using MSC with relax and relay; it is not meant to be a converter between relay and relax. When the final solution for the "operator-level conversion from relay to relax" is done, I will change the relay-relax method accordingly.

Thanks for watching!

Lunderberg · 2023-10-18T14:02:54Z

@Archermmt No worries, and I've been slow responding as well.

After thinking on it, I think my primary concern is in the method used for the MSCGraph -> relax conversion, which is done by first producing a python string, then calling exec on the generated string. This makes it very difficult to tell where an error has been introduced, as any errors in this process are thrown at runtime while executing the generated string.

Instead of generating a string to use the Python API, I think the MSC to Relax conversion should instead be done by directly calling the C++ APIs. This would expose any errors during the C++ compilation, rather than delaying them until runtime.
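The late-error concern can be illustrated with a small, framework-free sketch. It does not use TVM or MSC; `generate_source` and `load_generated` are hypothetical names chosen for illustration. Generating and loading the source both succeed even though it refers to a function that does not exist, and the failure only appears when the generated function is called, in a frame that belongs to the generated string rather than to the generator.

```python
import traceback


def generate_source(op_name: str) -> str:
    """Hypothetical code generator: emits Python source as a string."""
    return (
        "import math\n"
        "def run(x):\n"
        f"    return math.{op_name}(x)\n"
    )


def load_generated(source: str):
    """Execute the generated source and return its `run` function."""
    namespace = {}
    exec(source, namespace)
    return namespace["run"]


# A valid op name works as expected.
good_run = load_generated(generate_source("sqrt"))
good_value = good_run(4.0)

# `sqroot` is not a function in `math`, yet generation and loading succeed.
bad_run = load_generated(generate_source("sqroot"))

error_file = None
try:
    bad_run(4.0)
except AttributeError as err:
    # The innermost frame comes from the exec'd string, not from a source file.
    error_file = traceback.extract_tb(err.__traceback__)[-1].filename
```

Here `error_file` ends up as the placeholder filename that `exec` assigns to string input, which is why such failures are hard to trace back to the generator line that emitted the bad call.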

Archermmt · 2023-10-18T23:25:20Z

@Lunderberg I've also thought about which method is better: 1. Convert in C++ to enable eager error detection; 2. Convert by string generation to enable independent loading. Both have advantages and disadvantages.

The first method (let's call it the converter, in either C++ or Python), like relax.builder, can check and normalize the op while building the graph, but that limits the deployment possibilities. For example, if I need to compare results between an old version of tvm without relax and the new unity version (which may be a real task for me), I have to spend a lot of time setting up environments and dumping test data with the converter solution. Also, MSC is designed not only for converting to relax, but also to torch/torch2, tensorflow/tf2, tensorrt, and so on. Considering the need to dispatch models to different frameworks and environments, the converter may not be a good solution.

The second method (let's call it string generation), like the cutlass codegen, first generates strings and then processes them into a kernel/model/engine. That means the codegen process skips checking and normalization, which may lead to lazy error detection. However, the strings can be written out as script/C++ files and loaded in any environment. This method separates codegen from loading, which is essential for fast model release, especially on the cloud (where different environments and frameworks are used).
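As a rough sketch of this separation (not MSC's actual codegen; `codegen`, `save_script`, `load_script`, and the file name `generated_model.py` are hypothetical), the generated source can be written to a file in one step and loaded in another step that only needs the file, not the generator:

```python
import importlib.util
import tempfile
from pathlib import Path


def codegen(scale: float) -> str:
    """Hypothetical codegen step: returns model-building source as plain text."""
    return (
        "def build_model():\n"
        f"    return lambda x: x * {scale!r}\n"
    )


def save_script(source: str, folder: str) -> Path:
    """Write the generated source to a standalone script file."""
    path = Path(folder) / "generated_model.py"
    path.write_text(source)
    return path


def load_script(path: Path):
    """Loading step: depends only on the script file, not on the generator."""
    spec = importlib.util.spec_from_file_location("generated_model", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.build_model()


with tempfile.TemporaryDirectory() as folder:
    script = save_script(codegen(2.0), folder)
    model = load_script(script)
    result = model(3.0)
```

In this toy setup the script could equally be copied to another machine and loaded there, as long as that environment provides whatever the generated code imports.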

And as mentioned in the RFC: https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251
MSC is currently targeting model optimization problems based on relax. That means the codegen part should be able to use features from different frameworks, such as training, weight reuse/reloading, distributed systems, and so on. Currently I only have experience "describing" these features in Python with string generation (I'm not that good at C++ -_-).

To partially solve the error detection problem, the codegen in MSC generates not only the model but also a unit test. Using the unit test, developers can locate and solve problems efficiently.
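A minimal sketch of the idea, assuming a toy "model" and hypothetical generator functions (`codegen_model`, `codegen_test`, `run_generated`); it is not how MSC lays out its generated tests. The generator emits a test case next to the model source, so a mismatch is reported as a named test failure:

```python
import unittest


def codegen_model(bias: int) -> str:
    """Hypothetical model codegen: emits a trivial model as source text."""
    return f"def model(x):\n    return x + {bias}\n"


def codegen_test(sample_input: int, expected: int) -> str:
    """Hypothetical test codegen: emits a unittest that checks the model."""
    return (
        "import unittest\n"
        "class TestGeneratedModel(unittest.TestCase):\n"
        "    def test_output(self):\n"
        f"        self.assertEqual(model({sample_input}), {expected})\n"
    )


def run_generated(model_src: str, test_src: str) -> unittest.TestResult:
    """Load the model and its test from source, then run the test."""
    namespace = {"__name__": "generated_model"}
    exec(model_src + test_src, namespace)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        namespace["TestGeneratedModel"]
    )
    result = unittest.TestResult()
    suite.run(result)
    return result


# The expected value matches the generated model: the test passes.
ok = run_generated(codegen_model(1), codegen_test(2, 3))
# The expected value does not match: the generated test reports a failure.
bad = run_generated(codegen_model(1), codegen_test(2, 4))
```

The failure recorded in `bad.failures` names the generated test method, which gives developers a starting point even though the model itself came from a string.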

I think we can leave this part, enabling a C++ converter for MSC, as a TODO. After the main target is reached, I'll consider building a converter, or maybe directly using relax as the core IR.

This is a pull request for the MSC (Multi-System Compiler) RFC: https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251/5 Tracking issue: #15233. This PR changes the test workspace to a random workspace, which fixes the workspace conflict bug.

Archermmt added the needs-triage and type:rfc-tracking labels Jul 5, 2023

This was referenced Aug 5, 2023

[RFC][Unity][MSC] MileStone 0 #15489

Closed

[Unity][MSC][M0.1] Enable set name and layout for exprs #15500

Merged

Archermmt mentioned this issue Aug 15, 2023

[Unity][MSC][M0.2] MSCGraph core #15569

Merged

This was referenced Aug 23, 2023

[Unity][MSC][M0.3] MSCGraph Builder #15615

Merged

[Unity][MSC][M0.4 && M0.5] Codegen && Test #15645

Merged

Archermmt mentioned this issue Sep 8, 2023

[Unity][MSC][M1.1] Add translate && codegen for torch #15704

Merged

Archermmt mentioned this issue Sep 14, 2023

[Unity][MSC][M1.2] Add translate && codegen for tensorflow #15751

Closed

Archermmt mentioned this issue Sep 24, 2023

[Unity][MSC][pre M1.2] Reconstruct codegen #15813

Merged

Archermmt mentioned this issue Oct 9, 2023

[Unity][MSC][M1.2] Add translate && codegen for tensorflow #15905

Merged

Archermmt mentioned this issue Oct 19, 2023

[Unity][MSC][M1.3] Add translate && codegen for tensorrt #15950

Merged

Archermmt mentioned this issue Oct 26, 2023

[Unity][MSC][M1.4] Add Runner and test with relax #15997

Merged

Archermmt mentioned this issue Nov 4, 2023

[Unity][MSC][M1.5-1.7] Add Runner and test with torch, tensorflow && tensorrt #16072

Merged

Archermmt mentioned this issue Nov 14, 2023

[Unity][MSC] Enable add attributes while fuse ops #16128

Merged

This was referenced Nov 24, 2023

[Unity][MSC][M2.1] Add Manager for compile pipeline #16163

Merged

[Unity][MSC][M2.1] Add pruner for model pruning #16186

Merged

This was referenced Dec 6, 2023

[Unity][MSC][M2.3] Add tracker for track layer datas #16207

Merged

[Unity][MSC][M2.4] Add quantizer for quantize model #16228

Merged

This was referenced Dec 19, 2023

[Unity][MSC][M3.1] Add distiller for distill model #16264

Merged

[Unity][MSC][M3.2] Add gym for pruning and quantization, enable auto prune/quantize #16280

Merged

Archermmt mentioned this issue Dec 31, 2023

[Unity][MSC][Bugfix] Use random workspace for test #16322

Merged

This was referenced Jan 1, 2024

[Unity][MSC][Legalize] legalize codes and mute logging #16325

Merged

[Unity][MSC][M4.Test] add gym test #16365

Closed

Archermmt mentioned this issue Jan 14, 2024

[Unity][MSC][M4.1] Add plugin && plugin_builder, enable build and test in different frameworks #16397

Merged

This was referenced Jan 24, 2024

[Unity][MSC][Refactor] Reconstruct BYOC and runner #16467

Merged

[Unity][MSC][M4.2][Step1] Enable plugin with manager, test plugins in compile pipeline #16495

Merged

Archermmt mentioned this issue Feb 16, 2024

[Unity][MSC][M4.2][Step2] Enable plugin with manager, test plugins in compile pipeline #16581

Merged

Archermmt mentioned this issue Mar 3, 2024

[MSC][M5.1] Build wrapper to support compression #16668

Merged

Archermmt mentioned this issue Mar 11, 2024

[MSC][M5.2] Enable quantize && prune with gym by wrapper #16702

Merged

Archermmt mentioned this issue Mar 22, 2024

[MSC][M5.3] Support torch.dynamo for dynamic models #16772

Merged

Archermmt closed this as completed May 1, 2024
