
[TVM][RUNTIME] A minimum example to generate external library wrappers for DSOModule #4280

Merged · 23 commits · Nov 22, 2019

Conversation

@zhiics (Member) commented Nov 8, 2019

A minimum runtime for external codegen as mentioned in #4258

@tqchen (Member) commented Nov 8, 2019

To follow up on the discussion thread: while we understand that the current PR achieves the purpose of supporting external libraries through a different kind of wrapping, it does bring in additional code that duplicates functionality in DSOModule.

Unless there is a strong reason to do so, I think we should not introduce the new Extern runtime. Instead, let us document DSOModule's calling convention clearly and rewrite the compilers to generate functions that can be loaded by the DSO module. That is, instead of generating a function with the extern calling convention, generate a function with the following signature:

// the original foo you intended to generate
void foo_(float* a, int N, float* b, int M, float* c) {
   bar(...);
   foobar(...);
}

// DSOModule-compatible C function that can be compiled by gcc.
extern "C" int foo(TVMValue* value, int* type_code, int nargs) {
   CHECK_EQ(nargs, 3);
   DLTensor* arg0 = static_cast<DLTensor*>(value[0].v_handle);
   DLTensor* arg1 = static_cast<DLTensor*>(value[1].v_handle);
   DLTensor* arg2 = static_cast<DLTensor*>(value[2].v_handle);

   foo_(static_cast<float*>(arg0->data), arg0->shape[0],
        static_cast<float*>(arg1->data), arg1->shape[0],
        static_cast<float*>(arg2->data));
   return 0;
}

We can just return the above code as a CSourceModule, compile it with gcc, and link it with the other generated code. In this way, the generated code can enjoy all the benefits of our existing infra, pass through RPC, and be combined with other libs.

Rationale

The main rationale behind the suggestion is that we want to avoid code specialization and reduce future technical debt. As we can see, the current PR introduces special changes in several locations to support the new mode, which increases the cost of maintaining a separate loading mechanism.

In short: the best design is achieved not when there is nothing left to add, but when there is nothing left to take away.

@tqchen (Member) commented Nov 8, 2019

As a follow-up on what we might gain from external modules: it would be great to have another function that generates something like a graph spec, which can be interpreted and calls into the library functions (a bit like our graph runtime). We would implement SaveToBinary to serialize the graph spec. See also https://discuss.tvm.ai/t/standardize-graphruntime-exports-into-a-single-dll/4667

This kind of external module is closer to supporting modules like TensorRT or TF, and would be a nice example to have.

@zhiics force-pushed the external_runtime branch 2 times, most recently from 3440e04 to a2b3567 on November 11, 2019 18:16
@zhiics (Member, Author) commented Nov 11, 2019

@tqchen I removed everything but kept the C APIs as a CSourceModule. I think this could be a minimal runtime/example that we can use for external libraries that need to produce a DSO. For the ones that need artifacts like JSON (e.g., TRT), let me think a little more.

@comaniac (Contributor) commented
@tqchen could you help review again? Here is a short summary of the current runtime after applying your suggestions:

  • The runtime takes only one .so file, including host and external kernels.
  • The runtime is now unified with DSOModule, so backend developers don't need to implement their own runtime (and GetFunction) anymore. Instead, they need to unpack TVMValue in the generated kernels.
  • The example in this PR demonstrates a case in which a user backend generates third-party library calls. This is applicable to runtime-less libraries such as MKL-DNN.
  • For an external backend that has a runtime engine, such as TRT, we can still codegen external functions as in this example, construct the subgraph in that function, and invoke the TRT engine to execute it on the fly.

We will refine the other codegen PR based on this one after merging.

@tqchen tqchen changed the title [tvm][runtime] A minimum runtime for external library [TVM][RUNTIME] A minimum example to generate external library wrappers for DSOModule Nov 12, 2019
@tqchen (Member) commented Nov 12, 2019

Thanks @comaniac. I feel that, given it is already unified with DSOModule, we don't have to call it a separate extern runtime; instead, it is a way to interface external libraries with the current DSOModule runtime :)

It would still be great if we could add the other example I mentioned, where we implement a customized runtime that has its own serialization mechanism (perhaps a graph, like the graph runtime) and whose PackedFunc executes the corresponding functions by interpreting the serialized structure (e.g., the graph). This kind of example will be closer to what people want, especially when they want to serialize state other than a sequence of API calls. @zhiics

@zhiics (Member, Author) commented Nov 12, 2019

Yes, we don't need to call this a runtime now; it actually just uses DSOModule and CSourceModule. I think @comaniac was asking whether we should add this example (DSO-style external library) first and then move on to implementing the customized runtime you are mentioning. Never mind, we can work on the customized runtime and update the PR. Thanks.

@comaniac (Contributor) commented

> It would still be great if we could add the other example I mentioned, where we implement a customized runtime that has its own serialization mechanism (perhaps a graph, like the graph runtime) and whose PackedFunc executes the corresponding functions by interpreting the serialized structure (e.g., the graph). This kind of example will be closer to what people want, especially when they want to serialize state other than a sequence of API calls. @zhiics

I've added another example that mimics the graph-runtime-like programming model. This example assumes the external backend has its own runtime that accepts a graph representation such as JSON. Accordingly, the codegen serializes the subgraph to a string like this code snippet (for simplicity, I use a very simple representation instead of a complete JSON format in this example). Note that the codegen can either embed the graph in the kernel code, or put the graph in a separate file and read it back at runtime.

@comaniac (Contributor) commented

cc @tqchen

@yzhliu (Member) commented Nov 15, 2019

@tqchen would you take a look again?

@tqchen (Member) commented Nov 15, 2019

@comaniac Thanks for the update! Can you move the ExampleJSONModule into a formal module that is compiled by the TVM runtime? Normally, we will need to implement SaveToBinary and LoadFromBinary instead of pumping everything into a DSO file.

@comaniac (Contributor) commented

> @comaniac Thanks for the update! Can you move the ExampleJSONModule into a formal module that is compiled by the TVM runtime? Normally, we will need to implement SaveToBinary and LoadFromBinary instead of pumping everything into a DSO file.

The current implementation expects external codegens to write the JSON string directly into the DSO module so that we can have a unified interface. Do you mean we should also provide the following JSON module, in addition to the DSO module, for developers to use:

class JSONModule : public runtime::Module {
   void SaveToBinary();
   void LoadFromFile();
};

Then the user would end up with two JSON files if they invoke json_module.SaveToBinary() and relay.build(). Would this mean that we have to change the API of graph_runtime.create() to include the second JSON file?

Also, we will generate the following function in the .so.

// JSONModule-compatible C function that can be compiled by gcc.
extern "C" int foo(TVMValue* value, int* type_code, int nargs, std::string json_file) {
   CHECK_EQ(nargs, 3);
   DLTensor* arg0 = static_cast<DLTensor*>(value[0].v_handle);
   DLTensor* arg1 = static_cast<DLTensor*>(value[1].v_handle);
   DLTensor* arg2 = static_cast<DLTensor*>(value[2].v_handle);

   // Parse json_file and use "foo" as the key to get the subgraph JSON.
   std::string subgraph_json = ...;

   foo_(static_cast<float*>(arg0->data), arg0->shape[0],
        static_cast<float*>(arg1->data), arg1->shape[0],
        static_cast<float*>(arg2->data), subgraph_json);
   return 0;
}

So the user's codegen is expected to generate the following:

// the original foo you intended to generate
void foo_(float* a, int N, float* b, int M, float* c, std::string subgraph_json) {
    // Launch the external engine with subgraph_json and arguments.
}

@tqchen (Member) commented Nov 16, 2019

The current interfacing example already looks great. For the JSONModule use case, I was thinking about something like:

class ExampleJSONModule : public runtime::Module {
   void SaveToBinary();
   void LoadFromFile();
   PackedFunc GetFunction(name) {
      if (name == "tensortrun") {
         for each elem in json:
            if (elem.name == "func1") {
               run_trt_func1(func, elem.args);
            }
            ...
      }
   }
};

In this case, because the JSONModule directly interprets the JSON and calls into the runtime API, we no longer need the gcc-related components. This approach will work better if, say, the serialized code is a TF blob plus some metadata and we just want to call into the TF runtime.

@tqchen (Member) commented Nov 16, 2019

Also note that my comment about ExampleJSONModule is mainly meant to serve as an example of how to integrate other external runtimes. The module itself is not meant to be used in production.

@zhiics (Member, Author) commented Nov 16, 2019

@tqchen Thanks for the suggestions. I made a gtest for the simple example JSON runtime. Hopefully, this is good now. Please take a look when you have time.

@zhiics (Member, Author) commented Nov 21, 2019

@u99127 Thanks for your comment.

@tqchen No worries. We actually had the same impression that the example should be put under contrib, but it looks like the app dir is also fine, as this is really just an example. Anyway, we have now moved it to contrib and added comments to the file. Please take another look.

@tqchen (Member) left a comment

some final comments

CMakeLists.txt review comment (resolved)
@u99127 (Contributor) commented Nov 21, 2019

> @u99127 Thanks for your comment.

Thanks @zhiics and @tqchen.

@comaniac (Contributor) commented

Comment fixed.
@tqchen somehow the CI doesn't update the config.cmake to enable the example runtime. Would you take a look when you get a chance? Thanks.

@tqchen (Member) commented Nov 21, 2019

@comaniac please send another PR first with the updated Jenkinsfile. For security reasons, CI will only respect a Jenkinsfile that is already in master.

@u99127 (Contributor) commented Nov 22, 2019

There are still a couple of changes requested in the review that need to be addressed. It would be good to clean this up with a separate pull request.

@tqchen (Member) commented Nov 22, 2019

@u99127 (Contributor) commented Nov 22, 2019

> @u99127 please https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

Sorry about the noise; it seems I misread the pull request this morning as being merged, but now I realize it was the CI changes that got merged and not the actual PR.

@zhiics (Member, Author) commented Nov 22, 2019

@u99127 Thanks. Could you please explicitly approve if everything looks good to you? Otherwise, please let us know your comments. Thanks again.

@u99127 (Contributor) commented Nov 22, 2019

LGTM. I don't see an actual "Approve" button, if clicking that is what's expected from a workflow perspective.

Ramana

@zhiics (Member, Author) commented Nov 22, 2019

@u99127 (Contributor) left a comment

LGTM.

@comaniac (Contributor) commented

@tqchen would you mind taking a final look and merging if it's all good?
Thanks!

@tqchen tqchen merged commit e081051 into apache:master Nov 22, 2019
@tqchen (Member) commented Nov 22, 2019

Thanks @comaniac @zhiics @u99127 !

@tqchen added the "status: accepted" label and removed the "status: need update" label Nov 22, 2019
@zhiics zhiics deleted the external_runtime branch November 22, 2019 23:53
@tqchen (Member) commented Nov 23, 2019

@zhiics @comaniac please follow up to fix the Windows build error here: https://dev.azure.com/tvmai/tvm/_build/results?buildId=3417. Likely we will need to declare TVM_DLL at the function level instead.

zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Nov 26, 2019
yongwww pushed a commit to neo-ai/tvm that referenced this pull request Nov 26, 2019