ConvertToOnnx options to exclude the data generation pipeline #5271
Comments
Hi, @go2ready. Can you please share the code where you create your model, so that I can fully understand your question. Thanks 😄
Thank you very much @antoniovs1029 , this gist is what I used to train the model and convert to ONNX https://gist.github.com/go2ready/05a5f6cf95d98ee8f12c8dda71294f4f This is what the ONNX model exported looks like. Ideally I want to produce an ONNX model that contains just the part that I circled in red, excluding all previous transformations like OneHotEncoder, etc. Is that possible?
Hi, @go2ready. Unfortunately there's no option in ConvertToOnnx to exclude the preprocessing steps. In the meantime, may I ask what your use case is, and why you want to separate the preprocessing and inferencing steps of your Onnx model? Why is exporting the full onnx model not good for you? Thanks.
Hey @antoniovs1029 thanks for looking into this. Our use case is that training and inferencing are two separate systems, and we have to deploy the model from the training env to the inferencing env in ONNX format. The preprocessing steps from training are not needed in our inferencing stage, which already has all the features gathered; including them also makes the model file larger and slower to run.
Hi, @go2ready. Thanks for your answer. I still haven't had the chance to try out a "hack" to let you split your onnx model in two, but I do have some follow-up questions and some suggestions on how to do the hack yourself in the meantime. But first, please notice that ML.NET doesn't provide any direct way to achieve the result you desire, and we actually don't encourage our users to try to split their Onnx models created in ML.NET. There are some hacks to try to do it, but they are, precisely, "hacks". So, the follow-up questions are:
Question 1: In your inferencing system, is your preprocessing done with ML.NET? If it is, then please consider simply exporting the whole pipeline to onnx and using that in your inferencing system, as your system would be spending time on preprocessing anyway, and you would anyway be spending some disk space on the preprocessing model (whether it's the ML.NET one or the onnx model exported from ML.NET).

Question 2: In your training system, is the preprocessing done with ML.NET? If it is and you can't change that, then it gets trickier, because the current implementation of ConvertToOnnx exports every transformer in the chain of the model you pass to it.
Notice that this needs to be done only when training; once you have your ONNX model for inferencing, you won't need to repeat it. Also, I think it's possible to skip steps 2 and 3 about saving/loading preprocessed data from disk, by using

Please let us know if the suggestions I made work for you... and if you end up using the hack I described, please let me know if you run into any problems with it.
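The split-pipeline hack could be sketched roughly as follows. This is an illustrative, untested sketch: the estimator choices, column names, trainer, and file name are hypothetical stand-ins (not taken from the original gist), and the exact calls should be checked against your ML.NET version.

```csharp
using System.IO;
using Microsoft.ML;

var mlContext = new MLContext();
// `trainingData` is assumed to be an IDataView you loaded elsewhere.

// 1. Fit only the preprocessing estimators.
var preprocessing = mlContext.Transforms.Categorical
        .OneHotEncoding("CategoryEncoded", "Category")
    .Append(mlContext.Transforms.Concatenate(
        "Features", "CategoryEncoded", "NumericFeature"));
var preprocessor = preprocessing.Fit(trainingData);

// 2./3. Materialize the preprocessed data (instead of saving/loading
// it to disk, we just keep the transformed IDataView in memory).
var preprocessedData = preprocessor.Transform(trainingData);

// 4. Fit only the trainer on the already-preprocessed data.
var trainerModel = mlContext.Regression.Trainers
    .Sdca(labelColumnName: "Label")
    .Fit(preprocessedData);

// 5. Export only the trainer: the exported graph's input is now the
// "Features" column directly, with no preprocessing nodes in front.
using (var fs = File.Create("model-only.onnx"))
    mlContext.Model.ConvertToOnnx(trainerModel, preprocessedData, fs);
```

Because the trainer was fitted on the preprocessed IDataView, the exported ONNX model begins at the trainer and expects the already-transformed features as input, which is the part circled in red in the screenshot above.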
By the way, another alternative would be to explore whether there are any tools elsewhere that let you manipulate an Onnx graph, so that you can manually remove the preprocessing steps. I don't know of any such tool offhand, but maybe one exists.
Hi, @go2ready. Any updates on this? Were you able to apply any of the suggestions I made? Which one? Thanks!
I'm tagging this as P2 / Feature Enhancement to keep track of the feature request of enabling users to export only part of their pipeline, instead of the whole pipeline, without having to go through the workaround described in my previous comment: #5271 (comment)
System information
Issue
Calling

mlContext.Model.ConvertToOnnx(model, colSelTrainingData2, fs);

where model involves a data transformation pipeline with encoding, concatenation, and column manipulation. The whole data transformation pipeline is included in the final ONNX model, which I do not want.
I want the model to be converted into an ONNX model whose input column is the input column actually fed to the trainer, excluding all the data transformations (encoding, concatenating, and column manipulation) applied before the model is fitted.
Source code / logs