[WIP] Avoid propagating input columns when applying Onnx models #4971
The goal of this PR is to:
Fixing point number 2 would actually fix point number 1, given the way that the onnx export works today for
I think the main problem here is that
Currently, this PR uses the second approach. It's based on a previous attempt that @harishsk tried to implement to solve these problems some months ago, but that wasn't included in his final commit because other solutions seemed more appropriate at the time.
With his approach, this PR adds a
Personally, I am still not convinced that this is the best way to go, particularly after fixing some issues that appeared in our tests while trying to use this approach. I feel that the correct solution should be to stop inheriting from RTRT, in which case OnnxDataTransform might not be necessary. I still want to explore that option, perhaps in another PR, but I haven't managed to fully accomplish it, so I'll leave this PR here and see if anyone has comments on it.
Also, there are still things that I might be able to remove from OnnxTransformer and OnnxDataTransform, but I need to look into it further.
In the vast majority of cases (including the test that Antonio mentioned needs to change) this will be doing the wrong thing. The ML.NET pipelines (unless they explicitly contain a
In general, I think that the problem this PR is trying to solve is not a bug. The contract of the
In reply to: 603839891
You're right. In fact, the test that is failing right now on the CI demonstrates this. It is trying to apply a DNN Featurizer .onnx model that has only 1 input and 1 output to a feature column before using a classifier... and, with the current implementation in this PR, applying the featurizer also drops the "Label" column, so the classifier then throws an exception because it can't find that column.
I have discussed this offline with Harish, and I had misunderstood which columns were expected to be propagated. So this was my mistake.
In general, the onnx transformer should only drop an input column when that column is actually mentioned as an input inside the .onnx model but isn't connected to the output of the .onnx model. This way, ColumnSelectingTransformer will work as expected (i.e., with the onnx transformer actually dropping the columns).
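The rule above can be sketched as plain set logic. This is an illustrative Python sketch, not ML.NET's actual API; the function name and parameters are hypothetical:

```python
def columns_to_drop(model_inputs, model_outputs, pipeline_columns):
    """Drop a pipeline column only if the .onnx model consumes it as an
    input AND does not also expose it as (or connect it to) an output."""
    inputs, outputs = set(model_inputs), set(model_outputs)
    return [c for c in pipeline_columns if c in inputs and c not in outputs]

# A 1-input/1-output featurizer like the one in the failing CI test:
# "Features" is consumed by the model, while "Label" is not mentioned
# in the model at all, so "Label" must be propagated untouched for the
# downstream classifier to find it.
print(columns_to_drop(
    model_inputs=["Features"],
    model_outputs=["Score"],
    pipeline_columns=["Features", "Label"]))  # → ['Features']
```

Under this rule, the failing test passes ("Label" survives the featurizer), and ColumnSelectingTransformer still behaves as expected because genuinely consumed inputs are dropped.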
I will update the PR and my comments here to fit the actual expected behavior.