Add OnnxTransform for scoring Onnx 1.2 models - integrates Microsoft.ML.Scoring/Sonoma Library #942
|
|
||
| <ItemGroup> | ||
| <ProjectReference Include="../Microsoft.ML/Microsoft.ML.nupkgproj" /> | ||
| <PackageReference Include="Microsoft.ML.Scoring" Version="1.0.4-dev47509" /> |
There was a problem hiding this comment.
1.0.4-dev47509 [](start = 62, length = 14)
Please move this version number into https://github.com/dotnet/machinelearning/blob/master/build/Dependencies.props #Closed
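A minimal sketch of what the reviewer is asking for, assuming Dependencies.props is a plain MSBuild property file. The property name `MicrosoftMLScoringPackageVersion` is hypothetical and only illustrates the pattern; the actual name used in the repository may differ.

```xml
<!-- build/Dependencies.props (fragment): declare the version once. -->
<!-- The property name below is an assumption made for illustration. -->
<PropertyGroup>
  <MicrosoftMLScoringPackageVersion>1.0.4-dev47509</MicrosoftMLScoringPackageVersion>
</PropertyGroup>

<!-- Project file (fragment): reference the property instead of a literal version. -->
<ItemGroup>
  <PackageReference Include="Microsoft.ML.Scoring" Version="$(MicrosoftMLScoringPackageVersion)" />
</ItemGroup>
```

Keeping the version in one shared file means a later bump of Microsoft.ML.Scoring touches a single line rather than every project that references the package.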
| internal const string LoaderSignature = "OnnxTransform"; | ||
|
|
||
| public readonly string[] Inputs; | ||
| public readonly string[] Outputs; |
There was a problem hiding this comment.
Do you plan to support multiple inputs/outputs or only one?
It looks weird: you constantly cast single objects to arrays and back. #Closed
There was a problem hiding this comment.
It's a single input/output transform. I cleaned up all the casts.
In reply to: 219612560 [](ancestors = 219612560)
| { | ||
| public sealed class OnnxTransform : ITransformer, ICanSaveModel | ||
| { | ||
| public sealed class Arguments : TransformInputBase |
There was a problem hiding this comment.
Arguments [](start = 28, length = 9)
So far we tend to use the Arguments class for entry points and the command line, and to have a separate ColumnInfo class with the same functionality but without all the attributes.
In our constructors we tend to use the ColumnInfo class (except in the constructor for SignatureLoadDataTransform). #Pending
There was a problem hiding this comment.
I was basing this on TensorFlowTransform, which only has an Arguments class. I looked at a couple of other transforms (MutualInformationFeatureSelectionTransform, GroupTransform) and they have Arguments only. Could you please point me to a sample transform that follows the pattern above? #Resolved
There was a problem hiding this comment.
[Adding Shahab's response]
I was basing this on TensorFlowTransform, which only has an Arguments class. I looked at a couple of other transforms (MutualInformationFeatureSelectionTransform, GroupTransform) and they have Arguments only. Could you please point me to a sample transform that follows the pattern above?
In reply to: 219613034 [](ancestors = 219613034)
| } | ||
|
|
||
| // Factory method for SignatureLoadDataTransform | ||
| public static IDataTransform Create(IHostEnvironment env, ModelLoadContext ctx, IDataView input) |
There was a problem hiding this comment.
public [](start = 8, length = 6)
This can be made private. #Closed
There was a problem hiding this comment.
Don't some of the Create(...) methods need to be public, because they get called from another assembly? #Closed
There was a problem hiding this comment.
Some, yes, probably the one which returns IDataTransform. All the other methods are just for our dependency framework and can be private.
In reply to: 220260318 [](ancestors = 220260318)
There was a problem hiding this comment.
Ok, thanks Ivan. We've made them private for now -- all unit tests are passing so far.
In reply to: 220309543 [](ancestors = 220309543,220260318)
| ctx.SaveNonEmptyString(_args.OutputColumn); | ||
| } | ||
|
|
||
| internal sealed class Mapper : IRowMapper |
There was a problem hiding this comment.
internal [](start = 8, length = 8)
why not private? #Closed
| /// After adaptation, you'd call GetTensor() on the IdvToTensorAdapter object to get the Tensor equivalent of | ||
| /// each row. | ||
| /// </summary> | ||
| internal sealed class IdvToTensorAdapter |
There was a problem hiding this comment.
IdvToTensorAdapter [](start = 30, length = 18)
I would move it to OnnxUtils #Closed
| [Argument(ArgumentType.Required, HelpText = "Path to the onnx model file.", ShortName = "model", SortOrder = 0)] | ||
| public string ModelFile; | ||
|
|
||
| [Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "TBD", SortOrder = 1)] |
There was a problem hiding this comment.
TBD [](start = 81, length = 3)
nit: needs to be updated #Resolved
|
|
||
| public static OnnxModel CreateFromBytes(byte[] modelBytes) | ||
| { | ||
| var tempModelDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString()); |
There was a problem hiding this comment.
tempModelDir [](start = 16, length = 12)
When does this tempModelDir get cleaned up?
| public static OnnxModel CreateFromBytes(byte[] modelBytes) | ||
| { | ||
| var tempModelDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString()); | ||
| Directory.CreateDirectory(tempModelDir); |
There was a problem hiding this comment.
tempModelDir [](start = 38, length = 12)
Consider applying an ACL to this temp dir.
Temp directories containing models are treated as executable code and are considered a security threat by the .NET security team.
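One way to address the cleanup question above is to tie the directory's lifetime to an `IDisposable` wrapper. The following is only a sketch under stated assumptions: `TempModelDirectory` is a hypothetical helper that does not exist in the PR, the bytes written are placeholders rather than a real Onnx model, and the sketch does not apply the ACL suggested in the comment above.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the PR): owns a uniquely named temp
// directory and deletes it, with its contents, when disposed.
public sealed class TempModelDirectory : IDisposable
{
    public string FullPath { get; }

    public TempModelDirectory()
    {
        // Same naming scheme as the snippet under review: temp path + GUID.
        FullPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
        Directory.CreateDirectory(FullPath);
    }

    public void Dispose()
    {
        // Best-effort cleanup: a locked file should not crash the caller.
        try
        {
            if (Directory.Exists(FullPath))
                Directory.Delete(FullPath, recursive: true);
        }
        catch (IOException) { }
        catch (UnauthorizedAccessException) { }
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] modelBytes = { 0x08, 0x03 }; // placeholder bytes, not a real model
        string dir;
        using (var temp = new TempModelDirectory())
        {
            dir = temp.FullPath;
            File.WriteAllBytes(Path.Combine(dir, "model.onnx"), modelBytes);
            Console.WriteLine(Directory.Exists(dir)); // directory exists while in use
        }
        Console.WriteLine(Directory.Exists(dir)); // directory is gone after Dispose
    }
}
```

Whether the directory can be deleted right after the model is loaded depends on whether the scoring library keeps the files open, which this thread does not say; if it does, the wrapper would have to live as long as the model object.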
| </para> | ||
|
|
||
| <para> | ||
| This transform requires the <a href="https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML.TensorFlow/0.5.0-preview-26830-5">Microsoft.ML.TensorFlow</a> nuget to be installed. |
There was a problem hiding this comment.
Microsoft.ML.TensorFlow [](start = 150, length = 23)
Is this a requirement for the OnnxTransform? #Resolved
There was a problem hiding this comment.
Jignesh, this should be Sonoma nuget, right? Could you please update it. #Resolved
There was a problem hiding this comment.
The entire doc.xml has been modified. No reference to TF anywhere now.
In reply to: 219920734 [](ancestors = 219920734)
There was a problem hiding this comment.
Yes, this doc has been rewritten. The version being commented on was obsolete.
In reply to: 220279203 [](ancestors = 220279203)
| { | ||
| _host.CheckNonWhiteSpace(args.ModelFile, nameof(args.ModelFile)); | ||
| _host.CheckUserArg(File.Exists(args.ModelFile), nameof(args.ModelFile)); | ||
| _model = new OnnxModel(args.ModelFile); |
There was a problem hiding this comment.
_model [](start = 16, length = 6)
Is there a memory limitation from keeping everything in memory? I would presume some Onnx models to be quite large.
#Resolved
There was a problem hiding this comment.
This API is what is supported in Sonoma currently. There's no alternative at this point. This can be a design discussion for the Lotus# library. Marking as resolved.
In reply to: 219921638 [](ancestors = 219921638)
…machinelearning into jignparm/onnxtransform
| } | ||
|
|
||
| [Fact] | ||
| public void OnnxStatic() |
There was a problem hiding this comment.
Not related to this one, but I would love to have a test that runs the ML.Net -> Onnx conversion, then scores that model and compares the results. #WontFix
There was a problem hiding this comment.
Yup, that's a great Uber-test to have. We can probably track that as a separate feature or task (since it's using 2 different transforms). Let me know if we can close this for now.
In reply to: 219967750 [](ancestors = 219967750)
|
|
||
| <PropertyGroup> | ||
| <TargetFramework>netstandard2.0</TargetFramework> | ||
| <IncludeInPackage>Microsoft.ML.TensorFlow</IncludeInPackage> |
There was a problem hiding this comment.
Microsoft.ML.TensorFlow [](start = 22, length = 23)
Microsoft.ML.OnnxTransform #Closed
| try | ||
| { | ||
| pipe.Fit(invalidDataWrongVectorSize); | ||
| //Assert.False(true); |
There was a problem hiding this comment.
//Assert.False(true); [](start = 16, length = 21)
Why is this commented out? #Closed
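The catch block is not visible in the snippet, but if it is broad, a failing assertion placed inside the try would be swallowed along with the expected exception. The sketch below shows an alternative that checks for the expected exception type directly. It is deliberately framework-free: `FitWithWrongVectorSize` is a hypothetical stand-in for `pipe.Fit(invalidDataWrongVectorSize)`, and the exception type is an assumption, since the thread does not say what Fit actually throws.

```csharp
using System;

public static class Program
{
    // Hypothetical stand-in for pipe.Fit(invalidDataWrongVectorSize);
    // this is not the ML.NET API, and the exception type is assumed.
    static void FitWithWrongVectorSize()
    {
        throw new InvalidOperationException("Input vector size does not match the model.");
    }

    // Returns true only when the action throws TException.
    // Returning normally yields false; other exception types propagate.
    static bool Throws<TException>(Action action) where TException : Exception
    {
        try
        {
            action();
        }
        catch (TException)
        {
            return true;
        }
        return false;
    }

    public static void Main()
    {
        // The invalid input is expected to throw.
        Console.WriteLine(Throws<InvalidOperationException>(FitWithWrongVectorSize));
        // An action that completes normally is reported as "did not throw".
        Console.WriteLine(Throws<InvalidOperationException>(() => { }));
    }
}
```

In an xunit test the same intent is usually expressed with `Assert.Throws<TException>(...)`, which fails the test when no exception is raised.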
| var invalidDataWrongTypes = ComponentCreation.CreateDataView(Env, stringData); | ||
| var invalidDataWrongVectorSize = ComponentCreation.CreateDataView(Env, sizeData); | ||
| TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongNames); | ||
| //TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongTypes); |
There was a problem hiding this comment.
//TestEstimatorCore(pipe, dataView, invalidInput: invalidDataWrongTypes); [](start = 12, length = 73)
? #Closed
| { | ||
| using (var env = new ConsoleEnvironment()) | ||
| { | ||
| Assert.Equal(Maml.Main(new[] { @"showschema loader=Text{col=a:R4:0-3 col=b:R4:0-3} xf=OnnxTransform{inputs=a inputs=b outputs=c model={model_matmul/frozen_saved_model.pb}} in=f:\2.txt" }), (int)0); |
There was a problem hiding this comment.
inputs=a inputs=b outputs=c [](start = 116, length = 27)
In your Arguments class you have this:
public string InputColumn;
public string OutputColumn;
You accept only one column as input and one as output.
Also, the names are different. #Closed
| <ProjectReference Include="..\..\src\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" /> | ||
| <ProjectReference Include="..\..\src\Microsoft.ML.Onnx\Microsoft.ML.Onnx.csproj" /> | ||
| <ProjectReference Include="..\..\src\Microsoft.ML.TensorFlow\Microsoft.ML.TensorFlow.csproj" /> | ||
| <ProjectReference Include="..\..\src\Microsoft.ML.OnnxTransform\Microsoft.ML.OnnxTransform.csproj" /> |
There was a problem hiding this comment.
Revert the whole file. #Pending
There was a problem hiding this comment.
You still have a package reference on Onnx.TestModels.
In reply to: 220411170 [](ancestors = 220411170,220398733)
|
|
||
| resultDic[Transformer.Output] = new SchemaShape.Column(Transformer.Output, | ||
| Transformer.OutputType.IsKnownSizeVector ? SchemaShape.Column.VectorKind.Vector | ||
| : SchemaShape.Column.VectorKind.VariableVector, Transformer.OutputType.ItemType, false); |
There was a problem hiding this comment.
Transformer.OutputType.ItemType [](start = 64, length = 31)
Either modify OnnxUtils.CopyTo to support types other than float, or put NumberType.R4 here. #Closed
| _host.AssertValue(input); | ||
| _host.Assert(typeof(T) == _outputItemRawType); | ||
|
|
||
| ValueGetter<VBuffer<T>> valuegetter = (ref VBuffer<T> dst) => |
There was a problem hiding this comment.
valuegetter [](start = 40, length = 11)
camelCase please #Closed
| <PropertyGroup> | ||
| <TargetFramework>netstandard2.0</TargetFramework> | ||
| <IncludeInPackage>Microsoft.ML.OnnxTransform</IncludeInPackage> | ||
| <DefineConstants>CORECLR</DefineConstants> |
There was a problem hiding this comment.
This one is unnecessary #Closed
| <description>The name of each output column should match one of the operations in the Tensorflow graph.</description> | ||
| </item> | ||
| <item> | ||
| <description>Currently, float and double are the only acceptable data types for input/output.</description> |
There was a problem hiding this comment.
double [](start = 46, length = 6)
You throw an exception when the output column type is double. #Closed
| </list> | ||
|
|
||
| The inputs and outputs of a TensorFlow model can be obtained using the <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/README.md#inspecting-graphs"> | ||
| <code>summarize_graph</code> tool |
There was a problem hiding this comment.
This should be Onnx-specific or removed. #Closed
| @@ -0,0 +1,404 @@ | |||
| using Microsoft.ML.Runtime; | |||
| </ItemGroup> | ||
| <ItemGroup> | ||
| <PackageReference Include="Microsoft.ML.TensorFlow.TestModels" Version="0.0.3-test" /> | ||
| <PackageReference Include="Microsoft.ML.Onnx.TestModels" Version="0.0.2-test" /> |
There was a problem hiding this comment.
[](start = 4, length = 80)
You don't need this line anymore.
|
Is Microsoft.ML.Scoring going to be a part of ML.NET and open source?
Fixes issue #695
Fixes issue #892
This adds a new transform for scoring Onnx v1.2 models, leveraging an updated version of the scoring library at the link below.
https://www.nuget.org/packages/Microsoft.ML.Scoring/