Onnx input and output data type #6469

Closed
triandco opened this issue Nov 24, 2022 · 6 comments
Labels: enhancement (New feature or request), needs-further-triage, question (Further information is requested)

Comments

@triandco

triandco commented Nov 24, 2022

Is your feature request related to a problem? Please describe.
I was trying to follow a guide on how to use an ONNX model with dotnet. I found it difficult to understand how to translate the data types of the inputs and outputs into C#. The types as displayed in Netron look like Python, but not quite. From the example I understand that something like int32[n,1] would map to a single int32 value; however, in one of my models I found the type float32[batch, sequence, 768], which is harder to translate.

Describe the solution you'd like
Is there any further documentation on what these types are?

Describe alternatives you've considered
I opened an issue on Netron's repo asking whether there is any documentation on these types, and the maintainer suggested that I post an issue on the Microsoft side.

I appreciate that there is already an open issue about improving this documentation. However, is there any quick pointer on this particular question?

Context
The model I am currently trying to run is an ONNX export of msmarco-distilbert-base-tas-b from Hugging Face.

Much appreciated

@triandco added the enhancement label Nov 24, 2022
@ghost added the untriaged label Nov 24, 2022
@triandco
Author

Related to this, I have created a repository documenting what I was trying to do. Unfortunately, there is an error that stops me from running the model.

Unhandled exception. System.ArgumentOutOfRangeException: Could not determine an IDataView type and registered custom types for member InputIds (Parameter 'rawType')
   at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(String name, Type rawType, IEnumerable`1 attributes, Boolean& isVector, Type& itemType)
   at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(MemberInfo memberInfo, Boolean& isVector, Type& itemType)
   at Microsoft.ML.Data.SchemaDefinition.Create(Type userType, Direction direction)
   at Microsoft.ML.Data.InternalSchemaDefinition.Create(Type userType, Direction direction)
   at Microsoft.ML.Data.DataViewConstructionUtils.CreateFromEnumerable[TRow](IHostEnvironment env, IEnumerable`1 data, SchemaDefinition schemaDefinition)
   at Microsoft.ML.DataOperationsCatalog.LoadFromEnumerable[TRow](IEnumerable`1 data, SchemaDefinition schemaDefinition)
   at Library.Test.get_prediction_pipeline(String file_path, MLContext mlContext) in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Lib.fs:line 33
   at Library.Test.run(String file_path) in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Lib.fs:line 40
   at <StartupCode$App>.$Program.main@() in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Program.fs:line 5  

I'm still unsure whether this is an issue caused by my lack of understanding or whether it is actually a bug.

Here are my current input and output models:

type OnnxInput() =
  [< ColumnName("input_ids") >]
  member val InputIds: int64 seq seq = [[]] with get, set

  [< ColumnName("attention_mask")>]
  member val AttentionMask: int64 seq seq = [[]] with get, set

type OnnxOutput() =
  [< ColumnName("last_hidden_state") >]
  member val LastHiddenState: float32 seq seq = [[]] with get, set

I have also tried the OnnxSequenceType attribute, only to receive the same error message.

type OnnxInput() =
  [< ColumnName("input_ids"); OnnxSequenceType(typedefof<int64 seq>) >]
  member val InputIds: int64 seq seq = [[]] with get, set

  [< ColumnName("attention_mask"); OnnxSequenceType(typedefof<int64 seq>)>]
  member val AttentionMask: int64 seq seq = [[]] with get, set

type OnnxOutput() =
  [< ColumnName("last_hidden_state"); OnnxSequenceType(typedefof<float32 seq>) >]
  member val LastHiddenState: float32 seq seq = [[]] with get, set

@michaelgsharp
Member

@luisquintanilla I know we have already discussed making this whole process more intuitive. Any quick pointers to help here though? You are much more familiar with F# than I am.

@michaelgsharp added the question and needs-further-triage labels Nov 28, 2022
@luisquintanilla
Contributor

luisquintanilla commented Nov 29, 2022

Hi @triandco

Thanks for your question. There are a few issues at hand here:

  1. ML.NET expects Tensors (N-dimensional arrays) to be represented as one-dimensional. For example, I would change the definition of InputIds to:

    member val InputIds: int64 seq = [] with get, set
  2. ML.NET works with Single values, so you might want to perform some mapping:

    [< ColumnName("input_ids"); OnnxMapType(typedefof<Int64>, typedefof<Single>);
       OnnxSequenceType(typedefof<Single>) >]
  3. ML.NET supports only one unknown dimension. For example, batch and sequence are both unknown dimensions for input_ids. You know that because instead of having a number, they have a variable name. While you can set one of the dimensions as -1 to indicate unknown, you need to define the rest of the dimensions. While not the same, here is a sample that does that with the BiDAF ONNX model. A combined sketch of points 1 and 3 follows after this list.
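To make points 1 and 3 concrete, here is a minimal sketch of what the flattened input and output classes could look like, assuming batch = 1 and a fixed sequence length of 256. Both values are assumptions chosen for illustration, not values taken from the exported model, and the int64 members are kept as-is rather than mapped to Single (apply point 2 if your pipeline needs that):

open Microsoft.ML.Data

type OnnxInput() =
  // Flattened [batch * sequence] tensor; batch = 1 and sequence = 256 are assumed here.
  [< ColumnName("input_ids"); VectorType(1, 256) >]
  member val InputIds: int64[] = Array.empty with get, set

  [< ColumnName("attention_mask"); VectorType(1, 256) >]
  member val AttentionMask: int64[] = Array.empty with get, set

type OnnxOutput() =
  // Flattened [batch * sequence * 768] tensor.
  [< ColumnName("last_hidden_state"); VectorType(1, 256, 768) >]
  member val LastHiddenState: float32[] = Array.empty with get, set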

Hope this helps.

Another unsolicited tip: you can use records with F#.

[<CLIMutable>]
type OnnxInput =
    {
        [<ColumnName("input_ids")>] InputIds : int64 seq
        //...
    }
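For completeness, here is a rough sketch of how such an input type could be wired into an ML.NET pipeline with ApplyOnnxModel and a PredictionEngine. The model path is a placeholder and the overall wiring is an assumption for illustration, not code from the repository above:

open Microsoft.ML

let mlContext = MLContext()

// ApplyOnnxModel is an estimator, so it needs a (possibly empty) data view of the input schema to fit against.
let emptyData = mlContext.Data.LoadFromEnumerable<OnnxInput>(Seq.empty)

let pipeline =
  mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames = [| "last_hidden_state" |],
    inputColumnNames = [| "input_ids"; "attention_mask" |],
    modelFile = "msmarco-distilbert-base-tas-b.onnx")  // placeholder path

let model = pipeline.Fit(emptyData)
let engine = mlContext.Model.CreatePredictionEngine<OnnxInput, OnnxOutput>(model)

// tokenIds and mask would come from a tokeniser and must match the fixed shape assumed above:
// let output = engine.Predict(OnnxInput(InputIds = tokenIds, AttentionMask = mask))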

@yli223

yli223 commented Jan 2, 2023

I am facing the same issue here. My data type is int64[batch,sequence], and I still don't have any clue how to make it work. Has anyone figured it out?

Thanks in advance for any help!

@triandco
Author

triandco commented Jan 2, 2023

@yli223 I had some luck with the tips from @luisquintanilla 🙇‍♂️; they were actually very helpful in terms of understanding the types. Thank you @luisquintanilla. However, I gave up in the end because I kept getting blocked by other issues.

In the end, I decided to use an InferenceSession to run the model directly. You can see my code here.

It doesn't have all the type safety of the above implementation, but it works. 😅
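For reference, here is a rough sketch of that InferenceSession approach using the Microsoft.ML.OnnxRuntime package. The function name, the assumption that input_ids and attention_mask arrive as already-tokenised int64 arrays, and the single-sentence batch shape are all illustrative, not taken from the linked code:

open System
open Microsoft.ML.OnnxRuntime
open Microsoft.ML.OnnxRuntime.Tensors

let embed (modelPath: string) (inputIds: int64[]) (attentionMask: int64[]) =
  use session = new InferenceSession(modelPath)
  // Shape [1; sequence]: a batch containing a single sentence.
  let shape = [| 1; inputIds.Length |]
  let idsTensor = DenseTensor<int64>(Memory(inputIds), ReadOnlySpan(shape))
  let maskTensor = DenseTensor<int64>(Memory(attentionMask), ReadOnlySpan(shape))
  let inputs =
    [| NamedOnnxValue.CreateFromTensor<int64>("input_ids", idsTensor)
       NamedOnnxValue.CreateFromTensor<int64>("attention_mask", maskTensor) |]
  use results = session.Run(inputs)
  // last_hidden_state comes back with shape [batch; sequence; 768]; flatten it to a float32 array.
  let hidden = results |> Seq.find (fun v -> v.Name = "last_hidden_state")
  hidden.AsTensor<float32>() |> Seq.toArray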

@yli223

yli223 commented Jan 2, 2023

@triandco Thank you!

@triandco closed this as completed Jan 2, 2023
@ghost removed the untriaged label Jan 2, 2023
@ghost locked as resolved and limited conversation to collaborators Feb 1, 2023