Onnx input and output data type #6469

Closed
triandco opened this issue Nov 24, 2022 · 6 comments
Labels: enhancement (New feature or request), needs-further-triage, question (Further information is requested)

Comments

@triandco

triandco commented Nov 24, 2022

Is your feature request related to a problem? Please describe.
I was trying to follow a guide on how to use an ONNX model with dotnet. I found it difficult to understand how to translate the data types of the inputs and outputs into C#. The types as displayed in Netron look like Python, but not quite. From the example I understand that something like int32[n,1] would map to a single int32 value; however, in one of my models I found the type float32[batch, sequence, 768], which is harder to translate.

Describe the solution you'd like
Is there any further documentation on what these types are?

Describe alternatives you've considered
I opened an issue on Netron's repo asking whether there is any documentation on these types, and the maintainer suggested that I post an issue on the Microsoft side.

I appreciate that there is already an open issue about improving this documentation. However, is there any quick pointer on this particular question?

Context
The model I am currently trying to run is an ONNX export of msmarco-distilbert-base-tas-b from Hugging Face.

Much appreciated

@triandco added the enhancement label Nov 24, 2022
@ghost added the untriaged label Nov 24, 2022
@triandco
Author

Related to this, I have created a repository documenting what I was trying to do. Unfortunately, there is an error that stops me from running the model.

Unhandled exception. System.ArgumentOutOfRangeException: Could not determine an IDataView type and registered custom types for member InputIds (Parameter 'rawType')
   at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(String name, Type rawType, IEnumerable`1 attributes, Boolean& isVector, Type& itemType)
   at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(MemberInfo memberInfo, Boolean& isVector, Type& itemType)
   at Microsoft.ML.Data.SchemaDefinition.Create(Type userType, Direction direction)
   at Microsoft.ML.Data.InternalSchemaDefinition.Create(Type userType, Direction direction)
   at Microsoft.ML.Data.DataViewConstructionUtils.CreateFromEnumerable[TRow](IHostEnvironment env, IEnumerable`1 data, SchemaDefinition schemaDefinition)
   at Microsoft.ML.DataOperationsCatalog.LoadFromEnumerable[TRow](IEnumerable`1 data, SchemaDefinition schemaDefinition)
   at Library.Test.get_prediction_pipeline(String file_path, MLContext mlContext) in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Lib.fs:line 33
   at Library.Test.run(String file_path) in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Lib.fs:line 40
   at <StartupCode$App>.$Program.main@() in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Program.fs:line 5  

I'm still unsure whether this is an issue caused by my lack of understanding or whether it is actually a bug.

Here are my current input and output models:

type OnnxInput() =
  [< ColumnName("input_ids") >]
  member val InputIds: int64 seq seq = [[]] with get, set

  [< ColumnName("attention_mask")>]
  member val AttentionMask: int64 seq seq = [[]] with get, set

type OnnxOutput() =
  [< ColumnName("last_hidden_state") >]
  member val LastHiddenState: float32 seq seq = [[]] with get, set

I have also tried the OnnxSequenceType attribute, only to receive the same error message.

type OnnxInput() =
  [< ColumnName("input_ids"); OnnxSequenceType(typedefof<int64 seq>) >]
  member val InputIds: int64 seq seq = [[]] with get, set

  [< ColumnName("attention_mask"); OnnxSequenceType(typedefof<int64 seq>)>]
  member val AttentionMask: int64 seq seq = [[]] with get, set

type OnnxOutput() =
  [< ColumnName("last_hidden_state"); OnnxSequenceType(typedefof<float32 seq>) >]
  member val LastHiddenState: float32 seq seq = [[]] with get, set

@michaelgsharp
Member

@luisquintanilla I know we have already discussed making this whole process more intuitive. Any quick pointers to help here though? You are much more familiar with F# than I am.

@michaelgsharp added the question and needs-further-triage labels Nov 28, 2022
@luisquintanilla
Contributor

luisquintanilla commented Nov 29, 2022

Hi @triandco

Thanks for your question. There are a few issues at hand here:

  1. ML.NET expects Tensors (N-dimensional arrays) to be represented as one-dimensional. For example, I would change the definition of InputIds to:

    member val InputIds: int64 seq = [] with get, set
  2. ML.NET works with Single values, so you might want to perform some mapping:

    [< ColumnName("input_ids"); OnnxMapType(typedefof<Int64>, typedefof<Single>);
       OnnxSequenceType(typedefof<Single>) >]
  3. ML.NET supports only one unknown dimension. For example, batch and sequence are both unknown dimensions for input_ids. You know that because instead of having a number, they have a variable name. While you can set one of the dimensions as -1 to indicate unknown, you need to define the rest of the dimensions. While not the same, here is a sample that does that with the BiDAF ONNX model. A combined sketch of points 1 and 3 follows after this list.
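To make points 1 and 3 concrete, here is a minimal sketch of what the flattened input and output classes could look like, assuming batch = 1 and a fixed sequence length of 256. Both values are assumptions chosen for illustration, not values taken from the exported model, and the int64 members are kept as-is rather than mapped to Single (apply point 2 if your pipeline needs that):

open Microsoft.ML.Data

type OnnxInput() =
  // Flattened [batch * sequence] tensor; batch = 1 and sequence = 256 are assumed here.
  [< ColumnName("input_ids"); VectorType(1, 256) >]
  member val InputIds: int64[] = Array.empty with get, set

  [< ColumnName("attention_mask"); VectorType(1, 256) >]
  member val AttentionMask: int64[] = Array.empty with get, set

type OnnxOutput() =
  // Flattened [batch * sequence * 768] tensor.
  [< ColumnName("last_hidden_state"); VectorType(1, 256, 768) >]
  member val LastHiddenState: float32[] = Array.empty with get, set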

Hope this helps.

Another unsolicited tip: you can use records with F#.

[<CLIMutable>]
type OnnxInput =
    {
        [<ColumnName("input_ids")>] InputIds : int64 seq
        //...
    }
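For completeness, here is a rough sketch of how such an input type could be wired into an ML.NET pipeline with ApplyOnnxModel and a PredictionEngine. The model path is a placeholder and the overall wiring is an assumption for illustration, not code from the repository above:

open Microsoft.ML

let mlContext = MLContext()

// ApplyOnnxModel is an estimator, so it needs a (possibly empty) data view of the input schema to fit against.
let emptyData = mlContext.Data.LoadFromEnumerable<OnnxInput>(Seq.empty)

let pipeline =
  mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames = [| "last_hidden_state" |],
    inputColumnNames = [| "input_ids"; "attention_mask" |],
    modelFile = "msmarco-distilbert-base-tas-b.onnx")  // placeholder path

let model = pipeline.Fit(emptyData)
let engine = mlContext.Model.CreatePredictionEngine<OnnxInput, OnnxOutput>(model)

// tokenIds and mask would come from a tokeniser and must match the fixed shape assumed above:
// let output = engine.Predict(OnnxInput(InputIds = tokenIds, AttentionMask = mask))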

@yli223

yli223 commented Jan 2, 2023

I am facing the same issue here. My data type is int64[batch,sequence], and I still don't have any clue how to make it work. Has anyone figured it out?

Thanks in advance for any help!

@triandco
Author

triandco commented Jan 2, 2023

@yli223 I had some luck with the tips from @luisquintanilla 🙇‍♂️; they were actually very helpful in terms of understanding the types. Thank you @luisquintanilla. However, I gave up in the end because I kept getting blocked by other issues.

In the end, I decided to use an InferenceSession to run the model directly. You can see my code here.

It doesn't have all the type safety of the above implementation, but it works. 😅
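For reference, here is a rough sketch of that InferenceSession approach using the Microsoft.ML.OnnxRuntime package. The function name, the assumption that input_ids and attention_mask arrive as already-tokenised int64 arrays, and the single-sentence batch shape are all illustrative, not taken from the linked code:

open System
open Microsoft.ML.OnnxRuntime
open Microsoft.ML.OnnxRuntime.Tensors

let embed (modelPath: string) (inputIds: int64[]) (attentionMask: int64[]) =
  use session = new InferenceSession(modelPath)
  // Shape [1; sequence]: a batch containing a single sentence.
  let shape = [| 1; inputIds.Length |]
  let idsTensor = DenseTensor<int64>(Memory(inputIds), ReadOnlySpan(shape))
  let maskTensor = DenseTensor<int64>(Memory(attentionMask), ReadOnlySpan(shape))
  let inputs =
    [| NamedOnnxValue.CreateFromTensor<int64>("input_ids", idsTensor)
       NamedOnnxValue.CreateFromTensor<int64>("attention_mask", maskTensor) |]
  use results = session.Run(inputs)
  // last_hidden_state comes back with shape [batch; sequence; 768]; flatten it to a float32 array.
  let hidden = results |> Seq.find (fun v -> v.Name = "last_hidden_state")
  hidden.AsTensor<float32>() |> Seq.toArray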

@yli223

yli223 commented Jan 2, 2023

@triandco Thank you!

@triandco closed this as completed Jan 2, 2023
@ghost removed the untriaged label Jan 2, 2023
@ghost locked as resolved and limited conversation to collaborators Feb 1, 2023