Error in ML.net training #4464

Open
dmoise2 opened this issue Nov 9, 2019 · 3 comments

dmoise2 commented Nov 9, 2019

System information

  • OS: Windows 10 Home
  • .NET Core: 2.1.802

Issue

  • First time running ML.NET. I set up the database and started training.
  • What happened? Training stopped with "Failed – See more in Output Pane."
  • What did you expect? Training to complete.

Source code / logs

| Trainer MicroAccuracy MacroAccuracy Duration #Iteration |
Schema mismatch for score column 'Score': expected vector of two or more items of type Single, got Vector<Single, 1>
Parameter name: schema
Must be at least 2.
Parameter name: numClasses
Schema mismatch for score column 'Score': expected vector of two or more items of type Single, got Vector<Single, 1>
Parameter name: schema
Training failed with the exception: System.ArgumentOutOfRangeException: Schema mismatch for score column 'Score': expected vector of two or more items of type Single, got Vector<Single, 1>
Parameter name: schema
at Microsoft.ML.Data.MulticlassClassificationEvaluator.CheckScoreAndLabelTypes(RoleMappedSchema schema)
at Microsoft.ML.Data.EvaluatorBase`1.CheckColumnTypes(RoleMappedSchema schema)
at Microsoft.ML.Data.EvaluatorBase`1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data)
at Microsoft.ML.Data.MulticlassClassificationEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)
at Microsoft.ML.AutoML.MultiMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
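The "Must be at least 2. Parameter name: numClasses" line suggests that the column chosen as the label may hold only one distinct value, which would also be consistent with a score vector that has a single slot (Vector<Single, 1>). A quick way to check before training is to count the distinct label values in the exported data. This is a minimal Python sketch using only the standard csv module; it assumes the data is available as delimited text with a header row, and label_counts and is_valid_multiclass_label are hypothetical helpers, not part of ML.NET or Model Builder.

```python
import csv
from collections import Counter
from typing import Iterable


def label_counts(lines: Iterable[str], label_column: str, delimiter: str = ",") -> Counter:
    """Count rows per distinct value of `label_column` in delimited text with a header row."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Reading fieldnames consumes the header row.
    if label_column not in (reader.fieldnames or []):
        raise KeyError(f"column {label_column!r} not found in header {reader.fieldnames}")
    return Counter(row[label_column] for row in reader)


def is_valid_multiclass_label(counts: Counter) -> bool:
    # Multiclass training and evaluation need at least two distinct classes
    # (compare "Must be at least 2. Parameter name: numClasses" in the log).
    return len(counts) >= 2
```

For example, open the exported file with open(path, newline="", encoding="utf-8"), pass it to label_counts with the label column name (adding delimiter="\t" for a tab-separated export), and if is_valid_multiclass_label returns False, pick a different column to predict.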


dmoise2 (author) commented Nov 9, 2019

I ran it again after selecting a different table column as my "column to predict". It failed again, with different output:

| Trainer MicroAccuracy MacroAccuracy Duration #Iteration |
Splitter/consolidator worker encountered exception while consuming source data
Splitter/consolidator worker encountered exception while consuming source data
Splitter/consolidator worker encountered exception while consuming source data
Training failed with the exception: System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data ---> System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data ---> System.FormatException: Parsing failed with an exception: Could not parse value Shenzhen University in line 2975, column NoDates ---> System.InvalidOperationException: Could not parse value Shenzhen University in line 2975, column NoDates
at Microsoft.ML.Data.TextLoader.Parser.ProcessOne(FieldSet vs, ColInfo info, ColumnPipe v, Int32 irow, Int64 line)
at Microsoft.ML.Data.TextLoader.Parser.ProcessItems(RowSet rows, Int32 irow, Boolean[] active, FieldSet fields, Int32 srcLim, Int64 line)
at Microsoft.ML.Data.TextLoader.Parser.ParseRow(RowSet rows, Int32 irow, Helper helper, Boolean[] active, String path, Int64 line, String text)
at Microsoft.ML.Data.TextLoader.Cursor.ParallelState.Parse(Int32 tid)
at Microsoft.ML.Data.TextLoader.Cursor.ParallelState.ThreadProc(Object obj)
--- End of inner exception stack trace ---
at Microsoft.ML.Data.TextLoader.Cursor.d__33.MoveNext()
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Data.LinkedRowFilterCursorBase.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass9_0.b__1()
--- End of inner exception stack trace ---
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass5_1.b__2()
--- End of inner exception stack trace ---
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext()
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.GetMetainfo(IChannel ch, Factory factory, Int32& numRow, Single[]& labels, Single[]& weights, Int32[]& groups)
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.LoadTrainingData(IChannel ch, RoleMappedData trainData, CategoricalMetaData& catMetaData)
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)


justinormont (member) commented Nov 11, 2019

@dmoise2: Can you post a sample of your dataset? How are you calling AutoML (Model Builder, CLI, or API)?

System.FormatException: Parsing failed with an exception: Could not parse value Shenzhen University in line 2975, column NoDates ---> System.InvalidOperationException: Could not parse value Shenzhen University in line 2975, column NoDates

You can look at line 2975 in your dataset to see what's different about it. Commonly, the cause is unsupported characters inside a quoted string (e.g., embedded quotes or newlines). Our TextLoader fails to read a row that contains them.
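Following this advice, a scan can list every row that has an embedded newline, an embedded quote character, or a field count different from the header. This is a minimal Python sketch under stated assumptions: it uses the csv module's default Excel-style quoting rules, which will not reproduce ML.NET's TextLoader behavior exactly; it assumes a delimited file with a header row; and find_suspect_rows is a hypothetical helper. Because a record with an embedded newline spans several physical lines, it reports both the record number and the physical line on which the record ends, since it is not certain which of the two the loader's "line 2975" counts.

```python
import csv
from typing import Iterable, List, Tuple


def find_suspect_rows(lines: Iterable[str], delimiter: str = ",") -> List[Tuple[int, int, str]]:
    """Return (record_number, end_line, reason) for rows likely to trip a strict loader.

    record_number counts data rows after the header, starting at 1;
    end_line is the physical line on which the record ends (csv reader line_num).
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:  # empty input: nothing to check
        return []
    suspects = []
    for record_number, row in enumerate(reader, start=1):
        reasons = []
        if len(row) != len(header):
            reasons.append(f"{len(row)} fields, header has {len(header)}")
        if any("\n" in field or "\r" in field for field in row):
            reasons.append("newline inside a field")
        if any('"' in field for field in row):
            reasons.append("quote character inside a field")
        if reasons:
            suspects.append((record_number, reader.line_num, "; ".join(reasons)))
    return suspects
```

Open the file with newline="" so the csv module can see line breaks inside quoted fields, then look at the reported rows around 2975.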


dmoise2 (author) commented Nov 12, 2019

The data does contain a number of those characters, because it comes from text that people enter. I can create a version without them. Is the longer-term expectation that we will have to strip those kinds of characters from our data?
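As a stopgap, a cleaned copy can be produced without editing the source data by hand. This is a minimal sketch under stated assumptions: it collapses line breaks inside fields to a single space and replaces double quotes with apostrophes, which alters the text people entered and may not suit every use. It reads and writes with Python's csv module (UTF-8 delimited text assumed), and clean_field, clean_rows, and write_clean are hypothetical names.

```python
import csv
import re
from typing import Iterable, List

# One or more consecutive line-break characters inside a field.
_NEWLINES = re.compile(r"[\r\n]+")


def clean_field(value: str) -> str:
    """Collapse embedded line breaks to a space and turn double quotes into apostrophes."""
    value = _NEWLINES.sub(" ", value)
    return value.replace('"', "'")


def clean_rows(lines: Iterable[str], delimiter: str = ",") -> List[List[str]]:
    """Parse delimited text and return every record, header included, with cleaned fields."""
    return [[clean_field(f) for f in row] for row in csv.reader(lines, delimiter=delimiter)]


def write_clean(src_path: str, dst_path: str, delimiter: str = ",") -> None:
    # newline="" lets the csv module see line breaks inside quoted fields.
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = clean_rows(src, delimiter)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst, delimiter=delimiter).writerows(rows)
```

The writer still quotes fields that contain the delimiter, but after cleaning those quoted fields no longer carry embedded quotes or newlines.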

3 participants