Training step failed: AutoML should return more meaningful errors which can be understood easily by users #4640

Open
vardeg2017 opened this issue Jun 5, 2019 · 4 comments
@vardeg2017 vardeg2017 commented Jun 5, 2019

Model Builder 16.0.1905.641
OS Windows 10 Pro 17134.765
Visual Studio 2019 16.1.1
I made a very simple sample: an XOR data set. I tried both CSV (comma-separated) and TSV formats, and the result is the same either way.
Here is my tsv data set:
x y z
1 0 1
0 1 1
1 1 0
0 0 0

When I choose binary-classification, the train step produces this output:
Inferring Columns ... Creating Data loader ... Loading data ... Exploring multiple ML algorithms and settings to find you the best model for ML task: binary-classification For further learning check: https://aka.ms/mlnet-cli [Source=AutoML, Kind=Trace] Channel started | Trainer Accuracy AUC AUPRC F1-score Duration #Iteration | Parameter name: PosSample [Source=AutoML, Kind=Trace] Evaluating pipeline xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=AveragedPerceptronBinary{} cache=+ [Source=AutoML, Kind=Error] Pipeline crashed: xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=AveragedPerceptronBinary{} cache=+ . Exception: System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data at Microsoft.ML.Data.EvaluatorBase1.AucAggregatorBase1.ComputeWeightedAuc(Double& unweighted) at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.<GetAggregatorConsolidationFuncs>b__0(UInt32 stratColKey, ReadOnlyMemory1 stratColVal, Aggregator agg)
at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
at Microsoft.ML.Data.EvaluatorBase1.ProcessData(IDataView data, RoleMappedSchema schema, Func2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries)
at Microsoft.ML.Data.EvaluatorBase1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data) at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel) at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
[Source=AutoML, Kind=Trace] 1 NaN 00:00:00.6932896 xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=AveragedPerceptronBinary{} cache=+
|1 AveragedPerceptronBinary NaN NaN NaN NaN 0,7 0 |
System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data
Parameter name: PosSample
at Microsoft.ML.Data.EvaluatorBase1.AucAggregatorBase1.ComputeWeightedAuc(Double& unweighted)
at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.b__0(UInt32 stratColKey, ReadOnlyMemory1 stratColVal, Aggregator agg) at Microsoft.ML.Data.EvaluatorBase1.ProcessData(IDataView data, RoleMappedSchema schema, Func2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries) at Microsoft.ML.Data.EvaluatorBase1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data)
at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)
at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger) [Source=AutoML, Kind=Trace] Evaluating pipeline xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=SdcaLogisticRegressionBinary{} cache=+ [Source=AutoML, Kind=Error] Pipeline crashed: xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=SdcaLogisticRegressionBinary{} cache=+ . Exception: System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data Parameter name: PosSample at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.<GetAggregatorConsolidationFuncs>b__0(UInt32 stratColKey, ReadOnlyMemory1 stratColVal, Aggregator agg)
at Microsoft.ML.Data.EvaluatorBase1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data) at Microsoft.ML.Data.EvaluatorBase1.ProcessData(IDataView data, RoleMappedSchema schema, Func2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries) at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel) at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn) at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
at Microsoft.ML.Data.EvaluatorBase1.AucAggregatorBase1.ComputeWeightedAuc(Double& unweighted)
at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
[Source=AutoML, Kind=Trace] 2 NaN 00:00:06.9448234 xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=SdcaLogisticRegressionBinary{} cache=+
|2 SdcaLogisticRegressionBinary NaN NaN NaN NaN 7,0 0 |
System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data
Parameter name: PosSample
at Microsoft.ML.Data.EvaluatorBase1.AucAggregatorBase1.ComputeWeightedAuc(Double& unweighted)
at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.b__0(UInt32 stratColKey, ReadOnlyMemory1 stratColVal, Aggregator agg) at Microsoft.ML.Data.EvaluatorBase1.ProcessData(IDataView data, RoleMappedSchema schema, Func2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries) at Microsoft.ML.Data.EvaluatorBase1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data)
at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)
at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger) [Source=AutoML, Kind=Trace] Evaluating pipeline xf=ColumnConcatenating{ col=Features:x,y} tr=LightGbmBinary{} cache=- [Source=AutoML, Kind=Error] Pipeline crashed: xf=ColumnConcatenating{ col=Features:x,y} tr=LightGbmBinary{} cache=- . Exception: System.ArgumentNullException: Value cannot be null. Parameter name: items at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName) at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length) at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree) at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.<CreateTreeEnsembleFromInternalDataStructure>b__5_0(InternalRegressionTree tree) at System.Linq.Enumerable.SelectListIterator2.ToList()
at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source) at Microsoft.ML.Trainers.FastTree.TreeEnsemble1..ctor(IEnumerable1 trees, IEnumerable1 treeWeights, Double bias)
at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure()
at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor()
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase4.TrainModelCore(TrainContext context) at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input) at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
[Source=AutoML, Kind=Trace] 3 NaN 00:00:00.1836263 xf=ColumnConcatenating{ col=Features:x,y} tr=LightGbmBinary{} cache=-
|3 LightGbmBinary NaN NaN NaN NaN 0,2 0 |
System.ArgumentNullException: Value cannot be null.
at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName)
Parameter name: items
at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length)
at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree)
at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.b__5_0(InternalRegressionTree tree)
at System.Linq.Enumerable.SelectListIterator2.ToList() at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source)
at Microsoft.ML.Trainers.FastTree.TreeEnsemble1..ctor(IEnumerable1 trees, IEnumerable1 treeWeights, Double bias) at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure() at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor() at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase4.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger) Exception occured while exploring pipelines: Training failed with the exception: System.ArgumentNullException: Value cannot be null. Parameter name: items at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName) at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length) at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree) at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.<CreateTreeEnsembleFromInternalDataStructure>b__5_0(InternalRegressionTree tree) at System.Linq.Enumerable.SelectListIterator2.ToList()
at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source) at Microsoft.ML.Trainers.FastTree.TreeEnsemble1..ctor(IEnumerable1 trees, IEnumerable1 treeWeights, Double bias)
at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure()
at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor()
at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase4.TrainModelCore(TrainContext context) at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input) at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
System.InvalidOperationException: Training failed with the exception: System.ArgumentNullException: Value cannot be null.
Parameter name: items
at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName)
at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length)
at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree)
at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.b__5_0(InternalRegressionTree tree)
at System.Linq.Enumerable.SelectListIterator2.ToList() at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source)
at Microsoft.ML.Trainers.FastTree.TreeEnsemble1..ctor(IEnumerable1 trees, IEnumerable1 treeWeights, Double bias) at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure() at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor() at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase4.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger) at Microsoft.ML.CLI.CodeGenerator.CodeGenerationHelper.GenerateCode() at Microsoft.ML.CLI.Program.<>c__DisplayClass1_0.<Main>b__0(NewCommandSettings options) Please see the log file for more info. Exiting ...

@rustd rustd commented Jun 10, 2019

Is this a CLI issue or AutoML issue?

@greazer greazer commented Jun 25, 2019

@rustd. This is a problem manifesting in auto-train (i.e. the CLI). The hint for the root cause in the output is: "Exception: System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data"

Copying a bunch more rows to the dataset resolves the issue. Therefore, I believe the only problem here is that there just isn't enough data in the input file to do a proper analysis.
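
To illustrate the suspected cause: AUC compares the scores of positive examples against the scores of negative examples, so the evaluation data needs at least one of each. With only four rows, a held-out validation split can easily end up with a single class. Below is a minimal Python sketch of that failure and of a pre-check that reports it in plain terms. It is not ML.NET code; `auc`, `check_labels`, and the choice of which row is held out are assumptions made for illustration.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        # Same situation as the exception in the issue: one class is missing.
        raise ValueError("AUC is undefined: evaluation data needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def check_labels(labels, min_per_class=1):
    """Return a readable problem description, or None if labels look usable.

    Assumes binary labels encoded as 0 and 1.
    """
    counts = {0: 0, 1: 0}
    for y in labels:
        counts[y] += 1
    for cls, n in counts.items():
        if n < min_per_class:
            return (f"Validation data has {n} example(s) of class {cls}; "
                    f"at least {min_per_class} needed. Try adding more rows.")
    return None


# XOR rows from the issue: (x, y, z) with z as the label.
rows = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
# Assumed split for illustration: the last row is held out for validation.
valid_labels = [z for _, _, z in rows[3:]]
problem = check_labels(valid_labels)
print(problem)
```

Running a check like this before evaluation would let the tool stop with one readable message instead of a stack trace per pipeline.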

@greazer greazer closed this Jun 25, 2019
@rustd rustd commented Jun 26, 2019

@greazer it would be good to parse the exception and show it first in the output window so it is more discoverable. Thoughts?
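
One way to sketch the "parse the exception and show it first" idea, in Python for brevity: scan the log for .NET-style exception headers and list each distinct one once, ahead of the full output. The regular expression, the `summarize_exceptions` helper, and the abbreviated log excerpt are all assumptions for illustration, not part of the CLI or Model Builder. A message that itself contains " at " would be cut short by this pattern.

```python
import re

# Matches a .NET-style exception header such as
# "System.ArgumentNullException: Value cannot be null."
# The message ends at the stack trace (" at "), a "Parameter name:" suffix,
# or the end of the line, whichever comes first.
EXCEPTION_RE = re.compile(
    r"(?P<type>(?:\w+\.)+\w*Exception): (?P<msg>.*?)(?= at | Parameter name:|$)",
    re.MULTILINE,
)


def summarize_exceptions(log_text):
    """Return unique (type, message) pairs in order of first appearance."""
    seen = []
    for m in EXCEPTION_RE.finditer(log_text):
        item = (m.group("type"), m.group("msg").strip())
        if item not in seen:
            seen.append(item)
    return seen


# Abbreviated excerpt modeled on the log in this issue.
log = (
    "[Source=AutoML, Kind=Error] Pipeline crashed: tr=AveragedPerceptronBinary{} . "
    "Exception: System.ArgumentOutOfRangeException: AUC is not definied when there is "
    "no positive class in the data at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()\n"
    "System.ArgumentNullException: Value cannot be null.\n"
    "Parameter name: items\n"
    "System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data\n"
)

summary = summarize_exceptions(log)
for exc_type, message in summary:
    print(f"{exc_type}: {message}")
```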

@rustd rustd reopened this Jun 26, 2019
@greazer greazer commented Jun 26, 2019

@rustd Yeah, actually, I do think that auto-train should catch errors like this and return a more human-friendly error. However, in this particular case (and probably in many others), it's not clear what the error means to the user. I only suspect that there's not enough data and that the AUC therefore can't be calculated; I'm not sure that's the only reason the AUC can't be calculated, nor would the user understand it. Therefore, it seems that the AutoML team should capture this type of error and return a more reasonable error to callers (like auto-train).
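
A rough sketch of what "capture this type of error and return a more reasonable error" could look like, written in Python rather than against the real AutoML API. `FRIENDLY_HINTS`, `TrainingError`, and `run_with_friendly_errors` are hypothetical names, and the hint wording is a guess that is deliberately hedged, since too little data may not be the only cause.

```python
# Hypothetical mapping from substrings of low-level error messages to
# user-facing explanations. The wording and hints are illustrative guesses.
FRIENDLY_HINTS = [
    ("no positive class",
     "The evaluation data contains no positive examples, so AUC cannot be "
     "computed. This can happen when the dataset is very small; try adding more rows."),
]


class TrainingError(Exception):
    """User-facing error that keeps the original exception as its cause."""


def run_with_friendly_errors(train):
    """Call train(); re-raise failures with a readable explanation."""
    try:
        return train()
    except Exception as exc:
        text = str(exc)
        for needle, hint in FRIENDLY_HINTS:
            if needle in text:
                raise TrainingError(hint) from exc
        # Unknown failure: still wrap it, but say that the cause is unclear.
        raise TrainingError(
            f"Training failed for an unrecognized reason: {text}"
        ) from exc


def failing_train():
    # Stand-in for a training run that hits the error from this issue.
    raise ValueError("AUC is not definied when there is no positive class in the data")


try:
    run_with_friendly_errors(failing_train)
except TrainingError as err:
    message = str(err)
    cause = err.__cause__
    print(message)
```

Chaining with `from exc` keeps the original exception and stack trace available for the log file while the caller sees only the short explanation.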

@rustd rustd transferred this issue from dotnet/machinelearning-modelbuilder Jun 26, 2019
@rustd rustd changed the title Training step failed Training step failed: AutoML should return more meaningful errors which can be understood easily by users Jun 26, 2019
@harishsk harishsk transferred this issue from dotnet/machinelearning Jan 10, 2020
@harishsk harishsk transferred this issue from dotnet/machinelearning-modelbuilder Jan 10, 2020
@antoniovs1029 antoniovs1029 added api P3 usability and removed api labels Jan 10, 2020