Closed
Labels
P0: Priority of the issue for triage purpose: IMPORTANT, needs to be fixed right away.
Description
System information
- OS version/distro: Windows
Issue
I am trying to create a sample (dotnet/machinelearning-samples#520) that trains a model on a large dataset stored in a file, using a BinaryClassification trainer. While training the model, I get an OutOfMemoryException at the Fit() call shown below. It surfaces as the innermost exception of a System.FormatException.
var model = trainingPipeLine.Fit(trainTestData.TrainSet);
Complete error details:
System.FormatException
HResult=0x80131537
Message=Parsing failed with an exception: Stream reading encountered exception
Source=Microsoft.ML.Data
StackTrace:
at Microsoft.ML.Data.TextLoader.Cursor.<ParseParallel>d__33.MoveNext()
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Data.LinkedRowFilterCursorBase.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Transforms.ValueToKeyMappingTransformer.Train(IHostEnvironment env, IChannel ch, ColInfo[] infos, IDataView keyData, ColumnOptionsBase[] columns, IDataView trainingData, Boolean autoConvert)
at Microsoft.ML.Transforms.ValueToKeyMappingTransformer..ctor(IHostEnvironment env, IDataView input, ColumnOptionsBase[] columns, IDataView keyData, Boolean autoConvert)
at Microsoft.ML.Transforms.ValueToKeyMappingEstimator.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.Transforms.OneHotEncodingTransformer..ctor(ValueToKeyMappingEstimator term, IEstimator`1 toVector, IDataView input)
at Microsoft.ML.Transforms.OneHotEncodingEstimator.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at LargeDatasetsInSqlServer.Program.Main() in C:\GitRepos\Fork\ML-samples\ML-samples-LargeDataInFile\samples\csharp\getting-started\LargeDatasetsInFile\LargeDatasetsInFile\Program.cs:line 107
Inner Exception 1:
FormatException: Stream reading encountered exception
Inner Exception 2:
OutOfMemoryException: Insufficient memory to continue the execution of the program.
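Because the OutOfMemoryException is nested two levels inside the FormatException, a catch block around Fit() that only looks at the outer exception will miss it. The following is a minimal sketch of walking the InnerException chain to find the root cause. The `ExceptionChain` class, `FindInner` helper, and `Demo` class are hypothetical names introduced here for illustration, and the exception chain is constructed by hand to mirror the messages reported above rather than produced by ML.NET.

```csharp
using System;

static class ExceptionChain
{
    // Hypothetical helper: walks the InnerException links starting at ex
    // and returns the first exception of type T, or null if there is none.
    public static T FindInner<T>(Exception ex) where T : Exception
    {
        for (var e = ex; e != null; e = e.InnerException)
        {
            if (e is T match)
            {
                return match;
            }
        }
        return null;
    }
}

static class Demo
{
    static void Main()
    {
        // Hand-built stand-in for the chain in the report above.
        var reported = new FormatException(
            "Parsing failed with an exception: Stream reading encountered exception",
            new FormatException(
                "Stream reading encountered exception",
                new OutOfMemoryException(
                    "Insufficient memory to continue the execution of the program.")));

        var oom = ExceptionChain.FindInner<OutOfMemoryException>(reported);
        Console.WriteLine(oom != null
            ? "Root cause: " + oom.GetType().Name
            : "No OutOfMemoryException in chain");
    }
}
```

In the sample itself, the same helper could be called from a catch (FormatException) block around the Fit() call, to tell a memory failure apart from a genuine parsing error in the input file.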
The dataset was copied from the shared folder \ct01\data\Criteo\Spark\day_0_withHeader.tsv.
Source code / logs
The entire source code is available at https://github.com/prathyusha12345/machinelearning-samples/tree/LargeDatasetsInFile/samples/csharp/getting-started/LargeDatasetsInFile
