

eShopDashboardML - Sales forecasting

| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|----------------|----------|--------|----------|-----------|----------|---------|------------|
| v0.10 | Dynamic API | Up-to-date | ASP.NET Core web app and Console app | SQL Server and .csv files | Sales forecast | Regression | FastTreeTweedie Regression |

eShopDashboardML is a web app with Sales Forecast predictions (per product and per country) using Microsoft Machine Learning .NET (ML.NET).

Overview

This end-to-end sample app highlights the usage of the ML.NET API by showing the following topics:

  1. How to train, build and generate ML models
  2. How to predict the next month's sales forecast by using the trained ML model

The app also uses a SQL Server database for the product catalog and orders info, as many typical web apps do. Since this is a sample, it uses a localdb SQL database by default, so there is no need to set up a real SQL Server. The localdb database, along with sample data, is created the first time you run the web app.

If you want to use a real SQL Server or Azure SQL Database, you just need to change the connection string in the app.
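For example, a hypothetical connection string for a real SQL Server or Azure SQL Database (the exact setting name and configuration file depend on the app) could look like this:

// Hypothetical example only: replace the default localdb connection string in the app's
// configuration with one that points at your own SQL Server / Azure SQL Database.
var connectionString =
    "Server=tcp:<your-server>.database.windows.net,1433;" +
    "Initial Catalog=eShopDashboard;" +
    "User ID=<your-user>;Password=<your-password>;Encrypt=True;";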

Here's a sample screenshot of the web app and one of the forecast predictions:


Walkthroughs on how to set it up

Learn how to set it up in Visual Studio, plus further explanations of the code:

Walkthrough on the implemented ML.NET code

Problem

This problem is centered around forecasting sales per country and per product based on previous sales.

DataSet

To solve this problem, you build two independent ML models that take the following datasets as input:

| Data Set | Columns |
|----------|---------|
| products stats | next, productId, year, month, units, avg, count, max, min, prev |
| country stats | next, country, year, month, max, min, std, count, sales, med, prev |

ML task - Regression

The ML task for this sample is regression, a supervised machine learning task that is used to predict the value of the next period (in this case, the sales forecast) from a set of related features/variables.

Solution

To solve this problem, you first build the ML models by training each one on existing data, then evaluate how good the models are, and finally consume them to predict sales.

Note that the sample implements two independent models:

  • Model to predict product's demand forecast for the next period (month)
  • Model to predict country's sales forecast for the next period (month)

However, when learning/researching the sample, you can focus just on one of the scenarios/models.

Build -> Train -> Evaluate -> Consume

1. Build Model

STEP 1: Define the data schema in a class type and reference that type when loading data with the TextLoader. Here the class type is ProductData.

Schema in a class type

public class ProductData
{
    // next,productId,year,month,units,avg,count,max,min,prev
    // The index passed to LoadColumn(int index) must match the position of the column in the data file.
    [LoadColumn(0)]
    public float next;

    [LoadColumn(1)]
    public string productId;

    [LoadColumn(2)]
    public float year;

    [LoadColumn(3)]
    public float month;

    [LoadColumn(4)]
    public float units;

    [LoadColumn(5)]
    public float avg;

    [LoadColumn(6)]
    public float count;

    [LoadColumn(7)]
    public float max;

    [LoadColumn(8)]
    public float min;

    [LoadColumn(9)]
    public float prev;
}

Model build and train

Load the dataset into the DataView.


var trainingDataView = mlContext.Data.ReadFromTextFile<ProductData>(dataPath, hasHeader: true, separatorChar: ',');

Build the pipeline of transformations and specify which trainer/algorithm you are going to use. In this case, you apply the following transformations:

  • Concatenate the current numeric features into a new column named NumFeatures
  • Transform productId using one-hot encoding
  • Concatenate all generated features into one column named 'Features'
  • Copy the next column, renaming it to "Label"
  • Specify the "Fast Tree Tweedie" trainer as the algorithm to apply to the model

You can load the dataset either before or after designing the pipeline. Although this step is just configuration, loading is lazy and the data is not actually read until you train the model in the next step.

var trainer = mlContext.Regression.Trainers.FastTreeTweedie(labelColumn: DefaultColumnNames.Label, featureColumn: DefaultColumnNames.Features);

var trainingPipeline = mlContext.Transforms.Concatenate(outputColumnName: NumFeatures, nameof(ProductData.year), nameof(ProductData.month), nameof(ProductData.units), nameof(ProductData.avg), nameof(ProductData.count), 
                nameof(ProductData.max), nameof(ProductData.min), nameof(ProductData.prev) )
                .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: CatFeatures, inputColumnName: nameof(ProductData.productId)))
                .Append(mlContext.Transforms.Concatenate(outputColumnName: DefaultColumnNames.Features, NumFeatures, CatFeatures))
                .Append(mlContext.Transforms.CopyColumns(outputColumnName: DefaultColumnNames.Label, inputColumnName: nameof(ProductData.next)))
                .Append(trainer);
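Note that NumFeatures and CatFeatures above are intermediate column names defined as string constants elsewhere in the sample. A minimal sketch of such definitions (the exact values are an assumption here) could be:

// Hypothetical constants for the intermediate column names used in the pipeline above.
const string NumFeatures = "NumFeatures";
const string CatFeatures = "CatFeatures";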

2. Evaluate model with cross-validation

In this case, the evaluation of the model is performed before training it, using a cross-validation approach, so you obtain metrics that tell you how accurate the model is.

var crossValidationResults = mlContext.Regression.CrossValidate(data: trainingDataView, estimator: trainingPipeline, numFolds: 6, labelColumn: DefaultColumnNames.Label);
ConsoleHelper.PrintRegressionFoldsAverageMetrics(trainer.ToString(), crossValidationResults);
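For reference, a helper such as PrintRegressionFoldsAverageMetrics typically averages the per-fold metrics and prints them. Here is a minimal sketch (not the sample's actual ConsoleHelper; it assumes the v0.10 CrossValidate result is a collection of per-fold tuples exposing a RegressionMetrics object named metrics, and it requires System.Linq):

// Hypothetical sketch: average the regression metrics across all cross-validation folds.
var foldMetrics = crossValidationResults.Select(r => r.metrics).ToList();
Console.WriteLine($"Average L1 (MAE):  {foldMetrics.Average(m => m.L1):0.###}");
Console.WriteLine($"Average L2 (MSE):  {foldMetrics.Average(m => m.L2):0.###}");
Console.WriteLine($"Average RMS:       {foldMetrics.Average(m => m.Rms):0.###}");
Console.WriteLine($"Average R-squared: {foldMetrics.Average(m => m.RSquared):0.###}");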

3. Train model

After building the pipeline, you train the forecast model by fitting the training data with the selected algorithm. In this step, the model is built, trained and returned as an object:

var model = trainingPipeline.Fit(trainingDataView);

4. Save the model for later consumption from end-user apps

Once the model is created and evaluated, you can save it into a .ZIP file, which can be consumed by any end-user application, with the following code:

using (var file = File.OpenWrite(outputModelPath))
    model.SaveTo(mlContext, file);

5. Try the model with a simple test prediction

Basically, you load the model from the .ZIP file, create some sample data, create the "prediction engine" and finally make a prediction.

ITransformer trainedModel;
using (var stream = File.OpenRead(outputModelPath))
{
    trainedModel = mlContext.Model.Load(stream);
}

var predictionEngine = trainedModel.CreatePredictionEngine<ProductData, ProductUnitPrediction>(mlContext);

Console.WriteLine("** Testing Product 1 **");

// Build sample data
ProductData dataSample = new ProductData()
{
    productId = "263",
    month = 10,
    year = 2017,
    avg = 91,
    max = 370,
    min = 1,
    count = 10,
    prev = 1675,
    units = 910
};

// Predict the next period/month forecast for the data provided
ProductUnitPrediction prediction = predictionEngine.Predict(dataSample);
Console.WriteLine($"Product: {dataSample.productId}, month: {dataSample.month + 1}, year: {dataSample.year} - Real value (units): 551, Forecast Prediction (units): {prediction.Score}");

Citation

eShopDashboardML dataset is based on a public Online Retail Dataset from UCI: http://archive.ics.uci.edu/ml/datasets/online+retail

Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197–208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).