CESARDELATORRE Migration to v0.10 of all C# samples (#242)
Latest commit ee509d4 Feb 8, 2019

Product Recommendation - Matrix Factorization problem sample

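| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|----------------|----------|--------|----------|-----------|----------|---------|------------|
| 0.10 | Dynamic API | Updated to v0.10 | Console app | .txt files | Recommendation | Matrix Factorization | MatrixFactorizationTrainer (One Class) |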

In this sample, you can see how to use ML.NET to build a product recommendation scenario.

The recommendations in this sample are based on the co-purchase scenario, that is, products frequently bought together: the model recommends a set of products to customers based on their purchase order history.

[Image: products recommended as frequently bought together]

In this example, the highlighted products are being recommended based upon a frequently bought together learning model.

Problem

For this tutorial we will use the Amazon product co-purchasing network dataset.

To build our product recommender, we will use one-class matrix factorization, which is a collaborative filtering approach.

The difference between the one-class approach and the other matrix factorization approaches we covered is that this dataset only contains purchase order history.

We do not have ratings or other details, such as product descriptions.

Matrix factorization relies on collaborative filtering, which operates under the assumption that if person A has the same opinion as person B on an issue, A is more likely to share B's opinion on a different issue than a randomly chosen person is.
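To make the collaborative-filtering assumption concrete, here is a small neighborhood-based sketch: it finds the customer whose purchase history is most similar (Jaccard similarity) to a target customer and recommends what that neighbor bought that the target has not. This only illustrates the assumption; it is not the matrix factorization approach the sample trains, and the class name NeighborhoodSketch and any data you feed it are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: user-based collaborative filtering on purchase sets.
// NeighborhoodSketch is a hypothetical helper, not part of ML.NET.
public static class NeighborhoodSketch
{
    // Jaccard similarity between two purchase sets: |A intersect B| / |A union B|.
    public static double Jaccard(ISet<string> a, ISet<string> b)
    {
        int union = a.Union(b).Count();
        return union == 0 ? 0.0 : (double)a.Intersect(b).Count() / union;
    }

    // Recommends the products bought by the most similar other customer
    // that the target customer has not bought yet.
    public static List<string> RecommendFromNearestNeighbor(
        string target, IReadOnlyDictionary<string, HashSet<string>> purchases)
    {
        HashSet<string> mine = purchases[target];
        var nearest = purchases
            .Where(kv => kv.Key != target)
            .OrderByDescending(kv => Jaccard(mine, kv.Value))
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic tie-break
            .First();
        return nearest.Value
            .Where(p => !mine.Contains(p))
            .OrderBy(p => p, StringComparer.Ordinal)
            .ToList();
    }
}
```

For example, if customer A bought a laptop and a mouse, and customer B bought a laptop, a mouse and a laptop bag, B is A's nearest neighbor and the laptop bag is recommended to A. Matrix factorization captures the same idea without comparing customers pairwise, by learning a latent vector for each row and column of the co-purchase matrix.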

DataSet

The original data comes from SNAP: https://snap.stanford.edu/data/amazon0302.html
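Before loading the file, it can be worth sanity-checking the key range. The loader code later in this sample declares KeyCount(262111) for both product columns, which matches the 262111 nodes SNAP reports for this graph. The sketch below, using a hypothetical EdgeListStats helper, scans a SNAP-style edge list (tab-separated FromNodeId/ToNodeId pairs, with comment lines starting with '#') and returns the number of distinct IDs and the largest ID. Assuming the IDs are 0-based and contiguous, the largest ID plus one should equal the distinct count.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: summarizes the product IDs found in an edge-list file.
public static class EdgeListStats
{
    public static (int DistinctIds, uint MaxId) Scan(IEnumerable<string> lines)
    {
        var ids = new HashSet<uint>();
        uint maxId = 0;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line[0] == '#') continue; // blank line or SNAP comment

            string[] parts = line.Split('\t');
            if (parts.Length < 2 ||
                !uint.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out uint src) ||
                !uint.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out uint dst))
                continue; // header row (e.g. column names) or malformed line

            foreach (uint id in new[] { src, dst })
            {
                ids.Add(id);
                if (id > maxId) maxId = id;
            }
        }
        return (ids.Count, maxId);
    }
}
```

Typical usage would be `EdgeListStats.Scan(File.ReadLines(path))` (with `using System.IO;`), where `path` points to your local copy of Amazon0302.txt.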

ML task - Matrix Factorization (Recommendation)

The ML Task for this sample is Matrix Factorization, which is a supervised machine learning task performing collaborative filtering.
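To show what a one-class matrix factorization optimizes, here is a deliberately tiny sketch trained with full-batch gradient descent. Observed pairs are fitted toward 1, unobserved pairs are fitted toward a value c with a small weight alpha, and both factor matrices get L2 regularization lambda; k is the rank of the factors. ToyOneClassMF and the hyperparameter values used with it are hypothetical. ML.NET's MatrixFactorizationTrainer is built on the LIBMF library and uses a different, far more scalable optimizer, so treat this only as an illustration of the objective.

```csharp
using System;

// Toy one-class matrix factorization (hypothetical illustration, not ML.NET/LIBMF code).
public sealed class ToyOneClassMF
{
    private readonly double[,] _p; // row factors, rows x k
    private readonly double[,] _q; // column factors, cols x k
    private readonly int _k;

    private ToyOneClassMF(double[,] p, double[,] q, int k) { _p = p; _q = q; _k = k; }

    // Predicted score for (row, col): dot product of the two latent vectors.
    public double Score(int row, int col)
    {
        double s = 0;
        for (int f = 0; f < _k; f++) s += _p[row, f] * _q[col, f];
        return s;
    }

    // observed[r, j] == true means the pair was seen (e.g. co-purchased).
    public static ToyOneClassMF Train(bool[,] observed, int k, double alpha, double c,
                                      double lambda, double learningRate, int iterations, int seed = 42)
    {
        int rows = observed.GetLength(0), cols = observed.GetLength(1);
        var rng = new Random(seed);
        var p = new double[rows, k];
        var q = new double[cols, k];
        for (int r = 0; r < rows; r++) for (int f = 0; f < k; f++) p[r, f] = (rng.NextDouble() - 0.5) * 0.2;
        for (int j = 0; j < cols; j++) for (int f = 0; f < k; f++) q[j, f] = (rng.NextDouble() - 0.5) * 0.2;
        var model = new ToyOneClassMF(p, q, k); // shares the p and q arrays updated below

        for (int it = 0; it < iterations; it++)
        {
            var gp = new double[rows, k];
            var gq = new double[cols, k];
            for (int r = 0; r < rows; r++)
                for (int j = 0; j < cols; j++)
                {
                    // Weighted squared loss: observed -> target 1, weight 1; unobserved -> target c, weight alpha.
                    double target = observed[r, j] ? 1.0 : c;
                    double weight = observed[r, j] ? 1.0 : alpha;
                    double err = model.Score(r, j) - target;
                    for (int f = 0; f < k; f++)
                    {
                        gp[r, f] += 2 * weight * err * q[j, f];
                        gq[j, f] += 2 * weight * err * p[r, f];
                    }
                }
            // Gradient step including the L2 regularization term 2 * lambda * factor.
            for (int r = 0; r < rows; r++) for (int f = 0; f < k; f++) p[r, f] -= learningRate * (gp[r, f] + 2 * lambda * p[r, f]);
            for (int j = 0; j < cols; j++) for (int f = 0; f < k; f++) q[j, f] -= learningRate * (gq[j, f] + 2 * lambda * q[j, f]);
        }
        return model;
    }
}
```

With two disjoint groups of co-purchased products (for example rows/columns 0-1 and 2-3 of a 4x4 matrix, k = 2, alpha = 0.1, c = 0), the learned scores come out high within each group and low across groups, which is the behavior the one-class loss is meant to produce.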

Solution

To solve this problem, you build and train an ML model on existing training data, evaluate how good it is (analyzing the obtained metrics), and lastly consume/test the model to predict how likely two products are to be co-purchased.

Build -> Train -> Evaluate -> Consume

1. Build model

Building a model includes:

  • Download the Amazon0302.txt dataset file from https://snap.stanford.edu/data/amazon0302.html and copy it into the Data folder.

  • Replace the column names with only these two (tab-separated): ProductID ProductID_Copurchased

  • Because the reader already specifies a key range (KeyCount) and the product IDs are already encoded, all we need to do is call the MatrixFactorizationTrainer with a few extra parameters.
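The column-name replacement can be scripted. The sketch below (hypothetical DatasetPrep helper) assumes that every comment line in the SNAP download starts with '#', that data lines are already tab-separated, and that the loader uses hasHeader: true, so the prepared file must contain exactly one header line followed by the data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns the raw SNAP file into a file with a single header row.
public static class DatasetPrep
{
    public const string Header = "ProductID\tProductID_Copurchased";

    public static IEnumerable<string> ReplaceHeader(IEnumerable<string> rawLines)
    {
        yield return Header; // the one header line the loader skips (hasHeader: true)
        foreach (string line in rawLines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;                  // drop blank lines
            if (line.StartsWith("#", StringComparison.Ordinal)) continue;   // drop SNAP comment lines
            yield return line;                                               // keep "FromId<TAB>ToId" rows
        }
    }
}
```

For example, `File.WriteAllLines(preparedPath, DatasetPrep.ReplaceHeader(File.ReadLines(rawPath)));` with `preparedPath` different from `rawPath`, since the input is read lazily while the output is written.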

Here's the code which will be used to build the model:

 
    //STEP 1: Create MLContext to be shared across the model creation workflow objects
    MLContext mlContext = new MLContext();

    //STEP 2: Read the training data using TextLoader by defining the schema for reading the product co-purchase dataset
    //        Do remember to replace amazon0302.txt with the dataset from https://snap.stanford.edu/data/amazon0302.html
    var traindata = mlContext.Data.ReadFromTextFile(path: TrainingDataLocation,
                                                    columns: new[]
                                                    {
                                                        new TextLoader.Column(DefaultColumnNames.Label, DataKind.R4, 0),
                                                        new TextLoader.Column(name: nameof(ProductEntry.ProductID), type: DataKind.U4, source: new[] { new TextLoader.Range(0) }, keyCount: new KeyCount(262111)),
                                                        new TextLoader.Column(name: nameof(ProductEntry.CoPurchaseProductID), type: DataKind.U4, source: new[] { new TextLoader.Range(1) }, keyCount: new KeyCount(262111))
                                                    },
                                                    hasHeader: true,
                                                    separatorChar: '\t');

    //STEP 3: Your data is already encoded, so all you need to do is specify options for MatrixFactorizationTrainer with a few extra hyperparameters:
    //        LossFunction, Alpha, Lambda and a few others like K and C, as shown below, and then call the trainer.
    MatrixFactorizationTrainer.Options options = new MatrixFactorizationTrainer.Options();
    options.MatrixColumnIndexColumnName = nameof(ProductEntry.ProductID);
    options.MatrixRowIndexColumnName = nameof(ProductEntry.CoPurchaseProductID);
    options.LabelColumnName = DefaultColumnNames.Label;
    options.LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass;
    options.Alpha = 0.01;
    options.Lambda = 0.025;
    // For better results use the following parameters
    //options.K = 100;
    //options.C = 0.00001;

    //STEP 4: Call the MatrixFactorization trainer by passing options.
    var est = mlContext.Recommendation().Trainers.MatrixFactorization(options);
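As a rough guide to what these options control, the one-class squared loss can be sketched as the objective below. This is a paraphrase based on the option descriptions (Alpha weights the loss on unobserved pairs, C is the value unobserved pairs are pushed toward, Lambda is the L2 regularization strength, K is the rank of the factor matrices), not a transcription of LIBMF's exact formulation. Here Ω is the set of product pairs that appear in the training file, r_uv is the label of an observed pair, and p_u and q_v are the K-dimensional latent vectors for row index u (CoPurchaseProductID) and column index v (ProductID); the predicted score for a pair is their dot product.

```latex
\min_{P,\,Q}\;
\sum_{(u,v)\in\Omega} \left(r_{uv} - p_u^{\top} q_v\right)^2
\;+\; \alpha \sum_{(u,v)\notin\Omega} \left(C - p_u^{\top} q_v\right)^2
\;+\; \lambda \left( \sum_{u} \lVert p_u \rVert_2^2 + \sum_{v} \lVert q_v \rVert_2^2 \right),
\qquad p_u, q_v \in \mathbb{R}^{K}
```

Since unobserved pairs vastly outnumber observed ones (roughly 262111² candidate pairs versus the observed edges), the unobserved term would dominate the sum unless Alpha is small.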

2. Train Model

Once the estimator has been defined, you can train it on the available training data.

This will return a trained model.

    //STEP 5: Train the model fitting to the DataSet
    //Please add the Amazon0302.txt dataset from https://snap.stanford.edu/data/amazon0302.html to the Data folder if a FileNotFoundException is thrown.
    ITransformer model = est.Fit(traindata);

3. Consume Model

To make predictions with this model, create a prediction engine as shown below.

The prediction engine uses the following two classes as its input and output types.

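    // Output of the prediction engine: the co-purchase score for a product pair
    public class Copurchase_prediction
    {
        public float Score { get; set; }
    }

    // Input of the prediction engine: a pair of product IDs encoded as keys
    public class ProductEntry
    {
        [KeyType(Count = 262111)]
        public uint ProductID { get; set; }

        [KeyType(Count = 262111)]
        public uint CoPurchaseProductID { get; set; }
    }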

Once the prediction engine has been created, you can predict the score for two products being co-purchased.

    //STEP 6: Create the prediction engine and predict the score for product 63 being co-purchased with product 3.
    //        The higher the score, the more likely this particular product is to be co-purchased.
    var predictionengine = model.CreatePredictionEngine<ProductEntry, Copurchase_prediction>(mlContext);
    var prediction = predictionengine.Predict(
                             new ProductEntry()
                             {
                                 ProductID = 3,
                                 CoPurchaseProductID = 63
                             });
    Console.WriteLine($"Score for product 63 being co-purchased with product 3: {prediction.Score}");
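A single score is most useful when compared with other scores. The following sketch (hypothetical TopNRecommender helper) ranks a set of candidate products for one product and keeps the N highest-scoring ones. The score delegate stands in for a call to the prediction engine; in this sample it could wrap `predictionengine.Predict(new ProductEntry { ProductID = p, CoPurchaseProductID = c }).Score`. Scoring all 262111 products this way costs one Predict call per candidate, so you may want to restrict the candidate list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns pairwise co-purchase scores into a top-N list.
public static class TopNRecommender
{
    public static List<(uint ProductId, float Score)> RecommendTopN(
        uint productId, IEnumerable<uint> candidates, Func<uint, uint, float> score, int n)
    {
        return candidates
            .Where(c => c != productId)                          // don't recommend the product itself
            .Select(c => (ProductId: c, Score: score(productId, c)))
            .OrderByDescending(x => x.Score)                     // higher score = more likely co-purchase
            .ThenBy(x => x.ProductId)                            // deterministic order on ties
            .Take(n)
            .ToList();
    }
}
```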