Image Classification - Scoring sample

| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|---|---|---|---|---|---|---|---|
| v0.10 | Dynamic API | up-to-date | Console app | Images and text labels | Images classification | TensorFlow Inceptionv3 | DeepLearning model |

Problem

Image classification is a common case in many business scenarios. For these cases, you can either use pre-trained models or train your own model to classify images specific to your custom domain.

DataSet

There are two data sources: the tsv file and the image files. The tsv file contains two columns: the first one is defined as ImagePath and the second one is the Label corresponding to the image. As you can observe, the file does not have a header row, and looks like this:

broccoli.jpg	broccoli
broccoli.png	broccoli
canoe2.jpg	canoe
canoe3.jpg	canoe
canoe4.jpg	canoe
coffeepot.jpg	coffeepot
coffeepot2.jpg	coffeepot
coffeepot3.jpg	coffeepot
coffeepot4.jpg	coffeepot
pizza.jpg	pizza
pizza2.jpg	pizza
pizza3.jpg	pizza
teddy1.jpg	teddy bear
teddy2.jpg	teddy bear
teddy3.jpg	teddy bear
teddy4.jpg	teddy bear
teddy6.jpg	teddy bear
toaster.jpg	toaster
toaster2.png	toaster
toaster3.jpg	toaster

The training and testing images are located in the assets folders. These images belong to Wikimedia Commons.

Wikimedia Commons, the free media repository. Retrieved 10:48, October 17, 2018 from:
https://commons.wikimedia.org/wiki/Pizza
https://commons.wikimedia.org/wiki/Coffee_pot
https://commons.wikimedia.org/wiki/Toaster
https://commons.wikimedia.org/wiki/Category:Canoes
https://commons.wikimedia.org/wiki/Teddy_bear

Pre-trained model

There are multiple models which are pre-trained for classifying images. In this case, we will use a model based on an Inception topology and trained with images from ImageNet. This model can be downloaded from https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip, but it is also available at /src/ImageClassification/assets/inputs/inception/tensorflow_inception_graph.pb.

Solution

The console application project ImageClassification.Score can be used to classify sample images based on the pre-trained Inception-v3 TensorFlow model.

Note that this sample only consumes a pre-trained TensorFlow model through the ML.NET API; it does not train any ML.NET model. Currently, ML.NET supports TensorFlow only for scoring/predicting with existing trained TensorFlow models.

Follow these steps to run the classification test:

  1. Set the Visual Studio default startup project: set ImageClassification.Score as the startup project in Visual Studio.
  2. Run the console app: press F5 in Visual Studio. At the end of the execution, the output will be similar to this screenshot: [image]

Code Walkthrough

There is a single project in the solution, named ImageClassification.Score, which is responsible for loading the model in TensorFlow format and then classifying images.

ML.NET: Model Scoring

Define the schema of the data in a class type and refer to that type when loading data with TextLoader. Here the class type is ImageNetData.

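using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML.Data;

public class ImageNetData
{
    // Column 0 of the TSV file: the image file name.
    [LoadColumn(0)]
    public string ImagePath;

    // Column 1 of the TSV file: the expected label for that image.
    [LoadColumn(1)]
    public string Label;

    // Reads the tab-separated file directly and prefixes each image name with its folder.
    public static IEnumerable<ImageNetData> ReadFromCsv(string file, string folder)
    {
        return File.ReadAllLines(file)
            .Select(x => x.Split('\t'))
            .Select(x => new ImageNetData()
            {
                ImagePath = Path.Combine(folder, x[0]),
                Label = x[1],
            });
    }
}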

The first step is to load the data using TextLoader:

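// The TSV file has no header row, so hasHeader is false; otherwise the first image would be skipped.
var data = mlContext.Data.ReadFromTextFile<ImageNetData>(dataLocation, hasHeader: false);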

The TSV file used to load the images has two columns: the first one is defined as ImagePath and the second one is the Label corresponding to the image.

It is important to highlight that the label in the ImageNetData class is not really used when scoring with the TensorFlow model. It is used when testing the predictions so you can compare the actual label of each sample data with the predicted label provided by the TensorFlow model.

broccoli.jpg	broccoli
bucket.png	bucket
canoe.jpg	canoe
snail.jpg	snail
teddy1.jpg	teddy bear

As you can observe, the file does not have a header row.

The second step is to define the estimator pipeline. Usually, when dealing with deep neural networks, you must adapt the images to the format expected by the network. This is the reason images are resized and then transformed (mainly, pixel values are normalized across all R,G,B channels).

// Load each image from disk into the column named by ImageReal.
var pipeline = mlContext.Transforms.LoadImages(
        imageFolder: imagesFolder,
        columns: (outputColumnName: ImageReal, inputColumnName: nameof(ImageNetData.ImagePath)))
    // Resize to the dimensions the network expects.
    .Append(mlContext.Transforms.Resize(
        outputColumnName: ImageReal,
        imageWidth: ImageNetSettings.imageWidth,
        imageHeight: ImageNetSettings.imageHeight,
        inputColumnName: ImageReal))
    // Convert the pixels to floats in the input tensor column.
    .Append(mlContext.Transforms.ExtractPixels(columns: new[]
    {
        new ImagePixelExtractorTransformer.ColumnInfo(
            name: InceptionSettings.inputTensorName,
            inputColumnName: ImageReal,
            interleave: ImageNetSettings.channelsLast,
            offset: ImageNetSettings.mean)
    }))
    // Score with the pre-trained TensorFlow model.
    .Append(mlContext.Transforms.ScoreTensorFlowModel(
        modelLocation: modelLocation,
        outputColumnNames: new[] { InceptionSettings.outputTensorName },
        inputColumnNames: new[] { InceptionSettings.inputTensorName }));
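To make the pixel normalization mentioned above more concrete, here is a minimal sketch in plain C#. It is not ML.NET's implementation: the helper name `PixelSketch.Extract` and its `scale` parameter are hypothetical, and it assumes each output value is computed as (pixel - offset) * scale over interleaved R,G,B bytes.

```csharp
using System;

public static class PixelSketch
{
    // Hypothetical helper: converts interleaved R,G,B bytes into floats,
    // subtracting `offset` from each value and multiplying the result by `scale`.
    public static float[] Extract(byte[] rgb, float offset, float scale = 1f)
    {
        if (rgb.Length % 3 != 0)
            throw new ArgumentException("Expected 3 bytes (R,G,B) per pixel.", nameof(rgb));

        var result = new float[rgb.Length];
        for (int i = 0; i < rgb.Length; i++)
            result[i] = (rgb[i] - offset) * scale;
        return result;
    }
}
```

For example, `PixelSketch.Extract(new byte[] { 117, 255, 0 }, offset: 117f)` yields the values 0, 138 and -117 (the offset here is only an illustrative number).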

You also need to inspect the neural network and check the names of its input and output nodes. To inspect the model, you can use tools like Netron, which is automatically installed with Visual Studio Tools for AI. These names are used later in the definition of the estimator pipeline: in the case of the Inception network, the input tensor is named 'input' and the output tensor is named 'softmax2'.

Figure: inspecting the neural network with Netron.
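The pipeline refers to these tensor names through `InceptionSettings`. A minimal sketch of such a holder could look like the following; the two string values come from the text above, while the exact shape of the type (a struct with constant fields) is an assumption.

```csharp
public struct InceptionSettings
{
    // Name of the input node, as seen when inspecting the model in Netron.
    public const string inputTensorName = "input";

    // Name of the output node that holds the per-label probabilities.
    public const string outputTensorName = "softmax2";
}
```

Keeping the names in one place means a different model only requires changing these two constants rather than every call in the pipeline.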

Finally, we extract the prediction engine after fitting the estimator pipeline. The prediction engine receives as a parameter an object of type ImageNetData (containing two members: ImagePath and Label), and returns an object of type ImageNetPrediction.

// Fit the pipeline, then create a prediction engine for single-image scoring.
var model = pipeline.Fit(data);
var predictionEngine = model.CreatePredictionEngine<ImageNetData, ImageNetPrediction>(mlContext);

When obtaining the prediction, we get an array of floats in the property PredictedLabels. Each position in the array is assigned to a label, so, for example, if the model has 5 different labels, the array will have length 5. The value at each position is the probability of the label assigned to that position, and all the values (probabilities) sum to one. To obtain the predicted label, select the largest value (probability) and look up the label assigned to its position.
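The selection step can be sketched as follows. This is an illustrative helper, not code from the sample: the name `PredictionSketch.GetBestLabel` is hypothetical, and it assumes the labels have already been loaded into an array whose positions match those of the probability array.

```csharp
using System;

public static class PredictionSketch
{
    // Returns the label with the highest probability, together with that probability.
    // labels[i] must be the label assigned to position i of probabilities.
    public static (string Label, float Probability) GetBestLabel(string[] labels, float[] probabilities)
    {
        if (labels.Length != probabilities.Length || labels.Length == 0)
            throw new ArgumentException("Expected one probability per label.");

        int best = 0;
        for (int i = 1; i < probabilities.Length; i++)
        {
            if (probabilities[i] > probabilities[best])
                best = i;
        }
        return (labels[best], probabilities[best]);
    }
}
```

With the labels `{ "broccoli", "canoe", "coffeepot", "pizza", "teddy bear" }` and the probabilities `{ 0.05f, 0.10f, 0.70f, 0.10f, 0.05f }`, the helper returns "coffeepot" with probability 0.70. The five labels are only an example; the real label list depends on the model.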

Citation

Training and prediction images

Wikimedia Commons, the free media repository. Retrieved 10:48, October 17, 2018 from https://commons.wikimedia.org/w/index.php?title=Main_Page&oldid=313158208.