prathyusha12345 Migrate Samples to 1.0.0 (#404)
Latest commit bd5f36c May 2, 2019

Taxi Fare Prediction

| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|---|---|---|---|---|---|---|---|
| v1.0.0 | Dynamic API | Up-to-date | Console app | .csv files | Price prediction | Regression | Sdca Regression |

In this introductory sample, you'll see how to use ML.NET to predict taxi fares. In the world of machine learning, this type of prediction is known as regression.

Problem

This problem is centered around predicting the fare of a taxi trip in New York City. At first glance, the fare may seem to depend simply on the distance traveled. However, taxi vendors in New York charge varying amounts for other factors, such as additional passengers or paying with a credit card instead of cash. This prediction can be used in applications for taxi providers to give users and drivers an estimate of ride fares.

To solve this problem, we will build an ML model that takes as inputs:

  • vendor ID
  • rate code
  • passenger count
  • trip time
  • trip distance
  • payment type

and predicts the fare of the ride.

ML task - Regression

The generalized problem of regression is to predict some continuous value for given parameters, for example:

  • predict a house price based on number of rooms, location, year built, etc.
  • predict a car's fuel consumption based on fuel type and car parameters.
  • predict a time estimate for fixing an issue based on issue attributes.

What all these examples have in common is that the value we want to predict can be any number within a certain range. In other words, the value is represented by an integer or float/double type, not by an enum or boolean type.

Solution

To solve this problem, first we will build an ML model. Then we will train the model on existing data, evaluate how good it is, and lastly we'll consume the model to predict taxi fares.

Build -> Train -> Evaluate -> Consume

1. Build model's pipeline

Building a model includes loading the data (taxi-fare-train.csv, read through LoadFromTextFile), transforming the data so it can be used effectively by an ML algorithm, and selecting that algorithm (SDCA, Stochastic Dual Coordinate Ascent, in this case):

// Create ML Context with seed for repeatable/deterministic results
MLContext mlContext = new MLContext(seed: 0);

// STEP 1: Common data loading configuration
IDataView baseTrainingDataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(TrainDataPath, hasHeader: true, separatorChar: ',');
IDataView testDataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(TestDataPath, hasHeader: true, separatorChar: ',');

// Sample code for removing extreme data ("outliers"): FareAmounts higher than $150 or lower than $1, which may be erroneous data.
// cnt and cnt2 hold the row counts before and after filtering.
var cnt = baseTrainingDataView.GetColumn<float>(nameof(TaxiTrip.FareAmount)).Count();
IDataView trainingDataView = mlContext.Data.FilterRowsByColumn(baseTrainingDataView, nameof(TaxiTrip.FareAmount), lowerBound: 1, upperBound: 150);
var cnt2 = trainingDataView.GetColumn<float>(nameof(TaxiTrip.FareAmount)).Count();

// STEP 2: Common data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: nameof(TaxiTrip.FareAmount))
                            .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "VendorIdEncoded", inputColumnName: nameof(TaxiTrip.VendorId)))
                            .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "RateCodeEncoded", inputColumnName: nameof(TaxiTrip.RateCode)))
                            .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "PaymentTypeEncoded",inputColumnName: nameof(TaxiTrip.PaymentType)))
                            .Append(mlContext.Transforms.NormalizeMeanVariance(outputColumnName: nameof(TaxiTrip.PassengerCount)))
                            .Append(mlContext.Transforms.NormalizeMeanVariance(outputColumnName: nameof(TaxiTrip.TripTime)))
                            .Append(mlContext.Transforms.NormalizeMeanVariance(outputColumnName: nameof(TaxiTrip.TripDistance)))
                            .Append(mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded", "PaymentTypeEncoded", nameof(TaxiTrip.PassengerCount)
                            , nameof(TaxiTrip.TripTime), nameof(TaxiTrip.TripDistance)));


// STEP 3: Set the training algorithm, then create and config the modelBuilder - Selected Trainer (SDCA Regression algorithm)                            
var trainer = mlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features");
var trainingPipeline = dataProcessPipeline.Append(trainer);

2. Train model

Training the model is the process of running the chosen algorithm on training data (with known fare values) to tune the parameters of the model. It is implemented in the Fit() API. To perform training, we just call that method and pass in the training DataView.

var trainedModel = trainingPipeline.Fit(trainingDataView);
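
The Consume step later loads the model from ModelPath, so the trained model has to be written to disk first. This README does not show that step; the following is a hedged sketch of how it could be done with the Model.Save API. The helper class and method names are hypothetical, and the arguments are assumed to be the variables from the snippets above.

```csharp
using Microsoft.ML;

// Hypothetical helper, not part of the sample as shown in this README.
public static class ModelPersistence
{
    public static void SaveModel(MLContext mlContext, ITransformer trainedModel,
                                 IDataView trainingDataView, string modelPath)
    {
        // The input schema is stored with the model so it can be
        // recovered when the model is loaded again.
        mlContext.Model.Save(trainedModel, trainingDataView.Schema, modelPath);
    }
}
```

It could then be called as ModelPersistence.SaveModel(mlContext, trainedModel, trainingDataView, ModelPath) right after Fit().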

3. Evaluate model

We need this step to assess how accurately our model performs on new data. To do so, the model from the previous step is run against another dataset that was not used in training (taxi-fare-test.csv). This dataset also contains known fares. Regression.Evaluate() compares the known fares with the values predicted by the model and summarizes the differences as several metrics.

IDataView predictions = trainedModel.Transform(testDataView);
var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "Label", scoreColumnName: "Score");

Common.ConsoleHelper.PrintRegressionMetrics(trainer.ToString(), metrics);

To learn how to interpret the metrics, check out the Machine Learning glossary from the ML.NET Guide or use any available materials on data science and machine learning.
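
For quick reference, these are the standard textbook definitions of the regression metrics usually reported for this kind of evaluation. Here y_i is the known fare of test row i, ŷ_i is the predicted fare, ȳ is the mean of the known fares, and n is the number of test rows. This is a general summary, not a statement of exactly what the sample's PrintRegressionMetrics helper prints.

```latex
\begin{aligned}
\text{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{RMSE} &= \sqrt{\text{MSE}} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
```

MAE and RMSE are expressed in the same unit as the fare, and lower values are better. An R² closer to 1 means the model explains more of the variation in fares.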

If you are not satisfied with the quality of the model, there are a variety of ways to improve it, which will be covered in the examples category.

Keep in mind that for this sample the quality is lower than it could be because the datasets were reduced in size for performance purposes. You can use the original datasets to significantly improve the quality (the original datasets are referenced in the datasets README).

4. Consume model

After the model is trained, we can use the Predict() API to predict the fare amount for a specified trip.

//Sample: 
//vendor_id,rate_code,passenger_count,trip_time_in_secs,trip_distance,payment_type,fare_amount
//VTS,1,1,1140,3.75,CRD,15.5

var taxiTripSample = new TaxiTrip()
{
    VendorId = "VTS",
    RateCode = "1",
    PassengerCount = 1,
    TripTime = 1140,
    TripDistance = 3.75f,
    PaymentType = "CRD",
    FareAmount = 0 // To predict. Actual/Observed = 15.5
};

ITransformer trainedModel;
using (var stream = new FileStream(ModelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    trainedModel = mlContext.Model.Load(stream, out var modelInputSchema);
}

// Create prediction engine related to the loaded trained model
var predEngine = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(trainedModel);

//Score
var resultprediction = predEngine.Predict(taxiTripSample);

Console.WriteLine($"**********************************************************************");
Console.WriteLine($"Predicted fare: {resultprediction.FareAmount:0.####}, actual fare: 15.5");
Console.WriteLine($"**********************************************************************");

Finally, you can use the implemented PlotRegressionChart() method to plot how the test predictions are distributed and how well the regression performs, as in the following screenshot:

[Image: Regression plot-chart]
