# ML.NET - Taxi Fare Prediction

## Scenario: Regression model for Taxi fares

Regression is a supervised machine learning task.
A regression model predicts a continuous numeric value rather than a category.
For instance, predicting the fare of a taxi trip or the price of a car is a regression problem.
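
In symbols, the task can be sketched as follows. The notation is introduced here for illustration and is not part of the original notebook: $x$ is a trip's feature vector, $y$ its fare, $f_\theta$ the model, and squared error is assumed as one common choice of training loss.

```latex
\hat{y} = f_\theta(x), \qquad x \in \mathbb{R}^d, \quad \hat{y} \in \mathbb{R}

\theta^{*} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f_\theta(x_i) \bigr)^2
```

For the taxi data, $x$ would hold the encoded vendor, rate code, passenger count, trip distance and payment type, and $y$ the fare amount.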

## Davi Ramos -> Data Scientist 👋
(davi.info@gmail.com)

[![Linkedin Badge](https://img.shields.io/badge/-LinkedIn-blue?style=flat-square&logo=Linkedin&logoColor=white&link=https://www.linkedin.com/in/davi-ramos/)](https://www.linkedin.com/in/davi-ramos/)
[![Twitter Badge](https://img.shields.io/badge/-Twitter-1DA1F2?style=flat-square&logo=Twitter&logoColor=white&link=https://twitter.com/Daviinfo/)](https://twitter.com/Daviinfo/)
<a href="https://github.com/DaviRamos"><img src="https://img.shields.io/github/followers/DaviRamos.svg?label=GitHub&style=social" alt="GitHub"></a>

In [1]:
// ML.NET Nuget packages installation
#r "nuget:Microsoft.ML"

//Install XPlot package
#r "nuget:XPlot.Plotly"

Installed package Microsoft.ML version 1.5.2

Installed package XPlot.Plotly version 3.0.1

In [2]:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
using static Microsoft.ML.DataOperationsCatalog;
using XPlot.Plotly;

In [3]:
/// <summary>
/// The TaxiTrip class represents a single taxi trip.
/// </summary>
public class TaxiTrip
{
    [LoadColumn(0)] public string VendorId;
    [LoadColumn(5)] public string RateCode;
    [LoadColumn(3)] public float PassengerCount;
    [LoadColumn(4)] public float TripDistance;
    [LoadColumn(9)] public string PaymentType;
    [LoadColumn(10)] public float FareAmount;
}

/// <summary>
/// The TaxiTripFarePrediction class represents a single fare prediction.
/// </summary>
public class TaxiTripFarePrediction
{
    [ColumnName("Score")] public float FareAmount;
}

In [4]:
// file paths to data files
static readonly string dataPath = Path.Combine(Environment.CurrentDirectory, "./Datasets/taxi/taxi-fare-train-small.csv");

// create the machine learning context
var mlContext = new MLContext();

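// set up the text loader
var textLoader = mlContext.Data.CreateTextLoader(
    new TextLoader.Options()
    {
        Separators = new[] { ',' },
        HasHeader = true,
        // NOTE: the row preview printed in cell 6 shows payment codes (CRD/CSH) under
        // RateCode, an empty PaymentType and FareAmount = 0, so these indices likely
        // do not match the file's column order. Check them against the CSV header.
        // If the file follows the ML.NET sample layout (vendor_id, rate_code,
        // passenger_count, trip_time_in_secs, trip_distance, payment_type, fare_amount),
        // the indices would be 0, 1, 2, 4, 5 and 6. That layout is an assumption.
        Columns = new[]
        {
            new TextLoader.Column("VendorId", DataKind.String, 0),
            new TextLoader.Column("RateCode", DataKind.String, 5),
            new TextLoader.Column("PassengerCount", DataKind.Single, 3),
            new TextLoader.Column("TripDistance", DataKind.Single, 4),
            new TextLoader.Column("PaymentType", DataKind.String, 9),
            new TextLoader.Column("FareAmount", DataKind.Single, 10)
        }
    }
);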

// load the data 
Console.Write("Loading training data....");
var dataView = textLoader.Load(dataPath);
Console.WriteLine("done");

// split into a training and test partition
TrainTestData partitions  = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);

Loading training data....done


In [5]:
display(h4("Schema of training DataView:"));
display(partitions.TrainSet.Schema);

index,Name,Index,IsHidden,Type,Annotations
0,VendorId,0,False,String,
1,RateCode,1,False,String,
2,PassengerCount,2,False,Single,
3,TripDistance,3,False,Single,
4,PaymentType,4,False,String,
5,FareAmount,5,False,Single,


## Show a few rows of loaded data 

In [6]:
// Utility method to preview the first rows of an IDataView
public static List<TaxiTrip> Head(MLContext mlContext, IDataView dataView, int numberOfRows = 4)
{
    string msg = $"DataView: Showing {numberOfRows} rows with the columns";
    display(msg);

    var rows = mlContext.Data.CreateEnumerable<TaxiTrip>(dataView, reuseRowObject: false)
                    .Take(numberOfRows)
                    .ToList();
    
    return rows;
}

display(h4("Showing a few rows from training DataView:"));

var fewRows = Head(mlContext, partitions.TrainSet, 5);
display(fewRows);

DataView: Showing 5 rows with the columns

index,VendorId,RateCode,PassengerCount,TripDistance,PaymentType,FareAmount
0,CMT,CRD,1271,3.8,,0
1,CMT,CRD,474,1.5,,0
2,CMT,CRD,637,1.4,,0
3,CMT,CSH,181,0.6,,0
4,CMT,CRD,661,1.1,,0


## Extract important input variables as arrays to be used for plotting

In [7]:
//Extract some data into arrays for plotting:

int numberOfRows = 1000;
float[] fares = partitions.TrainSet.GetColumn<float>("FareAmount").Take(numberOfRows).ToArray();
float[] distances = partitions.TrainSet.GetColumn<float>("TripDistance").Take(numberOfRows).ToArray();
float[] passengerCounts = partitions.TrainSet.GetColumn<float>("PassengerCount").Take(numberOfRows).ToArray();

## Show a histogram: Distribution of taxi trips per fare cost 

In [8]:
// Distribution of taxi trips per cost
//XPlot Histogram reference: http://tpetricek.github.io/XPlot/reference/xplot-plotly-graph-histogram.html

var faresHistogram = Chart.Plot(new Graph.Histogram(){x = fares, autobinx = false, nbinsx = 20});
var layout = new Layout.Layout(){title="Distribution of taxi trips per cost"};
faresHistogram.WithLayout(layout);
faresHistogram.WithXTitle("Fare ranges");
faresHistogram.WithYTitle("Number of trips");
display(faresHistogram);

## Plot fares against the trip's passenger count

In [9]:
// Plot Fare depending on Passengers

int numberOfRows = 2000;
float[] fares = partitions.TrainSet.GetColumn<float>("FareAmount").Take(numberOfRows).ToArray();
float[] passengerCounts = partitions.TrainSet.GetColumn<float>("PassengerCount").Take(numberOfRows).ToArray();
float[] distances = partitions.TrainSet.GetColumn<float>("TripDistance").Take(numberOfRows).ToArray();

var chartFareVsPassengers = Chart.Plot(
    new Graph.Scatter()
    {
        x = passengerCounts,
        y = fares,
        mode = "markers",
    }
);

var layout = new Layout.Layout(){title="Plot Fare depending on Passengers"};
chartFareVsPassengers.WithLayout(layout);
chartFareVsPassengers.Width = 500;
chartFareVsPassengers.Height = 500;
chartFareVsPassengers.WithXTitle("Passengers");
chartFareVsPassengers.WithYTitle("Fares");
chartFareVsPassengers.WithLegend(false);

display(chartFareVsPassengers);

In [10]:
display(h1("Apply Data Transformations pipeline"));
// set up a learning pipeline
var pipeline = mlContext.Transforms.CopyColumns(
    inputColumnName:"FareAmount", 
    outputColumnName:"Label")

    // one-hot encode all text features
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("VendorId"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("RateCode"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("PaymentType"))

    // combine all input features into a single column 
    .Append(mlContext.Transforms.Concatenate(
        "Features", 
        "VendorId", 
        "RateCode", 
        "PassengerCount", 
        "TripDistance", 
        "PaymentType"))

    // cache the data to speed up training
    .AppendCacheCheckpoint(mlContext)

    // use the online gradient descent learner (a linear regression trainer)
    .Append(mlContext.Regression.Trainers.OnlineGradientDescent(labelColumnName: "FareAmount", featureColumnName: "Features"));


// train the model
Console.Write("Training the model....");
var model = pipeline.Fit(partitions.TrainSet);
Console.WriteLine("done");

Training the model....done
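
The `OnlineGradientDescent` trainer used in the pipeline fits a linear regression model. As a simplified sketch, assuming squared loss and a fixed learning rate, one update step per training example looks like the block below. The symbols $w$, $b$ and $\eta$ are notation introduced here, and the ML.NET implementation may add refinements (such as weight averaging or a decaying learning rate) that are not shown.

```latex
\hat{y} = w^{\top} x + b

\ell(w, b) = \tfrac{1}{2} \bigl( \hat{y} - y \bigr)^2

w \leftarrow w - \eta \, (\hat{y} - y) \, x, \qquad b \leftarrow b - \eta \, (\hat{y} - y)
```

Here $x$ is the concatenated `Features` vector, which includes the one-hot encoded `VendorId`, `RateCode` and `PaymentType` columns alongside `PassengerCount` and `TripDistance`, and $y$ is the fare.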

In [11]:
// get a set of predictions 
Console.Write("Evaluating the model....");
var predictions = model.Transform(partitions.TestSet);


// get regression metrics to score the model
var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "FareAmount", scoreColumnName: "Score");
display(metrics);

Evaluating the model....

MeanAbsoluteError,MeanSquaredError,RootMeanSquaredError,LossFunction,RSquared
0,0,0,0,
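
For reference, the standard definitions of the reported metrics are sketched below. The notation is introduced here and is not from the notebook: $y_i$ is the actual fare, $\hat{y}_i$ the predicted fare, $\bar{y}$ the mean actual fare, and $n$ the number of test rows.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} ( y_i - \hat{y}_i )^2

\mathrm{RMSE} = \sqrt{\mathrm{MSE}}

R^2 = 1 - \frac{\sum_{i=1}^{n} ( y_i - \hat{y}_i )^2}{\sum_{i=1}^{n} ( y_i - \bar{y} )^2}
```

If every label in the test set has the same value, the denominator of $R^2$ is zero and the metric is undefined, which is one way a blank `RSquared` entry can arise.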


In [12]:
// create a prediction engine for one single prediction
var prediction  = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(model).Predict(
    new TaxiTrip()
    {
        VendorId = "VTS",
        RateCode = "1",
        PassengerCount = 1,
        TripDistance = 3.75f,
        PaymentType = "1",
        FareAmount = 0 // actual fare for this trip = 15.5
    });

// show the prediction
Console.WriteLine("Single prediction:");
Console.WriteLine($"  Predicted fare: {prediction.FareAmount:0.####}");
Console.WriteLine("  Actual fare: 15.5");

Single prediction:
  Predicted fare: 0
  Actual fare: 15.5
