# ML.NET - Samples - Regression_BikeSharingDemand

## Bike Sharing Demand - Regression problem sample

| ML.NET version | API type          | Status                        | App Type    | Data type | Scenario            | ML Task                   | Algorithms                  |
|----------------|-------------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
| v1.5 | Dynamic API | Up-to-date | Jupyter Notebook | .csv files | Demand prediction | Regression | Fast Tree regressor compared to additional regression algorithms|

This sample shows how to use ML.NET to predict the demand for bikes. Because the goal is to predict a specific numeric value from past observed data, this type of machine learning task is known as regression.

## Problem

For a more detailed description of the problem, see the original [Bike Sharing Demand competition from Kaggle](https://www.kaggle.com/c/bike-sharing-demand).

## DataSet

The original data comes from a public UCI dataset:
https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

## ML task - [Regression](https://docs.microsoft.com/en-us/dotnet/machine-learning/resources/tasks#regression)

The ML task for this sample is regression, a supervised machine learning task that predicts the value of the label (in this case, the bike demand count) from a set of related features/variables.

## Solution

To solve this problem, you build and train an ML model on existing training data, evaluate how good it is (analyzing the obtained metrics), and lastly you can consume/test the model to predict the demand given input data variables.

![Build -> Train -> Evaluate -> Consume](../shared_content/modelpipeline.png)
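
The four stages in the diagram can be sketched end to end on a toy problem. This is a minimal illustration rather than the sample's actual pipeline: the `ToyObservation` and `ToyPrediction` types, the synthetic data (`Y = 2X + 1`), and the choice of the SDCA trainer are assumptions made only for this sketch.

```csharp
#r "nuget:Microsoft.ML"

using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical toy types (not part of the sample): one feature X and a label Y.
public class ToyObservation
{
    public float X { get; set; }
    [ColumnName("Label")]
    public float Y { get; set; }
}

public class ToyPrediction
{
    [ColumnName("Score")]
    public float PredictedY { get; set; }
}

var toyContext = new MLContext(seed: 0);

// Build: load in-memory rows, hold out 20% for testing, and define the pipeline.
var toyRows = Enumerable.Range(0, 200)
                        .Select(i => new ToyObservation { X = i / 100f, Y = 2f * (i / 100f) + 1f });
var toyData = toyContext.Data.LoadFromEnumerable(toyRows);
var toySplit = toyContext.Data.TrainTestSplit(toyData, testFraction: 0.2);
var toyPipeline = toyContext.Transforms.Concatenate("Features", nameof(ToyObservation.X))
                            .Append(toyContext.Regression.Trainers.Sdca());

// Train: fit the pipeline on the training rows only.
var toyModel = toyPipeline.Fit(toySplit.TrainSet);

// Evaluate: score the held-out rows and compute regression metrics.
var toyMetrics = toyContext.Regression.Evaluate(toyModel.Transform(toySplit.TestSet));
Console.WriteLine($"R2: {toyMetrics.RSquared:0.##}, RMSE: {toyMetrics.RootMeanSquaredError:0.##}");

// Consume: predict a single value with a prediction engine.
var toyEngine = toyContext.Model.CreatePredictionEngine<ToyObservation, ToyPrediction>(toyModel);
var toyResult = toyEngine.Predict(new ToyObservation { X = 0.5f });
Console.WriteLine($"Predicted Y for X = 0.5: {toyResult.PredictedY:0.##}");
```

The rest of this notebook follows the same shape, but loads the features from a CSV file and repeats the train and evaluate steps once per trainer.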

However, this example trains multiple models instead of a single one, each based on a different regression learner/algorithm, and then evaluates the metrics of each approach so you can choose the best-performing trained model.

The following trainers/algorithms are used and compared:

- Fast Tree
- Poisson Regressor
- SDCA (Stochastic Dual Coordinate Ascent) Regressor
- FastTreeTweedie

In [None]:
// ML.NET Nuget packages installation
#r "nuget:Microsoft.ML" 

// ML.NET FastTree Nuget packages installation
#r "nuget:Microsoft.ML.FastTree" 

## Using C# Class

In [None]:
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Collections.Generic;
using static Microsoft.ML.TrainCatalogBase;
using static Microsoft.ML.DataOperationsCatalog;
using System.Diagnostics;
using System.Globalization;

## Declare data-classes for input data and predictions

In [None]:
public class DemandObservation
{
    // Only some columns are loaded, identified by their zero-based index in the file, starting at column 2.
    // The label column is number 16.
    // Columns 14 and 15 are not loaded from the file.
    [LoadColumn(2)]
    public float Season { get; set; }
    [LoadColumn(3)]
    public float Year { get; set; }
    [LoadColumn(4)]
    public float Month { get; set; }
    [LoadColumn(5)]
    public float Hour { get; set; }
    [LoadColumn(6)]
    public float Holiday { get; set; }
    [LoadColumn(7)]
    public float Weekday { get; set; }
    [LoadColumn(8)]
    public float WorkingDay { get; set; }
    [LoadColumn(9)]
    public float Weather { get; set; }
    [LoadColumn(10)]
    public float Temperature { get; set; }
    [LoadColumn(11)]
    public float NormalizedTemperature { get; set; }
    [LoadColumn(12)]
    public float Humidity { get; set; }
    [LoadColumn(13)]
    public float Windspeed { get; set; }
    [LoadColumn(16)]
    [ColumnName("Label")]
    public float Count { get; set; }   // This is the observed count, to be used as the "label" to predict
}

public class DemandPrediction
{
    [ColumnName("Score")]
    public float PredictedCount;
}
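
To see which CSV columns the `[LoadColumn]` indexes select, the following sketch maps each index to its column name. It assumes the header row is the one quoted in the sample-data comment of this notebook; the variable names are illustrative only.

```csharp
using System;
using System.Linq;

// Header row of the hourly CSV, as quoted in this notebook's sample-data comment.
var header = "instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt";
var columnNames = header.Split(',');

// Zero-based indexes used by the [LoadColumn] attributes: 2 through 13, plus 16 for the label.
var loadedIndexes = Enumerable.Range(2, 12).Append(16).ToArray();

foreach (var index in loadedIndexes)
{
    Console.WriteLine($"LoadColumn({index}) -> {columnNames[index]}");
}

// Everything else is ignored by the loader.
var skipped = Enumerable.Range(0, columnNames.Length)
                        .Except(loadedIndexes)
                        .Select(i => columnNames[i]);
Console.WriteLine($"Not loaded: {string.Join(", ", skipped)}");
// Last line printed: Not loaded: instant, dteday, casual, registered
```

Leaving out `casual` and `registered` matters because they add up to `cnt` in the quoted sample row (72 + 133 = 205), so loading them as features would leak the label.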

In [None]:
public static class DemandObservationSample
{
    public static DemandObservation SingleDemandSampleData =>
                                    // Single data
                                    // instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
                                    // 13950,2012-08-09,3,1,8,10,0,4,1,1,0.8,0.7576,0.55,0.2239,72,133,205
                                    new DemandObservation()
                                    {
                                        Season = 3,
                                        Year = 1,
                                        Month = 8,
                                        Hour = 10,
                                        Holiday = 0,
                                        Weekday = 4,
                                        WorkingDay = 1,
                                        Weather = 1,
                                        Temperature = 0.8f,
                                        NormalizedTemperature = 0.7576f,
                                        Humidity = 0.55f,
                                        Windspeed = 0.2239f
                                    };
}

### Constants

In [None]:
private static string ModelsLocation = @"./datasets/Regression_BikeSharingDemand/MLModels";
private static string TrainingDataLocation = @"./datasets/Regression_BikeSharingDemand/hour_train.csv";
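// Note: this path points to the same file as the training data, so the evaluation metrics are computed
// on rows the models have already seen and will look optimistic. Use a held-out file here if one is available.
private static string TestDataLocation = @"./datasets/Regression_BikeSharingDemand/hour_train.csv";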

### ConsoleHelper

In [None]:
public static class ConsoleHelper
{
    public static void PrintPrediction(string prediction)
    {
        Console.WriteLine($"*************************************************");
        Console.WriteLine($"Predicted : {prediction}");
        Console.WriteLine($"*************************************************");
    }

    public static void PrintRegressionPredictionVersusObserved(string predictionCount, string observedCount)
    {
        Console.WriteLine($"-------------------------------------------------");
        Console.WriteLine($"Predicted : {predictionCount}");
        Console.WriteLine($"Actual:     {observedCount}");
        Console.WriteLine($"-------------------------------------------------");
    }

    public static void PrintRegressionMetrics(string name, RegressionMetrics metrics)
    {
        Console.WriteLine($"*************************************************");
        Console.WriteLine($"*       Metrics for {name} regression model      ");
        Console.WriteLine($"*------------------------------------------------");
        Console.WriteLine($"*       LossFn:        {metrics.LossFunction:0.##}");
        Console.WriteLine($"*       R2 Score:      {metrics.RSquared:0.##}");
        Console.WriteLine($"*       Absolute loss: {metrics.MeanAbsoluteError:#.##}");
        Console.WriteLine($"*       Squared loss:  {metrics.MeanSquaredError:#.##}");
        Console.WriteLine($"*       RMS loss:      {metrics.RootMeanSquaredError:#.##}");
        Console.WriteLine($"*************************************************");
    }

    public static void PrintBinaryClassificationMetrics(string name, CalibratedBinaryClassificationMetrics metrics)
    {
        Console.WriteLine($"************************************************************");
        Console.WriteLine($"*       Metrics for {name} binary classification model      ");
        Console.WriteLine($"*-----------------------------------------------------------");
        Console.WriteLine($"*       Accuracy: {metrics.Accuracy:P2}");
        Console.WriteLine($"*       Area Under Curve:      {metrics.AreaUnderRocCurve:P2}");
        Console.WriteLine($"*       Area under Precision recall Curve:  {metrics.AreaUnderPrecisionRecallCurve:P2}");
        Console.WriteLine($"*       F1Score:  {metrics.F1Score:P2}");
        Console.WriteLine($"*       LogLoss:  {metrics.LogLoss:#.##}");
        Console.WriteLine($"*       LogLossReduction:  {metrics.LogLossReduction:#.##}");
        Console.WriteLine($"*       PositivePrecision:  {metrics.PositivePrecision:#.##}");
        Console.WriteLine($"*       PositiveRecall:  {metrics.PositiveRecall:#.##}");
        Console.WriteLine($"*       NegativePrecision:  {metrics.NegativePrecision:#.##}");
        Console.WriteLine($"*       NegativeRecall:  {metrics.NegativeRecall:P2}");
        Console.WriteLine($"************************************************************");
    }

    public static void PrintMultiClassClassificationMetrics(string name, MulticlassClassificationMetrics metrics)
    {
        Console.WriteLine($"************************************************************");
        Console.WriteLine($"*    Metrics for {name} multi-class classification model   ");
        Console.WriteLine($"*-----------------------------------------------------------");
        Console.WriteLine($"    AccuracyMacro = {metrics.MacroAccuracy:0.####}, a value between 0 and 1, the closer to 1, the better");
        Console.WriteLine($"    AccuracyMicro = {metrics.MicroAccuracy:0.####}, a value between 0 and 1, the closer to 1, the better");
        Console.WriteLine($"    LogLoss = {metrics.LogLoss:0.####}, the closer to 0, the better");
        Console.WriteLine($"    LogLoss for class 1 = {metrics.PerClassLogLoss[0]:0.####}, the closer to 0, the better");
        Console.WriteLine($"    LogLoss for class 2 = {metrics.PerClassLogLoss[1]:0.####}, the closer to 0, the better");
        Console.WriteLine($"    LogLoss for class 3 = {metrics.PerClassLogLoss[2]:0.####}, the closer to 0, the better");
        Console.WriteLine($"************************************************************");
    }
    
    public static void PrintRegressionFoldsAverageMetrics(string algorithmName, IReadOnlyList<CrossValidationResult<RegressionMetrics>> crossValidationResults)
    {
        var L1 = crossValidationResults.Select(r => r.Metrics.MeanAbsoluteError);
        var L2 = crossValidationResults.Select(r => r.Metrics.MeanSquaredError);
        var RMS = crossValidationResults.Select(r => r.Metrics.RootMeanSquaredError);
        var lossFunction = crossValidationResults.Select(r => r.Metrics.LossFunction);
        var R2 = crossValidationResults.Select(r => r.Metrics.RSquared);

        Console.WriteLine($"*************************************************************************************************************");
        Console.WriteLine($"*       Metrics for {algorithmName} Regression model      ");
        Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
        Console.WriteLine($"*       Average L1 Loss:    {L1.Average():0.###} ");
        Console.WriteLine($"*       Average L2 Loss:    {L2.Average():0.###}  ");
        Console.WriteLine($"*       Average RMS:          {RMS.Average():0.###}  ");
        Console.WriteLine($"*       Average Loss Function: {lossFunction.Average():0.###}  ");
        Console.WriteLine($"*       Average R-squared: {R2.Average():0.###}  ");
        Console.WriteLine($"*************************************************************************************************************");
    }
    
    public static void PrintMulticlassClassificationFoldsAverageMetrics(
                                     string algorithmName,
                                   IReadOnlyList<CrossValidationResult<MulticlassClassificationMetrics>> crossValResults
                                                                       )
    {
        var metricsInMultipleFolds = crossValResults.Select(r => r.Metrics);

        var microAccuracyValues = metricsInMultipleFolds.Select(m => m.MicroAccuracy);
        var microAccuracyAverage = microAccuracyValues.Average();
        var microAccuraciesStdDeviation = CalculateStandardDeviation(microAccuracyValues);
        var microAccuraciesConfidenceInterval95 = CalculateConfidenceInterval95(microAccuracyValues);

        var macroAccuracyValues = metricsInMultipleFolds.Select(m => m.MacroAccuracy);
        var macroAccuracyAverage = macroAccuracyValues.Average();
        var macroAccuraciesStdDeviation = CalculateStandardDeviation(macroAccuracyValues);
        var macroAccuraciesConfidenceInterval95 = CalculateConfidenceInterval95(macroAccuracyValues);

        var logLossValues = metricsInMultipleFolds.Select(m => m.LogLoss);
        var logLossAverage = logLossValues.Average();
        var logLossStdDeviation = CalculateStandardDeviation(logLossValues);
        var logLossConfidenceInterval95 = CalculateConfidenceInterval95(logLossValues);

        var logLossReductionValues = metricsInMultipleFolds.Select(m => m.LogLossReduction);
        var logLossReductionAverage = logLossReductionValues.Average();
        var logLossReductionStdDeviation = CalculateStandardDeviation(logLossReductionValues);
        var logLossReductionConfidenceInterval95 = CalculateConfidenceInterval95(logLossReductionValues);

        Console.WriteLine($"*************************************************************************************************************");
        Console.WriteLine($"*       Metrics for {algorithmName} Multi-class Classification model      ");
        Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
        Console.WriteLine($"*       Average MicroAccuracy:    {microAccuracyAverage:0.###}  - Standard deviation: ({microAccuraciesStdDeviation:#.###})  - Confidence Interval 95%: ({microAccuraciesConfidenceInterval95:#.###})");
        Console.WriteLine($"*       Average MacroAccuracy:    {macroAccuracyAverage:0.###}  - Standard deviation: ({macroAccuraciesStdDeviation:#.###})  - Confidence Interval 95%: ({macroAccuraciesConfidenceInterval95:#.###})");
        Console.WriteLine($"*       Average LogLoss:          {logLossAverage:#.###}  - Standard deviation: ({logLossStdDeviation:#.###})  - Confidence Interval 95%: ({logLossConfidenceInterval95:#.###})");
        Console.WriteLine($"*       Average LogLossReduction: {logLossReductionAverage:#.###}  - Standard deviation: ({logLossReductionStdDeviation:#.###})  - Confidence Interval 95%: ({logLossReductionConfidenceInterval95:#.###})");
        Console.WriteLine($"*************************************************************************************************************");
    }    

    public static double CalculateStandardDeviation (IEnumerable<double> values)
    {
        double average = values.Average();
        double sumOfSquaresOfDifferences = values.Select(val => (val - average) * (val - average)).Sum();
        double standardDeviation = Math.Sqrt(sumOfSquaresOfDifferences / (values.Count()-1));
        return standardDeviation;
    }

    public static double CalculateConfidenceInterval95(IEnumerable<double> values)
    {
        // Half-width of the 95% interval: 1.96 * standard error, where the standard error is s / sqrt(n)
        double confidenceInterval95 = 1.96 * CalculateStandardDeviation(values) / Math.Sqrt(values.Count());
        return confidenceInterval95;
    }

    public static void PrintClusteringMetrics(string name, ClusteringMetrics metrics)
    {
        Console.WriteLine($"*************************************************");
        Console.WriteLine($"*       Metrics for {name} clustering model      ");
        Console.WriteLine($"*------------------------------------------------");
        Console.WriteLine($"*       Average Distance: {metrics.AverageDistance}");
        Console.WriteLine($"*       Davies Bouldin Index is: {metrics.DaviesBouldinIndex}");
        Console.WriteLine($"*************************************************");
    }    
    
    public static void PeekDataViewInConsole(MLContext mlContext, IDataView dataView, IEstimator<ITransformer> pipeline, int numberOfRows = 4)
    {
        string msg = string.Format("Peek data in DataView: Showing {0} rows with the columns", numberOfRows.ToString());
        ConsoleWriteHeader(msg);

        //https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md#how-do-i-look-at-the-intermediate-data
        var transformer = pipeline.Fit(dataView);
        var transformedData = transformer.Transform(dataView);

        // 'transformedData' is a 'promise' of data, lazy-loading. call Preview  
        //and iterate through the returned collection from preview.

        var preViewTransformedData = transformedData.Preview(maxRows: numberOfRows);

        foreach (var row in preViewTransformedData.RowView)
        {
            var ColumnCollection = row.Values;
            string lineToPrint = "Row--> ";
            foreach (KeyValuePair<string, object> column in ColumnCollection)
            {
                lineToPrint += $"| {column.Key}:{column.Value}";
            }
            Console.WriteLine(lineToPrint + "\n");
        }
    }
    
    public static void PeekVectorColumnDataInConsole(MLContext mlContext, string columnName, IDataView dataView, IEstimator<ITransformer> pipeline, int numberOfRows = 4)
    {
        string msg = string.Format("Peek data in DataView: Show {0} rows with just the '{1}' column", numberOfRows, columnName );
        ConsoleWriteHeader(msg);

        var transformer = pipeline.Fit(dataView);
        var transformedData = transformer.Transform(dataView);

        // Extract the 'Features' column.
        var someColumnData = transformedData.GetColumn<float[]>(columnName)
                                                    .Take(numberOfRows).ToList();

        // print to console the peeked rows

        int currentRow = 0;
        someColumnData.ForEach(row => {
                                        currentRow++;
                                        // Join the values with a separator so adjacent numbers do not run together
                                        string concatColumn = string.Join(" ", row);

                                        Console.WriteLine();
                                        string rowMsg = string.Format("**** Row {0} with '{1}' field value ****", currentRow, columnName);
                                        Console.WriteLine(rowMsg);
                                        Console.WriteLine(concatColumn);
                                        Console.WriteLine();
                                      });
    }
    
    public static void ConsoleWriteHeader(params string[] lines)
    {
        var defaultColor = Console.ForegroundColor;
        Console.ForegroundColor = ConsoleColor.Yellow;
        Console.WriteLine(" ");
        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }
        var maxLength = lines.Select(x => x.Length).Max();
        Console.WriteLine(new string('#', maxLength));
        Console.ForegroundColor = defaultColor;
    }

    public static void ConsoleWriterSection(params string[] lines)
    {
        var defaultColor = Console.ForegroundColor;
        Console.ForegroundColor = ConsoleColor.Blue;
        Console.WriteLine(" ");
        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }
        var maxLength = lines.Select(x => x.Length).Max();
        Console.WriteLine(new string('-', maxLength));
        Console.ForegroundColor = defaultColor;
    }
    
}
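
For reference, the textbook formulas behind the `CalculateStandardDeviation` and `CalculateConfidenceInterval95` helpers are shown below, where $x_1, \dots, x_n$ are the per-fold metric values and $\bar{x}$ is their mean. The 1.96 factor assumes a normal approximation; with only a handful of folds this is a rough guide, and a Student's t quantile would give a wider interval.

$$
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\mathrm{CI}_{95\%} = \bar{x} \pm 1.96\,\frac{s}{\sqrt{n}}
$$

The helper returns only the half-width $1.96\, s / \sqrt{n}$, which the printing methods display next to the average.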

### ModelScoringTester

In [None]:
public static class ModelScoringTester
{
    public static void VisualizeSomePredictions(MLContext mlContext,
                                                string modelName, 
                                                string testDataLocation,
                                                PredictionEngine<DemandObservation, DemandPrediction> predEngine,
                                                int numberOfPredictions)
    {
        //Make a few prediction tests 
        // Make the provided number of predictions and compare with observed data from the test dataset
        var testData = ReadSampleDataFromCsvFile(testDataLocation, numberOfPredictions);

        for (int i = 0; i < numberOfPredictions; i++)
        {
            //Score
            var resultprediction = predEngine.Predict(testData[i]);

            ConsoleHelper.PrintRegressionPredictionVersusObserved(resultprediction.PredictedCount.ToString(), 
                                                        testData[i].Count.ToString());
        }

    }

    // This method uses regular .NET System.IO.File and LINQ to read just some sample data to test/predict with
    public static List<DemandObservation> ReadSampleDataFromCsvFile(string dataLocation, int numberOfRecordsToRead)
    {
        return File.ReadLines(dataLocation)
            .Skip(1)
            .Where(x => !string.IsNullOrWhiteSpace(x))
            .Select(x => x.Split(','))
            .Select(x => new DemandObservation()
            {
                Season = float.Parse(x[2], CultureInfo.InvariantCulture),
                Year = float.Parse(x[3], CultureInfo.InvariantCulture),
                Month = float.Parse(x[4], CultureInfo.InvariantCulture),
                Hour = float.Parse(x[5], CultureInfo.InvariantCulture),
                Holiday = float.Parse(x[6], CultureInfo.InvariantCulture),
                Weekday = float.Parse(x[7], CultureInfo.InvariantCulture),
                WorkingDay = float.Parse(x[8], CultureInfo.InvariantCulture),
                Weather = float.Parse(x[9], CultureInfo.InvariantCulture),
                Temperature = float.Parse(x[10], CultureInfo.InvariantCulture),
                NormalizedTemperature = float.Parse(x[11], CultureInfo.InvariantCulture),
                Humidity = float.Parse(x[12], CultureInfo.InvariantCulture),
                Windspeed = float.Parse(x[13], CultureInfo.InvariantCulture),
                Count = float.Parse(x[16], CultureInfo.InvariantCulture)
            })
            .Take(numberOfRecordsToRead)
            .ToList();
    }
}

## Evaluate

In [None]:
// Create MLContext to be shared across the model creation workflow objects 
// Set a random seed for repeatable/deterministic results across multiple trainings.
var mlContext = new MLContext(seed: 0);

// 1. Common data loading configuration
var trainingDataView = mlContext.Data.LoadFromTextFile<DemandObservation>(path: TrainingDataLocation, hasHeader:true, separatorChar: ',');
var testDataView = mlContext.Data.LoadFromTextFile<DemandObservation>(path: TestDataLocation, hasHeader:true, separatorChar: ',');

// 2. Common data pre-process with pipeline data transformations

// Concatenate all the numeric columns into a single features column
var dataProcessPipeline = mlContext.Transforms.Concatenate("Features",
                                         nameof(DemandObservation.Season), nameof(DemandObservation.Year), nameof(DemandObservation.Month),
                                         nameof(DemandObservation.Hour), nameof(DemandObservation.Holiday), nameof(DemandObservation.Weekday),
                                         nameof(DemandObservation.WorkingDay), nameof(DemandObservation.Weather), nameof(DemandObservation.Temperature),
                                         nameof(DemandObservation.NormalizedTemperature), nameof(DemandObservation.Humidity), nameof(DemandObservation.Windspeed))
                             .AppendCacheCheckpoint(mlContext);
                            // Use in-memory cache for small/medium datasets to lower training time. 
                            // Do NOT use it (remove .AppendCacheCheckpoint()) when handling very large datasets.

// (Optional) Peek data in training DataView after applying the ProcessPipeline's transformations  
ConsoleHelper.PeekDataViewInConsole(mlContext, trainingDataView, dataProcessPipeline, 10);
ConsoleHelper.PeekVectorColumnDataInConsole(mlContext, "Features", trainingDataView, dataProcessPipeline, 10);

// Definition of regression trainers/algorithms to use
//var regressionLearners = new (string name, IEstimator<ITransformer> value)[]
(string name, IEstimator<ITransformer> value)[] regressionLearners =
            {
                ("FastTree", mlContext.Regression.Trainers.FastTree()),
                ("Poisson", mlContext.Regression.Trainers.LbfgsPoissonRegression()),
                ("SDCA", mlContext.Regression.Trainers.Sdca()),
                ("FastTreeTweedie", mlContext.Regression.Trainers.FastTreeTweedie()),
                //Other possible learners that could be included
                //...FastForestRegressor...
                //...GeneralizedAdditiveModelRegressor...
                //...OnlineGradientDescent... (Might need to normalize the features first)
            };

 // 3. Phase for Training, Evaluation and model file persistence
 // Per each regression trainer: Train, Evaluate, and Save a different model
 foreach (var trainer in regressionLearners)
 {
     Console.WriteLine("=============== Training the current model ===============");
     var trainingPipeline = dataProcessPipeline.Append(trainer.value);
     var trainedModel = trainingPipeline.Fit(trainingDataView);

     Console.WriteLine("===== Evaluating Model's accuracy with Test data =====");
     IDataView predictions = trainedModel.Transform(testDataView);
     var metrics = mlContext.Regression.Evaluate(data:predictions, labelColumnName:"Label", scoreColumnName: "Score");               
     ConsoleHelper.PrintRegressionMetrics(trainer.name, metrics);

     // Save each model to its own .zip file so it can be used by any application
     // (saving to ModelsLocation directly would make every trainer overwrite the same file)
     Directory.CreateDirectory(ModelsLocation);
     string modelRelativeLocation = $"{ModelsLocation}/{trainer.name}Model.zip";
     mlContext.Model.Save(trainedModel, trainingDataView.Schema, modelRelativeLocation);
     Console.WriteLine("The model is saved to {0}", modelRelativeLocation);
 }

 // 4. Try/test Predictions with the created models
 // The following test predictions could be implemented/deployed in a different application (production apps)
 // that's why it is segregated from the previous loop
 // For each trained model, test 10 predictions           
 foreach (var learner in regressionLearners)
 {
     // Load the current model from its own .zip file
     string modelRelativeLocation = $"{ModelsLocation}/{learner.name}Model.zip";
     ITransformer trainedModel = mlContext.Model.Load(modelRelativeLocation, out var modelInputSchema);

     // Create prediction engine related to the loaded trained model
     var predEngine = mlContext.Model.CreatePredictionEngine<DemandObservation, DemandPrediction>(trainedModel);

     Console.WriteLine($"================== Visualize/test 10 predictions for model {learner.name}Model.zip ==================");
     //Visualize 10 tests comparing prediction with actual/observed values from the test dataset
     ModelScoringTester.VisualizeSomePredictions(mlContext ,learner.name, TestDataLocation, predEngine, 10);
 }
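
The `ConsoleHelper` class includes `PrintRegressionFoldsAverageMetrics`, which expects cross-validation results, although the cell above evaluates each model only once. As a hedged sketch of how such results could be produced, the following runs 5-fold cross-validation on synthetic data. The `CvRow` type, the data, and the SDCA trainer are assumptions made for this illustration and are not part of the sample.

```csharp
#r "nuget:Microsoft.ML"

using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical toy row type used only for this sketch.
public class CvRow
{
    public float X { get; set; }
    [ColumnName("Label")]
    public float Y { get; set; }
}

var cvContext = new MLContext(seed: 0);

// Synthetic data following Y = 2X + 1.
var cvData = cvContext.Data.LoadFromEnumerable(
    Enumerable.Range(0, 200).Select(i => new CvRow { X = i / 100f, Y = 2f * (i / 100f) + 1f }));

var cvPipeline = cvContext.Transforms.Concatenate("Features", nameof(CvRow.X))
                          .Append(cvContext.Regression.Trainers.Sdca());

// Each fold trains on four fifths of the rows and evaluates on the remaining fifth.
var cvResults = cvContext.Regression.CrossValidate(cvData, cvPipeline,
                                                   numberOfFolds: 5, labelColumnName: "Label");

Console.WriteLine($"Folds: {cvResults.Count}");
Console.WriteLine($"Average R-squared: {cvResults.Average(r => r.Metrics.RSquared):0.###}");
Console.WriteLine($"Average RMS:       {cvResults.Average(r => r.Metrics.RootMeanSquaredError):0.###}");
```

In this notebook, the same call could take `trainingDataView` and each `dataProcessPipeline.Append(trainer.value)` pipeline, and the returned list could be passed to `ConsoleHelper.PrintRegressionFoldsAverageMetrics`.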