# ML.NET - Samples - Heart disease prediction

## Heart disease prediction

| ML.NET version | API type          | Status                        | App Type    | Data type | Scenario            | ML Task                   | Algorithms                  |
|----------------|-------------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
| v1.5.2           | Dynamic API | Up-to-date | Jupyter Notebook | .txt files | Heart disease classification | Binary classification | FastTree |

In this introductory sample, you'll see how to use [ML.NET](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet) to predict the presence of heart disease. In the world of machine learning, this type of prediction is known as **binary classification**.

## Dataset

The dataset used is the [UCI Heart Disease dataset](https://archive.ics.uci.edu/ml/datasets/heart+Disease).
This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them.

Citation for this dataset is available at [DataSets-Citation](./HeartDiseaseDetection/Data/DATASETS-CITATION.txt)

## Problem

This problem is centered around predicting the presence of heart disease from 14 attributes. To solve it, we will build an ML model that takes 14 columns as input: 13 feature columns (also called independent variables) plus the 'Label' column, which is the value you want to predict and is named 'num' in this dataset.

Attribute Information:

* (age) - age in years
* (sex) - sex (1 = male; 0 = female)
* (cp) - chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic)
* (trestbps) - resting blood pressure (in mm Hg on admission to the hospital)
* (chol) - serum cholesterol in mg/dl
* (fbs) - fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
* (restecg) - resting electrocardiographic results (0 = normal; 1 = ST-T wave abnormality, that is, T wave inversions and/or ST elevation or depression of > 0.05 mV; 2 = probable or definite left ventricular hypertrophy by Estes' criteria)
* (thalach) - maximum heart rate achieved
* (exang) - exercise-induced angina (1 = yes; 0 = no)
* (oldpeak) - ST depression induced by exercise relative to rest
* (slope) - slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 = downsloping)
* (ca) - number of major vessels (0-3) colored by fluoroscopy
* (thal) - 3 = normal; 6 = fixed defect; 7 = reversible defect
* (num) - the predicted attribute: diagnosis of heart disease, angiographic disease status (0 = < 50% diameter narrowing; 1 = > 50% diameter narrowing)

In the original database, the 'num' attribute indicates the presence of heart disease in the patient with integer values from 0 (no presence) to 4. Experiments with the Cleveland database (the dataset used for this example) have concentrated on simply distinguishing presence (values 1 to 4) from absence (value 0), which is why this sample treats the label as a two-class value.
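The raw Cleveland data encodes 'num' as an integer from 0 to 4, while the `HeartData` class in this sample loads the label as a `bool`, so the raw value has to be collapsed into two classes at some point. The following is a minimal sketch of that mapping. The helper name `ToBinaryLabel` is hypothetical and not part of the sample, and the sample's own data files may already contain binarized labels.

```csharp
using System;

// Hypothetical helper (not part of the sample): collapse the raw UCI
// 'num' value (0 to 4) into the two classes used for binary classification.
static bool ToBinaryLabel(int num)
{
    if (num < 0 || num > 4)
        throw new ArgumentOutOfRangeException(nameof(num), "Expected a value from 0 to 4.");

    // 0 means absence; any value from 1 to 4 means presence.
    return num > 0;
}

foreach (var num in new[] { 0, 1, 4 })
{
    Console.WriteLine($"num = {num} -> label = {ToBinaryLabel(num)}");
}
```

Rejecting values outside 0 to 4 is a design choice for this sketch, so that malformed rows fail loudly instead of being silently counted as "presence".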

## ML task - Binary classification

The generalized problem of **binary classification** is to classify items into one of two classes (classifying items into more than two classes is called **multiclass classification**). For example:

* Predict whether an insurance claim is valid or not.
* Predict whether a plane will be delayed or will arrive on time.
* Predict whether a face ID (photo) belongs to the owner of a device.

The common feature of all these examples is that the value we want to predict can take only one of two values. In other words, it can be represented by a Boolean (`bool` in C#) value.
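As a minimal sketch of that idea, a classifier's probability estimate can be reduced to a `bool` by comparing it with a threshold. The `IsPositive` helper and the 0.5 threshold are illustrative assumptions, not something taken from this sample.

```csharp
using System;

// Illustrative helper: turn a probability in [0, 1] into one of two classes.
// The default threshold of 0.5 is an assumption chosen for this sketch.
static bool IsPositive(float probability, float threshold = 0.5f)
{
    return probability > threshold;
}

float[] probabilities = { 0.05f, 0.49f, 0.51f, 0.97f };
foreach (var p in probabilities)
{
    Console.WriteLine($"probability = {p:F2} -> class = {IsPositive(p)}");
}
```

Raising the threshold trades recall for precision, which can matter in a screening scenario where missed cases and false alarms have different costs.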

## Solution
To solve this problem, we first build an ML model. Then we train the model on existing data, evaluate how good it is, and finally consume the model to predict whether heart disease is present for a list of sample patient records.

![Build -> Train -> Evaluate -> Consume](../shared_content/modelpipeline.png)

In [1]:
// ML.NET NuGet package
#r "nuget:Microsoft.ML"

// ML.NET FastTree NuGet package
#r "nuget:Microsoft.ML.FastTree"

Installed package Microsoft.ML.FastTree version 1.5.2

Installed package Microsoft.ML version 1.5.2

## Using C# Class

In [2]:
using System;
using System.IO;
using System.IO.Compression;
using Microsoft.ML.Trainers.FastTree;
using System.Linq;
using System.Net;
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Collections.Generic;
using static Microsoft.ML.TrainCatalogBase;
using static Microsoft.ML.DataOperationsCatalog;
using System.Diagnostics;

## Declare data-classes for input data and predictions

In [3]:
public class HeartData
{
    [LoadColumn(0)]
    public float Age { get; set; }
    [LoadColumn(1)]
    public float Sex { get; set; }
    [LoadColumn(2)]
    public float Cp { get; set; }
    [LoadColumn(3)]
    public float TrestBps { get; set; }
    [LoadColumn(4)]
    public float Chol { get; set; }
    [LoadColumn(5)]
    public float Fbs { get; set; }
    [LoadColumn(6)]
    public float RestEcg { get; set; }
    [LoadColumn(7)]
    public float Thalac { get; set; }
    [LoadColumn(8)]
    public float Exang { get; set; }
    [LoadColumn(9)]
    public float OldPeak { get; set; }
    [LoadColumn(10)]
    public float Slope { get; set; }
    [LoadColumn(11)]
    public float Ca { get; set; }
    [LoadColumn(12)]
    public float Thal { get; set; }
    [LoadColumn(13)]
    public bool Label { get; set; }
}

public class HeartPrediction
{
    // ColumnName attribute is used to change the column name from
    // its default value, which is the name of the field.
    [ColumnName("PredictedLabel")]
    public bool Prediction;

    // No need to specify ColumnName attribute, because the field
    // name "Probability" is the column name we want.
    public float Probability;

    public float Score;
}

In [4]:
public class HeartSampleData
{
    internal static readonly List<HeartData> heartDataList = new List<HeartData>()
    {
        new HeartData()
        { 
            Age = 36.0f,
            Sex = 1.0f,
            Cp = 4.0f,
            TrestBps = 145.0f,
            Chol = 210.0f,
            Fbs = 0.0f,
            RestEcg = 2.0f,
            Thalac = 148.0f,
            Exang = 1.0f,
            OldPeak = 1.9f,
            Slope = 2.0f,
            Ca = 1.0f,
            Thal = 7.0f,
        },
        new HeartData()
        {
            Age = 95.0f,
            Sex = 1.0f,
            Cp = 4.0f,
            TrestBps = 145.0f,
            Chol = 210.0f,
            Fbs = 0.0f,
            RestEcg = 2.0f,
            Thalac = 148.0f,
            Exang = 1.0f,
            OldPeak = 1.9f,
            Slope = 2.0f,
            Ca = 1.0f,
            Thal = 7.0f,
        },
        new HeartData()
        {
            Age = 46.0f,
            Sex = 1.0f,
            Cp = 4.0f,
            TrestBps = 135.0f,
            Chol = 192.0f,
            Fbs = 0.0f,
            RestEcg = 0.0f,
            Thalac = 148.0f,
            Exang = 0.0f,
            OldPeak = 0.3f,
            Slope = 2.0f,
            Ca = 0.0f,
            Thal = 6.0f,
        },
        new HeartData()
        {
            Age = 45.0f,
            Sex = 0.0f,
            Cp = 1.0f,
            TrestBps = 140.0f,
            Chol = 221.0f,
            Fbs = 1.0f,
            RestEcg = 1.0f,
            Thalac = 150.0f,
            Exang = 0.0f,
            OldPeak = 2.3f,
            Slope = 3.0f,
            Ca = 0.0f,
            Thal = 6.0f,
        },
        new HeartData()
        {
            Age = 88.0f,
            Sex = 0.0f,
            Cp = 1.0f,
            TrestBps = 140.0f,
            Chol = 221.0f,
            Fbs = 1.0f,
            RestEcg = 1.0f,
            Thalac = 150.0f,
            Exang = 0.0f,
            OldPeak = 2.3f,
            Slope = 3.0f,
            Ca = 0.0f,
            Thal = 6.0f,
        },
    };
}

### Constants

In [5]:
private static string TrainDataPath = @"./Datasets/HeartDiseaseDetection/HeartTraining.csv";
private static string TestDataPath = @"./Datasets/HeartDiseaseDetection/HeartTest.csv";
private static string ModelPath = @"./Datasets/HeartDiseaseDetection/MLModels";

### Methods

In [6]:
private static void BuildTrainEvaluateAndSaveModel(MLContext mlContext)
{
    // STEP 1: Common data loading configuration
    var trainingDataView = mlContext.Data.LoadFromTextFile<HeartData>(TrainDataPath, hasHeader: true, separatorChar: ';');
    var testDataView = mlContext.Data.LoadFromTextFile<HeartData>(TestDataPath, hasHeader: true, separatorChar: ';');

    // STEP 2: Concatenate the features and set the training algorithm
    var pipeline = mlContext.Transforms.Concatenate("Features", "Age", "Sex", "Cp", "TrestBps", "Chol", "Fbs", "RestEcg", "Thalac", "Exang", "OldPeak", "Slope", "Ca", "Thal")
        .Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumnName: "Label", featureColumnName: "Features"));

    Console.WriteLine("=============== Training the model ===============");
    ITransformer trainedModel = pipeline.Fit(trainingDataView);
    Console.WriteLine("");
    Console.WriteLine("");
    Console.WriteLine("=============== Finished training the model ===============");
    Console.WriteLine("");
    Console.WriteLine("");

    Console.WriteLine("===== Evaluating Model's accuracy with Test data =====");
    var predictions = trainedModel.Transform(testDataView);

    var metrics = mlContext.BinaryClassification.Evaluate(data: predictions, labelColumnName: "Label", scoreColumnName: "Score");
    Console.WriteLine("");
    Console.WriteLine("");
    Console.WriteLine($"************************************************************");
    Console.WriteLine($"*       Metrics for {trainedModel.ToString()} binary classification model      ");
    Console.WriteLine($"*-----------------------------------------------------------");
    Console.WriteLine($"*       Accuracy: {metrics.Accuracy:P2}");
    Console.WriteLine($"*       Area Under Roc Curve:      {metrics.AreaUnderRocCurve:P2}");
    Console.WriteLine($"*       Area Under PrecisionRecall Curve:  {metrics.AreaUnderPrecisionRecallCurve:P2}");
    Console.WriteLine($"*       F1Score:  {metrics.F1Score:P2}");
    Console.WriteLine($"*       LogLoss:  {metrics.LogLoss:#.##}");
    Console.WriteLine($"*       LogLossReduction:  {metrics.LogLossReduction:#.##}");
    Console.WriteLine($"*       PositivePrecision:  {metrics.PositivePrecision:#.##}");
    Console.WriteLine($"*       PositiveRecall:  {metrics.PositiveRecall:#.##}");
    Console.WriteLine($"*       NegativePrecision:  {metrics.NegativePrecision:#.##}");
    Console.WriteLine($"*       NegativeRecall:  {metrics.NegativeRecall:P2}");
    Console.WriteLine($"************************************************************");
    Console.WriteLine("");
    Console.WriteLine("");

    Console.WriteLine("=============== Saving the model to a file ===============");
    // Console.WriteLine(ModelPath);
    mlContext.Model.Save(trainedModel, trainingDataView.Schema, ModelPath);

    Console.WriteLine("");
    Console.WriteLine("");
    Console.WriteLine("=============== Model Saved ============= ");
}

private static void TestPrediction(MLContext mlContext)
{
    ITransformer trainedModel = mlContext.Model.Load(ModelPath, out var modelInputSchema);

    // Create prediction engine related to the loaded trained model
    var predictionEngine = mlContext.Model.CreatePredictionEngine<HeartData, HeartPrediction>(trainedModel);                   

    foreach (var heartData in HeartSampleData.heartDataList)
    {
        var prediction = predictionEngine.Predict(heartData);

        Console.WriteLine($"=============== Single Prediction  ===============");
        Console.WriteLine($"Age: {heartData.Age} ");
        Console.WriteLine($"Sex: {heartData.Sex} ");
        Console.WriteLine($"Cp: {heartData.Cp} ");
        Console.WriteLine($"TrestBps: {heartData.TrestBps} ");
        Console.WriteLine($"Chol: {heartData.Chol} ");
        Console.WriteLine($"Fbs: {heartData.Fbs} ");
        Console.WriteLine($"RestEcg: {heartData.RestEcg} ");
        Console.WriteLine($"Thalac: {heartData.Thalac} ");
        Console.WriteLine($"Exang: {heartData.Exang} ");
        Console.WriteLine($"OldPeak: {heartData.OldPeak} ");
        Console.WriteLine($"Slope: {heartData.Slope} ");
        Console.WriteLine($"Ca: {heartData.Ca} ");
        Console.WriteLine($"Thal: {heartData.Thal} ");
        Console.WriteLine($"Prediction Value: {prediction.Prediction} ");
        Console.WriteLine($"Prediction: {(prediction.Prediction ? "A disease could be present" : "No disease present")} ");
        Console.WriteLine($"Probability: {prediction.Probability} ");
        Console.WriteLine($"==================================================");
        Console.WriteLine("");
        Console.WriteLine("");
    }
}

## Evaluate

In [7]:
var mlContext = new MLContext();

BuildTrainEvaluateAndSaveModel(mlContext);

TestPrediction(mlContext);

Console.WriteLine("=============== End of process, hit any key to finish ===============");

===== Evaluating Model's accuracy with Test data =====


************************************************************
*       Metrics for Microsoft.ML.Data.TransformerChain`1[Microsoft.ML.Data.BinaryPredictionTransformer`1[Microsoft.ML.Calibrators.CalibratedModelParametersBase`2[Microsoft.ML.Trainers.FastTree.FastTreeBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator]]] binary classification model      
*-----------------------------------------------------------
*       Accuracy: 94.74 %
*       Area Under Roc Curve:      96.43 %
*       Area Under PrecisionRecall Curve:  95.48 %
*       F1Score:  92.31 %
*       LogLoss:  .36
*       LogLossReduction:  .63
*       PositivePrecision:  1
*       PositiveRecall:  .86
*       NegativePrecision:  .92
*       NegativeRecall:  100.00 %
************************************************************
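The headline metrics above are all functions of four confusion-matrix counts. The sketch below recomputes them in plain C#. The counts used (6 true positives, 0 false positives, 1 false negative, 12 true negatives) are not printed by the sample; they are one set of counts inferred here because it is consistent with the rounded values shown above, and should be read as an assumption.

```csharp
using System;

// Assumed confusion-matrix counts (inferred from the rounded metrics,
// not reported by the sample itself).
int tp = 6, fp = 0, fn = 1, tn = 12;

// Fraction of all test examples that were classified correctly.
double accuracy = (double)(tp + tn) / (tp + fp + fn + tn);

// Precision and recall for the positive class (disease present).
double positivePrecision = (double)tp / (tp + fp);
double positiveRecall = (double)tp / (tp + fn);

// Precision and recall for the negative class (disease absent).
double negativePrecision = (double)tn / (tn + fn);
double negativeRecall = (double)tn / (tn + fp);

// F1 is the harmonic mean of positive precision and positive recall.
double f1Score = 2 * positivePrecision * positiveRecall
                 / (positivePrecision + positiveRecall);

Console.WriteLine($"Accuracy:          {accuracy:F4}");
Console.WriteLine($"PositivePrecision: {positivePrecision:F4}");
Console.WriteLine($"PositiveRecall:    {positiveRecall:F4}");
Console.WriteLine($"NegativePrecision: {negativePrecision:F4}");
Console.WriteLine($"NegativeRecall:    {negativeRecall:F4}");
Console.WriteLine($"F1Score:           {f1Score:F4}");
```

With these counts, accuracy is 18/19 (about 0.9474) and the F1 score is 12/13 (about 0.9231), which agree with the reported 94.74 % and 92.31 %. The area under the ROC curve and the log-loss cannot be recovered this way because they depend on the individual scores, not only on the counts.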




Age: 36 
Sex: 1 
Cp: 4 
TrestBps: 145 
Chol: 210 
Fbs: 0 
RestEcg: 2 
Thalac: 148 
Exang: 1 
OldPeak: 1.9 
Slope: 2 
Ca: 1 
Tha