# COVID-19 Data Analysis using the .NET DataFrame API

## COVID-19
- As per [Wiki](https://en.wikipedia.org/wiki/Coronavirus_disease_2019) **Coronavirus disease 2019** (**COVID-19**) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in 2019 in Wuhan, the capital of China's Hubei province, and has since spread globally, resulting in the ongoing 2019–20 coronavirus pandemic.
- The virus has caused a pandemic that has spread to and affected most nations.
- The purpose of this notebook is to visualize trends in the spread of the virus across countries and to explore features of the ML.NET ecosystem such as DataFrame.

### Acknowledgement
- [Johns Hopkins CSSE](https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data) for dataset
- [COVID-19 data visualization](https://www.kaggle.com/akshaysb/covid-19-data-visualization) by Akshay Sb

### Dataset

- [2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE - Daily reports](https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports).

### Introduction 

**DataFrame**: DataFrame is a new type introduced in .NET. It is similar to the DataFrame in Python, which is used to manipulate data in notebooks. It's a collection of columns containing data, similar to a table, and is very helpful for analyzing tabular data. It works without creating types/classes mapped to the columns of a table, which we used to do with ML.NET. It supports GroupBy, Sort and Filter, which makes analysis very handy. It's an in-memory representation of structured data.

For an overview, please refer to the links below
- [An Introduction to DataFrame](https://devblogs.microsoft.com/dotnet/an-introduction-to-dataframe/)
- [Exploring the C# Dataframe API](https://www.youtube.com/watch?v=FI3VxXClJ7Y) by Jon Wood

### Summary

Below is a summary of the steps we'll be performing

1. Define application level items
    - Nuget packages
    - Namespaces
    - Constants
     
2. Utility Functions
    - Formatters    

3. Load Dataset
    - Download Dataset from [Johns Hopkins CSSE](https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data)
    - Load dataset in DataFrame
    
4. Analyse Data
    - Date Range
    - Display Dataset - display(dataframe)
    - Display Top 5 Rows - dataframe.Head(5)
    - Display Random 6 Rows - dataframe.Sample(6)    
    - Display Dataset Statistics - dataframe.Description()
    - Display Dataset type information - dataframe.Info()

5. Data Cleaning
    - Remove Invalid cases

6. Data Visualization
    - Global
        - Confirmed Vs Deaths Vs Recovered
        - Top 5 Countries with Confirmed cases
        - Top 5 Countries with Death cases
        - Top 5 Countries with Recovered cases
        - Number of Confirmed cases over Time
        - Number of Deaths over Time
        - Number of Recovered cases over Time
    - India
        - Confirmed Vs Deaths Vs Recovered
        
**Note**: Graphs/plots may not render on GitHub for security reasons; however, they will render if you run this notebook locally or on Binder.

### 1. Define Application wide Items

#### Nuget Packages


In [1]:
// ML.NET Nuget packages installation
#r "nuget:Microsoft.ML"
#r "nuget:Microsoft.Data.Analysis"

// Install XPlot package
#r "nuget:XPlot.Plotly"
    
// CSV Helper Package for reading CSV
#r "nuget:CsvHelper"

#### Namespaces

In [2]:
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.Data.Analysis;
using Microsoft.AspNetCore.Html;
using System.IO;
using System.Net.Http;
using System.Globalization;
using CsvHelper;
using CsvHelper.Configuration;
using XPlot.Plotly;

#### Constants

In [3]:
// Column Names
const string FIPS = "FIPS";
const string ADMIN = "Admin2";
const string STATE = "Province_State";
const string COUNTRY = "Country_Region";
const string LAST_UPDATE = "Last_Update";
const string LATITUDE = "Lat";
const string LONGITUDE = "Long_";
const string CONFIRMED = "Confirmed";
const string DEATHS = "Deaths";
const string RECOVERED = "Recovered";
const string ACTIVE = "Active";
const string COMBINED_KEY = "Combined_Key";

// File
const string DATASET_FILE = "04-01-2020";
const string FILE_EXTENSION = ".csv";
const string NEW_FILE_SUFFIX = "_new";
const char SEPARATOR = ',';
const char SEPARATOR_REPLACEMENT = '_';
const string DATASET_GITHUB_DIRECTORY = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/";

// DataFrame/Table
const int TOP_COUNT = 5;
const int DEFAULT_ROW_COUNT = 10;
const string VALUES = "Values";
const string INDIA = "India";

### 2. Utility Functions

#### Formatters

By default, a DataFrame is not displayed as a table. To render it as one, we need to register a custom formatter, as shown in the next cell.

In [4]:
Formatter<DataFrame>.Register((df, writer) =>
{
    var headers = new List<IHtmlContent>();
    headers.Add(th(i("index")));
    headers.AddRange(df.Columns.Select(c => (IHtmlContent) th(c.Name)));
    var rows = new List<List<IHtmlContent>>();
    var take = DEFAULT_ROW_COUNT;
    for (var i = 0; i < Math.Min(take, df.Rows.Count); i++)
    {
        var cells = new List<IHtmlContent>();
        cells.Add(td(i));
        foreach (var obj in df.Rows[i])
        {
            cells.Add(td(obj));
        }
        rows.Add(cells);
    }

    var t = table(
        thead(
            headers),
        tbody(
            rows.Select(
                r => tr(r))));

    writer.Write(t);
}, "text/html");

#### Copy dataset csv and replace Separator in cells

In [5]:
private void CreateCsvAndReplaceSeparatorInCells(string inputFile, string outputFile, char separator, char separatorReplacement)
{
    var culture = CultureInfo.InvariantCulture;
    using var reader = new StreamReader(inputFile);
    using var csvIn = new CsvReader(reader, new CsvConfiguration(culture));
    using var recordsIn = new CsvDataReader(csvIn);
    using var writer = new StreamWriter(outputFile);
    using var outCsv = new CsvWriter(writer, culture);

    // Write Header
    csvIn.ReadHeader();
    var headers = csvIn.Context.HeaderRecord;
    foreach (var header in headers)
    {
        outCsv.WriteField(header.Replace(separator, separatorReplacement));
    }
    outCsv.NextRecord();

    // Write rows
    while (recordsIn.Read())
    {
        var columns = recordsIn.FieldCount;
        for (var index = 0; index < columns; index++)
        {
            var cellValue = recordsIn.GetString(index);
            outCsv.WriteField(cellValue.Replace(separator, separatorReplacement));
        }
        outCsv.NextRecord();
    }
}

### 3. Load Dataset

#### Download Dataset from [Johns Hopkins CSSE](https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data)

We'll be using the COVID-19 dataset from [Johns Hopkins CSSE](https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data). The **csse_covid_19_daily_reports** directory has a .csv file for each day, and we'll analyze the latest file available. The latest file at the time this notebook was last modified was **04-01-2020.csv**. If you wish to use a different file, update the **DATASET_FILE** constant in the Constants cell above.

We'll download the file to the current directory.

In [6]:
var originalFileName = $"{DATASET_FILE}{FILE_EXTENSION}";
if (!File.Exists(originalFileName))
{
    // DATASET_GITHUB_DIRECTORY already ends with '/', so no extra separator is added
    var remoteFilePath = $"{DATASET_GITHUB_DIRECTORY}{originalFileName}";
    display(remoteFilePath);
    using var httpClient = new HttpClient();
    var contents = await httpClient.GetStringAsync(remoteFilePath);

    File.WriteAllText(originalFileName, contents);
}

https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/04-01-2020.csv

#### Load dataset in DataFrame

**Issue**: We can load a CSV file using the LoadCsv(..) method of DataFrame. However, there is an [issue](https://github.com/dotnet/corefxlab/issues/2787): it does not allow quotes or the separator (a comma in this case) inside a cell value.
The dataset we are using has both of them, and LoadCsv fails on it.
As a workaround, we'll use CsvHelper to read the CSV file, replace the comma separator inside cells with an underscore, save the result as a new file, and load that file with the DataFrame LoadCsv(..) method.
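
To see why embedded separators are a problem, here is a minimal, package-free sketch. It is not the notebook's CsvHelper-based implementation: SplitCsvLine is a hypothetical helper written only for this illustration, and the sample line is modeled on a row of the dataset. It shows a naive split breaking a quoted field apart, and the comma replacement that the workaround applies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of CsvHelper or DataFrame): splits one CSV line,
// treating commas inside double quotes as part of the field.
// Limitation: escaped quotes ("") and multi-line fields are not handled.
static List<string> SplitCsvLine(string csvLine)
{
    var result = new List<string>();
    var buffer = new StringBuilder();
    var inQuotes = false;
    foreach (var ch in csvLine)
    {
        if (ch == '"')
        {
            inQuotes = !inQuotes;            // entering or leaving a quoted field
        }
        else if (ch == ',' && !inQuotes)
        {
            result.Add(buffer.ToString());   // an unquoted comma ends the field
            buffer.Clear();
        }
        else
        {
            buffer.Append(ch);
        }
    }
    result.Add(buffer.ToString());           // the last field has no trailing comma
    return result;
}

// Sample line modeled on the dataset: the last field contains commas.
var line = "45001,Abbeville,South Carolina,US,\"Abbeville, South Carolina, US\"";

var naiveParts = line.Split(',');            // breaks the quoted field apart
var fields = SplitCsvLine(line);             // keeps the quoted field whole

// Replace commas inside cells, as the workaround does.
var sanitized = string.Join(",", fields.Select(f => f.Replace(',', '_')));

Console.WriteLine(naiveParts.Length);        // 7
Console.WriteLine(fields.Count);             // 5
Console.WriteLine(sanitized);
// 45001,Abbeville,South Carolina,US,Abbeville_ South Carolina_ US
```

For real files, a full CSV parser such as CsvHelper remains the safer choice, since it also handles escaped quotes and fields that span lines.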

In [7]:
// Load and create a copy of dataset file
var newFileName = $"{DATASET_FILE}{NEW_FILE_SUFFIX}{FILE_EXTENSION}";
display(newFileName);
CreateCsvAndReplaceSeparatorInCells(originalFileName, newFileName, SEPARATOR, SEPARATOR_REPLACEMENT);

04-01-2020_new.csv

In [8]:
var covid19Dataframe = DataFrame.LoadCsv(newFileName);

### 4. Data Analysis

Data analysis is a critical activity in data science. It helps uncover attributes of a dataset that can't be seen or predicted by simply looking at the data source. DataFrame makes the analysis simple by providing APIs such as GroupBy, Sort and Filter. A Jupyter notebook is a great tool for this kind of activity because it keeps the values of variables computed in one cell and makes them available to other cells.

##### Finding the date range of the records in the dataset

In a DataFrame, the Columns property gives access to the values in a column by column name. We'll use the Last_Update column to get the dates and sort them to find the start and end date.

In [9]:
var dateRangeDataFrame = covid19Dataframe.Columns[LAST_UPDATE].ValueCounts();
var dataRange = dateRangeDataFrame.Columns[VALUES].Sort();
var lastElementIndex = dataRange.Length - 1;

var startDate = DateTime.Parse(dataRange[0].ToString()).ToShortDateString();
var endDate  = DateTime.Parse(dataRange[lastElementIndex].ToString()).ToShortDateString(); // Last Element

display(h4($"The data is between {startDate} and {endDate}"));
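
The cell above uses DateTime.Parse and ToShortDateString, both of which depend on the current culture. As a hedged alternative sketch in plain LINQ (no DataFrame involved), the timestamps can be parsed with an explicit format and the invariant culture before taking the minimum and maximum. The format string is an assumption based on the Last_Update values displayed in this notebook, and the three sample values are copied from its output tables.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Sample Last_Update values as they appear in this notebook's output.
var lastUpdates = new[]
{
    "2020-04-01 21:58:49",
    "2020-03-24 04:29:15",
    "2020-04-01 21:58:34",
};

// Assumed timestamp format; parsing fails loudly if a value does not match it.
const string timestampFormat = "yyyy-MM-dd HH:mm:ss";

var parsed = lastUpdates
    .Select(s => DateTime.ParseExact(s, timestampFormat, CultureInfo.InvariantCulture))
    .ToArray();

// Format with the invariant culture so the result is the same on every machine.
var rangeStart = parsed.Min().ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
var rangeEnd = parsed.Max().ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

Console.WriteLine($"The data is between {rangeStart} and {rangeEnd}");
// The data is between 2020-03-24 and 2020-04-01
```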

##### Display 10 records

Here we have 12 columns, which include Country, State, Confirmed, Deaths, Recovered and Active cases. The formatter registered earlier limits the display to the first 10 rows (DEFAULT_ROW_COUNT).

In [10]:
display(covid19Dataframe)

| index | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 45001 | Abbeville | South Carolina | US | 2020-04-01 21:58:49 | 34.223335 | -82.46171 | 4 | 0 | 0 | 0 | Abbeville_ South Carolina_ US |
| 1 | 22001 | Acadia | Louisiana | US | 2020-04-01 21:58:49 | 30.295065 | -92.4142 | 47 | 1 | 0 | 0 | Acadia_ Louisiana_ US |
| 2 | 51001 | Accomack | Virginia | US | 2020-04-01 21:58:49 | 37.76707 | -75.63235 | 7 | 0 | 0 | 0 | Accomack_ Virginia_ US |
| 3 | 16001 | Ada | Idaho | US | 2020-04-01 21:58:49 | 43.452656 | -116.241554 | 195 | 3 | 0 | 0 | Ada_ Idaho_ US |
| 4 | 19001 | Adair | Iowa | US | 2020-04-01 21:58:49 | 41.330757 | -94.47106 | 1 | 0 | 0 | 0 | Adair_ Iowa_ US |
| 5 | 29001 | Adair | Missouri | US | 2020-04-01 21:58:49 | 40.190586 | -92.600784 | 3 | 0 | 0 | 0 | Adair_ Missouri_ US |
| 6 | 40001 | Adair | Oklahoma | US | 2020-04-01 21:58:49 | 35.88494 | -94.65859 | 8 | 0 | 0 | 0 | Adair_ Oklahoma_ US |
| 7 | 8001 | Adams | Colorado | US | 2020-04-01 21:58:49 | 39.87432 | -104.33626 | 181 | 2 | 0 | 0 | Adams_ Colorado_ US |
| 8 | 17001 | Adams | Illinois | US | 2020-04-01 21:58:49 | 39.988155 | -91.18787 | 2 | 0 | 0 | 0 | Adams_ Illinois_ US |
| 9 | 18001 | Adams | Indiana | US | 2020-04-01 21:58:49 | 40.745766 | -84.936714 | 1 | 0 | 0 | 0 | Adams_ Indiana_ US |


##### Display Top 5 records

In [11]:
covid19Dataframe.Head(5)

| index | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 45001 | Abbeville | South Carolina | US | 2020-04-01 21:58:49 | 34.223335 | -82.46171 | 4 | 0 | 0 | 0 | Abbeville_ South Carolina_ US |
| 1 | 22001 | Acadia | Louisiana | US | 2020-04-01 21:58:49 | 30.295065 | -92.4142 | 47 | 1 | 0 | 0 | Acadia_ Louisiana_ US |
| 2 | 51001 | Accomack | Virginia | US | 2020-04-01 21:58:49 | 37.76707 | -75.63235 | 7 | 0 | 0 | 0 | Accomack_ Virginia_ US |
| 3 | 16001 | Ada | Idaho | US | 2020-04-01 21:58:49 | 43.452656 | -116.241554 | 195 | 3 | 0 | 0 | Ada_ Idaho_ US |
| 4 | 19001 | Adair | Iowa | US | 2020-04-01 21:58:49 | 41.330757 | -94.47106 | 1 | 0 | 0 | 0 | Adair_ Iowa_ US |


##### Display Random 6 records

In [12]:
covid19Dataframe.Sample(6)

| index | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | \<null> | | | Bhutan | 2020-04-01 21:58:34 | 27.5142 | 90.4336 | 4 | 0 | 0 | 4 | Bhutan |
| 1 | 55003 | Ashland | Wisconsin | US | 2020-04-01 21:58:49 | 46.31957 | -90.67837 | 1 | 0 | 0 | 0 | Ashland_ Wisconsin_ US |
| 2 | 34007 | Camden | New Jersey | US | 2020-04-01 21:58:49 | 39.803436 | -74.96389 | 289 | 3 | 0 | 0 | Camden_ New Jersey_ US |
| 3 | 46057 | Hamlin | South Dakota | US | 2020-04-01 21:58:49 | 44.67386 | -97.18829 | 1 | 0 | 0 | 0 | Hamlin_ South Dakota_ US |
| 4 | \<null> | | Puerto Rico | US | 2020-04-01 21:58:49 | 18.2208 | -66.5901 | 286 | 11 | 0 | 0 | Puerto Rico_ US |
| 5 | 45053 | Jasper | South Carolina | US | 2020-04-01 21:58:49 | 32.43172 | -81.02487 | 4 | 0 | 0 | 0 | Jasper_ South Carolina_ US |


##### Display Dataset Statistics such as Count, Max, Min, Mean of items in a column

In [13]:
covid19Dataframe.Description()

| index | Description | FIPS | Lat | Long_ | Confirmed | Deaths | Recovered | Active |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Length (excluding null values) | 2171.0 | 2484.0 | 2484.0 | 2485.0 | 2485.0 | 2485.0 | 2485.0 |
| 1 | Max | 99999.0 | 71.7069 | 178.065 | 110574.0 | 13155.0 | 63326.0 | 80572.0 |
| 2 | Min | 0.0 | -42.8821 | -159.59668 | 0.0 | 0.0 | 0.0 | -6.0 |
| 3 | Mean | 26224.918 | 35.639614 | -77.22377 | 375.29376 | 18.83662 | 77.73722 | 194.90987 |


##### Display Dataset type information for each column

In [14]:
covid19Dataframe.Info()

| index | Info | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | DataType | System.Single | System.String | System.String | System.String | System.String | System.Single | System.Single | System.Single | System.Single | System.Single | System.Single | System.String |
| 1 | Length (excluding null values) | 2171 | 2485 | 2485 | 2485 | 2485 | 2484 | 2484 | 2485 | 2485 | 2485 | 2485 | 2485 |


### 5. Data Cleaning

Data cleaning is another important activity, in which we remove irrelevant data from our dataset. Data can be irrelevant because of missing values, invalid values or outliers. Columns of low significance are removed for better analysis and prediction. To keep this notebook simple, we'll use just one technique: removing invalid Active cases, that is, rows with negative values. We could also use DataFrame APIs such as DropNulls to remove rows with null values, or FillNulls to replace null values with another value such as the mean. We can also transform the DataFrame and remove unnecessary columns.
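
As a rough illustration of the two strategies for missing values mentioned above (dropping them versus filling them), here is a sketch using plain LINQ on nullable values rather than the DataFrame API, so it runs without extra packages. The latitude values are made up for the example.

```csharp
using System;
using System.Linq;

// Made-up sample: one latitude is missing.
float?[] latitudes = { 34.0f, null, 38.0f };

// Strategy 1: drop the missing entries.
float[] dropped = latitudes
    .Where(v => v.HasValue)
    .Select(v => v.Value)
    .ToArray();

// Strategy 2: fill the missing entries with the mean of the known values.
float mean = dropped.Average();
float[] filled = latitudes
    .Select(v => v ?? mean)
    .ToArray();

Console.WriteLine($"Dropped: {dropped.Length} values, filled: {filled.Length} values");
```

Dropping shrinks the dataset, while filling keeps every row at the cost of introducing estimated values; which one is appropriate depends on the analysis.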

#### Remove invalid Active cases

To check for invalid active cases, we'll use the DataFrame Filter method to retrieve the rows whose Active value is less than 0.0.

In [15]:
PrimitiveDataFrameColumn<bool> invalidActiveFilter = covid19Dataframe.Columns[ACTIVE].ElementwiseLessThan(0.0);
var invalidActiveDataFrame = covid19Dataframe.Filter(invalidActiveFilter);
display(invalidActiveDataFrame)

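| index | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | \<null> | | Hainan | China | 2020-03-24 04:29:15 | 19.1959 | 109.7453 | 168 | 6 | 168 | -6 | Hainan_ China |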


**From the table above, we can see a row with 168 confirmed cases, 168 recovered cases and 6 deaths, leaving an Active value of -6, which seems invalid.**
To remove it, we'll filter the DataFrame to keep only the rows whose Active value is greater than or equal to 0.0. Let's do that in the next step.

In [16]:
PrimitiveDataFrameColumn<bool> activeFilter = covid19Dataframe.Columns[ACTIVE].ElementwiseGreaterThanOrEqual(0.0);
covid19Dataframe = covid19Dataframe.Filter(activeFilter);
display(covid19Dataframe.Description());

| index | Description | FIPS | Lat | Long_ | Confirmed | Deaths | Recovered | Active |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Length (excluding null values) | 2171.0 | 2483.0 | 2483.0 | 2484.0 | 2484.0 | 2484.0 | 2484.0 |
| 1 | Max | 99999.0 | 71.7069 | 178.065 | 110574.0 | 13155.0 | 63326.0 | 80572.0 |
| 2 | Min | 0.0 | -42.8821 | -159.59668 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Mean | 26235.475 | 35.646233 | -77.29904 | 375.37723 | 18.841787 | 77.70088 | 194.99074 |


**We have removed the cases with a negative Active value; this can be seen in the table above, where the minimum value for Active cases is now zero.**
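
A quick sanity check on why the Hainan row was flagged: in the Hainan, India and Bhutan rows displayed in this notebook, Active equals Confirmed minus Deaths minus Recovered. Treating that relationship as an assumption (the US county rows shown earlier report Active as 0, so it does not hold everywhere in this file), the arithmetic can be sketched in plain C#:

```csharp
using System;
using System.Linq;

// Values copied from rows displayed in this notebook.
var rows = new[]
{
    (Key: "Hainan_ China", Confirmed: 168, Deaths: 6, Recovered: 168),
    (Key: "India", Confirmed: 1998, Deaths: 58, Recovered: 148),
    (Key: "Bhutan", Confirmed: 4, Deaths: 0, Recovered: 0),
};

// Assumed relationship: Active = Confirmed - Deaths - Recovered
var computed = rows
    .Select(r => (r.Key, Active: r.Confirmed - r.Deaths - r.Recovered))
    .ToList();

// Rows whose computed Active value is negative are treated as invalid.
var invalidKeys = computed
    .Where(c => c.Active < 0)
    .Select(c => c.Key)
    .ToList();

foreach (var entry in computed)
{
    Console.WriteLine($"{entry.Key}: Active = {entry.Active}");
}
Console.WriteLine($"Invalid rows: {string.Join("; ", invalidKeys)}");
```

With these inputs, only the Hainan row yields a negative value (168 - 6 - 168 = -6), which matches the row removed by the filter above.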

### 6. Visualization

Visualization of data helps business owners make better decisions. The DataFrame maintains data in a tabular format. To prepare data for the different plots, we use DataFrame features such as Sum, GroupBy, OrderBy and OrderByDescending. For visualization, we use an open-source library called [XPlot.Plotly](https://fslab.org/XPlot/plotly.html). The plots used are Bar, Pie and Line/Scatter graphs.

#### Global

##### Collect Data

In [17]:
var confirmed = covid19Dataframe.Columns[CONFIRMED];
var deaths = covid19Dataframe.Columns[DEATHS];
var recovered = covid19Dataframe.Columns[RECOVERED];

var totalConfirmed = Convert.ToDouble(confirmed.Sum());
var totalDeaths = Convert.ToDouble(deaths.Sum());
var totalRecovered = Convert.ToDouble(recovered.Sum());

##### Confirmed Vs Deaths Vs Recovered cases

In [18]:
display(Chart.Plot(
    new Graph.Pie()
    {
        values = new double[]{totalConfirmed, totalDeaths, totalRecovered},
        labels = new string[] {CONFIRMED, DEATHS, RECOVERED}
    }
));

##### Top 5 Countries with Confirmed cases

In [19]:
// Get the data
var countryConfirmedGroup = covid19Dataframe.GroupBy(COUNTRY).Sum(CONFIRMED).OrderByDescending(CONFIRMED);
var topCountriesColumn = countryConfirmedGroup.Columns[COUNTRY];
var topConfirmedCasesByCountry = countryConfirmedGroup.Columns[CONFIRMED];

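// Lists keep insertion order and duplicates; a HashSet would drop a repeated
// count (two countries with equal totals) and misalign labels and values.
List<string> countries = new List<string>(TOP_COUNT);
List<long> confirmedCases = new List<long>(TOP_COUNT);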
for(int index = 0; index < TOP_COUNT; index++)
{
    countries.Add(topCountriesColumn[index].ToString());
    confirmedCases.Add(Convert.ToInt64(topConfirmedCasesByCountry[index]));
}
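
When labels and values are collected into two parallel collections for a bar chart, both collections need to preserve order and keep duplicates. The following sketch, with made-up totals, shows how a HashSet loses a value when two countries tie, while a List keeps one value per country:

```csharp
using System;
using System.Collections.Generic;

// Made-up totals with a tie between two countries.
var totals = new (string Country, long Cases)[]
{
    ("A", 500),
    ("B", 300),
    ("C", 300),
};

var asSet = new HashSet<long>();
var asList = new List<long>();
foreach (var entry in totals)
{
    asSet.Add(entry.Cases);    // the second 300 is silently dropped
    asList.Add(entry.Cases);   // keeps every value, in insertion order
}

Console.WriteLine(asSet.Count);    // 2: no longer matches the 3 country labels
Console.WriteLine(asList.Count);   // 3
```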

In [20]:
var title = "Top 5 Countries : Confirmed";
var series1 = new Graph.Bar{
        x = countries.ToArray(),
        y = confirmedCases.ToArray()
    };

var chart = Chart.Plot(new []{series1});
chart.WithTitle(title);
display(chart);

##### Top 5 Countries with Deaths

In [21]:
// Get the data
var countryDeathsGroup = covid19Dataframe.GroupBy(COUNTRY).Sum(DEATHS).OrderByDescending(DEATHS);
var topCountriesColumn = countryDeathsGroup.Columns[COUNTRY];
var topDeathCasesByCountry = countryDeathsGroup.Columns[DEATHS];

// Lists keep order and duplicates, so labels and values stay aligned
List<string> countries = new List<string>(TOP_COUNT);
List<long> deathCases = new List<long>(TOP_COUNT);
for(int index = 0; index < TOP_COUNT; index++)
{
    countries.Add(topCountriesColumn[index].ToString());
    deathCases.Add(Convert.ToInt64(topDeathCasesByCountry[index]));
}

In [22]:
var title = "Top 5 Countries : Deaths";
var series1 = new Graph.Bar{
        x = countries.ToArray(),
        y = deathCases.ToArray()
    };

var chart = Chart.Plot(new []{series1});
chart.WithTitle(title);
display(chart);

##### Top 5 Countries with Recovered cases

In [23]:
// Get the data
var countryRecoveredGroup = covid19Dataframe.GroupBy(COUNTRY).Sum(RECOVERED).OrderByDescending(RECOVERED);
var topCountriesColumn = countryRecoveredGroup.Columns[COUNTRY];
var topRecoveredCasesByCountry = countryRecoveredGroup.Columns[RECOVERED];

// Lists keep order and duplicates, so labels and values stay aligned
List<string> countries = new List<string>(TOP_COUNT);
List<long> recoveredCases = new List<long>(TOP_COUNT);
for(int index = 0; index < TOP_COUNT; index++)
{
    countries.Add(topCountriesColumn[index].ToString());
    recoveredCases.Add(Convert.ToInt64(topRecoveredCasesByCountry[index]));
}

In [24]:
var title = "Top 5 Countries : Recovered";
var series1 = new Graph.Bar{
        x = countries.ToArray(),
        y = recoveredCases.ToArray()
    };

var chart = Chart.Plot(new []{series1});
chart.WithTitle(title);
display(chart);

##### Number of Confirmed cases over Time

In [25]:
var confirmedOverTimeGroup = covid19Dataframe.GroupBy(LAST_UPDATE).Sum(CONFIRMED).OrderBy(LAST_UPDATE);
var confirmedColumn = confirmedOverTimeGroup.Columns[CONFIRMED];
var timeSeriesColumn = confirmedOverTimeGroup.Columns[LAST_UPDATE];

var count = confirmedOverTimeGroup.Rows.Count;

List<string> timeSeriesConfirmed = new List<string>();
List<long> confirmedSeries = new List<long>();
for(int index = 0; index < count; index++)
{
    var time = timeSeriesColumn[index].ToString();
    var confirmedCount = Convert.ToInt64(confirmedColumn[index]);

    // display($"Index: {index}, Time: {time}, Confirmed: {confirmedCount}");

    timeSeriesConfirmed.Add(time);
    confirmedSeries.Add(confirmedCount);
}

In [26]:
var title = "Number of Confirmed Cases over Time";
var confirmedTimeGraph = new Graph.Scattergl()
    {
        x = timeSeriesConfirmed.ToArray(),
        y = confirmedSeries.ToArray(),
        mode = "lines+markers"
    };

var chart = Chart.Plot(confirmedTimeGraph);
chart.WithTitle(title);
display(chart);

###### Looking at the plot above, it seems confirmed cases have shot up in the last week of March.

##### Number of Deaths over Time

In [27]:
var deathsOverTimeGroup = covid19Dataframe.GroupBy(LAST_UPDATE).Sum(DEATHS).OrderBy(LAST_UPDATE);
var deathsColumn = deathsOverTimeGroup.Columns[DEATHS];
var timeSeriesColumn = deathsOverTimeGroup.Columns[LAST_UPDATE];

var count = deathsOverTimeGroup.Rows.Count;

List<string> timeSeries = new List<string>();
List<long> deathSeries = new List<long>();
for(int index = 0; index < count; index++)
{
    var time = timeSeriesColumn[index].ToString();
    var death = Convert.ToInt64(deathsColumn[index]);

    // display($"Index: {index}, Time: {time}, Deaths: {death}");

    timeSeries.Add(time);
    deathSeries.Add(death);
}

In [28]:

var title = "Number of Deaths over Time";
var deathTimeGraph = new Graph.Scattergl()
    {
        x = timeSeries.ToArray(),
        y = deathSeries.ToArray(),
        mode = "lines+markers"
    };

var chart = Chart.Plot(deathTimeGraph);
chart.WithTitle(title);
display(chart);

##### Number of Recovered cases over Time

In [29]:
var recoveredOverTimeGroup = covid19Dataframe.GroupBy(LAST_UPDATE).Sum(RECOVERED).OrderBy(LAST_UPDATE);
var recoveredColumn = recoveredOverTimeGroup.Columns[RECOVERED];
var timeSeriesColumn = recoveredOverTimeGroup.Columns[LAST_UPDATE];

var count = recoveredOverTimeGroup.Rows.Count;

List<string> timeSeries = new List<string>();
List<long> recoveredSeries = new List<long>();
for(int index = 0; index < count; index++)
{
    var time = timeSeriesColumn[index].ToString();
    var recoveredCount = Convert.ToInt64(recoveredColumn[index]);

    // display($"Index: {index}, Time: {time}, Recovered: {recoveredCount}");

    timeSeries.Add(time);
    recoveredSeries.Add(recoveredCount);
}

In [30]:
var title = "Number of Recovered cases over Time";
var recoveredTimegraph = new Graph.Scattergl()
    {
        x = timeSeries.ToArray(),
        y = recoveredSeries.ToArray(),
        mode = "lines+markers"
    };

var chart = Chart.Plot(recoveredTimegraph);
chart.WithTitle(title);
display(chart);

#### India

##### Collect Data

In [31]:
PrimitiveDataFrameColumn<bool> indiaFilter = covid19Dataframe.Columns[COUNTRY].ElementwiseEquals(INDIA);
var indiaDataFrame = covid19Dataframe.Filter(indiaFilter);
display(indiaDataFrame.Head((int)indiaDataFrame.Rows.Count));
            
var indiaConfirmed = indiaDataFrame.Columns[CONFIRMED];
var indiaDeaths = indiaDataFrame.Columns[DEATHS];
var indiaRecovered = indiaDataFrame.Columns[RECOVERED];

var indiaTotalConfirmed = Convert.ToDouble(indiaConfirmed.Sum());
var indiaTotalDeaths = Convert.ToDouble(indiaDeaths.Sum());
var indiaTotalRecovered = Convert.ToDouble(indiaRecovered.Sum());

| index | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | \<null> | | | India | 2020-04-01 21:58:34 | 20.593683 | 78.96288 | 1998 | 58 | 148 | 1792 | India |

##### Confirmed Vs Deaths Vs Recovered cases

In [32]:
display(Chart.Plot(
    new Graph.Pie()
    {
        values = new double[]{indiaTotalConfirmed, indiaTotalDeaths, indiaTotalRecovered},
        labels = new string[] {CONFIRMED, DEATHS, RECOVERED}
    }
));

## Conclusion

I hope you enjoyed reading this notebook and got some idea of the powerful ML.NET framework and the various features of DataFrame. ML.NET is a fast-emerging framework for .NET developers that abstracts away much of the complexity of data science and machine learning. The focus of this notebook is data analysis; it contains nothing from a machine learning perspective, such as making a prediction. I plan to write about applying machine learning with ML.NET to this dataset, where we'll explore IDataView, models, training etc.

Feedback and suggestions are welcome. Please reach out to me through the channels below.

**Contact**

**Email    :** praveenraghuvanshi@gmail.com  
**LinkedIn :** https://in.linkedin.com/in/praveenraghuvanshi  
**Github   :** https://github.com/praveenraghuvanshi1512  
**Twitter  :** @praveenraghuvan



## References
- [Using ML.NET in Jupyter notebooks](https://devblogs.microsoft.com/cesardelatorre/using-ml-net-in-jupyter-notebooks/)
- [An Introduction to DataFrame](https://devblogs.microsoft.com/dotnet/an-introduction-to-dataframe/)
- [DataFrame - Sample](https://github.com/dotnet/interactive/blob/master/NotebookExamples/csharp/Samples/HousingML.ipynb)
- [Getting started with ML.NET in Jupyter Notebooks](https://xamlbrewer.wordpress.com/2020/02/20/getting-started-with-ml-net-in-jupyter-notebooks/)
- [Tips and tricks for C# Jupyter notebook](https://ewinnington.github.io/posts/jupyter-tips-csharp)
- [Jupyter notebooks with C# and R running](https://github.com/ewinnington/noteb)
- [Data analysis using F# and Jupyter notebook — Samuele Resca](https://medium.com/@samueleresca/data-analysis-using-f-and-jupyter-notebook-samuele-resca-66a229e25306)
- [Exploring the C# Dataframe API](https://www.youtube.com/watch?v=FI3VxXClJ7Y) by Jon Wood

#  ******************** Be Safe **********************