![BAE logo](images/bae_logo.png)

# Hands-On Exercise 14.1: Analyze Data

## Objective

With LINQ to Objects, C# becomes a viable tool for data analytics, even if it can't quite rival Python, R, or even F# (a fellow .NET language) in this area.

This exercise demonstrates these capabilities by using C# to analyze a dataset of English Premier League matches (including their betting prices) from the 2020/2021 season.

The data source will be the (flattened) JSON document created in the earlier XML to JSON conversion exercise.

#### Create the new project.

Create a .NET 6.0 console project called `Football`. The new console template uses top-level statements, so `Program.cs` contains a single `Console.WriteLine` call rather than a `Program` class with a `Main` method. You can delete that line.

This "scripting" approach is actually a pretty good fit for data analysis work, which tends to be more iterative and not as focused on creating maintainable code.

#### Reference namespaces.

In [None]:
using System.IO;
using System.Linq;
using System.Text.Json;

#### Load the dataset.

Create a data model/class (`Match`) that represents a football match. This class was created in the earlier exercise and is defined as follows.

<font color="red">**With top-level statements, type declarations must come _after_ the top-level code, so define classes at the _end_ of `Program.cs`.**</font>

In [None]:
public class Match
{
    public int Id { get; set; }
    public DateTime Kickoff { get; set; }
    public string HomeTeam { get; set; } = string.Empty;
    public string AwayTeam { get; set; } = string.Empty;
    public int HomeTeamGoals { get; set; }
    public int AwayTeamGoals { get; set; }
    public int HalfTimeHomeTeamGoals { get; set; }
    public int HalfTimeAwayTeamGoals { get; set; }
    public decimal HomeWinPrice { get; set; }
    public decimal DrawPrice { get; set; }
    public decimal AwayWinPrice { get; set; }
}

Load (deserialize) the `epl_2020_2021.json` JSON document (in `C:\Course\510D\Data\`) as a list of `Match` objects.

Remember the property names in the JSON document are camel-cased. Consider using a null-forgiving operator to reassure the compiler that we definitely have matches in the document.

#### Answer...

In [None]:
// Change to C:\Course\510D\Data\epl_2020_2021.json for exercise 
string json = File.ReadAllText(@"data/epl_2020_2021.json");

var serializerOptions = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
};

var matches = JsonSerializer.Deserialize<IEnumerable<Match>>(json, serializerOptions)!.ToList();
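The naming policy and the `!` operator are easy to take on trust, so here is a small standalone sketch (C# 10 top-level statements, independent of the dataset) that prints the JSON names `JsonNamingPolicy.CamelCase` expects for a few `Match` properties and shows that `Deserialize` really can return `null`. The `?? throw` variant is one possible alternative to `!`, not part of the exercise answer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// The camel-case policy maps each C# property name to the JSON name it expects.
JsonNamingPolicy policy = JsonNamingPolicy.CamelCase;
foreach (string name in new[] { "HomeTeam", "HomeTeamGoals", "HalfTimeAwayTeamGoals" })
{
    Console.WriteLine($"{name} -> {policy.ConvertName(name)}");
}

// Deserialize returns null when the whole document is the JSON literal null.
// The null-forgiving operator (!) would hide that possibility from the compiler.
List<int>? fromNullDocument = JsonSerializer.Deserialize<List<int>>("null");
Console.WriteLine(fromNullDocument is null ? "Deserialize returned null" : "Deserialize returned a list");

// A runtime check instead of a promise: fail fast with a descriptive message.
List<int> loaded = JsonSerializer.Deserialize<List<int>>("[1, 2, 3]")
    ?? throw new InvalidOperationException("The JSON document contained no data.");
Console.WriteLine($"Loaded {loaded.Count} items");
```

Unlike `!`, which only silences the compiler warning, `?? throw` turns a `null` document into an immediate, descriptive exception.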

#### Calculate the number of wins by team.

Find the five teams that had the most _home_ wins during the season. Order them in descending number of home wins. Store the result in `topHomeTeamWins`.

#### Answer...

In [None]:
var topHomeTeamWins = matches
    .Where(x => x.HomeTeamGoals > x.AwayTeamGoals)
    .GroupBy(x => x.HomeTeam)
    .Select(g => new { Team = g.Key, Wins = g.Count() })
    .OrderByDescending(x => x.Wins)
    .Take(5)
    .ToList();

Display the top five teams by number of home wins.

#### Answer...

In [None]:
foreach (var x in topHomeTeamWins)
{
    Console.WriteLine($"{x.Team} had {x.Wins} home wins");
}
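One subtlety with `Take(5)`: `OrderByDescending` is a stable sort, so teams level on wins at the cut-off keep the order in which their groups were first encountered, which depends on the order of matches in the file. Adding `ThenBy` makes the cut-off repeatable. A sketch on a hypothetical mini-season (team names and scores are made up):

```csharp
using System;
using System.Linq;

// Hypothetical mini-season, not real data.
var results = new[]
{
    (Home: "Reds",   Away: "Blues",  HomeGoals: 2, AwayGoals: 0),
    (Home: "Blues",  Away: "Greens", HomeGoals: 1, AwayGoals: 0),
    (Home: "Greens", Away: "Reds",   HomeGoals: 3, AwayGoals: 1),
    (Home: "Reds",   Away: "Greens", HomeGoals: 0, AwayGoals: 0),
    (Home: "Blues",  Away: "Reds",   HomeGoals: 2, AwayGoals: 1),
};

var topHome = results
    .Where(r => r.HomeGoals > r.AwayGoals)             // keep home wins only
    .GroupBy(r => r.Home)                              // one group per home team
    .Select(g => new { Team = g.Key, Wins = g.Count() })
    .OrderByDescending(x => x.Wins)                    // most wins first
    .ThenBy(x => x.Team)                               // alphabetical among ties, so Take(n) is repeatable
    .Take(2)
    .ToList();

foreach (var t in topHome)
{
    Console.WriteLine($"{t.Team}: {t.Wins}");
}
```

Without the `ThenBy` line this sample would return Blues and Reds, because the Reds group is encountered before the Greens group.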

Find the five teams that had the most _away_ wins during the season. Order them in descending number of away wins. Store the result in `topAwayTeamWins`.

#### Answer...

In [None]:
var topAwayTeamWins = matches
    .Where(x => x.HomeTeamGoals < x.AwayTeamGoals)
    .GroupBy(x => x.AwayTeam)
    .Select(g => new { Team = g.Key, Wins = g.Count() })
    .OrderByDescending(x => x.Wins)
    .Take(5)
    .ToList();

Display the top five teams by number of away wins.

#### Answer...

In [None]:
foreach (var x in topAwayTeamWins)
{
    Console.WriteLine($"{x.Team} had {x.Wins} away wins");
}

Which teams are in _both_ lists?

#### Answer...

In [None]:
foreach (string team in topHomeTeamWins.Select(h => h.Team).Intersect(topAwayTeamWins.Select(a => a.Team)))
{
    Console.WriteLine(team);
}
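`Intersect` is a set operation: it returns each shared item once, in the order it appears in the _first_ sequence. Its counterpart `Except` answers the opposite question. A sketch with hypothetical team names:

```csharp
using System;
using System.Linq;

// Hypothetical top-five lists, not the real results.
string[] topHome = { "Alpha", "Bravo", "Charlie", "Delta", "Echo" };
string[] topAway = { "Echo", "Foxtrot", "Alpha", "Golf", "Hotel" };

// Intersect: teams in both lists, in the order they appear in topHome.
var inBoth = topHome.Intersect(topAway).ToList();      // Alpha, Echo

// Except: teams in the home top five but not in the away top five.
var homeOnly = topHome.Except(topAway).ToList();       // Bravo, Charlie, Delta

Console.WriteLine("In both: " + string.Join(", ", inBoth));
Console.WriteLine("Home only: " + string.Join(", ", homeOnly));
```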

Calculate how many _draws_ there were during the season.

#### Answer...

In [None]:
int numberOfDraws = matches.Count(x => x.HomeTeamGoals == x.AwayTeamGoals);
    
Console.WriteLine($"There were {numberOfDraws} draws");
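`Count` with a predicate answers a single question. To get the full home win/draw/away win split in one pass, one option is to group on a computed outcome. A sketch with hypothetical scores; the `H`/`D`/`A` labels are chosen for illustration:

```csharp
using System;
using System.Linq;

// Hypothetical (home goals, away goals) pairs, not real data.
var scores = new[] { (H: 1, A: 1), (H: 2, A: 0), (H: 0, A: 3), (H: 2, A: 2), (H: 1, A: 0) };

// Classify each match as H (home win), D (draw) or A (away win), then count each outcome.
var outcomes = scores
    .GroupBy(s => s.H > s.A ? "H" : s.H == s.A ? "D" : "A")
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var (outcome, count) in outcomes)
{
    Console.WriteLine($"{outcome}: {count}");
}
```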

#### Analyze the betting prices.

Find the _away_ team (and opponent) that the betting markets considered _least_ likely to win the match during the season.

#### Answer...

In [None]:
var leastFavoredAway = matches
    .OrderByDescending(x => x.AwayWinPrice)
    .First();
 
Console.WriteLine($"{leastFavoredAway.AwayTeam} was the least favored away team, against {leastFavoredAway.HomeTeam} @ {leastFavoredAway.AwayWinPrice}");
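Why does the _highest_ price identify the least likely winner? Assuming the prices in the dataset are decimal odds (an assumption here, though it is consistent with sorting by descending price), the implied probability of an outcome is roughly `1 / price`. The sketch below uses made-up prices for a single match; the three implied probabilities add up to slightly more than 1, and the excess is the bookmaker's margin.

```csharp
using System;

// Hypothetical decimal prices for one match, not taken from the dataset.
decimal homeWinPrice = 1.25m, drawPrice = 6.00m, awayWinPrice = 12.00m;

// Assuming decimal odds, implied probability = 1 / price,
// so the largest price marks the outcome the market considers least likely.
decimal pHome = 1m / homeWinPrice;   // 0.80
decimal pDraw = 1m / drawPrice;      // ~0.167
decimal pAway = 1m / awayWinPrice;   // ~0.083

// The probabilities sum to slightly more than 1; the excess is the bookmaker's margin (overround).
decimal overround = pHome + pDraw + pAway - 1m;

Console.WriteLine($"Home {pHome:P1}, draw {pDraw:P1}, away {pAway:P1}, margin {overround:P1}");
```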

Which match did the betting markets consider most likely to end in a draw during the season?

#### Answer...

In [None]:
var mostLikelyDraw = matches
    .OrderBy(x => x.DrawPrice)
    .First();

Console.WriteLine($"{mostLikelyDraw.HomeTeam} vs {mostLikelyDraw.AwayTeam} was considered a likely draw @ {mostLikelyDraw.DrawPrice}");
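.NET 6 added `MinBy` and `MaxBy` to `System.Linq`. They return the element with the smallest or largest key in a single pass, so they can replace the `OrderBy(...).First()` pattern used in the two previous answers without sorting the whole season. A sketch with hypothetical fixtures and prices:

```csharp
using System;
using System.Linq;

// Hypothetical fixtures with decimal prices, not real data.
var fixtures = new[]
{
    (Home: "Alpha",   Away: "Bravo",   DrawPrice: 3.40m, AwayWinPrice: 2.10m),
    (Home: "Charlie", Away: "Delta",   DrawPrice: 2.90m, AwayWinPrice: 3.60m),
    (Home: "Echo",    Away: "Foxtrot", DrawPrice: 5.50m, AwayWinPrice: 15.00m),
};

// MinBy/MaxBy scan the sequence once and keep the element with the smallest/largest key.
var mostLikelyDraw = fixtures.MinBy(f => f.DrawPrice);
var leastFavoredAway = fixtures.MaxBy(f => f.AwayWinPrice);

Console.WriteLine($"Most likely draw: {mostLikelyDraw.Home} vs {mostLikelyDraw.Away} @ {mostLikelyDraw.DrawPrice}");
Console.WriteLine($"Least favored away team: {leastFavoredAway.Away} @ {leastFavoredAway.AwayWinPrice}");
```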

Were there any _away_ teams that were considered underdogs by the betting markets but won the match?

If so, order them by descending price.

#### Answer...

In [None]:
var underdogs = matches
    .Where(x => x.AwayWinPrice > x.HomeWinPrice && x.AwayTeamGoals > x.HomeTeamGoals)
    .OrderByDescending(x => x.AwayWinPrice)
    .ToList();

foreach (var underdog in underdogs) 
{
    Console.WriteLine($"{underdog.AwayTeam} unexpectedly beat {underdog.HomeTeam} despite being considered a {underdog.AwayWinPrice} underdog");
}

#### Recreate the final league table (i.e. point tally per team) for the season.

Review the [2020/2021 English Premier League final table](https://en.wikipedia.org/wiki/2020%E2%80%9321_Premier_League#League_table) on Wikipedia. The **Pts** column is the most important one because it determines the final positions.

Create a list of anonymous objects containing _home_ teams and the points they won in each match. Store the result in `homeTeamPoints`.

#### Answer...

In [None]:
var homeTeamPoints =
    (from x in matches
    let points = x.HomeTeamGoals > x.AwayTeamGoals ? 3 : (x.HomeTeamGoals == x.AwayTeamGoals ? 1 : 0)
    select new { Team = x.HomeTeam, Points = points }).ToList();

Create a list of anonymous objects containing _away_ teams and the points they won in each match. Store the result in `awayTeamPoints`.

#### Answer...

In [None]:
var awayTeamPoints =
    (from x in matches
    let points = x.AwayTeamGoals > x.HomeTeamGoals ? 3 : (x.AwayTeamGoals == x.HomeTeamGoals ? 1 : 0)
    select new { Team = x.AwayTeam, Points = points }).ToList();
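The home and away queries repeat the same win/draw/loss rule with the goals swapped. One way to avoid the duplication is a small local function; `PointsFor` below is a hypothetical helper, not part of the exercise, shown on made-up scores:

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the exercise): points earned by one side of a match.
static int PointsFor(int goalsFor, int goalsAgainst) =>
    goalsFor > goalsAgainst ? 3 : goalsFor == goalsAgainst ? 1 : 0;

// Hypothetical scores, not real data.
var games = new[]
{
    (Home: "Alpha", Away: "Bravo", HomeGoals: 2, AwayGoals: 1),
    (Home: "Bravo", Away: "Alpha", HomeGoals: 0, AwayGoals: 0),
};

// The same rule serves both sides; only the argument order is swapped.
var homePoints = games
    .Select(g => new { Team = g.Home, Points = PointsFor(g.HomeGoals, g.AwayGoals) })
    .ToList();
var awayPoints = games
    .Select(g => new { Team = g.Away, Points = PointsFor(g.AwayGoals, g.HomeGoals) })
    .ToList();

foreach (var p in homePoints.Concat(awayPoints))
{
    Console.WriteLine($"{p.Team}: {p.Points}");
}
```

Because both lists use the same anonymous type, they can be concatenated and grouped exactly as in the steps that follow.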

Concatenate `homeTeamPoints` and `awayTeamPoints` to create a _combined_ list called `teamPoints`.

#### Answer...

In [None]:
var teamPoints = homeTeamPoints.Concat(awayTeamPoints).ToList();

Group, aggregate and sort `teamPoints` to recreate the final league table. Store the results in `leagueTable`.

#### Answer...

In [None]:
var leagueTable = teamPoints
    .GroupBy(x => x.Team)
    .Select(g => new { Team = g.Key, Points = g.Sum(x => x.Points) })
    .OrderByDescending(x => x.Points)
    .ToList();

Output the contents of `leagueTable` and compare them with the _actual_ league table. Do they match?

#### Answer...

In [None]:
foreach (var team in leagueTable)
{
    Console.WriteLine($"{team.Team}: {team.Points}");
}

Where the final point tally is the same (e.g. Leeds United and Everton), goal difference (**GD** in the Wikipedia table) is used as a tie-breaker. Our `leagueTable` only sums points, so the order of tied teams may differ from the actual table.
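Goal difference can in fact be derived from the `HomeTeamGoals` and `AwayTeamGoals` properties already in the dataset. The sketch below, on a hypothetical three-team season where every team finishes on 3 points, uses `SelectMany` to emit one row per team per match (from that team's point of view), then breaks ties on goal difference and then goals scored, the Premier League's first two tie-breakers.

```csharp
using System;
using System.Linq;

// Hypothetical mini-season, not real data.
var games = new[]
{
    (Home: "Alpha",   Away: "Bravo",   HomeGoals: 3, AwayGoals: 0),
    (Home: "Bravo",   Away: "Charlie", HomeGoals: 1, AwayGoals: 0),
    (Home: "Charlie", Away: "Alpha",   HomeGoals: 2, AwayGoals: 1),
};

// One row per team per match, seen from that team's side.
var rows = games.SelectMany(g => new[]
{
    (Team: g.Home, For: g.HomeGoals, Against: g.AwayGoals),
    (Team: g.Away, For: g.AwayGoals, Against: g.HomeGoals),
});

var table = rows
    .GroupBy(r => r.Team)
    .Select(grp => new
    {
        Team = grp.Key,
        Points = grp.Sum(r => r.For > r.Against ? 3 : r.For == r.Against ? 1 : 0),
        GoalDifference = grp.Sum(r => r.For - r.Against),
        GoalsFor = grp.Sum(r => r.For),
    })
    .OrderByDescending(t => t.Points)
    .ThenByDescending(t => t.GoalDifference)   // first tie-breaker
    .ThenByDescending(t => t.GoalsFor)         // then goals scored
    .ToList();

foreach (var t in table)
{
    Console.WriteLine($"{t.Team}: {t.Points} pts, GD {t.GoalDifference}");
}
```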

Extract and display the teams that qualified for the Champions League (the top four teams in the league automatically qualify).

#### Answer...

In [None]:
foreach (var team in leagueTable.Take(4))
{
    Console.WriteLine($"{team.Team} qualified for the Champions League");
}
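Positional operators work from the bottom of the table too: in the Premier League the bottom three clubs are relegated. Because `leagueTable` is already sorted, `TakeLast` (also part of `System.Linq`) could pick them out. A sketch with hypothetical standings:

```csharp
using System;
using System.Linq;

// Hypothetical final standings, already sorted by points (not real data).
var standings = new[] { "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf" };

// Take(n) reads from the top of the ordered table; TakeLast(n) reads from the bottom.
var topFour = standings.Take(4).ToList();          // Alpha, Bravo, Charlie, Delta
var bottomThree = standings.TakeLast(3).ToList();  // Echo, Foxtrot, Golf

Console.WriteLine("Top four: " + string.Join(", ", topFour));
Console.WriteLine("Bottom three: " + string.Join(", ", bottomThree));
```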

## Congratulations! You have successfully completed the exercise. Continue to the bonus if you have more time.

# Bonus (Optional)

The .NET Interactive notebooks that are used for the manuals are ideal for doing data analysis.

Create a new notebook and use it to analyze **F1** data.

#### Create a new notebook.

Use **File | New | Notebook** to create a new .NET Interactive notebook.

<font color="red">**When prompted to select a kernel, _make sure_ you choose _.NET (C#)_ from the list.**</font>

Right-click on the new notebook's name (`Untitled.ipynb`) and change it to `f1_analysis.ipynb`.

#### Load the F1 dataset.

In the first cell, create a data model/class called `RaceResult` to hold an F1 race result. The class definition should be as follows.

In [None]:
public class RaceResult
{
    public string Country { get; set; }
    public string Circuit { get; set; }
    public string CircuitReference { get; set; }
    public int Year { get; set; }
    public int Round { get; set; }
    public string Driver { get; set; }
    public string DriverCode { get; set; }
    public string DriverNationality { get; set; }
    public string Constructor { get; set; }
    public int GridPosition { get; set; }
    public string FastestLap { get; set; }
    public string Result { get; set; }
    public double Points { get; set; }
}

<font color="red">**Make sure to run the cell whenever you add new code.**</font>

Load (deserialize) the JSON document in `C:\Course\510D\Data\f1_results.json` as a list of `RaceResult` objects.

#### Answer...

In [None]:
using System.IO;
using System.Text.Json;

// Change to C:\Course\510D\Data\f1_results.json for exercise 
string json = File.ReadAllText(@"data/f1_results.json");

var serializerOptions = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
};

// The null-forgiving operator (!) tells the compiler the result will not be null
var raceResults = JsonSerializer.Deserialize<IEnumerable<RaceResult>>(json, serializerOptions)!.ToList();

#### Analyze the F1 data.

Using the data, determine the following.

- Top three drivers in the 2017 season
- Top three constructors in the 2017 season
- Driver with the most pole positions
- Drivers who have never won an F1 race

...or any other statistic that interests you.

#### Find the top three drivers in the 2017 season.

#### Answer...

In [None]:
raceResults
    .Where(x => x.Year == 2017)
    .GroupBy(x => x.Driver)
    .Select(g => new { Driver = g.Key, Points = g.Sum(x => x.Points) })
    .OrderByDescending(x => x.Points)
    .Take(3)
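Filtering on `Year == 2017` answers the question for one season. To find the driver with the most points in every season at once, one option is to group by year and then group each season by driver. A sketch on hypothetical tuples shaped like `RaceResult` (driver names and points are made up):

```csharp
using System;
using System.Linq;

// Hypothetical race results, not from f1_results.json.
var results = new[]
{
    (Year: 2016, Driver: "Driver A", Points: 25.0),
    (Year: 2016, Driver: "Driver B", Points: 18.0),
    (Year: 2016, Driver: "Driver B", Points: 25.0),
    (Year: 2017, Driver: "Driver A", Points: 25.0),
    (Year: 2017, Driver: "Driver C", Points: 18.0),
    (Year: 2017, Driver: "Driver A", Points: 18.0),
};

// Group by season, then total each driver's points within that season and keep the leader.
var champions = results
    .GroupBy(r => r.Year)
    .Select(season => season
        .GroupBy(r => r.Driver)
        .Select(d => new { Year = season.Key, Driver = d.Key, Points = d.Sum(r => r.Points) })
        .OrderByDescending(d => d.Points)
        .First())
    .OrderBy(c => c.Year)
    .ToList();

foreach (var c in champions)
{
    Console.WriteLine($"{c.Year}: {c.Driver} ({c.Points} pts)");
}
```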

#### Find the top three constructors in the 2017 season.

#### Answer...

In [None]:
raceResults
    .Where(x => x.Year == 2017)
    .GroupBy(x => x.Constructor)
    .Select(g => new { Constructor = g.Key, Points = g.Sum(x => x.Points) })
    .OrderByDescending(x => x.Points)
    .Take(3)

#### Find the driver with the most pole positions.

#### Answer...

In [None]:
raceResults
    .Where(x => x.GridPosition == 1)
    .GroupBy(x => x.Driver)
    .Select(g => new { Driver = g.Key, PolePositions = g.Count() })
    .OrderByDescending(x => x.PolePositions)
    .First()
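Pole positions alone don't say how often a driver turned pole into victory. Assuming, as the answers in this bonus do, that `GridPosition == 1` means pole and that a winning `Result` is the string `"1"`, a sketch on hypothetical results:

```csharp
using System;
using System.Linq;

// Hypothetical results shaped like RaceResult, not real data. Result is a string, as in the class above.
var results = new[]
{
    (Driver: "Driver A", GridPosition: 1, Result: "1"),
    (Driver: "Driver A", GridPosition: 1, Result: "3"),
    (Driver: "Driver B", GridPosition: 1, Result: "1"),
    (Driver: "Driver B", GridPosition: 2, Result: "1"),
    (Driver: "Driver A", GridPosition: 1, Result: "1"),
};

// For each driver who started from pole, count the poles and how many of those races they won.
var conversion = results
    .Where(r => r.GridPosition == 1)
    .GroupBy(r => r.Driver)
    .Select(g => new
    {
        Driver = g.Key,
        Poles = g.Count(),
        WinsFromPole = g.Count(r => r.Result == "1"),
    })
    .OrderByDescending(x => x.Poles)
    .ToList();

foreach (var x in conversion)
{
    Console.WriteLine($"{x.Driver}: {x.WinsFromPole} of {x.Poles} poles converted");
}
```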

#### Find drivers who have never won an F1 race.

#### Answer...

In [None]:
raceResults
    .GroupBy(x => x.Driver)
    .Where(g => g.All(x => x.Result != "1"))
    .Select(g => g.Key)
    .OrderBy(x => x)
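Another way to phrase "never won" is as a set difference: all drivers `Except` those with at least one win. A sketch on hypothetical results, again assuming a win is recorded as `Result == "1"`:

```csharp
using System;
using System.Linq;

// Hypothetical results, not real data.
var results = new[]
{
    (Driver: "Driver A", Result: "1"),
    (Driver: "Driver B", Result: "4"),
    (Driver: "Driver C", Result: "2"),
    (Driver: "Driver A", Result: "5"),
    (Driver: "Driver C", Result: "1"),
};

// Every driver in the data, minus every driver with at least one win.
var winners = results.Where(r => r.Result == "1").Select(r => r.Driver);
var neverWon = results
    .Select(r => r.Driver)
    .Except(winners)   // Except is a set operation, so the output is already distinct
    .OrderBy(d => d)
    .ToList();

Console.WriteLine(string.Join(", ", neverWon));
```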

## Congratulations! You have completed the bonus.