# Premier League: How has VAR impacted the rankings?

There has been much debate about the video assistant referee (VAR) since it was introduced last year (in 2019).
The goal is fairer refereeing, but there are strong concerns about whether it will really achieve that and about the risk of breaking the rhythm of the game.

We will let football (or soccer, depending on where you read this notebook) analysts answer this question, but one thing we can look at is how VAR has impacted the league so far.

This is what we will do in this notebook, alongside some other simulations we found interesting.

## 1. Importing the data

In [1]:
import pandas
import os.path
import wget
# Note: you need to install wget (conda install python-wget) to download the data used in this notebook

In [2]:
def download_source(cwd: str, url: str, filename: str):
    """Download the data file from the given url into the working directory (cwd).

    The download is skipped if the file is already present.

    Args:
        cwd: current working directory.
        url: url of the source file.
        filename: name of the downloaded file.
    """
    # Check whether the file has already been downloaded
    if os.path.isfile(os.path.join(cwd, filename)):
        print(filename + ' already downloaded.')
    else:
        wget.download(url, out=cwd)

In [3]:
cwd = os.getcwd()
download_source(cwd, 'http://data.atoti.io/notebooks/premier-league/goals.csv', "goals.csv")
download_source(cwd, 'http://data.atoti.io/notebooks/premier-league/matches.csv', "matches.csv")

goals.csv already downloaded.
matches.csv already downloaded.


### Importing and formatting the matches calendar with pandas

We will first import the matches calendar of the 2019/2020 season.

In [4]:
matches_df = pandas.read_csv("matches.csv", sep=";")
matches_df.head()

Unnamed: 0,League,Day,Home,Away
0,Premier League 2019/2020,1,Liverpool,Norwich
1,Premier League 2019/2020,1,West Ham,Manchester City
2,Premier League 2019/2020,1,Bournemouth,Sheffield
3,Premier League 2019/2020,1,Burnley,Southampton
4,Premier League 2019/2020,1,Crystal Palace,Everton


To use this calendar more efficiently in a cube, we will "flatten" the table so that the new table has one row per league, day and team.

In [5]:
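# Start from an empty copy of the calendar (an explicit copy avoids modifying a slice of matches_df)
flat_matches_df = matches_df.iloc[0:0].copy()
# Replace the Home/Away columns with the flattened ones
del flat_matches_df['Home']
del flat_matches_df['Away']
flat_matches_df.insert(0,"Team",None,True)
flat_matches_df.insert(0,"IsPlayingHome",None,True)
flat_matches_df.insert(0,"Opponent",None,True)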

In [6]:
# For each line in the calendar, we now get two lines
for index, row in matches_df.iterrows():
    flat_matches_df = flat_matches_df.append({"Team": row.Home, "League": row.League, "Day": row.Day, "IsPlayingHome": True, "Opponent": row.Away }, ignore_index = True) 
    flat_matches_df = flat_matches_df.append({"Team": row.Away, "League": row.League, "Day": row.Day, "IsPlayingHome": False, "Opponent": row.Home }, ignore_index = True) 

In [7]:
flat_matches_df.head()

Unnamed: 0,Opponent,IsPlayingHome,Team,League,Day
0,Norwich,True,Liverpool,Premier League 2019/2020,1
1,Liverpool,False,Norwich,Premier League 2019/2020,1
2,Manchester City,True,West Ham,Premier League 2019/2020,1
3,West Ham,False,Manchester City,Premier League 2019/2020,1
4,Sheffield,True,Bournemouth,Premier League 2019/2020,1
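
Note that `DataFrame.append`, used in the loop above, was removed in pandas 2.0. As a hedged alternative, the sketch below builds the same flattened layout with `pandas.concat`. The two-row `matches_df` is a hand-made stand-in for `matches.csv`, and the result has not been checked against Atoti 0.3.1.

```python
import pandas as pd

# Tiny stand-in for matches.csv (two fixtures taken from the output above).
matches_df = pd.DataFrame(
    {
        "League": ["Premier League 2019/2020"] * 2,
        "Day": [1, 1],
        "Home": ["Liverpool", "West Ham"],
        "Away": ["Norwich", "Manchester City"],
    }
)

# One row per match seen from the home team ...
home = matches_df.rename(columns={"Home": "Team", "Away": "Opponent"})
home["IsPlayingHome"] = True
# ... and one row per match seen from the away team, with the roles swapped.
away = matches_df.rename(columns={"Away": "Team", "Home": "Opponent"})
away["IsPlayingHome"] = False

columns = ["Opponent", "IsPlayingHome", "Team", "League", "Day"]
flat_matches_df = (
    pd.concat([home, away])
    .sort_index(kind="mergesort")  # stable sort: keeps home before away for each match
    .reset_index(drop=True)[columns]
)
print(flat_matches_df)
```

Building two frames and concatenating them once avoids growing a DataFrame row by row, which becomes slow on larger calendars.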


### Importing and formatting the goals with pandas

We will now import the goals data, which we will use to compute the results of the games and the rankings.

In [8]:
goals_df = pandas.read_csv("goals.csv", sep=";")
goals_df.head()

Unnamed: 0,League,Day,Half,Minute,Scorer,Team,IsCancelledAfterVAR,IsPenalty,IsOwnGoal
0,Premier League 2019/2020,1,1,42,Origi,Liverpool,N,N,N
1,Premier League 2019/2020,1,1,28,van Dijk,Liverpool,N,N,N
2,Premier League 2019/2020,1,1,19,Mohamed Salah,Liverpool,N,N,N
3,Premier League 2019/2020,1,1,7,Hanley,Norwich,N,N,Y
4,Premier League 2019/2020,1,2,64,Pukki,Norwich,N,N,N


It looks like booleans are encoded as "Y" and "N" in the data. Let's change that.

In [9]:
boolean_mapping = { 'Y': True, 'N': False }
goals_df["IsCancelledAfterVAR"] = goals_df["IsCancelledAfterVAR"].map(boolean_mapping)
goals_df["IsPenalty"] = goals_df["IsPenalty"].map(boolean_mapping)
goals_df["IsOwnGoal"] = goals_df["IsOwnGoal"].map(boolean_mapping)
goals_df.head()

Unnamed: 0,League,Day,Half,Minute,Scorer,Team,IsCancelledAfterVAR,IsPenalty,IsOwnGoal
0,Premier League 2019/2020,1,1,42,Origi,Liverpool,False,False,False
1,Premier League 2019/2020,1,1,28,van Dijk,Liverpool,False,False,False
2,Premier League 2019/2020,1,1,19,Mohamed Salah,Liverpool,False,False,False
3,Premier League 2019/2020,1,1,7,Hanley,Norwich,False,False,True
4,Premier League 2019/2020,1,2,64,Pukki,Norwich,False,False,False
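
One caveat with `Series.map` is that any value missing from the mapping silently becomes `NaN`. The sketch below is one possible way to apply the same mapping while failing loudly on unexpected values. The small `goals_df` is a hand-made stand-in for the real file, and the `flag_columns` name is introduced only for this illustration.

```python
import pandas as pd

# Hand-made stand-in for goals.csv, reduced to the columns of interest.
goals_df = pd.DataFrame(
    {
        "Scorer": ["Origi", "Hanley"],
        "IsCancelledAfterVAR": ["N", "N"],
        "IsPenalty": ["N", "N"],
        "IsOwnGoal": ["N", "Y"],
    }
)

boolean_mapping = {"Y": True, "N": False}
flag_columns = ["IsCancelledAfterVAR", "IsPenalty", "IsOwnGoal"]

for column in flag_columns:
    mapped = goals_df[column].map(boolean_mapping)
    # Values absent from the mapping come back as NaN; raise instead of keeping them.
    if mapped.isna().any():
        raise ValueError(f"Unexpected value in column {column!r}")
    goals_df[column] = mapped.astype(bool)

print(goals_df.dtypes)
```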


Note that in this data we also have all the goals that were later cancelled by VAR during a game.

## 2. Computing the rankings and other metrics in Atoti

### Starting Atoti

In [10]:
import atoti
# from atoti.config import create_config
# config = create_config(metadata_db="./metadata.db")
# session = atoti.create_session(config=config)
session = atoti.create_session()
#session.load_all_data()

Welcome to Atoti 0.3.1!

By using this community edition, you agree with the license available at https://www.atoti.io/eula.
Browse the official documentation at https://docs.atoti.io.
Join the community at https://www.atoti.io/register.

You can hide this message by setting the ATOTI_HIDE_EULA_MESSAGE environment variable to True.


### Loading data into stores and joining them

In [11]:
matches_store = session.read_pandas(flat_matches_df, keys=["League","Day","Team"])

In [12]:
goals_store = session.read_pandas(goals_df, keys=["League","Day","Minute", "Scorer"], store_name="goals_store")

For each match, we will now retrieve all the goals that were scored during it.  
By default, Atoti performs left outer, many-to-many joins.

In [13]:
matches_store.join(goals_store, mapping={"League": "League", "Day": "Day", "Team": "Team"})
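
As a rough pandas analogy of this join (not a description of how Atoti implements it), a left merge keeps every match row, repeats it once per goal, and leaves the goal columns empty for teams that did not score. The frames below are hand-made for the illustration.

```python
import pandas as pd

# Two (League, Day, Team) rows; the second team has no goal in the goals frame.
matches = pd.DataFrame(
    {
        "League": ["PL", "PL"],
        "Day": [1, 1],
        "Team": ["Liverpool", "Everton"],
    }
)
goals = pd.DataFrame(
    {
        "League": ["PL", "PL"],
        "Day": [1, 1],
        "Team": ["Liverpool", "Liverpool"],
        "Scorer": ["Origi", "van Dijk"],
    }
)

# Left join on the same three columns as the mapping used above.
joined = matches.merge(goals, on=["League", "Day", "Team"], how="left")
print(joined)
```

The goalless team still appears in `joined` with an empty `Scorer`, which is the property the cube relies on to show matches without goals.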

### Creating a cube

We will create the cube on the matches store because some matches or teams end with no goal, and we still want a line for these in the pivot tables.  
In auto mode, as below, a hierarchy is created for each non-numeric column, and average and sum measures for each numeric column. This can be edited later, or you can define all hierarchies and measures yourself by switching to manual mode.

In [14]:
matches_cube = session.create_cube(matches_store, "MatchesCube")

We will define short aliases for our measures, levels and hierarchies.

In [15]:
m = matches_cube.measures
level = matches_cube.levels
h = matches_cube.hierarchies

### Computing the rankings from the goals

In [16]:
m["Goal Value"] = 1.0
m["Goal Value"].visible = False

We first compute a measure counting all the goals scored, including those later cancelled by VAR. Variants excluding the cancelled goals are defined further down.

In [17]:
m["Player goals"] = atoti.agg.sum(
    m["Goal Value"],
    scope=atoti.scope.origin("League", "Day", "Team", "Minute", "Scorer"),
)

Some teams did not score in some matches. The measure above returns `None` in those cases, so we replace it with zero.

In [18]:
m["Team Goals (incl OG)"] = atoti.agg.sum(
    atoti.where(m["Player goals"] == None, 0.0, m["Player goals"]),
    scope=atoti.scope.origin("League", "Team", "Day"),
)

Let's check that the measures we just defined work by visualizing them at match/team level:

In [20]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.

We need to take own goals into account: in that case, the goal is recorded against a player of the team itself, but it counts for the opponent.

In [20]:
m["Team Own Goals"] = atoti.filter(m["Team Goals (incl OG)"], level["IsOwnGoal"] == True)

In [21]:
m["Team Goals"] = m["Team Goals (incl OG)"] - m["Team Own Goals"]

For a particular match, the `Opponent Goals` are equal to the `Team Goals` read from the fact where Team and Opponent are swapped.

In [22]:
m["Opponent Goals (technical)"] = atoti.at(m["Team Goals"], {level["Team"]: level["Opponent"], level["Opponent"]: level["Team"]})
m["Opponent Goals (technical)"].visible = False

In [23]:
m["Opponent Goals"] = atoti.agg.sum(m["Opponent Goals (technical)"], scope = atoti.scope.origin("Team", "Opponent"))

In [24]:
m["Opponent Own Goals (technical)"] = atoti.at(
    atoti.agg.sum(atoti.where(m["Team Own Goals"] == None, 0.0, m["Team Own Goals"])),
    {level["Team"]: level["Opponent"], level["Opponent"]: level["Team"]},
)
m["Opponent Own Goals (technical)"].visible = False

In [25]:
m["Opponent Own Goals"] = atoti.agg.sum(m["Opponent Own Goals (technical)"], scope = atoti.scope.origin("Team", "Opponent"))

We now have the team goals and the opponent goals for each match. Remember that goals cancelled by VAR are still included, so let's create new measures that exclude them.

In [26]:
m["Valid team goals"] = atoti.filter(m["Team Goals"], level["IsCancelledAfterVAR"] == False)
m["Valid opponent goals"] = atoti.filter(m["Opponent Goals"], level["IsCancelledAfterVAR"] == False)

We can visualize that in detail: 4 goals were already cancelled by VAR on the first day of the season!

In [28]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.

We are now going to add two measures, `Team Score` and `Opponent Score`, to compute the result of a particular match.  
Since the previous measures are empty when no goal was scored, and we still want 0-0 results to appear, we replace empty values with zeroes.

In [66]:
m["Team Score"] = atoti.agg.sum(atoti.where(m["Team Goals"] + m["Opponent Own Goals"] == None,
                                            0.0, 
                                            m["Team Goals"] + m["Opponent Own Goals"]
                                           ), 
                                scope=atoti.scope.origin("League","Day","Team"))

In [67]:
m["tmptmp"] = m["Team Goals"] + m["Opponent Own Goals"]

In [70]:
m["opppppp"] = m["Opponent Goals"] + m["Team Own Goals"]

In [68]:
m["Opponent Score"] = atoti.agg.sum(atoti.where(m["Opponent Goals"] + m["Team Own Goals"] == None, 
                                                0.0, 
                                                m["Opponent Goals"] + m["Team Own Goals"]
                                               ), 
                                    scope=atoti.scope.origin("League","Day","Team"))

In [73]:
m["ozadaozij"] = atoti.agg.sum(atoti.where(m["opppppp"] == None, 
                                                0.0, 
                                                m["opppppp"]
                                               ),
                              scope=atoti.scope.origin("League","Day","Team"))

In [62]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.
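
To make the score logic above concrete, here is a minimal plain-Python sketch of the same idea for a single match: an own goal is credited to the opponent, and goals cancelled after VAR can be included or ignored. The goal records and the `match_score` helper are hypothetical and only illustrate the rule; they are not part of the Atoti API.

```python
from collections import Counter

# Hypothetical goal records for one match:
# (team of the scoring player, is_own_goal, cancelled_after_var)
goals = [
    ("Liverpool", False, False),
    ("Liverpool", False, False),
    ("Norwich", True, False),    # own goal by a Norwich player: counts for Liverpool
    ("Norwich", False, False),
    ("Liverpool", False, True),  # cancelled after VAR
]
opponent_of = {"Liverpool": "Norwich", "Norwich": "Liverpool"}


def match_score(goals, opponent_of, include_cancelled=False):
    """Return the goals credited to each team, starting from 0 for both."""
    score = Counter({team: 0 for team in opponent_of})
    for team, is_own_goal, cancelled in goals:
        if cancelled and not include_cancelled:
            continue
        # An own goal benefits the opponent of the scoring player's team.
        beneficiary = opponent_of[team] if is_own_goal else team
        score[beneficiary] += 1
    return dict(score)


valid_score = match_score(goals, opponent_of)
score_without_var = match_score(goals, opponent_of, include_cancelled=True)
print(valid_score, score_without_var)
```

Starting both counters at zero mirrors the replacement of empty values by `0.0` in the measures, so a goalless side still gets a score.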

Now that for each game we have the number of goals of a team and the number of goals the opponent scored, we can compute how many points the team has earned.  

In [29]:
m["Points for victory"] = 3.0
m["Points for tie"] = 1.0
m["Points for loss"] = 0.0

In [30]:
m["Points (no VAR)"] = atoti.agg.sum(
    atoti.where(
        m["Team Goals"] + m["Opponent Own Goals"] > m["Opponent Goals"] + m["Team Own Goals"],
        m["Points for victory"],
        atoti.where(
            m["Team Goals"] + m["Opponent Own Goals"] == m["Opponent Goals"] + m["Team Own Goals"],
            m["Points for tie"],
            m["Points for loss"],
        ),
    ),
    scope=atoti.scope.origin("League", "Day", "Team"),
)

In [32]:
m["Actual Points"] = atoti.filter(m["Points (no VAR)"], level["IsCancelledAfterVAR"] == False)

And here we have our ranking. 
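
The nested `atoti.where` calls above boil down to a three-way comparison per match. The sketch below restates it in plain Python under the same 3/1/0 values; the `match_points` helper and the example scores are hypothetical.

```python
POINTS_FOR_VICTORY = 3.0
POINTS_FOR_TIE = 1.0
POINTS_FOR_LOSS = 0.0


def match_points(team_score, opponent_score):
    """Points earned by a team for one match, given both final scores."""
    if team_score > opponent_score:
        return POINTS_FOR_VICTORY
    if team_score == opponent_score:
        return POINTS_FOR_TIE
    return POINTS_FOR_LOSS


# Hypothetical match: 2-1 when every goal counts, 1-1 once a cancelled goal is removed.
points_no_var = match_points(2, 1)
actual_points = match_points(1, 1)
var_impact = actual_points - points_no_var
print(points_no_var, actual_points, var_impact)
```

A negative `var_impact` corresponds to a team that lost points because of VAR, a positive one to a team that benefited from it.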

## Rankings and VAR impact

Color rules were added to show teams that benefited from the VAR in green and those who lost championship points because of it in red.

In [34]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.

#### Add comments on the result here once the data is complete

Since the ranking is computed from the goal level, we can perform any kind of simulation we want using simple UI filters.  
You can filter the pivot table above to see what would happen if we only kept the first half of the games, or only the matches played at home. And if we filter out Vardy, would Leicester lose some places?

## Evolution of the rankings over time

Atoti also enables you to define cumulative sums over a hierarchy; we will use that to see how the team rankings evolved during the season.  
In the same way, the measure displayed in the chart below is computed in real time and currently shows the cumulative points including goals refused by VAR. You can filter the chart on `IsCancelledAfterVAR = False` to get the rankings evolution with the actual points.

In [35]:
m["Points cumulative sum"] = atoti.agg.sum(m["Actual Points"],
                                          scope=atoti.scope.cumulative(level["Day"]))

In [37]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.

Some days have no value because the team has not scored any points and thus has no value for the measure `Actual Points`.  
The intended fix is to take the previous existing value in those cases. The cell below still substitutes a constant `2.0` as a placeholder, with the `atoti.shift` call commented out.

In [38]:
m["Points smooth cumulative sum"] = atoti.where(
    m["Points cumulative sum"] == None,
    2.0,  # placeholder; intended: atoti.shift(m["Points cumulative sum"], on=level["Day"], period=1)
    m["Points cumulative sum"],
)
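
For reference, taking the previous existing value is what a forward fill does in pandas. The sketch below shows that behaviour on a hypothetical series of points per match day; it illustrates the intent only and is not an Atoti implementation.

```python
import pandas as pd

# Hypothetical points per match day for one team; NaN marks a day without a value.
points = pd.Series(
    [3.0, float("nan"), 1.0, float("nan")],
    index=[1, 2, 3, 4],
    name="Actual Points",
)

# cumsum skips NaN but leaves those days empty in the result.
cumulative = points.cumsum()
# Carry the last known total forward; use 0.0 before the first known value.
smooth = cumulative.ffill().fillna(0.0)
print(smooth)
```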

In [40]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.

## Players most impacted by the VAR

Until now we looked at most results at team level, but since the data exists at goal level, we could have a look at which players are most impacted by the VAR.

In [41]:
m["Valid player goals"] = atoti.filter(m["Player goals"], level["IsCancelledAfterVAR"] == False)

In [43]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.
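
The same player-level comparison can be sketched in pandas with a `groupby` on the scorer. The rows below are hypothetical (the player names and flags are made up), and the column names `player_goals`, `cancelled_goals` and `valid_player_goals` are introduced only for this illustration.

```python
import pandas as pd

# Hypothetical goal-level rows; names and flags are illustrative, not real data.
goals_df = pd.DataFrame(
    {
        "Scorer": ["Player A", "Player A", "Player B", "Player A", "Player B"],
        "IsCancelledAfterVAR": [False, True, False, True, False],
    }
)

per_player = (
    goals_df.assign(Cancelled=goals_df["IsCancelledAfterVAR"].astype(int))
    .groupby("Scorer")["Cancelled"]
    .agg(player_goals="count", cancelled_goals="sum")  # all goals / cancelled goals
)
per_player["valid_player_goals"] = per_player["player_goals"] - per_player["cancelled_goals"]
# Players who lost the most goals to VAR come first.
per_player = per_player.sort_values("cancelled_goals", ascending=False)
print(per_player)
```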

Let's have a look at it in a chart.

In [45]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.

#### Add analysis here once the data is complete

## Simulation of a different scoring system

Although we are all used to a scoring system giving 3 points for a victory, 1 for a tie and 0 for a loss, this was not always the case. Before the 1990s, many European leagues gave only 2 points per victory; the change was meant to encourage teams to score more goals.  
The Premier League spoils us with plenty of goals (take it from someone watching the French Ligue 1), but how different would the results be with the old scoring system?  

Atoti lets us simulate this very easily: we simply create a new scenario in which we replace the number of points given for a victory. We first set up a simulation on that measure.

In [46]:
scoring_system_simulation = matches_cube.setup_simulation("Scoring system simulations", replace=[m["Points for victory"]])

Then we create a new scenario where we give it another value:

In [47]:
scoring_system_simulation.scenarios["Old system"] = 2.0

And that's it, no need to define anything else, all the measures will be re-computed with the new value in the new scenario.  
Let's compare the rankings between the two scoring systems.

In [49]:
matches_cube.visualize()

Install the Atoti JupyterLab extension to see this widget.
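
To see why changing only the value of a victory can reorder a table, here is a small plain-Python sketch. The win/tie/loss records are hypothetical and the `ranking` helper is introduced only for this illustration; it does not use the Atoti simulation API.

```python
# Hypothetical season records as (wins, ties, losses); the figures are illustrative only.
records = {
    "Team A": (10, 2, 4),
    "Team B": (8, 7, 1),
    "Team C": (9, 3, 4),
}


def ranking(records, points_for_victory, points_for_tie=1.0):
    """Return (team, points) pairs sorted from most to fewest points."""
    totals = {
        team: wins * points_for_victory + ties * points_for_tie
        for team, (wins, ties, _losses) in records.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


current_system = ranking(records, points_for_victory=3.0)
old_system = ranking(records, points_for_victory=2.0)
print(current_system)
print(old_system)
```

With 2 points per victory, the team with many ties overtakes the team with more wins, which is the kind of shift the scenario comparison is meant to reveal.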

#### add analysis here once the data is complete 

In [50]:
session.url

'http://localhost:53486'