<a id='top'></a>

# StatsBomb Data Parsing
##### Notebook to parse and engineer the JSON data from the [StatsBomb Open Data GitHub repository](https://github.com/statsbomb/open-data) using [pandas](http://pandas.pydata.org/), to create datasets ready for visualisation in [Tableau](https://public.tableau.com/profile/edd.webster) and further analysis in a [PowerPoint slide deck]().

### By [Edd Webster](https://www.twitter.com/eddwebster)
Notebook first written: 29/10/2021<br>
Notebook last updated: 04/11/2021

![StatsBomb](../../img/logos/stats-bomb-logo.png)

Click [here](#section5) to jump straight to the Summary section and skip the [Notebook Dependencies](#section1), [Notebook Brief](#section2), and [Data Sources](#section3) sections.

___


## <a id='import_libraries'>Introduction</a>
This notebook parses publicly available [StatsBomb](https://statsbomb.com/) Event data, using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames.

For more information about this notebook and the author, I'm available through all the following channels:
*    [eddwebster.com](https://www.eddwebster.com/);
*    edd.j.webster@gmail.com;
*    [@eddwebster](https://www.twitter.com/eddwebster);
*    [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);
*    [github/eddwebster](https://github.com/eddwebster/); and
*    [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster).

The accompanying GitHub repository for this notebook can be found [here](https://github.com/eddwebster/football_analytics) and a static version of this notebook can be found [here](https://nbviewer.org/github/eddwebster/football_analytics/blob/master/notebooks/2_data_parsing/Parma%20Calcio%201913%20-%20StatsBomb%20Data%20Parsing%20and%20Engineering.ipynb).

___

## <a id='notebook_contents'>Notebook Contents</a>
1.    [Notebook Dependencies](#section1)<br>
2.    [Notebook Brief](#section2)<br>
3.    [Data Sources](#section3)<br>
      1.    [Introduction](#section3.1)<br>
      2.    [Read in the Datasets](#section3.2)<br>
      3.    [Join the Datasets](#section3.3)<br>
      4.    [Initial Data Handling](#section3.4)<br>
5.    [Summary](#section5)<br>
6.    [Next Steps](#section6)<br>
7.    [References](#section7)<br>

___

<a id='section1'></a>

## <a id='#section1'>1. Notebook Dependencies</a>

This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:
*    [`Jupyter notebooks`](https://jupyter.org/) for the notebook environment in which this project is presented;
*    [`NumPy`](http://www.numpy.org/) for multidimensional array computing;
*    [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation; and
*    [`matplotlib`](https://matplotlib.org/) and [`seaborn`](https://seaborn.pydata.org/) for data visualisation.

Most of the packages used in this notebook can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/).

### Import Libraries and Modules

In [58]:
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv

# Import Dependencies
%matplotlib inline

# Math Operations
import numpy as np
from math import pi

# Datetime
import datetime
from datetime import date
import time

# Data Preprocessing
import pandas as pd
import pandas_profiling as pp
import os
import re
import chardet
import random
from io import BytesIO
from pathlib import Path

# Reading Directories
import glob

# Working with JSON
import json
from pandas import json_normalize  # pandas >= 1.0; importing from pandas.io.json is deprecated

# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

# Progress Bar
from tqdm import tqdm

# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML

# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

print("Setup Complete")

Setup Complete


In [26]:
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))

Python: 3.7.6
NumPy: 1.20.3
pandas: 1.3.2
matplotlib: 3.4.2


### Defined Variables

In [27]:
# Define today's date
today = datetime.datetime.now().strftime('%d%m%Y')

### Defined Filepaths

In [28]:
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..')
data_dir = os.path.join(base_dir, 'data')
data_dir_sb = os.path.join(base_dir, 'data', 'sb')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')

### Create Directory Structure

In [None]:
# Make the directory structure, creating any missing parent folders
for folder in ['combined', 'competitions', 'events', 'matches']:
    path = os.path.join(data_dir_sb, 'raw', folder)
    os.makedirs(path, exist_ok=True)

### Custom Functions

In [30]:
# Define custom functions for use in the notebook

## Function to read JSON files as UTF-8 text, so that special characters e.g. accents in names of players and teams are handled correctly
def read_json_file(filename):
    with open(filename, encoding='utf-8') as json_file:
        return json_file.read()

    
## Function to flatten pandas DataFrames with nested JSON columns. Source: https://stackoverflow.com/questions/39899005/how-to-flatten-a-pandas-dataframe-with-some-columns-as-json
def flatten_nested_json_df(df):

    df = df.reset_index()

    print(f"original shape: {df.shape}")
    print(f"original columns: {df.columns}")


    # search for columns to explode/flatten
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    print(f"lists: {list_columns}, dicts: {dict_columns}")
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            print(f"flattening: {col}")
            # explode dictionaries horizontally, adding new columns
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)

        for col in list_columns:
            print(f"exploding: {col}")
            # explode lists vertically, adding new rows
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # check if there are still dict or list fields to flatten
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()

        print(f"lists: {list_columns}, dicts: {dict_columns}")

    print(f"final shape: {df.shape}")
    print(f"final columns: {df.columns}")
    return df
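
The detection step in `flatten_nested_json_df` only flags a column when *every* value is a list (or dict), because of `.all()`. Event columns such as `location` are missing (`NaN`) for many event types, which is likely why the events output further down reports `lists: [], dicts: []` with an unchanged shape. The toy sketch below uses hand-written, hypothetical data (not real StatsBomb events) to contrast `.all()` with `.any()`; switching the function to `.any()` is an untested assumption, since `pd.json_normalize` and `explode` would then also have to cope with the `NaN` rows.

```python
import numpy as np
import pandas as pd

# Toy event-like data (hypothetical): 'location' is a list for some events and NaN for others
df = pd.DataFrame({
    'type_name': ['Pass', 'Half Start', 'Shot'],
    'location': [[61.0, 41.0], np.nan, [108.0, 36.0]],
})

# Element-wise check: is each value a Python list?
is_list = df.apply(lambda col: col.map(lambda value: isinstance(value, list)))

# '.all()', as used in flatten_nested_json_df, misses columns with any NaN rows
s = is_list.all()
cols_all = s[s].index.tolist()

# '.any()' flags a column if at least one row holds a list
s = is_list.any()
cols_any = s[s].index.tolist()

print(cols_all)  # []
print(cols_any)  # ['location']

# Exploding adds one row per list element; the NaN row is kept as a single row
exploded = df.explode('location')
print(len(exploded))  # 5
```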

### Notebook Settings

In [32]:
# Display all columns of pandas DataFrames
pd.set_option('display.max_columns', None)

# Silence pandas' SettingWithCopyWarning for chained assignment
pd.options.mode.chained_assignment = None

---

<a id='section2'></a>

## <a id='#section2'>2. Notebook Brief</a>
This Jupyter notebook is part of a series of notebooks that scrape, parse, engineer, and unify datasets, culminating in basic modeling.

This particular notebook is the **StatsBomb Data Parsing** notebook, which takes the raw JSON data downloaded from the StatsBomb Open Data GitHub repository and converts it into event-level data saved as a CSV file.

Links to these notebooks in the [`football_analytics`](https://github.com/eddwebster/football_analytics) GitHub repository can be found at the following:
*    [1. Webscraping](https://github.com/eddwebster/football_analytics/tree/master/notebooks/1_data_scraping)
     +    [TransferMarkt Player Bio and Status Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Bio%20and%20Status%20Web%20Scraping.ipynb)
     +    [Capology Player Salary Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/Capology%20Player%20Salary%20Web%20Scraping.ipynb)
*    [2. Data Parsing](https://github.com/eddwebster/football_analytics/tree/master/notebooks/2_data_parsing)
     +    [StatsBomb Data Parsing](https://github.com/eddwebster/football_analytics/blob/master/notebooks/2_data_parsing/ELO%20Team%20Ratings%20Data%20Parsing.ipynb)
*    [3. Data Engineering](https://github.com/eddwebster/football_analytics/tree/master/notebooks/3_data_engineering)
     +    [StatsBomb Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/FBref%20Player%20Stats%20Data%20Engineering.ipynb)
     +    [TransferMarkt Player Bio and Status Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Bio%20and%20Status%20Data%20Engineering.ipynb)
     +    [Capology Player Salary Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb)
*    [4. Data Unification](https://github.com/eddwebster/football_analytics/tree/master/notebooks/4_data_unification)
*    [5. Modeling and Data Analysis]()

**Notebook Conventions**:<br>
*    Variables that refer to a `DataFrame` object are prefixed with `df_`.
*    Variables that refer to a collection of `DataFrame` objects (e.g., a list, a set or a dict) are prefixed with `dfs_`.

---

<a id='section3'></a>

## <a id='#section3'>3. Data Sources</a>

### <a id='#section3.1'>3.1. Introduction</a>

#### <a id='#section3.1.1'>3.1.1. About StatsBomb</a>
[StatsBomb](https://statsbomb.com/) are a football analytics and data company.

![title](../../img/logos/stats-bomb-logo.png)

Before conducting our EDA, the data needs to be imported as a DataFrame in the Data Sources section ([Section 3](#section3)) and cleaned in the Data Engineering section ([Section 4](#section4)).

We'll be using the [pandas](http://pandas.pydata.org/) library to import our data into this notebook as a DataFrame.

#### <a id='#section3.1.2'>3.1.2. About the StatsBomb publicly available data</a>
The complete data set contains:
- 7 competitions;
- 879 matches;
- 3,161,917 events; and
- z players.

The datasets we will be using are:
- competitions;
- matches;
- events;
- lineups; and
- tactics.
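
In the StatsBomb Open Data repository these datasets sit at different paths under `data/`. Tactics are not a separate file: they are carried inside events such as the Starting XI events (the `tactics.formation` and `tactics.lineup` columns in the events output below). The helper below is a hypothetical sketch, not part of the original notebook, that assumes the repository layout noted in its comments.

```python
import os

# Hypothetical helper: build the path to a raw StatsBomb Open Data JSON file,
# assuming the repository layout
#   data/competitions.json
#   data/matches/{competition_id}/{season_id}.json
#   data/events/{match_id}.json
#   data/lineups/{match_id}.json
def sb_open_data_path(root, dataset, competition_id=None, season_id=None, match_id=None):
    data = os.path.join(root, 'data')
    if dataset == 'competitions':
        return os.path.join(data, 'competitions.json')
    if dataset == 'matches':
        return os.path.join(data, 'matches', str(competition_id), f'{season_id}.json')
    if dataset in ('events', 'lineups'):
        return os.path.join(data, dataset, f'{match_id}.json')
    raise ValueError(f'Unknown dataset: {dataset}')

# Example: the 2018 FIFA World Cup matches file (competition 43, season 3) and one match's events
print(sb_open_data_path('open-data', 'matches', competition_id=43, season_id=3))
print(sb_open_data_path('open-data', 'events', match_id=7581))
```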


### <a id='#section3.3'>3.3. Reading In and Parsing the JSON Data</a>
The following cells read the `JSON` files into [pandas](https://pandas.pydata.org/) `DataFrame` objects, with some basic data engineering to flatten the data and select only the data of interest, so that the Jupyter notebook does not crash on a standard laptop.
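
As a minimal illustration of the flattening step, `pd.json_normalize` turns nested JSON objects into dot-separated columns, which is where names such as `competition.competition_id` in the matches `DataFrame` come from. The two records below are a trimmed, hand-written stand-in for the real match JSON, using IDs from the matches output further down.

```python
import pandas as pd

# Two trimmed match records shaped like StatsBomb match JSON (illustrative only)
matches = [
    {'match_id': 7581, 'home_score': 1,
     'competition': {'competition_id': 43, 'competition_name': 'FIFA World Cup'},
     'season': {'season_id': 3, 'season_name': '2018'}},
    {'match_id': 7549, 'home_score': 2,
     'competition': {'competition_id': 43, 'competition_name': 'FIFA World Cup'},
     'season': {'season_id': 3, 'season_name': '2018'}},
]

# json_normalize flattens nested dicts into 'parent.child' column names
df = pd.json_normalize(matches)
print(df.columns.tolist())  # includes 'competition.competition_id' and 'season.season_id'
print(df.shape)
```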

#### <a id='#section3.3.1.'>3.3.1. Competitions</a>

##### Data dictionary

In [33]:
# ADD MARKDOWN TABLE OF DATA HERE

##### Read in JSON files

In [34]:
# Show files in directory
print(glob.glob(os.path.join(data_dir_sb, 'raw', 'competitions/*')))

['../../data/sb/raw/competitions/competitions_wc2018.csv', '../../data/sb/raw/competitions/competitions.csv', '../../data/sb/raw/competitions/competitions_male.csv']


In [35]:
# Read in exported CSV file if exists, if not, read in JSON file

## 
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions.csv')):
    json_competitions = read_json_file(os.path.join(data_dir_sb, 'open-data', 'data', 'competitions.json'))
    df_competitions_flat = pd.read_json(json_competitions)

##     
else:
    df_competitions_flat = pd.read_csv(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions.csv'))    


# Display DataFrame
df_competitions_flat

Unnamed: 0,competition_id,season_id,country_name,competition_name,competition_gender,season_name,match_updated,match_available
0,16,4,Europe,Champions League,male,2018/2019,2020-10-25T12:33:27.855343,2020-10-25T12:33:27.855343
1,16,1,Europe,Champions League,male,2017/2018,2021-01-23T21:55:30.425330,2021-01-23T21:55:30.425330
2,16,2,Europe,Champions League,male,2016/2017,2020-08-26T12:33:15.869622,2020-07-29T05:00
3,16,27,Europe,Champions League,male,2015/2016,2020-08-26T12:33:15.869622,2020-07-29T05:00
4,16,26,Europe,Champions League,male,2014/2015,2020-08-26T12:33:15.869622,2020-07-29T05:00
5,16,25,Europe,Champions League,male,2013/2014,2020-08-26T12:33:15.869622,2020-07-29T05:00
6,16,24,Europe,Champions League,male,2012/2013,2020-08-26T12:33:15.869622,2020-07-29T05:00
7,16,23,Europe,Champions League,male,2011/2012,2020-08-26T12:33:15.869622,2020-07-29T05:00
8,16,22,Europe,Champions League,male,2010/2011,2020-07-29T05:00,2020-07-29T05:00
9,16,21,Europe,Champions League,male,2009/2010,2020-07-29T05:00,2020-07-29T05:00


In [36]:
df_competitions_flat.shape

(37, 8)

##### Identify the Competition of Interest
For our analysis, we only want the men's **FIFA World Cup 2018** (`competition_id` 43), which is the only complete men's competition available in the open data.
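
Filtering by gender and filtering by competition are both boolean masks, and they can be combined. A small sketch on hand-written rows (the IDs for the Champions League and the World Cup come from the competitions output above; the women's row and its `season_id` are hypothetical) shows the difference between keeping all men's competitions and keeping only the FIFA World Cup:

```python
import pandas as pd

# Hand-written stand-in for df_competitions_flat (the women's row is hypothetical)
df_competitions = pd.DataFrame({
    'competition_id': [16, 43, 72],
    'season_id': [4, 3, 30],
    'competition_name': ['Champions League', 'FIFA World Cup', "Women's World Cup"],
    'competition_gender': ['male', 'male', 'female'],
})

# Boolean masks, combined with '&'
is_male = df_competitions['competition_gender'] == 'male'
is_world_cup = df_competitions['competition_id'] == 43

df_male = df_competitions.loc[is_male]
df_wc = df_competitions.loc[is_male & is_world_cup]

print(len(df_male))  # 2
print(df_wc['competition_name'].tolist())  # ['FIFA World Cup']
```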

In [37]:
# Filter DataFrame for the FIFA World Cup (competition_id 43)
df_competitions_flat = df_competitions_flat.loc[df_competitions_flat['competition_id'] == 43]

In [38]:
df_competitions_flat

Unnamed: 0,competition_id,season_id,country_name,competition_name,competition_gender,season_name,match_updated,match_available
17,43,3,International,FIFA World Cup,male,2018,2020-10-25T14:03:50.263266,2020-10-25T14:03:50.263266


##### Export DataFrame

In [39]:
# Export DataFrame as a CSV file

##
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions_wc2018.csv')):
    df_competitions_flat.to_csv(os.path.join(data_dir_sb, 'raw', 'competitions', 'competitions_wc2018.csv'), index=None, header=True)

##     
else:
    pass

#### <a id='#section3.3.2.'>3.3.2. Matches</a>

##### Data Dictionary

In [40]:
# ADD MARKDOWN TABLE OF DATA HERE

##### Define competitions
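The following cell lists the competitions to be included in the dataset. The full open dataset covers seven competitions (four domestic and three international, as listed in the commented-out block in the cell below). As the competitions `DataFrame` has already been filtered to the FIFA World Cup, taking its unique `competition_id` values selects only that competition.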
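
To select specific seasons as well as competitions, one option (an assumption, not what this notebook does) is to filter the competitions `DataFrame` with `isin` and keep `(competition_id, season_id)` pairs, which correspond to the `matches/{competition_id}/{season_id}.json` files. A sketch on hand-written rows whose IDs come from the competitions output above:

```python
import pandas as pd

# Hand-written stand-in for the competitions DataFrame
df_competitions = pd.DataFrame({
    'competition_id': [16, 16, 43],
    'season_id': [4, 1, 3],
})

# Competitions of interest (a subset of the commented-out list)
lst_competitions = [16, 43]

# Keep only the selected competitions, then pair each with its seasons as plain Python ints
df_selected = df_competitions[df_competitions['competition_id'].isin(lst_competitions)]
lst_competition_seasons = list(zip(df_selected['competition_id'].tolist(),
                                   df_selected['season_id'].tolist()))

print(lst_competition_seasons)  # [(16, 4), (16, 1), (43, 3)]
```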

In [41]:
# Define a list to select only the competitions of interest. 

# Extract the unique Competition IDs from the filtered competitions DataFrame
lst_competitions = df_competitions_flat['competition_id'].unique().tolist()

"""
# Define list of competitions
lst_competitions = [2,     # Premier League
                    11,    # La Liga
                    16,    # Champions League
                   #37,    # FA Women's Super League
                    43,    # FIFA World Cup
                   #49,    # NWSL
                   #72,    # Women's World Cup
                   ]

"""

# Display list of competitions
lst_competitions

[43]

In [42]:
# Display the number of competitions
len(lst_competitions)

1

##### Read in JSON files

In [43]:
# Show files in directory
print(glob.glob(os.path.join(data_dir_sb, 'raw', 'matches/*')))

['../../data/sb/raw/matches/matches.csv', '../../data/sb/raw/matches/matches_male.csv', '../../data/sb/raw/matches/matches_wc2018.csv']


Steps: 
*   Loop through the match folders for the selected competitions.
*   Take each separate JSON file, which holds the matches for one season of a selected competition. These files are stored as `matches/{competition_id}/{season_id}.json`.
*   Read each JSON file and flatten it into a pandas DataFrame.
*   Append the DataFrames to a list.
*   Finally, concatenate all the separate DataFrames into one DataFrame of matches.
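
The steps above can be sketched end to end on a throwaway folder. The folder, season IDs and match IDs below are hypothetical; the point is the pattern of reading and flattening each file, appending it to a list, and calling `pd.concat` once after the loop rather than on every iteration.

```python
import glob
import json
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Write two hypothetical season files, mimicking matches/{competition_id}/{season_id}.json
    competition_dir = os.path.join(tmp, 'matches', '43')
    os.makedirs(competition_dir)
    fake_seasons = {1: [101, 102], 2: [201]}  # hypothetical season IDs -> match IDs
    for season_id, match_ids in fake_seasons.items():
        records = [{'match_id': m, 'season': {'season_id': season_id}} for m in match_ids]
        with open(os.path.join(competition_dir, f'{season_id}.json'), 'w', encoding='utf-8') as f:
            json.dump(records, f)

    # Loop through the files, read and flatten each one, and collect the DataFrames
    dfs_matches = []
    for filepath in sorted(glob.glob(os.path.join(competition_dir, '*.json'))):
        with open(filepath, encoding='utf-8') as f:
            dfs_matches.append(pd.json_normalize(json.load(f)))

    # Concatenate once, after the loop, into a single DataFrame
    df_matches = pd.concat(dfs_matches, ignore_index=True)

print(df_matches.shape)  # (3, 2)
print(df_matches['match_id'].tolist())  # [101, 102, 201]
```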

In [44]:
# Read in selected matches

## Read in exported CSV file if exists, if not, read in JSON files
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_wc2018.csv')):
    
    ### Create empty list for DataFrames
    dfs_matches_all = []
    
    ### Loop through the selected competitions
    for competition in lst_competitions:

        #### List the season files for the competition, stored as matches/{competition_id}/{season_id}.json
        lst_filepaths = glob.glob(os.path.join(data_dir_sb, 'open-data', 'data', 'matches', str(competition), '*.json'))
            
        for filepath in lst_filepaths:

            ##### Import the StatsBomb JSON Match data for this competition and season
            with open(filepath, encoding='utf-8') as f:
                json_sb_match_data = json.load(f)

            ##### Flatten the JSON Match data and append it to the list of DataFrames
            dfs_matches_all.append(json_normalize(json_sb_match_data))
            
    ### Concatenate all the DataFrames once, after the loop, into one DataFrame
    df_matches_flat = pd.concat(dfs_matches_all, ignore_index=True)
    
##
else:    
    df_matches_flat = pd.read_csv(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_wc2018.csv'))
    
    
## Display DataFrame
df_matches_flat.head()

Unnamed: 0,match_id,match_date,kick_off,home_score,away_score,match_status,last_updated,match_week,competition.competition_id,competition.country_name,competition.competition_name,season.season_id,season.season_name,home_team.home_team_id,home_team.home_team_name,home_team.home_team_gender,home_team.home_team_group,home_team.country.id,home_team.country.name,home_team.managers,away_team.away_team_id,away_team.away_team_name,away_team.away_team_gender,away_team.away_team_group,away_team.country.id,away_team.country.name,away_team.managers,metadata.data_version,competition_stage.id,competition_stage.name,stadium.id,stadium.name,stadium.country.id,stadium.country.name,referee.id,referee.name,referee.country.id,referee.country.name
0,7581,2018-07-01,20:00:00.000,1,1,available,2020-07-29T05:00,4,43,International,FIFA World Cup,3,2018,785,Croatia,male,,56,Croatia,"[{'id': 307, 'name': 'Zlatko Dalić', 'nickname...",776,Denmark,male,,61,Denmark,"[{'id': 641, 'name': 'Åge Fridtjof Hareide', '...",1.0.2,33,Round of 16,4263.0,Stadion Nizhny Novgorod,188.0,Russia,730.0,N. Pitana,,
1,7549,2018-06-22,17:00:00.000,2,0,available,2020-07-29T05:00,2,43,International,FIFA World Cup,3,2018,775,Nigeria,male,Group D,166,Nigeria,"[{'id': 636, 'name': 'Gernot Rohr', 'nickname'...",793,Iceland,male,Group D,104,Iceland,"[{'id': 648, 'name': 'Heimir Hallgrímsson', 'n...",1.0.2,10,Group Stage,4257.0,Volgograd Arena,188.0,Russia,739.0,M. Conger,,
2,7555,2018-06-24,20:00:00.000,0,3,available,2020-07-29T05:00,2,43,International,FIFA World Cup,3,2018,789,Poland,male,Group H,182,Poland,"[{'id': 542, 'name': 'Adam Nawałka', 'nickname...",769,Colombia,male,Group H,49,Colombia,"[{'id': 634, 'name': 'José Néstor Pekerman', '...",1.0.2,10,Group Stage,4258.0,Kazan' Arena (Kazan'),188.0,Russia,740.0,C. Ramos,147.0,Mexico
3,7529,2018-06-16,21:00:00.000,2,0,available,2020-07-29T05:00,1,43,International,FIFA World Cup,3,2018,785,Croatia,male,Group D,56,Croatia,"[{'id': 307, 'name': 'Zlatko Dalić', 'nickname...",775,Nigeria,male,Group D,166,Nigeria,"[{'id': 636, 'name': 'Gernot Rohr', 'nickname'...",1.0.2,10,Group Stage,4260.0,Stadion Kaliningrad,255.0,International,738.0,Sandro Ricci,,
4,7548,2018-06-22,14:00:00.000,2,0,available,2020-07-29T05:00,2,43,International,FIFA World Cup,3,2018,781,Brazil,male,Group E,31,Brazil,"[{'id': 547, 'name': 'Adenor Leonardo Bacchi',...",795,Costa Rica,male,Group E,54,Costa Rica,"[{'id': 646, 'name': 'Óscar Antonio Ramírez He...",1.0.2,10,Group Stage,4726.0,Saint-Petersburg Stadium,255.0,International,287.0,B. Kuipers,160.0,Netherlands


In [45]:
df_matches_flat.shape

(64, 38)

In [46]:
# Number of matches per competition and season
df_matches_flat.groupby(['competition.competition_name', 'season.season_name']).match_id.count()

competition.competition_name  season.season_name
FIFA World Cup                2018                  64
Name: match_id, dtype: int64

There are 64 matches from the 2018 FIFA World Cup that can be used for data analysis. This is quite a small dataset, but it is the only complete men's competition available.

##### Convert `match_id` column to list
The list is used as the reference of matches for which to parse the Events, Lineups, and Tactics data, by iterating through it in a loop.

In [47]:
# Extract all Match IDs to use all available matches
lst_matches = df_matches_flat['match_id'].tolist()

# Display the number of matches
len(lst_matches)

64

##### Export DataFrame

In [48]:
# Export DataFrame as a CSV file

##
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_wc2018.csv')):
    df_matches_flat.to_csv(os.path.join(data_dir_sb, 'raw', 'matches', 'matches_wc2018.csv'), index=None, header=True)

##    
else:
    pass

#### <a id='#section3.3.3.'>3.3.3. Events</a>

##### Data dictionary

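The [StatsBomb](https://statsbomb.com/) Events dataset has one hundred and fourteen features (columns) with the following data types. Note that the World Cup 2018 events parsed below have 123 columns, including attributes not listed here (e.g. `shot.open_goal`, `pass.miscommunication`) and the `match_id` and `level_0` columns added during parsing.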

| Feature     | Data type    |
|------|-----|
| `id`    | `object`
| `index`    | `object`
| `period`    | `object`
| `timestamp`    | `object`
| `minute`    | `object`
| `second`    | `object`
| `possession`    | `object`
| `duration`    | `object`
| `type.id`    | `object`
| `type.name`    | `object`
| `possession_team.id`    | `object`
| `possession_team.name`    | `object`
| `play_pattern.id`    | `object`
| `play_pattern.name`    | `object`
| `team.id`    | `object`
| `team.name`    | `object`
| `tactics.formation`    | `object`
| `tactics.lineup`    | `object`
| `related_events`    | `object`
| `location`    | `object`
| `player.id`    | `object`
| `player.name`    | `object`
| `position.id`    | `object`
| `position.name`    | `object`
| `pass.recipient.id`    | `object`
| `pass.recipient.name`    | `object`
| `pass.length`    | `object`
| `pass.angle`    | `object`
| `pass.height.id`    | `object`
| `pass.height.name`    | `object`
| `pass.end_location`    | `object`
| `pass.type.id`    | `object`
| `pass.type.name`    | `object`
| `pass.body_part.id`    | `object`
| `pass.body_part.name`    | `object`
| `carry.end_location`    | `object`
| `under_pressure`    | `object`
| `duel.type.id`    | `object`
| `duel.type.name`    | `object`
| `out`    | `object`
| `miscontrol.aerial_won`    | `object`
| `pass.outcome.id`    | `object`
| `pass.outcome.name`    | `object`
| `ball_receipt.outcome.id`    | `object`
| `ball_receipt.outcome.name`    | `object`
| `pass.aerial_won`    | `object`
| `counterpress`    | `object`
| `off_camera`    | `object`
| `dribble.outcome.id`    | `object`
| `dribble.outcome.name`    | `object`
| `dribble.overrun`    | `object`
| `ball_recovery.offensive`    | `object`
| `shot.statsbomb_xg`    | `object`
| `shot.end_location`    | `object`
| `shot.outcome.id`    | `object`
| `shot.outcome.name`    | `object`
| `shot.type.id`    | `object`
| `shot.type.name`    | `object`
| `shot.body_part.id`    | `object`
| `shot.body_part.name`    | `object`
| `shot.technique.id`    | `object`
| `shot.technique.name`    | `object`
| `shot.freeze_frame`    | `object`
| `goalkeeper.end_location`    | `object`
| `goalkeeper.type.id`    | `object`
| `goalkeeper.type.name`    | `object`
| `goalkeeper.position.id`    | `object`
| `goalkeeper.position.name`    | `object`
| `pass.straight`    | `object`
| `pass.technique.id`    | `object`
| `pass.technique.name`    | `object`
| `clearance.head`    | `object`
| `clearance.body_part.id`    | `object`
| `clearance.body_part.name`    | `object`
| `pass.switch`    | `object`
| `duel.outcome.id`    | `object`
| `duel.outcome.name`    | `object`
| `foul_committed.advantage`    | `object`
| `foul_won.advantage`    | `object`
| `pass.cross`    | `object`
| `pass.assisted_shot_id`    | `object`
| `pass.shot_assist`    | `object`
| `shot.one_on_one`    | `object`
| `shot.key_pass_id`    | `object`
| `goalkeeper.body_part.id`    | `object`
| `goalkeeper.body_part.name`    | `object`
| `goalkeeper.technique.id`    | `object`
| `goalkeeper.technique.name`    | `object`
| `goalkeeper.outcome.id`    | `object`
| `goalkeeper.outcome.name`    | `object`
| `clearance.aerial_won`    | `object`
| `foul_committed.card.id`    | `object`
| `foul_committed.card.name`    | `object`
| `foul_won.defensive`    | `object`
| `clearance.right_foot`    | `object`
| `shot.first_time`    | `object`
| `pass.through_ball`    | `object`
| `interception.outcome.id`    | `object`
| `interception.outcome.name`    | `object`
| `clearance.left_foot`    | `object`
| `ball_recovery.recovery_failure`    | `object`
| `shot.aerial_won`    | `object`
| `pass.goal_assist`    | `object`
| `pass.cut_back`    | `object`
| `pass.deflected`    | `object`
| `clearance.other`    | `object`
| `pass.outswinging`    | `object`
| `substitution.outcome.id`    | `object`
| `substitution.outcome.name`    | `object`
| `substitution.replacement.id`    | `object`
| `substitution.replacement.name`    | `object`
| `block.deflection`    | `object`
| `block.offensive`    | `object`
| `injury_stoppage.in_chain`    | `object`

For a full list of definitions, see the official documentation [[link](https://statsbomb.com/stat-definitions/)].
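
Rather than maintaining this table by hand, a table in the same `Feature | Data type` layout can be generated from a `DataFrame`'s dtypes, which could also fill the `ADD MARKDOWN TABLE OF DATA HERE` placeholders in the competitions and matches sections. A minimal sketch, where `dtypes_to_markdown` is a hypothetical helper rather than a pandas function:

```python
import pandas as pd

# Hypothetical helper: render column names and dtypes as a Markdown table
def dtypes_to_markdown(df):
    lines = ['| Feature | Data type |', '|------|-----|']
    for column, dtype in df.dtypes.items():
        lines.append(f'| `{column}` | `{dtype}` |')
    return '\n'.join(lines)

# Example on a tiny stand-in for the events DataFrame
df_example = pd.DataFrame({'id': ['0aa135b8'], 'minute': [0], 'duration': [9.813]})
md = dtypes_to_markdown(df_example)
print(md)
```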

##### Read in JSON files

In [49]:
# Show files in directory
print(glob.glob(os.path.join(data_dir_sb, 'raw', 'events/*')))

['../../data/sb/raw/events/events_male.csv']


Steps: 
*   Loop through the list of selected match IDs.
*   For each match, open the corresponding events JSON file, called `{match_id}.json`.
*   Read the JSON file and flatten it into a pandas DataFrame with `json_normalize`, adding a `match_id` column.
*   Append the DataFrame to a list.
*   Finally, concatenate all the separate DataFrames into one DataFrame and flatten any remaining nested columns.
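
The steps above can be sketched with in-memory data. The event dicts below are hand-written and heavily trimmed (real events carry many more keys). `assign` tags each match's rows with its `match_id`, mirroring `df_event_flat['match_id'] = match_id` in the cell below, and `ignore_index=True` is an optional tweak that avoids repeated index labels across matches.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the events/{match_id}.json files
events_by_match = {
    7581: [{'index': 1, 'type': {'id': 35, 'name': 'Starting XI'}},
           {'index': 2, 'type': {'id': 35, 'name': 'Starting XI'}}],
    7549: [{'index': 1, 'type': {'id': 35, 'name': 'Starting XI'}}],
}

# Flatten each match's events and tag every row with its match_id
dfs_events = [
    pd.json_normalize(events).assign(match_id=match_id)
    for match_id, events in events_by_match.items()
]

# ignore_index=True gives a fresh 0..n-1 index instead of repeating each match's own index
df_events = pd.concat(dfs_events, ignore_index=True)

print(df_events.shape)  # (3, 4)
print(df_events[['match_id', 'type.name']])
```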

In [50]:
# Read in exported CSV file if exists, if not, read in JSON file

##
if not os.path.exists(os.path.join(data_dir_sb, 'raw', 'events', 'events_wc2018.csv')):

    ### Create empty list for DataFrames
    dfs_events = []

    ### Loop through event files for the selected matches and append DataFrame to dfs_events list
    for match_id in lst_matches:
        
        #### Open the events JSON file for this match, flatten it, and tag every row with the match ID
        with open(os.path.join(data_dir_sb, 'open-data', 'data', 'events', f'{match_id}.json'), encoding='utf-8') as f:
            event = json.load(f)
        df_event_flat = json_normalize(event)
        df_event_flat['match_id'] = match_id
        dfs_events.append(df_event_flat)

    ### Concatenate DataFrames to one DataFrame
    df_events = pd.concat(dfs_events)
    
    ### Flatten the nested columns
    df_events_flat = flatten_nested_json_df(df_events)
    
    
##
else:    
    df_events_flat = pd.read_csv(os.path.join(data_dir_sb, 'raw', 'events', 'events_wc2018.csv'))
    
    
## Display DataFrame
df_events_flat.head()



original shape: (227886, 123)
original columns: Index(['level_0', 'id', 'index', 'period', 'timestamp', 'minute', 'second',
       'possession', 'duration', 'type.id',
       ...
       'injury_stoppage.in_chain', 'shot.one_on_one', 'block.save_block',
       'pass.miscommunication', 'bad_behaviour.card.id',
       'bad_behaviour.card.name', 'shot.open_goal', 'shot.deflected',
       'shot.redirect', 'shot.follows_dribble'],
      dtype='object', length=123)
lists: [], dicts: []
final shape: (227886, 123)
final columns: Index(['level_0', 'id', 'index', 'period', 'timestamp', 'minute', 'second',
       'possession', 'duration', 'type.id',
       ...
       'injury_stoppage.in_chain', 'shot.one_on_one', 'block.save_block',
       'pass.miscommunication', 'bad_behaviour.card.id',
       'bad_behaviour.card.name', 'shot.open_goal', 'shot.deflected',
       'shot.redirect', 'shot.follows_dribble'],
      dtype='object', length=123)


Unnamed: 0,level_0,id,index,period,timestamp,minute,second,possession,duration,type.id,type.name,possession_team.id,possession_team.name,play_pattern.id,play_pattern.name,team.id,team.name,tactics.formation,tactics.lineup,related_events,location,player.id,player.name,position.id,position.name,pass.recipient.id,pass.recipient.name,pass.length,pass.angle,pass.height.id,pass.height.name,pass.end_location,pass.body_part.id,pass.body_part.name,pass.type.id,pass.type.name,under_pressure,carry.end_location,pass.outcome.id,pass.outcome.name,pass.aerial_won,duel.type.id,duel.type.name,ball_receipt.outcome.id,ball_receipt.outcome.name,pass.switch,pass.assisted_shot_id,pass.goal_assist,shot.statsbomb_xg,shot.end_location,shot.key_pass_id,shot.outcome.id,shot.outcome.name,shot.body_part.id,shot.body_part.name,shot.type.id,shot.type.name,shot.technique.id,shot.technique.name,shot.freeze_frame,goalkeeper.outcome.id,goalkeeper.outcome.name,goalkeeper.body_part.id,goalkeeper.body_part.name,goalkeeper.type.id,goalkeeper.type.name,goalkeeper.position.id,goalkeeper.position.name,goalkeeper.technique.id,goalkeeper.technique.name,shot.first_time,counterpress,foul_committed.offensive,foul_won.defensive,pass.cross,goalkeeper.end_location,clearance.aerial_won,dribble.outcome.id,dribble.outcome.name,duel.outcome.id,duel.outcome.name,pass.deflected,block.offensive,block.deflection,dribble.overrun,pass.shot_assist,interception.outcome.id,interception.outcome.name,miscontrol.aerial_won,ball_recovery.recovery_failure,foul_committed.advantage,foul_won.advantage,dribble.nutmeg,shot.aerial_won,pass.backheel,50_50.outcome.id,50_50.outcome.name,ball_recovery.offensive,substitution.outcome.id,substitution.outcome.name,substitution.replacement.id,substitution.replacement.name,foul_committed.type.id,foul_committed.type.name,pass.through_ball,pass.technique.id,pass.technique.name,foul_committed.card.id,foul_committed.card.name,foul_committed.penalty,foul_won.penalty,match_id,pass.cut_back,injury_stoppage.in_chain,shot.one_on_one,block.save_block,pass.miscommunication,bad_behaviour.card.id,bad_behaviour.card.name,shot.open_goal,shot.deflected,shot.redirect,shot.follows_dribble
0,0,0aa135b8-37b4-4482-adc7-f02e85a19bec,1,1,00:00:00.000,0,0,1,0.0,35,Starting XI,785,Croatia,1,Regular Play,785,Croatia,4141.0,"[{'player': {'id': 3444, 'name': 'Danijel Suba...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,
1,1,086b7750-936b-4ffd-9a00-bfd72c7a0f26,2,1,00:00:00.000,0,0,1,0.0,35,Starting XI,785,Croatia,1,Regular Play,776,Denmark,4411.0,"[{'player': {'id': 3815, 'name': 'Kasper Schme...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,
2,2,53f0c3f9-129e-47b5-ba77-ae9d214df56f,3,1,00:00:00.000,0,0,1,0.0,18,Half Start,785,Croatia,1,Regular Play,785,Croatia,,,[49233ae2-594f-43c9-a58c-a6a0b8f99ee2],,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,
3,3,49233ae2-594f-43c9-a58c-a6a0b8f99ee2,4,1,00:00:00.000,0,0,1,9.813,18,Half Start,785,Croatia,1,Regular Play,776,Denmark,,,[53f0c3f9-129e-47b5-ba77-ae9d214df56f],,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,
4,4,ec5ba260-9bd3-4c5a-b7a5-9f9168ea905d,5,1,00:00:01.013,0,1,2,1.64,30,Pass,776,Denmark,9,From Kick Off,776,Denmark,,,[797a8169-c17f-4dbe-aa71-406cf0cf1bd6],"[61.0, 41.0]",3043.0,Christian Dannemann Eriksen,25.0,Secondary Striker,3027.0,Mathias Jattah-Njie Jørgensen,24.33105,-2.976444,1.0,Ground Pass,"[37.0, 37.0]",40.0,Right Foot,65.0,Kick Off,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,


In [51]:
df_events_flat.shape

(227886, 123)

##### Export DataFrame

In [52]:
# Export the flattened Events DataFrame as a CSV file, only if it has not already been exported
events_csv_path = os.path.join(data_dir_sb, 'raw', 'events', 'events_wc2018.csv')
if not os.path.exists(events_csv_path):
    df_events_flat.to_csv(events_csv_path, index=False, header=True)
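The "export only if the file does not already exist" pattern above is repeated for each dataset in this notebook. A minimal sketch of a reusable helper is below; the function name `export_csv_if_missing` and its `overwrite` parameter are hypothetical and not part of the original notebook. It also creates the target folder first, since writing a CSV into a non-existent directory raises an error.

```python
import os
import tempfile

import pandas as pd


def export_csv_if_missing(df: pd.DataFrame, path: str, overwrite: bool = False) -> bool:
    """Hypothetical helper: write df to path as CSV unless the file already exists.

    Returns True if a file was written, False if the export was skipped.
    """
    if os.path.exists(path) and not overwrite:
        # Keep the previously exported file untouched
        return False
    # Create the parent folder (e.g. .../raw/events) if it is missing
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    df.to_csv(path, index=False, header=True)
    return True


# Usage with a made-up toy DataFrame written to a temporary folder
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'raw', 'events', 'events_toy.csv')
    df_toy = pd.DataFrame({'match_id': [7581, 7581], 'type.name': ['Starting XI', 'Pass']})
    first_write = export_csv_if_missing(df_toy, csv_path)   # file is written
    second_write = export_csv_if_missing(df_toy, csv_path)  # skipped, file already exists
    round_trip = pd.read_csv(csv_path)
```

Returning a boolean makes it easy to log whether a cell actually wrote a file or reused an earlier export.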

### <a id='#section3.4'>3.4. Join the Datasets</a>
The final step of the data parsing is to join the `Matches` and `Competitions` DataFrames to the `Events` DataFrame. The `Events` data is the base DataFrame: the `Matches` data is joined to it on `match_id`, and the `Competitions` data is then joined on the combination of `competition.competition_id` and `season.season_id`.
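To illustrate how the two joins chain together, here is a small sketch using made-up toy DataFrames (the match ID `7582` and the reduced column sets are illustrative only). The `validate='many_to_one'` argument is an addition not used in the notebook code: it makes pandas raise a `MergeError` if the right-hand table contains duplicate keys, which would otherwise silently multiply the event rows.

```python
import pandas as pd

# Made-up stand-ins for the flattened Events, Matches and Competitions DataFrames
df_events = pd.DataFrame({
    'match_id': [7581, 7581, 7582],
    'type.name': ['Starting XI', 'Pass', 'Shot'],
})
df_matches = pd.DataFrame({
    'match_id': [7581, 7582],
    'competition.competition_id': [43, 43],
    'season.season_id': [3, 3],
})
df_competitions = pd.DataFrame({
    'competition_id': [43],
    'season_id': [3],
    'competition_name': ['FIFA World Cup'],
})

# Step 1: each event row picks up its match details via match_id
df_events_matches = pd.merge(df_events, df_matches, on='match_id', validate='many_to_one')

# Step 2: each row picks up its competition details via the composite (competition, season) key
df_combined = pd.merge(
    df_events_matches,
    df_competitions,
    left_on=['competition.competition_id', 'season.season_id'],
    right_on=['competition_id', 'season_id'],
    validate='many_to_one',
)
```

Because the key names differ between the two sides of the second join, pandas keeps both `competition.competition_id` and `competition_id` in the result, which matches the duplicated ID columns visible in the combined output below.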

In [53]:
# Read in the exported CSV file if it exists; if not, merge the individual DataFrames
combined_csv_path = os.path.join(data_dir_sb, 'raw', 'combined', 'combined_wc2018.csv')

if not os.path.exists(combined_csv_path):

    # Join the Matches DataFrame to the Events DataFrame on the shared 'match_id' key
    df_events_matches = pd.merge(df_events_flat, df_matches_flat, on='match_id')

    # Join the Competitions DataFrame to the Events-Matches DataFrame on the competition and season IDs
    df_events_matches_competitions = pd.merge(df_events_matches, df_competitions_flat,
                                              left_on=['competition.competition_id', 'season.season_id'],
                                              right_on=['competition_id', 'season_id'])

else:
    df_events_matches_competitions = pd.read_csv(combined_csv_path)

# Display the first five rows of the DataFrame
df_events_matches_competitions.head()

Unnamed: 0,level_0,id,index,period,timestamp,minute,second,possession,duration,type.id,type.name,possession_team.id,possession_team.name,play_pattern.id,play_pattern.name,team.id,team.name,tactics.formation,tactics.lineup,related_events,location,player.id,player.name,position.id,position.name,pass.recipient.id,pass.recipient.name,pass.length,pass.angle,pass.height.id,pass.height.name,pass.end_location,pass.body_part.id,pass.body_part.name,pass.type.id,pass.type.name,under_pressure,carry.end_location,pass.outcome.id,pass.outcome.name,pass.aerial_won,duel.type.id,duel.type.name,ball_receipt.outcome.id,ball_receipt.outcome.name,pass.switch,pass.assisted_shot_id,pass.goal_assist,shot.statsbomb_xg,shot.end_location,shot.key_pass_id,shot.outcome.id,shot.outcome.name,shot.body_part.id,shot.body_part.name,shot.type.id,shot.type.name,shot.technique.id,shot.technique.name,shot.freeze_frame,goalkeeper.outcome.id,goalkeeper.outcome.name,goalkeeper.body_part.id,goalkeeper.body_part.name,goalkeeper.type.id,goalkeeper.type.name,goalkeeper.position.id,goalkeeper.position.name,goalkeeper.technique.id,goalkeeper.technique.name,shot.first_time,counterpress,foul_committed.offensive,foul_won.defensive,pass.cross,goalkeeper.end_location,clearance.aerial_won,dribble.outcome.id,dribble.outcome.name,duel.outcome.id,duel.outcome.name,pass.deflected,block.offensive,block.deflection,dribble.overrun,pass.shot_assist,interception.outcome.id,interception.outcome.name,miscontrol.aerial_won,ball_recovery.recovery_failure,foul_committed.advantage,foul_won.advantage,dribble.nutmeg,shot.aerial_won,pass.backheel,50_50.outcome.id,50_50.outcome.name,ball_recovery.offensive,substitution.outcome.id,substitution.outcome.name,substitution.replacement.id,substitution.replacement.name,foul_committed.type.id,foul_committed.type.name,pass.through_ball,pass.technique.id,pass.technique.name,foul_committed.card.id,foul_committed.card.name,foul_committed.penalty,foul_won.penalty,match_id,pass.cut_back,injury_stoppage.in_chain,shot.one_on_one,block.save_block,pass.miscommunication,bad_behaviour.card.id,bad_behaviour.card.name,shot.open_goal,shot.deflected,shot.redirect,shot.follows_dribble,match_date,kick_off,home_score,away_score,match_status,last_updated,match_week,competition.competition_id,competition.country_name,competition.competition_name,season.season_id,season.season_name,home_team.home_team_id,home_team.home_team_name,home_team.home_team_gender,home_team.home_team_group,home_team.country.id,home_team.country.name,home_team.managers,away_team.away_team_id,away_team.away_team_name,away_team.away_team_gender,away_team.away_team_group,away_team.country.id,away_team.country.name,away_team.managers,metadata.data_version,competition_stage.id,competition_stage.name,stadium.id,stadium.name,stadium.country.id,stadium.country.name,referee.id,referee.name,referee.country.id,referee.country.name,competition_id,season_id,country_name,competition_name,competition_gender,season_name,match_updated,match_available
0,0,0aa135b8-37b4-4482-adc7-f02e85a19bec,1,1,00:00:00.000,0,0,1,0.0,35,Starting XI,785,Croatia,1,Regular Play,785,Croatia,4141.0,"[{'player': {'id': 3444, 'name': 'Danijel Suba...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,,2018-07-01,20:00:00.000,1,1,available,2020-07-29T05:00,4,43,International,FIFA World Cup,3,2018,785,Croatia,male,,56,Croatia,"[{'id': 307, 'name': 'Zlatko Dalić', 'nickname...",776,Denmark,male,,61,Denmark,"[{'id': 641, 'name': 'Åge Fridtjof Hareide', '...",1.0.2,33,Round of 16,4263.0,Stadion Nizhny Novgorod,188.0,Russia,730.0,N. Pitana,,,43,3,International,FIFA World Cup,male,2018,2020-10-25T14:03:50.263266,2020-10-25T14:03:50.263266
1,1,086b7750-936b-4ffd-9a00-bfd72c7a0f26,2,1,00:00:00.000,0,0,1,0.0,35,Starting XI,785,Croatia,1,Regular Play,776,Denmark,4411.0,"[{'player': {'id': 3815, 'name': 'Kasper Schme...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,,2018-07-01,20:00:00.000,1,1,available,2020-07-29T05:00,4,43,International,FIFA World Cup,3,2018,785,Croatia,male,,56,Croatia,"[{'id': 307, 'name': 'Zlatko Dalić', 'nickname...",776,Denmark,male,,61,Denmark,"[{'id': 641, 'name': 'Åge Fridtjof Hareide', '...",1.0.2,33,Round of 16,4263.0,Stadion Nizhny Novgorod,188.0,Russia,730.0,N. Pitana,,,43,3,International,FIFA World Cup,male,2018,2020-10-25T14:03:50.263266,2020-10-25T14:03:50.263266
2,2,53f0c3f9-129e-47b5-ba77-ae9d214df56f,3,1,00:00:00.000,0,0,1,0.0,18,Half Start,785,Croatia,1,Regular Play,785,Croatia,,,[49233ae2-594f-43c9-a58c-a6a0b8f99ee2],,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,,2018-07-01,20:00:00.000,1,1,available,2020-07-29T05:00,4,43,International,FIFA World Cup,3,2018,785,Croatia,male,,56,Croatia,"[{'id': 307, 'name': 'Zlatko Dalić', 'nickname...",776,Denmark,male,,61,Denmark,"[{'id': 641, 'name': 'Åge Fridtjof Hareide', '...",1.0.2,33,Round of 16,4263.0,Stadion Nizhny Novgorod,188.0,Russia,730.0,N. Pitana,,,43,3,International,FIFA World Cup,male,2018,2020-10-25T14:03:50.263266,2020-10-25T14:03:50.263266
3,3,49233ae2-594f-43c9-a58c-a6a0b8f99ee2,4,1,00:00:00.000,0,0,1,9.813,18,Half Start,785,Croatia,1,Regular Play,776,Denmark,,,[53f0c3f9-129e-47b5-ba77-ae9d214df56f],,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,,2018-07-01,20:00:00.000,1,1,available,2020-07-29T05:00,4,43,International,FIFA World Cup,3,2018,785,Croatia,male,,56,Croatia,"[{'id': 307, 'name': 'Zlatko Dalić', 'nickname...",776,Denmark,male,,61,Denmark,"[{'id': 641, 'name': 'Åge Fridtjof Hareide', '...",1.0.2,33,Round of 16,4263.0,Stadion Nizhny Novgorod,188.0,Russia,730.0,N. Pitana,,,43,3,International,FIFA World Cup,male,2018,2020-10-25T14:03:50.263266,2020-10-25T14:03:50.263266
4,4,ec5ba260-9bd3-4c5a-b7a5-9f9168ea905d,5,1,00:00:01.013,0,1,2,1.64,30,Pass,776,Denmark,9,From Kick Off,776,Denmark,,,[797a8169-c17f-4dbe-aa71-406cf0cf1bd6],"[61.0, 41.0]",3043.0,Christian Dannemann Eriksen,25.0,Secondary Striker,3027.0,Mathias Jattah-Njie Jørgensen,24.33105,-2.976444,1.0,Ground Pass,"[37.0, 37.0]",40.0,Right Foot,65.0,Kick Off,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7581,,,,,,,,,,,,2018-07-01,20:00:00.000,1,1,available,2020-07-29T05:00,4,43,International,FIFA World Cup,3,2018,785,Croatia,male,,56,Croatia,"[{'id': 307, 'name': 'Zlatko Dalić', 'nickname...",776,Denmark,male,,61,Denmark,"[{'id': 641, 'name': 'Åge Fridtjof Hareide', '...",1.0.2,33,Round of 16,4263.0,Stadion Nizhny Novgorod,188.0,Russia,730.0,N. Pitana,,,43,3,International,FIFA World Cup,male,2018,2020-10-25T14:03:50.263266,2020-10-25T14:03:50.263266


In [54]:
print('No. rows in Events DataFrame BEFORE join to Matches and Competitions DataFrames: {}'.format(len(df_events_flat)))
print('No. rows in DataFrame AFTER join: {}\n'.format(len(df_events_matches_competitions)))
print('-'*10+'\n')
print('Difference in rows before and after join: {}\n'.format(len(df_events_matches_competitions) - len(df_events_flat)))

No. rows in Events DataFrame BEFORE join to Matches and Competitions DataFrames: 227886
No. rows in DataFrame AFTER join: 227886

----------

Difference in rows before and after join: 0
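A zero difference means the row count is unchanged by the inner joins, although in principle dropped rows (events with no matching match) and duplicated rows (duplicate keys on the right-hand side) could cancel each other out. A complementary check, sketched below on made-up toy DataFrames and not part of the original notebook, uses `how='left'` with `indicator=True`: any event whose `match_id` has no counterpart in the Matches data is flagged as `left_only` instead of being silently dropped, while `validate='many_to_one'` rules out duplication.

```python
import pandas as pd

# Made-up toy data: match 9999 has events but no entry in the Matches table
df_events = pd.DataFrame({'match_id': [7581, 7581, 9999], 'type.name': ['Pass', 'Shot', 'Pass']})
df_matches = pd.DataFrame({'match_id': [7581], 'home_team.home_team_name': ['Croatia']})

# Left join keeps every event; the '_merge' column records whether a matching row was found
df_check = pd.merge(df_events, df_matches, on='match_id', how='left',
                    indicator=True, validate='many_to_one')

# Count events that found a match ('both') versus those that did not ('left_only')
merge_counts = df_check['_merge'].value_counts()
unmatched_match_ids = df_check.loc[df_check['_merge'] == 'left_only', 'match_id'].unique()
```

On the real data, an empty `unmatched_match_ids` together with a passing `validate` check would confirm that the inner join neither dropped nor duplicated any events.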



##### Export DataFrame

In [55]:
# Export the combined DataFrame as a CSV file, only if it has not already been exported
combined_csv_path = os.path.join(data_dir_sb, 'raw', 'combined', 'combined_wc2018.csv')
if not os.path.exists(combined_csv_path):
    df_events_matches_competitions.to_csv(combined_csv_path, index=False, header=True)

### <a id='#section3.4'>3.4. Initial Data Handling</a>
Let's check the quality of the combined dataset, first with an automated summary report and then with standard pandas methods such as [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html).

#### <a id='#section3.4.1'>3.4.1. Summary Report</a>
The first step of the data handling and Exploratory Data Analysis (EDA) is to create a quick summary report of the dataset using [pandas Profiling](https://github.com/pandas-profiling/pandas-profiling).

In [None]:
# Summary of the data using pandas Profiling Report
#pp.ProfileReport(df_events_matches_competitions)

#### <a id='#section3.4.2'>3.4.2. Further Inspection</a>
The following commands produce a more bespoke summary of the dataset. Some of them repeat information covered in the [pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) summary above, but use the standard [pandas](https://pandas.pydata.org/) functions and methods that most people will be more familiar with.

First, check the quality of the dataset by looking at the first and last rows using the pandas [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods.

In [None]:
# Display the first five rows of the DataFrame, df_events_matches_competitions
df_events_matches_competitions.head()

In [None]:
# Display the last five rows of the DataFrame, df_events_matches_competitions
df_events_matches_competitions.tail()

In [None]:
# Print the shape of the DataFrame, df_events_matches_competitions
print(df_events_matches_competitions.shape)

In [None]:
# Print the column names of the DataFrame, df_events_matches_competitions
print(df_events_matches_competitions.columns)

In [None]:
# Data types of the features of the raw DataFrame, df_events_matches_competitions
df_events_matches_competitions.dtypes

In [None]:
# Display the data types of all columns, without truncating the output
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df_events_matches_competitions.dtypes)

Full details of these attributes and their data types can be found in the [Data Dictionary](#section3.3.1).
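With well over a hundred columns, the full list is hard to scan. Because the flattening step joined nested JSON keys with a `.` (e.g. `pass.recipient.id`), one option, sketched here as a suggestion rather than a step in the original notebook, is to group the column names by their top-level prefix so that, for example, all `shot.*` attributes can be inspected together. The function name `group_columns_by_prefix` is hypothetical.

```python
from collections import defaultdict

import pandas as pd


def group_columns_by_prefix(columns) -> dict:
    """Hypothetical helper: map each prefix (text before the first '.') to its column names."""
    groups = defaultdict(list)
    for col in columns:
        prefix = col.split('.', 1)[0]
        groups[prefix].append(col)
    return dict(groups)


# Toy example using a handful of the column names from the combined DataFrame
df_toy = pd.DataFrame(columns=['match_id', 'pass.length', 'pass.height.name',
                               'shot.statsbomb_xg', 'shot.outcome.name', 'under_pressure'])
column_groups = group_columns_by_prefix(df_toy.columns)

# Just the shot attributes, in their original column order
shot_columns = column_groups['shot']
```

On the real data, `df_events_matches_competitions.filter(regex=r'^shot\.')` gives a similar view directly as a DataFrame; anchoring the pattern with `^` avoids also matching columns such as `pass.assisted_shot_id`.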

In [None]:
# Counts of missing values
null_value_stats = df_events_matches_competitions.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
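Because each event type only populates its own attributes (a Pass row has no `shot.*` values, for example), most columns are expected to be largely empty. Expressing the null counts as a percentage of rows, as in the sketch below on a made-up toy DataFrame, makes it easier to tell genuinely sparse, event-specific columns apart from unexpected gaps. The function name `missing_value_summary` is hypothetical.

```python
import numpy as np
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of nulls per column, for columns with any nulls."""
    null_counts = df.isnull().sum(axis=0)
    summary = pd.DataFrame({
        'null_count': null_counts,
        'null_pct': (null_counts / len(df) * 100).round(1),
    })
    # Keep only columns with missing values, most sparse first
    return summary[summary['null_count'] > 0].sort_values('null_count', ascending=False)


# Made-up toy events: three passes and one shot
df_toy = pd.DataFrame({
    'type.name': ['Pass', 'Pass', 'Shot', 'Pass'],
    'pass.length': [24.3, 10.0, np.nan, 5.2],
    'shot.statsbomb_xg': [np.nan, np.nan, 0.08, np.nan],
})
summary = missing_value_summary(df_toy)
```

Here `shot.statsbomb_xg` is missing for 75% of rows simply because only one of the four events is a shot, which is expected rather than a data-quality problem.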

## <a id='#section4'>4. Summary</a>
This notebook parses JSON data from the [StatsBomb Open Data GitHub repository](https://github.com/statsbomb/open-data) using [pandas](http://pandas.pydata.org/), to create several datasets for visualisation in [Tableau](https://public.tableau.com/profile/edd.webster).

## <a id='#section5'>5. Next Steps</a>
The next stage is to engineer this DataFrame into a form that can then be visualised in Tableau.
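One issue likely to come up at that stage: when the combined DataFrame is read back from `combined_wc2018.csv`, list-valued columns such as `location` and `pass.end_location` are loaded as strings like `"[61.0, 41.0]"` rather than Python lists. A minimal sketch, assuming those columns hold two-element `[x, y]` lists or are empty, parses them with `ast.literal_eval` and splits them into separate numeric columns that Tableau can plot. The helper name `split_xy` and the `_x`/`_y` suffixes are hypothetical; `shot.end_location` can carry a third (height) value and would need its own handling.

```python
import ast

import numpy as np
import pandas as pd


def split_xy(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: parse a stringified [x, y] column into numeric <col>_x and <col>_y columns."""
    df = df.copy()

    def parse(value):
        # Empty cells are read back as NaN (a float), so only parse strings
        if isinstance(value, str):
            return ast.literal_eval(value)
        return value

    parsed = df[col].map(parse)
    df[f'{col}_x'] = parsed.map(lambda v: v[0] if isinstance(v, (list, tuple)) else np.nan)
    df[f'{col}_y'] = parsed.map(lambda v: v[1] if isinstance(v, (list, tuple)) else np.nan)
    return df


# Toy example mimicking values read back from the CSV export
df_toy = pd.DataFrame({'location': ['[61.0, 41.0]', np.nan, '[37.0, 37.0]']})
df_toy = split_xy(df_toy, 'location')
```

`ast.literal_eval` is used instead of `eval` because it only accepts Python literals, so a malformed cell raises an error rather than executing arbitrary code.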

## <a id='#section5'>6. References</a>
*    [StatsBomb](https://statsbomb.com/) data
*    [StatsBomb](https://github.com/statsbomb/open-data/tree/master/data) open data GitHub repository

---

***Visit my website [eddwebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***

[Back to the top](#top)