<a id='top'></a>

# Second Spectrum Data Visualisation
##### Notebook to visualise and analyse engineered [Second Spectrum](https://www.secondspectrum.com/index.html) Tracking data using [pandas](http://pandas.pydata.org/) and [matplotlib](https://matplotlib.org/).

### By [Edd Webster](https://www.twitter.com/eddwebster)
Notebook first written: 20/01/2022<br>
Notebook last updated: 01/02/2022

![Pitch Control Screenshot](../../img/pitch_control_screenshot.png)

![Second Spectrum](../../img/logos/second_spectrum_logo.jpeg)

![Watford F.C.](../../img/club_badges/premier_league/watford_fc_logo_small.png)

Click [here](#section4) to jump straight into the Data Analysis section and skip the [Notebook Brief](#section2) and [Data Sources](#section3) sections.

___


## <a id='introduction'>Introduction</a>
This notebook analyses and visualises [Second Spectrum](https://www.secondspectrum.com/index.html) Tracking data for two matches, provided by [Watford F.C.](https://www.watfordfc.com/), using [pandas](http://pandas.pydata.org/) DataFrames for data manipulation and [matplotlib](https://matplotlib.org/) for data visualisation.

For more information about this notebook and the author, I am available through all the following channels:
*    [eddwebster.com](https://www.eddwebster.com/);
*    edd.j.webster@gmail.com;
*    [@eddwebster](https://www.twitter.com/eddwebster);
*    [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);
*    [github/eddwebster](https://github.com/eddwebster/); and
*    [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster).

A static version of this notebook can be found [here](https://nbviewer.org/github/eddwebster/statsbomb/blob/main/notebooks/StatsBomb%20Data%20Engineering.ipynb). This notebook has an accompanying [`watford`](https://github.com/eddwebster/watford) GitHub repository; for my full repository of football analysis, see my [`football_analytics`](https://github.com/eddwebster/football_analytics) GitHub repository.

___

## <a id='notebook_contents'>Notebook Contents</a>
1.    [Notebook Dependencies](#section1)<br>
2.    [Notebook Brief](#section2)<br>
3.    [Data Sources](#section3)<br>
      1.    [Introduction](#section3.1)<br>
      2.    [Data Dictionary](#section3.2)<br>
      3.    [Import the Data](#section3.3)<br>
      4.    [Initial Data Handling](#section3.4)<br>
4.    [Exploratory Data Analysis (EDA)](#section4)<br>
5.    [Data Analysis](#section5)<br>
      1.    [Crystal Palace (1) vs. (1) Brighton & Hove Albion (27/09/2021)](#section5.1)<br>
      2.    [Crystal Palace (2) vs. (2) Leicester City (03/10/2021)](#section5.2)<br>
6.    [Summary](#section6)<br>
7.    [Next Steps](#section7)<br>
8.    [References](#section8)<br>

___

<a id='section1'></a>

## <a id='#section1'>1. Notebook Dependencies</a>

This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:
*    [`Jupyter notebooks`](https://jupyter.org/) for the notebook environment in which this project is presented;
*    [`NumPy`](http://www.numpy.org/) for multidimensional array computing;
*    [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation; and 
*    [`matplotlib`](https://matplotlib.org/) for data visualisation.

All packages used for this notebook can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all major platforms (Windows, Linux and macOS). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/).

### Import Libraries and Modules

In [None]:
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv

# Import Dependencies
%matplotlib inline

# Math Operations
import numpy as np
import math
from math import pi

# Datetime
import datetime
from datetime import date
import time

# Data Preprocessing
import pandas as pd
import pandas_profiling as pp
import os
import re
import chardet
import random
from io import BytesIO
from pathlib import Path

# Kloppy
from kloppy import secondspectrum

# Reading Directories
import glob
import os

# Working with JSON
import json
from pandas import json_normalize

# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib.colors import LinearSegmentedColormap
from matplotlib import patches
import seaborn as sns
import missingno as msno
import moviepy.editor as mpy
from moviepy.video.io.bindings import mplfig_to_npimage

# Requests and downloads
import tqdm
import requests

# Machine Learning
import scipy.signal as signal
from scipy.spatial import Voronoi, voronoi_plot_2d, Delaunay

# Display in Jupyter
from IPython.display import Image, Video, YouTubeVideo
from IPython.core.display import HTML

# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

# Print message
print('Setup Complete')

In [None]:
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))

### Defined Filepaths

In [None]:
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..')
data_dir = os.path.join(base_dir, 'data')
data_dir_second_spectrum = os.path.join(base_dir, 'data', 'second_spectrum')
data_dir_opta = os.path.join(base_dir, 'data', 'opta')
scripts_dir = os.path.join(base_dir, 'scripts')
scripts_dir_second_spectrum = os.path.join(base_dir, 'scripts', 'second_spectrum')
scripts_dir_metrica_sports = os.path.join(base_dir, 'scripts', 'metrica_sports')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
fig_dir_second_spectrum = os.path.join(base_dir, 'img', 'fig', 'second_spectrum')
video_dir = os.path.join(base_dir, 'video')
video_dir_second_spectrum = os.path.join(base_dir, 'video', 'fig', 'second_spectrum')

### Defined Variables

In [None]:
# Define variables

## Define pitch dimensions
pitch_length = 106.0
pitch_width = 68

## Team colours
colour_crystal_palace = 'r'
colour_leicester_city = 'b'
colour_brighton_and_hove_albion = 'c'

#colour_crystal_palace = '#C4122E'
#colour_leicester_city = '#003090'
#colour_brighton_and_hove_albion = '#0057B8'

### Custom Libraries for Tracking Data
Custom libraries for working with the [Second Spectrum](https://www.secondspectrum.com/index.html) data. These are adapted from scripts originally written by [Laurie Shaw](https://twitter.com/EightyFivePoint) to work with the [Metrica Sports](https://metrica-sports.com/) data. See the following for his original code [[link](https://github.com/Friends-of-Tracking-Data-FoTD/LaurieOnTracking)].

The modifications to these scripts include the ability to create Pitch Control models from Tracking data alone, without Event data.

In [None]:
# Custom libraries for working with Tracking data

## Define path of scripts
sys.path.insert(0, os.path.abspath(scripts_dir_second_spectrum))

## Second Spectrum scripts - custom scripts derived from Laurie Shaw's Metrica scripts
import Second_Spectrum_IO as sio
import Second_Spectrum_Viz as sviz
import Second_Spectrum_Velocities as svel
import Second_Spectrum_PitchControl as spc
import Second_Spectrum_EPV as sepv

In [None]:
"""
## Laurie Shaw's custom libraries for working with Metrica Sports data
import Metrica_IO as mio
import Metrica_Viz as mviz
import Metrica_Velocities as mvel
import Metrica_PitchControl as mpc
import Metrica_EPV as mepv
"""

### Custom Functions
Functions built on top of the Second Spectrum scripts, designed for use in this notebook.

In [None]:
# Define function to estimate the frame number for a timestamp in the data
def estimate_frame_no(minutes, seconds, frame_rate=25):
    
    '''
    Function to estimate the frame number for a timestamp in the data.
    '''
    
    frame_no = (minutes * 60 * frame_rate) + (seconds * frame_rate)
    
    return frame_no
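
As a worked example of the arithmetic above: a minimal sketch, assuming the 25 Hz frame rate stated for this data and that frame 0 coincides with kick-off (an assumption; a real feed may carry an offset). The inverse helper `frame_to_timestamp` is hypothetical and is not part of the original scripts.

```python
def estimate_frame_no(minutes, seconds, frame_rate=25):
    """Estimate the frame number for a match-clock timestamp."""
    return (minutes * 60 + seconds) * frame_rate


def frame_to_timestamp(frame_no, frame_rate=25):
    """Hypothetical inverse: convert a frame number back to (minutes, seconds)."""
    total_seconds = frame_no / frame_rate
    minutes = int(total_seconds // 60)
    return minutes, total_seconds - minutes * 60


# 31 minutes * 60 seconds * 25 frames per second
frame = estimate_frame_no(31, 0)

# Convert a known frame of interest back to an approximate clock time
minutes, seconds = frame_to_timestamp(46_255)
print(frame, minutes, round(seconds, 1))
```

The estimate is only a starting point: the frame of interest usually has to be found by stepping a few hundred frames either side of the estimated value.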

In [None]:
# Define function to create Tracking data videos
def create_tracking_data_video(df_tracking_home,
                               df_tracking_away,
                               frame_start,
                               video_length_frames,
                               filename
                              ):
    
    """
    Function to create Tracking data videos.
    """
    
    ## Create MP4 video of tracking data
    if not os.path.exists(video_dir_second_spectrum + f'/{filename}_{frame_start}_{frame_start+video_length_frames}.mp4'):
        sviz.save_match_clip(hometeam=df_tracking_home.iloc[frame_start:frame_start+video_length_frames],
                             awayteam=df_tracking_away.iloc[frame_start:frame_start+video_length_frames],
                             fpath=video_dir_second_spectrum,
                             fname=f'/{filename}_{frame_start}_{frame_start+video_length_frames}',
                             frames_per_second=25,
                             team_colors=('r','b'),
                             field_dimen = (106.0,68.0),
                             include_player_velocities=False,
                             PlayerMarkerSize=10,
                             PlayerAlpha=0.7
                            )
    else:
        pass

In [None]:
# Define function to create Pitch Control visualisation for individual frames
def create_pitch_control_frame(frame_idx,
                               tracking_home,
                               tracking_away,
                               colour_home,
                               colour_away,
                               filename
                              ):
    
    """
    Function to create Pitch Control visualisation for individual frames
    """
    
    ## Define cmap
    cmap=LinearSegmentedColormap.from_list('mycmap', [colour_away, 'white', colour_home])
    
    ## Create PNG image of the Pitch Control frame (skipped if the file already exists)
    if not os.path.exists(fig_dir_second_spectrum + f'/pitch_control_frame_{filename}_{frame_idx}.png'):
        PPCF, xgrid, ygrid = spc.generate_pitch_control_for_event(frame_idx, tracking_home, tracking_home, tracking_away, params, GK_numbers, n_grid_cells_x=50)
        sviz.plot_pitchcontrol_for_event(frame_idx,
                                         tracking_home,
                                         tracking_home=tracking_home,
                                         tracking_away=tracking_away,
                                         PPCF=PPCF,
                                         cmap=cmap,
                                         include_player_velocities=False,
                                         annotate=True
                                        )
        
        plt.savefig(fig_dir_second_spectrum + f'/pitch_control_frame_{filename}_{frame_idx}.png', dpi=300, format='png', transparent=False, bbox_inches='tight')
            
    else:
        pass
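
The diverging colour map built inside `create_pitch_control_frame` can be inspected on its own. A minimal sketch, assuming the matplotlib short colour codes `'r'` and `'b'` for the home and away teams (illustrative values only): inputs near 0 map to the away colour, 0.5 to (almost) white and inputs near 1 to the home colour.

```python
from matplotlib.colors import LinearSegmentedColormap

# Illustrative team colours (assumed for this example)
colour_home, colour_away = 'r', 'b'

# Three anchor colours: away team -> white -> home team
cmap = LinearSegmentedColormap.from_list('mycmap', [colour_away, 'white', colour_home])

# Calling the colour map with a float in [0, 1] returns an RGBA tuple
away_rgba = cmap(0.0)
mid_rgba = cmap(0.5)
home_rgba = cmap(1.0)
print(away_rgba, mid_rgba, home_rgba)
```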

In [None]:
"""
# NOT WORKING PERFECTLY AT THE MOMENT, CODE USED BUT NOT IN A FUNCTION CURRENTLY

# Visualise player positions using generate_pitch_control_for_event function from spc library
def make_frame(t):
    t2 = int(math.ceil(t*f+0.0001)-1)
    #PPCF,xgrid,ygrid = spc.generate_pitch_control_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, params, GK_numbers, field_dimen = (pitch_length, pitch_width), n_grid_cells_x=50)
    #fig, ax = sviz.plot_pitchcontrol_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, PPCF, cmap, annotate=False)
    PPCF, xgrid, ygrid = spc.generate_pitch_control_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, params, GK_numbers, field_dimen = (pitch_length, pitch_width), n_grid_cells_x=50)
    fig, ax = sviz.plot_pitchcontrol_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, PPCF, cmap, include_player_velocities=False, annotate=True)
    image = mplfig_to_npimage(fig)
    return image    # returns a 8-bit RGB array


# Define function to create Pitch Control visualisation for individual frames
def create_pitch_control_video(starting_frame,
                               end_frame,
                               f,
                               tracking_home,
                               tracking_away,
                               colour_home,
                               colour_away,
                               pitch_length,
                               pitch_width,
                               filename
                              ):
    
"""
"""
    # Function to create Pitch Control visualisation for a video of frames
"""
"""
    
    ## Create colour map
    cmap = LinearSegmentedColormap.from_list('mycmap', [colour_away, 'white', colour_home])
    
    ## Create MP4 video of tracking data
    if not os.path.exists(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4'):
        clip = mpy.VideoClip(make_frame, duration=((end_frame-starting_frame)/f)).set_fps(f)
        clip.write_videofile(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4')
            
    else:
        pass
"""

In [None]:
# Define function to get the top speeds for a Tracking data DataFrame
def get_max_speeds(df):
    top_speeds = {}

    v_columns = [i for i in df.columns if '_speed' in i]

    for i in v_columns:
        p = i.split('_')[1]

        #max_ind = tracking_home[i].idxmax()    
        top_speeds['Player_' + p] = df[i].max()
    return top_speeds
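
The same result can be obtained without an explicit loop. The sketch below is an alternative, not the function used in this notebook: it runs on a made-up DataFrame whose column names follow the `Team_JerseyNo_attribute` pattern, and all values are invented for illustration.

```python
import pandas as pd

# Toy tracking data: two players, three frames (values are made up)
df_toy = pd.DataFrame({
    'Home_10_x': [0.0, 1.0, 2.0],
    'Home_10_speed': [3.1, 7.4, 6.9],
    'Home_7_speed': [2.0, 5.5, 8.2],
})

top_speeds = (
    df_toy.filter(like='_speed')                               # keep only the speed columns
          .max()                                               # column-wise maximum, NaNs are skipped
          .rename(lambda col: 'Player_' + col.split('_')[1])   # 'Home_10_speed' -> 'Player_10'
          .to_dict()
)
print(top_speeds)
```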

In [None]:
# Define a function to generate a bespoke physical summary of all the players for the team of interest
def create_physical_report(df_tracking,
                           team,
                           match,
                           filename
                          ):
    
    ## Data Engineering
    
    ### Drop player that did not play
    temp = df_tracking.dropna(axis=1, how='all')

    ### Create DataFrame to identify the jersey number of the players
    df_players = np.unique([ c.split('_')[1] for c in df_tracking.columns if c[:4] == team])

    ### Create DataFrame where each row is a player
    df_summary = pd.DataFrame(index=df_players)
    
    
    
    ## Calculate minutes played for each player

    ### Create empty list for minutes
    lst_minutes = []

    ### Cycle through each player's jersey number in the team and look for the first and last time for each player
    for player in df_players:
        
        #### Search for first and last frames that we have a position observation for each player (when a player is not on the pitch positions are NaN)
        column = f'{team}_' + player + '_x' # use player x-position coordinate
        try:
            player_minutes = (df_tracking[column].last_valid_index() - df_tracking[column].first_valid_index() + 1 ) / 25 / 60     # convert to minutes
        except (TypeError, KeyError):    # no valid position observations for this player
            player_minutes = 0
        lst_minutes.append(player_minutes)
        
    ### Create column for the minute played
    df_summary['Minutes Played'] = lst_minutes

    ### Sort values by minutes played descending
    df_summary = df_summary.sort_values(['Minutes Played'], ascending=False)


    
    ## Calculate total distance covered for each player

    ### Create empty list for distance
    lst_distance = []

    ### Cycle through each player's jersey number in the team and multiply their speed at each frame by 40ms (1/25 s) to get the total distance, then divide by 1,000 to convert to km
    for player in df_summary.index:
        column = f'{team}_' + player + '_speed'
        df_player_distance = df_tracking[column].sum()/25./1000    # speed times time (1/25 s per frame), converted to km
        lst_distance.append(df_player_distance)

    ### Create column for the distance in km
    df_summary['Distance [km]'] = lst_distance


    """    
    ## Make a simple bar chart of distance covered for each player
    plt.subplots()
    ax = df_summary['Distance [km]'].plot.bar(rot=0)
    ax.set_xlabel('Player')
    ax.set_ylabel('Distance covered [km]')
    """
    
    
    ## Calculate total distance covered for each player for different types of movement

    ### Create empty lists for distances of different movements
    lst_walking = []
    lst_jogging = []
    lst_running = []
    lst_sprinting = []

    ### Cycle through each player's jersey number in the team and sum the distance covered in each speed band
    for player in df_summary.index:
        column = f'{team}_' + player + '_speed'
        ### Walking (less than 2m/s)
        player_distance = df_tracking.loc[df_tracking[column] <2, column].sum()/25./1000
        lst_walking.append(player_distance)
        ### Jogging (between 2 and 4 m/s)
        player_distance = df_tracking.loc[(df_tracking[column] >= 2) & (df_tracking[column] < 4), column].sum()/25./1000
        lst_jogging.append(player_distance)
        ### Running (between 4 and 7 m/s)
        player_distance = df_tracking.loc[(df_tracking[column] >= 4) & (df_tracking[column] < 7), column].sum()/25./1000
        lst_running.append(player_distance)
        ### Sprinting (greater than 7 m/s)
        player_distance = df_tracking.loc[df_tracking[column] >= 7, column].sum()/25./1000
        lst_sprinting.append(player_distance)

    ### Assign each movement list to a column in the Summary DataFrame
    df_summary['Walking [km]'] = lst_walking
    df_summary['Jogging [km]'] = lst_jogging
    df_summary['Running [km]'] = lst_running
    df_summary['Sprinting [km]'] = lst_sprinting
    
    
    """    
    ## Make a clustered bar chart of distance covered for each player at each speed
    ax = df_summary[['Walking [km]', 'Jogging [km]', 'Running [km]', 'Sprinting [km]']].plot.bar(colormap='coolwarm')
    ax.set_xlabel('Player')
    ax.set_ylabel('Distance covered [km]')
    """    
    
    ## Reset index
    df_summary = df_summary.reset_index(drop=False)
   
    ## Rename columns
    df_summary = df_summary.rename(columns={'index': 'JerseyNo'})   
    
    
    ## Add columns for team and match
    df_summary['Team'] = team
    df_summary['Match'] = match
    
    
    
    ## Determine the number of sustained sprints per match

    ### Create an empty list for the number of sprints
    nsprints = []

    ### Define the sprint threshold and the minimum number of frames for which a sprint must be sustained
    sprint_threshold = 7 # minimum speed to be defined as a sprint {m/s}
    sprint_window = 1 * 25 # minimum duration of a sprint in frames (1 second at 25 Hz)

    ### Create a list of the individual players
    lst_players = df_summary['JerseyNo'].unique().tolist()

    ### Cycle through each player and count their sustained sprints
    for player in lst_players:
        column = f'{team}_' + player + '_speed'
        # trick here is to convolve speed with a window of size 'sprint_window', and find number of occasions that sprint was sustained for at least one window length
        # diff helps us to identify when the window starts
        player_sprints = np.diff(1 * (np.convolve(1 * (df_tracking[column] >= sprint_threshold), np.ones(sprint_window), mode='same') >= sprint_window))
        nsprints.append(np.sum(player_sprints == 1 ))

    ### Add column for the number of sprints
    df_summary['# sprints'] = nsprints
    
    
    
    ## Estimate the top speed of each player
    
    ### Create dictionaries of the top speeds
    dict_top_speeds = get_max_speeds(df_tracking)
    
    ### Convert the dictionary to a DataFrame with one row per player
    df_top_speeds = pd.DataFrame.from_dict(dict_top_speeds , orient='index', columns=['Top Speed'])
    df_top_speeds = df_top_speeds.reset_index(drop=False)
    df_top_speeds = df_top_speeds.rename(columns={'index': 'Player'})
    df_top_speeds['Player'] = df_top_speeds['Player'].str.replace('Player_', '')
    
    ### Join the top speeds to the summary DataFrame on jersey number
    df_summary = pd.merge(df_summary, df_top_speeds, left_on=['JerseyNo'], right_on=['Player'], how='left')
    
    
    
    """    
    ## Save figure
    if not os.path.exists(fig_dir_second_spectrum + f'/physical_summary_{filename}.png'):
        plt.savefig(fig_dir_second_spectrum + f'/physical_summary_{filename}.png', dpi=300, format='png', transparent=False, bbox_inches='tight')
    else:
        pass
    """    
    
    
    
    
    ## Export DataFrame
    if not os.path.exists(os.path.join(data_dir_second_spectrum, 'engineered', 'reports', f'physical_report_data_{filename}_{team}.csv')):
        df_summary.to_csv(os.path.join(data_dir_second_spectrum, 'engineered', 'reports', f'physical_report_data_{filename}_{team}.csv'), index=None, header=True)
    else:
        pass

    
    ## Return DataFrame
    return df_summary
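
The distance figures in `create_physical_report` come from summing speed multiplied by the frame duration (1/25 s). The sketch below reproduces that arithmetic on a made-up speed series, using `pd.cut` as an alternative to the four boolean masks. The toy values and variable names are illustrative assumptions, not taken from the match data; the band edges (2, 4 and 7 m/s) are those used in the function above.

```python
import numpy as np
import pandas as pd

frame_rate = 25    # frames per second

# Made-up speeds in m/s: 2 s walking, 1 s jogging, 1 s running, 1 s sprinting, then off the pitch (NaN)
speed = pd.Series([1.0] * 50 + [3.0] * 25 + [5.0] * 25 + [8.0] * 25 + [np.nan] * 10)

# Left-closed bins reproduce the '>= lower and < upper' comparisons used in the report
bands = pd.cut(speed,
               bins=[0, 2, 4, 7, np.inf],
               right=False,
               labels=['Walking', 'Jogging', 'Running', 'Sprinting'])

# Distance per band in metres: sum of speed multiplied by the time per frame
distance_m = speed.groupby(bands, observed=False).sum() / frame_rate

# Total distance in km, as in the 'Distance [km]' column
total_km = speed.sum() / frame_rate / 1000

print(distance_m)
print('Total [km]:', total_km)
```

Frames where the speed is NaN fall outside every band and are skipped by `sum()`, which matches how players who are off the pitch contribute nothing to the totals.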

In [None]:
# Define a function to plot the trajectories of sprints for a selected player
def visualise_sprints(df_tracking,
                      team,
                      player,
                      sprint_threshold,
                      sprint_window,
                      dot_colour,
                      line_colour
                     ):
    
    player = str(player)
    
    ## Plot the trajectories for the sprints of the player of interest
    column = f'{team}_' + player + '_speed' # speed
    column_x = f'{team}_' + player + '_x' # x position
    column_y = f'{team}_' + player + '_y' # y position

    ## same trick as before to find start and end indices of windows of size 'sprint_window' in which player speed was above the sprint_threshold
    player_sprints = np.diff(1 * (np.convolve(1 * (df_tracking[column] >= sprint_threshold), np.ones(sprint_window), mode='same') >= sprint_window))
    player_sprints_start = np.where(player_sprints == 1)[0] - int(sprint_window/2) + 1 # subtracting sprint_window/2 because of the way that the convolution is centred
    player_sprints_end = np.where(player_sprints == -1)[0] + int(sprint_window/2) + 1 # adding sprint_window/2 for the same reason

    ## Print frames in which the player started sprints
    #print(player_sprints_start)

    ## Print frames in which the player ended sprints
    #print(player_sprints_end)


    ## Now plot all the sprints
    fig, ax = sviz.plot_pitch()
    for s,e in zip(player_sprints_start, player_sprints_end):
        ax.plot(df_tracking[column_x].iloc[s], df_tracking[column_y].iloc[s], marker='o', linestyle='None', color=dot_colour)
        ax.plot(df_tracking[column_x].iloc[s:e+1], df_tracking[column_y].iloc[s:e+1], color=line_colour)

    ## Save figure
    #if not os.path.exists(fig_dir_signality + '/hammarby_if_elfsborg_22072019_sprints_player_2.png'):
    #    plt.savefig(fig_dir_signality + '/hammarby_if_elfsborg_22072019_sprints_player_2.png', dpi=300, format='png', transparent=False, bbox_inches='tight')
    #else:
    #    pass
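
The convolution trick used to detect sprints is easier to follow on a small array. A minimal sketch, assuming an odd window length and a toy speed array. The helper name `find_sustained_sprints` is hypothetical, and the 3-frame window is chosen only to keep the example short (the notebook uses 25 frames).

```python
import numpy as np


def find_sustained_sprints(speed, sprint_threshold=7.0, sprint_window=25):
    """Return (start, end) index pairs for runs at or above the threshold
    that last at least sprint_window frames (sprint_window assumed odd)."""
    above = (np.asarray(speed) >= sprint_threshold).astype(int)
    # Centred rolling count of frames above the threshold; a full count
    # means every frame in the window was above the threshold
    full = np.convolve(above, np.ones(sprint_window), mode='same') >= sprint_window
    # +1 marks the first full window of a sprint, -1 the frame after the last one
    edges = np.diff(full.astype(int))
    half = int(sprint_window / 2)
    starts = np.where(edges == 1)[0] - half + 1    # first frame above the threshold
    ends = np.where(edges == -1)[0] + half + 1     # first frame after the run
    return list(zip(starts.tolist(), ends.tolist()))


# Toy speeds in m/s: a 4-frame run, a 2-frame run (too short) and a 3-frame run
toy_speed = [0, 8, 8, 8, 8, 0, 0, 8, 8, 0, 8, 8, 8, 0]
sprints = find_sustained_sprints(toy_speed, sprint_threshold=7.0, sprint_window=3)
print(sprints)
```

A run that touches the very start or end of the array is not handled by this sketch, because the zero padding used by `mode='same'` prevents a full window from forming there.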

In [None]:
# Define a function to engineer Opta event data for just shots
def events_shots_only(df_events):
    
    ## Filter only for shots
    df_shots = df_events[df_events['isShot'] == 1]

    
    ## Select columns of interest
    
    ### Define list of columns 
    lst_cols_shots = ['event_id', 'period_id', 'min', 'sec', 'x', 'y', 'event_name', 'Home Team', 'Away Team', 'FirstName', 'LastName', 'FullName', 'Position', 'JerseyNo', 'outcome', 'isShot', 'isGoal', 'xG', 'xT']
    
    ### Filter shots DataFrame for only columns of interest
    df_shots = df_shots[lst_cols_shots]

    
    ## Return DataFrame
    return df_shots

### Notebook Settings

In [None]:
# Display all columns of displayed pandas DataFrames
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)
pd.options.mode.chained_assignment = None

---

<a id='section2'></a>

## <a id='#section2'>2. Notebook Brief</a>
This notebook analyses and visualises [Second Spectrum](https://www.secondspectrum.com/index.html) Tracking data using [pandas](http://pandas.pydata.org/) and [matplotlib](https://matplotlib.org/), for two Premier League matches featuring Crystal Palace during the 21/22 season.

The two datasets of Tracking data are visualised and analysed in conjunction with the corresponding [Opta Event data](https://www.statsperform.com/opta/) by [Stats Perform](https://www.statsperform.com/), as part of an opposition analysis of Crystal Palace. These matches are:
*   [27/09/2021: Crystal Palace (1) vs. (1) Brighton & Hove Albion](https://www.bbc.co.uk/sport/football/58620544) (g2210324)
*   [03/10/2021: Crystal Palace (2) vs. (2) Leicester City](https://www.bbc.co.uk/sport/football/58667896) (g2210334)

Outputs of this notebook include [matplotlib](https://matplotlib.org/) data visualisations and DataFrames exported as CSV files that can be further analysed and explored using dashboarding tools such as Tableau.


**Notebook Conventions**:<br>
*    Variables that refer to a `DataFrame` object are prefixed with `df_`.
*    Variables that refer to a collection of `DataFrame` objects (e.g., a list, a set or a dict) are prefixed with `dfs_`.

---

<a id='section3'></a>

## <a id='#section3'>3. Data Sources</a>

<a id='section3.1'></a>

### <a id='#section3.1'>3.1. Introduction</a>
[Second Spectrum](https://www.secondspectrum.com/index.html) are a football analytics and data provider ...

![Second Spectrum](../../img/logos/second_spectrum_logo.jpeg)

The tracking data represents the location of every player on the pitch with a temporal frequency of 25 Hz and the corresponding match time for each tracking frame is specified.

UPDATE THIS

<a id='section3.2'></a>

### <a id='#section3.2'>3.2. Data Dictionary</a>
The [Second Spectrum](https://www.secondspectrum.com/index.html) Tracking dataset has fourteen features (columns) with the following definitions and data types:

| Feature         | Data type     | Definition     |
|-----------------|---------------|----------------|
| `Frame`         | object        |                |
| `Period`        | object        |                |
| `Time [s]`      | object        |                |
| `Home_11_x`     | object        |                |
| `Home_11_y`     | object        |                |
| `Away_8_x`      | object        |                |
| `Away_8_y`      | object        |                |
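
The player columns appear to follow a `Team_JerseyNo_attribute` naming pattern (for example `Home_11_x`). The sketch below shows one way to split such names into their parts. The regular expression and the helper `parse_tracking_column` are assumptions made for illustration, based only on the example column names in this notebook.

```python
import re

# Pattern assumed from the example columns: <Team>_<JerseyNo>_<attribute>
COLUMN_PATTERN = re.compile(r'^(Home|Away)_(\d+)_(\w+)$')


def parse_tracking_column(column):
    """Split a player column name into team, jersey number and attribute.

    Returns None for columns that do not describe a player (e.g. 'Frame')."""
    match = COLUMN_PATTERN.match(column)
    if match is None:
        return None
    team, jersey_no, attribute = match.groups()
    return {'team': team, 'jersey_no': int(jersey_no), 'attribute': attribute}


for name in ['Home_11_x', 'Away_8_y', 'Home_10_speed', 'Time [s]']:
    print(name, '->', parse_tracking_column(name))
```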

UPDATE THIS

<a id='section3.3'></a>

### <a id='#section3.3'>3.3. Import Data</a>
The following cells read in the previously engineered `CSV` files as [pandas](https://pandas.pydata.org/) DataFrames. The Tracking data for Home and Away teams have been saved as separate `CSV` files, to be compatible with the Second Spectrum custom scripts.

A static version of the Data Engineering notebook that munges the original data into a form ready for analysis can be found [here](https://nbviewer.org/github/eddwebster/watford/blob/main/notebooks/2_data_engineering/Second%20Spectrum%20Data%20Engineering.ipynb) in the Data Engineering subfolder of the accompanying [`watford`](https://github.com/eddwebster/watford) GitHub repository.

In [None]:
# Show files in directory
print(glob.glob(os.path.join(data_dir_second_spectrum, 'engineered', 'data/*.csv')))

In [None]:
# Read in engineered Tracking data CSV files as  pandas DataFrames

## 27/09/2021: Crystal Palace (1) vs. (1) Brighton & Hove Albion (g2210324)
df_tracking_home_cry_bri = pd.read_csv(os.path.join(data_dir_second_spectrum, 'engineered', 'data', 'g2210324_SecondSpectrum_Trackingdata_Home.csv'))
df_tracking_away_cry_bri = pd.read_csv(os.path.join(data_dir_second_spectrum, 'engineered', 'data', 'g2210324_SecondSpectrum_Trackingdata_Away.csv'))

## 03/10/2021: Crystal Palace (2) vs. (2) Leicester City (g2210334)
df_tracking_home_cry_lei = pd.read_csv(os.path.join(data_dir_second_spectrum, 'engineered', 'data', 'g2210334_SecondSpectrum_Trackingdata_Home.csv'))
df_tracking_away_cry_lei = pd.read_csv(os.path.join(data_dir_second_spectrum, 'engineered', 'data', 'g2210334_SecondSpectrum_Trackingdata_Away.csv'))

In [None]:
# Read in engineered Event data CSV files as pandas DataFrames (2210324 = Brighton & Hove Albion match, 2210334 = Leicester City match)
df_cry_bri_events = pd.read_csv(os.path.join(data_dir_opta, 'engineered', 'F24', 'f24-8-2021-2210324-eventdetails.csv'), index_col=None).drop(['Unnamed: 0'], axis=1)
df_cry_lei_events = pd.read_csv(os.path.join(data_dir_opta, 'engineered', 'F24', 'f24-8-2021-2210334-eventdetails.csv'), index_col=None).drop(['Unnamed: 0'], axis=1)
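
Because the Tracking files share one naming pattern, the Home and Away paths for a match can be built from its match ID. A sketch only: `tracking_csv_paths` is a hypothetical helper that is not part of the notebook's scripts, and the folder layout is copied from the cells above.

```python
from pathlib import Path


def tracking_csv_paths(data_dir, match_id):
    """Build the Home and Away Tracking CSV paths for one match (hypothetical helper)."""
    folder = Path(data_dir) / 'engineered' / 'data'
    return {side: folder / f'{match_id}_SecondSpectrum_Trackingdata_{side}.csv'
            for side in ('Home', 'Away')}


# Relative data directory assumed to mirror data_dir_second_spectrum
paths = tracking_csv_paths(Path('..') / '..' / 'data' / 'second_spectrum', 'g2210324')
for side, path in paths.items():
    # path.exists() makes a missing file obvious before pd.read_csv is called
    print(side, path.name, path.exists())
```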

<a id='section3.4'></a>

### <a id='#section3.4'>3.4. Initial Data Handling</a>
First check the quality of the dataset by looking at the first and last rows in pandas using the [`head()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [`tail()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods.

In [None]:
# Display the first five rows of the DataFrame, df_tracking_home_cry_lei
df_tracking_home_cry_lei.head()

In [None]:
# Display the last five rows of the DataFrame, df_tracking_home_cry_lei
df_tracking_home_cry_lei.tail()

In [None]:
# Print the shape of the DataFrame, df_tracking_home_cry_lei
print(df_tracking_home_cry_lei.shape)

In [None]:
# Print the column names of the DataFrame, df_tracking_home_cry_lei
print(df_tracking_home_cry_lei.columns)

In [None]:
# Data types of the features of the raw DataFrame, df_tracking_home_cry_lei
df_tracking_home_cry_lei.dtypes

Full details of these attributes and their data types are discussed further in the [Data Dictionary](#section3.2).

In [None]:
# Displays all columns
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df_tracking_home_cry_lei.dtypes)

In [None]:
# Info for the raw DataFrame, df_tracking_home_cry_lei
df_tracking_home_cry_lei.info()

In [None]:
# Plot visualisation of the missing values for each feature of the raw DataFrame, df_tracking_home_cry_lei
msno.matrix(df_tracking_home_cry_lei, figsize = (30, 7))

In [None]:
# Counts of missing values
null_value_stats = df_tracking_home_cry_lei.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
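
In this Tracking data a missing position means the player was not on the pitch, so the pattern of missing values doubles as a record of substitutions. A minimal sketch on a made-up DataFrame (all values are invented for illustration) showing the share of missing frames per column and the first and last frames with a valid observation:

```python
import numpy as np
import pandas as pd

# Toy x-positions for three players over four frames (values are made up)
df_toy = pd.DataFrame({
    'Home_10_x': [1.0, 2.0, 3.0, 4.0],         # on the pitch throughout
    'Home_7_x': [1.0, 2.0, np.nan, np.nan],    # substituted off after frame 1
    'Home_22_x': [np.nan, np.nan, 5.0, 6.0],   # substituted on at frame 2
})

# Share of frames with no observation, per column
missing_share = df_toy.isnull().mean()

# First and last frame index with a valid observation, per column
on_pitch = {col: (df_toy[col].first_valid_index(), df_toy[col].last_valid_index())
            for col in df_toy.columns}

print(missing_share)
print(on_pitch)
```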

---

<a id='section4'></a>

## <a id='#section4'>4. Exploratory Data Analysis (EDA)</a>
The following EDA section explores the Tracking data for [Crystal Palace vs. Leicester City](https://www.bbc.co.uk/sport/football/58667896) (g2210334).

<a id='section4.1'></a>

### <a id='#section4.1'>4.1. Visualisation of Player Positions</a>

##### Kick Off
Frame = 20.

In [None]:
"""
# Visualise player positions using plot_frame function from sviz library
fig, ax = sviz.plot_frame(frame_idx=20,
                          hometeam=df_tracking_home_cry_lei,
                          awayteam=df_tracking_away_cry_lei,
                          team_colors=(colour_crystal_palace, colour_leicester_city),
                          field_dimen=(pitch_length, pitch_width),
                          PlayerMarkerSize=12,
                          PlayerAlpha=0.7, 
                          include_player_velocities=False,
                          annotate=True,
                          filepath=fig_dir_second_spectrum + f'/player_positions_'
                         )
"""

##### First Goal (Iheanacho, 31')
*    Timestamp = 31min
*    Frame = 46,255

In [None]:
# Estimate frame number from timestamp
estimate_frame_no(31, 0)

In [None]:
"""
# Visualise player positions using plot_frame function from sviz library
fig, ax = sviz.plot_frame(frame_idx=46_255,
                          hometeam=df_tracking_home_cry_lei,
                          awayteam=df_tracking_away_cry_lei,
                          team_colors=(colour_crystal_palace, colour_leicester_city),
                          field_dimen=(pitch_length, pitch_width),
                          PlayerMarkerSize=12,
                          PlayerAlpha=0.7, 
                          include_player_velocities=False,    # not working currently
                          annotate=True,
                          filepath=fig_dir_second_spectrum + f'/player_positions_'
                         )
"""

Iheanacho was 1-on-1 with the goalkeeper to score the first of the match.

<a id='section4.2'></a>

### <a id='#section4.2'>4.2. Visualisation of the First 60 Seconds of the Match</a>

In [None]:
# Plotting tracking data - the first 60 seconds (i.e. 60 x 25 = 1,500 frames)

## Define the start and end frames between which to track the players' movement
start_frame_idx = 0
end_frame_idx = 1_500

## Define the players' shirt numbers
gk = '13'
lb = '3'
lcb = '6'
rcb = '16'
rb = '2'

## Define the tracking DataFrames used in the plot below
tracking_home = df_tracking_home_cry_lei
tracking_away = df_tracking_away_cry_lei

## Draw the pitch using the plot_pitch function from the sviz library, then plot each player's path in its own colour
fig, ax = sviz.plot_pitch()
for shirt_no, colour in [(gk, 'b'), (lb, 'g'), (lcb, 'k'), (rcb, 'r'), (rb, 'c')]:
    ax.plot(tracking_home[f'Home_{shirt_no}_x'].iloc[start_frame_idx:end_frame_idx],
            tracking_home[f'Home_{shirt_no}_y'].iloc[start_frame_idx:end_frame_idx],
            colour, markersize=1)    # lowercase 'markersize': recent matplotlib versions reject 'MarkerSize'

## Save figure (skipped if the file already exists)
fig_path = fig_dir_second_spectrum + '/player_positions_backfive_first_60_seconds.png'
if not os.path.exists(fig_path):
    plt.savefig(fig_path, dpi=300, format='png', transparent=False, bbox_inches='tight')

<a id='section4.3'></a>

### <a id='#section4.3'>4.3. Visualisations of the Goals</a>

The first goal of the match was scored by Iheanacho (31').
*    Timestamp = 31min
*    Frame = 46,255

In [None]:
# Estimate frame number from timestamp
estimate_frame_no(31, 0)

In [None]:
create_tracking_data_video(df_tracking_home=df_tracking_home_cry_lei,
                           df_tracking_away=df_tracking_away_cry_lei,
                           frame_start=46_055,
                           video_length_frames=250,
                           filename='tracking_clip_first_goal_cry_lei'
                          )

In [None]:
# Embed shot in the notebook
#Video('../../video/fig/second_spectrum/tracking_clip_first_goal_cry_lei_46055_46305.mp4', width=770, height=530)

Crystal Palace lose the ball and Iheanacho scores a 1-on-1.

<a id='section4.4'></a>

### <a id='#section4.4'>4.4. Pitch Control</a>
The Tracking data can be used to build Pitch Control models in Python and, combined with Event data, to evaluate a player's passing options. But what is a Pitch Control model?

Definition:
**Pitch control at a given location is the probability that a player (or team) will gain control of the ball if it moves directly to that location.**

Pitch control measures the probability that a team will retain possession of the ball if they pass it to another location on the field. It can be used to evaluate passing options for a player, and quantify the probability of success.

The method described here is based on work by [William Spearman](https://twitter.com/the_spearman), as described in his Friends of Tracking video tutorial [[link](https://www.youtube.com/watch?v=X9PrwPyolyU)], which is also embedded below.

Also see Spearman's paper "Beyond Expected Goals" published at the 2018 MIT Sloan Sports Analytics Conference [[link](http://www.sloansportsconference.com/wp-content/uploads/2018/02/2002.pdf)].
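
To make the definition above more concrete, the following is a deliberately simplified sketch and not Spearman's model or the `spc` implementation used in this notebook. It assumes each player reacts for a fixed time and then runs in a straight line at a fixed top speed, ignores current velocities and ball travel time, and turns the difference in arrival times of the nearest player from each team into a probability with a logistic function. The function names and the parameter values (`max_speed`, `reaction_time`, `sharpness`) are illustrative assumptions, not values read from `params`.

```python
import math


def time_to_reach(player_xy, target_xy, max_speed=5.0, reaction_time=0.7):
    """Seconds to reach the target: a reaction delay, then a straight run at max speed."""
    distance = math.dist(player_xy, target_xy)
    return reaction_time + distance / max_speed


def simple_pitch_control(target_xy, home_players, away_players, sharpness=1.0):
    """Value in [0, 1]: how likely the home team is to control target_xy (toy model)."""
    t_home = min(time_to_reach(p, target_xy) for p in home_players)
    t_away = min(time_to_reach(p, target_xy) for p in away_players)
    # Logistic function of the arrival-time advantage: 0.5 when both teams arrive together
    return 1.0 / (1.0 + math.exp(-sharpness * (t_away - t_home)))


# Toy example: positions in metres, origin at the centre spot
home = [(1.0, 0.0), (-20.0, 5.0)]
away = [(10.0, 0.0), (15.0, -5.0)]
control_at_centre = simple_pitch_control((0.0, 0.0), home, away)
```

Because the nearest home player is much closer to the centre spot than any away player, `control_at_centre` is above 0.5. Evaluating the same function over a grid of target locations gives a surface analogous to, but far cruder than, the one produced later in this section.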

In the video below, Spearman explains the following for Pitch Control:
*    The principles behind pitch control models.
*    How they can be used to investigate player positioning.
*    How to extend them to account for ball motion.
*    How to combine pitch control models with measures of danger.
*    Defining 'off-ball scoring opportunity'.
*    Extensions to pitch control.

In [None]:
# Liverpool FC data scientist William Spearman's masterclass in pitch control
YouTubeVideo('X9PrwPyolyU', width=800, height=470)

In [None]:
# Determine the Pitch Control model parameters
params = spc.default_model_params()

In [None]:
# Print model parameters
params

In [None]:
# Define the numbers of the goalkeepers
GK_numbers = [13, 1]

In [None]:
# Find goalkeeper numbers using the 'find_goalkeeper' function from the sio library - required for the 'generate_pitch_control_for_event' function from the spc library
#GK_numbers = [sio.find_goalkeeper(df_tracking_home), sio.find_goalkeeper(df_tracking_away)]

##### Of a single frame

In [None]:
create_pitch_control_frame(frame_idx=46_255,
                           tracking_home=df_tracking_home_cry_lei,
                           tracking_away=df_tracking_away_cry_lei,
                           colour_home=colour_crystal_palace,
                           colour_away=colour_leicester_city,
                           filename='first_goal_cry_lei'
                          )

##### For a video

In [None]:
"""
create_pitch_control_video(starting_frame=46_055-200,
                           end_frame=46_055+200,
                           f=25,
                           tracking_home = df_tracking_home_cry_lei,
                           tracking_away = df_tracking_away_cry_lei,
                           colour_home=colour_crystal_palace,
                           colour_away=colour_leicester_city,
                           pitch_length=100.88880157470703,
                           pitch_width=67.97039794921875,
                           filename='first_goal_cry_lei'
                          )
"""

In [None]:
# Embed shot in the notebook
#Video('../../video/fig/second_spectrum/pitch_control_clip_first_goal_cry_lei_46255_46257.mp4', width=770, height=530, embed=True)

<a id='section4.5'></a>

### <a id='#section4.5'>4.5. Measuring the Physical Performance of Players</a>
Generate a bespoke physical summary of all the players for the team of interest.

In [None]:
df_physical_summary_home_cry_bri = create_physical_report(df_tracking=df_tracking_home_cry_bri,
                                                          team='Home',    # 'Home' or 'Away'
                                                          match='Crystal Palace (1) vs. (1) Brighton & Hove Albion (27/09/2021)',
                                                          filename='crystal_palace_brighton_and_hove_albion'
                                                         )

In [None]:
df_physical_summary_home_cry_bri.head()

As can probably be expected, it's the midfielders and forwards that do more sprinting.
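
The `create_physical_report` function is defined earlier in the notebook. As a hedged illustration of the kind of calculation behind such a summary, the sketch below derives per-frame speed, total distance and distance covered at sprinting speed from one player's x and y positions. It assumes positions in metres sampled at 25 frames per second with no missing values, and a 7 m/s sprint threshold (the value passed to `visualise_sprints` in this notebook). The function names are hypothetical.

```python
import math

FPS = 25  # assumed frames per second


def frame_speeds(xs, ys, fps=FPS):
    """Speed in m/s between each pair of consecutive frames."""
    return [math.hypot(x1 - x0, y1 - y0) * fps
            for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])]


def distance_covered_km(xs, ys, fps=FPS):
    """Total path length in kilometres."""
    return sum(frame_speeds(xs, ys, fps)) / fps / 1000


def distance_above_km(xs, ys, speed_threshold=7.0, fps=FPS):
    """Distance in kilometres covered while moving at or above speed_threshold (m/s)."""
    fast = [v for v in frame_speeds(xs, ys, fps) if v >= speed_threshold]
    return sum(fast) / fps / 1000


# Toy example: two seconds jogging at 5 m/s along the x axis
jog_xs = [0.2 * i for i in range(51)]
jog_ys = [0.0] * 51
jog_km = distance_covered_km(jog_xs, jog_ys)
jog_sprint_km = distance_above_km(jog_xs, jog_ys)
```

In the toy example the player covers about 10 metres (`jog_km` is roughly 0.01) and none of it counts as sprinting.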

<a id='section4.6'></a>

### <a id='#section4.6'>4.6. Plot the trajectories for a selected player's sprint</a> 

In [None]:
visualise_sprints(df_tracking=df_tracking_home_cry_bri,
                  team='Home',
                  player=11,
                  sprint_threshold=7,
                  sprint_window=1*25
                 )
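
The `sprint_threshold` and `sprint_window` arguments above suggest that a sprint is a run in which speed stays above 7 m/s for at least one second (25 frames). The sketch below shows one way such runs could be detected from a list of per-frame speeds. It is an assumption about the logic, not the code inside `visualise_sprints`, and the name `find_sprints` is hypothetical.

```python
def find_sprints(speeds, sprint_threshold=7.0, sprint_window=25):
    """Return (start, end) frame index pairs, end inclusive, for every run in which
    speed stays at or above sprint_threshold for at least sprint_window frames."""
    sprints = []
    run_start = None
    for i, speed in enumerate(speeds):
        if speed >= sprint_threshold:
            if run_start is None:
                run_start = i  # a fast run begins here
        else:
            if run_start is not None and i - run_start >= sprint_window:
                sprints.append((run_start, i - 1))
            run_start = None
    # Handle a run that is still in progress at the last frame
    if run_start is not None and len(speeds) - run_start >= sprint_window:
        sprints.append((run_start, len(speeds) - 1))
    return sprints


# Toy example: a 30-frame burst qualifies, a 10-frame burst does not
toy_speeds = [3.0] * 10 + [8.0] * 30 + [3.0] * 5 + [8.0] * 10 + [3.0] * 5
toy_sprints = find_sprints(toy_speeds)
```

Here `toy_sprints` is `[(10, 39)]`: only the burst lasting longer than one second is kept.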

<a id='section4.7'></a>

### <a id='#section4.7'>4.7. Valuing Player Actions through an Expected Possession Value (EPV) model</a>

##### Define the attacking direction

In [None]:
#home_attack_direction = sio.find_playing_direction (df_tracking_home_cry_lei, 'Home')

# shooting right-to-left = -1, shooting left-to-right = 1
home_attack_direction = 1

##### Get the EPV surface
The `plot_EPV` function from the `sviz` library takes the attacking direction as an argument; this can be determined with the `find_playing_direction` function, initialised in the Data Engineering section.

In [None]:
EPV = sepv.load_EPV_grid(os.path.join(data_dir, 'reference', 'epv', 'EPV_grid.csv'))

In [None]:
EPV

In [None]:
sviz.plot_EPV(EPV, field_dimen=[106.0, 68], attack_direction=home_attack_direction)
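
To show how an EPV surface and the attacking direction fit together, the sketch below looks up the EPV value for a pitch position in a small toy grid. It is a hypothetical helper, not the `sepv` implementation. It assumes the grid is stored as rows across the pitch width and columns along the pitch length, for a team attacking left-to-right, with positions in metres measured from the centre spot, and that a team attacking right-to-left is handled by mirroring the x coordinate.

```python
def epv_at_location(position, epv_grid, attack_direction, field_dimen=(106.0, 68.0)):
    """EPV of the grid cell containing `position` (hypothetical helper).

    attack_direction: 1 for left-to-right, -1 for right-to-left.
    Positions outside the pitch return 0.0.
    """
    x, y = position
    half_length, half_width = field_dimen[0] / 2, field_dimen[1] / 2
    if abs(x) > half_length or abs(y) > half_width:
        return 0.0
    if attack_direction == -1:
        x = -x  # mirror so the lookup uses the left-to-right grid
    n_rows, n_cols = len(epv_grid), len(epv_grid[0])
    # Convert metres to grid indices, clamping the far edge into the last cell
    col = min(int((x + half_length) / field_dimen[0] * n_cols), n_cols - 1)
    row = min(int((y + half_width) / field_dimen[1] * n_rows), n_rows - 1)
    return epv_grid[row][col]


# Toy 2 x 4 grid whose value rises towards the goal being attacked (right-hand side)
toy_epv = [[0.01, 0.02, 0.05, 0.20],
           [0.01, 0.02, 0.05, 0.20]]
near_goal = epv_at_location((50.0, 0.0), toy_epv, attack_direction=1)
own_half = epv_at_location((50.0, 0.0), toy_epv, attack_direction=-1)
```

The same position is worth 0.20 to a team attacking left-to-right (`near_goal`) but only 0.01 to a team attacking the other way (`own_half`), which is why the attacking direction has to be set before the surface is plotted or queried.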

In [None]:
"""
# Calculate value-added for assist and plot expected value surface

### Define DataFrames
df_events = df_tracking_home_cry_lei
df_tracking_home = df_tracking_home_cry_lei
df_tracking_away = df_tracking_away_cry_lei

## Define Event Number
event_number = 822 # away team first goal


## Calculate EEPV added and the EPV difference from the Pitch Control using the 'calculate_epv_added' function
EEPV_added, EPV_diff = sepv.calculate_epv_added(event_number, df_events, df_tracking_home, df_tracking_away, GK_numbers, EPV, params)


## Calculate the full Pitch Control surface at the moment the pass is made and multiply this by the EPV surface at that instance
PPCF, xgrid, ygrid = spc.generate_pitch_control_for_event(event_number, df_events, df_tracking_home, df_tracking_away, params, GK_numbers, field_dimen = (106.,68.,), n_grid_cells_x = 50, offsides=True)


## Create figures for Event

### Visualise EPV added
fig,ax = sviz.plot_EPV_for_event(event_number, df_events, df_tracking_home, df_tracking_away, PPCF, EPV, annotate=True, autoscale=True)
fig.suptitle('Pass EPV added: %1.3f' % EEPV_added, y=0.95 )

### Visualise Pitch Control
sviz.plot_pitchcontrol_for_event(event_number, df_events, df_tracking_home, df_tracking_away, PPCF, annotate=True)
"""

In [None]:
# FINISH THIS

---

<a id='section5'></a>

## <a id='#section5'>5. Data Analysis</a>

<a id='section5.1'></a>

### <a id='#section5.1'>5.1. Crystal Palace (1) vs. (1) Brighton & Hove Albion (27/09/2021)</a>

##### Match Highlights

In [None]:
# Crystal Palace 1-1 Brighton & Hove Albion

## Locally saved video (can be displayed in the notebook)
Video('../../video/match_highlights/27092021 - Crystal Palace (1) vs. (1) Brighton & Hove Albion.mp4', width=770, height=530, embed=True)

# YouTube video (blocked from display in Jupyter notebook)
#YouTubeVideo('SFcyhpx0tww', width=800, height=470)

##### Visualisation of Attacking Chances

In [None]:
# Filter Events DataFrame using the custom 'events_shots_only' function
df_cry_bri_shots = events_shots_only(df_cry_bri_events)

In [None]:
df_cry_bri_shots.head(50)

##### First Goal (Zaha 46' minutes)
*    Timestamp = 46m
*    Frame = 

##### Second Goal (Maupay 95' minutes)
*    Timestamp = 95m
*    Frame = 

##### Visualise Sprints of Players of Interest

In [None]:
# Wilfried Zaha
visualise_sprints(df_tracking=df_tracking_home_cry_bri,
                  team='Home',
                  player=11,    # Wilfried Zaha
                  sprint_threshold=7,
                  sprint_window=1*25,
                  dot_colour='ro',
                  line_colour='r'
                 )

In [None]:
# Conor Gallagher
visualise_sprints(df_tracking=df_tracking_home_cry_bri,
                  team='Home',
                  player=23,    # Conor Gallagher
                  sprint_threshold=7,
                  sprint_window=1*25,
                  dot_colour='ro',
                  line_colour='r'
                 )

##### Physical Performance

In [None]:
df_physical_summary_home_cry_bri = create_physical_report(df_tracking=df_tracking_home_cry_bri,
                                                          team='Home',    # 'Home' or 'Away'
                                                          match='Crystal Palace (1) vs. (1) Brighton & Hove Albion (27/09/2021)',
                                                          filename='crystal_palace_brighton_and_hove_albion'
                                                         )

In [None]:
df_physical_summary_home_cry_bri.head()

<a id='section5.2'></a>

### <a id='#section5.2'>5.2. Crystal Palace (2) vs. (2) Leicester City (03/10/2021)</a>

##### Match Highlights

In [None]:
# Crystal Palace 2-2 Leicester City

## Locally saved video (can be displayed in the notebook)
Video('../../video/match_highlights/27092021 - Crystal Palace (1) vs. (1) Brighton & Hove Albion.mp4', width=770, height=530, embed=True)

# YouTube video (blocked from display in Jupyter notebook)
#YouTubeVideo('Blkcd_N66PA', width=800, height=470)

Olise & Schlupp both came off the bench to help Palace get a point from 2-0 down, to finish 2-2.

Goals:
*   Iheanacho - 31' mins (Leicester)
*   Vardy - 36' mins (Leicester)
*   Olise - 60' mins (Crystal Palace)
*   Schlupp - 71' mins (Crystal Palace)

##### Visualisation of Attacking Chances

Determine when the shots take place in the game using Event data:

In [None]:
# Filter Events DataFrame using the custom 'events_shots_only' function
df_cry_lei_shots = events_shots_only(df_cry_lei_events)

In [None]:
df_cry_lei_shots.head(50)

##### First Goal (Iheanacho 31' minutes)
*    Timestamp = 30m51s
*    Frame = 46,255

In [None]:
# Estimate frame number from timestamp
estimate_frame_no(30, 51)

In [None]:
create_tracking_data_video(df_tracking_home=df_tracking_home_cry_lei,
                           df_tracking_away=df_tracking_away_cry_lei,
                           frame_start=46_055,
                           video_length_frames=250,
                           filename='tracking_clip_first_goal_cry_lei'
                          )

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/tracking_clip_first_goal_cry_lei_46055_46305.mp4', width=770, height=530, embed=True)

In [None]:
# Define the numbers of the goalkeepers
GK_numbers = [13, 1]

In [None]:
## Define variables used in 'generate_pitch_control' function
frame_idx = 46_055
starting_frame = 46_055
end_frame=46_055+250
f=25
tracking_home = df_tracking_home_cry_lei
tracking_away = df_tracking_away_cry_lei
colour_home=colour_crystal_palace
colour_away=colour_leicester_city
cmap = LinearSegmentedColormap.from_list('mycmap', [colour_away, 'white', colour_home])
pitch_length=100.88880157470703
pitch_width=67.97039794921875
filename='first_goal_cry_lei'

## Visualise player positions using generate_pitch_control_for_event function from spc library
def make_frame(t):
    t2 = int(math.ceil(t*f+0.0001)-1)
    PPCF, xgrid, ygrid = spc.generate_pitch_control_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, params, GK_numbers, field_dimen = (pitch_length, pitch_width), n_grid_cells_x=50)
    fig, ax = sviz.plot_pitchcontrol_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, PPCF, cmap, include_player_velocities=False, annotate=True)
    image = mplfig_to_npimage(fig)
    plt.close(fig)    # close the figure so memory does not grow with every frame
    return image    # returns an 8-bit RGB array

## Create MP4 video of tracking data
if not os.path.exists(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4'):
    clip = mpy.VideoClip(make_frame, duration=((end_frame-starting_frame)/f)).set_fps(f)
    clip.write_videofile(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4')

else:
    pass
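
The `make_frame` function above receives the clip time `t` in seconds and has to translate it into a tracking frame index. The sketch below isolates that arithmetic so it can be checked on its own; it reuses the expression from the cell above, while the helper names `clip_time_to_frame` and `clip_filename` are hypothetical. The small `0.0001` offset guards against floating-point rounding when `t * fps` lands just below a whole number.

```python
import math


def clip_time_to_frame(t, starting_frame, fps=25):
    """Map a clip time in seconds to an absolute tracking frame index."""
    offset = int(math.ceil(t * fps + 0.0001) - 1)
    return starting_frame + offset


def clip_filename(filename, starting_frame, end_frame):
    """File name following the naming pattern used for the pitch control clips."""
    return f'pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4'


# Example for a 250-frame clip starting at frame 46,055
clip_start, clip_end, clip_fps = 46_055, 46_055 + 250, 25
clip_duration = (clip_end - clip_start) / clip_fps
clip_frames = [clip_time_to_frame(k / clip_fps, clip_start, clip_fps) for k in range(250)]
clip_name = clip_filename('first_goal_cry_lei', clip_start, clip_end)
```

The clip lasts 10 seconds, its frames run consecutively from 46,055 to 46,304, and `clip_name` matches the file embedded in the next cell.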

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/pitch_control_clip_first_goal_cry_lei_46055_46305.mp4', width=770, height=530, embed=True)

##### Second Goal (Jamie Vardy 37' minutes)
*    Timestamp = 36m54s
*    Frame = 55,375

In [None]:
# Estimate frame number from timestamp
estimate_frame_no(36, 55)

In [None]:
55375-250

In [None]:
create_tracking_data_video(df_tracking_home=df_tracking_home_cry_lei,
                           df_tracking_away=df_tracking_away_cry_lei,
                           frame_start=55125,
                           video_length_frames=250,
                           filename='tracking_clip_second_goal_cry_lei'
                          )

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/tracking_clip_second_goal_cry_lei_55125_55375.mp4', width=770, height=530, embed=True)

In [None]:
## Define variables used in 'generate_pitch_control' function
frame_idx=55_125
starting_frame=55_125
end_frame=55_125+250
f=25
tracking_home = df_tracking_home_cry_lei
tracking_away = df_tracking_away_cry_lei
colour_home=colour_crystal_palace
colour_away=colour_leicester_city
cmap = LinearSegmentedColormap.from_list('mycmap', [colour_away, 'white', colour_home])
pitch_length=100.88880157470703
pitch_width=67.97039794921875
filename='second_goal_cry_lei'

## Visualise player positions using generate_pitch_control_for_event function from spc library
def make_frame(t):
    t2 = int(math.ceil(t*f+0.0001)-1)
    PPCF, xgrid, ygrid = spc.generate_pitch_control_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, params, GK_numbers, field_dimen = (pitch_length, pitch_width), n_grid_cells_x=50)
    fig, ax = sviz.plot_pitchcontrol_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, PPCF, cmap, include_player_velocities=False, annotate=True)
    image = mplfig_to_npimage(fig)
    plt.close(fig)    # close the figure so memory does not grow with every frame
    return image    # returns an 8-bit RGB array

## Create MP4 video of tracking data
if not os.path.exists(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4'):
    clip = mpy.VideoClip(make_frame, duration=((end_frame-starting_frame)/f)).set_fps(f)
    clip.write_videofile(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4')

else:
    pass

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/pitch_control_clip_second_goal_cry_lei_55125_55375.mp4', width=770, height=530, embed=True)

##### Third Goal (Olise 61' minutes)
*    Timestamp = 60m48s
*    Frame = 91,200

In [None]:
# Estimate frame number from timestamp
estimate_frame_no(60, 48)

In [None]:
91200+2825

In [None]:
92000 + (81*25)

In [None]:
create_tracking_data_video(df_tracking_home=df_tracking_home_cry_lei,
                           df_tracking_away=df_tracking_away_cry_lei,
                           frame_start=93980,
                           video_length_frames=300,
                           filename='tracking_clip_third_goal_cry_lei'
                          )

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/tracking_clip_third_goal_cry_lei_93980_94280.mp4', width=770, height=530, embed=True)

In [None]:
## Define variables used in 'generate_pitch_control' function
frame_idx=93_980
starting_frame=93_980
end_frame=93_980+300
f=25
tracking_home = df_tracking_home_cry_lei
tracking_away = df_tracking_away_cry_lei
colour_home=colour_crystal_palace
colour_away=colour_leicester_city
cmap = LinearSegmentedColormap.from_list('mycmap', [colour_away, 'white', colour_home])
pitch_length=100.88880157470703
pitch_width=67.97039794921875
filename='third_goal_cry_lei'

## Visualise player positions using generate_pitch_control_for_event function from spc library
def make_frame(t):
    t2 = int(math.ceil(t*f+0.0001)-1)
    PPCF, xgrid, ygrid = spc.generate_pitch_control_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, params, GK_numbers, field_dimen = (pitch_length, pitch_width), n_grid_cells_x=50)
    fig, ax = sviz.plot_pitchcontrol_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, PPCF, cmap, include_player_velocities=False, annotate=True)
    image = mplfig_to_npimage(fig)
    plt.close(fig)    # close the figure so memory does not grow with every frame
    return image    # returns an 8-bit RGB array

## Create MP4 video of tracking data
if not os.path.exists(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4'):
    clip = mpy.VideoClip(make_frame, duration=((end_frame-starting_frame)/f)).set_fps(f)
    clip.write_videofile(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4')

else:
    pass

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/pitch_control_clip_third_goal_cry_lei_93980_94280.mp4', width=770, height=530, embed=True)

##### Fourth Goal (Schlupp 72' minutes)
*    Timestamp = 71m20s
*    Frame = 109,630

In [None]:
93980+(626*25)

In [None]:
create_tracking_data_video(df_tracking_home=df_tracking_home_cry_lei,
                           df_tracking_away=df_tracking_away_cry_lei,
                           frame_start=109630,
                           video_length_frames=450,
                           filename='tracking_clip_forth_goal_cry_lei'
                          )

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/tracking_clip_forth_goal_cry_lei_109630_110080.mp4', width=770, height=530, embed=True)

In [None]:
## Define variables used in 'generate_pitch_control' function
frame_idx=109_630
starting_frame=109_630
end_frame=109_630+400
f=25
tracking_home = df_tracking_home_cry_lei
tracking_away = df_tracking_away_cry_lei
colour_home=colour_crystal_palace
colour_away=colour_leicester_city
cmap = LinearSegmentedColormap.from_list('mycmap', [colour_away, 'white', colour_home])
pitch_length=100.88880157470703
pitch_width=67.97039794921875
filename='forth_goal_cry_lei'

## Visualise player positions using generate_pitch_control_for_event function from spc library
def make_frame(t):
    t2 = int(math.ceil(t*f+0.0001)-1)
    PPCF, xgrid, ygrid = spc.generate_pitch_control_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, params, GK_numbers, field_dimen = (pitch_length, pitch_width), n_grid_cells_x=50)
    fig, ax = sviz.plot_pitchcontrol_for_event(t2+starting_frame, tracking_home, tracking_home, tracking_away, PPCF, cmap, include_player_velocities=False, annotate=True)
    image = mplfig_to_npimage(fig)
    plt.close(fig)    # close the figure so memory does not grow with every frame
    return image    # returns an 8-bit RGB array

## Create MP4 video of tracking data
if not os.path.exists(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4'):
    clip = mpy.VideoClip(make_frame, duration=((end_frame-starting_frame)/f)).set_fps(f)
    clip.write_videofile(video_dir_second_spectrum + f'/pitch_control_clip_{filename}_{starting_frame}_{end_frame}.mp4')

else:
    pass

In [None]:
# Embed shot in the notebook
Video('../../video/fig/second_spectrum/pitch_control_clip_forth_goal_cry_lei_109630_110030.mp4', width=770, height=530, embed=True)

##### Visualise Sprints of Players of Interest

In [None]:
# Wilfried Zaha
visualise_sprints(df_tracking=df_tracking_home_cry_lei,
                  team='Home',
                  player=11,    # Wilfried Zaha
                  sprint_threshold=7,
                  sprint_window=1*25,
                  dot_colour='ro',
                  line_colour='r'
                 )

In [None]:
# Conor Gallagher
visualise_sprints(df_tracking=df_tracking_home_cry_lei,
                  team='Home',
                  player=23,    # Conor Gallagher
                  sprint_threshold=7,
                  sprint_window=1*25,
                  dot_colour='ro',
                  line_colour='r'
                 )

##### Physical Performance

In [None]:
df_physical_summary_home_cry_lei = create_physical_report(df_tracking=df_tracking_home_cry_lei,
                                                          team='Home',    # 'Home' or 'Away'
                                                          match='Crystal Palace (2) vs. (2) Leicester City (03/10/2021)',
                                                          filename='crystal_palace_leicester_city'
                                                         )

In [None]:
df_physical_summary_home_cry_lei.head()

---

<a id='section6'></a>

## <a id='#section6'>6. Summary</a>
This notebook analyses and visualises [Second Spectrum](https://www.secondspectrum.com/index.html) Tracking data using [pandas](http://pandas.pydata.org/) for data engineering and [matplotlib](https://matplotlib.org/) for data visualisation.

---

<a id='section7'></a>

## <a id='#section7'>7. Next Steps</a>
The next stage is to..

---

<a id='section8'></a>

## <a id='#section8'>8. References</a>
*    [Second Spectrum](https://www.secondspectrum.com/index.html) data
*    [Laurie Shaw](https://twitter.com/EightyFivePoint)'s Metrica Sports Tracking data series for [Friends of Tracking](https://www.youtube.com/channel/UCUBFJYcag8j2rm_9HkrrA7w) (see the following for code [[link](https://github.com/Friends-of-Tracking-Data-FoTD/LaurieOnTracking)]):
     +    [Introduction](https://www.youtube.com/watch?v=8TrleFklEsE);
     +    [Measuring Physical Performance](https://www.youtube.com/watch?v=VX3T-4lB2o0);
     +    [Pitch Control modelling](https://www.youtube.com/watch?v=5X1cSehLg6s); and
     +    [Valuing Actions](https://www.youtube.com/watch?v=KXSLKwADXKI).
*    [Demystifying Tracking data Sportlogiq webinar](https://www.youtube.com/watch?v=miEWHSTYvX4) by Sam Gregory and Devin Pleuler
*    [Will Spearman's masterclass in Pitch Control](https://www.youtube.com/watch?v=X9PrwPyolyU&list=PL38nJNjpNpH-l59NupDBW7oG7CmWBgp7Y) for Friends of Tracking
*    [How Tracking Data is Used in Football and What are the Future Challenges](https://www.youtube.com/watch?v=kHTq9cwdkGA) with Javier Fernández, Sudarshan 'Suds' Gopaladesikan, Laurie Shaw, Will Spearman and David Sumpter for Friends of Tracking
*    [Introduction to tracking data in football](https://www.youtube.com/watch?v=fYqEnoOV9Po) by David Sumpter for Friends of Tracking

---

***Visit my website [eddwebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***

[Back to the top](#top)