<a id='top'></a>

# Signality Tracking Data Engineering of Hammarby vs. Örebro 30.09.2019
##### Notebook to explore three sample matches of Spatiotemporal Tracking data from [Signality](https://www.signality.com/).

### By [Edd Webster](https://www.twitter.com/eddwebster)
Notebook first written: 17/01/2020<br>
Notebook last updated: 17/01/2021

![title](../../../../img/signality_screenshot.png)

---

## <a id='import_libraries'>Introduction</a>
This notebook is a short Exploratory Data Analysis (EDA) of [Signality](https://www.signality.com/) Spatiotemporal Tracking and corresponding Event data with [Python](https://www.python.org/) using [pandas](http://pandas.pydata.org/) DataFrames and [matplotlib](https://matplotlib.org/contents.html?v=20200411155018) visualisations.

For more information about this notebook and the author, I'm available through all the following channels:
*    [eddwebster.com](https://www.eddwebster.com/);
*    edd.j.webster@gmail.com;
*    [@eddwebster](https://www.twitter.com/eddwebster);
*    [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);
*    [github/eddwebster](https://github.com/eddwebster/);
*    [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster);
*    [kaggle.com/eddwebster](https://www.kaggle.com/eddwebster); and
*    [hackerrank.com/eddwebster](https://www.hackerrank.com/eddwebster).

![title](../../../../img/fifa21eddwebsterbanner.png)

The accompanying GitHub repository for this notebook can be found [here](https://github.com/eddwebster/football_analytics) and a static version of this notebook can be found [here](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/A%29%20Web%20Scraping/TransferMarkt%20Web%20Scraping%20and%20Parsing.ipynb).

___

<a id='sectioncontents'></a>

## <a id='notebook_contents'>Notebook Contents</a>
1.    [Notebook Dependencies](#section1)<br>
2.    [Project Brief](#section2)<br>
3.    [Data Sources](#section3)<br>
      1.    [Introduction](#section3.1)<br>
      2.    [Data Dictionary](#section3.2)<br>
      3.    [Creating the DataFrame](#section3.3)<br>
      4.    [Initial Data Handling](#section3.4)<br>
      5.    [Export the Raw DataFrame](#section3.5)<br>         
4.    [Data Engineering](#section4)<br>
      1.    [Introduction](#section4.1)<br>
      2.    [Columns of Interest](#section4.2)<br>
      3.    [String Cleaning](#section4.3)<br>
      4.    [Converting Data Types](#section4.4)<br>
      5.    [Export the Engineered DataFrame](#section4.5)<br>
5.    [Exploratory Data Analysis (EDA)](#section5)<br>
      1.    [...](#section5.1)<br>
      2.    [...](#section5.2)<br>
      3.    [...](#section5.3)<br>
6.    [Summary](#section6)<br>
7.    [Next Steps](#section7)<br>
8.    [Bibliography](#section8)<br>

---

## <a id='#section1'>1. Notebook Dependencies</a>

This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:
*    [`Jupyter notebooks`](https://jupyter.org/) for this notebook environment with which this project is presented;
*    [`NumPy`](http://www.numpy.org/) for multidimensional array computing;
*    [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation;
*    [`BeautifulSoup`](https://pypi.org/project/beautifulsoup4/) for web scraping; and
*    [`matplotlib`](https://matplotlib.org/contents.html?v=20200411155018) for data visualisations.

All packages used for this notebook except for BeautifulSoup can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/).

### Import Libraries and Modules

In [1]:
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv
import pprint as pp

# Import Dependencies
%matplotlib inline

# Math Operations
import numpy as np
import math
from math import pi

# Datetime
import datetime
from datetime import date
import time

# Data Preprocessing
import pandas as pd
import re
import os
from collections import Counter, defaultdict
import random
from io import BytesIO
from pathlib import Path

# Reading directories
import glob
from os.path import basename

# Working with JSON
import json
from pandas import json_normalize    # public import path since pandas 1.0; the pandas.io.json path is deprecated

# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import patches
import seaborn as sns
plt.style.use('seaborn-whitegrid')
import missingno as msno
import moviepy.editor as mpy
from moviepy.video.io.bindings import mplfig_to_npimage

# Progress Bar
from tqdm import tqdm

# Fran Peralta's custom libraries for working with Signality data
from Libraries import Functions_PreprocessTrackingData as funcs
from Libraries import Dictionaries as dicts

# ML libraries
import scipy.signal as signal
from scipy.spatial import Voronoi, voronoi_plot_2d

# Display in Jupyter
from IPython.display import Image, Video, YouTubeVideo
from IPython.core.display import HTML

# Ignore Warnings
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

print('Setup Complete')

Setup Complete


### Defined Filepaths

In [2]:
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..', '..', '..')
data_dir = os.path.join(base_dir, 'data')
data_dir_metrica = os.path.join(base_dir, 'data', 'metrica')
data_dir_signality = os.path.join(base_dir, 'data', 'signality')
data_dir_signality_tracking = os.path.join(base_dir, 'data', 'signality', 'raw', '2019', 'tracking_data')
scripts_dir = os.path.join(base_dir, 'scripts')
scripts_dir_signality = os.path.join(base_dir, 'scripts', 'signality')
scripts_dir_metrica = os.path.join(base_dir, 'scripts', 'metrica')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
fig_dir_signality = os.path.join(base_dir, 'img', 'fig', 'signality')
video_dir = os.path.join(base_dir, 'video')
video_dir_signality = os.path.join(base_dir, 'video', 'signality')

In [3]:
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))
print('Seaborn: {}'.format(sns.__version__))

Python: 3.7.6
NumPy: 1.18.0
pandas: 1.2.0
matplotlib: 3.3.2
Seaborn: 0.11.1


### Defined Variables

In [4]:
# Define today's date
today = datetime.datetime.now().strftime('%d%m%Y')

# Define pitch dimensions
pitch_length = 106.0
pitch_width = 68

### Custom Libraries for Tracking Data

In [5]:
# Custom libraries for working with Signality data based on Laurie Shaw's Metrica Sports libraries for Metrica Sports data

## Define path of scripts
sys.path.insert(0, os.path.abspath(scripts_dir))

## Signality scripts - custom scripts derived from Laurie Shaw's Metrica scripts
import Signality_IO as sio
import Signality_Velocities as svel

### Notebook Settings

In [6]:
pd.set_option('display.max_columns', None)

---

## <a id='#section2'>2. Project Brief</a>
This Jupyter notebook engineers football Tracking data from [Signality](https://www.signality.com/) using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames, [matplotlib](https://matplotlib.org/contents.html?v=20200411155018) for plotting, and [SciPy](https://scipy.org/) for signal processing and spatial computations.

The tracking data engineered in this notebook is exported to CSV. This data can be further analysed in Python, joined to other datasets, or explored using dashboarding tools such as Tableau or Power BI, or in a spreadsheet such as Microsoft Excel or Google Sheets.

### <a id='#section2.1'>2.1. Goals</a>
*    Hammarby 2 vs. 0 Malmö FF on 20th October 2019 [[link](https://int.soccerway.com/matches/2019/10/20/sweden/allsvenskan/hammarby/malmo-fotbollsforening/2947351/)]
     - A. Kačaniklić 15' (1-0) (assist by N. Đurđić)
     - R. Magyar 88' (2-0) (assist by D. Bojanic)
*    Hammarby 5 vs. 1 Örebro on 30th September 2019 [[link](https://int.soccerway.com/matches/2019/09/30/sweden/allsvenskan/hammarby/orebro-sportklubb-fotboll/2947335/)]
     - V. Prodell 11' (0-1), (assist by Yaser Kasim)
     - V. Rodić 39' (1-1), (assist by S. Sandberg)
     - N. Đurđić 57' (2-1), (assist by D. Bojanic)
     - V. Rodić 62' (3-1)
     - V. Rodić 80' (4-1), (assist by D. Widgren)
     - M. Solheim 90'+3 (5-1), (assist by N. Đurđić)

### <a id='#section2.2'>2.2. Highlights</a>

---

## <a id='#section3'>3. Data Sources</a>
[Signality](https://www.signality.com/) is a...

![title](../../../../img/signality_logo.png)

The tracking data represents the location of every player on the pitch with a temporal frequency of 25 Hz and the corresponding match time for each tracking frame is specified.
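
As a rough illustration of what a 25 Hz sampling rate means in practice, the sketch below converts a frame index into elapsed seconds and a `mm:ss` clock string. The helper names `frame_to_seconds` and `frame_to_clock` are hypothetical, and the sketch assumes that frames are counted from 0 at the start of each half with no dropped frames, which the source data may not guarantee.

```python
FRAME_RATE_HZ = 25  # sampling rate stated above for the tracking data


def frame_to_seconds(frame, frame_rate=FRAME_RATE_HZ):
    """Elapsed time in seconds for a 0-based frame index (assumes no dropped frames)."""
    return frame / frame_rate


def frame_to_clock(frame, frame_rate=FRAME_RATE_HZ):
    """Format the elapsed time of a frame as 'mm:ss' within its half."""
    total_seconds = int(frame_to_seconds(frame, frame_rate))
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"


print(frame_to_seconds(1000))   # 40.0
print(frame_to_clock(66691))    # 44:27
```

Under these assumptions, a frame index of 66691 falls about 44 and a half minutes into its half.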

Before conducting our EDA, the data needs to be imported as a DataFrame in the Data Sources section [Section 3](#section3) and cleaned in the Data Engineering section [Section 4](#section4).


We'll be using the [pandas](http://pandas.pydata.org/) library to import our data to this workbook as a DataFrame.

### <a id='#section3.2.1'>3.2.1. Data Dictionaries</a>
The [Signality](https://www.signality.com/) Events dataset has fourteen features (columns) with the following definitions and data types:

| Feature     | Data type    |
|------|-----|
| `Team`    | object     |
| `Type`    | object     |
| `Subtype`    | object     |
| `Period`    | int64     |
| `Start Frame`    | int64     |
| `Start Time [s]`    | float64     |
| `End Frame`    | int64     |
| `End Time [s]`    | float64     |
| `From`    | object     |
| `To`    | object     |
| `Start X`    | float64     |
| `Start Y`    | float64     |
| `End X`    | float64     |
| `End Y`    | float64     |

For a full list of definitions, see the Metrica Sports documentation [[link](https://github.com/metrica-sports/sample-data/blob/master/documentation/events-definitions.pdf)].

### <a id='#section3.2'>3.2. Import Data</a>

#### <a id='#section3.2.1'>3.2.1. First Half</a>

In [7]:
# Select the file from the three matches of Tracking data
#file_name = '20190722.Hammarby-IFElfsborg'
#file_name = '20191020.Hammarby-MalmöFF'
file_name = '20190930.Hammarby-Örebro'        # NB: 2nd half almost no players in field

# Names of the teams playing in OPTA format
home_team_name = 'Hammarby IF'
away_team_name = 'Örebro'
year = file_name[:4]

# First '.1' or second '.2' half of the match
data_file_name=file_name+'.1'

#Preprocesses the file in to the format we use.
if not os.path.exists(data_dir_signality_tracking + '/Preprocessed/' + data_file_name + '_preprocessed.pickle'):
    preprocessed = False
    [ball_position_not_transf, players_position_not_transf, players_team_id, events, players_jersey,
     info_match,names_of_players] = funcs.LoadDataHammarbyNewStructure2020(data_file_name, data_dir_signality_tracking + '/')
else:
    preprocessed = True
    [ball_position_not_transf,players_position_not_transf,players_team_id,events,players_jersey,
     info_match,names_of_players,players_in_play_list] = funcs.LoadDataHammarbyPreprocessed(data_file_name, data_dir_signality_tracking + '/')

frame_id = 1000
team_index = players_team_id[frame_id].astype(int).reshape(len(players_team_id[frame_id]),)
players_in_play = funcs.GetPlayersInPlay(players_position_not_transf,frame_id)

[players_position,ball_position] = funcs.TransformCoords(players_position_not_transf,ball_position_not_transf)


Loading data, this might take some seconds...
Data has been loaded
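
The loading cell above follows a cache-or-build pattern: if a preprocessed pickle already exists it is loaded, otherwise the raw data is parsed. The sketch below illustrates that general pattern with the standard library only. `load_or_build` and `build_fn` are hypothetical names, and the sketch writes the cache file itself, which may differ from how the loaders in `funcs` behave.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Return (data, was_cached): load the pickle if present, else build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh), True
    data = build_fn()
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh)
    return data, False


# Toy demonstration in a temporary directory (not the real tracking data)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Preprocessed", "demo_preprocessed.pickle")
    first = load_or_build(path, lambda: {"frames": 3})
    second = load_or_build(path, lambda: {"frames": 3})

print(first)   # ({'frames': 3}, False)
print(second)  # ({'frames': 3}, True)
```

Building the path with `os.path.join` rather than concatenating `'/'` by hand keeps the check portable across operating systems.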



In [8]:
### Store data in a DataFrame with the same format as Metrica data to be able to use parts of Laurie Shaw's code
totalplayers = 36                   # player slots in the data: the first 18 are treated as home, the last 18 as away
tracking_first_half = np.array(players_position).reshape(-1, totalplayers*2)     # one row per frame; X and Y coordinate for every player
tracking_first_half = pd.DataFrame(tracking_first_half)
column_names = []
for i in range(totalplayers):       # Rename columns
    team = 'Home' if i < totalplayers/2 else 'Away'
    column_names.append(f'{team}_{players_jersey[i]}_x')
    column_names.append(f'{team}_{players_jersey[i]}_y')
tracking_first_half.columns = column_names
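
The column-naming scheme used in the cell above can be sketched in isolation as follows. `make_column_names` is a hypothetical helper, and it assumes, as the cell does, that the jersey list is ordered with all home players first and all away players after them.

```python
def make_column_names(jerseys, n_home):
    """Build '<Team>_<jersey>_x' / '<Team>_<jersey>_y' labels, home players first."""
    names = []
    for i, jersey in enumerate(jerseys):
        team = "Home" if i < n_home else "Away"
        names.extend([f"{team}_{jersey}_x", f"{team}_{jersey}_y"])
    return names


# Toy input: two home players followed by two away players
print(make_column_names([24, 14, 1, 9], n_home=2))
```

Each player contributes two adjacent labels, so the list lines up with an array that stores X and Y side by side for every player.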

In [9]:
# Convert coordinates
tracking_first_half.replace([-105, -68], float('nan'), inplace=True)
pitch_length = info_match['calibration']['pitch_size'][0]
pitch_width = info_match['calibration']['pitch_size'][1]
for i in range(totalplayers*2):
    if i % 2 == 0:
        tracking_first_half.iloc[:,i] = tracking_first_half.iloc[:,i] - pitch_length/2            # subtract half the pitch length so the centre spot is at x = 0
    else:
        tracking_first_half.iloc[:,i] = (tracking_first_half.iloc[:,i] - pitch_width/2) * -1      # subtract half the pitch width, then multiply by -1 to flip the y-axis
# split home & away team
tracking_home_first_half=tracking_first_half.iloc[:,0:totalplayers]       
tracking_away_first_half=tracking_first_half.iloc[:,totalplayers:totalplayers*2]
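
The coordinate conversion in the cell above can be checked on single points with a small sketch. `to_centred_coords` is a hypothetical helper, and the 105 x 68 pitch size is an assumption for illustration only, since the notebook reads the real dimensions from `info_match`.

```python
def to_centred_coords(x, y, pitch_length, pitch_width):
    """Shift the origin to the centre spot and flip the y-axis.

    Works on plain floats and, element-wise, on NumPy arrays or pandas Series.
    """
    x_centred = x - pitch_length / 2
    y_centred = -(y - pitch_width / 2)
    return x_centred, y_centred


# Illustrative pitch size (assumed here; the notebook reads it from info_match)
length, width = 105.0, 68.0
print(to_centred_coords(52.5, 34.0, length, width))  # the centre spot maps to the origin
print(to_centred_coords(0.0, 0.0, length, width))    # a corner maps to (-length/2, +width/2)
```

Because the function only uses arithmetic operators, the same call could be applied to whole columns at once instead of looping over column positions.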

In [10]:
# Convert ball position coordinates
ball_position = pd.DataFrame(ball_position, index=tracking_home_first_half.index)
ball_position.columns = ['ball_x', 'ball_y']
ball_position.replace([np.inf], float('nan'), inplace=True)
ball_position.iloc[:,0] = ball_position.iloc[:,0] - pitch_length/2
ball_position.iloc[:,1] = (ball_position.iloc[:,1] - pitch_width/2) * -1

In [11]:
# Add ball position to dataframes
tracking_home_first_half = pd.concat([tracking_home_first_half, ball_position], axis=1)
tracking_away_first_half = pd.concat([tracking_away_first_half, ball_position], axis=1)

In [12]:
tracking_home_first_half

Unnamed: 0,Home_24_x,Home_24_y,Home_14_x,Home_14_y,Home_16_x,Home_16_y,Home_26_x,Home_26_y,Home_31_x,Home_31_y,Home_32_x,Home_32_y,Home_77_x,Home_77_y,Home_25_x,Home_25_y,Home_2_x,Home_2_y,Home_4_x,Home_4_y,Home_13_x,Home_13_y,Home_3_x,Home_3_y,Home_6_x,Home_6_y,Home_20_x,Home_20_y,Home_19_x,Home_19_y,Home_11_x,Home_11_y,Home_40_x,Home_40_y,Home_22_x,Home_22_y,ball_x,ball_y
0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,,,,,,,,,,,,,,,,,18.650287,9.246784,,,19.985373,-15.616959,18.198566,-25.807018,11.422753,-7.315789,-0.201530,-8.956140,,,0.440918,19.695322,,,-0.020841,-19.066667,,
3,,,,,,,,,,,,,,,,,18.670363,9.266667,20.216252,-2.335088,19.995411,-15.607018,18.198566,-25.816959,11.422753,-7.315789,-0.241683,-8.916374,,,0.420841,19.705263,,,-0.030880,-19.066667,,
4,,,,,,,,,,,,,,,,,18.690440,9.286550,20.226291,-2.335088,20.005449,-15.587135,18.198566,-25.826901,11.422753,-7.325731,-0.301912,-8.876608,,,0.390727,19.725146,,,-0.050956,-19.076608,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
66688,,,,,,,,,,,,,,,30.445220,4.783041,-34.010325,33.514035,-38.698184,3.291813,-39.129828,-8.081287,-7.810516,4.116959,-29.121702,-16.362573,-36.038050,21.206433,-21.793786,13.243275,-21.482600,-3.736842,-37.985468,14.267251,-38.959178,0.428655,-6.746608,9.175367
66689,,,,,,,,,,,,,,,30.425143,4.743275,-33.960134,33.494152,-38.647992,3.271930,-39.069598,-8.091228,-7.629828,4.146784,-29.031358,-16.352632,-35.997897,21.176608,-21.793786,13.253216,-21.342065,-3.697076,-37.945315,14.287135,-38.908987,0.428655,-6.316453,8.974515
66690,,,,,,,,,,,,,,,30.405067,4.693567,-33.899904,33.474269,-38.597801,3.261988,-39.019407,-8.101170,-7.439101,4.176608,-28.941013,-16.352632,-35.947706,21.136842,-21.783748,13.253216,-21.201530,-3.667251,-37.895124,14.297076,-38.858795,0.428655,-5.895492,8.761130
66691,,,,,,,,,,,,,,,30.374952,4.643860,-33.849713,33.444444,-38.547610,3.252047,-38.969216,-8.111111,-7.248375,4.206433,-28.850669,-16.352632,-35.897514,21.107018,-21.773709,13.263158,-21.060994,-3.627485,-37.834895,14.307018,-38.808604,0.428655,-5.485080,8.551326


In [13]:
#### Add velocity & acceleration & distance to goal
#### file_name is passed from the variable defined above so that it matches the match being loaded
tracking_home_first_half = svel.calc_player_velocities(tracking_home_first_half, file_name=file_name, data_file_name=file_name+'.1', maxspeed=12, smoothing_v=True, smoothing_a=True, distance2goal=True, pitch_length=106, pitch_width=68)
#### We don't add distance to goal for Hammarby's opponent because the other direction is not implemented
tracking_away_first_half = svel.calc_player_velocities(tracking_away_first_half, file_name=file_name, data_file_name=file_name+'.1', maxspeed=12, smoothing_v=True, smoothing_a=True, distance2goal=False, pitch_length=106, pitch_width=68)

UnboundLocalError: local variable 'dx2goal' referenced before assignment
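
An `UnboundLocalError` of this kind usually means that a local variable was assigned only inside a conditional branch that did not run. The sketch below reproduces the pattern with hypothetical functions; it is not the code of `Signality_Velocities`, whose internals are not shown in this notebook, and the direction labels and the 53.0 half-length are assumptions for illustration.

```python
def dx_to_goal_unsafe(x, attacking_direction):
    """Hypothetical example: dx2goal is only bound when a branch matches."""
    if attacking_direction == "left_to_right":
        dx2goal = 53.0 - x
    elif attacking_direction == "right_to_left":
        dx2goal = x + 53.0
    return dx2goal  # raises UnboundLocalError if neither branch ran


def dx_to_goal_safe(x, attacking_direction, half_length=53.0):
    """Same logic, but an unknown direction fails with a clear message."""
    if attacking_direction == "left_to_right":
        return half_length - x
    if attacking_direction == "right_to_left":
        return x + half_length
    raise ValueError(f"Unknown attacking direction: {attacking_direction!r}")


try:
    dx_to_goal_unsafe(10.0, "unknown")
except UnboundLocalError as err:
    print(type(err).__name__)

print(dx_to_goal_safe(10.0, "left_to_right"))  # 43.0
```

When debugging the real function, a reasonable first step is to check which arguments decide the branch that assigns `dx2goal` and whether the values passed in match any of them.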

In [None]:
tracking_home_first_half

In [None]:
tracking_away_first_half
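
The velocity step above presumably derives speeds from frame-to-frame changes in position. As a minimal sketch of that general idea, and not the implementation in `Signality_Velocities`, the block below computes finite-difference speeds and discards implausible values. `estimate_speed` is a hypothetical helper; the 25 Hz frame rate comes from the data description and the `maxspeed` default mirrors the argument passed above. Smoothing is omitted.

```python
import numpy as np


def estimate_speed(x, y, frame_rate=25.0, maxspeed=12.0):
    """Finite-difference speed in m/s; values above maxspeed are set to NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Position change between consecutive frames, scaled to metres per second
    vx = np.diff(x, prepend=np.nan) * frame_rate
    vy = np.diff(y, prepend=np.nan) * frame_rate
    speed = np.hypot(vx, vy)
    # Treat physically implausible jumps as tracking errors
    speed[speed > maxspeed] = np.nan
    return speed


# Toy track: steady 5 m/s along x, then a 4.6 m jump in a single frame
print(estimate_speed([0.0, 0.2, 0.4, 5.0], [0.0, 0.0, 0.0, 0.0]))
```

The first frame has no predecessor, so its speed is `NaN`, and the final jump exceeds `maxspeed` and is masked as well.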

#### <a id='#section3.2.2'>3.2.2. Second Half</a>

In [None]:
# Select the file from the three matches of Tracking data
#file_name = '20190722.Hammarby-IFElfsborg'
#file_name = '20191020.Hammarby-MalmöFF'
file_name = '20190930.Hammarby-Örebro'        # NB: 2nd half almost no players in field

# Names of the teams playing in OPTA format
home_team_name = 'Hammarby IF'
away_team_name = 'Örebro'
year = file_name[:4]

#First '.1' or second '.2' half of the match
data_file_name=file_name+'.2'

#Preprocesses the file in to the format we use.
if not os.path.exists(data_dir_signality_tracking + '/Preprocessed/' + data_file_name + '_preprocessed.pickle'):
    preprocessed = False
    [ball_position_not_transf, players_position_not_transf, players_team_id, events, players_jersey,
     info_match,names_of_players] = funcs.LoadDataHammarbyNewStructure2020(data_file_name, data_dir_signality_tracking + '/')
else:
    preprocessed = True
    [ball_position_not_transf,players_position_not_transf,players_team_id,events,players_jersey,
     info_match,names_of_players,players_in_play_list] = funcs.LoadDataHammarbyPreprocessed(data_file_name, data_dir_signality_tracking + '/')

frame_id = 1000
team_index = players_team_id[frame_id].astype(int).reshape(len(players_team_id[frame_id]),)
players_in_play = funcs.GetPlayersInPlay(players_position_not_transf,frame_id)

[players_position,ball_position] = funcs.TransformCoords(players_position_not_transf,ball_position_not_transf)

In [None]:
### Store data in a DataFrame with the same format as Metrica data to be able to use parts of Laurie Shaw's code
totalplayers = 36                   # player slots in the data: the first 18 are treated as home, the last 18 as away
tracking_second_half = np.array(players_position).reshape(-1, totalplayers*2)     # one row per frame; X and Y coordinate for every player
tracking_second_half = pd.DataFrame(tracking_second_half)
column_names = []
for i in range(totalplayers):       # Rename columns
    team = 'Home' if i < totalplayers/2 else 'Away'
    column_names.append(f'{team}_{players_jersey[i]}_x')
    column_names.append(f'{team}_{players_jersey[i]}_y')
tracking_second_half.columns = column_names

In [None]:
# Convert coordinates
tracking_second_half.replace([-105, -68], float('nan'), inplace=True)
pitch_length = info_match['calibration']['pitch_size'][0]
pitch_width = info_match['calibration']['pitch_size'][1]
for i in range(totalplayers*2):
    if i % 2 == 0:
        tracking_second_half.iloc[:,i] = tracking_second_half.iloc[:,i] - pitch_length/2            # subtract half the pitch length so the centre spot is at x = 0
    else:
        tracking_second_half.iloc[:,i] = (tracking_second_half.iloc[:,i] - pitch_width/2) * -1      # subtract half the pitch width, then multiply by -1 to flip the y-axis
# split home & away team
tracking_home_second_half=tracking_second_half.iloc[:,0:totalplayers]       
tracking_away_second_half=tracking_second_half.iloc[:,totalplayers:totalplayers*2]

In [None]:
# Convert ball position coordinates
ball_position = pd.DataFrame(ball_position, index=tracking_home_second_half.index)
ball_position.columns = ['ball_x', 'ball_y']
ball_position.replace([np.inf], float('nan'), inplace=True)
ball_position.iloc[:,0] = ball_position.iloc[:,0] - pitch_length/2
ball_position.iloc[:,1] = (ball_position.iloc[:,1] - pitch_width/2) * -1

In [None]:
# Add ball position to DataFrames
tracking_home_second_half = pd.concat([tracking_home_second_half, ball_position], axis=1)
tracking_away_second_half = pd.concat([tracking_away_second_half, ball_position], axis=1)

In [None]:
#### Add velocity & acceleration & distance to goal
tracking_home_second_half = svel.calc_player_velocities(tracking_home_second_half, file_name='20190930.Hammarby-Örebro', data_file_name=file_name+'.2', maxspeed=12, smoothing_v=True, smoothing_a=True, distance2goal=True, pitch_length=106, pitch_width=68)
#### We don't add distance to goal for Hammarby's opponent because other direction not implemented
tracking_away_second_half = svel.calc_player_velocities(tracking_away_second_half, file_name='20190930.Hammarby-Örebro', data_file_name=file_name+'.2', maxspeed=12, smoothing_v=True, smoothing_a=True, distance2goal=False, pitch_length=106, pitch_width=68)

In [None]:
tracking_home_second_half

In [None]:
tracking_away_second_half

#### <a id='#section3.2.3'>3.2.3. Concatenate First and Second Half DataFrames</a>

In [None]:
tracking_home_full = pd.concat([tracking_home_first_half, tracking_home_second_half])

In [None]:
tracking_away_full = pd.concat([tracking_away_first_half, tracking_away_second_half])

In [None]:
tracking_home_full

In [None]:
tracking_away_full
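
A plain `pd.concat` of the two halves keeps each half's own frame index, so index values repeat in the combined DataFrame. The sketch below, on toy data rather than the real tracking frames, shows one optional way to keep the halves distinguishable by adding a `period` index level. The level names `period` and `frame` are assumptions chosen for illustration.

```python
import pandas as pd

first_half = pd.DataFrame({"ball_x": [0.0, 1.0], "ball_y": [0.0, 0.5]})
second_half = pd.DataFrame({"ball_x": [2.0, 3.0], "ball_y": [1.0, 1.5]})

# keys= adds an outer index level so frames from each half stay distinguishable
full_match = pd.concat(
    [first_half, second_half],
    keys=[1, 2],
    names=["period", "frame"],
)

print(full_match.loc[2])               # rows belonging to the second half only
print(full_match.reset_index().shape)  # (4, 4)
```

Calling `reset_index()` turns both levels into ordinary columns, which is convenient if the result is later written to CSV without its index.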

In [None]:
df_tracking_orebro_home = tracking_home_full

In [None]:
df_tracking_orebro_away = tracking_away_full

In [None]:
# Export DataFrame as a CSV file
df_tracking_orebro_home.to_csv(data_dir_signality + '/engineered/2019/tracking_data/hammerby_orebro_home.csv', index=None, header=True)
df_tracking_orebro_away.to_csv(data_dir_signality + '/engineered/2019/tracking_data/hammerby_orebro_away.csv', index=None, header=True)
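
As a hedged sketch of the export step, the block below writes a toy DataFrame to a temporary folder and reads it back to confirm that the shape and missing values survive the round trip. The folder layout only imitates the one used above, and `demo_home.csv` is a made-up file name.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"Home_2_x": [18.65, np.nan], "Home_2_y": [9.25, 9.27]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, "engineered", "2019", "tracking_data")
    os.makedirs(out_dir, exist_ok=True)   # to_csv does not create missing folders
    out_path = os.path.join(out_dir, "demo_home.csv")

    df_demo.to_csv(out_path, index=False)
    df_check = pd.read_csv(out_path)

print(df_check.shape == df_demo.shape)
print(df_check.isnull().sum().to_dict())
```

Creating the output folder first avoids an error when the `engineered` directory does not exist yet.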

#### <a id='#section3.2.2.'>3.2.2.  Preliminary Data Handling</a>
Because there are several DataFrames (Home and Away for each of three different matches), the following data handling is shown for just one of them: the Home DataFrame for the Örebro match (`df_tracking_orebro_home`).

##### <a id='#section3.2.2.1.'>3.2.2.1. Tracking DataFrame</a>

Let's check the quality of the dataset by looking at the first and last rows in pandas using the [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods.

In [None]:
# Display the first 5 rows of the raw DataFrame, df_tracking_orebro_home
df_tracking_orebro_home.head()

In [None]:
# Display the last 5 rows of the raw DataFrame, df_tracking_orebro_home
df_tracking_orebro_home.tail()

In [None]:
# Print the shape of the raw DataFrame, df_tracking_orebro_home
print(df_tracking_orebro_home.shape)

In [None]:
# Print the column names of the raw DataFrame, df_tracking_orebro_home
print(df_tracking_orebro_home.columns)

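The DataFrame holds an `_x` and a `_y` coordinate column for each player, the `ball_x` and `ball_y` columns, and any velocity and acceleration columns added by `calc_player_velocities`. The attributes of the Events dataset are described in the [Data Dictionary](section3.3.1).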

In [None]:
# Data types of the features of the raw DataFrame, df_tracking_orebro_home
df_tracking_orebro_home.dtypes

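The coordinate columns are expected to be numeric (`float64`) rather than `object`, with `NaN` marking frames in which a player or the ball has no recorded position. The attributes of the Events dataset and their data types are listed in the [Data Dictionary](section3.3.1).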

In [None]:
# Info for the raw DataFrame, df_tracking_orebro_home
df_tracking_orebro_home.info()

In [None]:
# Description of the raw DataFrame, df_tracking_orebro_home, showing some summary statistics for each numerical column in the DataFrame
df_tracking_orebro_home.describe()

In [None]:
# Plot visualisation of the missing values for each feature of the raw DataFrame, df_tracking_orebro_home
msno.matrix(df_tracking_orebro_home, figsize = (30, 7))

In [None]:
# Counts of missing values
null_value_stats = df_tracking_orebro_home.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
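
Raw counts of missing values are hard to compare when the number of frames is large. As a small complement, the sketch below uses toy data to express missing values as a share of all frames per column. The column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "Home_2_x": [np.nan, 18.65, 18.67, 18.69],
    "Home_2_y": [np.nan, 9.25, 9.27, 9.29],
    "ball_x": [np.nan, np.nan, np.nan, -6.75],
})

# isnull().mean() gives the share of missing frames per column (0.0 to 1.0)
missing_share = df_demo.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a share close to 1.0 would correspond to players who were rarely or never tracked in that half.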

---

## <a id='#section4'>4. Summary</a>
This notebook engineers the [Signality](https://www.signality.com/) Tracking football data with [Python](https://www.python.org/) for the match Hammarby 5 vs. 1 Örebro on 30th September 2019 using [pandas](http://pandas.pydata.org/) DataFrames.

---

## <a id='#section5'>5. Next Steps</a>
The next step is to visualise this data and create Pitch Control models.

---

## <a id='#section6'>6. References</a>

---

***Visit my website [EddWebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***

[Back to the top](#top)