<a href="https://colab.research.google.com/github/akhilsrinath/soccer-analytics/blob/main/DataExtraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Data Extraction and Transformation**

Fetching raw event data from the StatsBomb open-data repository, extracting the shot events, and storing them in a pandas DataFrame.

In [1]:
import requests
import pandas as pd
from tqdm import tqdm  # import the function, not just the module
import matplotlib.pyplot as plt

- requests: making HTTP requests
- pandas: data analysis and manipulation
- tqdm: progress bars for long-running loops
- matplotlib: plotting

In [2]:
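# URL templates for the raw StatsBomb open data on GitHub
base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"
comp_url = base_url + "matches/{}/{}.json"  # all matches for a competition_id / season_id
match_url = base_url + "events/{}.json"     # full event stream for one match_id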
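The two templates are filled in with `str.format`: the match list needs a competition ID and a season ID, and the event file needs a match ID. A minimal offline check, using the IDs that appear later in this notebook (competition 43, season 3, match 7578):

```python
base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"
comp_url = base_url + "matches/{}/{}.json"
match_url = base_url + "events/{}.json"

# Placeholders are filled positionally: competition_id first, then season_id
matches_link = comp_url.format(43, 3)
# A single placeholder: the match_id
events_link = match_url.format(7578)

print(matches_link)
print(events_link)
```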


In [3]:
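def parse_data(competition_id, season_id):
  """Return a DataFrame with one row per shot in the given competition and season."""
  # Fetch the list of matches for this competition and season
  resp = requests.get(comp_url.format(competition_id, season_id), timeout=30)
  resp.raise_for_status()
  match_ids = [m['match_id'] for m in resp.json()]

  all_events = []
  for match_id in tqdm(match_ids, desc='matches'):
    # Fetch the full event stream for one match
    resp = requests.get(match_url.format(match_id), timeout=30)
    resp.raise_for_status()
    events = resp.json()

    # Keep only shot events
    shots = [e for e in events if e['type']['name'] == 'Shot']
    for s in shots:
      all_events.append({
          'match_id': match_id,
          'team': s['team']['name'],     # team that took the shot
          'player': s['player']['name'],
          'x': s['location'][0],         # StatsBomb pitch is 120 x 80
          'y': s['location'][1],
          'outcome': s['shot']['outcome']['name'],
      })

  return pd.DataFrame(all_events)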
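The shot-filtering step inside `parse_data` can be exercised without any network access. The sketch below pulls it into a hypothetical helper, `extract_shots`, and runs it on two hand-written events that only mimic the StatsBomb event schema (they are not real data):

```python
import pandas as pd

def extract_shots(events, match_id):
  """Hypothetical helper: turn one match's event list into shot rows."""
  rows = []
  for e in events:
    if e['type']['name'] != 'Shot':
      continue  # skip passes, carries, pressures, etc.
    rows.append({
        'match_id': match_id,
        'team': e['team']['name'],
        'player': e['player']['name'],
        'x': e['location'][0],
        'y': e['location'][1],
        'outcome': e['shot']['outcome']['name'],
    })
  return pd.DataFrame(rows)

# Illustrative events with the same nesting as StatsBomb JSON (made-up values)
sample_events = [
    {'type': {'name': 'Pass'}, 'team': {'name': 'Team A'},
     'player': {'name': 'Player 1'}, 'location': [60.0, 40.0]},
    {'type': {'name': 'Shot'}, 'team': {'name': 'Team A'},
     'player': {'name': 'Player 2'}, 'location': [108.0, 38.0],
     'shot': {'outcome': {'name': 'Goal'}}},
]

shots_df = extract_shots(sample_events, match_id=1)
print(shots_df)
```

Splitting the parsing from the HTTP calls this way makes the extraction logic testable on saved JSON files as well.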

In [4]:
competition_id = 43  # FIFA World Cup
season_id = 3        # 2018
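These IDs come from the StatsBomb competitions index, `competitions.json` at the root of the `data/` folder, whose entries pair each `competition_id`/`season_id` with a `competition_name` and `season_name`. A minimal sketch of looking up the IDs by name; `find_ids` is a hypothetical helper, and it runs here on an inline sample rather than the downloaded file (the second entry is made up):

```python
def find_ids(competitions, competition_name, season_name):
  """Hypothetical helper: return (competition_id, season_id) for a named season."""
  for c in competitions:
    if c['competition_name'] == competition_name and c['season_name'] == season_name:
      return c['competition_id'], c['season_id']
  raise KeyError(f'{competition_name} {season_name} not found')

# In practice: competitions = requests.get(base_url + "competitions.json").json()
# Inline sample with only the keys used above
competitions = [
    {'competition_id': 43, 'season_id': 3,
     'competition_name': 'FIFA World Cup', 'season_name': '2018'},
    {'competition_id': 99, 'season_id': 99,  # fictional entry for illustration
     'competition_name': 'Example League', 'season_name': '2000/2001'},
]

print(find_ids(competitions, 'FIFA World Cup', '2018'))
```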

In [5]:
df = parse_data(competition_id, season_id)

In [6]:
df.head(10)

|   | match_id | team | player | x | y | outcome |
|---|---|---|---|---|---|---|
| 0 | 7578 | Uruguay | Edinson Roberto Cavani Gómez | 97.0 | 32.0 | Saved |
| 1 | 7578 | Egypt | Mahmoud Ibrahim Hassan | 108.0 | 51.0 | Saved |
| 2 | 7578 | Uruguay | Luis Alberto Suárez Díaz | 109.0 | 55.0 | Off T |
| 3 | 7578 | Uruguay | Edinson Roberto Cavani Gómez | 102.0 | 23.0 | Blocked |
| 4 | 7578 | Uruguay | José Martín Cáceres Silva | 114.0 | 48.0 | Wayward |
| 5 | 7578 | Uruguay | Luis Alberto Suárez Díaz | 116.0 | 35.0 | Off T |
| 6 | 7578 | Egypt | Marwan Mohsen | 100.0 | 51.0 | Saved |
| 7 | 7578 | Uruguay | Matías Vecino Falero | 83.0 | 53.0 | Off T |
| 8 | 7578 | Uruguay | Luis Alberto Suárez Díaz | 88.0 | 38.0 | Blocked |
| 9 | 7578 | Egypt | Abdalla Mahmoud El Said Bekhit | 105.0 | 48.0 | Wayward |
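With one row per shot, quick summaries are a single pandas call away. A minimal sketch that rebuilds the `team` and `outcome` columns of the ten rows shown above (under the name `sample`, so the real `df` is not overwritten) and counts shots per team and outcomes per team:

```python
import pandas as pd

# team and outcome columns of the ten rows displayed above
sample = pd.DataFrame({
    'team': ['Uruguay', 'Egypt', 'Uruguay', 'Uruguay', 'Uruguay',
             'Uruguay', 'Egypt', 'Uruguay', 'Uruguay', 'Egypt'],
    'outcome': ['Saved', 'Saved', 'Off T', 'Blocked', 'Wayward',
                'Off T', 'Saved', 'Off T', 'Blocked', 'Wayward'],
})

# Total shots per team
shots_per_team = sample['team'].value_counts()
# Outcome counts per team; combinations that never occur are filled with 0
outcome_by_team = pd.crosstab(sample['team'], sample['outcome'])

print(shots_per_team)
print(outcome_by_team)
```

On the full dataset the same calls apply directly to `df`, e.g. `pd.crosstab(df['team'], df['outcome'])`.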
