# Building the Shot Dataset for the First Task: xG Model

This notebook builds a dataset of the shots taken by players, using the `statsbombpy` library to explore the [StatsBomb dataset](https://github.com/statsbomb/open-data). The expected goals (xG) model will be trained on the resulting dataset.

The steps taken in this notebook are:
1. Load and analyze the relevant information from the StatsBomb dataset.
2. Examine some matches to understand how `Shot` events are structured.
3. Decide which data to include in the model by looking at the available competitions and how their data is structured.
4. Build the dataset from the information retrieved in the previous steps.

All generated datasets will be kept in the `data/` folder.
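
As a rough illustration of steps 2 and 4, the sketch below filters `Shot` events out of an events table and writes the result to a CSV file. It is a minimal sketch under stated assumptions, not the notebook's actual pipeline: the helper names `extract_shots` and `save_dataset` are hypothetical, the sample rows are synthetic, and it assumes the event type is stored as a plain string in a `type` column, as in the flattened DataFrame returned by `sb.events(match_id=...)`.

```python
from pathlib import Path

import pandas as pd


def extract_shots(events: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose event type is 'Shot' (hypothetical helper)."""
    # Assumption: the event type is a plain string stored in a `type` column.
    return events.loc[events["type"] == "Shot"].reset_index(drop=True)


def save_dataset(df: pd.DataFrame, name: str, out_dir: str = "data") -> Path:
    """Write df to <out_dir>/<name>.csv, creating the folder if it is missing."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.csv"
    df.to_csv(path, index=False)
    return path


# Tiny synthetic stand-in for one match's events (not real StatsBomb data).
events = pd.DataFrame(
    {
        "type": ["Pass", "Shot", "Carry", "Shot"],
        "player": ["A", "B", "A", "C"],
        "minute": [3, 12, 40, 77],
    }
)

# Keep only the two rows whose type is "Shot".
shots = extract_shots(events)
```

Calling `save_dataset(shots, "shots")` would then write `data/shots.csv`, matching the folder convention stated above. Keeping the filter and the write in separate functions lets the same filter be reused across many matches before a single save.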


## 1\) Analysis of the Dataset

In [None]:
from statsbombpy import sb
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Load competitions
competitions = sb.competitions()

# Prepare list for summary
competition_list = []

# Loop through competitions and count matches
for _, row in competitions.iterrows():
    comp_id = row['competition_id']
    season_id = row['season_id']
    matches_df = sb.matches(competition_id=comp_id, season_id=season_id)
    num_matches = matches_df.shape[0]

    competition_list.append({
        "Season": row['season_name'],
        "Competition": row['competition_name'],
        "Gender": row['competition_gender'].capitalize(),
        "Matches": num_matches
    })

# Create DataFrame
competitions_summary = pd.DataFrame(competition_list)

# Sort by the season's starting year (the first four characters, e.g. 1970 for "1970/1971"), then by competition name
competitions_summary['Season_Start'] = competitions_summary['Season'].str[:4].astype(int)
competitions_summary = competitions_summary.sort_values(
    by=['Season_Start', 'Competition'],
    ascending=[True, True]
).drop(columns='Season_Start')

# Pretty print table
print("Available Competitions in StatsBomb Open Data\n")
print(competitions_summary.to_string(index=False))


Available Competitions in StatsBomb Open Data

   Season             Competition Gender  Matches
     1958          FIFA World Cup   Male        2
     1962          FIFA World Cup   Male        1
1970/1971        Champions League   Male        1
     1970          FIFA World Cup   Male        6
1971/1972        Champions League   Male        1
1972/1973        Champions League   Male        1
1973/1974                 La Liga   Male        1
     1974          FIFA World Cup   Male        6
1977/1978            Copa del Rey   Male        1
     1977   North American League   Male        1
     1979      FIFA U20 World Cup   Male        1
     1981        Liga Profesional   Male        1
1982/1983            Copa del Rey   Male        1
1983/1984            Copa del Rey   Male        1
     1986          FIFA World Cup   Male        3
1986/1987                 Serie A   Male        1
1988/1989      UEFA Europa League   Male        3
     1990          FIFA World Cup   Male        1
199