# Exploration of Shots Across All Competitions  

In the code below, all **shot events** from each competition and each season present in the StatsBomb Open Data are collected. The objective is to determine:  

- the **total number of shots** in the dataset,  
- how many of them include a **freeze frame**,  
- and the proportion of shots with and without this contextual information.

In [None]:
from statsbombpy import sb
import pandas as pd
from tqdm import tqdm

import warnings
warnings.filterwarnings("ignore")

# Retrieve all available competitions in StatsBomb Open Data
competitions = sb.competitions()

# Store all shots here
all_shots = []

# Iterate over each competition-season pair
for _, row in tqdm(competitions.iterrows(), total=len(competitions), desc="Processing competitions"):
    comp_id = row['competition_id']
    season_id = row['season_id']

    # Retrieve all matches for the given competition and season
    matches = sb.matches(comp_id, season_id)

    # Iterate through each match
    for match_id in tqdm(matches['match_id'], desc=f"Comp {comp_id}, Season {season_id}", leave=False):
        
        # Retrieve all events for this match
        events = sb.events(match_id=match_id)

        # Filter only "Shot" events
        shots = events[events['type'] == 'Shot']

        # If there are shots in this match, append them to our global list
        if not shots.empty:
            all_shots.append(shots)

# Concatenate all shots into a single DataFrame
shots_df = pd.concat(all_shots, ignore_index=True)

# Count statistics
total_shots = len(shots_df)
shots_with_ff = shots_df['shot_freeze_frame'].notna().sum()
shots_without_ff = total_shots - shots_with_ff

# Print summary statistics
print("----------------------------------")
print(f"Total shots         : {total_shots}")
print(f"With freeze frame   : {shots_with_ff}")
print(f"Without freeze frame: {shots_without_ff}")
print(f"Percentage with FF  : {shots_with_ff / total_shots:.2%}")


Processing competitions: 100%|██████████| 75/75 [42:10<00:00, 33.74s/it]  


----------------------------------
Total shots         : 88023
With freeze frame   : 86833
Without freeze frame: 1190
Percentage with FF  : 98.65%
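
The counting step above relies on `notna()` treating a populated freeze frame (a list) as present and `None`/`NaN` as missing. The following is a minimal sketch of that logic on a hand-made DataFrame, so it can be checked without downloading any data. The helper `freeze_frame_summary` and the toy rows are illustrative assumptions, not part of the notebook or of `statsbombpy`.

```python
import numpy as np
import pandas as pd


def freeze_frame_summary(shots: pd.DataFrame, column: str = "shot_freeze_frame") -> dict:
    """Count shots with and without a freeze frame (hypothetical helper)."""
    total = len(shots)
    with_ff = int(shots[column].notna().sum())
    without_ff = total - with_ff
    # Guard against an empty DataFrame to avoid a division by zero
    share = with_ff / total if total else float("nan")
    return {"total": total, "with_ff": with_ff, "without_ff": without_ff, "share_with_ff": share}


# Toy data: two shots carry a freeze frame, two do not (None and NaN)
toy_shots = pd.DataFrame({
    "shot_freeze_frame": pd.Series(
        [[{"teammate": True}], None, [{"teammate": False}], np.nan],
        dtype="object",
    )
})

summary = freeze_frame_summary(toy_shots)
print(f"Total shots         : {summary['total']}")
print(f"With freeze frame   : {summary['with_ff']}")
print(f"Without freeze frame: {summary['without_ff']}")
print(f"Percentage with FF  : {summary['share_with_ff']:.2%}")
```

The same arithmetic applied to the figures reported above gives 88023 - 86833 = 1190 shots without a freeze frame and 86833 / 88023 ≈ 98.65% with one.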


#### Save the DataFrame with all shots from all competitions and seasons

In [None]:
# Save the DataFrame to a CSV file
from pathlib import Path

# Define the destination path for the saved DataFrame
src_path = Path("../task1_xg/data/shots_df.csv")

# Create the target directory if it does not exist yet
src_path.parent.mkdir(parents=True, exist_ok=True)
print(f"Saving {src_path.name}")

# Save the DataFrame
shots_df.to_csv(src_path, index=False)

Saving shots_df.csv
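
One consequence of the CSV format is worth keeping in mind: nested columns such as `shot_freeze_frame` hold Python lists of dictionaries, and `to_csv` writes them as their string representation. The sketch below shows one possible way to recover the nested structure after reloading, using an in-memory round trip on toy data. The helper `parse_nested` is a hypothetical name, and the approach assumes the stored strings are valid Python literals, which may not hold for every row of the real dataset.

```python
import ast
import io

import pandas as pd


def parse_nested(value):
    """Turn a stringified list/dict back into Python objects; leave NaN untouched."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value


# Toy data standing in for shots_df (illustrative values only)
toy_shots = pd.DataFrame({
    "id": [1, 2],
    "shot_freeze_frame": pd.Series(
        [[{"teammate": True, "location": [100.0, 40.0]}], None],
        dtype="object",
    ),
})

# Round trip through CSV in memory instead of writing to disk
buffer = io.StringIO()
toy_shots.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# After reloading, the freeze frame is a plain string until it is parsed
restored["shot_freeze_frame"] = restored["shot_freeze_frame"].apply(parse_nested)
print(type(restored.loc[0, "shot_freeze_frame"]))
```

If the nested columns are needed often, a format that preserves Python objects (for example pickle) may be a simpler choice than parsing strings after every load.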


## Analysis of StatsBomb Shot xG

Below, the `shots_df.csv` file is loaded with the column `shot_statsbomb_xg` explicitly cast to type `float64`.  
This stores the numeric values as floats. Entries that pandas recognizes as missing-value markers (e.g., empty fields or `"null"`) are parsed as `NaN`, whereas any other non-numeric entry would make `read_csv` raise an error rather than be converted. The column data type and the total number of missing values are then displayed to verify that the column contains no missing values.

In [None]:
import pandas as pd
from pathlib import Path

import warnings
warnings.filterwarnings("ignore")

src_path = Path("../task1_xg/data/shots_df.csv")

# Force 'shot_statsbomb_xg' to float64; empty fields and markers such as "null" become NaN,
# while any other non-numeric value raises an error
shots_df = pd.read_csv(src_path, dtype={"shot_statsbomb_xg": "float64"})

print("Column type:", shots_df["shot_statsbomb_xg"].dtype)
print("Total NaN values:", shots_df["shot_statsbomb_xg"].isna().sum())

Column type: float64
Total NaN values: 0
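
To make the behaviour of the `dtype` argument concrete, here is a small sketch on in-memory CSV text. It contrasts the strict cast used above with `pd.to_numeric(..., errors="coerce")`, which is one way to turn genuinely invalid entries into `NaN`. The sample values (including `"abc"`) are invented for illustration and do not come from the StatsBomb data.

```python
import io

import pandas as pd

# Missing-value markers only: an empty field and "null"
clean_csv = "id,shot_statsbomb_xg\n1,0.12\n2,\n3,null\n4,0.30\n"
clean = pd.read_csv(io.StringIO(clean_csv), dtype={"shot_statsbomb_xg": "float64"})
print("NaN from markers:", clean["shot_statsbomb_xg"].isna().sum())

# A non-numeric entry that is not a recognized missing-value marker
dirty_csv = "id,shot_statsbomb_xg\n1,0.12\n2,abc\n"

# The strict cast fails instead of silently producing NaN
try:
    pd.read_csv(io.StringIO(dirty_csv), dtype={"shot_statsbomb_xg": "float64"})
    strict_read_failed = False
except ValueError:
    strict_read_failed = True
print("Strict read failed:", strict_read_failed)

# Lenient alternative: load as-is, then coerce invalid entries to NaN
dirty = pd.read_csv(io.StringIO(dirty_csv))
coerced = pd.to_numeric(dirty["shot_statsbomb_xg"], errors="coerce")
print("NaN after coercion:", coerced.isna().sum())
```

Since the load above succeeded and reported zero `NaN` values, the column in `shots_df.csv` contains neither missing-value markers nor non-numeric entries.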
