# Loading Data for Partizan 2022/23 Analysis

## Overview
This notebook is the first step in analyzing Partizan’s performance in the 2022/23 EuroLeague season. It loads the raw EuroLeague box score dataset, filters it down to Partizan’s 2022/23 rows, and saves the result for later cleaning and analysis, so that the rest of the project works from a focused dataset. The dataset is sourced from Kaggle and contains EuroLeague box scores from multiple seasons, including team and player statistics for all games.

We start by loading the raw box score CSV, which holds game-by-game statistics for all teams and players in the EuroLeague. Because column 10 contains mixed data types (strings and integers), we read it as a string. Without this, pandas has to guess the column type and typically raises a mixed-type `DtypeWarning` during loading.



In [None]:
# loading data
import pandas as pd

file_path = "../data/euroleague_box_score.csv"

# specify the data type for column 10 because it contains strings and integers
dtype_spec = {10: str}

df = pd.read_csv(file_path, dtype=dtype_spec)

The dataset is now loaded into the pandas DataFrame `df`, which contains all EuroLeague box score data, including team and player statistics across multiple seasons. Reading column 10 as a string keeps its mixed values (e.g., text such as "DNP" alongside integers) in one consistent type.
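
As a possible variation, `dtype` can also be keyed by column name rather than position, which keeps working if the column order of the CSV ever changes. The sketch below is only an illustration: the column name `dorsal` and the inline sample rows are hypothetical stand-ins, not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical miniature CSV; "dorsal" stands in for the mixed-type column.
sample_csv = io.StringIO(
    "player_id,dorsal,points\n"
    "P001,7,12\n"
    "P002,00,8\n"
    "P003,DNP,0\n"
)

# Key the dtype by column name instead of by position.
sample_df = pd.read_csv(sample_csv, dtype={"dorsal": str})

# Every value in the column is read as text, so "00" keeps its leading zero.
dorsal_values = sample_df["dorsal"].tolist()
print(dorsal_values)
```

Whether a name or a position is the better key depends on which is more stable in the source file; the notebook uses the position.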

In [9]:
# keep only Partizan rows (team_id 'PAR') from the 2022/23 season (season_code 'E2022')
partizan_2022 = df[(df['team_id'] == 'PAR') & (df['season_code'] == 'E2022')]

partizan_2022.head()


| Unnamed: 0 | game_player_id | game_id | game | round | phase | season_code | player_id | is_starter | is_playing | team_id | ... | total_rebounds | assists | steals | turnovers | blocks_favour | blocks_against | fouls_committed | fouls_received | valuation | plus_minus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12507 | E2022_309_P009862 | E2022_309 | MAD-PAR | 35 | PLAYOFFS | E2022 | P009862 | 1.0 | 1.0 | PAR | ... | 3 | 6 | 0 | 2 | 0 | 0 | 1 | 6 | 33 | 7 |
| 17315 | E2022_325_P009862 | E2022_325 | MAD-PAR | 39 | PLAYOFFS | E2022 | P009862 | 1.0 | 1.0 | PAR | ... | 2 | 3 | 0 | 2 | 0 | 1 | 2 | 7 | 26 | -9 |
| 17316 | E2022_288_P009862 | E2022_288 | PAR-MAD | 32 | REGULAR SEASON | E2022 | P009862 | 0.0 | 1.0 | PAR | ... | 2 | 6 | 3 | 0 | 0 | 1 | 3 | 3 | 18 | 24 |
| 17318 | E2022_127_P009862 | E2022_127 | ULK-PAR | 15 | REGULAR SEASON | E2022 | P009862 | 0.0 | 1.0 | PAR | ... | 5 | 2 | 1 | 1 | 1 | 0 | 1 | 4 | 17 | -5 |
| 17320 | E2022_269_P009862 | E2022_269 | PAR-OLY | 30 | REGULAR SEASON | E2022 | P009862 | 1.0 | 1.0 | PAR | ... | 1 | 3 | 0 | 1 | 0 | 1 | 2 | 4 | 15 | 6 |


The filtered DataFrame `partizan_2022` contains only Partizan’s rows for the 2022/23 season. The preview confirms that the filter worked as expected: every row shown has `team_id` equal to `PAR` and `season_code` equal to `E2022`, alongside game identifiers and per-game stats. This subset includes both player-level and team-level box scores, which will be separated in later steps.
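
The visual check of the preview can also be written as assertions. The sketch below runs on a small made-up DataFrame that only shares the `team_id` and `season_code` column names with the real data; the rows are illustrative, and the helper name `filter_team_season` is an assumption rather than part of the notebook.

```python
import pandas as pd


def filter_team_season(df, team_id, season_code):
    """Return a copy of the rows matching one team and one season."""
    mask = (df["team_id"] == team_id) & (df["season_code"] == season_code)
    return df[mask].copy()


# Illustrative rows only; the real data comes from the box score CSV.
toy = pd.DataFrame(
    {
        "game_id": ["E2022_001", "E2022_001", "E2021_001", "E2022_002"],
        "team_id": ["PAR", "MAD", "PAR", "PAR"],
        "season_code": ["E2022", "E2022", "E2021", "E2022"],
    }
)

subset = filter_team_season(toy, "PAR", "E2022")

# The filter should leave exactly one team and one season.
assert not subset.empty
assert set(subset["team_id"]) == {"PAR"}
assert set(subset["season_code"]) == {"E2022"}
print(subset["game_id"].nunique(), "games in the subset")
```

Returning a copy is a design choice here: it lets later steps add or modify columns on the subset without touching the full DataFrame.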

In [None]:
# save the data to a CSV file
partizan_2022.to_csv('partizan_2022_raw.csv', index=False)
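
As an optional check on the save step, the file can be read back and compared with the DataFrame that was written. This is a minimal sketch that uses a temporary directory and a made-up two-row DataFrame in place of `partizan_2022`, so it does not touch the project files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for partizan_2022.
frame = pd.DataFrame({"team_id": ["PAR", "PAR"], "points": [12, 8]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "partizan_2022_raw.csv"
    # index=False keeps the row index out of the file, as in the save cell.
    frame.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

# Matching columns and shape mean no rows or columns were lost or added on disk.
same_columns = list(reloaded.columns) == list(frame.columns)
same_shape = reloaded.shape == frame.shape
print(same_columns, same_shape)
```

Column types can still differ after a reload, because a CSV file stores no type information; a column that was read as `str` the first time may need the same `dtype` argument again.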

## Conclusion
This notebook loaded the raw EuroLeague box score dataset, filtered it to Partizan’s 2022/23 season, and saved the result as `partizan_2022_raw.csv`. That file contains all of Partizan’s rows for the season, ready for cleaning and validation in the next steps, so subsequent notebooks work only with the data they need.

### Next Steps
- Proceed to `cleaning_data.ipynb` to handle missing values, standardize formats, and prepare the data for analysis.