# Exploratory Data Analysis

## Audience/Stakeholders

- Clearly identify who you are writing your final report for

This final report on the performance of the UVA men's basketball team is intended for the coaching staff and the Athletic Department. These stakeholders study and make decisions about player development, training loads, game preparation, and long-term performance strategy.

## Problem Statement

- Create a concise and compelling problem or question that guides your analysis.
  - Ex. "The men's basketball team had a worse performance this year. Did preparation around games change? How did playerload, jumps, high accelerations, and change of direction compare before games over the current season and the previous season?"
  
In today’s college basketball climate, where players often transfer from school to school, many programs have struggled to adapt. Programs like UVA have historically relied on player development, on the assumption that players stay at a single school throughout their college careers. Given the shifting landscape, we want to analyze whether a single season is enough for a player to develop and progress meaningfully. We will examine the Catapult data of several players over the course of a season, tracking how their PlayerLoad per minute and the distribution of their jumps and accelerations across intensity bands evolve. Our aim is to measure individual growth in order to inform scouting decisions: should coaches scout players expecting that significant development can happen within a single season, or should they recruit more talented players with less room for growth, on the assumption that one season isn’t enough for a player to mature?

## Important Variables

- List which ones are important for your analysis and why.

## Merging and Cleaning the Dataset

- Clean the data: Remove duplicates, handle missing values, correct data types

- Your final dataset should include only variables relevant to your problem

In [4]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Loading in data from season 1 and 2
s1 = pd.read_csv("../data/catapult season 1.csv")
s2 = pd.read_csv("../data/catapult season 2.csv")

# Adding column with season to each dataset
s1["Season"] = 1
s2["Season"] = 2

# Combining the two seasons of data into a single data frame
# (ignore_index=True gives the combined frame a fresh, non-repeating row index)
data = pd.concat([s1, s2], ignore_index=True)

# Hiding the UserWarning raised by pd.to_datetime below. The warning says that not setting
# a specific format or handling errors could be problematic, but further examination of the
# Date column showed that it is not, so the warning is suppressed before the conversion.
warnings.simplefilter(action = "ignore", category = UserWarning)

# Converting date column to datetime
data["Date"] = pd.to_datetime(data["Date"])

data.head(1)

Unnamed: 0,Date,About,Position,Period Number,Period,Total Acceleration Efforts,Total Player Load,Player Load Per Minute,IMA Accel Low,IMA Decel Low,...,Session Total Jump,Session Jumps Per Minute,Total CoD Left,Total CoD Right,Total High IMA,Total IMA,IMA/Min,event-uuid,group-uuid,Season
0,2023-03-14,Athlete I,Guard,1,1. Pre Practice,0,87.437,4.1,3,17,...,95.0,1.05,269.0,306.0,89.0,899.0,,c4e1f0fe-b87a-42ca-8f41-b5b0e4cdfab3,c4e1f0fe-b87a-42ca-8f41-b5b0e4cdfab3,1


In [5]:
# Columns relevant to the problem statement
columns_of_interest = [
    "Date", "About", "Period", "Player Load Per Minute",
    "IMA Accel Low", "IMA Accel High", "IMA Accel Total",
    "IMA Jump Count Low Band", "IMA Jump Count Med Band", "IMA Jump Count High Band",
    "Season",
]

# Taking an explicit copy so the assignments below modify this frame and do not
# trigger a SettingWithCopyWarning
data = data[columns_of_interest].copy()

# Renaming periods which correspond to game halves
data.loc[data["Period"].str.contains("Period 1"), "Period"] = "Game Half 1"
data.loc[data["Period"].str.contains("Period 2"), "Period"] = "Game Half 2"

data.head()

Unnamed: 0,Date,About,Period,Player Load Per Minute,IMA Accel Low,IMA Accel High,IMA Accel Total,IMA Jump Count Low Band,IMA Jump Count Med Band,IMA Jump Count High Band,Season
0,2023-03-14,Athlete I,1. Pre Practice,4.1,3,1,4,0,17,5,1
1,2023-03-14,Athlete I,"2. Drill_Offense_ Flare, Rescreen_Half Court_4v4",8.2,4,1,7,4,1,0,1
2,2023-03-14,Athlete I,3. Drill_Defense_PCM_Half Court_4v4,9.9,1,1,4,0,7,0,1
3,2023-03-14,Athlete I,4. Drill_Defense_Lane Width Slide_Full Court_1v1,7.0,1,0,4,0,0,0,1
4,2023-03-14,Athlete I,5. Drill_Defense_Fake Game_Half Court_2v2,7.7,2,0,2,2,2,3,1
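
The cleaning steps listed at the top of this section (removing duplicates, handling missing values, correcting data types) could be sketched as follows. This is a minimal illustration on a small made-up frame named `toy`, not the real Catapult export; the decision to drop rows with a missing `Period` is an assumption and may not be the right policy for the actual data.

```python
import pandas as pd

# Hypothetical toy frame mirroring a few of the columns kept above; values are made up.
toy = pd.DataFrame({
    "Date": ["2023-03-14", "2023-03-14", "2023-03-14", "2023-03-15"],
    "About": ["Athlete I", "Athlete I", "Athlete II", "Athlete I"],
    "Period": ["1. Pre Practice", "1. Pre Practice", "1. Pre Practice", None],
    "Player Load Per Minute": ["4.1", "4.1", "5.0", "6.2"],
    "Season": [1, 1, 1, 1],
})

# Correct data types: parse dates with an explicit format and coerce the load column to numeric
toy["Date"] = pd.to_datetime(toy["Date"], format="%Y-%m-%d")
toy["Player Load Per Minute"] = pd.to_numeric(toy["Player Load Per Minute"], errors="coerce")

# Remove exact duplicate rows (the first two rows are identical)
cleaned = toy.drop_duplicates()

# Handle missing values: here, rows without a Period label are dropped (an assumed policy)
cleaned = cleaned.dropna(subset=["Period"]).reset_index(drop=True)

print(cleaned.dtypes)
print(cleaned)
```

Passing an explicit `format` to `pd.to_datetime` is also one way to avoid the format-inference warning without suppressing warnings globally, provided every date in the column follows that format.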


## Descriptive Statistics & Distributions

- Provide Summaries of important variables

- Use visualizations to explore distributions
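
One possible starting point for these summaries is sketched below: per-season descriptive statistics for PlayerLoad per minute, and the share of jumps falling in each intensity band. The frame `toy` and its values are made up for illustration; only the column names come from the dataset above.

```python
import pandas as pd

# Hypothetical toy data; column names match the cleaned dataset, values are made up.
toy = pd.DataFrame({
    "About": ["Athlete I", "Athlete I", "Athlete II", "Athlete II"],
    "Season": [1, 2, 1, 2],
    "Player Load Per Minute": [4.0, 6.0, 5.0, 7.0],
    "IMA Jump Count Low Band": [2, 4, 1, 3],
    "IMA Jump Count Med Band": [6, 4, 3, 3],
    "IMA Jump Count High Band": [2, 2, 1, 4],
})

# Count, mean, std, min, quartiles, and max of PlayerLoad/min for each season
load_summary = toy.groupby("Season")["Player Load Per Minute"].describe()

# Fraction of all jumps in each band, per season (each row sums to 1)
band_cols = [
    "IMA Jump Count Low Band",
    "IMA Jump Count Med Band",
    "IMA Jump Count High Band",
]
band_totals = toy.groupby("Season")[band_cols].sum()
band_share = band_totals.div(band_totals.sum(axis=1), axis=0)

print(load_summary)
print(band_share)
```

Working with band shares instead of raw counts keeps sessions of different lengths comparable. The same grouped frames could be passed to seaborn (already imported above) to plot the distributions.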

## Examine Correlations (If Relevant)

- Interpret Findings: What variables appear related?
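
A correlation matrix over the numeric columns is one way to see which variables move together. The sketch below uses a made-up frame `toy` with deliberately perfect linear relationships so the output is easy to read; real data will be far noisier.

```python
import pandas as pd

# Hypothetical toy data with exact linear relationships; values are made up.
toy = pd.DataFrame({
    "Player Load Per Minute": [4.0, 5.0, 6.0, 7.0, 8.0],
    "IMA Accel Total": [2, 4, 6, 8, 10],
    "IMA Jump Count High Band": [5, 4, 3, 2, 1],
})

# Pairwise Pearson correlation between every pair of columns
corr = toy.corr(method="pearson")

print(corr.round(2))
```

Because the IMA columns are counts and may be skewed, `method="spearman"` (rank correlation) may be a more robust choice than Pearson; which one suits this dataset is a judgment call.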

## Explore Relationship (If Relevant)

- Dig into potential causal or descriptive relationships

- Use visualizations and statistical summaries
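
To connect this step back to the problem statement, one simple descriptive measure of within-season development is the slope of a straight-line fit of PlayerLoad per minute against days elapsed, computed per athlete. The sketch below is an illustration under assumptions: the helper `load_trend` and the frame `toy` are hypothetical, the values are made up, and a linear trend is only one of several reasonable ways to summarize change over a season.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two athletes, three sessions each; values are made up.
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2023-01-01", "2023-01-11", "2023-01-21",
        "2023-01-01", "2023-01-11", "2023-01-21",
    ]),
    "About": ["Athlete I"] * 3 + ["Athlete II"] * 3,
    "Player Load Per Minute": [5.0, 6.0, 7.0, 8.0, 8.0, 8.0],
})


def load_trend(group: pd.DataFrame) -> float:
    """Slope of a least-squares line: change in PlayerLoad/min per day."""
    days = (group["Date"] - group["Date"].min()).dt.days.to_numpy()
    load = group["Player Load Per Minute"].to_numpy()
    slope, _intercept = np.polyfit(days, load, deg=1)
    return float(slope)


# One slope per athlete; positive values suggest rising load intensity over time
trend = {athlete: load_trend(group) for athlete, group in toy.groupby("About")}

for athlete, slope in trend.items():
    print(f"{athlete}: {slope:.3f} PlayerLoad/min per day")
```

A slope alone is descriptive, not causal: changes in drill mix, minutes, or injuries could produce the same pattern, so it is best read alongside the period labels and session context.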