#### Setup

In [None]:
%pip install -r Requirements.txt

# Cougar Basketball Visualization Project
## By John Salmon

### The data for this project was acquired from Sports Reference. The data set and more information about it can be found [here](https://www.sports-reference.com/cbb/schools/washington-state/men/2024.html).

The goal of this project is to create interesting and insightful visualizations of the WSU men's basketball team's 2023-2024 season.

#### Import Data

In [None]:
import pandas as pd
gamelog = pd.read_csv('CougarBBallStats/gamelog.csv', index_col = 'G')
gamelog.head()

#### Cleaning

In [None]:
print(gamelog.columns)

In [None]:
#Fix Missing Column Names
gamelog.rename(columns = {'Unnamed: 2': 'Location'}, inplace = True)
gamelog.drop(columns = ['Unnamed: 23'], inplace = True) #Column is empty and used for spacing

#fix nan values in Location column
gamelog['Location'] = gamelog['Location'].fillna('H')

In [None]:
#Column-wise missing value counts (isnull() is an alias of isna(), so one call covers both)
print('NA Count:\n', gamelog.isna().sum())

In [None]:
#Column-wise data type correction
print('Data Types: ', gamelog.dtypes)

cols_to_convert = ['']
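
The `cols_to_convert` list above is left unfilled. As a minimal sketch of what a column-wise conversion could look like, assuming some stat columns were read in as strings, `pd.to_numeric` with `errors='coerce'` converts what it can and turns anything unparseable into `NaN`. The toy frame and its column names are made up for illustration and are not taken from the game log.

```python
import pandas as pd

# Hypothetical toy frame standing in for the game log (values are made up)
demo = pd.DataFrame({
    'FG%': ['0.455', '0.512', 'n/a'],
    'TOV': ['12', '9', '15'],
})

# Assumed column names; the real list depends on what gamelog.dtypes reports
demo_cols_to_convert = ['FG%', 'TOV']

for col in demo_cols_to_convert:
    # errors='coerce' replaces unparseable entries with NaN instead of raising
    demo[col] = pd.to_numeric(demo[col], errors='coerce')

print(demo.dtypes)
```

Coercing silently hides bad values, so it is worth re-running the NA counts afterwards to see how many entries failed to parse.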

#### Feature Engineering

In [None]:
#Score Difference Column
gamelog['Victory Margin'] = gamelog['Tm'] - gamelog['Opp.1']

#### Data Exploration

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
#Correlation Matrix
corr = gamelog.select_dtypes(include = ['int64', 'float64']).corr()

plt.figure(figsize = (30, 30))
sns.heatmap(corr,
            annot = True,
            cmap = 'Spectral',
            vmin = -1, vmax = 1,
            square = True,
            linewidths = 0.5)
plt.title('Correlation Matrix', fontsize = 20)
plt.show()

This correlation matrix looks really cool and tells us some interesting things, but it's important to remember that a correlation does not mean one factor causes another. It only indicates that the two tend to move together.
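
A large heatmap can be hard to scan, so one option is to pull out a single column of the matrix and rank it by strength. The sketch below does that on a small made-up frame. Only `Victory Margin` matches a column from this notebook; the other column names and all values are placeholders.

```python
import pandas as pd

# Toy data for illustration only (not real game results)
toy = pd.DataFrame({
    'Victory Margin': [12, -5, 20, 3, -10, 8],
    'FG%': [0.50, 0.40, 0.55, 0.45, 0.38, 0.47],
    'TOV': [9, 15, 8, 12, 16, 11],
})

toy_corr = toy.corr()

# Correlation of every other column with Victory Margin
margin_corr = toy_corr['Victory Margin'].drop('Victory Margin')

# Order by absolute strength while keeping the sign of each value
order = margin_corr.abs().sort_values(ascending=False).index
ranked = margin_corr.reindex(order)
print(ranked)
```

Keeping the sign shows whether a stat rises or falls with the margin, while sorting by absolute value puts the strongest relationships first.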

In [None]:
#Histogram with KDE of victory margin (sns.distplot is deprecated in recent seaborn releases)
sns.histplot(gamelog['Victory Margin'], color = 'crimson', bins = 35, kde = True, stat = 'density', alpha = 0.5)

In [None]:
gamelog_nums = gamelog.select_dtypes(include = ['float64', 'int64'])
gamelog_nums.hist(figsize = (16, 20), bins = 35, xlabelsize = 8, ylabelsize = 8, color = 'red')

#### Import Data From Other Teams

Next, let's import the data from other college basketball teams for reference.


In [None]:
#Function for cleaning and adding features
def clean_and_add_columns(gamelog):
    '''This function takes a dataframe and performs the previous cleaning
    steps outlined in this notebook. The dataframe is modified in place.'''
    
    #clean
    gamelog.rename(columns = {'Unnamed: 2': 'Location'}, inplace = True)
    gamelog.drop(columns = ['Unnamed: 23'], inplace = True) #Column is empty and used for spacing

    #fix nan values in Location column
    gamelog['Location'] = gamelog['Location'].fillna('H')
    
    #add columns
    gamelog['Victory Margin'] = gamelog['Tm'] - gamelog['Opp.1']
    

In [None]:
#Import and Clean Data From other teams
kansas = pd.read_csv('Kansas_Gamelog.csv', index_col = 'G') #team with most basketball games played in their history (cougs are 2nd)
uconn = pd.read_csv('UConn_Gamelog.csv', index_col = 'G') #winner of the 2024 Men's NCAA tournament
uw = pd.read_csv('UW_gamelog.csv', index_col = 'G') #as a former husky (grad 2023) they put the 'dog' in 'dogshit' (sorry dad I'm a cougar now)

new_data = [kansas, uconn, uw]
for df in new_data:
    clean_and_add_columns(df)
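
For side-by-side comparisons, it may help to stack the cleaned game logs into one frame with a label for each team. This is a minimal sketch, assuming each frame has a `Victory Margin` column after cleaning. The `Team` column name and the toy frames are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Toy stand-ins for two cleaned game logs (values are made up)
team_a = pd.DataFrame({'Victory Margin': [10, -3, 7]})
team_b = pd.DataFrame({'Victory Margin': [2, 15, -8]})

# Tag each frame with a hypothetical 'Team' label, then stack them
combined = pd.concat([
    team_a.assign(Team='Team A'),
    team_b.assign(Team='Team B'),
])

# Average victory margin per team
avg_margin = combined.groupby('Team')['Victory Margin'].mean()
print(avg_margin)
```

A single long frame like this also works directly with seaborn's `hue` argument, which makes per-team plots simpler than looping over separate frames.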