# Exploring the Impact of Player Statistics on Game Outcomes: A Case Study with Scottie Barnes

 Name: [Julian Cruzet]

 ID: [100870375]

The aim of this project is to investigate the relationship between a player's statistics and the outcomes of basketball games, with a specific focus on Scottie Barnes. The dataset used for this analysis, sourced from Basketball Reference (see References), contains information about Barnes' performance in individual games, including points scored (PTS), assists (AST), rebounds (TRB), blocks (BLK), steals (STL), and plus/minus (+/-).

Given the rich dataset, the primary question of interest revolves around understanding how Scottie Barnes' individual performance metrics influence the likelihood of a win or loss for the Toronto Raptors. Exploring this relationship could provide valuable insights into the significance of different player statistics in determining game outcomes.

This section will delve into the data, the motivation behind the analysis, and the key questions guiding the exploration.

## Data Loading and Preprocessing

- The dataset is loaded using the pandas library, and the 'WL' column is converted to string format.

- Next, the 'Outcome' column is created, representing a binary outcome (1 for a win, 0 for a loss). 

- Lastly, features and target variables are extracted, and missing values are filled with the mean.
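
The steps above can be tried on a small scale first. The sketch below uses a made-up four-row frame (`toy`, not the real game log) to show the win/loss conversion and the mean fill; the row values and the name `toy_features` are assumptions for illustration only.

```python
import pandas as pd

# Toy rows standing in for a game log; the values are made up for illustration.
toy = pd.DataFrame({
    "WL": ["W (+5)", "L (-3)", "W (+12)", "L (-8)"],
    "PTS": [20.0, 10.0, None, 15.0],  # None marks a game with no recorded stats
})

# 1 if the result string starts with "W", 0 otherwise
toy["Outcome"] = toy["WL"].astype(str).str.startswith("W").astype(int)

# Replace missing values with the column mean (here the mean of 20, 10 and 15)
toy_features = toy[["PTS"]].fillna(toy[["PTS"]].mean())

print(toy["Outcome"].tolist())       # [1, 0, 1, 0]
print(toy_features["PTS"].tolist())  # [20.0, 10.0, 15.0, 15.0]
```

Note that a mean fill treats a missed game as an average game, which may or may not be what the analysis wants; dropping those rows is another option.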

In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the CSV file (local path on the author's machine)
data = pd.read_csv("/Users/julians/Downloads/ScottieBarnesData.csv")

# Convert the WL (win/loss) column to string
data['WL'] = data['WL'].astype(str)

# Derive a binary outcome from the WL column: 1 for a win, 0 otherwise
data['Outcome'] = data['WL'].apply(lambda x: 1 if x[0] == 'W' else 0)

# Extract the features and the target
features = data[['PTS', 'AST', 'TRB', 'BLK', 'STL', '+/-']]
target = data['Outcome']

# Label-encode the outcome column (the column is already 0/1)
label_encoder = LabelEncoder()
target = label_encoder.fit_transform(target)

# Fill missing values with the mean of each column
# (in some games, Scottie did not play)
# Reassigning avoids modifying a slice of `data` in place
features = features.fillna(features.mean())


In [None]:
# This cell contains the data splitting 

from sklearn.model_selection import train_test_split

# The data is split into both training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
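
With a small number of games, a random split can leave the test set with a different win/loss ratio than the training set. One optional variation, not used in this notebook, is to pass `stratify` to `train_test_split`. The sketch below shows it on made-up labels; `toy_X` and `toy_y` are placeholders, not the real data.

```python
from sklearn.model_selection import train_test_split

# Made-up data: 20 games, 12 wins (1) and 8 losses (0)
toy_X = [[i] for i in range(20)]
toy_y = [1] * 12 + [0] * 8

# stratify keeps the win/loss proportion the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.25, random_state=42, stratify=toy_y
)

print("test games:", len(y_te), "wins in test:", sum(y_te))
```

In the notebook itself this would correspond to adding `stratify=target` to the existing call.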


## Model Training and Evaluation

- The MLPClassifier is used to model the relationship between Scottie's stats and game outcomes.

- The model is trained on the training set and evaluated on the test set.

- The model's first-layer weights are then averaged per feature as a rough proxy for feature importance, to identify the most influential factors.

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, accuracy_score

# Creates and trains the MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)

# Makes predictions on the test set
y_pred = mlp.predict(X_test)

# Prints the accuracy and the classification report
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
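
An accuracy figure is easier to judge next to a simple baseline, such as always predicting the most common outcome in the training set. The sketch below is one way to compute that baseline; `majority_baseline_accuracy` is a hypothetical helper and the labels are made up for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


# Made-up labels for illustration only (1 = win, 0 = loss)
example_train = [1, 0, 0, 1, 0, 0, 1, 0]
example_test = [0, 1, 0, 0]

print(majority_baseline_accuracy(example_train, example_test))  # 0.75
```

In this notebook, calling the helper with `y_train` and `y_test` would give the number to compare against the model's accuracy.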


In [None]:
# This cell contains the feature importance analysis and displays the result

# Gets the first-layer weights of the trained model, shape (n_features, n_hidden)
# An MLP has no built-in feature importances, so these weights are only a rough proxy
feature_importances = mlp.coefs_[0]

# Creates a DataFrame pairing each of Scottie's stats with its mean first-layer weight
importance_df = pd.DataFrame({'Feature': features.columns, 'Importance': feature_importances.mean(axis=1)})

# Sorts the DataFrame by importance in descending order
importance_df = importance_df.sort_values(by='Importance', ascending=False)

# Displays the feature importances
print(importance_df)
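
Averaging signed first-layer weights is only a rough indicator, since positive and negative weights can cancel and later layers are ignored. A commonly used alternative, not part of the original analysis, is scikit-learn's `permutation_importance`, which measures how much the score drops when one feature is shuffled. The sketch below runs it on synthetic data (`X_demo`, `y_demo` and the feature names are made up) where only the first column drives the label.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (NOT the Barnes game log): only column 0 drives the label
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

# Small MLP, mirroring the model type used in this notebook
demo_model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
demo_model.fit(X_demo, y_demo)

# Shuffle each column in turn and record the average drop in accuracy
result = permutation_importance(
    demo_model, X_demo, y_demo, n_repeats=10, random_state=0
)

for name, score in zip(["signal", "noise_1", "noise_2"], result.importances_mean):
    print(f"{name}: {score:.3f}")
```

Applied to this notebook, the call would take `mlp`, `X_test` and `y_test` instead of the synthetic inputs.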


## Data Visualization

- A pair plot is generated with seaborn and matplotlib to visualize the relationships between Scottie's stats and how they differ between wins and losses.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Combines features and target for the pair plot
pair_plot_data = pd.concat([features, data['WL']], axis=1)
pair_plot_data['Outcome'] = pair_plot_data['WL'].apply(lambda x: 'Win' if x[0] == 'W' else 'Loss')

# Creates the pair plot
sns.pairplot(pair_plot_data, hue='Outcome')
plt.show()
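
A pair plot shows the distributions visually; a per-outcome average puts numbers on the same comparison. The sketch below uses a made-up frame (`toy_games`) to show the idea with `groupby`; the values are illustrative and not taken from the real data.

```python
import pandas as pd

# Made-up stat lines for illustration only
toy_games = pd.DataFrame({
    "Outcome": ["Win", "Win", "Loss", "Loss"],
    "PTS": [22, 18, 12, 16],
    "AST": [7, 5, 4, 2],
})

# Average of each stat, computed separately for wins and losses
per_outcome_means = toy_games.groupby("Outcome")[["PTS", "AST"]].mean()
print(per_outcome_means)
```

The same call on `pair_plot_data`, grouped by its `Outcome` column, would give the averages behind the plot.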


## Discussion

Based on the analysis, we can draw several conclusions.

- Firstly, the model predicts the outcome of a game with an accuracy of 85% on the test set, which suggests that Scottie's stat line carries substantial information about whether the team won.

- The feature importances suggest that, among the key features (points, assists, rebounds, blocks, steals and plus/minus), Scottie's highest positive-impact stat is his plus/minus. This suggests that Scottie simply being on the floor has the largest impact on the team's success. Likewise, his least positive-impact stat is points. This can be interpreted in several ways. One could say that Scottie is simply not scoring enough in games and that he needs to score more points. Another interpretation is that the team does not need to rely heavily on Scottie's point production to win games. I personally believe it is a mix of both.

- The pair plot gives a good visual representation of Scottie's statistical performance and its relationship to a game's outcome. Orange represents his stats in wins while blue represents his stats in losses. As we can see, his stats in all categories are higher in wins than in losses. This shows that when Scottie Barnes scores more points, gets more steals or plays well overall, his team has a higher chance of winning.

- To sum up, the insights found suggest that Scottie Barnes' individual performance, as captured by points, assists, rebounds, steals, blocks and +/-, significantly influences the outcome of a game.


## References

* Packages used: pandas, scikit-learn, seaborn, matplotlib

* CSV file was curated from stat downloads from Basketball Reference 
    - (https://www.basketball-reference.com/players/b/barnesc01.html)
    


* ChatGPT was used to generate **only** the introduction
    - prompt used: "Create an introduction paragraph that talks about my project that has the goal of exploring the correlation between scottie barnes' performance and his team's outcome. I will be using points, assists, rebounds, steals, blocks and +/- to determine this. The data will be taken from a CSV file."

    

* Knowledge of artificial neural networks was learned from [YouTube](https://www.youtube.com/watch?v=ZzWaow1Rvho&list=PLxt59R_fWVzT9bDxA76AHm3ig0Gg9S3So&index=1)

