# Data Denizens Progress Report
By: Chi Hieu Nguyen, Jesus Rojas, Daniel Rodriguez, Dinh Dang Khoa Tran, Duc Tam Nguyen

### Project Introduction
Our data science project seeks to analyze and predict the playoff performance of NBA teams and individual players using a combination of regular season statistics and performance against playoff-caliber opponents. The goal is to determine whether we can accurately forecast postseason outcomes, such as a team’s playoff success or a player's postseason statline, based on regular season trends and contextualized performance data.
One of the core issues with evaluating NBA performance is the sharp contrast between the regular season and the playoffs. Playoff games feature a slower pace, increased defensive intensity, tighter rotations, and more targeted game planning. As a result, regular season metrics do not always carry over: some teams and players excel when the stakes are lower, while others elevate their game under pressure. This matters to our stakeholders (NBA coaches, teams, analysts, and fans) because we seek to provide meaningful insight into the most important stretch of games in the NBA. That insight can help decide which players should get more minutes in the playoffs, since giving minutes to the wrong players can lead to a team's early exit from the postseason.
To address this, we’re going beyond just overall regular season stats and win percentage. A key feature of our approach is to isolate regular season performance against other playoff-bound teams, under the assumption that these matchups more closely reflect the intensity and structure of playoff basketball. By focusing on how teams fare against high-level competition, we hope to identify patterns that traditional season-long averages might hide. Our interest in this topic stems from both a passion for basketball and the analytical challenges the NBA presents. With its rich dataset and clear regular season vs. postseason split, the NBA offers an ideal setting to explore how performance under pressure can be modeled and understood.

### Changes from Original Proposal
One change from the original proposal is that we now include data from multiple seasons. Previously, we planned to use only the 2024-25 regular season, but we quickly realized that the model needs additional seasons to train on before it can predict playoff outcomes accurately. That is why we decided to use the previous three years of regular season and playoff data in our approach.

### Data cleaning
Data cleaning is done by the wrangle_player_performance function in the team_individual_stat file. The function performs the following steps to clean the dataset.

Step 1:
All column names are converted to uppercase (and extra whitespace is removed) to ensure consistency in later steps.

Steps 2 & 3:
Duplicate rows are removed, and columns that are not relevant (any column with "RANK" in its name plus extra identifier columns like NICKNAME, W, L, and W_PCT) are dropped.

Step 4:
Rows with any missing values are dropped to ensure data integrity.

Step 5:
The function attempts to convert columns (except for key non-numeric ones like PLAYER_NAME, TEAM_ABBREVIATION, and SEASON) to numeric types, which facilitates any numerical analysis.

Step 6:
Column names are ensured to remain uppercase and stripped of whitespace.
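
The steps above can be sketched roughly as follows. This is a minimal sketch, not the actual wrangle_player_performance code: the function name clean_player_stats, the two constants, and the toy rows are assumptions made for illustration only. Step 5 catches conversion errors explicitly, which recent pandas versions recommend over errors='ignore'. Step 6 needs no extra code here because the names are already normalized in Step 1 and nothing later renames them.

```python
import pandas as pd

# Hypothetical stand-in for wrangle_player_performance; the real function
# lives in team_individual_stat and may differ in its details.
NON_NUMERIC = {"PLAYER_NAME", "TEAM_ABBREVIATION", "SEASON"}
DROP_EXACT = {"NICKNAME", "W", "L", "W_PCT"}


def clean_player_stats(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Step 1: uppercase column names and strip surrounding whitespace.
    out.columns = [str(c).strip().upper() for c in out.columns]
    # Step 2: remove duplicate rows.
    out = out.drop_duplicates()
    # Step 3: drop every "RANK" column plus the listed extra columns.
    to_drop = [c for c in out.columns if "RANK" in c or c in DROP_EXACT]
    out = out.drop(columns=to_drop)
    # Step 4: drop rows that contain any missing value.
    out = out.dropna()
    # Step 5: try to convert the remaining columns to numeric types,
    # leaving a column unchanged if it cannot be converted.
    for col in out.columns:
        if col in NON_NUMERIC:
            continue
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass
    return out.reset_index(drop=True)


# Made-up toy rows (not real player data) to exercise each step.
raw = pd.DataFrame({
    " player_name ": ["Player A", "Player A", "Player B", "Player C"],
    "team_abbreviation": ["TOR", "TOR", "MIL", "DEN"],
    "pts": ["8.1", "8.1", "7.4", None],
    "pts_rank": [1, 1, 2, 3],
    "w": [10, 10, 20, 30],
    "season": ["2024-25"] * 4,
})
clean = clean_player_stats(raw)
print(clean)
```

In the toy input, the duplicated Player A row is removed in Step 2 and the Player C row is removed in Step 4 because its PTS value is missing.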


In [5]:
import pandas as pd
import team_individual_stat

# Now you can use the function in your notebook:
df = pd.read_csv('player_per_game_stats_regular_2425.xls')
df = team_individual_stat.wrangle_player_performance(df)
df.head()



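|  | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_ABBREVIATION | AGE | GP | MIN | FGM | FGA | FG_PCT | ... | TOV | STL | BLK | PF | PTS | PLUS_MINUS | NBA_FANTASY_PTS | DD2 | TD3 | SEASON |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1630639 | A.J. Lawson | 1610612761 | TOR | 24.0 | 20 | 17.1 | 2.8 | 7.0 | 0.4 | ... | 0.4 | 0.3 | 0.2 | 1.6 | 8.1 | -0.9 | 14.2 | 1 | 0 | 2024-25 |
| 1 | 1631260 | AJ Green | 1610612749 | MIL | 25.0 | 66 | 22.6 | 2.5 | 5.9 | 0.423 | ... | 0.6 | 0.5 | 0.1 | 2.2 | 7.4 | 1.7 | 13.6 | 0 | 0 | 2024-25 |
| 2 | 1642358 | AJ Johnson | 1610612764 | WAS | 20.0 | 23 | 18.8 | 2.6 | 6.2 | 0.415 | ... | 1.1 | 0.4 | 0.1 | 1.1 | 7.0 | -2.3 | 13.6 | 0 | 0 | 2024-25 |
| 3 | 203932 | Aaron Gordon | 1610612743 | DEN | 29.0 | 46 | 27.8 | 5.0 | 9.4 | 0.53 | ... | 1.4 | 0.4 | 0.2 | 1.6 | 14.3 | 5.7 | 25.5 | 3 | 0 | 2024-25 |
| 4 | 1628988 | Aaron Holiday | 1610612745 | HOU | 28.0 | 57 | 12.8 | 1.8 | 4.2 | 0.43 | ... | 0.6 | 0.3 | 0.2 | 1.0 | 5.3 | 1.6 | 9.7 | 0 | 0 | 2024-25 |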


In [None]:
# Code to show some of the data's values and the column names

### EDA & Visualization

In [None]:
#show eda stuff

##### Column names in the cleaned dataset

In [None]:
#dataset.columns

### ML analysis


**Model**

Explain how we picked and used our model.

### Reflection: 
The hardest part so far has been the exploration process: finding relationships between statistics that meaningfully affect playoff outcomes. Our whole group performed EDA on the data, and we were only able to find a handful of insightful relationships. Now that we have found them, however, we can build on them and explore them in more depth.

We have confirmed that most players perform worse in the playoffs than in the regular season: most see a drop in field goal percentage in the playoffs.
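
One way to check this kind of regular season vs. playoff comparison is to join the two per-player tables and look at the difference in FG_PCT. The snippet below is a minimal sketch under assumptions: the numbers are made up, the playoff table is assumed to share the PLAYER_ID and FG_PCT columns with the regular season table, and the _REG and _PO suffixes are names chosen for illustration.

```python
import pandas as pd

# Made-up toy values, not real results.
regular = pd.DataFrame({
    "PLAYER_ID": [1, 2, 3, 4],
    "FG_PCT": [0.480, 0.450, 0.520, 0.410],
})
playoffs = pd.DataFrame({
    "PLAYER_ID": [1, 2, 3],  # player 4 did not appear in the playoffs
    "FG_PCT": [0.450, 0.460, 0.490],
})

# The default inner join keeps only players present in both tables.
merged = regular.merge(playoffs, on="PLAYER_ID", suffixes=("_REG", "_PO"))

# Negative values mean the player shot worse in the playoffs.
merged["FG_PCT_DIFF"] = merged["FG_PCT_PO"] - merged["FG_PCT_REG"]

share_declined = (merged["FG_PCT_DIFF"] < 0).mean()
mean_diff = merged["FG_PCT_DIFF"].mean()
print(merged)
print(f"Share of players with a lower playoff FG%: {share_declined:.2f}")
print(f"Mean change in FG%: {mean_diff:+.3f}")
```

With real data it may be worth filtering out players with very few playoff attempts (for example by FGA), since a handful of shots can swing a percentage a lot.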

The current biggest problem that we are facing is that... (To be determined)

The results so far indicate that we are on the right track and support our initial hypotheses. In the playoffs, players typically shoot a lower field goal percentage than their regular season average. Additionally, teams with a lower win percentage against playoff-caliber teams typically do not have much success in the playoffs. Given these findings, we believe the project is worth continuing: we can look for more metrics that determine playoff success for struggling teams, and we can identify players who should play more based on other metrics (assists, rebounds, steals, etc.).
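
The win percentage against playoff-caliber teams can be computed from a game log by keeping only games where the opponent made the playoffs. The following is a minimal sketch under assumptions: the TEAM, OPPONENT, and WIN columns, the placeholder team codes, and the playoff_teams set are all hypothetical and do not come from our actual data files.

```python
import pandas as pd

# Made-up game log: one row per team per game, WIN is 1 for a win, 0 for a loss.
games = pd.DataFrame({
    "TEAM":     ["AAA", "BBB", "AAA", "CCC", "BBB", "AAA", "BBB", "CCC"],
    "OPPONENT": ["BBB", "AAA", "CCC", "AAA", "AAA", "BBB", "CCC", "BBB"],
    "WIN":      [1, 0, 1, 0, 1, 0, 1, 0],
})

# Hypothetical set of teams that reached the playoffs that season.
playoff_teams = {"AAA", "BBB"}

# Keep only games played against playoff-bound opponents.
vs_playoff = games[games["OPPONENT"].isin(playoff_teams)]

# The mean of a 0/1 WIN column is the win percentage.
win_pct_vs_playoff = (
    vs_playoff.groupby("TEAM")["WIN"].mean().rename("WIN_PCT_VS_PLAYOFF")
)
print(win_pct_vs_playoff)
```

The resulting series can be joined onto a team-level table and compared with playoff results such as rounds won.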

### Next steps: 
We plan to expand our dataset by adding the previous three years of regular season and playoff statistics. Additionally, we will continue to work on our models to find the best model for predicting certain outcomes.
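
Combining several seasons could look roughly like the sketch below. It assumes each season and phase has already been loaded and cleaned into its own DataFrame. The stack_seasons helper and the PHASE column are hypothetical names introduced for illustration, and the toy rows are made up.

```python
import pandas as pd


def stack_seasons(frames: dict) -> pd.DataFrame:
    """Tag each table with its season and phase, then stack them.

    `frames` maps (season, phase) tuples to DataFrames.
    """
    tagged = [
        frame.assign(SEASON=season, PHASE=phase)
        for (season, phase), frame in frames.items()
    ]
    return pd.concat(tagged, ignore_index=True)


# Made-up toy tables standing in for per-season files.
frames = {
    ("2023-24", "regular"): pd.DataFrame({"PLAYER_ID": [1, 2], "PTS": [10.0, 12.5]}),
    ("2023-24", "playoffs"): pd.DataFrame({"PLAYER_ID": [1], "PTS": [9.0]}),
    ("2024-25", "regular"): pd.DataFrame({"PLAYER_ID": [1, 2], "PTS": [11.0, 13.0]}),
}
combined = stack_seasons(frames)
print(combined)
```

Keeping SEASON and PHASE as columns makes it possible to split training and evaluation by season later without reloading the files.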