# **[Data4Life] - Introduction to Data Science**
Topic ***NBA*** - Group ***16***

### Our motivation
The NBA is one of the most popular sports leagues in the world, captivating millions of fans with its teams, players and storied history. The league also generates a vast amount of data, from game statistics to player achievements, which creates an excellent opportunity to apply data science techniques to analyze trends, identify patterns and derive meaningful insights.

Our work is designed for three audiences: **sports enthusiasts**, who will gain a richer understanding of individual game performance and season-wide trends; **team and league stakeholders**, who can use these insights for better decision-making; and the **data science community**, which can learn about practical applications of data science in sports and be inspired to explore the field further. (We hope so :>>)

### Our purpose
- Provide NBA fans, analysts and stakeholders with deeper insights into team and player performance through data-driven approaches.
- Assist coaches, managers or team owners in optimizing strategies and resources.
- Demonstrate how data science can be applied to a real-world context, showcasing the power of statistical analysis and machine learning in sports analytics.

### Our key objectives

**Title 0: Overview of the Current NBA Situation**
- Provide an initial analysis to contextualize the state of the NBA.
- Break down insights into three main categories:
   1. Players
   2. Teams
   3. Data Modelling

**Title 1: Players**

Objective 1: Identify high-performing and consistent players (2020-2025)
- Focus on players with high performance and consistency in the past five years.
- Use Efficiency (EFF) as the primary metric:
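$$\text{EFF} = \frac{(\text{PTS} + \text{REB} + \text{AST} + \text{STL} + \text{BLK}) - \big[(\text{FGA} - \text{FGM}) + (\text{FTA} - \text{FTM}) + \text{TO}\big]}{\text{GP}}$$
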
Where:

- PTS: Points scored
- REB: Total rebounds (offensive + defensive)
- AST: Assists
- STL: Steals
- BLK: Blocks
- FGA, FGM: Field goals attempted and made
- FTA, FTM: Free throws attempted and made
- TO: Turnovers
- GP: Games played
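A minimal pandas sketch of the formula, assuming a table of season totals whose columns are named exactly as in the definitions above. The `efficiency` helper, the two players and their numbers are hypothetical and only illustrate the arithmetic. If your source already reports per-game averages, drop the division by GP.

```python
import pandas as pd

def efficiency(df: pd.DataFrame) -> pd.Series:
    """Per-game EFF from season totals (column names are assumed)."""
    positive = df["PTS"] + df["REB"] + df["AST"] + df["STL"] + df["BLK"]
    missed_fg = df["FGA"] - df["FGM"]
    missed_ft = df["FTA"] - df["FTM"]
    negative = missed_fg + missed_ft + df["TO"]
    # Divide the whole difference by games played, not only the negative part
    return (positive - negative) / df["GP"]

# Hypothetical season totals for two made-up players (not real NBA data)
totals = pd.DataFrame({
    "PLAYER": ["Player A", "Player B"],
    "PTS": [2000, 800], "REB": [600, 300], "AST": [500, 100],
    "STL": [100, 40], "BLK": [50, 20],
    "FGA": [1500, 700], "FGM": [750, 300],
    "FTA": [500, 100], "FTM": [400, 80],
    "TO": [250, 60], "GP": [70, 60],
})
totals["EFF"] = efficiency(totals)
print(totals[["PLAYER", "EFF"]])
```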

Analyze EFF trends:
- Evaluate EFF variability across seasons
- Analyze EFF changes with age (does performance decline as players get older?)
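Both questions reduce to group-by aggregations. The sketch below assumes a hypothetical player-season table (one row per player per season) with `PLAYER`, `SEASON`, `AGE` and `EFF` columns; the coefficient of variation and the age-bin edges are our own illustrative choices, not part of the plan above.

```python
import pandas as pd

# Hypothetical player-season table (made-up values)
seasons = pd.DataFrame({
    "PLAYER": ["A", "A", "A", "B", "B", "B"],
    "SEASON": [2021, 2022, 2023, 2021, 2022, 2023],
    "AGE":    [25, 26, 27, 33, 34, 35],
    "EFF":    [24.0, 26.0, 25.0, 20.0, 14.0, 17.0],
})

# Variability: mean and sample standard deviation of EFF per player
consistency = seasons.groupby("PLAYER")["EFF"].agg(
    mean_eff="mean", std_eff="std", n_seasons="count"
)
# Coefficient of variation: lower means steadier relative to the player's level
consistency["cv"] = consistency["std_eff"] / consistency["mean_eff"]

# Age trend: average EFF per age bracket (bin edges are an arbitrary choice)
seasons["AGE_BIN"] = pd.cut(seasons["AGE"], bins=[18, 24, 28, 32, 40])
age_trend = seasons.groupby("AGE_BIN", observed=True)["EFF"].mean()

print(consistency)
print(age_trend)
```

The standard deviation needs at least two seasons per player. Also keep in mind that only players good enough to stay in the league reach older age brackets, so a raw per-age average can understate the decline.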

Objective 2: Identify top-performing rookies (2024-2025)
- Focus only on the current season to identify rookies with standout performances
- Key metrics: EFF, PTS, REB, AST
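Ranking rookies is a filter followed by a sort. This sketch assumes a hypothetical current-season table with an `IS_ROOKIE` flag (in practice this could be derived from each player's first season in the data) and a minimum-games threshold to keep small samples from topping the list. Both the flag and the threshold value are assumptions.

```python
import pandas as pd

# Hypothetical current-season per-game stats (made-up values)
current = pd.DataFrame({
    "PLAYER": ["R1", "R2", "R3", "V1"],
    "IS_ROOKIE": [True, True, True, False],
    "GP":  [40, 12, 38, 45],
    "EFF": [15.2, 19.0, 11.4, 22.1],
    "PTS": [14.1, 16.0, 9.8, 25.3],
    "REB": [5.2, 6.1, 4.0, 7.5],
    "AST": [3.0, 2.2, 4.1, 6.0],
})

MIN_GAMES = 20  # assumed threshold to exclude small samples

rookies = current[current["IS_ROOKIE"] & (current["GP"] >= MIN_GAMES)]
top_rookies = rookies.sort_values("EFF", ascending=False)[["PLAYER", "EFF", "PTS", "REB", "AST"]]
print(top_rookies.head(10))
```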

Objective 3: Analyze player performance by position
- Break down player performance based on their roles:
  - Defensive players: metrics such as REB, STL, BLK
  - Attackers (scorers): metrics such as PTS, FG%, 3P%
  - Playmakers (shooting/passing): metrics such as AST, FG%, 3P%, FT%
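One way to turn the role metrics into comparable numbers is to z-score each metric across players and average the z-scores per role. This composite, the makes/attempts column names (`FGM`, `3PA`, ...) and the `role_scores` helper are assumptions for illustration; the sample players are made up.

```python
import pandas as pd

# Role -> metrics mapping taken from the list above
ROLE_METRICS = {
    "DEF": ["REB", "STL", "BLK"],
    "ATK": ["PTS", "FG%", "3P%"],
    "PLAYMAKER": ["AST", "FG%", "3P%", "FT%"],
}

def role_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Average z-score of each role's metrics (a simple, assumed composite)."""
    out = df.copy()
    # Shooting percentages from makes/attempts (zero attempts would give NaN/inf)
    out["FG%"] = out["FGM"] / out["FGA"]
    out["3P%"] = out["3PM"] / out["3PA"]
    out["FT%"] = out["FTM"] / out["FTA"]
    metrics = sorted({m for ms in ROLE_METRICS.values() for m in ms})
    # Standardize each metric relative to the players in this table
    z = (out[metrics] - out[metrics].mean()) / out[metrics].std()
    for role, ms in ROLE_METRICS.items():
        out[f"{role}_SCORE"] = z[ms].mean(axis=1)
    # Segment each player by the role with the highest score
    score_cols = [f"{r}_SCORE" for r in ROLE_METRICS]
    out["ROLE"] = out[score_cols].idxmax(axis=1).str.replace("_SCORE", "")
    return out

# Hypothetical per-game stats for three made-up players
stats = pd.DataFrame({
    "PLAYER": ["Big", "Guard", "Wing"],
    "PTS": [15.0, 22.0, 18.0], "REB": [12.0, 4.0, 6.0], "AST": [2.0, 8.0, 3.0],
    "STL": [1.0, 1.5, 1.2], "BLK": [2.5, 0.3, 0.6],
    "FGM": [6.0, 8.0, 7.0], "FGA": [10.0, 18.0, 15.0],
    "3PM": [0.2, 3.0, 2.0], "3PA": [1.0, 8.0, 5.0],
    "FTM": [3.0, 4.0, 2.0], "FTA": [5.0, 4.5, 2.5],
})
scores = role_scores(stats)
print(scores[["PLAYER", "DEF_SCORE", "ATK_SCORE", "PLAYMAKER_SCORE", "ROLE"]])
```

Because the z-scores are relative to the rows passed in, scores computed on the full league differ from scores computed on a single team.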

Approach: 
- Calculate the average and standard deviation of each player's EFF across seasons to measure consistency
- Segment players by position (DEF, ATK, Playmaker)
- Compare rookies' performance against established players
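To compare rookies against established players, each rookie's EFF can be placed within the distribution of veteran EFF values. A minimal sketch assuming a hypothetical `IS_ROOKIE` flag and made-up values; the percentile is simply the share of veterans with a lower EFF.

```python
import pandas as pd

# Hypothetical current-season EFF values (made-up)
players = pd.DataFrame({
    "PLAYER": ["R1", "R2", "V1", "V2", "V3", "V4"],
    "IS_ROOKIE": [True, True, False, False, False, False],
    "EFF": [15.0, 9.0, 8.0, 12.0, 18.0, 25.0],
})

veteran_eff = players.loc[~players["IS_ROOKIE"], "EFF"]
rookies = players.loc[players["IS_ROOKIE"], ["PLAYER", "EFF"]].copy()
# Share of established players whose EFF is below each rookie's EFF
rookies["PCT_OF_VETERANS"] = rookies["EFF"].apply(lambda e: (veteran_eff < e).mean())
print(rookies)
```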

**Title 2: Teams**

Objective: Analyze Team Weaknesses
- Identify areas where teams are underperforming:
   - If a team has a low offensive score (ATK), recommend acquiring players strong in scoring metrics (e.g., PTS, FG%, 3P%)
   - If a team has a weak defense, focus on players with high REB, STL, BLK

Approach:
- Aggregate team-level metrics (e.g., average PTS, REB, STL) and compare them across the league
- Identify meaningful deficiencies (e.g., metrics well below the league average)
- Recommend specific player profiles to address these weaknesses
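A minimal sketch of this approach, assuming a hypothetical table of per-game team averages. Each metric is converted to a league z-score, metrics at or below an assumed cutoff of one standard deviation under the league mean are flagged, and each flagged metric is mapped to the player profile suggested earlier in this section. The cutoff is a heuristic, not a formal significance test.

```python
import pandas as pd

# Hypothetical per-game team averages (made-up values, not real NBA data)
teams = pd.DataFrame({
    "TEAM": ["T1", "T2", "T3", "T4"],
    "PTS": [118.0, 108.0, 112.0, 114.0],
    "REB": [44.0, 46.0, 40.0, 45.0],
    "STL": [7.5, 8.0, 6.0, 8.5],
}).set_index("TEAM")

# How many league standard deviations each team sits from the league mean
z = (teams - teams.mean()) / teams.std()

THRESHOLD = -1.0  # assumed cutoff (heuristic)

# Player profile that addresses each weak metric (follows the rules above)
PROFILE_FOR = {
    "PTS": "scorer: high PTS, FG%, 3P%",
    "REB": "defender: high REB, STL, BLK",
    "STL": "defender: high REB, STL, BLK",
}

weaknesses = {team: row[row <= THRESHOLD].index.tolist() for team, row in z.iterrows()}
recommendations = {
    team: sorted({PROFILE_FOR[m] for m in weak}) for team, weak in weaknesses.items()
}
for team, recs in recommendations.items():
    print(team, weaknesses[team], recs)
```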

**Title 3: Data Modelling**

Objective: Predict Future Performance
- Develop a model to predict whether a player's performance will remain high in the next season
- Use player data from past seasons to build predictive models, incorporating:
   - EFF
   - Age
   - Historical trends in metrics like PTS, REB, AST, STL, BLK
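Historical trends can be encoded as per-player lag features. This sketch assumes a hypothetical player-season table with `PLAYER`, `SEASON`, `AGE` and `EFF` columns; `EFF_NEXT` serves as the regression target. A "remains high" label for a classifier could be derived from it with a threshold of your choice.

```python
import pandas as pd

# Hypothetical player-season history (made-up values)
hist = pd.DataFrame({
    "PLAYER": ["A", "A", "A", "B", "B"],
    "SEASON": [2021, 2022, 2023, 2022, 2023],
    "AGE":    [24, 25, 26, 30, 31],
    "EFF":    [18.0, 20.0, 23.0, 15.0, 13.0],
})

# Lags only make sense when each player's seasons are in chronological order
hist = hist.sort_values(["PLAYER", "SEASON"])
g = hist.groupby("PLAYER")["EFF"]
hist["EFF_PREV"] = g.shift(1)                        # last season's EFF
hist["EFF_DELTA"] = hist["EFF"] - hist["EFF_PREV"]   # year-over-year change
hist["EFF_ROLL2"] = g.transform(lambda s: s.rolling(2, min_periods=1).mean())
# Target: next season's EFF (NaN for each player's most recent season)
hist["EFF_NEXT"] = g.shift(-1)
print(hist)
```

Rows with a missing `EFF_NEXT` cannot be used for training, but they are exactly the rows to predict for the upcoming season.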

Approach:
1. Feature Engineering
   - Create relevant features (e.g., age, position, past EFF trends)

2. Train Predictive Models
   - Models: linear regression, random forests or neural networks
   - Evaluate performance using metrics like RMSE or MAE

3. Test the model
   - Validate predictions against current-season data to assess accuracy
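Steps 2 and 3 can be sketched with a time-based split: train only on past seasons, hold out the most recent one, and report RMSE and MAE. The sketch assumes a hypothetical modelling table with a single lag feature and fits a one-variable linear regression with `np.polyfit`. It also compares against a naive baseline that predicts last season's EFF, since a model is only useful if it beats that.

```python
import numpy as np
import pandas as pd

# Hypothetical modelling table: one row per player-season (made-up values)
data = pd.DataFrame({
    "SEASON":   [2021, 2021, 2022, 2022, 2023, 2023],
    "EFF_PREV": [10.0, 20.0, 12.0, 18.0, 14.0, 22.0],
    "EFF":      [11.0, 19.0, 12.5, 17.5, 15.0, 21.0],
})

test_season = data["SEASON"].max()   # most recent season is held out
train = data[data["SEASON"] < test_season]
test = data[data["SEASON"] == test_season]

# Fit on past seasons only: EFF ≈ slope * EFF_PREV + intercept
slope, intercept = np.polyfit(train["EFF_PREV"], train["EFF"], deg=1)
pred = slope * test["EFF_PREV"] + intercept

errors = test["EFF"] - pred
rmse = float(np.sqrt(np.mean(errors ** 2)))
mae = float(np.mean(np.abs(errors)))

# Naive baseline: next season's EFF equals last season's
baseline_mae = float(np.mean(np.abs(test["EFF"] - test["EFF_PREV"])))
print(f"season {test_season}: RMSE={rmse:.2f}, MAE={mae:.2f}, baseline MAE={baseline_mae:.2f}")
```

With more features (age, EFF trend, position), scikit-learn's `LinearRegression` or `RandomForestRegressor`, together with `mean_squared_error` and `mean_absolute_error` from `sklearn.metrics`, could replace the hand-written pieces; the season-based split should stay the same so that no future data leaks into training.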



### Import modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns