# Step 1 - Business Understanding

## Problem Definition

### Primary Problem Statement
This project aims to **predict NBA player performance**, specifically a player's points per game (PPG) in upcoming seasons, from historical career statistics and performance trends. This is a **regression problem** with a continuous numeric response variable.

**Response Variable:** Points Per Game (PPG) - Continuous numerical variable
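
One way to frame this as supervised regression is to pair each season's statistics with the following season's PPG. The sketch below assumes season rows keyed like the `nba_api` career totals (`SEASON_ID`, `GP`, `PTS`). The helper names `ppg` and `make_next_season_pairs` are hypothetical, and the sample values are illustrative only, not real player data.

```python
def ppg(row):
    """Points per game from season totals; None if no games were played."""
    return row["PTS"] / row["GP"] if row["GP"] else None


def make_next_season_pairs(seasons):
    """Pair each season's row (features) with the next season's PPG (target).

    `seasons` must be sorted chronologically. The final season has no
    following season, so it yields no training pair.
    """
    pairs = []
    for current, following in zip(seasons, seasons[1:]):
        target = ppg(following)
        if target is not None:
            pairs.append((current, target))
    return pairs


# Illustrative values only (hypothetical player, not real data).
seasons = [
    {"SEASON_ID": "2020-21", "GP": 70, "PTS": 700},
    {"SEASON_ID": "2021-22", "GP": 80, "PTS": 1200},
    {"SEASON_ID": "2022-23", "GP": 60, "PTS": 1080},
]

pairs = make_next_season_pairs(seasons)
for features, target in pairs:
    print(features["SEASON_ID"], "->", target)
```

This sketch treats adjacent rows as consecutive seasons; a player who missed a full season would need an explicit check on `SEASON_ID` before pairing.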

### Business Context and Industry Importance

#### Relevance to NBA Teams and Franchises
Player performance prediction is critically important for NBA organisations for several key reasons:

1. **Contract Negotiations and Salary Cap Management**: Teams operate under strict salary cap constraints. Accurately predicting a player's future performance helps management make informed decisions about contract offers, extensions, and trade valuations, potentially saving millions of dollars in overpayments or identifying undervalued talent.

2. **Draft Strategy and Talent Acquisition**: Understanding performance trajectories helps teams evaluate draft prospects and potential acquisitions, enabling more strategic roster construction and long-term planning.

3. **Game Strategy and Coaching Decisions**: Predicting player output assists coaching staff in lineup optimisation, playing time allocation, and tactical planning throughout the season.

4. **Injury Prevention and Load Management**: Patterns of declining performance may indicate injury risk or fatigue. Identifying them early allows teams to implement preventive measures and optimise player rest schedules.

#### Broader Industry Applications

1. **Sports Analytics Industry**: The methodologies developed can be applied across other professional sports leagues (Premier League, NFL, MLB) for similar performance prediction challenges.

2. **Sports Betting and Fantasy Sports**: Accurate player projections feed directly into odds-setting for betting companies and into user engagement on daily fantasy sports platforms, both multi-billion-dollar markets.

3. **Sports Media and Broadcasting**: Media organisations use performance predictions for content creation, pre-match analysis, and viewer engagement, enhancing the fan experience.

4. **Player Development and Training**: Youth academies and development programmes can use predictive insights to optimise training regimens and identify areas requiring additional focus.

### Associated Problems and Analysis Requirements

#### Predictor Variable Profiling
We need to examine and understand the characteristics of various predictor variables, including:
- Games played (GP)
- Minutes played (MIN)
- Field goal percentage (FG%)
- Three-point percentage (3P%)
- Free throw percentage (FT%)
- Rebounds (REB)
- Assists (AST)
- Steals (STL)
- Blocks (BLK)
- Turnovers (TOV)
- Player age and experience level
- Season and team context

#### Relationship Analysis Requirements
1. **Predictor-to-Predictor Relationships**: 
   - Identify multicollinearity amongst predictor variables
   - Understand how variables like minutes played correlate with other statistics
   - Examine temporal dependencies (how previous season performance affects current season)

2. **Predictor-to-Response Relationships**:
   - Determine which variables have the strongest predictive power for PPG
   - Identify non-linear relationships that may require transformation
   - Analyse interaction effects between variables
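
As a first pass at the multicollinearity check described above, pairwise Pearson correlations can flag predictors that move together. The sketch below is a minimal pure-Python version; the helper names, the 0.8 threshold, and the sample values are assumptions for illustration, not real data or a prescribed cut-off. With the data in a pandas DataFrame, `DataFrame.corr()` produces the same coefficients.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Illustrative per-season values (not real data).
columns = {
    "MIN": [1200, 1800, 2400, 2000, 1500],
    "FGA": [400, 610, 790, 680, 500],
    "FT_PCT": [0.80, 0.72, 0.78, 0.83, 0.75],
}

flagged = correlated_pairs(columns)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f}")
```

Pearson correlation only captures linear association, so the non-linear relationships and interaction effects listed above need separate checks, such as scatter plots or transformed variables.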

### Research Questions

1. Which historical performance metrics are the strongest predictors of future points per game?
2. How do player age and career stage affect performance trajectory?
3. Are there identifiable patterns that distinguish players who improve versus those who decline?
4. What is the optimal combination of variables for accurate PPG prediction?
5. How do team changes and contextual factors influence individual player performance?

### Project Objectives

1. Develop a robust predictive model for player PPG with acceptable accuracy (target: R² > 0.75)
2. Identify key performance indicators that drive scoring output
3. Create actionable insights for team management and player evaluation
4. Validate model performance across different player archetypes and positions
5. Provide a scalable framework applicable to other performance metrics
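
The R² target in the first objective can be checked with the standard definition, R² = 1 − SS_res / SS_tot. The sketch below is a minimal pure-Python version; the function name `r_squared` is hypothetical and the PPG values are illustrative, not model output.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative PPG values (not real data).
actual = [10.0, 15.0, 20.0, 25.0]
predicted = [12.0, 14.0, 21.0, 23.0]

score = r_squared(actual, predicted)
print(f"R^2 = {score:.2f}, meets target: {score > 0.75}")
```

For the target to be meaningful, the score should be computed on seasons the model has not seen during training.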

# Step 2 - Data Mining

In [None]:
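from nba_api.stats.endpoints import playercareerstats

# Nikola Jokić (NBA player ID 203999)
career = playercareerstats.PlayerCareerStats(player_id='203999')

# First result set (SeasonTotalsRegularSeason) as a pandas DataFrame
# (requires pandas: pip install pandas)
career_df = career.get_data_frames()[0]

# Full response as a raw JSON string
career_json = career.get_json()

# Full response as a Python dictionary (shown as the cell output below)
career.get_dict()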


{'resource': 'playercareerstats',
 'parameters': {'PerMode': 'Totals', 'PlayerID': 203999, 'LeagueID': '00'},
 'resultSets': [{'name': 'SeasonTotalsRegularSeason',
   'headers': ['PLAYER_ID',
    'SEASON_ID',
    'LEAGUE_ID',
    'TEAM_ID',
    'TEAM_ABBREVIATION',
    'PLAYER_AGE',
    'GP',
    'GS',
    'MIN',
    'FGM',
    'FGA',
    'FG_PCT',
    'FG3M',
    'FG3A',
    'FG3_PCT',
    'FTM',
    'FTA',
    'FT_PCT',
    'OREB',
    'DREB',
    'REB',
    'AST',
    'STL',
    'BLK',
    'TOV',
    'PF',
    'PTS'],
   'rowSet': [[203999,
     '2015-16',
     '00',
     1610612743,
     'DEN',
     21.0,
     80,
     55,
     1733,
     307,
     600,
     0.512,
     28,
     84,
     0.333,
     154,
     190,
     0.811,
     181,
     379,
     560,
     189,
     79,
     50,
     104,
     208,
     796],
    [203999,
     '2016-17',
     '00',
     1610612743,
     'DEN',
     22.0,
     73,
     59,
     2038,
     494,
     854,
     0.578,
     45,
     139,
     0.324,
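
The response above reports season totals (`'PerMode': 'Totals'`), so the response variable has to be derived as `PTS / GP`. The sketch below turns a result set into per-season records and computes per-game values. The headers and the single row are copied from the 2015-16 entry in the output above; the helper names `rows_to_records` and `per_game` are hypothetical.

```python
def rows_to_records(result_set):
    """Zip a result set's headers with each row to get one dict per season."""
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


def per_game(record, stat):
    """Convert a season total to a per-game average; None if GP is zero."""
    return record[stat] / record["GP"] if record["GP"] else None


# Headers and the first complete row, copied from the response above.
result_set = {
    "name": "SeasonTotalsRegularSeason",
    "headers": ["PLAYER_ID", "SEASON_ID", "LEAGUE_ID", "TEAM_ID",
                "TEAM_ABBREVIATION", "PLAYER_AGE", "GP", "GS", "MIN", "FGM",
                "FGA", "FG_PCT", "FG3M", "FG3A", "FG3_PCT", "FTM", "FTA",
                "FT_PCT", "OREB", "DREB", "REB", "AST", "STL", "BLK", "TOV",
                "PF", "PTS"],
    "rowSet": [[203999, "2015-16", "00", 1610612743, "DEN", 21.0, 80, 55,
                1733, 307, 600, 0.512, 28, 84, 0.333, 154, 190, 0.811, 181,
                379, 560, 189, 79, 50, 104, 208, 796]],
}

records = rows_to_records(result_set)
season = records[0]
season_ppg = per_game(season, "PTS")
print(season["SEASON_ID"], "PPG:", season_ppg)
print(season["SEASON_ID"], "RPG:", per_game(season, "REB"))
```

The same conversion applies to the other counting statistics (`REB`, `AST`, `STL`, `BLK`, `TOV`, `MIN`), while the percentage columns (`FG_PCT`, `FG3_PCT`, `FT_PCT`) are already rates and need no adjustment.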

# Step 3 - Data Cleaning

# Step 4 - Data Exploration

# Step 5 - Feature Engineering

# Step 6 - Predictive Modelling

# Step 7 - Findings