# Goals

### 1. Predict how NBA players will do in a given year based on past data
- Output: expected value (points generated per possession) and rates for different situations or actions
    - Option 1: predict based on touches (post, front court, and elbow)
    - Option 2: predict based on action (drive, pull-up shot, and catch and shoot)
- Input: past years' performance on a wider range of stats, injuries?, and age/experience

### 2. Convert the predictions into a wins estimate (Future Goal)
- Output: wins per minute
- Input: output from the predictive model

----------------------------------------------------------------------------------------------------------------------

# Plan
### 1. Get lots of situational data from NBA.com
--- DONE ---
### 2. Understand the data better
- Make sure I understand all stats available
    - Literally understand the stats --- DONE ---  
    - Some EDA to understand some relationships and distributions better --- DONE ---
- Look at the data to determine if Option 1, Option 2, or some other option would make most sense for the predictions
    - *Using Option 2 with post touches since there is overlap in Option 1* --- DONE ---
- Look at the data to determine what measures to use for "expected value" and "rates." 
    - *Using points per shot for shooting and an estimate of points per possession on drives using a calculated expected value from assists and turnovers*  --- DONE ---
- Determine if you are missing necessary data --- ONGOING ---
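
The drive measure described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual calculation: the function name, the argument names, the choice to subtract turnovers, and the example league values are all assumptions made for the example.

```python
def drive_points_per_possession(points, assists, turnovers, drives,
                                assist_value, turnover_value):
    """Estimate points generated per drive.

    assist_value and turnover_value stand in for estimated league-wide
    values (points credited per assist, points lost per turnover).
    Both are hypothetical inputs, not measured numbers.
    """
    if drives == 0:
        # No drives recorded, so no rate can be estimated.
        return None
    # Assumption: assists add expected points, turnovers subtract them.
    generated = points + assists * assist_value - turnovers * turnover_value
    return generated / drives


# Placeholder numbers purely for illustration.
example = drive_points_per_possession(
    points=400, assists=100, turnovers=50, drives=500,
    assist_value=2.0, turnover_value=1.0,
)
```

Points per shot for catch and shoot and pull-up plays would follow the same pattern with shot attempts in the denominator.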

### 3. Prepare a master dataframe
- Merge the data --- DONE ---
- Determine what players to remove from the model based on playing time (lack of data) --- DONE ---
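
One way the playing-time filter could look, as a sketch only. The `minutes` field name and the 500-minute cutoff are hypothetical; the actual threshold used in this project is not stated here.

```python
MIN_MINUTES = 500  # hypothetical cutoff, not the project's actual value


def filter_by_minutes(rows, min_minutes=MIN_MINUTES):
    """Keep only player-season rows with enough minutes to be reliable.

    rows is a list of dicts, each with at least a 'minutes' key
    (an assumed field name).
    """
    return [row for row in rows if row["minutes"] >= min_minutes]


# Made-up player-seasons for illustration.
seasons = [
    {"player": "A", "season": 2018, "minutes": 2100},
    {"player": "B", "season": 2018, "minutes": 120},
    {"player": "C", "season": 2018, "minutes": 500},
]
kept = filter_by_minutes(seasons)
```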

### 4. Understand the relationship between the predicted stats and team performance
- Create the expected value and rates for each play type for each player --- DONE ---
- Gain insights into the value of the categories that will be predicted. --- DONE ---

### 5. EDA for predictive stats
- Create features based on previous years
- Look for what offensive stats might best predict offensive output
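
Features based on previous years could be built by attaching each player's prior-season value to the current row. The sketch below uses plain Python dicts; the field names (`player`, `season`) and the `prev_` prefix are assumptions for illustration.

```python
def add_previous_season(rows, stat):
    """Attach last season's value of `stat` to each player-season row.

    rows is a list of dicts with 'player', 'season' (an int year), and
    the stat itself. Rows with no prior season get None.
    """
    # Index every value by (player, season) for direct lookup.
    lookup = {(r["player"], r["season"]): r[stat] for r in rows}
    enriched = []
    for r in rows:
        previous = lookup.get((r["player"], r["season"] - 1))
        enriched.append({**r, f"prev_{stat}": previous})
    return enriched


# Made-up values for illustration.
history = [
    {"player": "A", "season": 2017, "drive_ppp": 1.00},
    {"player": "A", "season": 2018, "drive_ppp": 1.10},
    {"player": "B", "season": 2018, "drive_ppp": 0.90},
]
features = add_previous_season(history, "drive_ppp")
```

A player who missed a season simply gets `None` for that lag, which leaves the choice of how to handle gaps (drop, impute, or look further back) to the modeling step.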

### 6. Create models predicting future performance based on past performance (as outlined in goals)
- Most likely it will be a slightly different model (type, hyperparameters, and input data) for each predicted stat
- Determine what important inferences can be made from the data
- Determine what important predictions can be made from the data

### 7. Create a model to determine wins from stats
- Make sure to only use the output stats from the other models

### 8. Create a presentation/app

# BONUS
### 1. EDA with a focus on defensive stats
- Try to determine what defensive (including rebounding) stats we might need to better be able to determine wins

### 2. EDA with a focus on finding predictive stats
- Look for what defense/hustle/rebounding stats might best predict defensive output


# Assumptions and Reasoning
### Predictive stats
<u>Assumption:</u> Points per possession, computed from a player's scoring, assists, and turnovers together with estimated league-wide values for an assist and a turnover, is a reasonable estimate of the value of the possession.  
<u>Reasoning:</u> Measures that take in many more possible outcomes would be significantly harder to calculate, would be even harder to do so accurately, and may add only marginal value.

<u>Assumption:</u> Using a player's assist to potential assist ratio is better than using the league ratio when determining the value of a potential assist.  
<u>Reasoning:</u> Players on poor-shooting teams will be undervalued, but the quality of a player's passes and the open looks they generate will be captured.

<u>Assumption:</u> Assist to potential assist ratios are consistent across play types.  
<u>Reasoning:</u> I could not find data breaking it down by play type. Using the player's overall ratio was as close as I could get.
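
To make the two assist assumptions concrete, here is a hedged sketch of valuing a potential assist with the player's own overall conversion ratio. The function name and `points_per_assist` (an estimated league value) are hypothetical, and the numbers are placeholders.

```python
def potential_assist_value(assists, potential_assists, points_per_assist):
    """Expected points from one potential assist for a given player.

    Uses the player's overall assist to potential assist ratio, which
    is assumed to hold for every play type (drives, post-ups, ...).
    points_per_assist is an assumed league-wide estimate.
    """
    if potential_assists == 0:
        # No passing data, so no value can be attributed.
        return 0.0
    conversion = assists / potential_assists
    return conversion * points_per_assist


# Placeholder numbers: 300 assists on 600 potential assists.
value = potential_assist_value(300, 600, points_per_assist=2.0)
```

With these placeholder inputs, half of the player's potential assists are converted, so each one is credited with half of the assumed value of an assist.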

<u>Assumption:</u> The plays being used (catch and shoot, pull-ups, drives, and post-ups) give a complete enough picture of a player.  
<u>Reasoning:</u> Predicting these stats does not mean they should be the sole basis for evaluating a player, but it does limit what is available for evaluating one. Also, these plays are major scoring plays, and many of the other available plays had too much overlap.

<u>Assumption:</u> For team evaluation, the overlap between assists from drives and post-ups and catch and shoot plays is an acceptable cost for the gain from including the value of passing on drives and post-ups.  
<u>Reasoning:</u> While the team numbers may have overlap, this will not create overlap for individual players, which is the ultimate goal of this project. Also, when evaluating the value of points from a play type (Step 4), this assumption will be taken into account.

### Determining wins
<u>Assumption:</u> The general assumption is that if a stat is equivalent for two players, they are performing equally well in that category. This takes into account the assumptions below.
- Player stats are against equal opponents (although some players play more against bench players and some play more against starters).
- Player stats are not just random (for example, opponent free throw percentage has nothing to do with a player's defensive skill, yet it would still be predictive of their defensive "output").
- Player stats are equally valuable on different teams and lineups.
- Player stats are not a product of teammates and lineups.
- Player stats are demonstrative of their non-garbage time performance.

<u>Reasoning:</u> The model would need to be significantly more complex to take in all of that information. Understanding the effects of schedule, teammates, and role is a future goal.
    
### Data Included
<u>Assumption:</u> NBA.com and Basketball-Reference have accurately determined all of these plays and there is next to no missing data.  
<u>Reasoning:</u> NBA.com has the most thorough data I can find. Both NBA.com and Basketball-Reference are trusted resources.  

<u>Assumption:</u> Data for players who have consistently played very few minutes is not accurate enough. Similarly, only including marginal players who grow to be non-marginal players will not skew the model too much.  
<u>Reasoning:</u> Very few players make the jump from scrub to contributor. Therefore, it should not have much of an effect on the model. Also, the purpose of the model is more to predict growth in regular players than looking for fringe players to invest in.