## Lab: Pythagorean Expectation for NBA

### On Your Own: Deriving the Exponent for NBA Pythagorean Expectation

In this lab, you will perform the same analysis for the NBA, reusing (almost exactly) the code from the demo on Pythagorean Expectation for MLB and tweaking it wherever necessary.  If you are unsure how to do something, look at the corresponding part of the MLB section and emulate the code.  The data is loaded in the first cell.

The columns (excluding some self-explanatory ones):
+ `lg_id`: League ID
+ `mp`: minutes played
+ `pts`: points scored
+ `opp_pts`: opponent points scored

**IMPORTANT TIP**: Reuse the code from the demo on Pythagorean Expectation for MLB as much as possible.  You should be able to reuse basically all of it and rename a few things here and there.  It should all work to produce the results for this lab!  

## Setup (Do Not Change)

In [None]:
%run ../../utils/notebook_setup.py

import numpy as np

from datascience import Table
from datascience_stats import linear_fit

usecols = ['wins','losses','g','pts','opp_pts','year','team_id']

nba = Table.read_table(
    "nba_team_season_data.csv",
    usecols=usecols
)

nba.show(5)

### 1. The first thing we need to do is compute the winning percentage 
$$
    \text{Win Pct} = W / G
$$

### 2. Then we need to compute Points per Game values
\begin{align*}
    \text{Points For per Game} & = \text{Points For}\ /\ \text{Games} \\
    \text{Points Against per Game} & = \text{Points Against}\ /\ \text{Games} \\
    \text{Net Points per Game} & = \text{Points For per Game} - \text{Points Against per Game}
\end{align*}

Call the columns `ppg`, `opp_ppg`, and `net_ppg`.
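
As a rough sketch of the arithmetic in steps 1 and 2, the snippet below uses plain NumPy arrays holding made-up team seasons (these numbers are illustrative only and are not rows from the lab's CSV). The variable names mirror the lab's column names.

```python
import numpy as np

# Made-up team seasons for illustration only (not rows from the lab's CSV).
wins    = np.array([60, 41, 20])
g       = np.array([82, 82, 82])
pts     = np.array([9020, 8200, 7790])
opp_pts = np.array([8364, 8200, 8446])

win_pct = wins / g         # Win Pct = W / G
ppg     = pts / g          # Points For per Game
opp_ppg = opp_pts / g      # Points Against per Game
net_ppg = ppg - opp_ppg    # Net Points per Game

print(net_ppg)
```

With the lab's `Table`, the same arithmetic would presumably go through `nba.column(...)` to pull out each array and `nba.with_column(...)` to attach the results, as in the MLB demo.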

*Note: Feel free to perform the analysis using Ratings (points per 100 possessions), which are provided in the NBA dataset; you will need to change the data loading to include those columns.  The results should be essentially the same.*

_Question_

We're computing a per game value.  Should we use 82 or should we use something else?  What happened in the NBA recently (~2011) that might necessitate not using 82?

Show the top 10 team seasons by Net Points per Game.  Only show the following columns: `team_id, wins, losses, ppg, opp_ppg, net_ppg`

### 3. Compute the Linear Model
$$
    \text{Linear Win Pct} = \alpha  + \beta \cdot \text{Net Points per Game}
$$
where $\alpha$ gives $\text{Average Win Pct}$ and $\beta$ gives $\text{Win Pct per Net Points per Game}$.

Plot the linear model results as we did with MLB.

**Remember: Reuse the code from the MLB demo!**
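
If you want to sanity-check the fit outside of the course helper, here is a minimal sketch that uses `np.polyfit` as a stand-in for `linear_fit` (that substitution is an assumption; the MLB demo may do this differently). The data points are made up for illustration.

```python
import numpy as np

# Made-up (net_ppg, win_pct) pairs for illustration only.
net_ppg = np.array([-8.0, -4.0, 0.0, 4.0, 8.0])
win_pct = np.array([0.25, 0.37, 0.50, 0.64, 0.74])

# With deg=1, np.polyfit returns [slope, intercept] of the least-squares line.
beta, alpha = np.polyfit(net_ppg, win_pct, deg=1)

# Fitted values: Linear Win Pct = alpha + beta * Net PPG
linear_win_pct = alpha + beta * net_ppg

print(f"alpha = {alpha:.3f}, beta = {beta:.4f}")
```

Plotting `linear_win_pct` against `net_ppg` on top of a scatter of the actual values gives the same kind of picture as the MLB plot.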


_Question_

For what values of Net Points per Game does $\text{Linear Win Pct} < 0$ and $\text{Linear Win Pct} > 1$?  How much of an issue is that here compared to when we looked at MLB data?

The estimated value of $\beta$ should be about $0.03$.

Compute the "Net PPG per Win" from the linear model.

You should get a Net PPG per win of about 0.38.
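
One way to read this conversion, sketched under the assumptions of an 82-game season and a slope of 0.032 (a hypothetical value consistent with "about 0.03"; use your own fitted $\beta$):

```python
beta = 0.032    # assumed slope: win pct gained per 1 net PPG
games = 82      # assumed full-length season

wins_per_net_ppg = beta * games          # extra wins per season from +1 net PPG
net_ppg_per_win = 1 / wins_per_net_ppg   # net PPG needed for one extra win

print(round(net_ppg_per_win, 2))
```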

### 4. Compute the following values:
\begin{align*}
    \text{Points Ratio} & = \text{PPG}\ /\ \text{Opp PPG} \\
    \text{Log Points Ratio} & = \log\left(\text{Points Ratio}\right) \\
    \text{Log Odds} & = \log\left(\text{Wins}\ /\ \text{Losses}\right)
\end{align*}
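
To see why these transformations help, here is a short derivation, assuming the usual Pythagorean form with an unknown exponent $K$:
\begin{align*}
    \text{Pyth Win Pct} & = \frac{\text{PPG}^K}{\text{PPG}^K + \text{Opp PPG}^K} = \frac{\text{Points Ratio}^K}{\text{Points Ratio}^K + 1} \\
    \frac{\text{Pyth Win Pct}}{1 - \text{Pyth Win Pct}} & = \text{Points Ratio}^K \\
    \log\left(\text{Wins}\ /\ \text{Losses}\right) & \approx K \cdot \text{Log Points Ratio}
\end{align*}
So the exponent shows up as the slope of a line relating Log Odds to Log Points Ratio.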

### 5. Compute a Pythagorean exponent for the NBA

Plot the results of the model for the Pythagorean exponent.  

**Again, reuse the MLB code with appropriate changes!**

You should get a large value (around 14).  We could perform this analysis on all sorts of sports. 
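
The sketch below shows one way to estimate the exponent as the slope of a line through the origin. Whether the MLB demo fits with or without an intercept is not stated here, so treat the no-intercept choice as an assumption. The data are synthetic, generated from a known exponent so you can see the fit recover it.

```python
import numpy as np

# Synthetic data for illustration only: log odds generated from a known exponent.
true_K = 14.0
points_ratio = np.array([0.93, 0.97, 1.00, 1.03, 1.08])
log_pr = np.log(points_ratio)
log_odds = true_K * log_pr

# Least-squares slope for a line through the origin: sum(x * y) / sum(x * x).
K_hat = np.sum(log_pr * log_odds) / np.sum(log_pr ** 2)

print(K_hat)
```

On real team seasons the points will scatter around the line, so the estimate will not land exactly on a round number.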

_Question_

What does this large value for the exponent mean?  To answer this question, start by answering this series of questions:
+ Suppose some random sport had an exponent of $K = 1{,}000{,}000$, and a team is able to score just a bit more than its opponents, so that $\text{Points Ratio} > 1$ by a small amount.  What is $\text{Points Ratio}^K$ in this case?  What is the team's expected winning percentage?
+ Suppose a sport had an exponent of $K=0.00001$.  What is $\text{Points Ratio}^K$ in this case?  What is a team's expected winning percentage if it is able to score just a bit more than its opponents?  What about if it's outscored by a little bit?
+ Do larger or smaller values of K lead to a sport which features a lot of luck/chance in its outcomes?

### 6. Using the computed exponent*, compute the Pythagorean Expectation
*If the previous step isn't working yet, use 14 as the exponent so that you can keep going.
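
A minimal sketch of the formula as a function, using the same $\text{Points Ratio}^K / (\text{Points Ratio}^K + 1)$ form that appears in the provided `pts_per_win` code. The name `pyth_win_pct` is hypothetical, and the default of 14 is the fallback exponent mentioned above.

```python
def pyth_win_pct(ppg, opp_ppg, K=14):
    """Pythagorean expected win pct: PR**K / (PR**K + 1), with PR = ppg / opp_ppg."""
    pr_k = (ppg / opp_ppg) ** K
    return pr_k / (pr_k + 1)

# A team that scores exactly as much as it allows is expected to win half its games.
print(pyth_win_pct(100, 100))
print(pyth_win_pct(105, 100))
```

Because it only uses arithmetic operators, the function also works elementwise on NumPy arrays such as table columns.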

_Question_

For teams with really poor net scoring performance, how does the Pythagorean formula compare to the linear formula?  Which seems to perform better in this case?

### 7. Compute Pythagorean Luck

Again, use the columns `team_id, wins, losses, ppg, opp_ppg, net_ppg`

+ Display a table of the top 10 "luckiest" teams.
+ Display a table of the top 10 "unluckiest" teams.
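
Here is a rough sketch of the ranking logic, assuming luck is defined as actual win percentage minus Pythagorean win percentage (check the MLB demo for the exact definition used there). The team IDs and numbers are made up for illustration.

```python
import numpy as np

# Made-up team seasons for illustration only.
team_id = np.array(["AAA", "BBB", "CCC", "DDD"])
wins    = np.array([50, 45, 33, 41])
g       = np.array([82, 82, 82, 82])
ppg     = np.array([104.0, 106.0, 98.0, 100.0])
opp_ppg = np.array([100.0, 100.0, 102.0, 100.0])

K = 14  # fallback exponent suggested in the lab
pr_k = (ppg / opp_ppg) ** K
pyth_pct = pr_k / (pr_k + 1)

# Assumed definition: luck = actual win pct minus Pythagorean win pct.
luck = wins / g - pyth_pct

order = np.argsort(luck)          # ascending, so the unluckiest team comes first
unluckiest = team_id[order]
luckiest = team_id[order[::-1]]

print(luckiest, unluckiest)
```

With the lab's `Table`, adding a luck column and calling `sort(..., descending=True)` followed by `show(10)` would presumably produce the two top-10 tables.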

### 8. Compute a table of Points-to-Wins values

+ A function with the Points per Win formula has been provided
+ A range of point-per-game values for PPG and Opponent PPG has been provided
+ Compute the Points per Win for various PPG values

You should see values around 30 points, or about .3 PPG, per Win.  Interpret this as follows: if you increase your scoring by 1 PPG, you should expect about a 3 win improvement.  Teams like the '96 Bulls or recent Warriors, with a Net PPG of 10, see close to a 30 game increase above .500, i.e., high 60s wins compared to 41 wins.
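
The arithmetic behind that last claim can be checked with a quick calculation, assuming an 82-game season and the approximate 30 points per win quoted above:

```python
net_ppg = 10           # example Net PPG from the text
games = 82             # assumed full-length season
points_per_win = 30    # approximate value quoted above

extra_points = net_ppg * games                    # season point differential
wins_above_500 = extra_points / points_per_win    # wins beyond a .500 record
expected_wins = games / 2 + wins_above_500

print(round(wins_above_500, 1), round(expected_wins, 1))
```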

In [None]:
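def pts_per_win(ppg, opp_ppg, K):
    """Season points needed per additional win under the Pythagorean model."""
    PR = ppg / opp_ppg              # points ratio
    pyth = PR**K / (PR**K + 1)      # Pythagorean expected win pct
    # Inverse of the derivative of pyth with respect to ppg
    return opp_ppg * PR**(K + 1) / (K * pyth**2)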

ppg_rng = np.arange(85, 130, 5)
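
One possible way to build the table is sketched below, assuming the fallback exponent of 14 and using NumPy broadcasting to evaluate every PPG and Opponent PPG pair at once. The lab may intend a different layout, so treat this as a starting point only.

```python
import numpy as np

def pts_per_win(ppg, opp_ppg, K):
    # Same formula as the provided function, repeated so this sketch runs on its own.
    PR = ppg / opp_ppg
    pyth = PR**K / (PR**K + 1)
    return opp_ppg * PR**(K + 1) / (K * pyth**2)

K = 14                            # fallback exponent suggested in the lab
ppg_rng = np.arange(85, 130, 5)   # 85, 90, ..., 125

# Rows index PPG, columns index Opponent PPG (via NumPy broadcasting).
table = pts_per_win(ppg_rng[:, None], ppg_rng[None, :], K)

# Evenly matched teams (PPG equal to Opponent PPG) sit on the diagonal.
print(np.round(np.diag(table), 1))
```

Dividing any entry by the number of games in a season converts it from season points per win to PPG per win.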

_Question_


Say a team has a star player who averages 20-30 points per game.  The team loses this player for 10 games in the middle of the season.  Use the Net PPG-to-Wins conversions above and give a "back-of-the-envelope" estimate (with a bit of explanation of your thinking) of how many extra games we should expect a team without the star to lose.  Consider this when answering:  Do you lose all 20-30 points the player provides or is it replaced in some way?  Is it replaced to the full extent?