<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">


# Predicting Shots Made Per Game by Kobe Bryant

_Authors: Kiefer Katovich (SF)_

---

In this lab you'll use regularized regression penalties (ridge, lasso, and elastic net) to try to predict how many shots Kobe Bryant made per game during his career.

The Kobe Shots data set contains hundreds of columns representing different characteristics of each basketball game. Fitting an ordinary linear regression using every predictor would dramatically overfit the model, considering the limited number of observations (games) we have available. Plus, many of the predictors have significant multicollinearity. 


**Warning:** Some of these calculations are computationally expensive and may take a while to execute. It may be worthwhile to only use a portion of the data to perform these calculations, especially if you've experienced kernel issues in the past.

---

### 1) Load packages and data.

In [1]:
import numpy as np
import pandas as pd
import patsy

from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score, train_test_split


import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')


%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [2]:
kobe = pd.read_csv('./datasets/kobe_superwide_games.csv')

---

### 2) Examine the data.

- How many columns are there?
- Examine what the observations (rows) and columns represent.
- Why might regularization be particularly useful for modeling this data?

In [3]:
# A:
# 2.1: number of columns
print(f'The number of columns is {len(kobe.columns)}\n')
# 2.2: inspect the first rows (the answer to 2.3 follows the output)
kobe.head(2)

The number of columns is 645



Unnamed: 0,SHOTS_MADE,AWAY_GAME,SEASON_OPPONENT:atl:1996-97,SEASON_OPPONENT:atl:1997-98,SEASON_OPPONENT:atl:1999-00,SEASON_OPPONENT:atl:2000-01,SEASON_OPPONENT:atl:2001-02,SEASON_OPPONENT:atl:2002-03,SEASON_OPPONENT:atl:2003-04,SEASON_OPPONENT:atl:2004-05,...,ACTION_TYPE:tip_layup_shot,ACTION_TYPE:tip_shot,ACTION_TYPE:turnaround_bank_shot,ACTION_TYPE:turnaround_fadeaway_bank_jump_shot,ACTION_TYPE:turnaround_fadeaway_shot,ACTION_TYPE:turnaround_finger_roll_shot,ACTION_TYPE:turnaround_hook_shot,ACTION_TYPE:turnaround_jump_shot,SEASON_GAME_NUMBER,CAREER_GAME_NUMBER
0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,1
1,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,2


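<b style="color:blue;">Answer: </b> Regularization helps prevent overfitting and shrinks the influence of unimportant predictors. That matters here because the data set has 645 columns, many of them highly correlated, relative to a limited number of games.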

---

### 3) Create predictor and target variables. Standardize the predictors.

Why is normalization necessary for regularized regressions?

Use the `sklearn.preprocessing` class `StandardScaler` to standardize the predictors.

In [4]:
# A:
from sklearn.preprocessing import StandardScaler

# Predictors: every column except the target
X = kobe.drop(columns='SHOTS_MADE')
# Target: the number of shots made in each game
y = kobe['SHOTS_MADE']
# Hold out 20% of the games as a test set (rows are shuffled before splitting)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, shuffle=True)

# Standardize the predictors: fit the scaler on the training set only,
# then apply the same transformation to the test set
ss = StandardScaler()
Xs_train = ss.fit_transform(X_train)
Xs_test = ss.transform(X_test)

# Fit an ordinary linear regression on the standardized training data (used in step 4)
lr = LinearRegression()
lr.fit(Xs_train, y_train)

LinearRegression()

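<b style="color:blue;">Answer: </b> Regularization penalizes the size of the coefficients, and the size of a coefficient depends on the scale of its predictor. If the predictors are not standardized, the penalty falls unevenly on them according to their units rather than their importance.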
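
One way to keep the scaling step honest during cross-validation is to wrap the scaler and the model in a pipeline, so that the scaler is refit on the training portion of each fold. The sketch below is a minimal illustration on synthetic data; the array shapes, the `alpha` value, and the `_demo` variable names are assumptions made for the example and are not taken from the Kobe data set.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 20 predictors on very different scales.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 20)) * rng.uniform(0.1, 100, size=20)
y_demo = X_demo[:, 0] / X_demo[:, 0].std() + rng.normal(size=200)

# StandardScaler gives each column mean 0 and standard deviation 1.
Xs_demo = StandardScaler().fit_transform(X_demo)
print(Xs_demo.mean(axis=0).round(6)[:3], Xs_demo.std(axis=0).round(6)[:3])

# In a pipeline the scaler is refit inside every fold, so the held-out
# part of a fold never influences the scaling statistics.
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe_scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
print(pipe_scores.mean())
```

The same pattern should work with any of the regularized estimators used later in this lab in place of `Ridge`.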

---

### 4) Build a linear regression predicting `SHOTS_MADE` from the rest of the columns.

Cross-validate the $R^2$ of an ordinary linear regression model with 10 cross-validation folds.

How does it perform?

In [5]:
# A:
# Perform 10-fold cross validation
scores = cross_val_score(lr, Xs_train, y_train, cv=10)
print("Cross-validated scores:", scores)
print("Mean of Cross-validated scores:", scores.mean())

Cross-validated scores: [-6.93941635e+27 -7.92597535e+27 -4.99313504e+27 -1.08616371e+28
 -1.38051227e+28 -8.73559755e+27 -1.98411548e+28 -9.99087418e+27
 -1.86413295e+28 -6.62137427e+27]
Mean of Cross-validated scores: -1.0835561681728554e+28


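<b style="color:blue;">Answer: </b> The model performs extremely poorly. The mean cross-validated $R^2$ is about $-1.08 \times 10^{28}$, far worse than simply predicting the mean of `SHOTS_MADE` (which would give an $R^2$ of roughly 0).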
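
The general effect can be illustrated on synthetic data. The sketch below does not reproduce the exact failure above (which probably also involves the collinear columns in this data set); it only shows, under assumed shapes and an assumed `alpha`, how an unpenalized fit with almost as many predictors as training rows can score far worse out of sample than a ridge fit on the same folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption): 80 rows, 60 predictors, only 3 of which matter.
rng = np.random.RandomState(42)
X_wide = rng.normal(size=(80, 60))
y_wide = X_wide[:, :3].sum(axis=1) + rng.normal(size=80)

# With 5 folds, each fit sees 64 rows for 60 predictors plus an intercept.
ols_r2 = cross_val_score(LinearRegression(), X_wide, y_wide, cv=5).mean()
ridge_r2 = cross_val_score(Ridge(alpha=10.0), X_wide, y_wide, cv=5).mean()
print("OLS mean R^2:  ", ols_r2)
print("Ridge mean R^2:", ridge_r2)
```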

---

### 5) Find an optimal value for the ridge regression alpha using `RidgeCV`.

Go to the documentation and [read how RidgeCV works](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html).

> *Hint: Once the RidgeCV is fit, the attribute `.alpha_` contains the best alpha parameter it found through cross-validation.*

Recall that ridge performs best when searching alphas through logarithmic space (`np.logspace`). This may take a while to fit.


In [7]:
# A:
# Build RidgeCV regression model 
ridgecv = RidgeCV(alphas=np.logspace(.1, 10, 30),cv=5)
lr_ridgecv = ridgecv.fit(Xs_train, y_train)
print('The optimal value for the ridge regression alpha is ',ridgecv.alpha_)

The optimal value for the ridge regression alpha is  1487.3521072935118
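
A quick sanity check on the search grid is to see where the selected alpha falls within it. The snippet below rebuilds the same grid and locates the reported value; `best_alpha` is copied from the output above, and the other names are illustrative.

```python
import numpy as np

# The same grid passed to RidgeCV above: 30 values from 10**0.1 to 10**10.
alphas = np.logspace(.1, 10, 30)
best_alpha = 1487.3521072935118  # value reported by ridgecv.alpha_ above

# Position of the selected alpha in the grid.
best_idx = int(np.argmin(np.abs(alphas - best_alpha)))
print(alphas[0], alphas[-1])
print(best_idx, alphas[best_idx])
```

If the selected value sits strictly inside the grid rather than at either end, the range was probably wide enough; a value at an edge would suggest extending the grid in that direction.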


---

### 6) Cross-validate the ridge regression $R^2$ with the optimal alpha.

Is it better than the linear regression? If so, why might this be?

In [16]:
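# A:
# Training-set R^2 of the fitted RidgeCV (not displayed, since it is not the last expression in the cell)
ridgecv.score(Xs_train, y_train)
# 8-fold cross-validation; note that RidgeCV repeats its alpha search inside each fold
scores = cross_val_score(lr_ridgecv, Xs_train, y_train,cv=8)
print("Cross-validated scores:", scores)
print("\nMean cross-validated scores:", scores.mean())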

Cross-validated scores: [0.57035289 0.64968905 0.63782977 0.60536333 0.61690176 0.67664804
 0.61268225 0.65654696]

Mean cross-validated scores: 0.6282517547731273


<b style="color:blue;">Answer: </b> Yes. The mean cross-validated $R^2$ rises to about 0.63 because the ridge penalty shrinks the coefficients, which reduces the impact of unimportant and collinear predictors.
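
Passing a fitted `RidgeCV` to `cross_val_score` clones it and repeats the alpha search in every fold, so the score describes the whole search procedure rather than one fixed alpha. To score a single alpha, a plain `Ridge` can be built from the value that was found. The sketch below shows this on synthetic data; the data, the alpha grid, and the names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption), already on a common scale.
rng = np.random.RandomState(1)
X_demo = rng.normal(size=(150, 40))
y_demo = X_demo[:, :5].sum(axis=1) + rng.normal(size=150)

# Step 1: search for alpha once.
search = RidgeCV(alphas=np.logspace(-2, 4, 25), cv=5).fit(X_demo, y_demo)

# Step 2: score a plain Ridge that uses only the selected alpha.
fixed_ridge = Ridge(alpha=search.alpha_)
fixed_scores = cross_val_score(fixed_ridge, X_demo, y_demo, cv=8)
print(search.alpha_, fixed_scores.mean())
```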

---

### 7) Find an optimal value for lasso regression alpha using `LassoCV`.

Go to the documentation and [read how LassoCV works](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html). It is very similar to `RidgeCV`.

> *Hint: Again, once the `LassoCV` is fit, the attribute `.alpha_` contains the best alpha parameter it found through cross-validation.*

Recall that lasso, unlike ridge, performs best when searching for alpha through linear space (`np.linspace`). However, you can also let `LassoCV` decide for itself which alphas to use and how many. We recommend letting scikit-learn choose the range of alphas.

_**Tip:** If you find your CV taking a long time and you're not sure if it's working, set `verbose=1`._
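
As a minimal sketch of what "letting scikit-learn choose" looks like, the example below fits `LassoCV` without an `alphas` argument and inspects the grid it generated. The synthetic data and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic stand-in data (assumption): 30 predictors, 4 of which carry signal.
rng = np.random.RandomState(2)
X_demo = rng.normal(size=(200, 30))
y_demo = X_demo[:, :4] @ np.array([3.0, -2.0, 1.5, 1.0]) + rng.normal(size=200)

# No alphas given: LassoCV builds its own grid from the data.
auto_lasso = LassoCV(cv=5, random_state=1).fit(X_demo, y_demo)

print(len(auto_lasso.alphas_))                        # size of the automatic grid
print(auto_lasso.alphas_.max(), auto_lasso.alphas_.min())
print(auto_lasso.alpha_)                              # alpha with the lowest mean CV error
```

The fitted `alphas_` attribute holds the grid that was searched, which makes it easy to confirm that the chosen `alpha_` is not sitting at one end of it.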

In [17]:
# A:
# Build LassoCV regression model 
lassocv = LassoCV(random_state=1,n_jobs=3,verbose=1,cv=5)
lr_lassocv = lassocv.fit(Xs_train,y_train)
print('The optimal value for lasso regression alpha is ',lassocv.alpha_)

[Parallel(n_jobs=3)]: Using backend ThreadingBackend with 3 concurrent workers.
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................[Parallel(n_jobs=3)]: Done   5 out of   5 | elapsed:    6.2s finished


The optimal value for lasso regression alpha is  0.08661930007890158


---

### 8) Cross-validate the lasso $R^2$ with the optimal alpha.

Is it better than the linear regression? Is it better than ridge? What do the differences in results imply about the issues with the data set?

In [22]:
# A:
# Perform 8-fold cross validation
scores = cross_val_score(lr_lassocv, Xs_train, y_train, cv=8,n_jobs=3)
print("Cross-validated scores:", scores)
print("\nMean cross-validated scores:", scores.mean())

Cross-validated scores: [0.58937374 0.68147255 0.6613788  0.63854485 0.65467849 0.70002173
 0.61439559 0.67707408]

Mean cross-validated scores: 0.6521174777440594


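<b style="color:blue;">Answer: </b> Lasso is far better than the linear regression and slightly better than ridge (mean $R^2$ of about 0.65 versus 0.63). Because lasso works by eliminating predictors, this suggests that many of the predictors in the data set are redundant or irrelevant.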

---

### 9) Look at the coefficients for variables in the lasso.

1. Show the coefficient for variables, ordered from largest to smallest coefficient by absolute value.
2. What percent of the variables in the original data set are "zeroed-out" by the lasso?
3. What are the most important predictors for how many shots Kobe made in a game?

> **Note:** If you only fit the lasso within `cross_val_score`, you'll have to refit it outside of that function to pull out the coefficients.

In [28]:
# A:
# 9.1
# Data frame of predictor names
lassocv_coef_df = pd.DataFrame(X.columns)
lassocv_coef_df.columns = ['predictor_name']
# Add the absolute value of each lasso coefficient as a column
lassocv_coef_df['predictor_coefficients'] = abs(lassocv.coef_)
# Keep the signed coefficients for step 9.2
lassocv_coef = lassocv.coef_
# Largest coefficients (by absolute value) first
lassocv_coef_df.sort_values('predictor_coefficients',ascending=False).head()

Unnamed: 0,predictor_name,predictor_coefficients
579,COMBINED_SHOT_TYPE:jump_shot,1.217244
574,SHOT_TYPE:2pt_field_goal,0.916124
566,SHOT_ZONE_BASIC:restricted_area,0.360201
577,COMBINED_SHOT_TYPE:dunk,0.29432
611,ACTION_TYPE:jump_shot,0.273077


In [30]:
#9.2
pct_zeroed = 100 * np.mean(lassocv_coef == 0)
print(f'The percentage of variables "zeroed out" by the lasso is {pct_zeroed:.2f}%')

The percentage of variables "zeroed out" by the lasso is 84.78%


In [31]:
#9.3
# Rank by absolute coefficient size so the strongest predictors come first
print('The most important predictors are \n')
lassocv_coef_df.sort_values('predictor_coefficients', ascending=False)[['predictor_name']].head()

The most important predictors are 



Unnamed: 0,predictor_name
579,COMBINED_SHOT_TYPE:jump_shot
574,SHOT_TYPE:2pt_field_goal
566,SHOT_ZONE_BASIC:restricted_area
577,COMBINED_SHOT_TYPE:dunk
611,ACTION_TYPE:jump_shot


---

### 10) Find an optimal value for elastic net regression alpha using `ElasticNetCV`.

Go to the documentation and [read how ElasticNetCV works](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html).

Note that here you'll be optimizing both the `alpha` parameter and the `l1_ratio`:
- `alpha`: Strength of regularization.
- `l1_ratio`: Amount of ridge vs. lasso (0 = all ridge, 1 = all lasso).
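
For reference, and assuming the objective documented for scikit-learn's `ElasticNet` (with $n$ samples, coefficient vector $w$, and $\rho$ standing for `l1_ratio`), the two parameters enter the penalty as follows:

$$
\min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\,\rho\,\lVert w \rVert_1 + \frac{\alpha\,(1-\rho)}{2}\lVert w \rVert_2^2
$$

Setting $\rho = 1$ leaves only the lasso ($\ell_1$) term, and values of $\rho$ near 0 leave almost only the ridge ($\ell_2$) term.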
    
Do not include 0 in the search for `l1_ratio`: `ElasticNetCV` won't allow it and will break.

You can use `n_alphas` for the alpha parameters instead of setting your own values, which we highly recommend.

Also, be careful setting too many l1_ratios over cross-validation folds in your search. It can take a long time if you choose too many combinations and, for the most part, there are diminishing returns in this data.
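
A minimal sketch of such a search, on synthetic data, might look like the following. The data, the three `l1_ratio` values, and the names are assumptions for illustration; no `alphas` are passed, so the grid for each ratio is generated automatically.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in data (assumption): 30 predictors, 4 of which carry signal.
rng = np.random.RandomState(3)
X_demo = rng.normal(size=(200, 30))
y_demo = X_demo[:, :4] @ np.array([3.0, -2.0, 1.5, 1.0]) + rng.normal(size=200)

# A short list of l1_ratio values (none equal to 0); the alpha grid for
# each ratio is generated automatically because no alphas are passed.
ratios = [0.3, 0.6, 0.9]
enet = ElasticNetCV(l1_ratio=ratios, cv=5, random_state=1).fit(X_demo, y_demo)

print("best alpha:   ", enet.alpha_)
print("best l1_ratio:", enet.l1_ratio_)
print("grid shape:   ", enet.alphas_.shape)  # one row of alphas per l1_ratio
```

The cost grows with the product of the number of ratios, the number of alphas, and the number of folds, which is why a short list of ratios is usually a sensible starting point.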

In [32]:
# A:
# Build ElasticNetCV regression model 
elastic_netcv = ElasticNetCV(cv=5, random_state=1,l1_ratio=[.3,.6,.9],alphas=np.linspace(0.01, .99, 100),tol=0.2)
lr_elastic_netcv = elastic_netcv.fit(Xs_train,y_train)
print('The optimal value for elastic net regression alpha is ',elastic_netcv.alpha_)

The optimal value for elastic net regression alpha is  0.09909090909090908


---

### 11) Cross-validate the elastic net $R^2$ with the optimal alpha and l1_ratio.

How does it compare to the ridge and lasso regularized regressions?

In [33]:
# A:
# Perform 5-fold cross validation
scores = cross_val_score(lr_elastic_netcv, Xs_train, y_train, cv=5)
print("Cross-validated scores:", scores)
print("Mean cross-validated scores:", scores.mean())

Cross-validated scores: [0.63895139 0.66060063 0.63241056 0.65975473 0.64980404]
Mean cross-validated scores: 0.6483042717963627


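<b style="color:blue;">Answer: </b> The elastic net's mean $R^2$ (about 0.648) is higher than ridge's (about 0.628) and marginally lower than lasso's (about 0.652). It was computed with 5 folds rather than 8, so the scores are not strictly comparable, and in any case the differences among the three regularized models are small.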
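
For a like-for-like comparison, the three models can be scored on identical folds by sharing one splitter. The sketch below shows the pattern on synthetic data; the data, the `alpha` and `l1_ratio` values, and the names are placeholders chosen for illustration, not the values found in this lab.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption); replace with the standardized predictors.
rng = np.random.RandomState(4)
X_demo = rng.normal(size=(200, 30))
y_demo = X_demo[:, :4] @ np.array([3.0, -2.0, 1.5, 1.0]) + rng.normal(size=200)

# One shared splitter, so every model is scored on identical folds.
folds = KFold(n_splits=8, shuffle=True, random_state=1)

# The alpha and l1_ratio values here are placeholders (assumptions),
# not the values found in this lab.
models = {
    "ridge": Ridge(alpha=10.0),
    "lasso": Lasso(alpha=0.1),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
mean_r2 = {
    name: cross_val_score(model, X_demo, y_demo, cv=folds).mean()
    for name, model in models.items()
}
for name, score in mean_r2.items():
    print(f"{name}: {score:.3f}")
```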