# Optimization exercise

## Goal: Train a 2nd order polynomial predictor using both gradient descent and stochastic gradient descent. Tune the stepsizes and compare the result against the scikit-learn implementation.

1. Download data from https://drive.google.com/file/d/0Bz9_0VdXvv9bUUNlUTVrMF9VcVU/view?usp=sharing.
2. Create a function psi(x), which transforms the features AST (assists), REB (rebounds) and STL (steals) into 2nd order polynomial features (add the square of each feature and the product of every pair of distinct features).
3. Create a transformed data matrix X, where each x is mapped to psi(x).
4. Create a function p2(x,w), which outputs the value of the polynomial at x for given parameters w.
5. Create a function Loss(X,y,w), which computes the squared loss of predicting y from X by p2(x,w) using parameters w. Take variable PTS as y. We will predict scored points based on assists, rebounds and steals.
6. Implement gradient descent. It should take a starting point w and a stepsize as input.
7. Choose an arbitrary point and stepsize. Run gradient descent for 100 iterations and compute the Loss after each iteration. How does the loss behave? Does it converge to something?
8. Can you find the stepsize for which the loss is smallest after 100 iterations?
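
For reference, the quantities in steps 2 to 6 can be written out as below. The symbols $a$, $r$, $s$ (standing for AST, REB, STL), $\alpha$ (the stepsize) and $n$ (the number of rows) are notation introduced here, and the sketch assumes no intercept term, matching the nine features described in step 2. The rows of $X$ are the vectors $\psi(x_i)$.

$$
\psi(x) = (a,\ r,\ s,\ a^2,\ ar,\ as,\ r^2,\ rs,\ s^2), \qquad p_2(x, w) = w^\top \psi(x)
$$

$$
\mathrm{Loss}(X, y, w) = \sum_{i=1}^{n} \bigl(y_i - w^\top \psi(x_i)\bigr)^2 = \lVert y - Xw \rVert^2
$$

$$
\nabla_w \mathrm{Loss}(X, y, w) = 2\, X^\top (Xw - y), \qquad w \leftarrow w - \alpha\, \nabla_w \mathrm{Loss}(X, y, w)
$$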

In [219]:
# IMPORT PACKAGES
import matplotlib.pyplot as plt
import numpy as np 
import pandas as pd 
import random
from matplotlib.pylab import rcParams
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
import copy
%matplotlib inline

# 1)

In [220]:
nb = pd.read_csv('nba_games_2013_2015.csv', delimiter=';')
x = nb[['AST','REB','STL']]
y = nb['PTS']

# 2)

In [201]:
def psi(x):
    # Work on a copy so the caller's DataFrame is left untouched
    X = x.copy()
    cols = list(x.columns)
    for i, a in enumerate(cols):
        # Square of each original feature
        X[f'{a}_squared'] = x[a]**2
        # Product with every later feature, so each pair appears once
        for b in cols[i+1:]:
            X[f'{a}_{b}'] = x[a]*x[b]
    return X

# 3)

In [202]:
X = psi(x)
X.head()

   AST  REB  STL  AST_squared  AST_REB  AST_STL  REB_squared  REB_STL  STL_squared
0   41   43   14         1681     1763      574         1849      602          196
1   23   43    8          529      989      184         1849      344           64
2   20   39    7          400      780      140         1521      273           49
3   19   47    6          361      893      114         2209      282           36
4   21   43    4          441      903       84         1849      172           16
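
The column order in the table above can be reproduced on plain arrays, which makes the mapping easy to check by hand. The following is a minimal sketch, not the notebook's own function: `psi_array` and `sample` are hypothetical names, and the two input rows are copied from the first two rows of the table.

```python
import numpy as np

def psi_array(rows):
    """Map rows of (AST, REB, STL) to the nine 2nd order features.

    Column order mirrors the table above: the raw features first, then for
    each feature its square followed by its products with later features.
    """
    rows = np.asarray(rows, dtype=float)
    n_features = rows.shape[1]
    columns = [rows[:, k] for k in range(n_features)]
    for i in range(n_features):
        columns.append(rows[:, i] ** 2)
        for j in range(i + 1, n_features):
            columns.append(rows[:, i] * rows[:, j])
    return np.column_stack(columns)

# First two rows of the table above (AST, REB, STL).
sample = [[41, 43, 14], [23, 43, 8]]
features = psi_array(sample)
print(features[0])
```

Comparing the printed row with row 0 of the table is a quick way to confirm that squares and cross terms land in the expected columns.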


# 4)

In [203]:
def p2(x,params):
    # Value of the polynomial for every row: the dot product of the
    # feature columns with the parameter vector (one weight per column)
    return x.dot(np.asarray(params, dtype=float))

In [226]:
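# X_test is the PolynomialFeatures(2) design matrix built in cell In [221]:
# a constant column followed by the nine 2nd order features, hence ten weights.
w = [1,2,3,4,5,6,7,8,9,10]
y_res = p2(X_test,w)
y_res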

0       45439.0
1       28603.0
2       22961.0
3       28735.0
4       24899.0
         ...   
7375    23483.0
7376    29053.0
7377    37360.0
7378    29005.0
7379    23824.0
Length: 7380, dtype: float64

# 5)

In [227]:
def Loss(X,y,w):
    # Sum of squared residuals between the targets y and the predictions p2(X, w)
    y_res = p2(X,w)
    err = (y - y_res)**2
    return err.sum()

In [228]:
Loss(X_test,y,w)

6396362382681.0
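
Before coding gradient descent it can help to confirm the gradient of the summed squared loss numerically. The sketch below compares the analytic expression 2 Xᵀ(Xw − y) against central finite differences. It uses small synthetic data rather than the NBA table, and the names `squared_loss`, `squared_loss_gradient`, `X_demo`, `y_demo` and `w_demo` are hypothetical.

```python
import numpy as np

def squared_loss(X, y, w):
    """Sum of squared residuals, the same quantity Loss(X, y, w) computes."""
    residual = X @ w - y
    return float(residual @ residual)

def squared_loss_gradient(X, y, w):
    """Analytic gradient of the summed squared loss: 2 * X^T (Xw - y)."""
    return 2.0 * X.T @ (X @ w - y)

# Small synthetic problem (made-up numbers, not the NBA data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))
y_demo = rng.normal(size=20)
w_demo = rng.normal(size=4)

# Central finite differences as an independent check of the gradient.
eps = 1e-6
numeric = np.zeros_like(w_demo)
for k in range(w_demo.size):
    step = np.zeros_like(w_demo)
    step[k] = eps
    numeric[k] = (squared_loss(X_demo, y_demo, w_demo + step)
                  - squared_loss(X_demo, y_demo, w_demo - step)) / (2 * eps)

analytic = squared_loss_gradient(X_demo, y_demo, w_demo)
print(np.max(np.abs(analytic - numeric)))
```

A small printed difference suggests the sign and the factor of 2 in the gradient are right, which are the two details most easily lost in the update rule.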

# 6)

In [229]:
alpha = 0.001 #learning rate
iterations = 100 #No. of iterations
m = y.size #No. of data points
np.random.seed(123) #Set the seed
theta = np.random.rand(10) #Pick some random values to start with
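
Whether a given learning rate is usable depends on the scale of the features. For the summed squared loss, gradient descent is stable only when the stepsize is below 1 / λ_max(XᵀX), where λ_max is the largest eigenvalue. The sketch below illustrates this on synthetic data; `run_gd`, `loss`, `X_demo`, `y_demo` and `alpha_limit` are hypothetical names, and the data are not the NBA table. With unscaled polynomial features the limit can be many orders of magnitude below 0.001.

```python
import numpy as np

def run_gd(X, y, w0, alpha, iterations):
    """Plain gradient descent on the summed squared loss ||Xw - y||^2."""
    w = w0.copy()
    for _ in range(iterations):
        w = w - alpha * 2.0 * X.T @ (X @ w - y)
    return w

def loss(X, y, w):
    r = X @ w - y
    return float(r @ r)

# Synthetic stand-in data (assumption: not the NBA table).
rng = np.random.default_rng(123)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
w0 = rng.random(3)

# The largest eigenvalue of X^T X sets the stability limit alpha < 1 / lambda_max.
lambda_max = np.linalg.eigvalsh(X_demo.T @ X_demo).max()
alpha_limit = 1.0 / lambda_max

loss_start = loss(X_demo, y_demo, w0)
loss_stable = loss(X_demo, y_demo, run_gd(X_demo, y_demo, w0, 0.9 * alpha_limit, 100))
loss_unstable = loss(X_demo, y_demo, run_gd(X_demo, y_demo, w0, 1.1 * alpha_limit, 100))
print(alpha_limit, loss_start, loss_stable, loss_unstable)
```

Running the same eigenvalue computation on the real design matrix would give a concrete upper bound to start the stepsize search from.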

In [225]:
def gradient_descent(x, y, theta, iterations, alpha):

    past_costs = []
    past_thetas = [theta]

    for i in range(iterations):
        prediction = p2(x, theta)
        error = prediction - y
        # Evaluate the loss at the current theta
        cost = Loss(x,y,theta)
        past_costs.append(cost)

        # GRADIENT DESCENT: the gradient of the summed squared loss is
        # 2 * X^T (prediction - y), so the update steps against it
        theta = theta - (2 * alpha * np.dot(x.T, error))
        past_thetas.append(theta)

    return past_thetas, past_costs

past_thetas, past_costs = gradient_descent(X_test, y, theta, iterations, alpha)
theta = past_thetas[-1]
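
Steps 7 and 8 can be sketched as a loop that records the loss after every update, followed by a small grid search over stepsizes. This sketch makes several assumptions that differ from the cells above: it runs on synthetic standardized data rather than the NBA table, it uses the mean rather than the summed squared loss, and `gradient_descent_history`, `candidates` and `final_losses` are hypothetical names.

```python
import numpy as np

def gradient_descent_history(X, y, w0, alpha, iterations=100):
    """Run gradient descent on the mean squared loss, recording the loss after each update."""
    w = w0.copy()
    n = len(y)
    history = []
    for _ in range(iterations):
        residual = X @ w - y
        w = w - alpha * (2.0 / n) * (X.T @ residual)
        new_residual = X @ w - y
        history.append(float(new_residual @ new_residual) / n)
    return w, history

# Synthetic stand-in for the polynomial design matrix (assumption, not the NBA data).
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 4))
X_demo = (raw - raw.mean(axis=0)) / raw.std(axis=0)  # standardized columns
y_demo = X_demo @ np.array([3.0, -1.0, 2.0, 0.5]) + rng.normal(scale=0.2, size=500)
w0 = np.zeros(4)

# Step 8 as a grid search: keep the stepsize with the smallest loss after 100 iterations.
candidates = [0.001, 0.01, 0.1, 0.5, 1.5]
final_losses = {}
for alpha in candidates:
    _, history = gradient_descent_history(X_demo, y_demo, w0, alpha)
    final_losses[alpha] = history[-1]

best_alpha = min(final_losses, key=final_losses.get)
print(best_alpha, final_losses[best_alpha])
```

Plotting each `history` list with matplotlib would show the three behaviours step 7 asks about: slow decrease for tiny stepsizes, fast convergence in the middle of the grid, and divergence once the stepsize is too large.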

In [222]:
X_test.head()

     0     1     2     3       4       5      6       7      8      9
0  1.0  41.0  43.0  14.0  1681.0  1763.0  574.0  1849.0  602.0  196.0
1  1.0  23.0  43.0   8.0   529.0   989.0  184.0  1849.0  344.0   64.0
2  1.0  20.0  39.0   7.0   400.0   780.0  140.0  1521.0  273.0   49.0
3  1.0  19.0  47.0   6.0   361.0   893.0  114.0  2209.0  282.0   36.0
4  1.0  21.0  43.0   4.0   441.0   903.0   84.0  1849.0  172.0   16.0


In [185]:
theta - np.dot(X.T, err)

array([-1.09385410e+14, -2.16805696e+14, -3.83696546e+13, -2.65820605e+15,
       -5.02525332e+15, -8.98392065e+14, -1.01694964e+16, -1.74974595e+15,
       -3.55937559e+14])

In [167]:
X.shape

(7380, 9)

In [168]:
y.shape

(7380,)

In [169]:
theta.shape

(9,)

In [171]:
m

7380

In [208]:
X

      AST  REB  STL  AST_squared  AST_REB  AST_STL  REB_squared  REB_STL  STL_squared
0      41   43   14         1681     1763      574         1849      602          196
1      23   43    8          529      989      184         1849      344           64
2      20   39    7          400      780      140         1521      273           49
3      19   47    6          361      893      114         2209      282           36
4      21   43    4          441      903       84         1849      172           16
...   ...  ...  ...          ...      ...      ...          ...      ...          ...
7375   17   39   10          289      663      170         1521      390          100
7376   26   40   10          676     1040      260         1600      400          100
7377   23   52    8          529     1196      184         2704      416           64
7378   23   41   11          529      943      253         1681      451          121


In [221]:
poly = PolynomialFeatures(2)
X_test = pd.DataFrame(poly.fit_transform(x))
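
A gradient descent result can be compared against scikit-learn by fitting `LinearRegression` on the same kind of design matrix. The sketch below is one way to set that up, under stated assumptions: the data are synthetic rather than the NBA table, and `x_demo`, `y_demo`, `design` and `w_lstsq` are hypothetical names. Because `PolynomialFeatures(2)` already adds a constant column, the regression's own intercept is switched off here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the three box-score columns (assumption, not the NBA data).
rng = np.random.default_rng(1)
x_demo = rng.uniform(0.0, 1.0, size=(300, 3))
y_demo = (2.0 + x_demo[:, 0] - 3.0 * x_demo[:, 1] * x_demo[:, 2]
          + rng.normal(scale=0.05, size=300))

# PolynomialFeatures(2) prepends a constant column, so the regression's own
# intercept is switched off to avoid fitting the bias twice.
design = PolynomialFeatures(2).fit_transform(x_demo)
model = LinearRegression(fit_intercept=False).fit(design, y_demo)

# Closed-form least squares on the same design matrix as a cross-check.
w_lstsq, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

print(design.shape)
print(np.max(np.abs(model.coef_ - w_lstsq)))
```

The coefficients in `model.coef_` are the least-squares optimum, so the loss at those weights is the value a well-tuned gradient descent run should approach.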

In [None]:
x
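
The goal also calls for stochastic gradient descent, which the cells above do not reach. A minimal sketch of one common variant follows: each update uses the gradient of a single randomly chosen sample, and every epoch visits the samples in a fresh random order. The function `sgd`, the helper `mean_squared_loss`, the stepsize 0.01 and the synthetic standardized data are all assumptions made for illustration.

```python
import numpy as np

def sgd(X, y, w0, alpha, epochs, rng):
    """Stochastic gradient descent: one randomly ordered sample per update."""
    w = w0.copy()
    n = len(y)
    for _ in range(epochs):
        for i in rng.permutation(n):
            residual_i = X[i] @ w - y[i]
            # Gradient of the single-sample squared error (x_i . w - y_i)^2
            w = w - alpha * 2.0 * residual_i * X[i]
    return w

def mean_squared_loss(X, y, w):
    r = X @ w - y
    return float(r @ r) / len(y)

# Synthetic, standardized stand-in data (assumption, not the NBA table).
rng = np.random.default_rng(7)
raw = rng.normal(size=(400, 3))
X_demo = (raw - raw.mean(axis=0)) / raw.std(axis=0)
y_demo = X_demo @ np.array([1.5, -0.5, 2.0]) + rng.normal(scale=0.1, size=400)

w0 = np.zeros(3)
w_sgd = sgd(X_demo, y_demo, w0, alpha=0.01, epochs=20, rng=rng)
print(mean_squared_loss(X_demo, y_demo, w0), mean_squared_loss(X_demo, y_demo, w_sgd))
```

Because each step sees only one sample, the usable stepsize is tied to the size of individual rows rather than the whole matrix, so it would need to be tuned separately from the full-batch stepsize.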