# Part 1: Polynomial Regression

### A) Using the Auto dataset, find the test $R^2$ score of a linear regression model that predicts miles per gallon (mpg) from horsepower.

### B) Use polynomial regression to include both the horsepower feature and $(horsepower)^2$ in the regression model. Find the $R^2$ metric. 

Hint: You can use [numpy.concatenate](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.concatenate.html). For example, to append a column vector $W^2$ to an array U, use X = np.concatenate((U, W**2), axis=1).
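
As a minimal sketch of the hint, the snippet below appends a squared column to a small made-up array. The arrays `U` and `W` hold illustrative values, not rows from the Auto dataset. Both must be 2-D for `axis=1` to work.

```python
import numpy as np

# Hypothetical values for illustration only (not taken from Auto_modify.csv).
U = np.array([[130.0], [165.0], [150.0]])  # existing feature matrix, shape (3, 1)
W = np.array([[130.0], [165.0], [150.0]])  # column vector to square, shape (3, 1)

# Stack the squared column to the right of U; both inputs must be 2-D.
X = np.concatenate((U, W**2), axis=1)

print(X.shape)  # (3, 2)
print(X)
```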

In [1]:
from pandas import read_csv
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
AutoData = read_csv('Auto_modify.csv') # read the data
X = AutoData[['horsepower']]
Y = AutoData['mpg']
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,random_state = 0)
linreg = LinearRegression().fit(X_train,Y_train)
Y_predict = linreg.predict(X_test)
R_2 = r2_score(Y_test,Y_predict)
print('R square value is',R_2)
print('The beta0 is',linreg.intercept_)
print('The beta1 is',np.array(linreg.coef_[0]))

R square value is 0.6217658811398383
The beta0 is 39.998611679300524
The beta1 is -0.159527734327999


In [2]:
AutoData['horsepower^2'] = pow(AutoData['horsepower'],2)
X1 = AutoData[['horsepower','horsepower^2']]
X_train1,X_test1,Y_train1,Y_test1 = train_test_split(X1,Y,random_state = 0)
linreg1 = LinearRegression().fit(X_train1,Y_train1)
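R_2 = r2_score(Y_test1,linreg1.predict(X_test1))
print('R square value is',R_2)
# Intercept of the quadratic model; the recorded beta0 below was printed from linreg (part A) by mistake
print('The beta0 is',linreg1.intercept_)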
print('The beta1 is',np.array(linreg1.coef_[0]))
print('The beta2 is',np.array(linreg1.coef_[1]))

R square value is 0.7271031504642005
The beta0 is 39.998611679300524
The beta1 is -0.4531023308533184
The beta2 is 0.0011800690487566512
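
The squared column can also be generated with scikit-learn's `PolynomialFeatures` instead of by hand. The sketch below is an alternative, not the approach used in the cell above. It runs on synthetic data (the `hp` and `mpg` arrays and their generating coefficients are made up) so that it is self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for horsepower/mpg; the coefficients 60, -0.5, 0.0015 are invented.
rng = np.random.default_rng(0)
hp = rng.uniform(50, 200, size=(200, 1))
mpg = 60 - 0.5 * hp[:, 0] + 0.0015 * hp[:, 0] ** 2 + rng.normal(0, 0.5, size=200)

# degree=2 without the bias column yields the columns [hp, hp^2].
poly = PolynomialFeatures(degree=2, include_bias=False)
hp_poly = poly.fit_transform(hp)

model = LinearRegression().fit(hp_poly, mpg)
print('beta0:', model.intercept_)
print('beta1, beta2:', model.coef_)
```

With the real data, passing `AutoData[['horsepower']]` to `fit_transform` should produce the same two columns as `X1`.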


### C) Use KNN regression with K=7 to predict miles per gallon (mpg), and find the $R^2$ metric in the following cases:

- One feature: Horsepower only

- Two features: horsepower and $(horsepower)^2$ 

Hint: 

    Create KNN regression object using neighbors.KNeighborsRegressor:

    knnRegression = neighbors.KNeighborsRegressor(n_neighbors=7)

    Use the .fit and .score methods as before
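
A minimal, self-contained sketch of the hinted workflow follows. It uses synthetic data (the arrays `x` and `y` are made up) rather than the Auto dataset. For a regressor, `.score` returns the $R^2$ of its predictions, so it should agree with what `r2_score` reports.

```python
import numpy as np
from sklearn import neighbors
from sklearn.model_selection import train_test_split

# Synthetic one-feature regression problem, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(300, 1))
y = np.sin(x[:, 0]) + rng.normal(0, 0.1, size=300)

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Each prediction is the mean target of the 7 nearest training points.
knnRegression = neighbors.KNeighborsRegressor(n_neighbors=7)
knnRegression.fit(x_train, y_train)

# For regressors, .score returns R^2 on the given data.
test_r2 = knnRegression.score(x_test, y_test)
print('R square value is', test_r2)
```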



In [3]:
from sklearn import neighbors
knnRegression = neighbors.KNeighborsRegressor(n_neighbors = 7)
Knn_reg1 = knnRegression.fit(X_train,Y_train)
Y_predict = Knn_reg1.predict(X_test)
R_2 = r2_score(Y_test,Y_predict)
print('R square value is',R_2)

R square value is 0.6674777441714226


In [4]:
from sklearn import neighbors
knnRegression = neighbors.KNeighborsRegressor(n_neighbors = 7)
Knn_reg2 = knnRegression.fit(X_train1,Y_train1)
Y_predict = Knn_reg2.predict(X_test1)
R_2 = r2_score(Y_test1,Y_predict)
print('R square value is',R_2)

R square value is 0.6701084048823853


#### Comment on your results in (A) through (C): which model performs better? How does performance change when adding the quadratic feature?
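
To make the comparison concrete, the test scores printed in the cells above can be collected side by side. The snippet only restates those reported numbers (rounded to four decimals); it does not refit any model.

```python
# Test R^2 values copied from the cell outputs above, rounded to 4 decimals.
scores = {
    ('linear', 'horsepower'): 0.6218,
    ('linear', 'horsepower + horsepower^2'): 0.7271,
    ('KNN (K=7)', 'horsepower'): 0.6675,
    ('KNN (K=7)', 'horsepower + horsepower^2'): 0.6701,
}

for (model, features), r2 in scores.items():
    print(f'{model:10s} {features:28s} R^2 = {r2:.4f}')

# Gain from adding the quadratic feature, per model.
gain_linear = scores[('linear', 'horsepower + horsepower^2')] - scores[('linear', 'horsepower')]
gain_knn = scores[('KNN (K=7)', 'horsepower + horsepower^2')] - scores[('KNN (K=7)', 'horsepower')]
print(f'linear gain: {gain_linear:.4f}, KNN gain: {gain_knn:.4f}')

best = max(scores, key=scores.get)
print('best:', best)
```

On these numbers the quadratic linear model scores highest. The quadratic term adds about 0.105 to the linear model's $R^2$ but only about 0.003 to KNN's. One plausible reason, offered as an interpretation rather than something tested here: for positive horsepower values the squared column is a monotone function of the original, so it rescales the distances KNN uses without adding new information about which cars are similar.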

# Part 2: Regularization

### A) Using the Boston dataset, fit a Ridge regression model with the tuning parameter set to 100 (alpha = 100). Find the $R^2$ score and the number of nonzero coefficients.

### B) Use Lasso regression instead of Ridge regression, again with the tuning parameter set to 100. Find the $R^2$ score and the number of nonzero coefficients.

### C) Change the tuning parameter of the Lasso model to a very low value (alpha = 0.001). What is the $R^2$ score?
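
For reference when comparing the two penalties, the objectives minimized by scikit-learn's `Ridge` and `Lasso`, as given in the library's documentation, are shown below. Here $w$ denotes the coefficient vector and $n$ the number of training samples (notation chosen for this note). The Lasso loss is divided by $2n$ while the Ridge loss is not, and the penalties use different norms, so the same numeric `alpha` is not directly comparable between the two models.

$$
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
\qquad\qquad
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$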



In [5]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import numpy as np

dataset = load_boston()
X = dataset.data
Y = dataset.target
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,random_state = 0)

In [9]:
#A
RidgeModel = Ridge(alpha = 100).fit(X_train,Y_train)
R2_1 = r2_score(Y_test,RidgeModel.predict(X_test))
print('R^2 for Ridge Regression w/ tuning parameter of 100:', R2_1)
RidgeModel.coef_ # show the fitted coefficients (the array below)

R^2 for Ridge Regression w/ tuning parameter of 100: 0.5925358036157629


array([-0.10593087,  0.05388664, -0.06777195,  0.48192973, -0.16346727,
        1.98200812, -0.00433025, -1.15460301,  0.24416784, -0.01437486,
       -0.87957246,  0.00845672, -0.64846291])

In [11]:
#B
LassoModel1 = Lasso(alpha = 100).fit(X_train,Y_train)
R2_2 = r2_score(Y_test,LassoModel1.predict(X_test))
print('R^2 for Lasso Regression w/ tuning parameter of 100:', R2_2)
LassoModel1.coef_ # show the fitted coefficients (the array below)

R^2 for Lasso Regression w/ tuning parameter of 100: 0.11866916175527809


array([-0.        ,  0.        , -0.        ,  0.        , -0.        ,
        0.        , -0.        ,  0.        , -0.        , -0.02291247,
       -0.        ,  0.00482211, -0.        ])
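
Parts A and B also ask for the number of nonzero coefficients, which the cells above display but do not count. The sketch below applies `np.count_nonzero` to the two coefficient arrays printed above. They are copied in by hand so the snippet runs on its own; with the fitted models one would pass `RidgeModel.coef_` and `LassoModel1.coef_` instead.

```python
import numpy as np

# Coefficient arrays copied from the Ridge (A) and Lasso (B) outputs above.
ridge_coef = np.array([-0.10593087,  0.05388664, -0.06777195,  0.48192973, -0.16346727,
                        1.98200812, -0.00433025, -1.15460301,  0.24416784, -0.01437486,
                       -0.87957246,  0.00845672, -0.64846291])
lasso_coef = np.array([-0.        ,  0.        , -0.        ,  0.        , -0.        ,
                        0.        , -0.        ,  0.        , -0.        , -0.02291247,
                       -0.        ,  0.00482211, -0.        ])

# count_nonzero treats both 0.0 and -0.0 as zero.
ridge_nonzero = np.count_nonzero(ridge_coef)
lasso_nonzero = np.count_nonzero(lasso_coef)

print('Ridge nonzero coefficients:', ridge_nonzero)  # 13
print('Lasso nonzero coefficients:', lasso_nonzero)  # 2
```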

In [8]:
#C
LassoModel2 = Lasso(alpha = 0.001).fit(X_train,Y_train)
R2_3 = r2_score(Y_test,LassoModel2.predict(X_test))
print('R^2 for Lasso Regression w/ tuning parameter of 0.001:', R2_3)

R^2 for Lasso Regression w/ tuning parameter of 0.001: 0.6350353125168686


### D) Comment on your results. In this problem, do all features seem important in making predictions?


All features appear to contribute to the predictions. In (B), the large tuning parameter forced all but two of the thirteen coefficients to exactly zero, and the $R^2$ value dropped sharply to about 0.12. With the much smaller tuning parameter in (C), the penalty no longer suppresses the features, and the $R^2$ improved to about 0.64, slightly above the Ridge result in (A).
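
The effect described above, where a larger tuning parameter drives more Lasso coefficients to zero, can be reproduced on synthetic data. The sketch below is illustrative only: the dataset comes from `make_regression` with made-up settings, not from the Boston data, and the alpha grid is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data: 13 features, of which only 5 actually influence the target.
X, y = make_regression(n_samples=300, n_features=13, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nonzero_counts = {}
for alpha in [0.001, 1.0, 1000.0]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_train, y_train)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f'alpha={alpha:>8}: R^2={model.score(X_test, y_test):.3f}, '
          f'nonzero coefficients={nonzero_counts[alpha]}')
```

Printing the count next to the score makes it easy to see whether a drop in $R^2$ coincides with features being removed from the model.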