# Part 1: Regularization

A) Using the Boston dataset, fit a Ridge regression model with the tuning parameter set to 100 (alpha=100). Find the $R^2$ score and the number of non-zero coefficients.

B) Use Lasso regression instead of Ridge regression, again with the tuning parameter set to 100. Find the $R^2$ score and the number of non-zero coefficients.

C) Change the tuning parameter of the Lasso model to a very low value (alpha=0.001). What is the $R^2$ score?

D) Comment on your results. In this problem, do all features seem important in making predictions?
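
For reference, a sketch of the objectives that scikit-learn's `Ridge` and `Lasso` minimize, where $w$ is the coefficient vector, $n$ is the number of training samples, and $\alpha$ is the tuning parameter (the intercept is omitted for brevity). The squared-error term is scaled differently in the two models, so the same value of alpha does not correspond to the same strength of regularization:

$$
\text{Ridge:}\quad \min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
\qquad\qquad
\text{Lasso:}\quad \min_w \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The $\ell_1$ penalty can set coefficients exactly to zero, whereas the $\ell_2$ penalty only shrinks them toward zero.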


In [3]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
import numpy as np

dataset = load_boston()
X=dataset.data
Y=dataset.target
X_train, X_test, Y_train, Y_test= train_test_split(X, Y, random_state= 0)


#A) Ridge regression, using tuning parameter of 100
RidgeModel100=Ridge(alpha=100).fit(X_train, Y_train)
# For regressors, .score returns the R^2 metric on the given data
print("Score of Ridge Regression with tuning parameter =100 is: ", RidgeModel100.score(X_test,Y_test))
print("number of coef. that are equal to zero with Ridge regression", np.sum(RidgeModel100.coef_==0))


#B) Lasso regression, using tuning parameter of 100
LassoModel100=Lasso(alpha=100).fit(X_train, Y_train) 
print("Score of Lasso Regression with tuning parameter =100 is: ", LassoModel100.score(X_test,Y_test))
print("number of coef. that are equal to zero with Lasso regression when alpha =100 is: ", np.sum(LassoModel100.coef_==0))



#C) Lasso regression, using very small tuning parameter 
LassoModel001=Lasso(alpha=0.001).fit(X_train, Y_train) 
print("Score of Lasso Regression with tuning parameter =0.001 is: ", LassoModel001.score(X_test,Y_test))
print("number of coef. that are equal to zero with Lasso regression when alpha =0.001 is: ", np.sum(LassoModel001.coef_==0))



Score of Ridge Regression with tuning parameter =100 is:  0.592535803616
number of coef. that are equal to zero with Ridge regression 0
Score of Lasso Regression with tuning parameter =100 is:  0.118669161755
number of coef. that are equal to zero with Lasso regression when alpha =100 is:  11
Score of Lasso Regression with tuning parameter =0.001 is:  0.635035312517
number of coef. that are equal to zero with Lasso regression when alpha =0.001 is:  0


### Comment

- The results above show that with Ridge regression none of the coefficients is zero. Using Lasso regression with the same value of the tuning parameter, 11 of the 13 coefficients are exactly zero, so only 2 are non-zero.

- With the tuning parameter set to 100, Ridge ($R^2 \approx 0.59$) performs much better than Lasso ($R^2 \approx 0.12$) in this example. Lasso's score drops sharply once it discards 11 features, which suggests that most features carry useful information for predicting the response.

- With a low value of the tuning parameter (alpha=0.001), none of the coefficients is zero with Lasso, and the $R^2$ score rises to about 0.64. This result is expected to be similar to OLS with no regularization (you can check that in a straightforward manner).
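
The OLS check mentioned in the last point can be sketched as follows. Because `load_boston` is not available in recent scikit-learn releases, this sketch uses synthetic data from `make_regression` as a stand-in (an assumption, not the data used above), and the names `X_demo`, `y_demo`, `ols`, and `lasso_small` are illustrative only. The same comparison should carry over to `X_train` and `Y_train` from the cell above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 13 features, like the Boston dataset
X_demo, y_demo = make_regression(
    n_samples=500, n_features=13, n_informative=13, noise=10.0, random_state=0
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(X_demo, y_demo, random_state=0)

# Ordinary least squares (no regularization) vs. Lasso with a very small alpha
ols = LinearRegression().fit(Xd_train, yd_train)
lasso_small = Lasso(alpha=0.001).fit(Xd_train, yd_train)

r2_ols = ols.score(Xd_test, yd_test)
r2_lasso = lasso_small.score(Xd_test, yd_test)

# Largest absolute difference between the two coefficient vectors
max_coef_gap = np.max(np.abs(ols.coef_ - lasso_small.coef_))

print("R^2 of OLS:", r2_ols)
print("R^2 of Lasso with alpha=0.001:", r2_lasso)
print("Largest coefficient difference:", max_coef_gap)
```

If the two scores and coefficient vectors nearly coincide, the small-alpha Lasso is behaving essentially like unregularized least squares.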

# Part 2: Logistic Regression

In this exercise, you will use logistic regression to classify breast cancer tumors as malignant or benign using the sklearn data set. Run the code below to print and read the description of the data set. Use logistic regression with Lasso (L1) regularization (penalty="l1") and the default regularization parameter (C=1) to build the classifier. What is the accuracy?


In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import numpy as np

DataCancer=load_breast_cancer()
print(DataCancer.keys())
print(DataCancer.DESCR)

X_features=DataCancer.data
Y_targetClass=DataCancer.target

X_train, X_test, Y_train, Y_test= train_test_split(X_features, Y_targetClass, random_state= 0)




dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
Breast Cancer Wisconsin (Diagnostic) Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Ra

In [5]:
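# The l1 penalty needs a solver that supports it; the current default (lbfgs) does not, so liblinear is set explicitly.
FittedLogRegModelLasso= LogisticRegression(penalty="l1", solver="liblinear").fit(X_train,Y_train)
# For classifiers, .score returns the mean accuracy on the given data (not R^2)
accuracy_logRegLasso1=FittedLogRegModelLasso.score(X_test,Y_test)
print("The score of log reg with C=1, Lasso regression, is:", accuracy_logRegLasso1)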

The score of log reg with C=1, Lasso regression, is: 0.958041958042
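
As a possible follow-up, the sketch below shows how the regularization parameter `C` (the inverse of the regularization strength in scikit-learn) might affect both the test accuracy and the number of non-zero coefficients of the L1-penalized classifier. The values of `C` tried here, the `results` dictionary, and the `Xc_*`/`yc_*` names are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    data.data, data.target, random_state=0
)

# Map each C to (number of non-zero coefficients, test accuracy)
results = {}
for C in (0.01, 1, 100):
    # liblinear is one of the solvers that supports the l1 penalty
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C, random_state=0)
    clf.fit(Xc_train, yc_train)
    n_nonzero = int(np.sum(clf.coef_ != 0))
    accuracy = clf.score(Xc_test, yc_test)
    results[C] = (n_nonzero, accuracy)
    print("C =", C, "| non-zero coefficients:", n_nonzero, "| accuracy:", accuracy)
```

Smaller values of `C` mean stronger regularization, so fewer of the 30 features are expected to keep a non-zero coefficient.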
