# Lab Three - Extending Logistic Regression

In this lab, you will compare the performance of logistic regression optimization programmed in scikit-learn and via your own implementation. You will also modify the optimization procedure for logistic regression. 

This report is worth 10% of the final grade. Please upload a report (<b>one per team</b>) with all code used, visualizations, and text in a rendered Jupyter notebook. Any visualizations that cannot be embedded in the notebook, please provide screenshots of the output. The results should be reproducible using your report. Please carefully describe every assumption and every step in your report.

<b>Dataset Selection</b>

Select a dataset identically to the way you selected for the lab one (i.e., table data). You are not required to use the same dataset that you used in the past, but you are encouraged. You must identify a classification task from the dataset that contains <b>three or more classes to predict</b>. That is it cannot be a binary classification; it must be multi-class prediction. 

## Preparation and Overview (3pt)

<ul>
    <li>[<b>2 points</b>] Explain the task and what business-case or use-case it is designed to solve (or designed to investigate). Detail exactly what the classification task is and what parties would be interested in the results. For example, would the model be deployed or used mostly for offline analysis? </li>
    <li>[<b>.5 points</b>] (<i>mostly the same processes as from previous labs</i>) Define and prepare your class variables. Use proper variable representations (int, float, one-hot, etc.). Use pre-processing methods (as needed) for dimensionality reduction, scaling, etc. Remove variables that are not needed/useful for the analysis. Describe the final dataset that is used for classification/regression (include a description of any newly formed variables you created). </li>
    <li>[<b>.5 points</b>] Divide you data into training and testing data using an 80% training and 20% testing split. Use the cross validation modules that are part of scikit-learn. <b>Argue "for" or "against" splitting your data using an 80/20 split. That is, why is the 80/20 split appropriate (or not) for your dataset?</b></li>
</ul>

### Use Case

Our task will be looking at a patients information and determining whether they are likely to have a stroke, heart disease, or hypertension. The use-case for this classifier would be to flag at-risk patients and enable some kind of response to be made to prevent serious medical emergencies that these conditions might cause or prevent the conditions in the first place.

For example, if a person were to be flagged as very likely to have a stroke, the doctor could contact the patient in an attempt to prevent the stroke by prescribing them medication or alerting the patient's family to monitor them in case they were to have a stroke. Similar actions could be taken for hypertension and heart disease.

Alernatively, some kind of application could be made to allow people to enter their information and determine how at risk they might be for these conditions, giving people more clear information about their health and the issues that are likely to affect them.

### Data Preparation

In [1]:
# Importing packages and reading in dataset
import numpy as np
import pandas as pd

print('Pandas:', pd.__version__)
print('Numpy:',  np.__version__)

raw_data = pd.read_csv('healthcare-dataset-stroke-data.csv')
raw_data.head()

Pandas: 1.1.3
Numpy: 1.19.2


Unnamed: 0,id,gender,age,hypertension,heart_disease,ever_married,work_type,residence_type,avg_glucose_level,bmi,smoking_status,stroke
0,9046,Male,67.0,0,1,Yes,Private,Urban,228.69,36.6,formerly smoked,1
1,51676,Female,61.0,0,0,Yes,Self-employed,Rural,202.21,,never smoked,1
2,31112,Male,80.0,0,1,Yes,Private,Rural,105.92,32.5,never smoked,1
3,60182,Female,49.0,0,0,Yes,Private,Urban,171.23,34.4,smokes,1
4,1665,Female,79.0,1,0,Yes,Self-employed,Rural,174.12,24.0,never smoked,1


In [2]:
# Dropping categorical column 'work_type'; not very useful and
# doesn't translate nicely into ordinal numbers
df = raw_data.drop('work_type', axis = 1)

# Dropping 1 observation of person with gender 'Other' to simplify
# using the gender column to calculate, impute, or visualize
df.drop(df[df.gender == 'Other'].index, inplace=True)

# Making values' format consistent
for c in df.columns:
    if df[c].dtype == 'object':
        df[c] = df[c].str.lower()

# Adding numbers to smoking_status values to order them properly
# when they will get passed through the SKLearn LabelEncoder
df.smoking_status.replace(to_replace= ['never smoked', 'formerly smoked', 'smokes', 'Unknown'],
                          value     = ['0_never_smoked', '1_formerly_smoked', '2_smokes', '3_Unknown'],
                          inplace=True)

In [3]:
from sklearn.preprocessing import LabelEncoder

# Encoding all of the non-numeric columns
le = {}

for col in df.columns:
    if df[col].dtype == 'object':
        le[col] = LabelEncoder()
        df[col] = le[col].fit_transform(df[col])

# Call le[col].inverse_transform(df[col]) for any column name
# to convert numbers back to their labels

# Converting all 'Unknown' values in smoking status to NaN so
# that we can impute the missing values.
df.smoking_status.mask(df.smoking_status == 3, np.nan, inplace=True)
               
df.head()

Unnamed: 0,id,gender,age,hypertension,heart_disease,ever_married,residence_type,avg_glucose_level,bmi,smoking_status,stroke
0,9046,1,67.0,0,1,1,1,228.69,36.6,1.0,1
1,51676,0,61.0,0,0,1,0,202.21,,0.0,1
2,31112,1,80.0,0,1,1,0,105.92,32.5,0.0,1
3,60182,0,49.0,0,0,1,1,171.23,34.4,2.0,1
4,1665,0,79.0,1,0,1,0,174.12,24.0,0.0,1


In [4]:
# Imputing missing values
from sklearn.impute import KNNImputer
import copy

knn = KNNImputer(n_neighbors=3)

# Imputing on all columns except id
columns = list(df.columns)
columns.remove('id')

df_imputed = copy.deepcopy(df)
df_imputed[columns] = knn.fit_transform(df[columns])

# Rounding imputed values to be compatible with LabelEncoder
# for smoking_status and to match the format of other values
# for bmi
df_imputed.smoking_status = df_imputed.smoking_status.apply(lambda x: round(x, 0))
df_imputed.bmi = df_imputed.bmi.apply(lambda x: round(x, 1))

In [5]:
# Using df_imputed as the primary dataset
df = df_imputed

# Changing columns modified by KNN Imputer back to integers from floats
columns = [
    'gender',
    'hypertension',
    'heart_disease',
    'ever_married',
    'residence_type',
    'smoking_status',
    'stroke'
]

for col in columns:
    df[col] = df[col].astype(int)

To prep this dataset, one attribute was removed due to it being relatively unimportant and not encoding nicely into an ordinal set of integers. All categorical variables were converted to numeric data using SKLearn's LabelEncoder class. Missing values for bmi and smoking_status were imputed using KNN Imputer. One record was dropped for being the only entry with gender 'Other'. Removing this record will make visualizing the gender data simpler and will have little impact on the training, as having an outlier like that might cause other attributes to be slightly undervalued in comparison.

Here is a table of the LabelEncoder encoded variables.

| value | gender | ever_married | residence_type | smoking_status    |
|-------|--------|--------------|----------------|-------------------|
| 0     | female | no           | rural          | 0_never_smoked    |
| 1     | male   | yes          | urban          | 1_formerly_smoked |
| 2     |   -    |      -       |       -        | 2_smokes          |


In [6]:
df.head()

Unnamed: 0,id,gender,age,hypertension,heart_disease,ever_married,residence_type,avg_glucose_level,bmi,smoking_status,stroke
0,9046,1,67.0,0,1,1,1,228.69,36.6,1,1
1,51676,0,61.0,0,0,1,0,202.21,30.9,0,1
2,31112,1,80.0,0,1,1,0,105.92,32.5,0,1
3,60182,0,49.0,0,0,1,1,171.23,34.4,2,1
4,1665,0,79.0,1,0,1,0,174.12,24.0,0,1


### Dataset Division

In [25]:
columns = list(df.columns)
columns.remove('id')
targets = ['stroke', 'heart_disease', 'hypertension']

for col in targets:
    columns.remove(col)



#splitting into train and test data
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=.20, random_state=42)

X_test = test[columns]
X_train = train[columns]
y_test = []
y_train = []
for col in targets:
    y_test.append(test[col])
    y_train.append(train[col])
    
# Pass y arrays in to classifier and then iterate over array use binary classifier
# for each take highest percentage that is the class that we get for that one

['gender', 'age', 'ever_married', 'residence_type', 'avg_glucose_level', 'bmi', 'smoking_status']
['stroke', 'heart_disease', 'hypertension']
      gender    age  ever_married  residence_type  avg_glucose_level   bmi  \
4688       1  31.00             0               0              64.85  23.0   
4478       1  40.00             1               0              65.29  28.3   
3521       1  52.00             1               0             111.04  30.0   
4355       0  79.00             1               0              76.64  19.5   
3826       0  75.00             1               0              94.77  27.2   
...      ...    ...           ...             ...                ...   ...   
3605       1   1.88             0               0             143.97  16.9   
3510       0  15.00             0               1             190.89  22.0   
4754       1  52.00             1               0              67.92  31.1   
4105       0  56.00             0               1             128.63  24.9   


## Modeling (5pt)

<ul>
    <li>The implementation of logistic regression must be written only from the examples given to you by the instructor. No credit will be assigned to teams that copy implementations from another source, regardless of if the code is properly cited.</li>
    <li>[<b>2 points</b>] Create a custom, one-versus-all logistic regression classifier using numpy and scipy to optimize. Use object oriented conventions identical to scikit-learn. You should start with the template developed by the instructor in the course. You should add the following functionality to the logistic regression classifier:
    <ul>
        <li>Ability to choose optimization technique when class is instantiated: either steepest descent, stochastic gradient descent, or Newton's method. </li>
        <li>Update the gradient calculation to include a customizable regularization term (either using no regularization, L1 regularization, L2 regularization, or both L1 and L2 regularization). Associate a cost with the regularization term, "C", that can be adjusted when the class is instantiated.  </li>
    </ul>
    </li>
    <li>[<b>1.5 points</b>] Train your classifier to achieve good generalization performance. That is, adjust the <b>optimization technique</b> and the value of the <b>regularization term "C"</b> to achieve the best performance on your test set. Visualize the performance of the classifier versus the parameters you investigated. Is your method of selecting parameters justified? That is, do you think there is any "data snooping" involved with this method of selecting parameters?</li>
    <li>[<b>1.5 points</b>] Compare the performance of your "best" logistic regression optimization procedure to the procedure used in scikit-learn. Visualize the performance differences in terms of training time and classification performance. <b>Discuss the results</b>. </li>
</ul>

In [8]:
class BinaryLogisticRegressionBase:
    # private:
    def __init__(self, optimization='bgd', eta = 0.01, iterations=20, regularization='ridge', c=0):
        self.eta = eta
        self.iters = iterations
        self.opt = optimization
        self.reg = regularization
        self.c = c
        # internally we will store the weights as self.w_ to keep with sklearn conventions
    
    def __str__(self):
        return 'Base Binary Logistic Regression Object, Not Trainable'
    
    # convenience, private and static:
    @staticmethod
    def _sigmoid(theta):
        return 1/(1+np.exp(-theta)) 
    
    @staticmethod
    def _add_bias(X):
        return np.hstack((np.ones((X.shape[0],1)),X)) # add bias term
    
    # public:
    def predict_proba(self,X,add_bias=True):
        # add bias term if requested
        Xb = self._add_bias(X) if add_bias else X
        return self._sigmoid(Xb @ self.w_) # return the probability y=1
    
    def predict(self,X):
        return (self.predict_proba(X)>0.5) #return the actual prediction
    

In [9]:
from scipy.special import expit
from numpy.linalg import pinv

class BinaryLogisticRegression(BinaryLogisticRegressionBase):
    #private:
    def __str__(self):
        if(hasattr(self,'w_')):
            return 'Binary Logistic Regression Object with coefficients:\n'+ str(self.w_) # is we have trained the object
        else:
            return 'Untrained Binary Logistic Regression Object'
        
    #optimization methods
    def _get_gradient(self, X, y):
        
        gradient = None
        if self.opt == 'bgd': gradient = self.steepest_descent
        elif self.opt == 'sgd': gradient = self.stochastic_gradient_descent
        elif self.opt == 'newton': gradient = self.newton
        elif self.opt == 'hessian': gradient = self.hessian
            
        return gradient(X,y)
    
    def steepest_descent(self,X,y):
        ydiff = y-self.predict_proba(X,add_bias=False).ravel() # get y difference
        gradient = np.mean(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
        gradient = gradient.reshape(self.w_.shape)
        gradient[1:] += self.c * self._get_reg_gradient()
        
        return gradient
    
    def stochastic_gradient_descent(self,X,y):
       # idx = int(np.random.rand()*len(y)) # grab random instance\
        idx = np.random.randint(len(y))
        ydiff = y[idx]-self.predict_proba(X[idx],add_bias=False) # get y difference (now scalar)
        gradient = X[idx] * ydiff[:,np.newaxis] # make ydiff a column vector and multiply through
        
        gradient = gradient.reshape(self.w_.shape)
        gradient[1:] += self.c * self._get_reg_gradient()
        
        return gradient
    
    def hessian(self, X, y):
        g = self.predict_proba(X,add_bias=False).ravel() # get sigmoid value for all classes
        hessian = X.T @ np.diag(g*(1-g)) @ X - 2 * self.c  # calculate the hessian

        ydiff = y-g # get y difference
        gradient = np.sum(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
        gradient = gradient.reshape(self.w_.shape)
        gradient[1:] +=  self._get_reg_gradient()
        
        return pinv(hessian) @ gradient
    
    def newton(self,X,y):
        sigmoid_z = (sigma1*X + sigma2).astype("float_")
        sigmoid = 1.0/(1.0 + np.exp(-z))
        return np.sum(y * np.log(sigmoid) + (1 - y) * np.log(1 - sigmoid))
    
    @staticmethod
    def _sigmoid(theta):
        # increase stability, redefine sigmoid operation
        return expit(theta) #1/(1+np.exp(-theta))
    
    #regularization methods
    def _get_reg_gradient(self):
        if self.reg == 'ridge':
            return -2 * self.w_[1:]
        elif self.reg == 'lasso':
            return np.sign(self.w_[1:])
        elif self.reg == 'elastic_net':
            return -2 * self.w_[1:] + np.sign(self.w_[1:])
    
    # public:
    def fit(self, X, y):
        Xb = self._add_bias(X) # add bias term
        num_samples, num_features = Xb.shape
        
        self.w_ = np.zeros((num_features,1)) # init weight vector to zeros
        
        # for as many as the max iterations
        for _ in range(self.iters):
            gradient = self._get_gradient(Xb,y)
            self.w_ += gradient*self.eta # multiply by learning rate 

In [10]:
# class StochasticLogisticRegression(BinaryLogisticRegression):
#     # stochastic gradient calculation 
#     def _get_gradient(self,X,y):
#         idx = int(np.random.rand()*len(y)) # grab random instance
#         ydiff = y[idx]-self.predict_proba(X[idx],add_bias=False) # get y difference (now scalar)
#         gradient = X[idx] * ydiff[:,np.newaxis] # make ydiff a column vector and multiply through
        
#         gradient = gradient.reshape(self.w_.shape)
#         gradient[1:] += -2 * self.w_[1:] * self.C
        
#         return gradient

In [11]:
# from numpy.linalg import pinv
# class HessianBinaryLogisticRegression(BinaryLogisticRegression):
#     # just overwrite gradient function
#     def _get_gradient(self,X,y):
#         g = self.predict_proba(X,add_bias=False).ravel() # get sigmoid value for all classes
#         hessian = X.T @ np.diag(g*(1-g)) @ X - 2 * self.C # calculate the hessian

#         ydiff = y-g # get y difference
#         gradient = np.sum(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
#         gradient = gradient.reshape(self.w_.shape)
#         gradient[1:] += -2 * self.w_[1:] * self.C
        
#         return pinv(hessian) @ gradient

In [12]:
# from scipy.special import expit
# class VectorBinaryLogisticRegression(BinaryLogisticRegression):
#     # inherit from our previous class to get same functionality
#     @staticmethod
#     def _sigmoid(theta):
#         # increase stability, redefine sigmoid operation
#         return expit(theta) #1/(1+np.exp(-theta))
    
#     # but overwrite the gradient calculation
#     def _get_gradient(self,X,y):
#         ydiff = y-self.predict_proba(X,add_bias=False).ravel() # get y difference
#         gradient = np.mean(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
        
#         return gradient.reshape(self.w_.shape)

# Logisitic Regression Class

In [13]:
class LogisticRegression:
    
    def __init__(self, optimization, eta, iterations, regularization, c=0):
    
        self.eta = eta
        self.iters = iterations
        self.opt = optimization
        self.reg = regularization
        self.encodings = {}
        self.c = c
        
    
    def __str__(self):
        if(hasattr(self,'w_')):
            return 'MultiClass Logistic Regression Object with coefficients:\n'+ str(self.w_) # is we have trained the object
        else:
            return 'Untrained MultiClass Logistic Regression Object'
    
    def fit(self,X,y):
        num_samples, num_features = X.shape
        self.unique_ = np.unique(y) # get each unique class value
        num_unique_classes = len(self.unique_)
        self.classifiers_ = [] # will fill this array with binary classifiers
        
        for i,yval in enumerate(self.unique_): # for each unique value
            self.encodings[yval] = i
            y_binary = (y==yval) # create a binary problem
            # train the binary classifier for this class
            blr = BinaryLogisticRegression(self.opt, self.eta, self.iters, self.reg, self.c )
            blr.fit(X,y_binary)
            # add the trained classifier to the list
            self.classifiers_.append(blr)
            
        # save all the weights into one matrix, separate column for each class
        self.w_ = np.hstack([x.w_ for x in self.classifiers_]).T
        
    def predict_proba(self,X):
        probs = []
        for blr in self.classifiers_:
            probs.append(blr.predict_proba(X)) # get probability for each classifier
        
        return np.hstack(probs) # make into single matrix
    
    def predict(self,X):
        return np.argmax(self.predict_proba(X),axis=1) # take argmax along row
    
    
lr = LogisticRegression('bgd',0.01, 100, 'ridge')
print(lr)

Untrained MultiClass Logistic Regression Object


In [14]:
#testing on iris dataset to make sure Logistic Regression function works

# from sklearn.metrics import accuracy_score
# from sklearn.datasets import load_iris
# ds = load_iris()
# X = ds.data
# y = ds.target

# lr = LogisticRegression(optimization='bgd',eta=0.1, iterations=500, c=.01)
# lr.fit(X,y)
# yprobs = lr.predict_proba(X)

# yhat = lr.predict(X)
# print("YHat", yhat)
# print('Accuracy of: ',accuracy_score(y,yhat))

In [15]:
#evaluate on train dataset
from sklearn.metrics import accuracy_score
lr = LogisticRegression(optimization='bgd',eta=0.1, regularization='ridge', iterations=1)
lr.fit(X_train, y_train)
print(lr)
yhat = lr.predict(X_train)
print("Accuracy of Training Dataset (10 iterations): ", accuracy_score(y_train,yhat))

MultiClass Logistic Regression Object with coefficients:
[[ 4.54245168e-02  1.86322486e-02  1.84387766e+00  3.49889895e-03
   1.66381209e-03  2.86885246e-02  2.29508197e-02  4.70366198e+00
   1.30780157e+00  2.85539516e-02]
 [-4.54245168e-02 -1.86322486e-02 -1.84387766e+00 -3.49889895e-03
  -1.66381209e-03 -2.86885246e-02 -2.29508197e-02 -4.70366198e+00
  -1.30780157e+00 -2.85539516e-02]]
Accuracy of Training Dataset (10 iterations):  0.9542451676046


In [16]:
#can we do better with more iterations?
lr = LogisticRegression(optimization='bgd',eta=0.1, regularization='ridge', iterations=499)
lr.fit(X_train, y_train)
print(lr)
yhat = lr.predict(X_train)
print("Accuracy of Training Dataset (499 iterations): ", accuracy_score(y_train,yhat))

MultiClass Logistic Regression Object with coefficients:
[[ 0.31498099  0.06465734 -0.77851769 -0.19876363 -0.17338041  0.23444653
   0.10706758  0.98694271  3.06929819  0.10110441]
 [-0.32648982 -0.06660175 -0.27173484  0.1965613   0.17305884 -0.25748951
  -0.11225208 -2.99666597 -3.58685321 -0.11747852]]
Accuracy of Training Dataset (499 iterations):  0.9542451676046


In [17]:
#evaluate on test data set
lr = LogisticRegression(optimization='bgd',eta=0.1, regularization='ridge', iterations=500)
lr.fit(X_test, y_test)
print(lr)
yhat = lr.predict(X_test)
print("Accuracy of Testing Dataset (500 iterations): ", accuracy_score(y_test,yhat))

MultiClass Logistic Regression Object with coefficients:
[[ 0.40844723  0.02715669 -1.63419732 -0.35373189 -0.16825717  0.08282651
   0.04248065 -0.3387838   4.21334559  0.02586569]
 [-0.43752507 -0.04083793 -0.31464324  0.35090617  0.16876157 -0.12035843
  -0.06059902 -3.74350423 -5.3267     -0.05171114]]
Accuracy of Testing Dataset (500 iterations):  0.9393346379647749


In [18]:
eVals=[]
start_e=0.000001
for x in range(0,6):
    eVals.append(start_e)
    start_e*=10
results=[]

for e in eVals:
    print(e)

    lr = LogisticRegression(optimization='bgd',eta=e, regularization='ridge', iterations=1)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"bgd","ridge"))

    lr = LogisticRegression(optimization='bgd',eta=e, regularization='lasso', iterations=1)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"bgd","lasso"))
    
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='elastic_net', iterations=1)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"bgd","elastic_net"))
    
    
#     lr = LogisticRegression(optimization='bgd',eta=e, regularization='elastic_net', iterations=300)
#     lr.fit(X_train, y_train)
#     pred = lr.predict(X_train)
#     encode = lambda x: lr.encodings[x]
#     y_train_encode = np.array(list(map(encode, y_train)))
#     train_mse = accuracy_score(y_train_encode, pred)
#     print("Training MSE: {}, Eta: {}, optimization: {}, regularization: {}".format(train_mse,e,"bgd","elastic_net"))
#     results.append([train_mse,e,"bgd","elastic_net"])

1e-06
Training Accuracy: 0.9542451676046, Eta: 1e-06, optimization: bgd, regularization: ridge
Training Accuracy: 0.9542451676046, Eta: 1e-06, optimization: bgd, regularization: lasso
Training Accuracy: 0.9542451676046, Eta: 1e-06, optimization: bgd, regularization: elastic_net
9.999999999999999e-06
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-06, optimization: bgd, regularization: ridge
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-06, optimization: bgd, regularization: lasso
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-06, optimization: bgd, regularization: elastic_net
9.999999999999999e-05
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-05, optimization: bgd, regularization: ridge
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-05, optimization: bgd, regularization: lasso
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-05, optimization: bgd, regularization: elastic_net
0.001
Training Accuracy: 0.95424516

In [19]:
resultsLists_sgd=[]
for e in eVals:
    print(e)
    lr = LogisticRegression(optimization='sgd',eta=e, regularization='ridge', iterations=300)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"sgd","ridge"))

    lr = LogisticRegression(optimization='sgd',eta=e, regularization='lasso', iterations=300)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"sgd","lasso"))
    
    lr = LogisticRegression(optimization='sgd',eta=e, regularization='elastic_net', iterations=300)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"sgd","elastic_net"))
    print("\n")

1e-06
Training Accuracy: 0.9542451676046, Eta: 1e-06, optimization: sgd, regularization: ridge
Training Accuracy: 0.9542451676046, Eta: 1e-06, optimization: sgd, regularization: lasso
Training Accuracy: 0.9542451676046, Eta: 1e-06, optimization: sgd, regularization: elastic_net


9.999999999999999e-06
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-06, optimization: sgd, regularization: ridge
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-06, optimization: sgd, regularization: lasso
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-06, optimization: sgd, regularization: elastic_net


9.999999999999999e-05
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-05, optimization: sgd, regularization: ridge
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-05, optimization: sgd, regularization: lasso
Training Accuracy: 0.9542451676046, Eta: 9.999999999999999e-05, optimization: sgd, regularization: elastic_net


0.001
Training Accuracy: 0.94

In [20]:
resultsLists_newton=[]
for e in eVals:
    print(e)
    lr = LogisticRegression(optimization='newton',eta=e, regularization='ridge', iterations=300)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"newton","ridge"))

    lr = LogisticRegression(optimization='newton',eta=e, regularization='lasso', iterations=300)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"newton","lasso"))
    
    lr = LogisticRegression(optimization='newton',eta=e, regularization='elastic_net', iterations=300)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training Accuracy: {}, Eta: {}, optimization: {}, regularization: {}".format(accuracy_score(y_train,yhat),e,"newton","elastic_net"))
    print("\n")

1e-06


NameError: name 'sigma1' is not defined

In [None]:
from sklearn.metrics import accuracy_score

eVals = [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1]
optimizations = ["bgd", "sgd", "hessian"]
results = []

for e in eVals:
    print("Eta: ", e, ", Optimization: ", optimizations[0])
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='ridge', iterations=200)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training: {}, regularization: {}".format(accuracy_score(y_train,yhat), "ridge"))
    
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='lasso', iterations=200)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training: {}, regularization: {}".format(accuracy_score(y_train,yhat), "lasso"))
    
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='elastic_net', iterations=200)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    print("Training: {}, regularization: {}".format(accuracy_score(y_train,yhat), "elastic_net"))
    print("\n")
    

In [None]:
eVals = [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1]
optimizations = ["bgd", "sgd", "hessian"]
regularizations = ["ridge", "lasso", "elastic_net"]
results = []

for e in eVals:
    print("Eta: ", e, ", Optimization: ", optimizations[0])
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='ridge', iterations=499)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train = accuracy_score(y_train_encode, yhat)
    print("Training: {}, regularization: {}".format(train, "ridge"))
    results.append([train,e,"bgd","ridge"])
    
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='lasso', iterations=499)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train = accuracy_score(y_train_encode, pred)
    print("Training: {}, regularization: {}".format(train,"lasso"))
    results.append([train,e,"bgd","lasso"])
    
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='elastic_net', iterations=499)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train = accuracy_score(y_train_encode, pred)
    print("Training: {}, regularization: {}".format(train,"elastic_net"))
    results.append([train,e,"bgd","elastic_net"])
    print("\n")
    
    print("Eta: ", e, ", Optimization: ", optimizations[1])
    lr = LogisticRegression(optimization='sgd',eta=e, regularization='ridge', iterations=499)
    lr.fit(X_train, y_train)
    yhat = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train = accuracy_score(y_train_encode, yhat)
    print("Training: {}, regularization: {}".format(train, "ridge"))
    results.append([train,e,"bgd","ridge"])
    
    lr = LogisticRegression(optimization='sgd',eta=e, regularization='lasso', iterations=499)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train = accuracy_score(y_train_encode, pred)
    print("Training: {}, regularization: {}".format(train,"lasso"))
    results.append([train,e,"bgd","lasso"])
    
    lr = LogisticRegression(optimization='sgd',eta=e, regularization='elastic_net', iterations=499)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train = accuracy_score(y_train_encode, pred)
    print("Training: {}, regularization: {}".format(train,"elastic_net"))
    results.append([train,e,"bgd","elastic_net"])
    print("\n")

## Deployment (1pt)

<ul>
    <li>Which implementation of logistic regression would you advise be used in a deployed machine learning model, your implementation or scikit-learn (or other third party)? Why?</li>
</ul>

## Exceptional Work (1pt)

<ul>
    <li>You have free reign to provide additional analyses. <b>One idea</b>: Update the code to use either "one-versus-all" or "one-versus-one" extensions of binary to multi-class classification. </li>
    <li><b>Required for 7000 level students</b>: Choose ONE of the following:
    <ul>
        <li><b>Option One</b>: Implement an optimization technique for logistic regression using <b>mean square error</b> as your objective function (instead of binary cross entropy). Derive the gradient updates for the Hessian and use Newton's method to update the values of "w". Then answer, is this process better than using binary cross entropy? </li>
        <li><b>Option Two</b>: Implement the BFGS algorithm from scratch to optimize logistic regression. That is, use BFGS without the use of an external package (for example, do not use SciPy). Compare your performance accuracy and runtime to the BFGS implementation in SciPy (that we used in lecture). </li>
    </ul>
    </li>
</ul>

In [None]:
eVals=[]
start_e=0.000001
for x in range(0,6):
    eVals.append(start_e)
    start_e*=10
results=[]

for e in eVals:
    print(e)
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='ridge', iterations=300)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train_mse = accuracy_score(y_train_encode, pred)
    print("Training MSE: {}, Eta: {}, optimization: {}, regularization: {}".format(train_mse,e,"bgd","ridge"))
    results.append([train_mse,e,"bgd","ridge"])
    
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='lasso', iterations=300)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train_mse = accuracy_score(y_train_encode, pred)
    print("Training MSE: {}, Eta: {}, optimization: {}, regularization: {}".format(train_mse,e,"bgd","lasso"))
    results.append([train_mse,e,"bgd","lasso"])
    
    lr = LogisticRegression(optimization='bgd',eta=e, regularization='elastic_net', iterations=300)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_train)
    encode = lambda x: lr.encodings[x]
    y_train_encode = np.array(list(map(encode, y_train)))
    train_mse = accuracy_score(y_train_encode, pred)
    print("Training MSE: {}, Eta: {}, optimization: {}, regularization: {}".format(train_mse,e,"bgd","elastic_net"))
    results.append([train_mse,e,"bgd","elastic_net"])