# Lab Assignment Four: Extending Logistic Regression 
## Rupal Sanghavi, Omar Roa

## Business Case

This dataset represents the responses from students and their friends(ages 15-30, henceforth stated as "young people") of a Statistics class from the Faculty of Social and Economic Sciences at The Comenius University in Bratislava, Slovakia. Their survey was a mix of various topics.

* Music preferences (19 items)
* Movie preferences (12 items)
* Hobbies & interests (32 items)
* Phobias (10 items)
* Health habits (3 items)
* Personality traits, views on life, & opinions (57 items)
* Spending habits (7 items)
* Demographics (10 items)

The dataset can be found here. https://www.kaggle.com/miroslavsabo/young-people-survey

Our target is to predict how likely a "young person" would be interested in shopping at a large shopping center. We were not given details about what a "large" shopping center, but searching online for malls led us to the Avion Shopping Park in Ružinov, Slovakia. It has an area of 103,000m<sup>2</sup> and is the largest shopping mall in Slovakia (https://www.avion.sk/sk-sk/about-the-centre/fakty-a-cisla). 

Slovakia is in the lower half of European nations by size (28/48 - https://en.wikipedia.org/wiki/List_of_European_countries_by_area) and is very mountainous, making real estate space a precious commodity. This information would be of great interest to any commercial devlopment firm deciding on where to build their next shopping center or place of business. This could also help other parties trying to purchase real estate for youth-orientated construction (parks, recreation centers).

In [171]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter('ignore', DeprecationWarning)
%matplotlib inline 
%load_ext memory_profiler
from sklearn.metrics import accuracy_score
from scipy.special import expit
import time
import math
from memory_profiler import memory_usage

from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression as SKLogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

target_classifier = 'Shopping centres'
df = pd.read_csv('responses.csv', sep=",")

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler


## Define and Prepare Class Variables

In [172]:
# remove rows whose target classfier value is NaN
df_cleaned_classifier = df[np.isfinite(df[target_classifier])]
# change NaN number values to the mean
df_imputed = df_cleaned_classifier.fillna(df.mean())
# get categorical features
object_features = list(df_cleaned_classifier.select_dtypes(include=['object']).columns)
# one hot encode categorical features
one_hot_df = pd.concat([pd.get_dummies(df_imputed[col],prefix=col) for col in object_features], axis=1)
# drop object features from imputed dataframe
df_imputed_dropped = df_imputed.drop(object_features, 1)
frames = [df_imputed_dropped, one_hot_df]
# concatenate both frames by columns
df_fixed = pd.concat(frames, axis=1)

## Divide Data into Training and Testing

In [173]:
from sklearn.model_selection import ShuffleSplit

# we want to predict the X and y data as follows:
if target_classifier in df_fixed:
    y = df_fixed[target_classifier].values # get the labels we want
    del df_fixed[target_classifier] # get rid of the class label
    X = df_fixed.values # use everything else to predict!

num_cv_iterations = 3
num_instances = len(y)
cv_object = ShuffleSplit(n_splits=num_cv_iterations,test_size = 0.2)

print(cv_object)

ShuffleSplit(n_splits=3, random_state=None, test_size=0.2, train_size=None)


## Creating One-Versus-All Logistic Regression Classifier

In [174]:
%%time
# from last time, our logistic regression algorithm is given by (including everything we previously had):
class BinaryLogisticRegression:
    def __init__(self, eta, iterations=20, C=0.001,reg=0):
        self.eta = eta
        self.iters = iterations
        self.C = C
        self.iterations = 0
        self.reg = reg
        # internally we will store the weights as self.w_ to keep with sklearn conventions
        
    def __str__(self):
        if(hasattr(self,'w_')):
            return 'Binary Logistic Regression Object with coefficients:\n'+ str(self.w_) # is we have trained the object
        else:
            return 'Untrained Binary Logistic Regression Object'
        
    # convenience, private:
    @staticmethod
    def _add_bias(X):
        return np.hstack((np.ones((X.shape[0],1)),X)) # add bias term
    
    @staticmethod
    def _sigmoid(theta):
        # increase stability, redefine sigmoid operation
        return expit(theta) #1/(1+np.exp(-theta))
    
    # vectorized gradient calculation with regularization using L2 Norm
    def _get_gradient(self,X,y):
        ydiff = y-self.predict_proba(X,add_bias=False).ravel() # get y difference
        gradient = np.mean(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
        
        gradient = gradient.reshape(self.w_.shape)
        if(self.reg == 0):
            gradient[1:] += 2 * self.w_[1:] * self.C
        elif(self.reg == 1):
            gradient[1:] += np.sign(self.w_[1:]) * self.C
        else:
            gradient[1:] += 2 * self.w_[1:] * self.C
            gradient[1:] += np.sign(self.w_[1:]) * self.C
        #gradient[1:] += 2 * self.w_[1:] * self.C
        
        return gradient
    def _get_l1_gradient(self,X,y):
        ydiff = y-self.predict_proba(X,add_bias=False).ravel() # get y difference
        gradient = np.mean(X * ydiff[:,np.newaxis], axis=0) # make ydiff a column vector and multiply through
        
        gradient = gradient.reshape(self.w_.shape)
        gradient[1:] += np.sign(self.w_[1:]) * self.C
        return gradient
    # public:
    def predict_proba(self,X,add_bias=True):
        # add bias term if requested
        Xb = self._add_bias(X) if add_bias else X
        return self._sigmoid(Xb @ self.w_) # return the probability y=1
    
    def predict(self,X):
        return (self.predict_proba(X)>0.5) #return the actual prediction
    
    
    def fit(self, X, y):
        Xb = self._add_bias(X) # add bias term
        num_samples, num_features = Xb.shape
        
        self.w_ = np.zeros((num_features,1)) # init weight vector to zeros
        
        # for as many as the max iterations
        for _ in range(self.iters):
            gradient = self._get_gradient(Xb,y)
            self.w_ += gradient*self.eta # multiply by learning rate 

# blr = BinaryLogisticRegression(eta=0.1,iterations=500,C=0.001)

# blr.fit(X,y)
# print(blr)

# yhat = blr.predict(X)
# print('Accuracy of: ',accuracy_score(y,yhat+1))

CPU times: user 53 µs, sys: 15 µs, total: 68 µs
Wall time: 78 µs


In [175]:
%%time
# and we can update this to use a line search along the gradient like this:
from scipy.optimize import minimize_scalar
from scipy.optimize import OptimizeResult

import copy
class LineSearchLogisticRegression(BinaryLogisticRegression):
    
    # define custom line search for problem
    @staticmethod
    def line_search_function(eta,X,y,w,grad,C):
        wnew = w + grad*eta
        yhat = expit(X @ wnew)>0.5
        return np.sum((y-yhat)**2) + C*np.sum(wnew**2)
     
    def fit(self, X, y):
        Xb = self._add_bias(X) # add bias term
        num_samples, num_features = Xb.shape
        
        self.w_ = np.zeros((num_features,1)) # init weight vector to zeros
        
        # for as many as the max iterations
        for _ in range(self.iters):
            gradient = self._get_gradient(Xb,y)
            
            # do line search in gradient direction, using scipy function
            opts = {'maxiter':self.iters/20} # unclear exactly what this should be
            res = minimize_scalar(self.line_search_function, # objective function to optimize
                                  bounds=(self.eta/1000,self.eta*10), #bounds to optimize
                                  args=(Xb,y,self.w_,gradient,self.C), # additional argument for objective function
                                  method='bounded', # bounded optimization for speed
                                  options=opts) # set max iterations
            eta = res.x # get optimal learning rate
            self.w_ += gradient*eta # set new function values
                
      

CPU times: user 47 µs, sys: 11 µs, total: 58 µs
Wall time: 62 µs


In [176]:
%%time
class StochasticLogisticRegression(BinaryLogisticRegression):
    # stochastic gradient calculation 
    def _get_gradient(self,X,y):
        idx = int(np.random.rand()*len(y)) # grab random instance
        ydiff = y[idx]-self.predict_proba(X[idx],add_bias=False) # get y difference (now scalar)
        gradient = X[idx] * ydiff[:,np.newaxis] # make ydiff a column vector and multiply through
        
        gradient = gradient.reshape(self.w_.shape)
        if(self.reg == 0):
            gradient[1:] += 2 * self.w_[1:] * self.C
        elif(self.reg == 1):
            gradient[1:] += np.sign(self.w_[1:]) * self.C
        else:
            gradient[1:] += 2 * self.w_[1:] * self.C
            gradient[1:] += np.sign(self.w_[1:]) * self.C
        #gradient[1:] += 2 * self.w_[1:] * self.C
        #gradient[1:] += np.sign(self.w_[1:]) * self.C
        return gradient
    
    
# slr = StochasticLogisticRegression(0.1,1000, C=0.001) # take a lot more steps!!

# slr.fit(X,y)

# yhat = slr.predict(X)
# print(slr)
# print('Accuracy of: ',accuracy_score(y,yhat))      

CPU times: user 40 µs, sys: 16 µs, total: 56 µs
Wall time: 58.9 µs


In [177]:
%%time
# for this, we won't perform our own BFGS implementation 
# (it takes a good deal of code and understanding of the algorithm)
# luckily for us, scipy has its own BFGS implementation:
from scipy.optimize import fmin_bfgs
class BFGSBinaryLogisticRegression(BinaryLogisticRegression):
    @staticmethod
    def objective_function(w,X,y,C,reg):
        g = expit(X @ w)
        return -np.sum(np.log(g[y==1]))-np.sum(np.log(1-g[y==0])) + C*sum(w**2) #-np.sum(y*np.log(g)+(1-y)*np.log(1-g))

    @staticmethod
    def objective_gradient(w,X,y,C,reg):
        g = expit(X @ w)
        ydiff = y-g # get y difference
        gradient = np.mean(X * ydiff[:,np.newaxis], axis=0)
        gradient = gradient.reshape(w.shape)
        if(reg == 0):
            gradient[1:] += 2 * w[1:] * C
        elif(reg == 1):
            gradient[1:] += np.sign(w[1:]) * C
        else:
            gradient[1:] += 2 * w[1:] * C
            gradient[1:] += np.sign(w[1:]) * C
        return -gradient
    
    # just overwrite fit function
    def fit(self, X, y):
        Xb = self._add_bias(X) # add bias term
        num_samples, num_features = Xb.shape
        self.w_ = fmin_bfgs(self.objective_function, # what to optimize
                            np.zeros((num_features,1)), # starting point
                            fprime=self.objective_gradient, # gradient function
                            args=(Xb,y,self.C,self.reg), # extra args for gradient and objective function
                            gtol=1e-03, # stopping criteria for gradient, |v_k|
                            maxiter=self.iters, # stopping criteria iterations
                            disp=False)
        result = fmin_bfgs(self.objective_function, # what to optimize
                            np.zeros((num_features,1)), # starting point
                            fprime=self.objective_gradient, # gradient function
                            args=(Xb,y,self.C,self.reg), # extra args for gradient and objective function
                            gtol=1e-03, # stopping criteria for gradient, |v_k|
                            maxiter=self.iters, # stopping criteria iterations
                            disp=False,
                            retall=True)
        self.iterations = len(result)
        #print("Iterations: ", self.iterations)
        #print("iterations: ", self.iterations)
        self.w_ = self.w_.reshape((num_features,1))
    def getIterations(self):
        return self.iterations
# bfgslr = BFGSBinaryLogisticRegression(_,2) # note that we need only a few iterations here

# bfgslr.fit(X,y)
# yhat = bfgslr.predict(X)
# print(bfgslr)
# print('Accuracy of: ',accuracy_score(y,yhat+1))

CPU times: user 44 µs, sys: 3 µs, total: 47 µs
Wall time: 51 µs


In [178]:
class MultiClassLogisticRegression:
    def __init__(self, eta, iterations=20, C=0.0001, optimization=None,reg=0):
        self.eta = eta
        self.iters = iterations
        self.C = C
        self.classifiers_ = []
        self.optimization = optimization
        self.reg = reg
        self.params = {}

        # internally we will store the weights as self.w_ to keep with sklearn conventions
    
    def __str__(self):
        if(hasattr(self,'w_')):
            return 'MultiClass Logistic Regression Object with coefficients:\n'+ str(self.w_) # is we have trained the object
        else:
            return 'Untrained MultiClass Logistic Regression Object'
        
    def fit(self,X,y):
        num_samples, num_features = X.shape
        self.unique_ = np.sort(np.unique(y)) # get each unique class value
        num_unique_classes = len(self.unique_)
        self.classifiers_ = []
        for i,yval in enumerate(self.unique_): # for each unique value
            y_binary = y==yval # create a binary problem
            # train the binary classifier for this class
            #hblr = HessianBinaryLogisticRegression(self.eta,self.iters,self.C)
            if(self.optimization == "BFGSBinaryLogisticRegression"):
                #self.iters = 10
                hblr = BFGSBinaryLogisticRegression(self.eta,self.iters,self.C,self.reg)
                #print("Iterations: ",hblr.getIterations())

            elif(self.optimization == "StochasticLogisticRegression"):
                #self.iters = 2000 #1000
                hblr = StochasticLogisticRegression(self.eta,self.iters,self.C,self.reg)
            else:
                #self.iters = 100
                #self.C = 0.001
                hblr = LineSearchLogisticRegression(self.eta,self.iters,self.C,self.reg)

            hblr.fit(X,y_binary)
            #print(accuracy(y_binary,hblr.predict(X)))
            # add the trained classifier to the list
            self.classifiers_.append(hblr)
            
        # save all the weights into one matrix, separate column for each class
        self.w_ = np.hstack([x.w_ for x in self.classifiers_]).T
        
    def predict_proba(self,X):
        probs = []
        for hblr in self.classifiers_:
            probs.append(hblr.predict_proba(X).reshape((len(X),1))) # get probability for each classifier
        
        return np.hstack(probs) # make into single matrix
    
    def predict(self,X):
        return np.argmax(self.predict_proba(X),axis=1) # take argmax along row
    
    def get_params(self,deep=False):
        #return self.params
        return dict(C=self.C,eta=self.eta,iterations=self.iters, optimization=self.optimization)

    def set_params(self,**kwds):
        print(kwds)
        self.C = kwds['C']


## Varying Optimization Techniques, Etas, Iterations, and Regularization Parameters

In [179]:
# run logistic regression and vary some parameters

from sklearn import metrics as mt
with np.errstate(all='ignore'):
    # first we create a reusable logisitic regression object
    #   here we can setup the object with different learning parameters and constants

    optimizations = ["BFGSBinaryLogisticRegression","StochasticLogisticRegression","LineSearchLogisticRegression"]
    #optimizations = ["BFGSBinaryLogisticRegression","BFGSBinaryLogisticRegression","BFGSBinaryLogisticRegression"]
    etas = [0.1, 0.1, 0.001]
    iters = [10, 5000, 150]
    regs = [0,1,2]

    for optimization,eta,iter_,reg in zip(optimizations,etas,iters,regs):
        lr_clf = MultiClassLogisticRegression(eta=eta,iterations=iter_, C=0.02, optimization=optimization,reg=reg) # get object


        # now we can use the cv_object that we setup before to iterate through the 
        #    different training and testing sets. Each time we will reuse the logisitic regression 
        #    object, but it gets trained on different data each time we use it.

        iter_num=0
        # the indices are the rows used for training and testing in each iteration
        for train_indices, test_indices in cv_object.split(X,y): 
            # I will create new variables here so that it is more obvious what 
            # the code is doing (you can compact this syntax and avoid duplicating memory,
            # but it makes this code less readable)
            X_train = (X[train_indices])
            y_train = y[train_indices]

        #     print(X_train)
        #     print(y_train)

            X_test = (X[test_indices])
            y_test = y[test_indices]

    #         st = time.time()

            lr_clf.fit(X_train,y_train)  # train object
    #         t = (time.time() -st)
    #         lr_clf_times.append(t)

            lr_clf.fit(X_train,y_train)

            # train the reusable logisitc regression model on the training data
            y_hat = lr_clf.predict(X_test) # get test set precitions

            # now let's get the accuracy and confusion matrix for this iterations of training/testing
            acc = mt.accuracy_score(y_test,y_hat+1)
    #         lr_clf_accuracies.append(acc)
    #         cost_accuracies.append([acc])

            conf = mt.confusion_matrix(y_test,y_hat+1)
            print("====Iteration",iter_num," ====")
            if(reg == 0):
                label = "L1"
            elif(reg == 1):
                label = "L2"
            else:
                label = "L1 and L2"
            print('For ',optimization,' eta: ',eta, "Iterations: ",iter_,"Regularization: ",label,' : Accuracy of: ',acc)

            #print("accuracy", acc )
            print("confusion matrix\n",conf)
            iter_num+=1

        
    # Also note that every time you run the above code
    #   it randomly creates a new training and testing set, 
    #   so accuracy will be different each time

====Iteration 0  ====
For  BFGSBinaryLogisticRegression  eta:  0.1 Iterations:  10 Regularization:  L1  : Accuracy of:  0.386138613861
confusion matrix
 [[ 7 10  7  1  1]
 [ 9 13  8 10  1]
 [ 4  8 16 19  4]
 [ 3  4  6 24  5]
 [ 4  1  4 15 18]]
====Iteration 1  ====
For  BFGSBinaryLogisticRegression  eta:  0.1 Iterations:  10 Regularization:  L1  : Accuracy of:  0.326732673267
confusion matrix
 [[ 7  7  6  3  3]
 [ 8 11 11  7  3]
 [ 2 14 11 17 11]
 [ 1  4  7 11 16]
 [ 1  2  3 10 26]]
====Iteration 2  ====
For  BFGSBinaryLogisticRegression  eta:  0.1 Iterations:  10 Regularization:  L1  : Accuracy of:  0.341584158416
confusion matrix
 [[ 9  8  3  3  3]
 [ 6 10  6  3  0]
 [ 5 15 10 18  9]
 [ 1  6 15 13 15]
 [ 0  4  8  5 27]]
====Iteration 0  ====
For  StochasticLogisticRegression  eta:  0.1 Iterations:  5000 Regularization:  L2  : Accuracy of:  0.168316831683
confusion matrix
 [[ 0 26  0  0  0]
 [ 0 34  0  0  0]
 [ 0 50  0  0  0]
 [ 0 46  0  0  0]
 [ 0 46  0  0  0]]
====Iteration 1  ====


## With Pipelining PCA

In [None]:
# run logistic regression and vary some parameters
from sklearn import metrics as mt

# first we create a reusable logisitic regression object
#   here we can setup the object with different learning parameters and constants

optimizations = ["BFGSBinaryLogisticRegression","StochasticLogisticRegression","LineSearchLogisticRegression"]
#optimizations = ["BFGSBinaryLogisticRegression","BFGSBinaryLogisticRegression","BFGSBinaryLogisticRegression"]
etas = [0.1, 0.1, 0.001]
iters = [10, 5000, 150]
regs = [0,1,2]
components = 90
pca = PCA(n_components=components)

with np.errstate(all='ignore'):
    for optimization,eta,iter_,reg in zip(optimizations,etas,iters,regs):
        mglr = MultiClassLogisticRegression(eta=eta,iterations=iter_, C=0.02, optimization=optimization,reg=reg) # get object
        lr_clf = Pipeline([ ('pca', pca), ("multiclasslogregression", mglr)]) # get object


        # now we can use the cv_object that we setup before to iterate through the 
        #    different training and testing sets. Each time we will reuse the logisitic regression 
        #    object, but it gets trained on different data each time we use it.

        iter_num=0
        # the indices are the rows used for training and testing in each iteration
        for train_indices, test_indices in cv_object.split(X,y): 
            # I will create new variables here so that it is more obvious what 
            # the code is doing (you can compact this syntax and avoid duplicating memory,
            # but it makes this code less readable)
            X_train = (X[train_indices])
            y_train = y[train_indices]

        #     print(X_train)
        #     print(y_train)

            X_test = (X[test_indices])
            y_test = y[test_indices]

    #         st = time.time()

            lr_clf.fit(X_train,y_train)  # train object
    #         t = (time.time() -st)
    #         lr_clf_times.append(t)

            lr_clf.fit(X_train,y_train)

            # train the reusable logisitc regression model on the training data
            y_hat = lr_clf.predict(X_test) # get test set precitions

            # now let's get the accuracy and confusion matrix for this iterations of training/testing
            acc = mt.accuracy_score(y_test,y_hat+1)
    #         lr_clf_accuracies.append(acc)
    #         cost_accuracies.append([acc])

            conf = mt.confusion_matrix(y_test,y_hat+1)
            print("====Iteration",iter_num," ====")
            if(reg == 0):
                label = "L1"
            elif(reg == 1):
                label = "L2"
            else:
                label = "L1 and L2"
            print('For ',optimization,' eta: ',eta, "Iterations: ",iter_,"Regularization: ",label,' : Accuracy of: ',acc)

            #print("accuracy", acc )
            print("confusion matrix\n",conf)
            iter_num+=1

        
    # Also note that every time you run the above code
    #   it randomly creates a new training and testing set, 
    #   so accuracy will be different each time

====Iteration 0  ====
For  BFGSBinaryLogisticRegression  eta:  0.1 Iterations:  10 Regularization:  L1  : Accuracy of:  0.366336633663
confusion matrix
 [[10  4  5  6  2]
 [ 6  5  8  3  4]
 [ 2 11 15 11  9]
 [ 0  5 15 17 12]
 [ 2  1  8 14 27]]
====Iteration 1  ====
For  BFGSBinaryLogisticRegression  eta:  0.1 Iterations:  10 Regularization:  L1  : Accuracy of:  0.356435643564
confusion matrix
 [[ 4  6  7  5  1]
 [ 9 12  8  9  5]
 [ 6 13 22 15  3]
 [ 4  4  8  9 12]
 [ 0  1  2 12 25]]
====Iteration 2  ====
For  BFGSBinaryLogisticRegression  eta:  0.1 Iterations:  10 Regularization:  L1  : Accuracy of:  0.376237623762
confusion matrix
 [[11 10 11  1  1]
 [ 9 10  7  8  2]
 [ 3  8 17 11  5]
 [ 2  3 12 13 12]
 [ 0  2  9 10 25]]
====Iteration 0  ====
For  StochasticLogisticRegression  eta:  0.1 Iterations:  5000 Regularization:  L2  : Accuracy of:  0.336633663366
confusion matrix
 [[ 6  5  8  1  1]
 [13  5 12  3  1]
 [10  7 20  5 12]
 [ 5  7  8 15 18]
 [ 3  3  7  5 22]]
====Iteration 1  ====


Best: eta: 0.1, reg = 0, iters = 10, acc = 35%
eta: eta .001, iters 150, reg 2, acc 40

## Adjusting Values of C to Achieve Best Performance

In [None]:
# run logistic regression and vary some parameters
from sklearn import metrics as mt

# first we create a reusable logisitic regression object
#   here we can setup the object with different learning parameters and constants
lr_clf_accuracies = []
lr_clf_times = []
lr_clf_mem = []

costs = np.logspace(-3,1)
costs.sort()

cost_accuracies = []
with np.errstate(all='ignore'):
    for cost in costs:
        mglr = MultiClassLogisticRegression(eta=0.1,iterations=5000, C=cost, optimization="BFGSBinaryLogisticRegression",reg=0) # get object

        lr_clf = Pipeline([ ('pca', pca), ("multiclasslogregression", mglr)])
        # now we can use the cv_object that we setup before to iterate through the 
        #    different training and testing sets. Each time we will reuse the logisitic regression 
        #    object, but it gets trained on different data each time we use it.

        iter_num=0
        # the indices are the rows used for training and testing in each iteration
        for train_indices, test_indices in cv_object.split(X,y): 
            # I will create new variables here so that it is more obvious what 
            # the code is doing (you can compact this syntax and avoid duplicating memory,
            # but it makes this code less readable)
            X_train = (X[train_indices])
            y_train = y[train_indices]

        #     print(X_train)
        #     print(y_train)

            X_test = (X[test_indices])
            y_test = y[test_indices]

            st = time.time()

            mem = memory_usage((lr_clf.fit,(X_train,y_train))) # train object
            t = (time.time() -st)
            lr_clf_times.append(t)
            lr_clf_mem.append(mem[0])

            # train the reusable logisitc regression model on the training data
            y_hat = lr_clf.predict(X_test) # get test set precitions

            # now let's get the accuracy and confusion matrix for this iterations of training/testing
            acc = mt.accuracy_score(y_test,y_hat+1)
            lr_clf_accuracies.append(acc)
            cost_accuracies.append([acc])

            conf = mt.confusion_matrix(y_test,y_hat+1)
            print("====Iteration",iter_num," ====")
            print("accuracy", acc )
            print("confusion matrix\n",conf)
            iter_num+=1

# Also note that every time you run the above code
#   it randomly creates a new training and testing set, 
#   so accuracy will be different each time

====Iteration 0  ====
accuracy 0.391089108911
confusion matrix
 [[ 6 13  7  4  1]
 [ 6 11 15  4  2]
 [ 3  6 16 15  4]
 [ 0  3 15 18  9]
 [ 1  1  7  7 28]]
====Iteration 1  ====
accuracy 0.331683168317
confusion matrix
 [[ 9  1 10  4  3]
 [11  9 10 10  1]
 [ 9  6 12 16  5]
 [ 3  5 16 13 14]
 [ 1  0  4  6 24]]
====Iteration 2  ====
accuracy 0.346534653465
confusion matrix
 [[ 9  8  5  3  1]
 [ 5  5 12  4  4]
 [ 7  9 18 16  5]
 [ 2  6  8 17 21]
 [ 3  0  4  9 21]]
====Iteration 0  ====
accuracy 0.321782178218
confusion matrix
 [[12 12  2  0  0]
 [ 8 25  4  2  2]
 [ 6 31  5  4  1]
 [ 1 23  5  3  8]
 [ 0 20  3  5 20]]
====Iteration 1  ====
accuracy 0.40099009901
confusion matrix
 [[ 8  3  4  5  0]
 [ 7  8  8  3  1]
 [ 6  7 12  7 11]
 [ 1  8 15 17  9]
 [ 0  3 10 13 36]]
====Iteration 2  ====
accuracy 0.361386138614
confusion matrix
 [[ 8 12  6  4  1]
 [ 3  9 10  5  2]
 [ 4 10 21  9  7]
 [ 3  4 13 11 15]
 [ 1  2  4 14 24]]
====Iteration 0  ====
accuracy 0.386138613861
confusion matrix
 [[ 9  6

In [None]:
from itertools import repeat
c2 = [x for pair in zip(costs,costs,costs) for x in pair]

In [None]:
li = [np.arange(1,151)]
p = plt.plot(cost_accuracies)
plt.title("Determining Optimal Value of Regularization Term C")
plt.xlabel('Value of C ')
plt.ylabel('Accuracy (%) ')
costs_plot = np.around(c2,decimals=2)
plt.xticks(li[0],costs_plot, rotation=90)
plt.figure()

In [None]:
max_y = max(p[0].get_ydata())
d = {}
for pair in zip(c2,p[0].get_ydata()):
    d[pair[1]] = pair[0]
max_x = d[max_y]
print(max_x)


## Comparing our Best Logistic Regression Optimization Procedure to that of Scikit-Learn

In [None]:

lr_clf_accuracies = []
lr_clf_times = []
lr_clf_mem = []
lr_clf_iterations = []

#For  BFGSBinaryLogisticRegression  eta:  0.1 Iterations:  10 Regularization:  L1  : Accuracy of:  0.381188118812

mglr = MultiClassLogisticRegression(eta=0.1,iterations=10, C=max_x, optimization="BFGSBinaryLogisticRegression",reg=0) # get object

lr_clf = Pipeline([ ('pca', pca), ("multiclasslogregression", mglr)])
# now we can use the cv_object that we setup before to iterate through the 
#    different training and testing sets. Each time we will reuse the logisitic regression 
#    object, but it gets trained on different data each time we use it.

iter_num=0
# the indices are the rows used for training and testing in each iteration
with np.errstate(all='ignore'):
    for train_indices, test_indices in cv_object.split(X,y): 
        # I will create new variables here so that it is more obvious what 
        # the code is doing (you can compact this syntax and avoid duplicating memory,
        # but it makes this code less readable)
        X_train = (X[train_indices])
        y_train = y[train_indices]

    #     print(X_train)
    #     print(y_train)

        X_test = (X[test_indices])
        y_test = y[test_indices]

        st = time.time()

        mem = memory_usage((lr_clf.fit,(X_train,y_train))) # train object
        t = (time.time() -st)
        lr_clf_times.append(t)
        lr_clf_mem.append(mem[0])

        # train the reusable logisitc regression model on the training data
        y_hat = lr_clf.predict(X_test) # get test set precitions

        # now let's get the accuracy and confusion matrix for this iterations of training/testing
        acc = mt.accuracy_score(y_test,y_hat+1)
        lr_clf_accuracies.append(acc)
        cost_accuracies.append([acc])

        conf = mt.confusion_matrix(y_test,y_hat+1)
        print("====Iteration",iter_num," ====")
        print("accuracy", acc )
        print("confusion matrix\n",conf)
        iter_num+=1
for x in range(0,15):
    lr_clf_iterations.append(2) 
# Also note that every time you run the above code
#   it randomly creates a new training and testing set, 
#   so accuracy will be different each time

In [None]:
# run logistic regression and vary some parameters
from sklearn import metrics as mt
from sklearn.linear_model import LogisticRegression as SKLogisticRegression
# first we create a reusable logisitic regression object
#   here we can setup the object with different learning parameters and constants


lr_sk = SKLogisticRegression(solver='lbfgs',class_weight='balanced',max_iter=500,C=0.002) 

lr_sk_accuracies = []
lr_sk_times = []
lr_sk_mem = []
lr_sk_iterations = []
# now we can use the cv_object that we setup before to iterate through the 
#    different training and testing sets. Each time we will reuse the logisitic regression 
#    object, but it gets trained on different data each time we use it.

iter_num=0
# the indices are the rows used for training and testing in each iteration
for train_indices, test_indices in cv_object.split(X,y): 
    # I will create new variables here so that it is more obvious what 
    # the code is doing (you can compact this syntax and avoid duplicating memory,
    # but it makes this code less readable)
    X_train = X[train_indices]
    y_train = y[train_indices]
    
#     print(X_train)
#     print(y_train)
    
    X_test = X[test_indices]
    y_test = y[test_indices]
    
    # train the reusable logisitc regression model on the training data
    st = time.time()
    mem = memory_usage((lr_sk.fit,(X_train,y_train)))
    #lr_sk.fit(X_train,y_train)
    t = (time.time() -st)
    lr_sk_times.append(t)
    lr_sk_mem.append(mem[0])
    #print(np.hstack((lr_sk.intercept_[:,np.newaxis],lr_sk.coef_)))
    yhat = lr_sk.predict(X_test)
    print("Iterations ",lr_sk.n_iter_)
    lr_sk_iterations.append(lr_sk.n_iter_)
    # now let's get the accuracy and confusion matrix for this iterations of training/testing
    acc = mt.accuracy_score(y_test,y_hat)
    lr_sk_accuracies.append(acc)
    conf = mt.confusion_matrix(y_test,y_hat)
    print("====Iteration",iter_num," ====")
    print("accuracy", acc )
    print("confusion matrix\n",conf)
    iter_num+=1

print(lr_sk_times)
# Also note that every time you run the above code
#   it randomly creates a new training and testing set, 
#   so accuracy will be different each time

In [None]:
plt.boxplot([lr_sk_accuracies,lr_clf_accuracies])
plt.title("Comparing Accuracies")
plt.xlabel('Implementation of Logistic Regression')
plt.ylabel('Accuracy Percentage ')
plt.xticks([1,2],['SKL','OURS'])
plt.figure()
print((time.time() -st)*100)
# ax = fig.add_subplot(111)


In [None]:
plt.boxplot([lr_sk_times,lr_clf_times])
plt.title("Comparing Training Times")
plt.xlabel('Implementation of Logistic Regression')
plt.ylabel('Training Time (seconds) ')
plt.xticks([1,2],['SKL','OURS'])
plt.figure()

In [None]:
plt.boxplot([lr_sk_mem,lr_clf_mem])
plt.title("Comparing Memory ")
plt.xlabel('Implementation of Logistic Regression')
plt.ylabel('Memory Usage (mb) ')
plt.xticks([1,2],['SKL','OURS'])
plt.figure()

In [None]:
plt.boxplot([lr_sk_iterations,lr_clf_iterations])
plt.title("Comparing Number of Iterations ")
plt.xlabel('Implementation of Logistic Regression')
plt.ylabel('Number of Iterations')
plt.xticks([1,2],['SKL','OURS'])
plt.figure()

## Analyzing which implementation of Logistic Regression would be best for our case

## Exceptional Work

One idea: Make your implementation of logistic regression compatible with the GridSearchCV function that is part of scikit-learn.


In [None]:
param_grid_input = {'C': costs[:3] }
mglr = MultiClassLogisticRegression(eta=eta,iterations=iter_, C=0.02, optimization="BFGSBinaryLogisticRegression")
gscv = GridSearchCV(cv= cv_object, estimator=mglr, param_grid= param_grid_input, scoring= "accuracy",refit=False)
gscv.fit(X,y)
print(gscv.best_params_)

# 