## Implementing Gradient Descent
#### for Logistic Regression

In [None]:
import pandas as pd
import numpy as np

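We are going to write our own Class. We have already been working with instances of certain classes, such as NumPy arrays and pandas DataFrames, without paying much attention to them. Classes now come to the foreground.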

An **instance** of a Class is created by calling the Class by name, as if it were a function. That call runs the Class's initializer, which sets up the new instance.

A Class can have:
* attributes - variables which get assigned a value for each instance of the Class; 
* methods - functions that can "see" and use attributes and other methods of the Class instance.
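To make these terms concrete, here is a minimal sketch of a toy Class. The name `Counter` and everything inside it are made up for illustration and are not used elsewhere in this notebook.

```python
class Counter:
    """A toy class (hypothetical, for illustration only)."""

    def __init__(self, start=0):
        # __init__ runs when Counter(...) is called; it sets the attributes
        self.count = start

    def increment(self, step=1):
        # a method reads and modifies the instance's attributes through self
        self.count += step
        return self.count


c = Counter(start=5)   # create an instance
c.increment()          # call a method with its default argument
c.increment(step=3)    # call it again with an explicit argument
print(c.count)         # read an attribute
```

Each instance keeps its own `count`, so a second `Counter()` would start again from its own `start` value.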

In [None]:
# A NumPy array is an instance of a Class (np.ndarray).
my_array = np.array([[1],[2],[312]]) # np.array(...) creates and returns an array instance
my_array.shape # shape is an attribute

In [None]:
# printing output of 3 methods for the Class instance
print( my_array.astype('float') )  
print( my_array.min(axis=0) )
print( my_array.flatten() )

In [None]:
auto = pd.read_csv('../../DataSets/Auto.csv')
auto.head() # A method called on auto (instance of pandas.DataFrame Class)

We will use the other columns (everything except `mpg`) to predict whether a car has "high" or "low" mpg, where "high" means above the median mpg.

In [None]:
# Binary label column: 1 if mpg is larger than the median mpg, else 0
auto['mpg01'] = (auto['mpg'] > auto['mpg'].median()).astype('int')
auto.head()

In [None]:
data_init = auto[['cylinders','displacement','horsepower','weight','acceleration','year']]
labels = auto['mpg01']

In [None]:
data = pd.DataFrame({}) # empty dataframe

In [None]:
# Min-max scale each column to the range [0, 1]: (value - min) / (max - min),
# so that max minus min is the same (namely 1) in every column
for c in data_init.columns:
  data[c] = (data_init[c] - data_init[c].min()) / (data_init[c].max() - data_init[c].min())
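The loop above applies min-max scaling one column at a time. As a sketch, the same rescaling can be written without a loop, because pandas lines up the column-wise minima and maxima by column name. The DataFrame `toy` and its numbers are made up for illustration and are not the Auto data.

```python
import pandas as pd

# Made-up numbers, for illustration only (not the Auto data)
toy = pd.DataFrame({'weight': [2000.0, 3000.0, 4000.0],
                    'year': [70.0, 76.0, 82.0]})

# Column-wise min-max scaling: toy.min() and toy.max() are aligned by column name
toy_scaled = (toy - toy.min()) / (toy.max() - toy.min())
print(toy_scaled)
```

After scaling, every column has minimum 0 and maximum 1, so no single feature dominates the gradient just because of its units.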

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
clf=LogisticRegression(tol=0.001)
clf.fit(data, labels)

In [None]:
clf.coef_

In [None]:
clf.score(data, labels)

In [None]:
# accuracy computed by hand; for a classifier, clf.score returns this same fraction
np.sum(clf.predict(data) == labels)/len(labels)

scikit-learn's `LogisticRegression` class does not fit its coefficients with plain gradient descent (its default solver is L-BFGS, a quasi-Newton method). We will write a Class that does.
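For reference, the quantity that the `grad` method below computes can be written out as follows. This assumes the loss being minimized is the mean log-loss (cross-entropy), which is what the code in `grad` corresponds to. The symbols $w$ (weights), $b$ (intercept), $\eta$ (learning rate, `lr` in the code) and $\tilde{x}_i$ (the features with a 1 appended) are notation introduced here.

$$
p_i = \sigma(w^\top x_i + b), \qquad
L(w, b) = -\frac{1}{n}\sum_{i=1}^{n}\bigl[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\bigr]
$$

$$
\nabla L = \frac{1}{n}\sum_{i=1}^{n}\bigl[-y_i(1 - p_i) + (1 - y_i)\,p_i\bigr]\,\tilde{x}_i
= \frac{1}{n}\sum_{i=1}^{n}(p_i - y_i)\,\tilde{x}_i, \qquad \tilde{x}_i = (x_i, 1)
$$

$$
\theta^{(t+1)} = \theta^{(t)} - \eta\,\nabla L\bigl(\theta^{(t)}\bigr), \qquad \theta = (w, b)
$$

The last line is the gradient descent step: move the parameters a small distance against the gradient, and repeat until the parameters stop changing by more than the tolerance.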

In [None]:
def sigma(z):
  return 1/(1+np.exp(-z))

In [None]:
class LogisticModel:
  """
  info about the class
  """
  def __init__(self, tolerance=0.001, iteration_cap=1e4):
    self.coefs = None
    ## also put in attributes for the threshold and max number of iterations: call them 'tol' and 'max_iters'
  def grad(self, x,y):
    n = x.shape[0]
    X = np.hstack((x,np.ones(n).reshape(-1,1)))
    # below computes the per-example gradient of the loss (one row per example)
    per_exm = ( -y*(X.T)*(1 - self.predict(X[:,:-1])) + (1-y)*(X.T)*self.predict(X[:,:-1]) ).T
    return np.sum(per_exm, axis=0)/n
  def predict(self, x):
    return sigma(x@self.coefs[:-1] + self.coefs[-1])
  def fit(self, x, y, lr=0.1, return_iter=True):
    n, d = x.shape
    t = 0
    self.coefs = np.zeros(d+1, dtype='float')
    # put code here to iteratively update self.coefs with gradient descent steps,
    # stopping after the change in parameters falls below the threshold
    # (or the max number of iterations is reached)
    if return_iter:
      print(f'Last iteration: {t}.')
    return None  # technically don't need this line

The cells below should run after you have filled in the code above.
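If you get stuck, here is one possible way to fill in the blanks. It is a sketch and not the only valid answer. It uses a separate, hypothetical name (`LogisticModelSketch`) so it does not overwrite your own class. It assumes the "change in parameters" is measured by the Euclidean norm of the step, and it checks itself on made-up synthetic data, not on the Auto data.

```python
import numpy as np


def sigma(z):
    """Logistic function."""
    return 1 / (1 + np.exp(-z))


class LogisticModelSketch:
    """Logistic regression fit by batch gradient descent (illustrative sketch)."""

    def __init__(self, tolerance=0.001, iteration_cap=1e4):
        self.coefs = None                    # weights, with the intercept stored last
        self.tol = tolerance                 # stop once a step is smaller than this
        self.max_iters = int(iteration_cap)  # hard cap on the number of steps

    def predict(self, x):
        # predicted probability that the label is 1
        return sigma(x @ self.coefs[:-1] + self.coefs[-1])

    def grad(self, x, y):
        # gradient of the mean log-loss: average of (p - y) * [x, 1]
        n = x.shape[0]
        X = np.hstack((x, np.ones((n, 1))))
        return X.T @ (self.predict(x) - y) / n

    def fit(self, x, y, lr=0.1, return_iter=True):
        n, d = x.shape
        t = 0
        self.coefs = np.zeros(d + 1, dtype='float')
        while t < self.max_iters:
            step = lr * self.grad(x, y)
            self.coefs = self.coefs - step
            t += 1
            # assumption: change in parameters = Euclidean norm of the step
            if np.linalg.norm(step) < self.tol:
                break
        if return_iter:
            print(f'Last iteration: {t}.')


# Small synthetic check (made-up data, not the Auto data set)
rng = np.random.default_rng(0)
x_demo = rng.uniform(size=(200, 2))
y_demo = (x_demo[:, 0] + x_demo[:, 1] > 1).astype('int')

demo_model = LogisticModelSketch()
demo_model.fit(x_demo, y_demo, lr=1.0)
demo_acc = np.mean((demo_model.predict(x_demo) > 0.5).astype('int') == y_demo)
print(demo_acc)
```

Here `grad` uses the simplified form `(p - y) * [x, 1]`, which is algebraically the same as the longer expression in the skeleton above. Other stopping rules, such as the largest absolute change in any single coefficient, would be equally reasonable.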

In [None]:
my_model = LogisticModel()
my_model.fit(data.to_numpy(), labels.to_numpy(), lr=1.1)

In [None]:
my_model.coefs

In [None]:
y_prob = my_model.predict(data.to_numpy())
y_pred = (y_prob > 0.5).astype('int')

In [None]:
np.sum(y_pred == labels)/len(labels)