## LOGISTIC REGRESSION

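* Typically a binary classification problem: each observation belongs to Class 1 or Class 0.
* Pass the linear score of each observation through a sigmoid function, which maps it to a probability between 0 and 1 and fits two classes better than a straight line does.
* Choose a threshold on that probability and classify each observation into one of the two groups.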
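
The squashing and thresholding steps above can be sketched in a few lines. This is a minimal illustration, not part of the training code in this notebook: the helper names `sigmoid` and `classify` and the default threshold of 0.5 are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Three example scores: strongly negative, neutral, strongly positive
scores = [-3.0, 0.0, 3.0]
probabilities = [sigmoid(z) for z in scores]
labels = [classify(z) for z in scores]
```

Raising the threshold makes the classifier more reluctant to predict Class 1; lowering it does the opposite.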


### Mathematically
* We have a likelihood function in terms of x, y and beta.
* Estimate beta by maximum likelihood estimation (MLE). For the given vector of x points, the model assigns each point a probability of y being in class 1 or class 0. The likelihood of beta is the probability of observing the actual class labels under those assignments. We select the beta that maximizes it.
* The maximization is done in two steps: take the log of the likelihood, then apply gradient descent (GD). Because GD minimizes, we apply it to the log loss, which is the negative of the log-likelihood. We start from a randomly chosen beta and keep iterating toward the minimum of the loss; the beta at that minimum is the MLE of beta.
* The learning rate controls the size of each update step applied to the betas.
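
The steps above can be written out as follows. The notation is an assumption made for this sketch: $m$ data points, $x_i$ the feature vector of point $i$ with components $x_{ij}$, $y_i \in \{0, 1\}$ its label, $\beta_0$ the intercept, $\beta$ the remaining coefficients, and $\alpha$ the learning rate.

$$
p_i = \sigma(\beta_0 + \beta^\top x_i), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
L(\beta_0, \beta) = \prod_{i=1}^{m} p_i^{\,y_i} (1 - p_i)^{1 - y_i}
$$

$$
J(\beta_0, \beta) = -\frac{1}{m} \log L(\beta_0, \beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
$$

$$
\frac{\partial J}{\partial \beta_0} = \frac{1}{m} \sum_{i=1}^{m} (p_i - y_i), \qquad
\frac{\partial J}{\partial \beta_j} = \frac{1}{m} \sum_{i=1}^{m} (p_i - y_i)\, x_{ij}
$$

$$
\beta_0 \leftarrow \beta_0 - \alpha \frac{\partial J}{\partial \beta_0}, \qquad
\beta_j \leftarrow \beta_j - \alpha \frac{\partial J}{\partial \beta_j}
$$

Minimizing $J$ is equivalent to maximizing $L$, since the logarithm is monotonic and the factor $-1/m$ only flips the sign and rescales.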

In [1]:
## main function
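def logistic_regression(x, y, iterations=100, learning_rate=0.01):
    m, n = len(x), len(x[0])  # m data points, n features
    beta_0, beta_other = initialize_params(n)
    for i in range(iterations):
        # compute_gradients takes the feature count n before the point count m.
        # For mini-batch GD, call compute_gradients_minibatch(..., n, m, batch_size) instead.
        gradient_beta_0, gradient_beta_other = compute_gradients(x, y, beta_0, beta_other, n, m)
        beta_0, beta_other = update_params(beta_0, beta_other, gradient_beta_0, gradient_beta_other, learning_rate)
    # return only after all iterations have run
    return beta_0, beta_other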
    
    

In [2]:
# supporting functions
# initializing the beta parameters, random start for the gradient descent
import random

def initialize_params(dimensions):
    beta_0 = 0
    beta_other = [random.random() for i in range(dimensions)]
    return beta_0, beta_other



In [3]:
# Compute functions

import math

def compute_gradients(x, y, beta_0, beta_other, n, m):
    # n = number of features, m = number of data points
    gradient_beta_0 = 0
    gradient_beta_other = [0] * n

    for i, point in enumerate(x):  # accumulate the gradient over every data point in x
        # prediction for this point: sigmoid of the linear score
        z = beta_0 + sum(b * f for b, f in zip(beta_other, point))
        pred = 1 / (1 + math.exp(-z))

        for j, feature in enumerate(point):  # contribution of this point to each beta_j
            # gradient for beta_j: accumulate over all data points and normalize by m
            gradient_beta_other[j] += (pred - y[i]) * feature / m
        # intercept gradient: added once per data point, not once per feature
        gradient_beta_0 += (pred - y[i]) / m
    return gradient_beta_0, gradient_beta_other
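
One way to gain confidence in a hand-written gradient is to compare it with a finite-difference estimate of the log loss. The sketch below is self-contained and does not call the notebook's functions; the names `predict`, `log_loss` and `analytic_gradients`, the toy data, and the step size `eps` are all assumptions made for the illustration.

```python
import math

def predict(point, beta_0, beta_other):
    """Sigmoid of the linear score for one data point."""
    z = beta_0 + sum(b * f for b, f in zip(beta_other, point))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(x, y, beta_0, beta_other):
    """Average negative log-likelihood over all data points."""
    total = 0.0
    for point, label in zip(x, y):
        p = predict(point, beta_0, beta_other)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(x)

def analytic_gradients(x, y, beta_0, beta_other):
    """Gradient of the log loss: mean of (prediction - label) times the feature."""
    m = len(x)
    g0 = 0.0
    g = [0.0] * len(beta_other)
    for point, label in zip(x, y):
        error = predict(point, beta_0, beta_other) - label
        g0 += error / m
        for j, feature in enumerate(point):
            g[j] += error * feature / m
    return g0, g

# Hypothetical toy data: four points with two features each
x = [[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.5]]
y = [0, 1, 0, 1]
beta_0, beta_other = 0.1, [0.2, -0.3]

g0, g = analytic_gradients(x, y, beta_0, beta_other)

# Central finite differences of the loss with respect to each parameter
eps = 1e-6
numeric_g0 = (log_loss(x, y, beta_0 + eps, beta_other)
              - log_loss(x, y, beta_0 - eps, beta_other)) / (2 * eps)
numeric_g = []
for j in range(len(beta_other)):
    up = list(beta_other)
    down = list(beta_other)
    up[j] += eps
    down[j] -= eps
    numeric_g.append((log_loss(x, y, beta_0, up)
                      - log_loss(x, y, beta_0, down)) / (2 * eps))
```

If the analytic and numeric values disagree by more than a small tolerance, the usual suspects are a wrong normalization constant or an intercept term accumulated in the wrong loop.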



In [5]:
#Mini - Batch gradient descent
import math
import random

def compute_gradients_minibatch(x, y, beta_0, beta_other, n, m, batch_size):
    # n = number of features, m = number of data points
    gradient_beta_0 = 0
    gradient_beta_other = [0] * n

    for _ in range(batch_size):  # estimate the gradient from batch_size randomly sampled points instead of all m
        i = random.randint(0, m - 1)  # sample an index with replacement
        point = x[i]
        # prediction for this point: sigmoid of the linear score
        z = beta_0 + sum(b * f for b, f in zip(beta_other, point))
        pred = 1 / (1 + math.exp(-z))

        for j, feature in enumerate(point):  # contribution of this point to each beta_j
            # normalize by the batch size so the estimate has the same scale as the full gradient
            gradient_beta_other[j] += (pred - y[i]) * feature / batch_size
        # intercept gradient: added once per sampled point, not once per feature
        gradient_beta_0 += (pred - y[i]) / batch_size
    return gradient_beta_0, gradient_beta_other




In [4]:
# Update the parameters

def update_params(beta_0, beta_other, gradient_beta_0, gradient_beta_other, learning_rate):
    beta_0 -= gradient_beta_0 * learning_rate

    for i in range(len(beta_other)):
        beta_other[i] -= gradient_beta_other[i] * learning_rate
    # return only after every coefficient has been updated
    return beta_0, beta_other
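
As a sanity check of the whole loop, the sketch below trains on a tiny, linearly separable data set and then classifies the same points. It is a compact, self-contained stand-in rather than the notebook's exact functions: the names `sigmoid` and `train`, the toy data, the fixed seed, and the values of `iterations` and `learning_rate` are assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(x, y, iterations=500, learning_rate=0.1, seed=0):
    """Full-batch gradient descent on the log loss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    m, n = len(x), len(x[0])
    beta_0 = 0.0
    beta_other = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        g0 = 0.0
        g = [0.0] * n
        for point, label in zip(x, y):
            score = beta_0 + sum(b * f for b, f in zip(beta_other, point))
            error = sigmoid(score) - label
            g0 += error / m
            for j, feature in enumerate(point):
                g[j] += error * feature / m
        # step against the gradient, scaled by the learning rate
        beta_0 -= learning_rate * g0
        beta_other = [b - learning_rate * gj for b, gj in zip(beta_other, g)]
    return beta_0, beta_other

# Hypothetical one-feature data: negative values are class 0, positive values class 1
x = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]

beta_0, beta_other = train(x, y)
predictions = [1 if sigmoid(beta_0 + beta_other[0] * p[0]) >= 0.5 else 0 for p in x]
```

On separable data like this the coefficient keeps growing with more iterations, because the log loss has no finite minimizer; real data sets with overlapping classes do not behave this way.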