# Logistic Regression
Logistic regression is a classification model which assumes that the log-odds of belonging to a class are a weighted sum (a linear function) of the input features.

Mathematically, it is the same as linear regression except that we apply a sigmoid activation to the output, mapping it to a value between 0 and 1 that we interpret as a probability.

### $h = \sigma(XW)$
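
As a minimal sketch of this formula, the NumPy snippet below computes the hypothesis for a tiny batch. The data and weight values are made up for illustration and are not taken from the dataset used later.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 3 examples with 2 features each
X = np.array([[0.5, 1.0],
              [-1.0, 2.0],
              [0.0, 0.0]])
# Hypothetical weights: one per feature, stored as a column vector
W = np.array([[1.0],
              [-0.5]])

h = sigmoid(X @ W)  # shape (3, 1): one probability per example
print(h.ravel())
```

A weighted sum of exactly 0 maps to a probability of 0.5, which is the decision boundary of the model.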

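Mean Squared Error Loss could be used for this problem too, but combined with a sigmoid output it gives a non-convex cost and very small gradients when the model is confidently wrong. Cross Entropy Loss avoids both problems and usually leads to faster convergence.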

### $ J = -\sum_{i=1}^{m} \left[ y^{(i)} \cdot \text{log}(h^{(i)}) + (1-y^{(i)}) \cdot \text{log}(1-h^{(i)}) \right]$
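
To make the cost concrete, here is a small NumPy sketch of the binary cross-entropy, summed over examples with the minus sign applied to both terms. The clipping constant `eps` is an added assumption to avoid taking `log(0)`; the labels and predictions are invented for illustration.

```python
import numpy as np

def binary_cross_entropy(y, h, eps=1e-12):
    """Summed cross-entropy between labels y (0 or 1) and predicted probabilities h."""
    h = np.clip(h, eps, 1.0 - eps)  # keep log() finite; eps is an assumption
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

y = np.array([1.0, 0.0])       # true labels
h_good = np.array([0.9, 0.1])  # confident and correct predictions
h_bad = np.array([0.1, 0.9])   # confident and wrong predictions

cost_good = binary_cross_entropy(y, h_good)
cost_bad = binary_cross_entropy(y, h_bad)
print(cost_good, cost_bad)
```

The cost is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong.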

### Multi-class case
Consider the case where we are not performing binary classification but have multiple classes. How do we extend our regression model to work with this? We simply have multiple outputs, where each output is a sigmoid applied to a linear combination of the inputs and represents the probability of belonging to one class. This corresponds to giving our weights one column per class, so the weights vector becomes a matrix. To calculate the cost we use the same cost function but sum the cost across all outputs.
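
A rough sketch of the shapes involved, using random numbers rather than real data. The names `m`, `n` and `K` (number of examples, features and classes) are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 5, 4, 3  # examples, features, classes (illustrative sizes)

X = rng.normal(size=(m, n))  # one row per example
W = rng.normal(size=(n, K))  # one column of weights per class

# One sigmoid output per class for every example
H = 1.0 / (1.0 + np.exp(-(X @ W)))
print(H.shape)  # (5, 3)
```

Note that with independent sigmoids the entries in a row of `H` do not have to sum to 1; each one is a separate "belongs to this class or not" probability.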

To represent the labels, we can't use a single scalar value representing the class number anymore: our output is a vector, so we wouldn't be able to compare the two. Instead, we one-hot encode the label. This gives a vector of length K, where K is the number of classes, with a value of 0 in every entry except the one at the index of the label class, which has a value of 1. Intuitively, this makes sense as we want our model to predict a probability of 1 for belonging to the label class and 0 for all other classes.
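
One common way to build one-hot vectors in NumPy is to index the rows of an identity matrix. The helper name `one_hot` is made up for this sketch.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    return np.eye(num_classes)[labels]

labels = np.array([0, 2, 1])  # class numbers for three examples
Y = one_hot(labels, 3)
print(Y)
```

Each row of `Y` can now be compared entry by entry with a row of model outputs.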

### Optimization
As usual, we use gradient descent to optimize this. We calculate our outputs, use them to calculate the gradient of the cost w.r.t. each parameter, and then update the parameters in the direction that reduces the cost.
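
The loop below is a minimal sketch of plain gradient descent for the binary case, written in NumPy on invented toy data. It relies on the fact that, for a sigmoid output with the summed cross-entropy cost, the gradient w.r.t. the weights is $X^T(h - y)$. The learning rate and number of steps are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, W):
    # Summed cross-entropy; clipping keeps log() finite
    h = np.clip(sigmoid(X @ W), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: the first column is a constant 1 (bias), the second is the feature.
# The label is 1 when the feature is positive.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

W = np.zeros(2)  # start from all-zero weights
lr = 0.1         # learning rate (arbitrary for this sketch)

costs = []
for step in range(50):
    h = sigmoid(X @ W)       # forward pass
    grad = X.T @ (h - y)     # gradient of the cost w.r.t. W
    W -= lr * grad           # gradient descent update
    costs.append(cost(X, y, W))

print(costs[0], costs[-1], W)
```

The cost after the last step should be lower than after the first, and the weight on the feature should end up positive.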

## Implementation
We are going to implement a model that takes in 4 features, which are measurements of a flower, and predicts which of three species the flower belongs to.

We begin by importing the required libraries.

In [11]:
import numpy as np
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt
import pandas as pd

Import the required dataset into a pandas data frame. Map our class labels to numerical values and shuffle the dataset.

In [12]:
df = pd.read_csv('Iris.csv')
df['Species'] = df['Species'].map({'Iris-setosa':0, 'Iris-virginica':1, 'Iris-versicolor':2}) #map text labels to numerical values
df = df.sample(frac=1) #shuffle our dataset

print(df.head())

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
34    35            4.9           3.1            1.5           0.1        0
100  101            6.3           3.3            6.0           2.5        1
0      1            5.1           3.5            1.4           0.2        0
47    48            4.6           3.2            1.4           0.2        0
145  146            6.7           3.0            5.2           2.3        1


Select the appropriate features and labels and convert them into torch tensors so we can use them in our model

In [13]:
X = torch.Tensor(np.array(df[df.columns[1:-1]])) #pick our features from our dataset
Y = torch.LongTensor(np.array(df[['Species']]).squeeze()) #select our label - squeeze() removes redundant dimensions

Split the data into training and test sets and convert to variables so we can use them with the torch autograd library

In [14]:
m = 100 #size of training set

#training set
x_train = Variable(X[0:m])
y_train = Variable(Y[0:m])

#test set
x_test = Variable(X[m:])
y_test = Variable(Y[m:])

Define the Logistic Model using PyTorch's class interface.

In [15]:
class logisticmodel(torch.nn.Module):
    def __init__(self):
        super().__init__() #call parent class initializer
        self.linear = torch.nn.Linear(4, 3) #define linear combination function with 4 inputs and 3 outputs

    def forward(self, x):
        x = self.linear(x) #linearly combine our inputs to give 3 raw class scores (logits)
        #no softmax here: torch.nn.CrossEntropyLoss applies log-softmax to these scores internally
        return x
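
For reference, the softmax that turns the three linear outputs into class probabilities can be sketched in NumPy as follows. This is an illustrative re-implementation with made-up scores, not the PyTorch code itself; subtracting the row maximum is a standard trick to avoid overflow in `exp`.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax: each row becomes a probability distribution."""
    z = z - z.max(axis=1, keepdims=True)  # numerical stability, result unchanged
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical raw scores for 2 examples and 3 classes
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```

Softmax preserves the ordering of the scores, so the class with the largest raw score is also the class with the largest probability.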

Define the training hyperparameters, cost function and optimizer. Instantiate a model from the class we defined earlier.

In [16]:
no_epochs = 100
lr = 0.1

mymodel = logisticmodel() #create our model from defined class
criterion = torch.nn.CrossEntropyLoss() #cross entropy cost function as it is a classification problem
optimizer = torch.optim.Adam(mymodel.parameters(), lr = lr) #define our optimizer

Define the axes which we will use to plot the costs. Define the function used to train the model and train it for the number of epochs specified earlier.

In [None]:
costs=[] #store the cost each epoch
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel('Epoch')
ax.set_ylabel('Cost')
ax.set_xlim(0, no_epochs)
plt.show()

#define train function
def train(no_epochs):
    for epoch in range(no_epochs):
        h = mymodel(x_train) #forward propagate - calculate our hypothesis

        #calculate, plot and print cost
        cost = criterion(h, y_train)
        costs.append(cost.item()) #.item() extracts the Python float from the 0-dim cost tensor
        ax.plot(costs, 'b')
        fig.canvas.draw()
        print('Epoch ', epoch, ' Cost: ', cost.item())

        #calculate gradients + update weights using gradient descent step with our optimizer
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

train(no_epochs) #train the model

Test the accuracy of the trained model

In [None]:
test_h = mymodel(x_test) #forward pass on the test set
_, test_h = test_h.data.max(1) #index of the highest output in each row, i.e. the predicted class
test_y = y_test.data
correct = torch.eq(test_h, test_y) #perform element-wise equality operation
accuracy = correct.float().mean().item() #fraction of correct predictions (cast to float to avoid integer division)
print('Test accuracy: ', accuracy)
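
The same accuracy calculation can be sketched in NumPy on a handful of invented predictions, which makes the steps easy to check by hand.

```python
import numpy as np

# Hypothetical model outputs for 4 examples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 2, 2, 0])  # true class numbers

predicted = probs.argmax(axis=1)         # class with the highest output per row
accuracy = (predicted == labels).mean()  # fraction of correct predictions
print(predicted, accuracy)
```

Here three of the four predictions match their labels, so the accuracy is 0.75.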


Save the optimized model parameters

In [None]:
torch.save(mymodel.state_dict(), 'trained_logistic_model')