# Logistic Regression

In this video, we will be performing a logistic regression with `pytorch`. We will follow the same steps as the linear regression tutorial:

* Design the model (input size, output size, forward pass)
* Construct the loss and optimizer
* Training loop
    * Forward pass: compute prediction and loss
    * Backward pass: gradients
    * Update weights

The code here will be very similar to the code we wrote in the linear regression tutorial. However, we will apply a sigmoid function on top of the linear layer and use a different loss function.

First, let's import some packages.

In [43]:
import torch
import torch.nn as nn
import numpy as np #data transformations
from sklearn import datasets #to make a binary classification data set
from sklearn.preprocessing import StandardScaler #to scale our features
from sklearn.model_selection import train_test_split #to split data into the training and testing sets

Next, we have to prepare our data.

In [44]:
#let's load the breast cancer data from sklearn - this is a binary classification problem
#where we predict the presence of breast cancer from the input features
bc = datasets.load_breast_cancer() 

#defining our predictors and outcome
X, y = bc.data, bc.target

#getting the number of samples and number of features
n_samples, n_features = X.shape
print(X.shape) #we have 569 samples and 30 features

#splitting the data into test and training sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

#now, we have to scale our features - we will use a standard scaler, which gives each feature zero mean and unit variance
#the scaler is fit on the training set only and then applied to the test set
#scaling is generally recommended for logistic regression
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

(569, 30)
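
To make the scaling step concrete, here is a minimal sketch of what `fit_transform` and `transform` do, written with plain NumPy. The toy arrays and the names `X_train_toy` and `X_test_toy` are made up for illustration and are not part of the tutorial data.

```python
import numpy as np

# Hypothetical toy data: 4 training rows and 2 test rows, 2 features each
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test_toy = np.array([[2.5, 25.0], [5.0, 50.0]])

# "fit": learn the per-feature mean and standard deviation from the training rows only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), the convention StandardScaler uses

# "transform": apply the same training statistics to both sets
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled.mean(axis=0))  # close to 0 for each feature
print(X_train_scaled.std(axis=0))   # close to 1 for each feature
print(X_test_scaled.mean(axis=0))   # generally not 0: the test rows reuse the training statistics
```

Reusing the training mean and standard deviation on the test set keeps information about the test data out of the preprocessing step.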


Now, we want to convert our data into `pytorch` tensors. 

In [45]:
#right now, it is a double type, so we need to convert it to a float32
X_train = torch.from_numpy(X_train.astype(np.float32))
X_test = torch.from_numpy(X_test.astype(np.float32))
y_train = torch.from_numpy(y_train.astype(np.float32))
y_test = torch.from_numpy(y_test.astype(np.float32))

#we have to reshape our y tensors with pytorch's .view() method
#right now y is 1-D with shape (n_samples,), but we want a column of shape (n_samples, 1)
#so that it matches the shape of the model output
y_train = y_train.view(y_train.shape[0], 1)
y_test = y_test.view(y_test.shape[0], 1)

#defining the number of samples and features (X is the full data set, before the split)
n_samples, n_features = X.shape

print(n_samples, n_features)

569 30
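
The reshape matters because `nn.BCELoss` expects the target to have the same shape as the model output. The sketch below uses made-up labels and probabilities (the names `y`, `y_col`, and `probs` are illustrative only) to show the shapes before and after `.view()`.

```python
import torch

# Hypothetical labels as a 1-D tensor of shape (4,)
y = torch.tensor([0.0, 1.0, 1.0, 0.0])

# Reshape into a column of shape (4, 1); y.view(-1, 1) would give the same result
y_col = y.view(y.shape[0], 1)
print(y.shape, y_col.shape)

# A model ending in nn.Linear(n_features, 1) returns predictions of shape (batch, 1)
probs = torch.tensor([[0.1], [0.8], [0.7], [0.4]])

# The shapes of probs and y_col match, so the loss is computed element by element
loss = torch.nn.BCELoss()(probs, y_col)
print(loss.item())
```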


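Now, we can set up the model. Our model is a linear combination of the input features: each feature is multiplied by a weight, and a bias is added. In the logistic regression case, we apply a sigmoid function at the end, which squashes the output into a value between 0 and 1. For this, we want to write our own class.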

In [46]:
class LogisticRegression(nn.Module):
    
    #first the linear layer
    def __init__(self, n_input_features):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(n_input_features, 1) #out is 1 because we only want 1 output at the end
    
    #now we want to apply the sigmoid function
    def forward(self, x):
        y_predicted = torch.sigmoid(self.linear(x))
        return y_predicted

model = LogisticRegression(n_features)        
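
As a sanity check on what the forward pass computes, the following sketch writes out the linear combination and the sigmoid by hand and compares them with `torch.sigmoid(linear(x))`. The layer size (3 features) and the random input are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small layer: 3 input features, 1 output
linear = nn.Linear(3, 1)
x = torch.randn(5, 3)  # 5 made-up samples

# Linear combination by hand: z = x W^T + b
z = x @ linear.weight.T + linear.bias

# Sigmoid by hand: 1 / (1 + exp(-z))
manual = 1.0 / (1.0 + torch.exp(-z))

# Same computation with the built-in functions used in the class above
builtin = torch.sigmoid(linear(x))

print(manual.shape)
print(torch.allclose(manual, builtin, atol=1e-6))
```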

Now, we will define the loss and construct the optimizer.

In [47]:
#defining the learning rate
learning_rate = 0.01

#defining the loss function
criterion = nn.BCELoss() #binary cross entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
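
For reference, binary cross entropy averages `-(y * log(p) + (1 - y) * log(1 - p))` over the samples. The sketch below computes that by hand on three made-up predictions and compares it with `nn.BCELoss`; the values in `p` and `t` are illustrative only.

```python
import torch
import torch.nn as nn

# Hypothetical predicted probabilities and true labels
p = torch.tensor([[0.9], [0.2], [0.6]])
t = torch.tensor([[1.0], [0.0], [1.0]])

# Binary cross entropy by hand, averaged over the samples
manual = -(t * torch.log(p) + (1 - t) * torch.log(1 - p)).mean()

# Built-in loss with its default 'mean' reduction
builtin = nn.BCELoss()(p, t)

print(manual.item(), builtin.item())
```

Confident, correct predictions (such as 0.9 for a label of 1) contribute little to the loss, while confident, wrong predictions are penalized heavily.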

Now, for the training loop:

In [48]:
#number of iterations
num_epochs = 100

#training loop
for epoch in range(num_epochs):

    #forward pass and loss
    y_predicted = model(X_train)
    loss = criterion(y_predicted, y_train)
    
    #backward pass - calculate the gradients
    loss.backward()
    
    #update the weights
    optimizer.step()
    
    #zero the gradients before the next iteration - .backward() adds to the .grad attribute instead of overwriting it
    optimizer.zero_grad()
    
    #print information
    if (epoch + 1) % 10 == 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

epoch: 10, loss = 0.4750
epoch: 20, loss = 0.4174
epoch: 30, loss = 0.3752
epoch: 40, loss = 0.3428
epoch: 50, loss = 0.3169
epoch: 60, loss = 0.2958
epoch: 70, loss = 0.2781
epoch: 80, loss = 0.2631
epoch: 90, loss = 0.2501
epoch: 100, loss = 0.2388
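
The reason for zeroing the gradients can be seen in isolation. In this small sketch (the tensor `w` is a made-up parameter, not part of the model above), calling `.backward()` twice without resetting doubles the stored gradient.

```python
import torch

# Hypothetical parameter vector
w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(w^2) is 2 * w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# After zeroing, a backward pass gives the plain gradient again
w.grad.zero_()
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```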


Now, we want to evaluate our model on the testing data. The evaluation should not be part of the computational graph where we track the history, so we wrap it in `torch.no_grad()`.

In [49]:
with torch.no_grad():
    y_predicted = model(X_test)
    #now to convert to class labels (0, 1) - sigmoid returns probabilities between 0 and 1
    #without the surrounding `with torch.no_grad()`, these operations would be tracked in the computational graph
    y_predicted_cls = y_predicted.round()
    
    #accuracy calculation: count the correct predictions and divide by the number of test samples
    acc = y_predicted_cls.eq(y_test).sum() / float(y_test.shape[0])
    
    print(f'accuracy = {acc:.4f}')

accuracy = 0.8947
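
An equivalent way to get class labels is an explicit threshold, which also makes the cutoff easy to change. The sketch below uses made-up probabilities and labels; the 0.5 threshold is an assumption that mirrors the rounding above. One small difference: `torch.round` rounds halfway values to the nearest even number, so a probability of exactly 0.5 rounds to 0, while `>= 0.5` assigns it to class 1.

```python
import torch

# Hypothetical predicted probabilities and true labels
probs = torch.tensor([[0.92], [0.15], [0.50], [0.70]])
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

# Explicit threshold at 0.5 instead of .round()
predicted_cls = (probs >= 0.5).float()

# Accuracy as the mean of the correct/incorrect indicator
acc = (predicted_cls == labels).float().mean().item()

print(f'accuracy = {acc:.4f}')
```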


The accuracy is 89.5%, so our logistic regression is doing fairly well. To improve the results, you can play around with the number of iterations or the learning rate, or try out a different optimizer.
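
As one example of swapping the optimizer, the sketch below trains the same kind of model with `torch.optim.Adam` instead of SGD. It runs on synthetic data so that it is self-contained; the data, the `nn.Sequential` model, and the hyperparameters are assumptions for illustration, and no claim is made about how Adam would change the accuracy on the breast cancer data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical synthetic data: 200 samples, 4 features,
# with a label that depends on the sign of the first feature
X_toy = torch.randn(200, 4)
y_toy = (X_toy[:, :1] > 0).float()  # shape (200, 1)

# Linear layer followed by a sigmoid, equivalent in structure to the class above
toy_model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
criterion = nn.BCELoss()

# The only change to the training setup: Adam instead of SGD
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.01)

initial_loss = criterion(toy_model(X_toy), y_toy).item()

for epoch in range(100):
    loss = criterion(toy_model(X_toy), y_toy)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

final_loss = criterion(toy_model(X_toy), y_toy).item()
print(f'initial loss = {initial_loss:.4f}, final loss = {final_loss:.4f}')
```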