# Logistic regression with Pytorch

Logistic Regression is a statistical method used to model the relationship between a binary or categorical target variable and one or more predictor variables. The goal is to estimate the probability of a particular class or event occurring, given the predictor variables.

logistic regression uses sigmoid function to model the relationship between the predictors and the outcome. The sigmoid function maps any real-valued number to a value between 0 and 1.

The sigmoid function is defined as:

y = 1 / (1 + e^(-x))
where:

- y is the predicted probability
- e is the base of the natural logarithm
- x is a linear combination of the predictors: x = b0 + b1*X1 + b2*X2 + ... + bn*Xn

### 1. Importing relevant libraries

In [1]:
# !pip install matplotlib==3.8.2
# !pip install numpy==1.26.2
# !pip install pandas==2.1.4
# !pip install scikit_learn==1.4.2
# !pip install seaborn==0.13.2
# !pip install torch==2.2.2
# !pip install torchvision==0.17.2

In [2]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from sklearn.utils import resample
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from warnings import filterwarnings
filterwarnings('ignore')

In [3]:
df = pd.read_csv("../Input/data.csv")

In [4]:
df.tail(3)

Unnamed: 0,year,customer_id,phone_no,gender,age,no_of_days_subscribed,multi_screen,mail_subscribed,weekly_mins_watched,minimum_daily_mins,maximum_daily_mins,weekly_max_night_mins,videos_watched,maximum_days_inactive,customer_support_calls,churn
1997,2015,998474,353-2080,,53,94,no,no,128.85,15.6,14.6,110,16,5.0,0,0.0
1998,2015,998934,359-7788,Male,40,94,no,no,178.05,10.4,20.18,100,6,,3,0.0
1999,2015,999961,414-1496,Male,37,73,no,no,326.7,10.3,37.03,89,6,3.0,1,1.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 16 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   year                    2000 non-null   int64  
 1   customer_id             2000 non-null   int64  
 2   phone_no                2000 non-null   object 
 3   gender                  1976 non-null   object 
 4   age                     2000 non-null   int64  
 5   no_of_days_subscribed   2000 non-null   int64  
 6   multi_screen            2000 non-null   object 
 7   mail_subscribed         2000 non-null   object 
 8   weekly_mins_watched     2000 non-null   float64
 9   minimum_daily_mins      2000 non-null   float64
 10  maximum_daily_mins      2000 non-null   float64
 11  weekly_max_night_mins   2000 non-null   int64  
 12  videos_watched          2000 non-null   int64  
 13  maximum_days_inactive   1972 non-null   float64
 14  customer_support_calls  2000 non-null   

#### 2. Data Cleaning

In [6]:
df = df.drop(["year", "customer_id", "phone_no"], axis=1)#Dropping Unnecessary columns

In [7]:
df.head(2)

Unnamed: 0,gender,age,no_of_days_subscribed,multi_screen,mail_subscribed,weekly_mins_watched,minimum_daily_mins,maximum_daily_mins,weekly_max_night_mins,videos_watched,maximum_days_inactive,customer_support_calls,churn
0,Female,36,62,no,no,148.35,12.2,16.81,82,1,4.0,1,0.0
1,Female,39,149,no,no,294.45,7.7,33.37,87,3,3.0,2,0.0


In [8]:
print(df.shape)         
print(df.columns)   
df.dtypes   

(2000, 13)
Index(['gender', 'age', 'no_of_days_subscribed', 'multi_screen',
       'mail_subscribed', 'weekly_mins_watched', 'minimum_daily_mins',
       'maximum_daily_mins', 'weekly_max_night_mins', 'videos_watched',
       'maximum_days_inactive', 'customer_support_calls', 'churn'],
      dtype='object')


gender                     object
age                         int64
no_of_days_subscribed       int64
multi_screen               object
mail_subscribed            object
weekly_mins_watched       float64
minimum_daily_mins        float64
maximum_daily_mins        float64
weekly_max_night_mins       int64
videos_watched              int64
maximum_days_inactive     float64
customer_support_calls      int64
churn                     float64
dtype: object

### 3.Data Preprocessing

In [9]:
df.isnull().sum()

gender                    24
age                        0
no_of_days_subscribed      0
multi_screen               0
mail_subscribed            0
weekly_mins_watched        0
minimum_daily_mins         0
maximum_daily_mins         0
weekly_max_night_mins      0
videos_watched             0
maximum_days_inactive     28
customer_support_calls     0
churn                     35
dtype: int64

In [10]:
final_data = df.dropna() 
final_data.head()

Unnamed: 0,gender,age,no_of_days_subscribed,multi_screen,mail_subscribed,weekly_mins_watched,minimum_daily_mins,maximum_daily_mins,weekly_max_night_mins,videos_watched,maximum_days_inactive,customer_support_calls,churn
0,Female,36,62,no,no,148.35,12.2,16.81,82,1,4.0,1,0.0
1,Female,39,149,no,no,294.45,7.7,33.37,87,3,3.0,2,0.0
2,Female,65,126,no,no,87.3,11.9,9.89,91,1,4.0,5,1.0
3,Female,24,131,no,yes,321.3,9.5,36.41,102,4,3.0,3,0.0
4,Female,40,191,no,no,243.0,10.9,27.54,83,7,3.0,1,0.0


In [11]:
final_data.shape

(1918, 13)

In [12]:
final_data.isnull().sum()

gender                    0
age                       0
no_of_days_subscribed     0
multi_screen              0
mail_subscribed           0
weekly_mins_watched       0
minimum_daily_mins        0
maximum_daily_mins        0
weekly_max_night_mins     0
videos_watched            0
maximum_days_inactive     0
customer_support_calls    0
churn                     0
dtype: int64

In [13]:
final_data["churn"].value_counts()       

churn
0.0    1665
1.0     253
Name: count, dtype: int64

- There is a class imbalance. To counter the imbalance, we'll perform sampling

In [14]:
df_majority = final_data[final_data['churn']==0] #class 0
df_minority = final_data[final_data['churn']==1] #class 1

In [15]:
df_minority["churn"]

2       1.0
18      1.0
22      1.0
24      1.0
26      1.0
       ... 
1926    1.0
1936    1.0
1940    1.0
1959    1.0
1999    1.0
Name: churn, Length: 253, dtype: float64

In [16]:
df_minority_upsampled = resample(df_minority, replace=True, n_samples=900, random_state=123) #upsampling minority class
df_majority_downsampled = resample(df_majority, replace=False, n_samples=900, random_state=123) #downsampling majority class

- *Minority is less, we increase the samples ie., upsampled*

In [17]:
#concanating both upsampled and downsampled class
df2 = pd.concat([df_majority_downsampled, df_minority_upsampled])

In [18]:
df2["churn"].value_counts()

churn
0.0    900
1.0    900
Name: count, dtype: int64

In [19]:
df2[['gender', 'multi_screen', 'mail_subscribed']]

Unnamed: 0,gender,multi_screen,mail_subscribed
1813,Male,yes,no
1362,Male,no,no
389,Female,no,no
1203,Male,no,no
1710,Male,no,no
...,...,...,...
1759,Male,no,no
845,Male,no,yes
602,Female,yes,no
22,Female,no,no


*Before feeding this to the model we need to encode the categroical variables*

In [20]:
label_encoder = preprocessing.LabelEncoder()
df2['gender']= label_encoder.fit_transform(df2['gender'])
df2['multi_screen']= label_encoder.fit_transform(df2['multi_screen'])
df2['mail_subscribed']= label_encoder.fit_transform(df2['mail_subscribed'])

In [21]:
df2.head(2)

Unnamed: 0,gender,age,no_of_days_subscribed,multi_screen,mail_subscribed,weekly_mins_watched,minimum_daily_mins,maximum_daily_mins,weekly_max_night_mins,videos_watched,maximum_days_inactive,customer_support_calls,churn
1813,1,35,148,1,0,222.3,8.6,25.19,62,3,3.0,2,0.0
1362,1,42,98,0,0,241.5,12.1,27.37,113,4,4.0,4,0.0


#### Independent columns for feeding the model for training by keeping it into separate dataframe

In [22]:
X = df2.iloc[:,:-1] #All rows & skipping last column

In [23]:
X.head(2)

Unnamed: 0,gender,age,no_of_days_subscribed,multi_screen,mail_subscribed,weekly_mins_watched,minimum_daily_mins,maximum_daily_mins,weekly_max_night_mins,videos_watched,maximum_days_inactive,customer_support_calls
1813,1,35,148,1,0,222.3,8.6,25.19,62,3,3.0,2
1362,1,42,98,0,0,241.5,12.1,27.37,113,4,4.0,4


*Scaling is a crucial preprocessing step that can help improve the performance and convergence of many machine learning algorithms. It ensures that all features contribute equally to the model, prevents features with larger scales from dominating*

In [24]:
sc = StandardScaler()   # Bringing the mean to 0 and variance to 1, so as to have a non-noisy optimization
X = sc.fit_transform(X)
X = sc.transform(X)

In [25]:
X

array([[ 0.914526  , -3.64278642, -2.49377785, ..., -1.91821933,
        -4.66177364, -1.11056583],
       [ 0.914526  , -3.58230119, -2.52651659, ..., -1.77079908,
        -3.03689902, -0.34254207],
       [-3.08881506, -3.60822343, -2.55401713, ..., -1.77079908,
        -6.28664826, -1.49457771],
       ...,
       [-3.08881506, -3.41812701, -2.53306434, ..., -2.06563959,
        -6.28664826, -0.34254207],
       [-3.08881506, -3.55637895, -2.54092164, ..., -1.62337882,
        -3.03689902, -1.49457771],
       [-3.08881506, -3.59094194, -2.53830254, ..., -1.18111806,
        -6.28664826, -1.87858958]])

In [26]:
df2

Unnamed: 0,gender,age,no_of_days_subscribed,multi_screen,mail_subscribed,weekly_mins_watched,minimum_daily_mins,maximum_daily_mins,weekly_max_night_mins,videos_watched,maximum_days_inactive,customer_support_calls,churn
1813,1,35,148,1,0,222.30,8.6,25.19,62,3,3.0,2,0.0
1362,1,42,98,0,0,241.50,12.1,27.37,113,4,4.0,4,0.0
389,0,39,56,0,0,379.80,4.4,43.04,133,4,2.0,1,0.0
1203,1,42,37,0,0,350.55,12.0,39.73,101,2,4.0,2,0.0
1710,1,44,63,0,0,246.75,11.2,27.97,97,2,4.0,0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1759,1,34,89,0,0,455.85,5.3,51.66,89,3,2.0,1,1.0
845,1,37,122,0,1,262.35,13.5,29.73,116,3,4.0,1,1.0
602,0,61,88,1,0,352.65,7.2,39.97,76,2,2.0,4,1.0
22,0,45,76,0,0,395.10,11.4,44.78,101,5,4.0,1,1.0


*All the churn 0s & 1s are together, that is why we reset the index*

In [27]:
df2 = df2.sample(frac=1).reset_index(drop=True)

In [28]:
df2

Unnamed: 0,gender,age,no_of_days_subscribed,multi_screen,mail_subscribed,weekly_mins_watched,minimum_daily_mins,maximum_daily_mins,weekly_max_night_mins,videos_watched,maximum_days_inactive,customer_support_calls,churn
0,0,40,86,0,1,338.25,9.8,38.34,81,2,3.0,0,0.0
1,1,42,3,0,1,177.15,11.9,20.08,89,6,4.0,2,0.0
2,1,58,139,0,0,316.65,5.6,35.89,70,4,2.0,0,0.0
3,1,42,139,0,0,201.60,10.2,22.85,125,2,3.0,5,1.0
4,0,31,85,0,0,389.70,5.4,44.17,72,1,2.0,0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1795,0,39,46,0,0,386.10,13.4,43.76,107,5,4.0,2,1.0
1796,0,27,116,0,1,402.90,11.6,45.66,106,3,4.0,2,0.0
1797,1,70,119,0,0,122.85,8.9,13.92,125,1,3.0,2,1.0
1798,0,35,97,0,0,414.15,8.9,46.94,73,4,3.0,0,1.0


In [29]:
X.shape

(1800, 12)

In [30]:
n_samples, n_features = X.shape ##Storing the values in two parts as it will easy later to feed the model 

In [31]:
Y = df2["churn"]

In [32]:
import sklearn
from sklearn.model_selection import train_test_split

In [33]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=42, stratify = Y)

In [34]:
print((y_train == 1).sum())
print((y_train == 0).sum())

630
630


In [35]:
print(type(X_train))
print(type(X_test))
print(type(y_train.values))
print(type(y_test.values))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


*convert all of them to the tensor as PyTorch works on Tensor*

In [36]:
X_train = torch.from_numpy(X_train.astype(np.float32))
X_test = torch.from_numpy(X_test.astype(np.float32))

In [37]:
y_train = torch.from_numpy(y_train.values.astype(np.float32))
y_test = torch.from_numpy(y_test.values.astype(np.float32))

In [38]:
X_train[0]

tensor([-3.0888, -3.6601, -2.5167, -1.6932, -1.9478, -3.1301, -3.7982, -3.1020,
        -5.0388, -2.2131, -4.6618, -1.4946])

In [39]:
y_train[0]

tensor(1.)

In [40]:
y_train

tensor([1., 0., 1.,  ..., 1., 1., 0.])

In [41]:
y_train.shape

torch.Size([1260])

In [42]:
y_train.ndim

1

#### Making output vector Y as a column vector for matrix multiplications

In [43]:
y_train = y_train.view(y_train.shape[0], 1)
y_test = y_test.view(y_test.shape[0], 1)

### 4. Model Building: Creating Logistic Regression model in Pytorch
#### Logistic Regression is a Linear model so we will use Pytorch's [nn.linear] module which is used for performing linear operations  then we will pass the data to sigmoid function which separates a binary data in two parts using probability.

In [44]:
## Define the LogisticRegressionModel class inheriting from torch.nn.Module
class LogisticRegression(nn.Module):
    #Constructor method for the class. This method is called when an object(instance) of the class is created.
    def __init__(self, n_input_features):
        #Calls the constructor of the parent/base class (torch.nn.Module) to initialize the base class properly.
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(n_input_features,1) #output is single value
    #This method is called when you pass data through the model
    def forward(self, x):
        #Output is passed via sigmoid function to introduce non-linearity & squashes values between 0 and 1.
        #This characteristic makes it particularly useful when dealing with probabilities. 
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

In [45]:
lr = LogisticRegression(n_features)

In [46]:
lr.parameters

<bound method Module.parameters of LogisticRegression(
  (linear): Linear(in_features=12, out_features=1, bias=True)
)>

In [47]:
num_epochs = 500                                        
# Traning the model for large number of epochs to see better results  
learning_rate = 0.01                               
criterion = nn.BCELoss()                                
# We are working on lgistic regression so using Binary Cross Entropy
optimizer = torch.optim.Adam(lr.parameters(), lr=learning_rate)  
#optimizer = torch.optim.SGD(lr.parameters(), lr=learning_rate)      
# Using ADAM optimizer to find local minima   

*Using ADAM optimizer to find local minima. Parameters() tells optimizer which parameters of the model it should update during the training process & learning rate tells how much to change the model parameters with respect to the gradient of the loss function*

In [48]:
for epoch in range(num_epochs):
    y_pred = lr(X_train)
    loss = criterion(y_pred, y_train)    
    loss.backward()
    ## Compute gradients of the loss with respect to parameters.
    #The gradients tell us how much to adjust each parameter to reduce the loss by indicating the direction and magnitude of the parameter
    optimizer.step()
    #updates the model's parameters using the computed gradients. This is where the actual learning happens as the model adjusts its parameters to minimize the loss.
    optimizer.zero_grad()
    #Resets the gradients to zero for the next iteration. This is crucial because PyTorch accumulates gradients by default.
    if (epoch+1) % 30 == 0:                                       
        # printing loss values on every 30 epochs to keep track
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')

epoch: 30, loss = 0.6970
epoch: 60, loss = 0.6927
epoch: 90, loss = 0.6917
epoch: 120, loss = 0.6909
epoch: 150, loss = 0.6903
epoch: 180, loss = 0.6898
epoch: 210, loss = 0.6894
epoch: 240, loss = 0.6892
epoch: 270, loss = 0.6890
epoch: 300, loss = 0.6889
epoch: 330, loss = 0.6888
epoch: 360, loss = 0.6888
epoch: 390, loss = 0.6888
epoch: 420, loss = 0.6887
epoch: 450, loss = 0.6887
epoch: 480, loss = 0.6887


In [49]:
with torch.no_grad(): #during evaluation, we're not interested in calculating gradients for backpropagation. Disabling gradients improves memory efficiency and computation speed.
    y_predicted = lr(X_test)
    y_predicted_cls = y_predicted.round()
    #This compares the rounded predictions (y_predicted_cls) with the actual labels (y_test) element-wise using the eq function. The result is a tensor of True/False values indicating correct classifications.
    #Sums the True value & divides it by total rows total number of samples in the test set 
    acc = y_predicted_cls.eq(y_test).sum() / float(y_test.shape[0])
    print(f'accuracy: {acc.item():.4f}')

accuracy: 0.4833


In [50]:
#classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_predicted_cls))

              precision    recall  f1-score   support

         0.0       0.48      0.46      0.47       270
         1.0       0.48      0.51      0.50       270

    accuracy                           0.48       540
   macro avg       0.48      0.48      0.48       540
weighted avg       0.48      0.48      0.48       540



In [51]:
#confusion matrix
from sklearn.metrics import confusion_matrix
confusion_matrix = confusion_matrix(y_test, y_predicted_cls)
print(confusion_matrix)

[[123 147]
 [132 138]]


In [52]:
#It can be further optimized by hyperparameter tuning. This notebook is intended for understanding linear Regression with Pytorch