# Absenteeism - Logistic Regression

## Import necessary libraries

In [1]:
import pandas as pd
import numpy as np
from numpy import log
import matplotlib.pyplot as plt
import random

## Prepare data

#### Import data from CSV file

In [2]:
# Read data from csv
data_df = pd.read_csv('Absenteeism/Absenteeism_at_work_editted_continous_features_target_not_normalised.csv')

print("Total:",len(data_df),"rows.")
data_df.head()

Total: 740 rows.


Unnamed: 0,ID,Reason for absence,Month of absence,Day of the week,Seasons,Transportation expense,Distance from Residence to Work,Age,Work load Average/day,Hit target,Disciplinary failure,Education,Absenteeism time in hours
0,0.692552,0.724379,0.31236,0.337461,0.509511,0.463951,0.610497,0.647262,0.653058,0.690259,0.360488,0.253737,1
1,0.124304,0.088435,0.378506,0.9545,0.677741,0.127401,0.945206,0.87213,0.509911,0.354537,0.999076,0.43208,0
2,0.369064,0.33782,0.083777,0.678456,0.703634,0.173419,0.744381,0.702182,0.920467,0.876997,0.690698,0.470167,1
3,0.73927,0.646078,0.617184,0.416416,0.523913,0.448263,0.562862,0.681659,0.352872,0.360298,0.462425,0.249254,1
4,0.697562,0.725001,0.330455,0.338349,0.561429,0.457796,0.604668,0.650673,0.657022,0.678083,0.361873,0.250986,1


#### Randomly split data into train and test set

We shuffle the dataframe and select 80% of the dataset for the training set and the remaining 20% for the test set.

In [3]:
# Shuffle dataframe
data_df = data_df.sample(frac=1)

# Split data into train and test set
train_length = int(np.round(len(data_df) * 0.8))  # Train set: 80% of data
test_length = len(data_df) - train_length         # Test set: remaining 20%
train_df = data_df.head(train_length)
test_df = data_df.tail(test_length)

# Features and target (last column) of training set
X_train = train_df.iloc[:,:-1].to_numpy()
y_train = train_df.iloc[:,-1].to_numpy()

# Features and target (last column) of test set
X_test = test_df.iloc[:,:-1].to_numpy()
y_test = test_df.iloc[:,-1].to_numpy()

## Logistic Regression

The idea to build a Logistic Regressor model is that we need to determine the weights of features to predict an output.

Then, in order to obtain a probability (a number between 0 and 1), we use the sigmoid function. With *w* being each feature's weights and *x* beings the features, our prediction *ŷ* will then be:

$$ŷ = \frac{1}{1+e^{-w^Tx}}$$

Of course, this implies we need to find the best weight associated with each feature. To do so, we use gradient descent. At each iteration, we compute the cost with given weights (initialised as a an array of zeros) and iterate to find the weights for which the cost is minimal.

The cost function is the following:

$$cost = -ylog(ŷ) - ((1 - y)log(1-ŷ))$$

The cost for all the instances in one training iteration can be computed as the mean of each instance's cost.

The gradient descent will enable to minimise this value. At each iteration, we update the weights as by subtracting the gradient, which is the derivative of the cost function with respect to the weights:

$$\frac{1}{m}X^T(ŷ - y)$$

Then, once the model is trained (i.e. the best weights have been found), we compute the prediction by applying the sigmoid function to wᵀx.

## Build the model

We define a class, which we initialise with:
- learning rate (default value: 0.01)
- maximum number of iterations (default value: 500)
- boolean to add intercept (True by default)

This last parameter enables to add a bias *w₀* to the weights. For example, with *n* being the number of features:

$$w^Tx = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n$$

The first parameter, the learning rate, can control the size of the steps in the gradient descent. The higher the learning rate, the bigger the step, which means it will be faster but at the risk of missing the lowest cost value. The lower the learning rate, the slower the gradient descent.

This class also contains the sigmoid function and the cost function mentioned earlier, as well as a `fit()` function to train the model, like we previously explained, and `predict()` to make a prediction for a given input.

In [4]:
# Logistic Regressor Model
class LogisticRegressor:
    
    # Initialise hyperparameters
    def __init__(self, lr=0.01, max_iter=500, add_intercept=True):
        self.learning_rate = lr
        self.max_iter = max_iter
        self.add_intercept = add_intercept
    
    # Sigmoid function
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    # Cost function for gradient descent
    def __cost_function(self, y, y_pred):
        return (-y*log(y_pred) - ((1-y)*log(1-y_pred))).mean()
    
    # Train the model
    def fit(self, X, y):
        
        if self.add_intercept:
            intercept = np.ones((X.shape[0], 1))
            X = np.hstack((intercept, X))
        
        # Initialise weights
        self.w = np.zeros(X.shape[1])
        
        # Keep a history of cost values
        self.cost_history = []
        
        # 1/m (m being the number of intances)
        OneOverM = 1 / X.shape[1]
        
        # Iterate
        for i in range (self.max_iter):
            
            # prediction
            y_pred = self.__sigmoid(np.dot(X, self.w.T))
            
            # cost
            cost = self.__cost_function(y, y_pred)
            self.cost_history.append(cost)
            
            # gradient vector
            gradient = np.dot(X.T, (y_pred - y)) * OneOverM
            self.w -= gradient * self.learning_rate
            
    # Predict an output
    def predict(self, X):
        if self.add_intercept:
            intercept = np.ones((X.shape[0], 1))
            X = np.hstack((intercept, X))
        y_pred = self.__sigmoid(np.dot(X, self.w.T))
        return y_pred

## One vs All

Since this is a multiclass problem (there are three possible absenteeism categories: 0, 1 and 2), we need to build our Logistic Regression model using One vs. Rest (also called One vs. All). The idea is building a regressor for each class, which determines if the instance belongs to this one class or to the rest.

Then, we build a model which combines these regressors and for each instance, chooses the class for which the probability is the highest.

#### Prepare data

Firstly, since we will have three models (one for each possible class), we need three target sets.
In a "one vs. rest" perspective, for every output, if it is the "one" class, we set it to 1, if it is the "rest", we set it to 0.

In [5]:
y_1vsall = []

nb_classes = 3

# For each possible class
for c in range (nb_classes):
    y_one = np.where(y_train == c, 1, 0)
    y_1vsall.append(y_one)

#### Build the model

Then, we make a class for our One vs All model. It has the same parameters as the individual regressors. It has a `fit()` and a `predict()` function as well. The model is trained by training an individual regressor for each class. To predict an output, each model will have to compute the probability that the instance belongs to the "one" class (versus the "rest").

In [6]:
# One Vs All Model
class LogisticRegressorOneVsAll:
    
    # Initialise hyperparameters
    def __init__(self, lr=0.01, max_iter=500, add_intercept=True):
        self.learning_rate = lr
        self.max_iter = max_iter
        self.add_intercept = add_intercept
        
    # Train the model
    def fit(self, X, y):
        self.regressors = []

        # For each "one vs. all" target set
        for y_one in y:
            # Build a model with target set
            model = LogisticRegressor(lr=self.learning_rate,
                                      max_iter=self.max_iter,
                                      add_intercept=self.add_intercept)
            # Train a model and add it to the list of regressors
            model.fit(X, y_one)
            self.regressors.append(model)
        
    # Predict an output
    def predict(self, X):
        final_pred = []
        y_pred_1vsall = []

        # For each regressor
        for model in self.regressors:
            y_pred = model.predict(X_test)
            y_pred_1vsall.append(y_pred)
            
        # For each instance
        for i in range(len(X)):
            best_pred = 0
            best_target = -1
            # Find the best prediction (model with highest probability)
            for j in range(len(self.regressors)):
                if y_pred_1vsall[j][i] > best_pred:
                    best_pred = y_pred_1vsall[j][i]
                    best_target = j
            # Add best prediction to the final result
            final_pred.append(best_target)

        # Return final prediction
        final_pred = np.asarray(final_pred)
        return final_pred

#### Train the model

In [7]:
nb_iterations = 100000

# One vs. All regressor
model_onevsall = LogisticRegressorOneVsAll(max_iter=nb_iterations)
model_onevsall.fit(X_train, y_1vsall)

#### Test the model

In [8]:
y_pred = model_onevsall.predict(X_test)
print("Category for each instance:")
print(y_test)
print("\nPrediction:")
print(y_pred)

Category for each instance:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 0 1 1 1 1 1 1 1 1 2 1 1 1 1 1
 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
 1 1 1 2 1 1 1 2 0 1 1 1 1 1 0 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1]

Prediction:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]


In [9]:
# Compute the Accuracy
accuracy = np.mean(y_pred == y_test)
print("Accuracy:", round(accuracy,4))

Accuracy: 0.9459
