# Objective: 
Implement a single neuron neural network (no hidden layers) aka the logistic regression unit carrying out Stochastic Gradient Descent for optimziation and using Log Loss function for calculating cost and sigmoid activation function.

-------------------------

### To perform everything asked in this, I have essentially divided this into 3 sections as follows: 

1. Implement single neuron neural network using SGD, sigmoid activation and log loss optimization
2. Replace sigmoid activation with ReLU activation
3. Add funcationality to perform L1 (Lasso) and/or L2 (Ridge) reguralization 

The common fucntions are defined initially. Then variation of few functions are defined in order to carry out the ReLU activation and then for L1 and L2 reguralization. In short, I have perform logistic regression for all variations and shown results. With every function I have also included a function description which describes what the function does.

The dataset is loaded and then manually split into train-test data. It is important to scale the data for any model training. 
#### I have also implemented standard scaling manually using function that I wrote instead of sci-kit learn StandardScaler as we are not allowed to use the library. 


## Dataset used: 
Airline customer satisfaction dataset. I found this [dataset on kaggle](https://www.kaggle.com/datasets/sjleshrac/airlines-customer-satisfaction). The dataset contains a number of columns like seat comfort, food on flight, delay time, in0flight entertainment, leg room, on-board service, online support, baggage handling, checkin service etc which relate to a customer's score of how they found the overall experience. The prediction variable for this dataset is the "satisfaction" label which is 1 if the customer was satisfied and 0 if the customer was not satsified. the dataset contains almost equal number of samples for both prediction classes. The total number of rows in this dataset is almost 129,880.

In [1]:
import pandas as pd
import numpy as np
import scipy
pd.options.mode.chained_assignment = None  # default='warn'
np.seterr(all="ignore")

{'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'}

Loading the training and testing data using the function I created. 

Note: I will suffle the dataset, this ensures that each time you run this code, it creates a random dataset for training and testing.
#### As a result each time you will need to tune your paramters (learning rate, iterations, lambda for regularization) for optimal solution. A set of these parameters that got a satisfactory test accuracy will not necessarily result is satisfactory results the next time. You will need to fine tune it yourself. 

In [2]:
def load_dataset():
    ''' This function loads the dataset and performs necessary pre-processing for the 
    logistic regression neural network  '''

    df = pd.read_csv('airline_data.csv')

    df = df.sample(frac=1).reset_index(drop=True)

    airline_train_data = df[:int(len(df)*80/100)]   #80% of the data as train data
    airline_test_data = df[int(len(df)*80/100):]    #20% of the data for testing

    X_train = airline_train_data.iloc[:,1:]
    X_test = airline_test_data.iloc[:,1:]
    y_train = airline_train_data.iloc[:,0].to_numpy()
    y_test = airline_test_data.iloc[:,0].to_numpy()

    for col in X_train:
        X_train[col] = (X_train[col] - np.average(X_train[col])) / (np.std(X_train[col]))   #standardizing train data

    for col in X_test:
        X_test[col] = (X_test[col] - np.average(X_test[col])) / (np.std(X_test[col]))   #standardizing the test data

    X_train = X_train.to_numpy()
    X_test = X_test.to_numpy()

    print("Data loaded successfully after pre-processing")
    return X_train, y_train, X_test, y_test

 # 1. Implement single neuron neural network using sigmoid activation

In [3]:
def initialize_parameters(size):
    ''' This function is used to initiziale the logisitc regression praremters W (weights) and b (bias) '''

    W = np.zeros_like(size)
    b = 0
    
    return W, b

In [4]:
def sigmoid(z):
    ''' This function computes and return the sigmoid activation. It always returns a value (probability) 
    between 0 to 1 which can be used to predict the class '''

    return 1/(1+np.exp(-z))
  
def logloss(y_true, y_pred):
    ''' This function computes the cost function for logisitc regression, here I am using the 
    log loss function formula aka the binary cross entropy loss '''

    loss_sum = 0
    
    for i in range(len(y_true)):
        loss_sum += (y_true[i] * np.log(y_pred[i])) + ((1 - y_true[i]) * np.log(1 - y_pred[i]))
    cost = loss_sum * (-1 / len(y_true))
    
    return cost

def gradients(x, y, W, b):
    ''' This function is used to compute gradient/partial derivative with respect to W (weight) and b (bias) '''
    
    y_pred = sigmoid(np.dot(W, x) + b)
    dW = (y_pred - y) * x
    db = y_pred - y
    
    return dW, db

In [5]:
def model_train(X_train, y_train, X_test, y_test, iterations, learning_rate):
    ''' This function implements logistic regression using stochastic gradient descent (SGD) where gradients 
    are updated using one sample at a time. 
    This is used to train the model and it returns the weights and bias along with the train and test 
    losses which are computed using log loss function '''

    train_loss = []
    test_loss = []
    W, b = initialize_parameters(X_train[0])
    
    for i in range(iterations):
        train_pred = []
        test_pred = []

        for j in range(len(X_train)):   #sending one sample at a time as I am implementing Stochastic Gradient Descent (SGD)
            dW, db = gradients(X_train[i], y_train[i], W, b)
            W = W - (learning_rate * dW)    #updating the weight and bias using the partial derivatives as this is SGD
            b = b - (learning_rate * db)
        
        for val in range(len(X_train)):
            train_pred.append(sigmoid(np.dot(W, X_train[val]) + b))   #compute the predictions for this iteration

        for val in range(len(X_test)):
            test_pred.append(sigmoid(np.dot(W, X_test[val]) + b))   #compute the test predictions for this iteration
            
        loss_train = logloss(y_train, train_pred)   #calculate the training loss using the W and b values computed for this iteration
        loss_test = logloss(y_test, test_pred)    #calculate the test loss using the W and b values computed for this iteration
        train_loss.append(loss_train)
        test_loss.append(loss_test)
        
    return W, b, train_loss, test_loss     

In [6]:
def model_predict(W, b, X):
    ''' This function outputs the predicted class for each sample of X using the weight and bais of trained model '''

    y_predict = []

    for i in range(len(X)):
        z = np.dot(W, X[i]) + b
        y_prob = sigmoid(z)   #sigmoid is used for prediction in the output layer since it returns a probability between 0 to 1 so we can define a good threshold of 0.5
        if y_prob >= 0.5:
            y_predict.append(1)
        else:
            y_predict.append(0)

    return np.array(y_predict)

In [7]:
def model_evaluate(y_true, y_pred):
    ''' This function is used to cacluate the accuracy anf F1 score of the model on the predicted set by 
    comparing it with the actual labels '''

    TP = 0    #true positives
    TN = 0    #true negatives
    FP = 0    #false positives
    FN = 0    #false negatives
    precision = 0
    recall = 0
    F1_score = 0
    accuracy = 0

    for i in range(len(y_true)):
        if y_true[i] == 1 and y_true[i] == y_pred[i]:
          TP += 1
        if y_true[i] == 0 and y_true[i] == y_pred[i]:
          TN +=1
        if y_true[i] == 1 and y_true[i] != y_pred[i]:
          FN += 1
        if y_true[i] == 0 and y_true[i] != y_pred[i]:
          FP += 1

    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    F1_score = 2 * (precision * recall) / (precision + recall)

    return accuracy, F1_score

### All the functions are now defined.

Lets load the data into training and test sets

In [8]:
X_train, y_train, X_test, y_test = load_dataset()

print("X_train shape: ", X_train.shape)
print("y_train shape: ", y_train.shape)
print("X_test shape: ", X_test.shape)
print("y_test shape: ", y_test.shape)

Data loaded successfully after pre-processing
X_train shape:  (103904, 19)
y_train shape:  (103904,)
X_test shape:  (25976, 19)
y_test shape:  (25976,)


### Next, I need to train the model. In order to train a single neuron neural network we also require the learning rate and number of iterations.

I am using a lower number for iterations since I have implemented SGD in this which already updates the weights and bias terms for each encountered sample, hence finding the gradient minima faster.

<i>Note: the following cell takes sometime to finish running</i>

In [9]:
learning_rate = 0.001
iterations = 20  

W, b, train_loss, test_loss = model_train(X_train, y_train, X_test, y_test, iterations, learning_rate)

#print('Train loss: ', train_loss)
print('Test loss: ', test_loss)

Test loss:  [0.8337196979650416, 1.3657247072545387, 1.1082005793638963, 1.159935103862531, 1.0017971665083492, 1.0151640336266534, 1.0260680329955867, 1.0455187732467373, 1.0574543574867152, 1.057663258456708, 1.1947403883912504, 1.179149693761852, 1.096188435174614, 1.0972451665123488, 1.2458559563752685, 1.2296034663138293, 1.1479517616219845, 1.2434388300627637, 1.2439570432611002, 1.2253985663188705]


The model is trained and we have our optimized parameters: Weights and Bais. I will use these now to predict results for the train and test set. The test set accuracy is provided. 

In [10]:
train_accuracy, train_f1_score = model_evaluate(y_train, model_predict(W, b, X_train))
test_accuracy, test_f1_score = model_evaluate(y_test, model_predict(W, b, X_test))

#print('Train Accuracy: ', train_accuracy)
#print('Train F1 score: ', train_f1_score)
print('Test Accuracy: ', test_accuracy)
print('Test F1 score: ', test_f1_score)

Test Accuracy:  0.7507314444102248
Test F1 score:  0.7635566916194997


### Model accuracy for single neuron neural network using sigmoid activation: 75.0%
### Model F1 score for single neuron neural network using sigmoid activation: 0.763

# 2. Change the model to now use ReLU activation

This is the 2nd part of the assignment where I now use ReLU activation to train the model and get optimzied weights and bias.

Note on what I did and why: 

ReLU function returns a value that is between 0 to infinity. Therefore, the result we get from ReLU activation can not directly be used to do predictions. In neural networks, the output layer always has a sigmoid activation function. The hidden layers usually have ReLU activation but at the end the last layer which is the output layer, the sigmoid activation is applied. This is because sigmoid returns a number between 0 to 1 which can easily be mapped to the probaility and then the 0.5 threshold can be used to predict the final output class as 0 or 1. Therefore, even in my code, I use the ReLU activation to find the optimized W and b parameters. This is where ReLU comes in. But if you see, in the end when I call the predict function, I call the function that I defined above which helps find class probabilities after apply sigmoid.

Essentially ReLU is used here to find the model parameters while training the model. 

I define the ReLU activation function, a new gradient function ReLU and finally a new training fucntion to train parameters using ReLU activation

In [11]:
def ReLU(x):
    ''' This function computes and return the ReLU activation. It returns a value between 0 to infinity '''

    return x * (x > 0) 

def gradients_with_ReLU(x, y, W, b):
    ''' This function is used to compute gradient/partial derivative with respect to W (weight) and b (bias) 
    using ReLU as the activation function'''
    
    y_pred = ReLU(np.dot(W, x) + b)   #apply ReLU activation instead of sigmoid
    dW = (y_pred - y) * x
    db = y_pred - y

    return dW, db

def model_train_with_ReLU(X_train, y_train, X_test, y_test, iterations, learning_rate):
    ''' This function implements logistic regression using stochastic gradient descent (SGD) where gradients 
    are updated using one sample at a time using ReLU activation function. 
    This is used to train the model and it returns the weights and bias along with the train and test 
    losses which are computed using log loss function '''

    train_loss = []
    test_loss = []
    W, b = initialize_parameters(X_train[0])

    for i in range(iterations):
        train_pred = []
        test_pred = []

        for j in range(len(X_train)):
            dW, db = gradients_with_ReLU(X_train[i], y_train[i], W, b)
            W = W - (learning_rate * dW)
            b = b - (learning_rate * db)

        for val in range(len(X_train)):
            train_pred.append(sigmoid(ReLU(np.dot(W, X_train[val]) + b)))   #loss function calcualtes loss between actual y values and predicted y values
                                                                            #Predicted y values have to be in range 0 to 1 hence you need to apply sigmoid function as well
        
        for val in range(len(X_test)):
            test_pred.append(sigmoid(ReLU(np.dot(W, X_test[val]) + b)))   #loss function calcualtes loss between actual y values and predicted y values
                                                                          #Predicted y values have to be in range 0 to 1 hence you need to apply sigmoid function as well

        loss_train = logloss(y_train, train_pred)    
        loss_test = logloss(y_test, test_pred)
        train_loss.append(loss_train)
        test_loss.append(loss_test)
        
    return W, b, train_loss, test_loss     

Now, let's train the model using this ReLU activation to get optimized weights and bais features

<i>Note: the following cell takes sometime to finish running</i>

In [12]:
learning_rate = 0.001
iterations = 20

W, b, train_loss, test_loss = model_train_with_ReLU(X_train, y_train, X_test, y_test, iterations, learning_rate)    #find the trained model parameters using ReLU activation

#print('Train loss: ', train_loss)
print('Test loss: ', test_loss)

Test loss:  [0.6931471805599296, 0.6931471805599296, 0.6931471805599296, 0.6931471805599296, 0.6401806747036672, 0.6406210009887396, 0.6411189193505127, 0.6334734620331189, 0.6334734620331189, 0.6334734620331189, 0.6482120943920854, 0.6232164146866622, 0.6027269208211457, 0.6059710713296159, 0.6036668808498614, 0.6216534445138148, 0.6185153394221701, 0.5999739226058347, 0.604246879917167, 0.5913792277656894]


Find predictions and then the test accuracy using these optimized W and b parameters that we got from just above <b><i>using the ReLU activation</i></b>

In [13]:
train_accuracy, train_f1_score = model_evaluate(y_train, model_predict(W, b, X_train))
test_accuracy, test_f1_score = model_evaluate(y_test, model_predict(W, b, X_test))

#print('Train Accuracy: ', train_accuracy)
#print('Train F1 score: ', train_f1_score)
print('Test Accuracy: ', test_accuracy)
print('Test F1 score: ', test_f1_score)

Test Accuracy:  0.7112334462580844
Test F1 score:  0.760251861795634


### Model accuracy for single neuron neural network using ReLU activation: 71.1%
### Model F1 score for single neuron neural network using ReLU activation: 0.760

# 3. Add funcationality to perform L1 (Lasso) and/or L2 (Ridge) reguralization

I define new log loss function which also includes calculates the regularization cost. A new model training function is also defined which trains the parameters using regularization. Both these functions perform different computations depending on if you choose to apply L1 or L2 regularization. The fucntions that I have defined takes care of each case. The regularization you wish to perform is given as a parameter to the functions. 

What changes when regulairzation is applied is that the cost function now also includes an associated regularization cost whose formula given the log loss/binary cross entropy function is given in the function defined below. Secondly, while training the model for optimized parameters, the W and b term updation now also includes a small factor that is associated with the regulairzation we perform. 

In [14]:
def logloss_with_regularization(y_true, y_pred, my_lambda, W, regularization):
    ''' This function computes the cost function for our logistic regression model when there is 
    regularization performed '''


    loss_sum = 0
    for i in range(len(y_true)):
        loss_sum += (y_true[i] * np.log(y_pred[i])) + ((1 - y_true[i]) * np.log(1 - y_pred[i]))
    cost = loss_sum * (-1 / len(y_true))

    if (regularization == 'L1' or regularization == 'Lasso'):
        regularization_cost = my_lambda * (np.sum(np.absolute(W))) / (2 * len(y_true))    #adding the regularization term's cost also for L1(Lasso) regularization
    elif (regularization== 'L2' or regularization == 'Ridge'):
        regularization_cost = my_lambda * (np.sum(np.square(W))) / (2 * len(y_true))     #adding the regularization term's cost also for L2(Ridge) regularization

    return cost + regularization_cost   #returned cost will also have associated regularization cost

def model_train_with_regularization(X_train, y_train, X_test, y_test, iterations, learning_rate, my_lambda, regularization):
    ''' This function implements logistic regression with L1 or L2 regularization using stochastic gradient descent (SGD) 
    where gradients are updated using one sample at a time. 
    This is used to train the model and it returns the weights and bias along with the train and test 
    losses which are computed using log loss function '''


    train_loss = []
    test_loss = []
    W, b = initialize_parameters(X_train[0])

    for i in range(iterations):
        train_pred = []
        test_pred = []

        for j in range(len(X_train)):
            dW, db = gradients(X_train[i], y_train[i], W, b)

            if (regularization == 'L1' or regularization == 'Lasso'):
                W = W - learning_rate * (dW - (2 * my_lambda/len(X_train)))   #update weight procedure when L1(Lasso) regularization is done
            
            elif (regularization== 'L2' or regularization == 'Ridge'):
                W = W - learning_rate * (dW - (my_lambda * W/len(X_train)))   #update weight procedure when L2(Ridge) regularization is done

            b = b - (learning_rate * db)    #no effect on bias term in regularization

        for val in range(len(X_train)):
            train_pred.append(sigmoid(np.dot(W, X_train[val]) + b))
            
        for val in range(len(X_test)):
            test_pred.append(sigmoid(np.dot(W, X_test[val]) + b))

        loss_train = logloss_with_regularization(y_train, train_pred, my_lambda, W, regularization)
        loss_test = logloss_with_regularization(y_test, test_pred, my_lambda, W, regularization)
        train_loss.append(loss_train) 
        test_loss.append(loss_test)
        
    return W, b, train_loss, test_loss     

Now, let's train the model applying L2 (Ridge) regularization first

<i>Note: the following cell takes sometime to finish running</i>

In [35]:
learning_rate = 0.001
iterations = 20
my_lambda = 0.1   #for L2 regularization (Ridge regularization)

W, b, train_loss, test_loss = model_train_with_regularization(X_train, y_train, X_test, y_test, iterations, learning_rate, my_lambda, regularization = 'L2')

#print('Train loss: ', train_loss)
print('Test loss: ', test_loss)

Test loss:  [0.8337395733835279, 1.3657914355288743, 1.1083160846423246, 1.1600879061259781, 1.0019792922821218, 1.0153917621078126, 1.0263417894963283, 1.0458206872097897, 1.0578124512856142, 1.0580867698416192, 1.1951399672364162, 1.1797229947167098, 1.0967365282720478, 1.097887778127946, 1.2465792919400185, 1.2303392220459153, 1.148773868115079, 1.244419014921392, 1.2450445290508907, 1.2265475560092238]


Find predictions and then the test accuracy using these optimized W and b parameters that we got from just above <b><i>using L2 (Ridge) regularization</i></b>

In [36]:
train_accuracy, train_f1_score = model_evaluate(y_train, model_predict(W, b, X_train))
test_accuracy, test_f1_score = model_evaluate(y_test, model_predict(W, b, X_test))

#print('Train Accuracy: ', train_accuracy)
#print('Train F1 score: ', train_f1_score)
print('Test Accuracy: ', test_accuracy)
print('Test F1 score: ', test_f1_score)

Test Accuracy:  0.7506544502617801
Test F1 score:  0.7634836589373744


### Model accuracy for single neuron neural network using L2 (Ridge) regularization: 75.06%
### Model F1 score for single neuron neural network using L2 (Ridge) regularization: 0.763

Lastly, let's train the model applying L1 (Lasso) regularization first

<i>Note: the following cell takes sometime to finish running</i>

In [31]:
learning_rate = 0.001
iterations = 40
my_lambda = 0.01   #for L1 regularization (Lasso regularization)

W, b, train_loss, test_loss = model_train_with_regularization(X_train, y_train, X_test, y_test, iterations, learning_rate, my_lambda, regularization = 'L1')
#print('Train loss: ', train_loss)
print('Test loss: ', test_loss)

Test loss:  [0.83368822710536, 1.365646147525089, 1.1081395097926765, 1.159876096812088, 1.001816291816018, 1.015192303842172, 1.0260966368206896, 1.0455564591285935, 1.05749861770406, 1.0577157640087238, 1.194766964797059, 1.1791668927584973, 1.096153380261608, 1.0972026690430914, 1.2459071503553647, 1.2296224175706754, 1.1479366405055822, 1.2434642164113552, 1.2439904997397462, 1.2254455107439555, 1.1790740605015357, 1.1790804546686298, 1.215314625518764, 1.2528544534841093, 1.440054722219522, 1.2790947415440572, 1.313257369736387, 1.3834259258853452, 1.4231699874169212, 1.838530945490042, 1.84158757115995, 1.786001130506328, 1.8114172654818066, 1.654730249990512, 1.3572904805616794, 1.3565885866906027, 1.3349433249154, 1.334933380632006, 1.171928941081578, 1.377769952438555]


Find predictions and then the test accuracy using these optimized W and b parameters that we got from just above <b><i>using L1 (Lasso) regularization</i></b>

In [32]:
train_accuracy, train_f1_score = model_evaluate(y_train, model_predict(W, b, X_train))
test_accuracy, test_f1_score = model_evaluate(y_test, model_predict(W, b, X_test))

#print('Train Accuracy: ', train_accuracy)
#print('Train F1 score: ', train_f1_score)
print('Test Accuracy: ', test_accuracy)
print('Test F1 score: ', test_f1_score)

Test Accuracy:  0.7067677856482907
Test F1 score:  0.7263517154661397


### Model accuracy for single neuron neural network using L1 (Lasso) regularization: 70.06%
### Model F1 score for single neuron neural network using L1 (Lasso) regularization: 0.726

As noticed, regularization does not always guarantee you a better testing accuracy. It depends on a number of things, including size of data as well as how you tune your lamdba (regularization parameter) along with learning rate and iterations. A fine tuning of these can be done, however that is out of scope of this assignment as we had to show only the process to how to perform regularization with our existing single neuron neural network code. 

----------------------------------------

## As seen, I have shown steps to implement the single neuron network, then to update this to use ReLU activation and then finally updated this to be able to also perform regularization. 