## **Design and Analysis of Algorithms - CIA 2**

### **Question**:
Design a neural network (the implementation can use PyTorch, TensorFlow, or the whitebox model) for the dataset shared in the ML lab assignment on neural networks.

* Develop an individual codebase using each of the following algorithms for weight optimization:
1.	Genetic Algorithm
2.	Cultural Algorithm
3.	Particle Swarm Optimization
4.	Ant Colony Optimization
* Data to be uploaded to GitHub:
1.	A note comparing the performance of the four methods.
2.	The codebase for all four methods.
3.	The research papers that you have referred to.



### Dependencies and Dataset

In [192]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
from sklearn.metrics import classification_report

In [193]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import Accuracy
from tensorflow.keras.utils import to_categorical

In [194]:
from sklearn import metrics
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

In [195]:
import os
import torch
from torch import nn
from torch import optim
from torch.utils.data import Dataset, DataLoader, TensorDataset
from torchvision import datasets, transforms

In [196]:
data = pd.read_csv(r"/content/drive/MyDrive/Colab Notebooks/Bank_Personal_Loan_Modelling.csv")

Columns of the dataset:
* ID: Customer ID
* Age: Customer age
* Experience: Work experience in years
* Income: Annual income (in thousands)
* ZIP Code: ZIP code of the customer's home address
* Family: Number of family members
* CCAvg: Average monthly credit card spending
* Education: Education level (1: Bachelor, 2: Master, 3: Advanced Degree)
* Mortgage: Mortgage on the house (in thousands)
* Securities Account: Boolean, whether the customer has a securities account
* CD Account: Boolean, whether the customer has a Certificate of Deposit account
* Online: Boolean, whether the customer uses online banking
* CreditCard: Boolean, whether the customer uses a credit card issued by the bank
* Personal Loan: The target variable (binary classification problem)

In [197]:
# Drop the customer ID column, since it does not help with the prediction.
df = data.drop(columns=["ID"])
df

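| | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Securities Account | CD Account | Online | CreditCard | Personal Loan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 39 | 15 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 35 | 8 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 29 | 3 | 40 | 92697 | 1 | 1.9 | 3 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4996 | 30 | 4 | 15 | 92037 | 4 | 0.4 | 1 | 85 | 0 | 0 | 1 | 0 | 0 |
| 4997 | 63 | 39 | 24 | 93023 | 2 | 0.3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4998 | 65 | 40 | 49 | 90034 | 3 | 0.5 | 2 | 0 | 0 | 0 | 1 | 0 | 0 |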


## Exploratory Data Analysis

In [198]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Age                 5000 non-null   int64  
 1   Experience          5000 non-null   int64  
 2   Income              5000 non-null   int64  
 3   ZIP Code            5000 non-null   int64  
 4   Family              5000 non-null   int64  
 5   CCAvg               5000 non-null   float64
 6   Education           5000 non-null   int64  
 7   Mortgage            5000 non-null   int64  
 8   Securities Account  5000 non-null   int64  
 9   CD Account          5000 non-null   int64  
 10  Online              5000 non-null   int64  
 11  CreditCard          5000 non-null   int64  
 12  Personal Loan       5000 non-null   int64  
dtypes: float64(1), int64(12)
memory usage: 507.9 KB


In [199]:
df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Age | 5000.0 | 45.3384 | 11.463166 | 23.0 | 35.0 | 45.0 | 55.0 | 67.0 |
| Experience | 5000.0 | 20.1046 | 11.467954 | -3.0 | 10.0 | 20.0 | 30.0 | 43.0 |
| Income | 5000.0 | 73.7742 | 46.033729 | 8.0 | 39.0 | 64.0 | 98.0 | 224.0 |
| ZIP Code | 5000.0 | 93152.503 | 2121.852197 | 9307.0 | 91911.0 | 93437.0 | 94608.0 | 96651.0 |
| Family | 5000.0 | 2.3964 | 1.147663 | 1.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| CCAvg | 5000.0 | 1.937938 | 1.747659 | 0.0 | 0.7 | 1.5 | 2.5 | 10.0 |
| Education | 5000.0 | 1.881 | 0.839869 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Mortgage | 5000.0 | 56.4988 | 101.713802 | 0.0 | 0.0 | 0.0 | 101.0 | 635.0 |
| Securities Account | 5000.0 | 0.1044 | 0.305809 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| CD Account | 5000.0 | 0.0604 | 0.23825 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


In [200]:
df.isna().any()

Age                   False
Experience            False
Income                False
ZIP Code              False
Family                False
CCAvg                 False
Education             False
Mortgage              False
Securities Account    False
CD Account            False
Online                False
CreditCard            False
Personal Loan         False
dtype: bool

## Train Test Split

In [201]:
df.shape

(5000, 13)

In [202]:
x = df.iloc[:,:-1].values
y = df.iloc[:,-1].values

In [203]:
x_train,x_test,y_train,y_test = train_test_split(x, y, test_size=0.25, random_state=69)

In [204]:
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

In [205]:
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((3750, 12), (1250, 12), (3750,), (1250,))

# **PyTorch Neural Network**

In [206]:
batch_size = 64

In [207]:
train_x = torch.from_numpy(x_train).to(torch.float32)
train_y = torch.from_numpy(y_train).to(torch.float32)

In [208]:
test_x = torch.from_numpy(x_test).to(torch.float32)
test_y = torch.from_numpy(y_test).to(torch.float32)

In [209]:
class Data(Dataset):
    def __init__(self, x, y):
        self.x = torch.from_numpy(x.astype(np.float32))
        self.y = torch.from_numpy(y.astype(np.float32))
        self.len = self.x.shape[0]
       
    def __getitem__(self, index):
        return self.x[index], self.y[index]
   
    def __len__(self):
        return self.len

In [210]:
train_x.shape, train_y.shape

(torch.Size([3750, 12]), torch.Size([3750]))

In [211]:
train_data = TensorDataset(train_x,train_y)
train_dataloader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)

In [212]:
test_data = TensorDataset(test_x,test_y)
# Evaluation does not depend on batch order, so the test set is not shuffled.
test_dataloader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)

# **Building Model**

In [213]:
class NeuralNetwork(torch.nn.Module):
    
    def __init__(self):
        super(NeuralNetwork,self).__init__()
        
        self.layer1 = torch.nn.Linear(12,16)
        self.layer2 = torch.nn.Linear(16,8)
        self.layer3 = torch.nn.Linear(8,1)
        self.sigmoid = torch.nn.Sigmoid()
        self.relu = torch.nn.ReLU()
        
    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        x = self.relu(x)
        x = self.layer3(x)
        x = self.sigmoid(x)
        return x

In [214]:
neural_network = NeuralNetwork()

# **Weight Optimization using Ant Colony Optimization**

In [215]:
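model = NeuralNetwork()
# Gradients are not needed: the weights are searched directly, not trained by backpropagation.
torch.set_grad_enabled(False)
# Flatten every weight matrix and bias vector into a single parameter vector.
param = np.concatenate([p.detach().numpy().flatten() for p in model.parameters()])
shape = [tuple(p.shape) for p in model.parameters()]
size = [int(np.prod(s)) for s in shape]
dim = len(param)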

print("Dim : ", len(param))
print("Layers Shape : ", shape)
print("Layers Size : ", size)

Dim :  353
Layers Shape :  [(16, 12), (16,), (8, 16), (8,), (1, 8), (1,)]
Layers Size :  [192, 16, 128, 8, 8, 1]
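The dimension of 353 follows directly from the layer widths. The sketch below recomputes it without PyTorch, assuming each linear layer stores a weight matrix of shape `(out_features, in_features)` plus one bias per output unit; the helper name `parameter_shapes` is hypothetical and not part of the notebook.

```python
from math import prod

# Layer widths of the network: 12 inputs -> 16 -> 8 -> 1 output.
layer_widths = [12, 16, 8, 1]

def parameter_shapes(widths):
    """Return weight and bias shapes, layer by layer (hypothetical helper)."""
    shapes = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        shapes.append((fan_out, fan_in))  # weight matrix: (out_features, in_features)
        shapes.append((fan_out,))         # one bias per output unit
    return shapes

shapes = parameter_shapes(layer_widths)
sizes = [prod(s) for s in shapes]  # number of entries in each tensor
dim = sum(sizes)                   # length of the flattened parameter vector

print("Layers Shape : ", shapes)
print("Layers Size : ", sizes)
print("Dim : ", dim)
```

Each ant therefore searches a 353-dimensional space, one coordinate per weight or bias.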


In [216]:
def calculate_accuracy(model):
    y_pred = model(train_x)
    y_pred = torch.where(y_pred>=0.5, 1, 0).flatten()
    accuracy = (y_pred == train_y).sum().float().item() / len(train_x)
    return accuracy

In [217]:
def set_params_particle(vector):
    # Rebuild a NeuralNetwork whose weights and biases come from the flat vector.
    param = []
    cum_sum = 0
    for i in range(len(size)):
        # Slice out this tensor's entries and restore its original shape.
        array = vector[cum_sum : cum_sum + size[i]].reshape(shape[i])
        cum_sum += size[i]
        param.append(array)

    model = NeuralNetwork()
    for idx, wei in enumerate(model.parameters()):
        wei.data = torch.tensor(param[idx]).to(torch.float32)

    return model

In [218]:
def get_model_params_vector(model):
    # Flatten all weights and biases of the model into one NumPy vector.
    vector = np.concatenate([p.detach().numpy().flatten() for p in model.parameters()])
    return vector

In [219]:
ants = 10
loops = 100
evaporation_rate = 0.2
influence_factor = 0.4

In [220]:
pheromones = np.ones(dim)
max_accuracy = 0
fittest_vector = None

for loop in range(loops):
    # Generate Solution
    paths = np.array([NeuralNetwork() for i in range(ants)])
    accuracy = []
    
    for ant in range(ants):
        # Flatten the weights and biases
        vector = get_model_params_vector(paths[ant])
        
        # Multiply with pheromones 
        vector = vector * pheromones
        
        # Calculate Accuracy and Append to the list
        model = set_params_particle(vector)
        acc = calculate_accuracy(model)
        accuracy.append(acc)
        
        # Update the updated path
        paths[ant] = model
        
        # Reset
        model = None
        acc = None
        
    # Select fittest path and accuracy
    paths = paths[np.argsort(accuracy)]
    
    if accuracy[np.argmax(accuracy)] > max_accuracy:
        max_accuracy = accuracy[np.argmax(accuracy)]
        fittest_vector = get_model_params_vector(paths[-1])
    
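    # Update pheromones from each ant's deviation from the fittest vector
    alpha = 0
    for ant in range(ants):
        # Flatten the weights and biases
        vector = get_model_params_vector(paths[ant])

        # Accumulate the scaled deviation from the fittest vector
        alpha += (vector - fittest_vector)*influence_factor

    # Note: the textbook ACO rule is (1 - evaporation_rate)*pheromones + deposit;
    # this variant scales (1 - pheromones) by evaporation_rate instead.
    pheromones = (1-pheromones)*evaporation_rate + alpha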
    
    if loop%10 == 0:
        print("Iters {} :".format(loop), calculate_accuracy(paths[-1]))

Iters 0 : 0.9045333333333333
Iters 10 : 0.9026666666666666
Iters 20 : 0.9026666666666666
Iters 30 : 0.9026666666666666
Iters 40 : 0.9026666666666666
Iters 50 : 0.9026666666666666
Iters 60 : 0.9026666666666666
Iters 70 : 0.9026666666666666
Iters 80 : 0.9026666666666666
Iters 90 : 0.9026666666666666
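
For comparison with the update used in the loop above, the textbook Ant Colony Optimization rule first evaporates every trail by a factor of `(1 - evaporation_rate)` and then adds the ants' deposits. The following is a minimal sketch of that rule on plain Python lists; the function `update_pheromones` and the deposit values are hypothetical and are not the notebook's implementation.

```python
def update_pheromones(pheromones, deposits, evaporation_rate):
    """Apply tau <- (1 - rho) * tau + sum of deposits, element by element.

    pheromones: list of current trail strengths
    deposits: list of per-ant deposit lists, each as long as pheromones
    evaporation_rate: rho, expected to lie in [0, 1]
    """
    if not 0.0 <= evaporation_rate <= 1.0:
        raise ValueError("evaporation_rate must lie in [0, 1]")
    updated = []
    for i, tau in enumerate(pheromones):
        total_deposit = sum(ant[i] for ant in deposits)
        updated.append((1.0 - evaporation_rate) * tau + total_deposit)
    return updated

# Illustrative values only: three trails and two ants.
pheromones = [1.0, 1.0, 1.0]
deposits = [[0.5, 0.0, 0.0], [0.5, 0.25, 0.0]]
new_pheromones = update_pheromones(pheromones, deposits, evaporation_rate=0.2)
print(new_pheromones)
```

Under this rule, a trail that receives no deposit decays geometrically, while trails reinforced by good solutions grow.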


In [222]:
print("Maximum Accuracy : ", max_accuracy)
best_model = set_params_particle(fittest_vector)

Maximum Accuracy :  0.9050666666666667
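
An accuracy figure is easier to judge next to the accuracy of always predicting the most frequent class. The sketch below computes that baseline for any list of labels; `majority_baseline_accuracy` is a hypothetical helper, and the toy labels are illustrative rather than taken from the dataset.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)

# Toy example: nine negatives and one positive.
toy_labels = [0] * 9 + [1]
print(majority_baseline_accuracy(toy_labels))  # 0.9
```

In this notebook the same function could be applied to `y_train.tolist()` to see how far the optimized network is above the baseline.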


### Classification Report

In [223]:
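y_pred = best_model(test_x)
y_pred = torch.where(y_pred>=0.5, 1, 0).flatten()
# Note: classification_report expects (y_true, y_pred). Here the predictions are
# passed first, so precision and recall are swapped and "support" counts predicted labels.
print(classification_report(y_pred,test_y))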

              precision    recall  f1-score   support

           0       1.00      0.91      0.95      1242
           1       0.04      0.62      0.08         8

    accuracy                           0.91      1250
   macro avg       0.52      0.77      0.52      1250
weighted avg       0.99      0.91      0.95      1250
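
When reading the report, note that scikit-learn's `classification_report` takes its arguments as `(y_true, y_pred)`. Reversing them swaps precision and recall for each class. The sketch below shows the effect with a small hand-written metric; `precision_recall` and the toy labels are hypothetical and only illustrate the point.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall

# Toy labels: four actual positives, two predicted positives, one of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

correct = precision_recall(y_true, y_pred)  # arguments in the intended order
swapped = precision_recall(y_pred, y_true)  # arguments reversed

print("correct order :", correct)
print("swapped order :", swapped)
```

With the arguments in the intended order, the support column also counts true labels per class rather than predicted labels.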

