## Building a Rectified Learner Unit (RELU) to be used on Titanic dataset.

After following Fastai course and on module 3, I want to use a jupyter notebook to build a Rectified Learner Unit which is a kind of basic building block for that is used in neural nets that is used and employed in deep learning to make machine learning algorithms.

The goal of this notebook is to explain and build the RELU while using the titanic training dataset obtained from kaggle as a test for this framework. This is built on some libraries like numpy, pytorch and some other frameworks used along the line and will be referenced as appropriate. Interesting to note here is that most of the libraries needed for this to function have all been imported from the one line `from fastai.basics import *` below which is as seen in the cell block below.

RELUs are simple linear equation algorithms that uses Gradient Descent for optimizations. And that is what we are going to be doing exactly.

In [1]:
#import modules
from fastai.basics import *
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from category_encoders import OneHotEncoder
from sklearn.pipeline import make_pipeline

In [2]:
#import our dataset
df = pd.read_csv("train.csv")
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [3]:
#perform some necessary cleaning, transformation and feature engineering
df.isnull().sum()/len(df)

PassengerId    0.000000
Survived       0.000000
Pclass         0.000000
Name           0.000000
Sex            0.000000
Age            0.198653
SibSp          0.000000
Parch          0.000000
Ticket         0.000000
Fare           0.000000
Cabin          0.771044
Embarked       0.002245
dtype: float64

In [4]:
#Delete cabin and PassengerId in dataset, drop null values of Age and Embarked columns
df.drop(columns = ["PassengerId", "Cabin"], inplace = True)
#df.dropna(inplace=True)
df['Age'].fillna(df["Age"].mode()[0], inplace=True)
df["Embarked"].fillna(df["Embarked"].mode()[0], inplace=True)

In [5]:
#Delete cabin colums -- High Cardinality
df.drop(columns = "Name", inplace=True) 

In [6]:
#Delete cabin colums  ---High cardinality
df.drop(columns = "Ticket", inplace=True)

In [7]:
#Test
df.isnull().sum()/len(df)

Survived    0.0
Pclass      0.0
Sex         0.0
Age         0.0
SibSp       0.0
Parch       0.0
Fare        0.0
Embarked    0.0
dtype: float64

In [8]:
df.head()

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0,3,male,22.0,1,0,7.25,S
1,1,1,female,38.0,1,0,71.2833,C
2,1,3,female,26.0,0,0,7.925,S
3,1,1,female,35.0,1,0,53.1,S
4,0,3,male,35.0,0,0,8.05,S


In [9]:
#Add dummy feature that will be used in the RELU matrx multiplication function
dummy_list = [1] * len(df)
df["dummy"] = dummy_list
df['dummy'].unique()

array([1], dtype=int64)

In [10]:
df.head()

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked,dummy
0,0,3,male,22.0,1,0,7.25,S,1
1,1,1,female,38.0,1,0,71.2833,C,1
2,1,3,female,26.0,0,0,7.925,S,1
3,1,1,female,35.0,1,0,53.1,S,1
4,0,3,male,35.0,0,0,8.05,S,1


In [11]:
df.drop("Survived", axis = 1).shape   ##Shape of dataset without target that is going into transformation

(891, 8)

Perform transformations such as standard scaler, and OneHotEncoding on categorical data sets

In [12]:
#First convert PClass to an object feature

df["Pclass"] = df["Pclass"].astype(str)
df["Pclass"].head()

0    3
1    1
2    3
3    1
4    3
Name: Pclass, dtype: object

In [34]:
target = "Survived"
Xt = df.drop(columns = ["dummy", target]) #Initiate Xt, transformed that will be used for matrix multiplication,
                                          #Drop dummy and target column, to avoid target and dummy getting transformed  

transform_pipe = make_pipeline(OneHotEncoder(), #Transformation initialized in a pipeline
                          StandardScaler())

Xt = transform_pipe.fit_transform(Xt)         #Transformer piplien object made and called on the Xt dataset

dm_array = np.array(df["dummy"])[:, None]     #Create dummy array of the dummy column to be added back to the Xt to 2D

Xt = np.concatenate([Xt, dm_array], axis = 1)  #Concatenate dummy array into Xt
Xt

array([[ 0.90258736, -0.56568542, -0.51015154, ..., -0.48204268,
        -0.30756234,  1.        ],
       [-1.10792599,  1.76776695, -0.51015154, ...,  2.0745051 ,
        -0.30756234,  1.        ],
       [ 0.90258736, -0.56568542, -0.51015154, ..., -0.48204268,
        -0.30756234,  1.        ],
       ...,
       [ 0.90258736, -0.56568542, -0.51015154, ..., -0.48204268,
        -0.30756234,  1.        ],
       [-1.10792599,  1.76776695, -0.51015154, ...,  2.0745051 ,
        -0.30756234,  1.        ],
       [ 0.90258736, -0.56568542, -0.51015154, ..., -0.48204268,
         3.25137334,  1.        ]])

In [35]:
Xt.shape

(891, 13)

From the above we see that we arrived at a dataset of 712 rows and 13 columns, which is because of the category encoders, Lets us see below the columns that have been created

In [15]:
print(transform_pipe.named_steps["onehotencoder"].get_feature_names())
print(len(transform_pipe.named_steps["onehotencoder"].get_feature_names()))

['Pclass_1', 'Pclass_2', 'Pclass_3', 'Sex_1', 'Sex_2', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked_1', 'Embarked_2', 'Embarked_3']
12


In [115]:
y = np.array(df[target])[:, None]
y.shape

(891, 1)

Since the dummy variable was not part of the features that went through the transformation pipeline, thus the length of the features generated by the transformer pipeline was 12 columns and plus 1(dummy feature) added manually by concatenation to make the final 13 columns that was generated

The next step is to instantiate a tensor of random values that will be the paramenters of our features.

In [50]:
#Initiate random seed
np.random.seed(42)
parameters = (np.random.random(13*1) - 0.5)[:, None]
parameters.shape

(13, 1)

Parameters that have been generated are random and are not responsive to a loss function yet until we put it into a tensor that does that. So the next step will involve using tensors and not numpy arrays to adjust the parameters

In [116]:
#Convert parameters to tensor object
parameters_t = torch.tensor(parameters)
parameters_t = parameters_t.float() #Convert to float cos of tensor calc
#Also convert y array into tensor
y_t = torch.tensor(y)
y_t = y_t.float()

In [182]:
#First let us define our loss function that our tensor of parameters will be adjusted by
# def mae(acts, preds): 
#     return (torch.abs(preds - acts)).mean() #Convert prediction to sigmoid prediction

# #Also define function that makes prediction using tensor parameters and clipping values of y (A RELU function)
# def tensor_prediction(params):
#     result = torch.matmul(torch.tensor(Xt), params)
#     return torch.clip(result, 0.)

In order to use a dataloader for training our model in batches, let us zip the training and prediction values

In [117]:
#Convert Xt to tensor
X_t = tensor(Xt)
X_t = X_t.float()

In [124]:
dset = list(zip(X_t, y_t))
x,y = dset[0]
x.shape, y.shape

(torch.Size([13]), torch.Size([1]))

In [125]:
#Split dataset into valid and train
train_dset = dset[:700]
valid_dset = dset[700:]

#Test if sum of lengths equal to 891
print(len(train_dset) + len(valid_dset))

891


In [132]:
def sigmoid_loss(preds, acts):
    preds = preds.sigmoid()
    return torch.where(acts==1, 1-preds, preds).mean().float()

#Also define function that makes prediction using datasets
#Since we will be using dataloaders, using dataset as input is
#better than using parameteer
def tensor_prediction(tensor_data):
    tensor_data = tensor_data
    #result = torch.matmul(torch.tensor(Xt), params)
    result = tensor_data@parameters_t
    return result

Using the sigmoid function and the corresponding loss function as opposed to the former loss function.

The sigmoid function helps to create a function that gradually smoothens out the loss function output as the parameters change a little bit (Gradient Descent). The sigmoid function uses a formula as seen in this [wiki page](https://en.wikipedia.org/wiki/Sigmoid_function).

It takes in the raw predictions made from the parameters and convert their predictions to be between 0 and 1 as a continous value. Note: No negative values or values above 1.

The torch.where that accompanies the loss function works to penalize errors that are made when it makes prediction using the ffg ways:
1. We are comparing each prediction and corresponding actual values.
2. When the actual is 1, we return 1 minus the prediction made. This will have the following effect:
>- When the prediction is close to 1, e.g., 0.9. The returned loss will be very small or zero if the prediction itself is 1
>- If the prediction is not close to 1 however, the loss returned will be large. E.g 0.1 will return loss of 0.9
3. When the actual is 0, we return the prediction made. This will have the following effect:
>- When the prediction is close to 1, e.g., 0.9. The returned loss will be very large because it is confident that it is close to 1 whereas the actual is 0
>- When the prediction is close to 0, The returned loss will be very small or zero if the prediction itself is 0

In [133]:
#Let us calculate the mae from the first random initial parameters that was generated

sigmoid_loss(tensor_prediction(X_t), y_t)

tensor(0.5336)

From the above we see the sigmoid being so high, so next we will initiate the adjustable torch parameters that will be adjusted based on gradient descent

In [60]:
#initiate requires_grad with the torch method for the parameters
parameters_t.requires_grad_()

tensor([[-0.1255],
        [ 0.4507],
        [ 0.2320],
        [ 0.0987],
        [-0.3440],
        [-0.3440],
        [-0.4419],
        [ 0.3662],
        [ 0.1011],
        [ 0.2081],
        [-0.4794],
        [ 0.4699],
        [ 0.3324]], requires_grad=True)

In order to begin training our data sets, we will do it batch by batch

In [126]:
dl = DataLoader(train_dset)
valid_dl = DataLoader(valid_dset)

In [78]:
#Define function that takes returns the loss and takes the loss backward

def calc_grad(xb, yb, model):
    pred = model(xb)
    #Take the loss
    loss = sigmoid_loss(pred, yb)
    loss.backward()

In [71]:
#Define function that train each batch and that adjusts the parameters
def train_epoch(model, lr, params):
    for xb, yb in dl:
        calc_grad(xb, yb, model)
        with torch.no_grad(): params -= params.grad*lr

In [65]:
#Put the mae_loss function into a variable that calculates the loss based on the adjustable parameters
#Test on the initial parameters seleced to see function is running properly --It should return
#same loss as above --0.6577

# loss = mae_loss(parameters_t)
# loss

In [64]:
#retrive the gradient of each parameters
#each parameter has a partial derivative with respect to the loss function chosen
#This is what is printed out as their gradients below
# loss.backward()   #This method initiates the grad attribute that will be needed to checked
# parameters_t.grad

In [85]:
#Define accuracy function that takes accuracy of the valid_dl with eacch adjusted paramter
#xb is the result from tensor_prediction of xbatch of valid_dl
#yb is the actual from ybatch of valid_dl
def batch_accuracy(xb, yb):
    xb = xb.sigmoid()
    correct = ((xb > 0.5) == yb).float()
    return correct.mean()

def valid_accuracy(model):
    accs = [batch_accuracy(model(xb), yb) for xb, yb in valid_dl]
    accs = round(torch.stack(accs).mean().item(), 4)
    return accs

From the gradients displayed from the information above, we see that some parameters do have a negative gradient and with increase in their values, they would tend to get to the minimum values and lower the loss. The opposite is true for the positve gradients. Because in order to reduce the loss, we need a minima (gradient) values (close to zero).

In [189]:
#Define metrics for eye judgement
# def acc1RELU(params):
#     pred = tensor_prediction(params)
#     preds = [1 if pre>0.5 else 0 for pre in pred]
#     return accuracy_score(y_t, preds)

In [99]:
#Using gradient descent to perform the paramaters adjustment and print the accuracy for each epoch

# for i in range(30):
#     loss = mae_loss(parameters_t)
#     loss.backward()
#     with torch.no_grad(): parameters_t -= parameters_t.grad*0.01
#     accuracy = acc1RELU(parameters_t)
#     print(f'epoch={i}; accuracy={accuracy: .2f}')

In [87]:
for i in range(10):
    train_epoch(tensor_prediction, 0.01, parameters_t)
    accuracy = valid_accuracy(tensor_prediction)
    print(f'epoch={i}; accuracy={accuracy}')

epoch=0; accuracy=0.801
epoch=1; accuracy=0.801
epoch=2; accuracy=0.801
epoch=3; accuracy=0.801
epoch=4; accuracy=0.801
epoch=5; accuracy=0.801
epoch=6; accuracy=0.801
epoch=7; accuracy=0.801
epoch=8; accuracy=0.801
epoch=9; accuracy=0.801


The parameters have been adjusted up to an optimal value (based on the range of values we chose)

In [88]:
#With this change above, let us see the final parameters
parameters_t

tensor([[ -148.0764],
        [  -24.4566],
        [  208.4151],
        [-1730.8922],
        [ 1730.6466],
        [ -223.9242],
        [ -620.9936],
        [  313.7205],
        [ -235.7989],
        [ -164.8757],
        [   -4.5441],
        [  268.5114],
        [ -731.7578]], requires_grad=True)

In [92]:
#Let us use this value to make predictions and check the accuracy_score of our prediction
y_pred = tensor_prediction(X_t)
y_pred = [1 if pred>0.5 else 0 for pred in y_pred]
print(f"Accuracy score: {accuracy_score(y_t, y_pred).round(2)}")

Accuracy score: 0.79


This above is valid and used for creating a one layer RELUnet. In order to create a double layer RELUnet function, some tweaks with the ready made materials made from above will be employed.

In [97]:
#Create param1 and param2 which will adjustable parameters

# param1 = torch.tensor((np.random.random(13*1) - 0.5)[:, None])
# param2 = torch.tensor((np.random.random(13*1) - 0.5)[:, None])
# #param1.requires_grad_()
# #param2.requires_grad_()
# print(param1, param2)

In [98]:
# grand_params = torch.hstack([param1, param2])
# grand_params.requires_grad_()

In [96]:
# learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,
#                 loss_func=sigmoid_loss, metrics=batch_accuracy)

#To construct a simple 1 layer neural net, this code will do summarily all that we did before this code--
#With its .fit(no_of_epochs, lr) method

Using custom made classes and functions from FastAi and pytorch to create a two layer RELU net

In [127]:
#Construct dataloaders
dls = DataLoaders(dl, valid_dl)

In [128]:
#Constructing double layer neural net
simple_net = nn.Sequential(
    nn.Linear(13, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

In [134]:
#Create model architecture
learn = Learner(dls, simple_net, opt_func=SGD, 
               loss_func=sigmoid_loss, metrics=batch_accuracy)
learn.fit(40, 0.01)

[0, 0.39646580815315247, 0.38594138622283936, 0.7015706896781921, '00:03']
[1, 0.29197683930397034, 0.282769113779068, 0.8167539238929749, '00:02']
[2, 0.24068117141723633, 0.22123922407627106, 0.8272251486778259, '00:02']
[3, 0.21912850439548492, 0.1993936002254486, 0.8324607610702515, '00:02']
[4, 0.21071785688400269, 0.19100947678089142, 0.8376963138580322, '00:02']
[5, 0.20730893313884735, 0.18663519620895386, 0.8272251486778259, '00:02']
[6, 0.20532594621181488, 0.18374621868133545, 0.8219895362854004, '00:02']
[7, 0.20394663512706757, 0.18186412751674652, 0.8272251486778259, '00:04']
[8, 0.203352689743042, 0.18056704103946686, 0.8324607610702515, '00:02']
[9, 0.20296187698841095, 0.17956002056598663, 0.8324607610702515, '00:03']
[10, 0.20256441831588745, 0.17876222729682922, 0.8324607610702515, '00:03']
[11, 0.20217667520046234, 0.17811216413974762, 0.8324607610702515, '00:02']
[12, 0.20192135870456696, 0.17761307954788208, 0.8324607610702515, '00:03']
[13, 0.20159770548343658, 0

In [138]:
column_names = ["train_loss", "valid_loss", "batch_accuracy"]
values = learn.recorder.values

pd.DataFrame(values, columns=column_names)

Unnamed: 0,train_loss,valid_loss,batch_accuracy
0,0.396466,0.385941,0.701571
1,0.291977,0.282769,0.816754
2,0.240681,0.221239,0.827225
3,0.219129,0.199394,0.832461
4,0.210718,0.191009,0.837696
5,0.207309,0.186635,0.827225
6,0.205326,0.183746,0.82199
7,0.203947,0.181864,0.827225
8,0.203353,0.180567,0.832461
9,0.202962,0.17956,0.832461


In [222]:
#Double RELU function that takes advantage of the matrix multpilication
# def grand_pred(params):
#     result = torch.matmul(torch.tensor(Xt), params)
#     #result = torch.clip(result, 0.)
#     result = result[:,0] + result[:,1]
#     result = result.sigmoid()
#     return result[:,None]

In [223]:
#Define mae_loss
# def grand_mae_loss(params):
#     return mae(y_t, grand_pred(params))

In [139]:
# grand_loss = grand_mae_loss(grand_params)
# grand_loss

In [225]:
# grand_loss.backward()

In [140]:
#Print the gradient of each 
# grand_params.grad

In [141]:
#Define loss function for RELU2net

# def acc2RELU(params):
#     pred = grand_pred(params)
#     preds = [1 if pre>0.5 else 0 for pre in pred]
#     return accuracy_score(y_t, preds)

In [142]:
#Take loop through the gadient descent
# for i in range(30):
#     grand_loss = grand_mae_loss(grand_params)
#     grand_loss.backward()
#     with torch.no_grad(): grand_params -= grand_params.grad*0.01
#     accuracy2 = acc2RELU(grand_params)
#     print(f'epoch={i+1}; accuracy={accuracy2:.2f}')

In [143]:
# #What is the final grand parameter
# grand_params

In [144]:
#Let us use this value to make predictions and check the accuracy_score of our prediction
# y_pred = grand_pred(grand_params)
# y_pred = [1 if pred>0.5 else 0 for pred in y_pred]
# print(f"Accuracy score: {accuracy_score(y_t, y_pred).round(2)}")

In [204]:
naive_model=round(df["Survived"].value_counts(normalize=True).max(), 4)
print(f"A simple naive model wil have an accuracy of {naive_model}")

A simple naive model wil have an accuracy of 0.6162


From this, we see that the RELUnet that we made does far well than a simlple and naive model.

In [147]:
#Trying both model on the test set
#first define wrangle function

def wrangle(filepath):
    
    #Read in filepath
    data = pd.read_csv(filepath)
    #Obtain dataindex
    data_index = data["PassengerId"]
    
    #Drop the following stated columns
    data.drop(columns = ["PassengerId", "Cabin", "Name", "Ticket"], inplace = True)
    
    #Fill missing values
    data["Age"].fillna(data["Age"].mode()[0], inplace=True)
    data["Fare"].fillna(data["Fare"].mean(), inplace=True)
    #Convert the Pclass column to string type
    data["Pclass"] = data["Pclass"].astype(str)
    
    #create dummy
    dummy = [1] * len(data)
    
    #Add transformation to the dataset
    train_trans = transform_pipe.fit_transform(data)        
    
    #Concatenate dummy array into train_trans
    dummy_array = np.array(dummy)[:, None]     
    train_trans = np.concatenate([train_trans, dummy_array], axis = 1)  
    return train_trans, data_index

In [148]:
test_train, idx = wrangle("test.csv")

In [234]:
test_train.shape

(418, 13)

In [208]:
def pred_data(data, model_parameters):
    result = torch.matmul(torch.tensor(data), model_parameters)
    result = torch.clip(result, 0.)
    if model_parameters.shape[1] > 1:
        result = result[:,0] + result[:,1]
        return result[:,None]
    return result[:,None]

In [209]:
RELU1_pred = pred_data(test_train, parameters_t)
RELU1_pred = [1 if pred>0.5 else 0 for pred in RELU1_pred]
print(RELU1_pred)

[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 

In [210]:
RELU2_pred = pred_data(test_train, grand_params)
RELU2_pred = [1 if pred>0.5 else 0 for pred in RELU2_pred]
print(RELU2_pred)

[0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 

In [211]:
pred = pd.Series(RELU2_pred, index=idx).rename("prediction").to_frame()

In [212]:
pred1 = pd.Series(RELU1_pred, index=idx).rename("prediction").to_frame()

In [213]:
pred.reset_index(inplace=True)

In [214]:
pred.rename(columns={"index": "PassengerId", "prediction":"Survived"}, inplace=True)

In [215]:
pred.to_csv("RELU20121prediction.csv", index=False)

## Limitations

- The whole process can be more flexible than many hardcodcoding done in this notebook. I only did this as a form of practical learning and trying my hands on what I was taught. A suitable way to go about this might be encapsulate it all as a class

- A double RELUnet will possibly take more time(epochs) to train than a single RELUnet, thus on a more refined model construction, the double RELUnet model should do better than the single net model. ## It now does. Check READme for reason why it was not performing as expected

## Image of entry into the titanic dataset competition with self built RELU network

![image](Images/kaggle.png)