## Building a Rectified Linear Unit (RELU) for the Titanic dataset

After working through the fast.ai course up to module 3, I want to use a Jupyter notebook to build a Rectified Linear Unit (RELU), a basic building block of the neural networks used in deep learning.

The goal of this notebook is to explain and build the RELU, using the Titanic training dataset from Kaggle as a test case. It relies on NumPy, PyTorch and a few other libraries, which are referenced as they come up. Note that most of the libraries needed are brought in by the single line `from fastai.basics import *` in the cell below.

A RELU is a simple linear equation whose negative outputs are replaced with zero, and its parameters are optimized with gradient descent. That is exactly what we are going to do here.
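
As a minimal sketch of that idea (the toy numbers and the name `relu_predict` are illustrative assumptions, not part of this notebook), a RELU computes a weighted sum of the features and replaces any negative result with zero:

```python
import numpy as np

def relu_predict(X, w):
    """Weighted sum of the features, clipped below at zero."""
    return np.maximum(X @ w, 0.0)

# Toy data: 3 rows, 2 features (values are made up for illustration)
X = np.array([[1.0, 2.0],
              [-1.0, 0.5],
              [0.0, -3.0]])
w = np.array([[1.0], [0.5]])

out = relu_predict(X, w)
print(out)  # rows whose weighted sum is negative come out as 0
```

The same clipping is done later in the notebook with `torch.clip(result, 0.)`.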

In [36]:
#import modules
from fastai.basics import *
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from category_encoders import OneHotEncoder
from sklearn.pipeline import make_pipeline

In [9]:
#import our dataset
df = pd.read_csv("train.csv")
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [12]:
#perform some necessary cleaning, transformation and feature engineering
df.isnull().sum()/len(df)

PassengerId    0.000000
Survived       0.000000
Pclass         0.000000
Name           0.000000
Sex            0.000000
Age            0.198653
SibSp          0.000000
Parch          0.000000
Ticket         0.000000
Fare           0.000000
Cabin          0.771044
Embarked       0.002245
dtype: float64

In [14]:
#Delete cabin and PassengerId in dataset, drop null values of Age and Embarked columns
df.drop(columns = ["PassengerId", "Cabin"], inplace = True)
df.dropna(inplace=True)

In [18]:
#Delete Name column -- high cardinality
df.drop(columns = "Name", inplace=True)

In [24]:
#Delete Ticket column -- high cardinality
df.drop(columns = "Ticket", inplace=True)

In [25]:
#Check that no missing values remain
df.isnull().sum()/len(df)

Survived    0.0
Pclass      0.0
Sex         0.0
Age         0.0
SibSp       0.0
Parch       0.0
Fare        0.0
Embarked    0.0
dtype: float64

In [26]:
df.head()

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0,3,male,22.0,1,0,7.25,S
1,1,1,female,38.0,1,0,71.2833,C
2,1,3,female,26.0,0,0,7.925,S
3,1,1,female,35.0,1,0,53.1,S
4,0,3,male,35.0,0,0,8.05,S


In [29]:
#Add a dummy feature (a column of ones) that will be used in the RELU matrix multiplication function
dummy_list = [1] * len(df)
df["dummy"] = dummy_list
df['dummy'].unique()

array([1], dtype=int64)
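
The reason for the dummy column can be sketched as follows. With a column of ones appended to the features, the last weight acts as the intercept, so one matrix multiplication covers both the weighted sum and the constant term. The numbers and variable names below are illustrative assumptions only:

```python
import numpy as np

X = np.array([[2.0, 3.0],
              [4.0, 5.0]])
w = np.array([[0.1], [0.2]])   # feature weights (illustrative values)
b = 0.7                        # intercept (illustrative value)

# Explicit form: weighted sum plus intercept
explicit = X @ w + b

# Same result with a column of ones and the intercept stored as the last weight
X_aug = np.concatenate([X, np.ones((len(X), 1))], axis=1)
w_aug = np.vstack([w, [[b]]])
with_dummy = X_aug @ w_aug

print(np.allclose(explicit, with_dummy))
```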

In [30]:
df.head()

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked,dummy
0,0,3,male,22.0,1,0,7.25,S,1
1,1,1,female,38.0,1,0,71.2833,C,1
2,1,3,female,26.0,0,0,7.925,S,1
3,1,1,female,35.0,1,0,53.1,S,1
4,0,3,male,35.0,0,0,8.05,S,1


In [69]:
df.drop("Survived", axis = 1).shape   ##Shape of dataset without target that is going into transformation

(712, 8)

Next, apply transformations to the features: one-hot encoding for the categorical columns, followed by standard scaling.
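
To make the two transformations concrete, here is a hand-rolled sketch in plain NumPy, using the first five `Fare` and `Embarked` values shown earlier. It only illustrates the arithmetic; the notebook itself uses `OneHotEncoder` and `StandardScaler`, and the column order here (alphabetical) may differ from the encoder's.

```python
import numpy as np

# Standard scaling: subtract the column mean, divide by the column standard deviation
fare = np.array([7.25, 71.2833, 7.925, 53.1, 8.05])
fare_scaled = (fare - fare.mean()) / fare.std()

# One-hot encoding: one 0/1 column per distinct category
embarked = np.array(["S", "C", "S", "S", "S"])
categories = np.unique(embarked)                  # sorted distinct labels
one_hot = (embarked[:, None] == categories[None, :]).astype(float)

print(fare_scaled.round(3))
print(categories)
print(one_hot)
```

Because the scaler comes after the encoder in the pipeline, the one-hot columns in the notebook are scaled as well, which is why they do not appear as plain 0s and 1s in the transformed array.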

In [32]:
#First convert Pclass to a string (object) feature so that it is treated as categorical

df["Pclass"] = df["Pclass"].astype(str)
df["Pclass"].head()

0    3
1    1
2    3
3    1
4    3
Name: Pclass, dtype: object

In [67]:
target = "Survived"
Xt = df.drop(columns = ["dummy", target]) #Initialize Xt, the features that will be transformed and used for matrix multiplication
                                          #Drop the dummy and target columns so that they are not transformed

transform_pipe = make_pipeline(OneHotEncoder(), #Transformations combined in a pipeline
                          StandardScaler())

Xt = transform_pipe.fit_transform(Xt)         #Fit the pipeline on Xt and transform it

dm_array = np.array(df["dummy"])[:, None]     #Turn the dummy column into a 2D array so it can be added back to Xt

Xt = np.concatenate([Xt, dm_array], axis = 1)  #Concatenate the dummy array onto Xt
Xt

array([[ 1.00281295, -0.59032605, -0.56653751, ..., -0.47261792,
        -0.20232566,  1.        ],
       [-0.99719495,  1.69397911, -0.56653751, ...,  2.11587407,
        -0.20232566,  1.        ],
       [ 1.00281295, -0.59032605, -0.56653751, ..., -0.47261792,
        -0.20232566,  1.        ],
       ...,
       [-0.99719495,  1.69397911, -0.56653751, ..., -0.47261792,
        -0.20232566,  1.        ],
       [-0.99719495,  1.69397911, -0.56653751, ...,  2.11587407,
        -0.20232566,  1.        ],
       [ 1.00281295, -0.59032605, -0.56653751, ..., -0.47261792,
         4.94252683,  1.        ]])

In [68]:
Xt.shape

(712, 13)

From the above we see that we arrive at a dataset of 712 rows and 13 columns; the extra columns come from the one-hot encoder. Let us look at the columns it created.

In [79]:
print(transform_pipe.named_steps["onehotencoder"].get_feature_names())
print(len(transform_pipe.named_steps["onehotencoder"].get_feature_names()))

['Pclass_1', 'Pclass_2', 'Pclass_3', 'Sex_1', 'Sex_2', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked_1', 'Embarked_2', 'Embarked_3']
12


In [84]:
y = np.array(df[target])[:, None]
y.shape

(712, 1)

The dummy variable did not go through the transformation pipeline, so the pipeline produced 12 columns; the dummy column, added back manually by concatenation, brings the total to the final 13.

The next step is to create an array of random values that will serve as the initial parameters for our features.

In [204]:
#Initiate random seed
np.random.seed(42)
parameters = (np.random.random(13*1) - 0.5)[:, None]
parameters.shape

The parameters generated so far are random NumPy values, and NumPy cannot compute their gradients with respect to a loss function. The next step therefore converts them to PyTorch tensors, which can track gradients and let us adjust the parameters.
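
As a quick sanity check on the shapes involved, the sketch below uses a placeholder matrix of ones (an assumption standing in for `Xt`) together with parameters drawn the same way as above:

```python
import numpy as np

np.random.seed(42)
params = (np.random.random(13) - 0.5)[:, None]   # 13 values in [-0.5, 0.5), as a column

X_demo = np.ones((712, 13))                      # placeholder with the same shape as Xt
preds = X_demo @ params                          # (712, 13) @ (13, 1) -> (712, 1)

print(params.shape, preds.shape)
```

One parameter per column of the feature matrix gives one prediction per passenger.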

In [247]:
#Convert parameters to tensor object
parameters_t = torch.tensor(parameters)
#Also convert y array into tensor
y_t = torch.tensor(y)

In [248]:
#First let us define our loss function that our tensor of parameters will be adjusted by
def mae(acts, preds): return (torch.abs(preds - acts)).mean()

#Also define function that makes prediction using tensor parameters and clipping values of y (A RELU function)
def tensor_prediction(params):
    result = torch.matmul(torch.tensor(Xt), params)
    return torch.clip(result, 0.)

In [250]:
#Let us calculate the mae for the initial random parameters

mae(y_t, tensor_prediction(parameters_t))

tensor(0.6577, dtype=torch.float64)

A mean absolute error of 0.6577 is high for targets that are either 0 or 1, so next we make the torch parameters adjustable so that gradient descent can tune them.
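
For reference, the mean absolute error is just the average size of the prediction errors. A small worked example in NumPy, with made-up labels and predictions:

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return np.abs(predicted - actual).mean()

actual = np.array([0.0, 1.0, 1.0, 0.0])        # made-up survival labels
predicted = np.array([0.2, 0.4, 1.5, 0.0])     # made-up RELU outputs

# Absolute errors are 0.2, 0.6, 0.5 and 0.0, so the mean is 1.3 / 4 = 0.325
print(mae(actual, predicted))
```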

In [251]:
#Enable gradient tracking on the parameters with the torch method requires_grad_()
parameters_t.requires_grad_()

tensor([[-0.1255],
        [ 0.4507],
        [ 0.2320],
        [ 0.0987],
        [-0.3440],
        [-0.3440],
        [-0.4419],
        [ 0.3662],
        [ 0.1011],
        [ 0.2081],
        [-0.4794],
        [ 0.4699],
        [ 0.3324]], dtype=torch.float64, requires_grad=True)

In [252]:
#Define function mae_loss that calculates the MAE loss given only the parameters
def mae_loss(params):
    return mae(y_t, tensor_prediction(params))

In [253]:
#Store the result of mae_loss in a variable so that the loss is tied to the adjustable parameters
#Test on the initial parameters selected to see that the function runs properly -- it should return
#the same loss as above -- 0.6577

loss = mae_loss(parameters_t)
loss

tensor(0.6577, dtype=torch.float64, grad_fn=<MeanBackward0>)

In [255]:
#Retrieve the gradient of each parameter
#The loss has a partial derivative with respect to each parameter
#These are what is printed out as the gradients below
loss.backward()   #This method populates the grad attribute that we inspect next
parameters_t.grad

tensor([[ 0.0474],
        [-0.0481],
        [-0.0061],
        [ 0.3478],
        [-0.3478],
        [-0.0189],
        [-0.1106],
        [-0.0123],
        [-0.0739],
        [ 0.0887],
        [-0.1591],
        [ 0.1265],
        [ 0.3750]], dtype=torch.float64)

Looking at the gradients above, some parameters have a negative gradient: increasing those parameters lowers the loss. The opposite holds for parameters with a positive gradient, where decreasing the value lowers the loss. Gradient descent therefore moves each parameter a small step against its gradient, and near a minimum of the loss the gradients are close to zero.
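
The sign rule can be checked on a one-parameter toy problem. In this sketch the loss `abs(w - 3)` and the helper `numeric_grad` are illustrative assumptions, not the notebook's loss; the gradient is estimated by finite differences rather than by autograd:

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return abs(w - 3.0)

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

lr = 0.5
for w in (1.0, 5.0):
    g = numeric_grad(loss, w)
    w_new = w - lr * g          # gradient descent step: move against the gradient
    print(f"w={w:.2f} grad={g:+.2f} new_w={w_new:.2f} "
          f"loss {loss(w):.2f} -> {loss(w_new):.2f}")
```

Starting left of the minimum the gradient is negative and the step increases `w`; starting right of it the gradient is positive and the step decreases `w`. In both cases the loss goes down.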

In [256]:
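#Use gradient descent to adjust the parameters
#Note: .grad is never zeroed here, so each backward() call adds to the gradients already stored
#(including those from the cell above); calling parameters_t.grad.zero_() after each update
#would give plain gradient descent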
for i in range(20):
    loss = mae_loss(parameters_t)
    loss.backward()
    with torch.no_grad(): parameters_t -= parameters_t.grad*0.01
    print(f'step={i}; loss={loss:.2f}')

step=0; loss=0.66
step=1; loss=0.65
step=2; loss=0.64
step=3; loss=0.62
step=4; loss=0.60
step=5; loss=0.57
step=6; loss=0.54
step=7; loss=0.52
step=8; loss=0.49
step=9; loss=0.46
step=10; loss=0.44
step=11; loss=0.42
step=12; loss=0.40
step=13; loss=0.38
step=14; loss=0.36
step=15; loss=0.34
step=16; loss=0.33
step=17; loss=0.32
step=18; loss=0.31
step=19; loss=0.31


After 20 steps the loss has dropped from 0.66 to 0.31. The parameters are better than the random starting values, though only as good as the number of steps and the learning rate we chose allow.
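
To show what the loop does without relying on autograd, here is a hedged sketch of the same procedure in plain NumPy on synthetic data. The data, the starting point, the step count and the helper `mae_and_grad` are all assumptions made for illustration; the gradient is derived by hand with the chain rule and recomputed from scratch at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])  # 2 toy features + dummy column
true_w = np.array([[0.8], [-0.5], [0.3]])                      # made-up generating weights
y = np.maximum(X @ true_w, 0.0)

def mae_and_grad(w):
    z = X @ w
    pred = np.maximum(z, 0.0)
    err = pred - y
    loss = np.abs(err).mean()
    # Chain rule: d|err|/dpred = sign(err), dpred/dz = 1 where z > 0, dz/dw = X
    grad = X.T @ (np.sign(err) * (z > 0)) / len(X)
    return loss, grad

w = np.full((3, 1), 0.1)            # arbitrary starting point
losses = []
for step in range(300):
    loss, grad = mae_and_grad(w)    # fresh gradient at every step
    losses.append(loss)
    w -= 0.01 * grad                # same learning rate as in the notebook

print(f"first loss={losses[0]:.3f}, last loss={losses[-1]:.3f}")
```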

In [257]:
#With this change above, let us see the final parameters
parameters_t

tensor([[-0.1569],
        [ 0.4731],
        [ 0.2458],
        [-0.4885],
        [ 0.2432],
        [-0.2807],
        [-0.2634],
        [ 0.2648],
        [ 0.1585],
        [ 0.0437],
        [-0.1444],
        [ 0.1555],
        [-0.3334]], dtype=torch.float64, requires_grad=True)

In [259]:
#Let us use this value to make predictions and check the accuracy_score of our prediction
y_pred = tensor_prediction(parameters_t)
y_pred = [1 if pred>0.5 else 0 for pred in y_pred]
print(f"Accuracy score: {accuracy_score(y_t, y_pred).round(2)}")

Accuracy score: 0.79
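
The thresholding step turns the continuous RELU outputs into class labels. A small sketch with made-up scores and labels:

```python
import numpy as np

scores = np.array([0.1, 0.7, 0.5, 1.3, 0.0])   # made-up RELU outputs
labels = np.array([0, 1, 1, 1, 0])             # made-up true labels

predicted = (scores > 0.5).astype(int)          # a score of exactly 0.5 maps to class 0
accuracy = (predicted == labels).mean()

print(predicted, accuracy)
```

Note that this accuracy, like the 0.79 above, says nothing about unseen data when it is computed on the same rows the parameters were fitted to.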


Everything above builds a RELUnet with a single RELU. To build a double RELUnet, whose prediction is the sum of two RELUs, we reuse the pieces defined above with a few tweaks.
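
The idea can be sketched on toy numbers first (all values below are made up). Stacking the two weight vectors side by side lets a single matrix multiplication evaluate both RELUs, whose outputs are then added row by row:

```python
import numpy as np

X = np.array([[ 1.0, 1.0, 1.0],
              [-2.0, 1.0, 1.0]])        # 2 rows: 2 toy features + dummy column
W = np.array([[1.0, -1.0],
              [0.5,  1.0],
              [0.0,  0.5]])             # one column of weights per RELU

hidden = np.maximum(X @ W, 0.0)                  # shape (2, 2): one column per RELU
combined = hidden.sum(axis=1, keepdims=True)     # add the two RELUs, keep a column shape

print(hidden)
print(combined)
```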

In [348]:
#Create param1 and param2, which will be the adjustable parameters

param1 = torch.tensor((np.random.random(13*1) - 0.5)[:, None])
param2 = torch.tensor((np.random.random(13*1) - 0.5)[:, None])
#param1.requires_grad_()
#param2.requires_grad_()
print(param1, param2)

tensor([[ 0.4247],
        [ 0.3773],
        [-0.2421],
        [ 0.1600],
        [ 0.3172],
        [ 0.0552],
        [ 0.0297],
        [-0.2581],
        [-0.4069],
        [ 0.3972],
        [ 0.4004],
        [ 0.1331],
        [-0.1610]], dtype=torch.float64) tensor([[-0.1508],
        [ 0.2260],
        [ 0.3971],
        [ 0.3871],
        [ 0.2799],
        [ 0.1420],
        [-0.4159],
        [-0.3384],
        [ 0.3986],
        [ 0.1064],
        [-0.4908],
        [-0.3985],
        [ 0.1635]], dtype=torch.float64)


In [349]:
grand_params = torch.hstack([param1, param2])
grand_params.requires_grad_()

tensor([[ 0.4247, -0.1508],
        [ 0.3773,  0.2260],
        [-0.2421,  0.3971],
        [ 0.1600,  0.3871],
        [ 0.3172,  0.2799],
        [ 0.0552,  0.1420],
        [ 0.0297, -0.4159],
        [-0.2581, -0.3384],
        [-0.4069,  0.3986],
        [ 0.3972,  0.1064],
        [ 0.4004, -0.4908],
        [ 0.1331, -0.3985],
        [-0.1610,  0.1635]], dtype=torch.float64, requires_grad=True)

In [350]:
#Double RELU function that takes advantage of matrix multiplication
def grand_pred(params):
    result = torch.matmul(torch.tensor(Xt), params)
    result = torch.clip(result, 0.)
    result = result[:,0] + result[:,1]
    return result[:,None]

In [351]:
#Define mae_loss
def grand_mae_loss(params):
    return mae(y_t, grand_pred(params))

In [352]:
grand_loss = grand_mae_loss(grand_params)
grand_loss

tensor(0.6797, dtype=torch.float64, grad_fn=<MeanBackward0>)

In [353]:
grand_loss.backward()

In [354]:
#Print the gradient of each parameter
grand_params.grad

tensor([[ 0.1567, -0.0579],
        [-0.0238,  0.0227],
        [-0.1583,  0.0442],
        [ 0.2289,  0.2249],
        [-0.2289, -0.2249],
        [ 0.1052,  0.1994],
        [-0.1001, -0.1933],
        [-0.1548, -0.2496],
        [-0.1172, -0.0379],
        [ 0.1797,  0.1875],
        [-0.1866, -0.1608],
        [-0.0132, -0.0813],
        [ 0.2795,  0.4017]], dtype=torch.float64)

In [355]:
#Loop through the gradient descent steps (as before, .grad is not zeroed between steps, so gradients accumulate)
for i in range(20):
    grand_loss = grand_mae_loss(grand_params)
    grand_loss.backward()
    with torch.no_grad(): grand_params -= grand_params.grad*0.01
    print(f'step={i}; loss={grand_loss:.2f}')

step=0; loss=0.68
step=1; loss=0.66
step=2; loss=0.64
step=3; loss=0.60
step=4; loss=0.56
step=5; loss=0.52
step=6; loss=0.47
step=7; loss=0.43
step=8; loss=0.41
step=9; loss=0.39
step=10; loss=0.37
step=11; loss=0.35
step=12; loss=0.34
step=13; loss=0.32
step=14; loss=0.31
step=15; loss=0.30
step=16; loss=0.30
step=17; loss=0.30
step=18; loss=0.30
step=19; loss=0.30


In [356]:
#What is the final grand parameter
grand_params

tensor([[ 0.1929,  0.0450],
        [ 0.3915,  0.1051],
        [ 0.0137,  0.2922],
        [-0.0961,  0.0514],
        [ 0.5733,  0.6156],
        [-0.0907, -0.2422],
        [ 0.1265, -0.0912],
        [-0.0277,  0.0952],
        [-0.2602,  0.3273],
        [ 0.1795, -0.1568],
        [ 0.6057, -0.2741],
        [ 0.1905, -0.2667],
        [-0.5955, -0.4883]], dtype=torch.float64, requires_grad=True)

In [357]:
#Let us use this value to make predictions and check the accuracy_score of our prediction
y_pred = grand_pred(grand_params)
y_pred = [1 if pred>0.5 else 0 for pred in y_pred]
print(f"Accuracy score: {accuracy_score(y_t, y_pred).round(2)}")

Accuracy score: 0.77


In [366]:
naive_model=round(df["Survived"].value_counts(normalize=True).max(), 4)
print(f"A simple naive model will have an accuracy of {naive_model}")

A simple naive model will have an accuracy of 0.5955


From this, we see that the RELUnets we built (0.79 and 0.77 accuracy on the training data) do far better than a simple naive model that always predicts the majority class.
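
The naive baseline is simply the share of the most common class. A minimal sketch with made-up labels:

```python
from collections import Counter

labels = [0, 1, 1, 0, 0, 1, 0, 0]          # made-up survival labels

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(majority_class, baseline_accuracy)
```

In the notebook the same number comes from `value_counts(normalize=True).max()`.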

In [457]:
#Try both models on the test set
#First define a wrangle function

def wrangle(filepath):
    
    #Read in filepath
    data = pd.read_csv(filepath)
    #Obtain dataindex
    data_index = data["PassengerId"]
    
    #Drop the following stated columns
    data.drop(columns = ["PassengerId", "Cabin", "Name", "Ticket"], inplace = True)
    
    #Fill missing values (assignment instead of inplace=True on a selected column)
    data["Age"] = data["Age"].fillna(data["Age"].mode()[0])
    data["Fare"] = data["Fare"].fillna(data["Fare"].mean())
    #Convert the Pclass column to string type
    data["Pclass"] = data["Pclass"].astype(str)
    
    #create dummy
    dummy = [1] * len(data)
    
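    #Apply the transformation pipeline to the dataset
    #(note: fit_transform refits the pipeline on this data; transform would reuse the fit from the training set)
    train_trans = transform_pipe.fit_transform(data)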
    
    #Concatenate dummy array into train_trans
    dummy_array = np.array(dummy)[:, None]     
    train_trans = np.concatenate([train_trans, dummy_array], axis = 1)  
    return train_trans, data_index

In [460]:
test_train, idx = wrangle("test.csv")

In [404]:
test_train.shape

(331, 13)
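
One point worth keeping in mind about `wrangle` is that it calls `fit_transform`, which learns the scaling from the test data instead of reusing what was learned from the training data. The sketch below uses plain NumPy and the `Fare` values from the two data previews to illustrate the difference; it is an illustration under these assumptions, not the pipeline itself.

```python
import numpy as np

train_fare = np.array([7.25, 71.2833, 7.925, 53.1, 8.05])   # Fare values from the training preview
test_fare = np.array([7.8292, 7.0, 9.6875])                 # Fare values from the test preview

# Statistics learned from the training data only
mean, std = train_fare.mean(), train_fare.std()

test_reused = (test_fare - mean) / std                          # reuse training statistics
test_refit = (test_fare - test_fare.mean()) / test_fare.std()   # refit on the test data

print(test_reused.round(3))
print(test_refit.round(3))
```

The two versions give different inputs for the same passengers, and the parameters were fitted against the training-set scaling. With a fitted scikit-learn pipeline, the call that reuses the learned statistics is `transform`.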

In [456]:
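#Inspect the raw test set (tst is assumed to have been loaded earlier with pd.read_csv("test.csv"))
tst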

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...
413,1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
414,1306,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C
415,1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
416,1308,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S


In [442]:
def pred_data(data, model_parameters):
    result = torch.matmul(torch.tensor(data), model_parameters)
    result = torch.clip(result, 0.)
    if model_parameters.shape[1] > 1:
        result = result[:,0] + result[:,1]
        return result[:,None]
    return result[:,None]

In [443]:
RELU1_pred = pred_data(test_train, parameters_t)
RELU1_pred = [1 if pred>0.5 else 0 for pred in RELU1_pred]
print(RELU1_pred)

[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 

In [444]:
RELU2_pred = pred_data(test_train, grand_params)
RELU2_pred = [1 if pred>0.5 else 0 for pred in RELU2_pred]
print(RELU2_pred)

[0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 

In [461]:
pred = pd.Series(RELU2_pred, index=idx).rename("prediction").to_frame()

In [472]:
pred1 = pd.Series(RELU1_pred, index=idx).rename("prediction").to_frame()

In [473]:
pred1.reset_index(inplace=True)

In [474]:
pred1.rename(columns={"index": "PassengerId", "prediction":"Survived"}, inplace=True)

In [477]:
pred1.to_csv("RELU20221prediction.csv", index=False)

## Limitations

- Much of this notebook is hardcoded and could be made more flexible. I wrote it this way as practical learning, to try my hand at what I was taught. A better approach might be to encapsulate it all in a class.

- No validation dataset was set aside, so the loss and accuracy were only ever measured on the data used to adjust the parameters. As a result, overfitting would go unnoticed and the reported accuracy may be optimistic.

- A double RELUnet may need more training steps (epochs) than a single RELUnet; with more refined training, the double RELUnet would be expected to do better than the single RELUnet.
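
For the validation point above, one possible way to hold out data is sketched below. The 20% share and the seed are arbitrary assumptions; only the row count of 712 comes from this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 712                                  # number of training rows in this notebook
shuffled = rng.permutation(n_rows)            # random order of row indices

n_valid = int(0.2 * n_rows)                   # hold out 20% for validation
valid_idx, train_idx = shuffled[:n_valid], shuffled[n_valid:]

print(len(train_idx), len(valid_idx))
```

The rows in `train_idx` would then be used for the gradient steps, while the rows in `valid_idx` would only be used to monitor the loss and accuracy.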

## Image of entry into the Titanic dataset competition with the self-built RELU network

![image](Images/kaggle.png)