In this notebook, we will explore how to use PyTorch to build a neural network that predicts passenger survival on the Titanic.

In [1]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt

## Loading the data

In [2]:
df = pd.read_csv("../data/Titanic/train_processed.csv")
df.shape

(889, 10)

In [3]:
train_df = df[:int(0.8 * len(df))]
valid_df = df[int(0.8 * len(df)):]

In [4]:
len(train_df), len(valid_df)

(711, 178)

In [5]:
df.head()

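|   | Survived | Pclass | Age | SibSp | Parch | Fare | Sex_numeric | Embarked_C | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 22.0 | 1 | 0 | 7.25 | 0 | 0 | 0 | 1 |
| 1 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 | 1 | 1 | 0 | 0 |
| 2 | 1 | 3 | 26.0 | 0 | 0 | 7.925 | 1 | 0 | 0 | 1 |
| 3 | 1 | 1 | 35.0 | 1 | 0 | 53.1 | 1 | 0 | 0 | 1 |
| 4 | 0 | 3 | 35.0 | 0 | 0 | 8.05 | 0 | 0 | 0 | 1 |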


## Define the model

Suppose we want to create a simple model with one hidden layer of 5 neurons. To create a model in PyTorch, we need to:
 - create a class that inherits from `nn.Module`
 - define the architecture (the layers of the model) in the constructor
 - define the forward pass in a method called `forward`, which describes how the data flows through the network and how the outputs are computed

[Python class inheritance](https://www.w3schools.com/python/python_inheritance.asp)
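The three steps above can be sketched as follows. This is a minimal illustration, not necessarily the exact model used later in the notebook. The class name `TitanicNet`, the ReLU activation, and the single sigmoid output are assumptions. The input size of 9 comes from the 10 columns shown by `df.head()` minus the `Survived` target.

```python
import torch
import torch.nn as nn


class TitanicNet(nn.Module):  # hypothetical name, chosen for illustration
    def __init__(self, n_features=9, n_hidden=5):
        # Always call the parent constructor so PyTorch can register the layers.
        super().__init__()
        # Architecture: 9 input features -> 5 hidden neurons -> 1 output.
        self.hidden = nn.Linear(n_features, n_hidden)
        self.output = nn.Linear(n_hidden, 1)

    def forward(self, x):
        # Forward pass: hidden layer with ReLU (an assumed activation),
        # then a sigmoid so the output can be read as a survival probability.
        x = torch.relu(self.hidden(x))
        return torch.sigmoid(self.output(x))


model = TitanicNet()
dummy_batch = torch.zeros(4, 9)  # 4 fake passengers, 9 features each
probs = model(dummy_batch)       # shape: (4, 1)
n_params = sum(p.numel() for p in model.parameters())
```

Calling `model(dummy_batch)` invokes `forward` under the hood, so you normally never call `forward` directly.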

Defining a model in PyTorch is more involved than in scikit-learn. In exchange, you get more flexibility and freedom in how you define your models.

## Loading the data into dataset and dataloader

Unlike scikit-learn, where you can pass a pandas DataFrame or NumPy array directly into model training, in PyTorch we wrap the data in a `Dataset` object and then serve it to the model through a `DataLoader`.

To define our custom dataset, there are three requirements:
- we initialize the class with the content of the data
- we implement the `__len__` dunder method to tell PyTorch how large our dataset is
- we implement the `__getitem__` dunder method to tell PyTorch how to fetch a single sample

[Python Dunder methods](https://mathspp.com/blog/pydonts/dunder-methods#:~:text=In%20Python%2C%20dunder%20methods%20are,__%20or%20__add__%20.)
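A minimal sketch of such a dataset is shown below. The class name `TitanicDataset` and the `target_col` parameter are hypothetical, and the toy DataFrame reuses only a few columns and rows from the `df.head()` output above. In the notebook you would pass `train_df` or `valid_df` instead.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class TitanicDataset(Dataset):  # hypothetical name, chosen for illustration
    def __init__(self, df, target_col="Survived"):
        # Requirement 1: store the content of the data as tensors.
        features = df.drop(columns=[target_col])
        self.X = torch.tensor(features.to_numpy(dtype="float32"))
        self.y = torch.tensor(df[target_col].to_numpy(dtype="float32"))

    def __len__(self):
        # Requirement 2: report how many samples the dataset holds.
        return len(self.y)

    def __getitem__(self, idx):
        # Requirement 3: return one (features, label) pair.
        return self.X[idx], self.y[idx]


# Toy data: three rows and three feature columns taken from df.head().
toy_df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.2833, 7.925],
})
toy_ds = TitanicDataset(toy_df)
features, label = toy_ds[0]  # indexing calls __getitem__
```

Because `__len__` and `__getitem__` are defined, `len(toy_ds)` and `toy_ds[0]` work like they do on a Python list, which is all a `DataLoader` needs.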

The batch size is the number of samples the model processes in one forward-backward pass. It is a hyperparameter that you set when creating a `DataLoader`.

When the batch size is 1, the model processes one sample at a time, which is called online or stochastic training. When the batch size equals the size of the entire dataset, the model processes all of the data in one forward-backward pass, which is called batch training. In practice, the batch size is usually set somewhere in between (mini-batch training) and is a trade-off between the accuracy of the gradient estimate and the cost of each update. A larger batch gives a less noisy gradient estimate and makes better use of parallel hardware, but it needs more memory and yields fewer parameter updates per epoch. A smaller batch gives noisier gradients but more frequent updates.

For example, if you have a dataset of 100 samples and you set the batch size to 10, the DataLoader will return 10 samples at a time, and it will take 10 iterations to process the entire dataset.
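The 100-sample example can be checked with a small sketch. The random tensors stand in for real data, and `TensorDataset` is used here only as a convenient ready-made dataset. The second loader, with a batch size of 32, is an added illustration of what happens when the batch size does not divide the dataset evenly.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 fake samples with 9 features each, plus fake binary labels.
X = torch.randn(100, 9)
y = torch.randint(0, 2, (100,)).float()
dataset = TensorDataset(X, y)

# Batch size 10: the loader yields 10 samples at a time.
loader = DataLoader(dataset, batch_size=10, shuffle=True)
n_batches = len(loader)  # number of iterations per epoch
batch_shapes = [tuple(xb.shape) for xb, yb in loader]

# Batch size 32 does not divide 100, so the last batch is smaller
# (drop_last defaults to False).
uneven_loader = DataLoader(dataset, batch_size=32)
uneven_sizes = [len(xb) for xb, _ in uneven_loader]

print(n_batches)     # 10
print(uneven_sizes)  # [32, 32, 32, 4]
```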

The batch size also affects memory usage during training and inference, because the model has to hold the whole batch in memory at once. Choose a batch size that fits within the memory you have available.
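As a rough illustration of how memory grows with batch size, the sketch below computes the size in bytes of the input tensor alone. The helper `batch_bytes` is hypothetical, and the 9 features and `float32` dtype are assumptions. Real training also stores activations, gradients, and optimizer state, so actual usage is higher than this lower bound.

```python
import torch


def batch_bytes(batch_size, n_features=9, dtype=torch.float32):
    # Hypothetical helper: bytes needed to store one input batch.
    batch = torch.empty(batch_size, n_features, dtype=dtype)
    return batch.element_size() * batch.nelement()


# Input memory scales linearly with the batch size.
sizes = {b: batch_bytes(b) for b in (1, 32, 256)}
print(sizes)  # {1: 36, 32: 1152, 256: 9216}
```

For a table as small as this one the numbers are tiny, but the same linear scaling applies to images or long sequences, where the batch size is often limited by GPU memory.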

## Define the training and validation process