<a href="https://colab.research.google.com/github/Aashish-2002/fmml-python/blob/main/Lab14_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FOUNDATIONS OF MODERN MACHINE LEARNING, IIIT Hyderabad
## Lecture 14 Project
### Lab Coordinator: Shantanu Agrawal

In [None]:
!pip install torch torchvision

In [1]:
# Importing required libraries and packages
import pandas as pd
import tqdm

from sklearn.preprocessing import LabelEncoder

from torch.utils.data import Dataset, DataLoader
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam

from google.colab import files

In [None]:
# TitanicDataset subclasses Dataset and overrides __len__() and __getitem__(), which DataLoader calls to fetch samples.
class TitanicDataset(Dataset):
  def __init__(self,csvpath, mode = 'train'):
    self.mode = mode
    df = pd.read_csv(csvpath)
    le = LabelEncoder()        
    """
    <------Some Data Preprocessing---------->
    Removing Null Values, Outliers and Encoding the categorical labels etc

    Also, look at the difference between the train or test modes
    """
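    if self.mode == 'train':
      # df = df.dropna()
      # Columns 2 onward are the features; column 1 is the target.
      # .copy() avoids writing into a slice of df.
      self.inp = df.iloc[:,2:].copy()
      for col in self.inp.columns:
        # Encode every column as integer class ids
        self.inp[col] = LabelEncoder().fit_transform(self.inp[col])
      # Cast to float32 so torch.Tensor() receives a numeric array
      self.inp = self.inp.values.astype('float32')
      self.oup = df.iloc[:,1].values.reshape(len(self.inp),1).astype('float32')
    else:
      # Test mode: no target is read; features start at column 1.
      self.inp = df.iloc[:,1:].copy()
      for col in self.inp.columns:
        self.inp[col] = LabelEncoder().fit_transform(self.inp[col])
      self.inp = self.inp.values.astype('float32')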
    
  def __len__(self):
    return len(self.inp)
    
  def __getitem__(self, idx):
    """
      Each sample is returned as a dictionary, which makes the fields
      easier to understand and to access by name.
    """
    if self.mode == 'train':
      inpt  = torch.Tensor(self.inp[idx])
      oupt  = torch.Tensor(self.oup[idx])
      return { 'inp': inpt,
              'oup': oupt,
          }
    else:
      inpt = torch.Tensor(self.inp[idx])
      return { 'inp': inpt
      }

In [None]:
# Use this to upload the train.csv file given to you
files.upload()

In [None]:
## Initialize the Training DataSet
data = TitanicDataset('train.csv')

## --- TASK 1 ---
## Choose batch size of your own for the data loader
## Fill in place of "??"
BATCH_SIZE = ??

## The DataLoader below serves the dataset in batches of BATCH_SIZE
## It is iterated over in the training loop
data_train = DataLoader(dataset = data, batch_size = BATCH_SIZE, shuffle = False)

**NOTE:** Pass the dataset object created above as the `dataset` argument. The loader does not return a single tensor; iterating over it yields one batch at a time, and each batch is a dictionary whose `'inp'` entry has shape **(batch_size, size_of_the_vector)**. The number of batches is `len(data_train)`, and the last batch may hold fewer than `batch_size` samples.
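To see what the loader yields, here is a minimal sketch on a toy dataset. `ToyDataset` and its sizes (10 samples, 4 features) are hypothetical stand-ins, not part of the lab; only the dictionary layout mirrors `TitanicDataset`.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical stand-in for TitanicDataset: 10 samples, 4 features each."""

    def __init__(self, n_samples=10, n_features=4):
        values = torch.arange(n_samples * n_features, dtype=torch.float32)
        self.inp = values.reshape(n_samples, n_features)
        self.oup = torch.zeros(n_samples, 1)

    def __len__(self):
        return len(self.inp)

    def __getitem__(self, idx):
        # Same dictionary layout as the training mode of TitanicDataset
        return {'inp': self.inp[idx], 'oup': self.oup[idx]}


loader = DataLoader(dataset=ToyDataset(), batch_size=4, shuffle=False)

# Collect the shape of the 'inp' tensor in every batch the loader yields.
batch_shapes = [tuple(batch['inp'].shape) for batch in loader]
print(len(loader))    # number of batches
print(batch_shapes)   # the last batch is smaller because 10 is not divisible by 4
```

With these toy sizes the loader produces three batches, and the final one holds only the two leftover samples.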

It's time to build the model: you will stack multiple perceptron layers to form the final network.

**---- TASK 2 ----**<br>
Your task is to design the layers yourself.<br>
Make sure that:
- The input length of the first layer equals the total number of features (10 here, matching `x_train.view(-1,10)` in the training loop).
- The output length of the final layer equals 1.
- You use non-linearities (i.e., activation layers) between the layers; without them, stacked linear layers collapse into a single linear map.
- You look at the `BatchNorm1d` layer as well, for better results.
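
As an illustration of these constraints, the sketch below shows one possible layout. The class name `SketchNetwork`, the hidden widths (32 and 16), and the final sigmoid are assumptions made for this example, not the expected solution; treat it as a starting point to vary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchNetwork(nn.Module):
    """Hypothetical example; the hidden widths (32, 16) are arbitrary choices."""

    def __init__(self, n_features=10):
        super().__init__()
        self.fc1 = nn.Linear(n_features, 32)   # input length = number of features
        self.bn1 = nn.BatchNorm1d(32)
        self.fc2 = nn.Linear(32, 16)
        self.bn2 = nn.BatchNorm1d(16)
        self.out = nn.Linear(16, 1)            # output length = 1

    def forward(self, x):
        # Linear -> BatchNorm -> non-linearity, repeated for each hidden layer
        x = F.relu(self.bn1(self.fc1(x)))
        x = F.relu(self.bn2(self.fc2(x)))
        # Assumption: a sigmoid keeps the output in [0, 1] so rounding gives a 0/1 label
        return torch.sigmoid(self.out(x))


sketch = SketchNetwork()
dummy_batch = torch.randn(8, 10)   # 8 samples, 10 features
sketch_out = sketch(dummy_batch)
print(sketch_out.shape)
```

Note that `BatchNorm1d` raises an error in training mode when a batch contains a single sample, so a batch size that leaves exactly one sample in the last batch should be avoided.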

You can examine the dataset yourself by reading the CSV with the pandas library (via the `pd.read_csv()` function) to explore it further.
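
A short exploration might look like the sketch below. The rows and column names in `csv_text` are hypothetical stand-ins so the example runs without the file; with the real data, `pd.read_csv('train.csv')` would replace the `io.StringIO` call.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows standing in for train.csv; the real file has more columns.
csv_text = """Survived,Sex,Embarked
0,male,S
1,female,C
1,female,S
0,male,Q
"""
toy_df = pd.read_csv(io.StringIO(csv_text))

print(toy_df.dtypes)          # column types
print(toy_df.isnull().sum())  # missing values per column

# LabelEncoder maps each distinct value to an integer, in sorted order of the values.
le = LabelEncoder()
sex_codes = le.fit_transform(toy_df['Sex'])
print(list(le.classes_), list(sex_codes))
```

Checking `dtypes` and the null counts before encoding shows which columns are categorical and which contain gaps, which is the preprocessing the dataset class leaves open.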

In [None]:
class Network(nn.Module):

    def __init__(self):
        super().__init__()
        #define various layers

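    def swish(self, x):
        # Swish activation: x * sigmoid(x); torch.sigmoid replaces the deprecated F.sigmoid
        return x * torch.sigmoid(x)

    def forward(self,x):
        # pass x through your layers, applying self.swish() or F.relu() between them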

        return x

In [None]:
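# The training step has been set up for you here
def train(model, x, y, optimizer, criterion):
    optimizer.zero_grad()          # clear gradients left over from the previous step
    output = model(x)              # forward pass
    loss = criterion(output, y)
    loss.backward()                # backpropagate
    optimizer.step()               # update the weights

    return loss, output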

In [None]:
device = torch.device("cpu")
if torch.cuda.is_available():
  device = torch.device("cuda:0")
  print("Cuda Device Available")
  print("Name of the Cuda Device: ", torch.cuda.get_device_name())
  print("GPU Computational Capability: ", torch.cuda.get_device_capability())

In [None]:
# ---- TASK 3 ----
# Try several values for the number of epochs, rerunning the code each time, and compare the results
# Fill in place of "??"
EPOCHS = ??

net = Network().to(device)  # keep the model on the same device as the batches
optm = Adam(net.parameters(), lr = 0.001)
data_train = DataLoader(dataset = data, batch_size = BATCH_SIZE, shuffle = False)
criterion = nn.MSELoss()

for epoch in range(EPOCHS):
    epoch_loss = 0
    correct = 0
    for batch in data_train:
        x_train, y_train = batch['inp'], batch['oup']
        x_train = x_train.view(-1,10)
        x_train = x_train.to(device)
        y_train = y_train.to(device)
        loss, predictions = train(net,x_train,y_train, optm, criterion)
        # Count the predictions that round to the true label
        correct += (torch.round(predictions) == y_train).sum().item()
        # .item() converts the loss tensor to a plain Python number
        epoch_loss += loss.item()
    # Accuracy over the whole training set, computed once per epoch
    acc = correct/len(data)
    print('Epoch {} Accuracy : {}'.format(epoch+1, acc*100))
    print('Epoch {} Loss : {}'.format((epoch+1),epoch_loss))

The results shown above are only the **training loss and accuracy**. Several factors can produce a result as high as 100% on the training data, so it does not by itself show that the model generalizes.<br>
You should also track the **test/validation loss** to better understand how the model is performing.
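
One way to obtain a validation loss is to hold out part of the training data and evaluate on it without updating the weights. The sketch below assumes synthetic data, an 80/20 split, and a placeholder model; `evaluate`, `placeholder`, and the sizes are hypothetical names and values chosen for illustration. It also uses `TensorDataset`, which yields `(x, y)` tuples; with `TitanicDataset`, the batch would instead be unpacked as `batch['inp']` and `batch['oup']`.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)

# Hypothetical stand-in data: 100 samples, 10 features, binary targets.
features = torch.randn(100, 10)
targets = (features[:, 0] > 0).float().unsqueeze(1)
full_data = TensorDataset(features, targets)

# Hold out 20% of the samples for validation.
train_set, val_set = random_split(full_data, [80, 20])
val_loader = DataLoader(val_set, batch_size=16, shuffle=False)


def evaluate(model, loader, criterion):
    """Return (mean loss per batch, accuracy) without updating any weights."""
    model.eval()                      # BatchNorm uses running statistics in eval mode
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():             # gradients are not needed for evaluation
        for x, y in loader:
            output = model(x)
            total_loss += criterion(output, y).item()
            correct += (torch.round(output) == y).sum().item()
            seen += len(y)
    return total_loss / len(loader), correct / seen


# Hypothetical placeholder model; substitute the trained network here.
placeholder = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
val_loss, val_acc = evaluate(placeholder, val_loader, nn.MSELoss())
print(val_loss, val_acc)
```

Calling `evaluate` at the end of every epoch, next to the training numbers, makes it visible when training accuracy keeps rising while validation accuracy stalls. Remember to call `model.train()` again before the next training epoch.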