# Lab-6 : Self-Practice

In this week self-practice, you will implement a neural network model for a regression problem. You will use the [*admission*](./Admission_Predict.csv) dataset attached, used in the previous lab



### 1. Load the dataset and do all the necessary preprocessing

In [57]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [58]:
df = pd.read_csv('Admission_Predict.csv')
df.head(5)

df = df.drop(['Serial No.'], axis=1)
df.describe()

Unnamed: 0,GRE Score,TOEFL Score,University Rating,SOP,LOR,CGPA,Research,Chance of Admit
count,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0
mean,316.8075,107.41,3.0875,3.4,3.4525,8.598925,0.5475,0.72435
std,11.473646,6.069514,1.143728,1.006869,0.898478,0.596317,0.498362,0.142609
min,290.0,92.0,1.0,1.0,1.0,6.8,0.0,0.34
25%,308.0,103.0,2.0,2.5,3.0,8.17,0.0,0.64
50%,317.0,107.0,3.0,3.5,3.5,8.61,1.0,0.73
75%,325.0,112.0,4.0,4.0,4.0,9.0625,1.0,0.83
max,340.0,120.0,5.0,5.0,5.0,9.92,1.0,0.97


In [59]:
X = df.iloc[:, :-1].values
y  = df.iloc[:, -1].values

In [60]:
from sklearn.preprocessing import StandardScaler

In [61]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2)

In [62]:
## Scale all the features
scaler = StandardScaler()

scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test) 

## 2. Create custom pytorch `Dataset`

You should create a class `CustomDataset` that inherits  the abstract class `torch.utils.data.Dataset` from pytorch. 

> **Note** You should overwrite `__getitem__`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite `__len__`, which is expected to return the size of the dataset by many `~torch.utils.data.Sampler` implementations and the default options of `~torch.utils.data.DataLoader`.

#### Split your dataset into train and test data loaders
You can create a `CustomDataset` instance with the entire dataframe and use [`random_split`](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split) to split it into training and testing datasets. And then, create test and train dataloader. Or you can split using `train_test_split` from sklearn and past the splitted sets to your Custom dataset class. 

Create train and test dataloader with `batch_size = 32` each

In [63]:
import torch
from torch.utils.data import Dataset, DataLoader, random_split

class CustumData(Dataset):
    def __init__(self, X, y):
        super().__init__()
        self.y = torch.tensor(y).float()
        self.X = torch.tensor(X).float()

    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return self.X[idx, :], self.y[idx]

In [66]:
train_dataset = CustumData(X_train, y_train)
test_dataset = CustumData(X_test, y_test) 

train_dataloader = DataLoader(train_dataset, 32, shuffle=True)
test_dataloader = DataLoader(test_dataset, 32, shuffle=False)
print(train_dataset.X.shape)
print(test_dataset.X.shape)
print(train_dataset.y.dtype)
print(test_dataset.y.dtype)


torch.Size([320, 7])
torch.Size([80, 7])
torch.float32
torch.float32


In [46]:
data, label = next(iter(train_dataloader))
label.shape

torch.Size([32])

## Create the model

Using `nn`, Create a neural network with 1 hidden layers of size 100, each must be followed by a `leaky_relu` activation function and define the forward function

In [47]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, n_hidden_unit = 100):
        super(Net, self).__init__()
        # Write 3 lines to define 3 more linear layers.
        # 2 hidden layers with number of neurons numbers: 250 and 100
        # 1 output layer that should output 10 neurons, one for each class.
        self.fc1 = nn.Linear(7, n_hidden_unit) 
        self.fc2 = nn.Linear(n_hidden_unit, 1) 

    def forward(self, x):
        # the linear layers fc1, fc2, fc3, and fc4
        # accepts only flattened input (1D batches)
        # while the batch x is of size (batch, 28 * 28)
        # define one line to flatten the x to be of size (batch_sz, 28 * 28)
        
        #x = F.sigmoid(self.fc1(x))
        #x = F.sigmoid(self.fc2(x))
        x = F.leaky_relu(self.fc1(x))
        x = F.leaky_relu(self.fc2(x)) 
        return x

# use_cuda = torch.cuda.is_available()
device = torch.device("cpu")
model = Net(n_hidden_unit = 100).to(device)
#model = nn.Sequential(nn.Linear(7, 1)).to(device)

### Training loop

Define the appropriate loss function and the training loop for the training and the testing dataloader (as done in the lab). Print the final loss on the test data

In [48]:
epochs = 20
lr = 0.01
momentum = 0.5
seed = 1
log_interval = 2

loss_fn = nn.MSELoss(reduction='mean')

In [49]:
def train( model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        #print(batch_idx)
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad() 
        output = model(data).squeeze()  
        loss = loss_fn(output, target) 
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))

In [50]:
def test( model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            # Do the same that was done in the previous function.
            # But without backprobagating the loss and without running the optimizers
            # As this function is only for test.
            # write 3 lines to transform the data to the device, get the output and compute the loss
            data, target = data.to(device), target.to(device)
            output = model(data).squeeze() 
            test_loss += loss_fn(output, target).item()  # sum up batch loss 

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}\n'.format(
        test_loss))

In [51]:
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)


for epoch in range(1, epochs + 1):
    train(model, device, train_dataloader, optimizer, epoch)
    test(model, device, test_dataloader)



Test set: Average loss: 0.0060


Test set: Average loss: 0.0024


Test set: Average loss: 0.0016


Test set: Average loss: 0.0012


Test set: Average loss: 0.0008


Test set: Average loss: 0.0006


Test set: Average loss: 0.0005


Test set: Average loss: 0.0004


Test set: Average loss: 0.0003


Test set: Average loss: 0.0003


Test set: Average loss: 0.0003


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002


Test set: Average loss: 0.0002



In [52]:
zy_pred_nn = model(torch.tensor(X_test).float()) 

## Compare your Neural network model to a Linear Regression
Train a simple linear regression model on the training set and print MSE on the testing set (`X_test`). Also print the MSE on the test set using the your neural model. 

> Compare the results (which performs best) and justify why

In [53]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [54]:
lr=LinearRegression()

lr.fit(X_train, y_train)

y_pred = lr.predict(X_test) 

In [56]:
#import detach
from torch import detach
print("MSE neural network", mean_squared_error(y_test, detach(zy_pred_nn).numpy()))
print("MSE Linear regressionn", mean_squared_error(y_test, y_pred))

MSE neural network 0.004361042830272186
MSE Linear regressionn 0.003332077754891382
