# Hands-On Session: Autoencoders and Machine Learning
## By: Sabera Talukder

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/SaberaTalukder/Chen_Institute_DataSAI_for_Neuroscience/blob/main/07_08_22_day4_autoencoders_and_ML_introduction/code/diy_notebooks/neural_autoencoders.ipynb)

In [2]:
# All Imports - alphabetically ordered with shortcuts
import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision

from scipy.io import loadmat
from sklearn.decomposition import PCA
from torch.utils.data import Dataset, DataLoader

SEED = 38
np.random.seed(SEED)
_ = torch.random.manual_seed(SEED)

In [3]:
# uncomment code below to pull directly from github
# !wget https://github.com/SaberaTalukder/Chen_Institute_DataSAI_for_Neuroscience/blob/main/07_08_22_day4_autoencoders_and_ML_introduction/data/hypothalamus_calcium_imaging_remedios_et_al.mat?raw=true
# !mv hypothalamus_calcium_imaging_remedios_et_al.mat\?raw\=true hypothalamus_calcium_imaging_remedios_et_al.mat
# hypothalamus_data = loadmat('hypothalamus_calcium_imaging_remedios_et_al.mat')

path = '/Users/A*****/Desktop/sabera_chen/Chen_Institute_DataSAI_for_Neuroscience/07_08_22_day4_autoencoders_and_ML_introduction/data/'
hypothalamus_data = loadmat(path + 'hypothalamus_calcium_imaging_remedios_et_al.mat')

## Throwback!! 😎😎😎 This is the same neural data that we worked with on our first day!
#### Since you know so much about it, please pull out the neural data array from the hypothalamus data!

In [None]:
# Enter Code Here:

## Now shorten this array, and for every neuron take only the first 1000 time steps
#### Hint: You should be able to do this in one line of code
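As a hedged illustration of the slicing pattern (not the solution itself), here is a minimal sketch on a made-up array. It assumes neurons are stored along rows and time steps along columns; `toy_recording` and `toy_short` are hypothetical names.

```python
import numpy as np

# Hypothetical stand-in for a recording: 5 neurons (rows) x 12 time steps (columns).
toy_recording = np.arange(60).reshape(5, 12)

# Keep every neuron (all rows) and only the first 4 time steps (columns).
toy_short = toy_recording[:, :4]

print(toy_short.shape)  # (5, 4)
```

The same one-line slice works on the real array once you swap in your own variable name and the number of time steps you want to keep.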

In [None]:
# Enter Code Here:

## Now run PCA on the shortened neural data, and color the points according to time!
#### Hint: It's very similar to the dimensionality reduction code from the first day, but you're only running it on the first 1000 time steps
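If the day-one code is not fresh in your mind, the sketch below shows one possible shape for it on random stand-in data. It assumes each time step is treated as one sample (so the array is transposed before PCA), which may differ from the day-one notebook; `toy_neurons_by_time` and `toy_pcs` are hypothetical names.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 30 neurons x 200 time steps of random noise.
rng = np.random.default_rng(0)
toy_neurons_by_time = rng.normal(size=(30, 200))

# Transpose so that each time step is one sample described by all neurons.
pca = PCA(n_components=2)
toy_pcs = pca.fit_transform(toy_neurons_by_time.T)  # shape: (200, 2)

# Color each point by its time index.
time_index = np.arange(toy_pcs.shape[0])
fig, ax = plt.subplots()
points = ax.scatter(toy_pcs[:, 0], toy_pcs[:, 1], c=time_index, cmap="viridis", s=10)
fig.colorbar(points, ax=ax, label="time step")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
```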

In [None]:
# Enter Code Here:

# Now, I'm going to give you the code for the custom dataset class, but I'm going to ask you questions so read each line in depth!

### A custom dataset class allows us to load our data into a PyTorch data loader. This is important when building PyTorch models!

In [None]:
class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data
        
    def __len__(self):
        return self.data.shape[0]
    
    def __getitem__(self, idx):
        instance = self.data[idx, :]
        sample = {"data": instance}
        return sample

### Because we are dealing with PyTorch, it's best to pass in our data as a tensor of floats.
#### A float is a 'floating-point number': a number with decimals out to some level of precision.
#### A tensor is a container that stores data in N dimensions. A matrix is a special case of a tensor that is 2D.
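To make "tensor of floats" concrete, here is a small sketch on a made-up integer array (`toy_array` and `toy_tensor` are hypothetical names). It shows that `.float()` converts the values to 32-bit floating point while the 2D shape is unchanged.

```python
import numpy as np
import torch

# Hypothetical 2D integer array (a matrix): 2 rows x 3 columns.
toy_array = np.array([[1, 2, 3], [4, 5, 6]])

# Wrap it in a tensor, then convert the values to 32-bit floats.
toy_tensor = torch.tensor(toy_array).float()

print(toy_tensor.dtype)  # torch.float32
print(toy_tensor.ndim)   # 2, i.e. a matrix
```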

In [None]:
neural_data_float_tensor = torch.tensor(neural_data_short).float()

#### What is the shape of our tensor of floats?

In [None]:
# Enter Code Here:

## Now apply the CustomDataset class to our neural data tensor!

In [None]:
# Enter code here:

## What happens if you call `__len__()` on the data? What does this length represent?

In [None]:
# Enter code here:

## How about `__getitem__(idx)`? What is the idx variable that `__getitem__()` is indexing?
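If you want to poke at these two methods before touching the real data, the sketch below uses a tiny made-up tensor. `ToyDataset` is a hypothetical copy of the `CustomDataset` class above, renamed so it does not overwrite it. Note that Python's built-in `len(...)` and square-bracket indexing call `__len__` and `__getitem__` for you.

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Hypothetical copy of CustomDataset, renamed to avoid overwriting it."""
    def __init__(self, data):
        self.data = data

    def __len__(self):
        # Number of rows in the data.
        return self.data.shape[0]

    def __getitem__(self, idx):
        # Row number idx, wrapped in a dictionary.
        return {"data": self.data[idx, :]}

# Made-up data: 3 rows x 4 columns holding the values 0..11.
toy_tensor = torch.arange(12).reshape(3, 4).float()
toy_dataset = ToyDataset(toy_tensor)

print(len(toy_dataset))        # 3
print(toy_dataset[1]["data"])  # tensor([4., 5., 6., 7.])
```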

In [None]:
# Enter code here:

## Now we are going to break our data into a train (90% of the data) and test (10% of the data) split

#### How many neurons should be in the train set? How about the test set?

In [None]:
# Enter code here:

## Now using the torch.utils.data.random_split() function split the data into a train set and a test set
#### Hint: Google is your friend 🤗
#### Hint Hint: You should be able to do this in one line of code 😱
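For reference, here is a hedged sketch of how `random_split` behaves on a made-up dataset of 50 samples. The names `toy_dataset`, `n_train`, and `n_test` are hypothetical, and the seeded `generator` is an optional extra that only makes the split reproducible.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical dataset: 50 samples with 8 features each.
toy_dataset = TensorDataset(torch.randn(50, 8))

# 90% / 10% split sizes; compute the second from the first so they sum to 50.
n_train = int(0.9 * len(toy_dataset))
n_test = len(toy_dataset) - n_train

# The split itself is a single call; the generator fixes the random assignment.
generator = torch.Generator().manual_seed(0)
toy_train, toy_test = random_split(toy_dataset, [n_train, n_test], generator=generator)

print(len(toy_train), len(toy_test))  # 45 5
```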

In [None]:
# Enter code here:

## Now, using torch.utils.data.DataLoader(), make a train data loader and a test data loader.
#### Make sure to use the dataset, batch_size, and shuffle parameters when you call the function.
#### For simplicity, set the batch size to be larger than all the data you have. This isn't practical for large datasets, but with our small data it will work great!
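The sketch below illustrates what an oversized batch size does, using a made-up dataset of 10 samples (`toy_dataset` and `toy_loader` are hypothetical names). When `batch_size` exceeds the dataset size, the loader yields a single batch that contains everything.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 10 samples with 4 features each.
toy_dataset = TensorDataset(torch.randn(10, 4))

# batch_size is deliberately larger than the dataset, so there is one batch.
toy_loader = DataLoader(dataset=toy_dataset, batch_size=64, shuffle=True)

print(len(toy_loader))  # 1 (number of batches)

# TensorDataset yields one-element collections, hence the unpacking.
(first_batch,) = next(iter(toy_loader))
print(first_batch.shape)  # torch.Size([10, 4])
```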

In [None]:
# Enter code here:

## Enumerate through your train_loader and test_loader. Are the index and the data what you expect to see?
#### Hint: Enumerate is an important word!
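Here is a small, hedged sketch of what `enumerate` gives you on a loader. It uses a made-up list of dictionary samples (`toy_samples` is a hypothetical name) with a batch size smaller than the data, so that more than one batch appears.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical dataset: a list of 6 dictionary samples, each holding 4 values.
toy_samples = [{"data": torch.full((4,), float(i))} for i in range(6)]
toy_loader = DataLoader(dataset=toy_samples, batch_size=4, shuffle=False)

batch_shapes = []
for batch_idx, batch in enumerate(toy_loader):
    # batch_idx counts batches (not samples); batch is a dictionary whose
    # "data" entry stacks the samples of that batch along the first dimension.
    print(batch_idx, batch["data"].shape)
    batch_shapes.append(tuple(batch["data"].shape))
```

With 6 samples and a batch size of 4, the loop runs twice: once with 4 samples and once with the remaining 2.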

In [None]:
# Enter code here:

# Woot Woo!! Nicely Done 😎 Now we're going to build our very first neural network!

### There are 3 steps! Make sure to do them all!

In [None]:
class Autoencoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
          
        # Implementing a linear encoder.
        # Each layer is composed of a linear layer followed by a ReLU activation function.
        # The last layer is just a linear layer!
        # We take a data point from a dimension of 1000, to 100, to 4.
        
        self.encoder = torch.nn.Sequential(
            # STEP 1: Enter code here
        )
          
        # Implementing a linear decoder.
        # Each layer is composed of a linear layer followed by a ReLU activation function.
        # The last layer is just a linear layer!
        # We take a data point from a dimension of 4, to 100, to 1000.
        self.decoder = torch.nn.Sequential(
            # STEP 2: Enter code here
        )
        
    # You first want to pass your input data through the encoder. This creates an embedding.
    # You then want to pass this embedding through your decoder. This creates your reconstruction.
    # You then want your forward function to return your reconstruction.
    def forward(self, x):
        # STEP 3: Enter code here
        return reconstruction
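If the three steps feel abstract, the sketch below shows the same pattern at a much smaller, made-up size (20 to 8 to 2 and back). `ToyAutoencoder` and its layer sizes are hypothetical and chosen only for illustration; the exercise above asks for different dimensions.

```python
import torch

class ToyAutoencoder(torch.nn.Module):
    """Hypothetical small autoencoder: 20 -> 8 -> 2 -> 8 -> 20."""
    def __init__(self, n_features=20, n_hidden=8, n_latent=2):
        super().__init__()
        # Encoder: linear + ReLU, then a final plain linear layer.
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(n_features, n_hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(n_hidden, n_latent),
        )
        # Decoder: mirrors the encoder in reverse order.
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(n_latent, n_hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(n_hidden, n_features),
        )

    def forward(self, x):
        embedding = self.encoder(x)               # compress
        reconstruction = self.decoder(embedding)  # expand back
        return reconstruction

toy_model = ToyAutoencoder()
toy_batch = torch.randn(5, 20)     # 5 made-up samples
print(toy_model(toy_batch).shape)  # torch.Size([5, 20])
```

The output has the same shape as the input, which is what lets the loss compare the reconstruction against the original data.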

# Amazing work! Now that you've built an autoencoder, let's train it!

In [None]:
# Enter Code Here:

# Training Parameters
epochs = 5
outputs = []
losses = []
validation_outputs = []
validation_losses = []

# Model Initialization
model = Autoencoder()
# Using Mean-Squared-Error MSE Loss function
loss_function = torch.nn.MSELoss()
# Using an Adam optimizer with lr = 0.01 and a small weight decay
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-2, weight_decay = 1e-8)
# learning rate scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer=optimizer, step_size=1, gamma=0.98)

# ----------------------------------------------

for epoch in range(epochs):
    # training on train set
    model.train()
    
    # Loop through your training data
    for batch_idx, batch in enumerate(train_loader):
        # STEP 1: pull out the data from your batch        
        # STEP 2: get the reconstructed data from the Autoencoder Output
        # STEP 3: calculate the loss function between the reconstruction and original data
        
        # set gradients to zero
        optimizer.zero_grad()
        # the gradient is computed and stored
        loss.backward()
        # perform the parameter update
        optimizer.step()
        # Storing the losses in a list for plotting
        losses.append(float(loss.detach()))
       
    # put model into evaluation mode
    model.eval()
    # loop through your testing/validation data
    for validation_batch_idx, validation_batch in enumerate(test_loader):
        pass
        # STEP 4: pull out the data from your validation batch
        # STEP 5: get the reconstructed data from the Autoencoder Output
        # STEP 6: calculate the loss function between the reconstruction and original data
        # STEP 7: append the validation losses to the validation loss list
        
    # STEP 8: append the outputs to the train outputs lists in the form of:
    # (epochs, original data, reconstruction). Don't forget to transform your tensors into numpy arrays!!!
    
    # STEP 9: append the outputs to the validation outputs lists in the form of:
    # (epochs, original data, reconstruction). Don't forget to transform your tensors into numpy arrays!!!    

    print('Finished Epoch: ', epoch)
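For comparison, here is a hedged, stripped-down sketch of one train step and one validation step per epoch on made-up data. The tiny `toy_model` and the tensors `toy_train` and `toy_val` are hypothetical, there is no data loader, and wrapping validation in `torch.no_grad()` is an assumption about good practice rather than something the cell above requires.

```python
import torch

torch.manual_seed(0)

# Hypothetical tiny model and data, just to show the flow of one epoch.
toy_model = torch.nn.Sequential(torch.nn.Linear(6, 2), torch.nn.Linear(2, 6))
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)
toy_train = torch.randn(32, 6)
toy_val = torch.randn(8, 6)

toy_losses, toy_val_losses = [], []
for epoch in range(3):
    # Training: forward pass, loss, backward pass, parameter update.
    toy_model.train()
    reconstruction = toy_model(toy_train)
    loss = loss_function(reconstruction, toy_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    toy_losses.append(float(loss.detach()))

    # Validation: forward pass and loss only, with no parameter update.
    toy_model.eval()
    with torch.no_grad():
        val_loss = loss_function(toy_model(toy_val), toy_val)
    toy_val_losses.append(float(val_loss))
```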

## What is the difference between the training loop and the validation loop?

In [None]:
# Enter answer here

## What is the length of your outputs? What does this correspond to?

In [None]:
# Enter code here:

## For the last epoch, what is each of the items in that tuple?

In [None]:
# Enter code here:

## For the last epoch, and a neuron (you pick the number) plot the original data time series and the reconstruction time series for the training data

In [None]:
# Enter code here:

## For the last epoch and all the neurons at once, plot the original training data and the reconstructed training data. How do you think it looks?
#### Hint: Think of this representation as an image!
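One hedged way to act on this hint is `imshow`, sketched below on made-up arrays. `toy_original` and `toy_reconstruction` are hypothetical stand-ins with neurons along rows and time along columns; sharing `vmin`/`vmax` keeps the two color scales comparable.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins: 20 neurons x 100 time steps, plus a noisy copy.
rng = np.random.default_rng(0)
toy_original = rng.normal(size=(20, 100))
toy_reconstruction = toy_original + 0.3 * rng.normal(size=(20, 100))

# Draw both arrays side by side as images with a shared color scale.
fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True)
panels = zip(axes, [toy_original, toy_reconstruction], ["original", "reconstruction"])
for ax, image, title in panels:
    ax.imshow(image, aspect="auto", vmin=toy_original.min(), vmax=toy_original.max())
    ax.set_title(title)
    ax.set_xlabel("time step")
axes[0].set_ylabel("neuron")
```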

In [None]:
# Enter code here:

## For the last epoch and all the neurons at once, plot the original testing data and the reconstructed testing data. How do you think it looks?
#### Hint: Think of this representation as an image!

In [4]:
# Enter code here:

## Plot the loss curves for training and validation on top of one another in two different colors! Given what we've learned about loss curves, is the model done training? Has it overfit?
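A minimal plotting sketch is below. The loss values are made up purely to show the plotting calls (they are not results from this model), and `toy_train_losses` and `toy_val_losses` are hypothetical names.

```python
import matplotlib.pyplot as plt

# Made-up loss values, one per epoch, for illustration only.
toy_train_losses = [1.00, 0.60, 0.40, 0.30, 0.25]
toy_val_losses = [1.05, 0.70, 0.55, 0.56, 0.60]

# Two curves on the same axes, in two different colors.
fig, ax = plt.subplots()
ax.plot(toy_train_losses, color="tab:blue", label="train")
ax.plot(toy_val_losses, color="tab:orange", label="validation")
ax.set_xlabel("epoch")
ax.set_ylabel("MSE loss")
ax.legend()
```

In this made-up example the validation curve turns upward while the training curve keeps falling, which is the kind of pattern to look for when judging overfitting.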

## Retrain your network until you think the model has trained properly (i.e. not underfitting, not overfitting, just right 😊). Write down the 3 epoch values! One for underfitting, one for overfitting, and one for just right!

In [None]:
# Enter Code Here:

# 🛑✋ STOP ✋🛑 Only once you've trained your model properly should you continue on to the next section!

## Now visualize your original training data and training reconstructions on the same plot with 2D PCA
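One possible approach, sketched on made-up arrays, is to fit PCA on the original data and then project both the original and the reconstruction with that same fit, so the two point clouds share axes. This is an assumption about how to set up the comparison, and `toy_original` and `toy_reconstruction` are hypothetical names.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-ins: 40 samples x 50 features, plus a noisy copy.
rng = np.random.default_rng(0)
toy_original = rng.normal(size=(40, 50))
toy_reconstruction = toy_original + 0.2 * rng.normal(size=(40, 50))

# Fit PCA once on the original data, then project both arrays with it.
pca = PCA(n_components=2).fit(toy_original)
original_2d = pca.transform(toy_original)
reconstruction_2d = pca.transform(toy_reconstruction)

fig, ax = plt.subplots()
ax.scatter(original_2d[:, 0], original_2d[:, 1], label="original")
ax.scatter(reconstruction_2d[:, 0], reconstruction_2d[:, 1], marker="x", label="reconstruction")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend()
```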

In [None]:
# Enter code here:

## Now visualize your original test data and test reconstructions on the same plot with 2D PCA

In [None]:
# Enter code here:

# Do you notice anything interesting about these plots?