# Hands-on session: Dimensionality reduction with PCA
In this example, we will explore Principal Component Analysis (PCA) for dimensionality reduction of neural data.

This exercise accompanies Chapter 2, "Principal component analysis (PCA)", of the course "Dimensionality reduction in neuroscience" (tutor: Fabrizio Musacchio, Oct 17, 2024).

## Acknowledgements:
This tutorial is adapted from the 2023 course 'data analysis techniques in neuroscience' by the Chen Institute for Neuroscience at Caltech:

<https://github.com/cheninstitutecaltech/Caltech_DATASAI_Neuroscience_23>

The dataset (also provided by the above tutorial) is from the paper:

Remedios, R., Kennedy, A., Zelikowsky, M. et al. Social behaviour shapes hypothalamic neural ensemble representations of conspecific sex. Nature 550, 388–392 (2017). <https://doi.org/10.1038/nature23885>

## Dataset
We will work with calcium imaging data from one male mouse. The calcium imaging recordings have already been converted into continuous neural signals. Throughout the recording, the male mouse had different visitors in its cage (female or male), each associated with a behavioral response (attack or no attack). With PCA, we will explore the neural responses to these different social stimuli.

## Environment setup
For reproducibility:

```bash
conda create -n dimredcution python=3.11 mamba -y
conda activate dimredcution
mamba install ipykernel matplotlib numpy scipy scikit-learn -y
```

We begin by loading the necessary libraries:

In [1]:
# %% IMPORTS
import os
import matplotlib.pyplot as plt
import numpy as np
import time

from mpl_toolkits import mplot3d
from numpy.linalg import svd
from scipy.io import loadmat
from sklearn.decomposition import PCA

# set global properties for all plots:
plt.rcParams.update({'font.size': 14})
plt.rcParams["axes.spines.top"]    = False
plt.rcParams["axes.spines.bottom"] = False
plt.rcParams["axes.spines.left"]   = False
plt.rcParams["axes.spines.right"]  = False

Next, we define the path to the data. If you are running this script in a Google Colab environment, you need to upload the data file `hypothalamus_calcium_imaging_remedios_et_al.mat` from the GitHub repository to your Google Drive; further instructions are available [here](https://www.fabriziomusacchio.com/blog/2023-03-23-google_colab_file_access/).

In [2]:
# %% DEFINE PATHS
DATA_PATH = '../data/'
DATA_FILENAME = 'hypothalamus_calcium_imaging_remedios_et_al.mat'
DATA_FILE = os.path.join(DATA_PATH, DATA_FILENAME)

RESULTSPATH = '../results/'
# create the results folder if it does not exist yet:
os.makedirs(RESULTSPATH, exist_ok=True)

Now we load the data and inspect its structure:

In [3]:
# load the data:
hypothalamus_data = loadmat(DATA_FILE)

## 📝 Inspect the data
Inspect the type and structure (e.g., shape, keys) of the data:
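
As a hint for how such an inspection can look, here is a minimal sketch on a small stand-in file. The file and the key names `demo_signals` and `demo_labels` are made up for illustration and will differ from the keys in the real dataset.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Build a small stand-in .mat file; the key names are hypothetical.
rng = np.random.default_rng(0)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mat")
savemat(demo_path, {"demo_signals": rng.normal(size=(5, 100)),
                    "demo_labels": rng.integers(0, 2, size=(1, 100))})

demo_data = loadmat(demo_path)
print(type(demo_data))  # loadmat returns a plain Python dict

# Keys wrapped in double underscores are file metadata added by loadmat.
for key, value in demo_data.items():
    if key.startswith("__"):
        continue
    print(key, type(value).__name__, value.shape, value.dtype)
```

Note that `loadmat` returns vectors as 2D arrays with a singleton dimension by default, so a label vector of length 100 shows up with shape `(1, 100)`.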

In [None]:
# Your code goes here:



## 📝 Extract the $N$ main data arrays into $N$ separate variables

In [15]:
# Your code goes here:



## 📝 What is the dimensionality of each of the $N$ data arrays? What do you think the dimensions represent?

In [None]:
# Your code goes here:



## 📝  Plot the neural data with Matplotlib's imshow function. What do you see?
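
A minimal sketch of an `imshow` call for a wide activity matrix, using synthetic data. The orientation (neurons on the rows, time steps on the columns) is an assumption; check the shape of the real array first.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in: 20 "neurons" x 300 time steps (orientation is an assumption).
rng = np.random.default_rng(1)
demo_activity = rng.normal(size=(20, 300)).cumsum(axis=1)

fig, ax = plt.subplots(figsize=(10, 4))
# aspect="auto" stretches the image to fill the axes; without it, a wide
# matrix is drawn as a thin strip.
im = ax.imshow(demo_activity, aspect="auto", cmap="viridis", interpolation="none")
ax.set_xlabel("time step")
ax.set_ylabel("neuron")
fig.colorbar(im, ax=ax, label="signal (a.u.)")
plt.show()
```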

In [None]:
# Your code here:



## 📝 Plot the attack vector: What do you see?
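
One possible way to plot a binary event vector, sketched on synthetic data (the vector `demo_events` is hypothetical). A vector loaded from a `.mat` file usually needs `.ravel()` first to become one-dimensional.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic binary vector with two "event" episodes (hypothetical data).
demo_events = np.zeros(300, dtype=int)
demo_events[50:80] = 1
demo_events[200:240] = 1

fig, ax = plt.subplots(figsize=(10, 2))
# A step plot keeps the 0/1 transitions sharp instead of drawing slopes.
ax.step(np.arange(demo_events.size), demo_events, where="post")
ax.set_xlabel("time step")
ax.set_yticks([0, 1])
plt.show()

# Onsets are the indices where the vector switches from 0 to 1.
onsets = np.flatnonzero(np.diff(demo_events) == 1) + 1
print("event onsets:", onsets)
print("fraction of time in event:", demo_events.mean())
```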

In [None]:
# Your code goes here:



## 📝  Plot the gender vector: What do you see?

In [None]:
# Your code goes here:



## PCA principle
We now apply PCA to the neural data to reduce its dimensionality. We treat each time step in the neural data as an $N$-dimensional data point, where $N$ is the number of neurons. PCA then reduces these data points to $S$ dimensions, with $S < N$; typical choices are $S=3$ or $S=10$. Finally, we plot the data points in the $S$-dimensional space spanned by the principal components (PCs).
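
To make the principle concrete, here is a minimal NumPy-only sketch of PCA via the singular value decomposition on synthetic data. The array `demo_neural` and its (neurons, time steps) orientation are assumptions for illustration, not properties of the real dataset.

```python
import numpy as np
from numpy.linalg import svd

rng = np.random.default_rng(2)
n_neurons, n_timesteps, S = 12, 400, 3

# Synthetic data: 3 latent signals mixed into 12 "neurons", plus noise.
latents = rng.normal(size=(3, n_timesteps))
mixing = rng.normal(size=(n_neurons, 3))
demo_neural = mixing @ latents + 0.1 * rng.normal(size=(n_neurons, n_timesteps))

# Rows become data points: one N-dimensional point per time step.
X = demo_neural.T                      # shape (n_timesteps, n_neurons)
X_centered = X - X.mean(axis=0)        # PCA works on mean-centered features

# Rows of Vt are the principal axes, sorted by decreasing singular value.
U, singular_values, Vt = svd(X_centered, full_matrices=False)
scores = X_centered @ Vt[:S].T         # shape (n_timesteps, S)

# Fraction of the total variance carried by each principal component.
explained = singular_values**2 / np.sum(singular_values**2)
print(scores.shape, explained[:S].round(3))
```

The rest of the exercise uses `sklearn.decomposition.PCA`, which performs the centering and the decomposition internally.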

## 📝 Estimate the number of principal components necessary
Before we perform PCA, we need to estimate how many principal components we should keep:

1. Create a PCA object (model) with 10 components
2. Fit the model to the neural data
3. Plot the explained variance ratio of the principal components (`your_model_fit.explained_variance_ratio_`). How much variance is explained by each of these principal components? What does this tell you about the data?
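
The three steps above can be sketched as follows on synthetic data. `X_demo` is a made-up array already arranged as (time steps, neurons), and the 90 % threshold is just one common heuristic, not a rule from this course.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Synthetic stand-in: 4 latent signals of decreasing strength in 30 "neurons".
latents = rng.normal(size=(500, 4)) * np.array([5.0, 3.0, 2.0, 1.0])
X_demo = latents @ rng.normal(size=(4, 30)) + 0.5 * rng.normal(size=(500, 30))

pca_demo = PCA(n_components=10, svd_solver="full")
pca_demo.fit(X_demo)

ratios = pca_demo.explained_variance_ratio_
cumulative = np.cumsum(ratios)
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.3f} (cumulative {c:.3f})")

# Heuristic: smallest number of PCs whose cumulative ratio reaches 90 %.
# np.argmax returns the first True; guard against the threshold never being met.
reached = cumulative >= 0.90
n_keep = int(np.argmax(reached)) + 1 if reached.any() else None
print("PCs needed for 90 %:", n_keep)
```

The `ratios` and `cumulative` arrays can be passed directly to `plt.bar` or `plt.plot` to obtain the requested plot.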

In [None]:
# Your code goes here:

# n_components  =
# PCA_model_S10 = 


# fit the PCA model to the neural data:


# print the explained variance ratio of the principal components:

# How much variance is explained by each of these principal components? What does this tell you about the data?


## 📝 Perform the PCA
We now apply PCA and set $S=3$. We will then plot the data points in the 3D space spanned by the first three principal components:

1. Create a PCA object (model) with 3 components.
2. Fit the model to the neural data.
3. What is the dimensionality of the PCAed neural data? What do these dimensions mean?
4. Plot the data points projected on the 3 principal components (PCs) in the 3D space (`ax = plt.axes(projection='3d')`, `ax.plot3D()`). Also plot the 3 PCs as 2D projections in a separate figure with 3 subplots.

Useful tip: With `ax.view_init(elev=30, azim=45)`, you can change the view of the 3D plot.
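
A minimal sketch of these steps on synthetic data. The array `demo_neural` and its (neurons, time steps) orientation are assumptions, and the 2D panels show pairs of PCs against each other, which is one possible reading of "2D projections".

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Synthetic (neurons, time steps) array: three slow signals plus noise.
t = np.linspace(0, 4 * np.pi, 600)
latents = np.vstack([np.sin(t), np.cos(0.5 * t), t / t.max()])
demo_neural = rng.normal(size=(25, 3)) @ latents + 0.2 * rng.normal(size=(25, 600))

# scikit-learn expects (n_samples, n_features), so time steps go on the rows.
pca_3 = PCA(n_components=3, svd_solver="full")
scores_3 = pca_3.fit_transform(demo_neural.T)
print(scores_3.shape)  # → (600, 3): one 3D point per time step

fig = plt.figure(figsize=(7, 6))
ax = plt.axes(projection="3d")
ax.plot3D(scores_3[:, 0], scores_3[:, 1], scores_3[:, 2], lw=0.8)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_zlabel("PC 3")
ax.view_init(elev=30, azim=45)

# 2D projections: one panel per pair of PCs.
fig2, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax2, (i, j) in zip(axes, [(0, 1), (0, 2), (1, 2)]):
    ax2.plot(scores_3[:, i], scores_3[:, j], lw=0.8)
    ax2.set_xlabel(f"PC {i + 1}")
    ax2.set_ylabel(f"PC {j + 1}")
fig2.tight_layout()
plt.show()
```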

In [None]:
# Your code goes here:

# create a PCA object (model) with 3 components:
# n_components = 
# PCA_model_S3 = 


# fit the PCA model to the neural data:

# What is the dimensionality of the PCAed neural data? What do these dimensions mean?


# What interesting things do you notice about your dimensionality reduced data? Why are the axes 
# so different from each other? What do you think they represent?


## 📝 Deciphering the structure of the data in the PCA space – factor: time
We will now try to understand the structure of the data in the PCA space by looking at the different parameters of the data. We start with the factor time:

1. Color-code the data points in the PCA space according to their occurrence in time (3D and 2D plots). What do you see?
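
One way to color-code by time is to pass the time index to the `c` argument of `scatter`. The sketch below uses a hypothetical stand-in for the PCA output (`demo_scores`, a noisy spiral) rather than the real data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for PCA scores, shape (time steps, 3): a noisy spiral.
rng = np.random.default_rng(5)
t = np.linspace(0, 6 * np.pi, 500)
demo_scores = np.column_stack([np.cos(t), np.sin(t), t / t.max()])
demo_scores += 0.05 * rng.normal(size=demo_scores.shape)

time_index = np.arange(demo_scores.shape[0])

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")
# c= maps each point's time index onto the colormap.
sc = ax.scatter(demo_scores[:, 0], demo_scores[:, 1], demo_scores[:, 2],
                c=time_index, cmap="viridis", s=8)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_zlabel("PC 3")
fig.colorbar(sc, ax=ax, label="time step", shrink=0.7)
plt.show()
```

The same `c=time_index` argument works for 2D scatter plots of PC pairs.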

In [None]:
# Your code goes here:

# 3D plots:

# 2D plots:

# What do you see? What does this further tell you about the structure of the data in the PCA space?


## 📝 Deciphering the structure of the data in the PCA space – factor: behavior
Next, color-code the data points in the PCA space according to the attack vector. First, plot all data points, and then overplot those points where `attack_vector` is 1. What do you see?
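
Overplotting a subset can be done with a boolean mask. The sketch below uses hypothetical stand-ins (`demo_scores`, `demo_flag`) and shifts the flagged points artificially so that they are visible; the real data may look quite different.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)
demo_scores = rng.normal(size=(400, 3))       # hypothetical PCA scores
demo_flag = np.zeros((1, 400), dtype=int)     # 2D, as loadmat would return it
demo_flag[0, 120:180] = 1

# Flatten to 1D and convert to a boolean mask for indexing.
mask = demo_flag.ravel().astype(bool)
demo_scores[mask] += 3.0                      # make flagged points stand out

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, (i, j) in zip(axes, [(0, 1), (0, 2), (1, 2)]):
    # All points first, then the flagged subset on top.
    ax.scatter(demo_scores[:, i], demo_scores[:, j], s=6, c="lightgray", label="all")
    ax.scatter(demo_scores[mask, i], demo_scores[mask, j], s=6, c="crimson", label="flag = 1")
    ax.set_xlabel(f"PC {i + 1}")
    ax.set_ylabel(f"PC {j + 1}")
axes[0].legend(frameon=False)
fig.tight_layout()
plt.show()
```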

In [None]:
# Your code goes here:

# 3D plots:

# 2D plots:




# What do you see?



## 📝 Deciphering the structure of the data in the PCA space – factor: gender
Next, color-code the data points in the PCA space according to the `gender_vector`. What do you see?
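
For a label vector with more than two values, looping over the unique labels gives one color and one legend entry per category. The label coding in this sketch (0, 1, 2) is hypothetical; inspect the real vector to see which values it contains.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
demo_scores = rng.normal(size=(300, 3))       # hypothetical PCA scores
# Hypothetical coding: 0 = no visitor, 1 and 2 = two visitor categories.
demo_labels = np.repeat([0, 1, 0, 2, 0], 60)

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")
for value in np.unique(demo_labels):
    sel = demo_labels == value                # boolean mask for this category
    ax.scatter(demo_scores[sel, 0], demo_scores[sel, 1], demo_scores[sel, 2],
               s=8, label=f"label {value}")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_zlabel("PC 3")
ax.legend(frameon=False)
plt.show()
```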

In [None]:
# Your code goes here:

# 3D plot:

# 2D plots:

# What do you see?



## 📝 Bonus: PCA with just 2 components
Repeat everything, but this time reduce the data to 2 PCs:

1. Train a model on the neural data with 2 PCs.
2. How much explained variance do these 2 PCs capture? Do you notice anything interesting about these 2 PCs?
3. How is time visualized in these 2 PCs?
4. How is attack visualized in these 2 PCs?
5. How is intruder sex visualized in these 2 PCs?
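
As a hint for question 2, the following sketch compares a 2-component and a 3-component model on synthetic data (`X_demo` is a made-up array arranged as time steps by neurons). It sets `svd_solver="full"` explicitly so that both fits use the exact decomposition.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
# Synthetic stand-in: 5 latent signals mixed into 20 "neurons", plus noise.
X_demo = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20))
X_demo += 0.3 * rng.normal(size=X_demo.shape)

pca_2 = PCA(n_components=2, svd_solver="full").fit(X_demo)
pca_3 = PCA(n_components=3, svd_solver="full").fit(X_demo)

print("variance captured by 2 PCs:", pca_2.explained_variance_ratio_.sum())

# With the exact solver, the first two axes do not depend on how many
# components are requested.
same_axes = np.allclose(pca_2.components_, pca_3.components_[:2])
print("first two axes identical:", same_axes)
```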


# Your code goes here:
