# Part 1: Introduction and Downloading the data

## Image reconstruction from brain data

### Motivation and background

This report explores different methods for retrieving visual information from brain data. This is a novel and interesting task, and solving it effectively would be a significant milestone in our understanding of the brain and its function. The key difficulties in reconstructing images from brain data are as follows:
1. Brain representations are abstract rather than concrete. The brain does not keep visual imagery in a compact embedding in a single region. Information is distributed, abstracted, and partial. The brain does not keep tabs on everything that happens around us, but rather efficiently stores the salient components and the ones we are focused on.
2. The resolution of our imaging is limited. With fMRI in particular, the smallest unit we can resolve is still far larger than a single neuron. This means we are far from having access to all of the information encoded in the brain.
3. Processes for handling brain data are still being developed, especially across subjects. Every brain is unique and every brain's recordings contain a lot of noise. Extracting useful, shared information across brains is tricky and far from perfected.


Being able to understand and interpret the brain is one of the greatest challenges facing humanity, and solving it would be a vindication of our grasp of the natural world. Extracting visual, auditory, or semantic information from the brain is one of the most fundamental first steps towards this understanding. This analysis is my first step into the field, and it has been extremely interesting and rewarding.

### Problem formulation
The problem is as follows.

Participants are shown a series of images (I, identical across participants). This information is encoded in the brain and measured by us, giving an encoding E. In theory, everything that we see should be recoverable from that encoding. Hence, the problem is to find a function f(E) that reconstructs the original image with minimal loss:

$\displaystyle \min_{f} \ L( E) ,\qquad L( E) \ =\ \lVert I\ -\ f( E) \rVert ^{2}$

Here f is our neural network, and the squared error corresponds to the pixel-wise MSE used as the primary optimization metric. There are further preprocessing functions (p(E)) we could apply to the data to bring it into a lower-dimensional form, but this is the primary construction.
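As a minimal sketch of this objective, the snippet below replaces the neural network with a linear map fitted by least squares on synthetic data. The array sizes, the helper names `fit_linear_decoder` and `mse`, and the linear form of f are illustrative assumptions, not the model used later in the notebook.

```python
import numpy as np

def fit_linear_decoder(E, I):
    """Fit W so that f(E) = E @ W minimises the squared error to I."""
    # lstsq returns the least-squares solution of E @ W = I
    W, *_ = np.linalg.lstsq(E, I, rcond=None)
    return W

def mse(I, I_hat):
    """Mean squared error over all samples and pixels."""
    return float(np.mean((I - I_hat) ** 2))

rng = np.random.default_rng(0)
n_samples, n_features, n_pixels = 200, 20, 64    # hypothetical sizes

E = rng.normal(size=(n_samples, n_features))     # stand-in for brain encodings
true_W = rng.normal(size=(n_features, n_pixels))
# Stand-in for flattened images: a noisy linear function of the encodings
I = E @ true_W + 0.1 * rng.normal(size=(n_samples, n_pixels))

W = fit_linear_decoder(E, I)
loss_fitted = mse(I, E @ W)
loss_baseline = mse(I, np.zeros_like(I))         # always predicting a blank image
print(f"fitted decoder MSE: {loss_fitted:.4f}, blank-image MSE: {loss_baseline:.4f}")
```

A neural network plays the same role as `W` here: it is simply a more flexible f, trained by gradient descent instead of a closed-form solve.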


A solution would consist of an architecture that can recreate the original images well enough that they are recognisable for what they originally were. Complete image reconstruction will be extremely hard to achieve, but reaching a recognisable standard is a reasonable objective.

### Method and justification

The method and its justification are expounded throughout the body of this notebook. Here we provide a brief overview of the tricks and methods we employ.

First, we employ dimensionality reduction. The methods commonly used for this are PCA and shared response modeling (SRM). PCA is familiar; SRM is a more domain-specific approach. In addition, SRM allows us to align the brain spaces across subjects, which is extremely useful.



In SRM we solve the following optimization problem:

$\displaystyle \min_{w_{i} ,\ E} \ \sum _{i} \lVert X_{i} \ -\ w_{i} \ E\rVert _{F}^{2}$

Here E is an embedding which is shared across all subjects, w_i is a weight matrix specific to subject i, and X_i is the fMRI data measured from subject i while viewing the images. Essentially, we have to find weights and an embedding which most closely reconstruct the original brain space of each subject. This leaves us with a set of weights which we can use to transform each subject's data into a common space. This common space is typically of much smaller dimensionality (as decided by the researcher).
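To make the alternating structure of this problem concrete, here is a small NumPy sketch of a deterministic SRM fit on synthetic data. It is not the BrainIAK implementation used later. The function names `fit_srm` and `srm_error`, the data sizes, and the (voxels x timepoints) layout of each subject's matrix are assumptions made for illustration. Each subject's weights are constrained to have orthonormal columns, as in the standard SRM formulation, so the update for each subject is an orthogonal Procrustes problem solved with an SVD.

```python
import numpy as np

def fit_srm(X, k, n_iter=10, seed=0):
    """Alternate between updating per-subject weights and the shared embedding.

    X is a list of (n_voxels_i, n_timepoints) arrays, one per subject.
    """
    rng = np.random.default_rng(seed)
    # Start from a random orthonormal basis for every subject
    W = [np.linalg.qr(rng.normal(size=(x.shape[0], k)))[0] for x in X]
    S = np.mean([w.T @ x for w, x in zip(W, X)], axis=0)
    for _ in range(n_iter):
        for i, x in enumerate(X):
            # Orthogonal Procrustes: best orthonormal W_i for the current S
            U, _, Vt = np.linalg.svd(x @ S.T, full_matrices=False)
            W[i] = U @ Vt
        # Best shared embedding for the current weights
        S = np.mean([w.T @ x for w, x in zip(W, X)], axis=0)
    return W, S

def srm_error(X, W, S):
    """Total squared reconstruction error across subjects."""
    return sum(np.linalg.norm(x - w @ S) ** 2 for x, w in zip(X, W))

# Synthetic data: three subjects with different voxel counts, one shared signal
rng = np.random.default_rng(1)
S_true = rng.normal(size=(5, 100))
X = []
for n_voxels in (50, 60, 70):
    W_true = np.linalg.qr(rng.normal(size=(n_voxels, 5)))[0]
    X.append(W_true @ S_true + 0.05 * rng.normal(size=(n_voxels, 100)))

W0, S0 = fit_srm(X, k=5, n_iter=0)   # random initialisation only
W, S = fit_srm(X, k=5)
shared = [w.T @ x for w, x in zip(W, X)]   # every subject mapped into the common space
print(srm_error(X, W0, S0), srm_error(X, W, S))
```

Once fitted, `w.T @ x` maps any subject's data into the common space, which is what allows the subjects to be pooled into one larger dataset.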



In this report we use both methods and compare how they perform. SRM is generally preferred, however, because it allows us to align brains across participants, which is extremely useful: it effectively triples the size of our dataset (in this case at least, and usually more).


Further, we will also attempt to use GANs to reconstruct more realistic-looking images. The problem with direct reconstruction is that we often end up with extremely messy, soupy-looking images. A GAN can instead act as a generator that searches for the nearest 'real-looking neighbor': an image whose embedding is similar to the one induced by our brains. To achieve this, in theory, we first train a neural network to map images into the embedding space that we have found using PCA or SRM. We then use the GAN to search for the image whose encoding is closest to the encoding generated by the brain.
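The search step can be illustrated with a toy example. In the sketch below, the generator and the image encoder are hypothetical linear maps (`generator_weights`, `encoder_weights`) rather than trained networks, and the gradient is written out by hand. With a real GAN one would instead backpropagate through the networks with an autograd framework and tune the step size empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_pixels, embed_dim = 8, 64, 16   # hypothetical sizes

# Hypothetical linear stand-ins for a trained generator and image encoder
generator_weights = rng.normal(size=(n_pixels, latent_dim)) / np.sqrt(latent_dim)
encoder_weights = rng.normal(size=(embed_dim, n_pixels)) / np.sqrt(n_pixels)

def generate(z):
    """Map a latent vector to a (flattened) image."""
    return generator_weights @ z

def encode(image):
    """Map an image into the embedding space shared with the brain data."""
    return encoder_weights @ image

def embedding_error(z, target):
    """Squared distance between the generated image's embedding and the target."""
    return float(np.sum((encode(generate(z)) - target) ** 2))

def search_latent(target, n_steps=500):
    """Gradient descent on z so that encode(generate(z)) approaches the target."""
    A = encoder_weights @ generator_weights
    # Safe step size for this quadratic toy problem
    lr = 0.5 / np.linalg.norm(A, 2) ** 2
    z = np.zeros(latent_dim)
    for _ in range(n_steps):
        residual = encode(generate(z)) - target
        z -= lr * 2 * A.T @ residual   # gradient of the squared error w.r.t. z
    return z

# Stand-in for an embedding derived from brain data
z_true = rng.normal(size=latent_dim)
target = encode(generate(z_true))

start_error = embedding_error(np.zeros(latent_dim), target)
z_hat = search_latent(target)
final_error = embedding_error(z_hat, target)
print(start_error, final_error)
```

The reconstructed image is then `generate(z_hat)`: by construction it lies on the generator's output manifold, which is why this route tends to give more realistic-looking results than direct pixel regression.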


In this report we have had mixed success with this. It is a tricky and very finicky task. Our preliminary results will be shared, but it will certainly require more work to perfect.


### Evaluation methods

Once again, evaluation will be performed throughout the body of the notebook. Given the nature of this task, evaluation will be primarily qualitative: an image is either reconstructed to some degree or it is just a mess.

More concretely, the metrics that we will use to optimize our functions are:
1. MSE (pixel by pixel), which is the primary optimization metric.
2. MSE on image embeddings from a pretrained vision net, which we will also attempt to use to evaluate the similarity of our images. Embeddings are extracted by feeding the recreated image and the original image through the vision net and reading off the activations in the second-to-last layer.
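The two metrics can be sketched as follows. The function `toy_embed` is a hypothetical stand-in for the penultimate-layer activations of a pretrained vision net (here just a fixed random projection followed by `tanh`), and the image size is arbitrary. Only the structure of the comparison is meant to carry over.

```python
import numpy as np

def pixel_mse(original, reconstruction):
    """Mean squared error computed pixel by pixel."""
    return float(np.mean((original - reconstruction) ** 2))

def embedding_mse(original, reconstruction, embed):
    """Mean squared error between the embeddings of the two images."""
    return float(np.mean((embed(original) - embed(reconstruction)) ** 2))

rng = np.random.default_rng(0)
n_values = 16 * 16 * 3                                    # hypothetical image size
projection = rng.normal(size=(32, n_values)) / np.sqrt(n_values)

def toy_embed(image):
    # Stand-in for a pretrained network's second-to-last layer
    return np.tanh(projection @ image.ravel())

original = rng.uniform(size=(16, 16, 3))
# A good reconstruction: the original plus a little noise
close = np.clip(original + 0.05 * rng.normal(size=original.shape), 0, 1)
# A bad reconstruction: an unrelated random image
far = rng.uniform(size=(16, 16, 3))

for name, candidate in [("close", close), ("far", far)]:
    print(name, pixel_mse(original, candidate),
          embedding_mse(original, candidate, toy_embed))
```

With a real pretrained network, the embedding metric can rank two reconstructions differently from pixel MSE, since it is sensitive to learned features rather than to exact pixel values.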

### Results and insights

Results and insights will be shared throughout the report. A key insight is that this problem is really hard. I used largely preprocessed data, but to do this really well it is probably necessary to have a handle on the whole procedure, i.e. to process the fMRI data very carefully so that no valuable information is lost. We achieved recreation of color and some basic shape, but we are still a far cry from effective image recreation. With more time I believe these methods can get there, but much more optimization will be required.

### Implementation

In this section of the report, we download the data and check that it carries information about the images. This lets us verify that we have imported the data correctly.

In [2]:
import numpy as np
import torch
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
from tqdm import tqdm
import sys
import brainiak.funcalign.srm
import matplotlib.pyplot as plt
%autosave 5

Autosaving every 5 seconds


After downloading our data from OpenNeuro, we start with a simple classifier to test that there are no problems with the data. Here we only test on one subject because the brains have not yet been aligned. See make_data.py for how the data is downloaded and preprocessed.

We repeatedly take a subsample of 2 image classes (5 images per class, so 10 data points) and train a classifier to tell them apart.

In [32]:
# Load subject 1's labels, fMRI responses, and stimulus images
sub_01_labels = np.load("np_data/sub_01_labels_train.npy")
sub_01_fmri = np.load("np_data/sub_01_fmri_train.npy")
sub_01_images = np.load("np_data/sub_01_images_train.npy")

# Reduce the fMRI data to 300 principal components to help the classifier
pca = PCA(n_components=300)
decomposed_fmri = pca.fit_transform(sub_01_fmri)
accuracies = []  # one boolean per test prediction, pooled over all iterations

# Define the classifier
clf = svm.SVC(gamma='auto')

for i in tqdm(range(10000)):
    ss = np.random.randint(1, 1200, size=2)  # sample 2 image classes
    if ss[0] == ss[1]:  # make sure the two classes are different
        ss[1] = ss[0] + 1
    ss_coords = np.isin(sub_01_labels, ss).nonzero()[0]  # indices of all occurrences of either class
    labels_ss = sub_01_labels[ss_coords]
    fmri_ss = decomposed_fmri[ss_coords]

    # Pick 4 training samples per class so the training set stays balanced
    train_idx = np.hstack([np.random.choice(np.where(labels_ss == _class)[0], 4, replace=False)
                           for _class in np.unique(labels_ss)])

    # Everything not used for training is held out for testing
    test_idx = np.where(~np.isin(np.arange(len(labels_ss)), train_idx))[0]

    train_x, train_y = fmri_ss[train_idx], labels_ss[train_idx]
    test_x, test_y = fmri_ss[test_idx], labels_ss[test_idx]

    clf.fit(train_x, train_y)
    accuracies.extend(clf.predict(test_x) == test_y)

100%|██████████| 10000/10000 [00:04<00:00, 2303.90it/s]


In [33]:
print(f"yielding us {sum(accuracies)/len(accuracies)} in a binary classification task")

yielding us 0.56005 in a binary classification task


Slightly above 50% accuracy. Not great, but not too bad considering that each classifier is trained on only 8 data points and the model is crude. Because the figure is pooled over 10,000 random class pairs, a consistent edge over chance is reasonable evidence that the data carries information about the images.
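As a rough sanity check on how far this accuracy sits from chance, the sketch below applies a normal approximation to the binomial. The count of 20,000 predictions is an assumption (10,000 iterations with 2 held-out samples each, given 5 images per class and 4 used for training). The predictions are also not independent, since the same scans recur across iterations and the PCA was fitted on all of the data, so the resulting p-value is optimistic and should be read as indicative only.

```python
import math

def one_sided_test(accuracy, n_predictions, chance=0.5):
    """z-score and one-sided p-value for accuracy above chance (normal approximation)."""
    standard_error = math.sqrt(chance * (1 - chance) / n_predictions)
    z = (accuracy - chance) / standard_error
    # Upper-tail probability of a standard normal
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Accuracy reported above; n_predictions is an assumed count (see lead-in)
z, p_value = one_sided_test(0.56005, n_predictions=20000)
print(f"z = {z:.2f}, one-sided p = {p_value:.3g}")
```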

The dataset comes in two versions. We download the second version to see whether it carries more information.


In [3]:
# Load subject 1's labels, fMRI responses, and stimulus images (second version of the data)
sub_01_labels = np.load("np_data_v2/sub_01_labels.npy")
sub_01_fmri = np.load("np_data_v2/sub_01_fmri.npy")
sub_01_images = np.load("np_data_v2/sub_01_images.npy")

# Reduce the fMRI data to 300 principal components to help the classifier
pca = PCA(n_components=300)
decomposed_fmri = pca.fit_transform(sub_01_fmri)
accuracies = []  # one boolean per test prediction, pooled over all iterations

# Define the classifier
clf = svm.SVC(gamma='auto')

for i in tqdm(range(10000)):
    ss = np.random.randint(1, 1200, size=2)  # sample 2 image classes
    if ss[0] == ss[1]:  # make sure the two classes are different
        ss[1] = ss[0] + 1
    ss_coords = np.isin(sub_01_labels, ss).nonzero()[0]  # indices of all occurrences of either class
    labels_ss = sub_01_labels[ss_coords]
    fmri_ss = decomposed_fmri[ss_coords]

    # Pick 4 training samples per class so the training set stays balanced
    train_idx = np.hstack([np.random.choice(np.where(labels_ss == _class)[0], 4, replace=False)
                           for _class in np.unique(labels_ss)])

    # Everything not used for training is held out for testing
    test_idx = np.where(~np.isin(np.arange(len(labels_ss)), train_idx))[0]

    train_x, train_y = fmri_ss[train_idx], labels_ss[train_idx]
    test_x, test_y = fmri_ss[test_idx], labels_ss[test_idx]

    clf.fit(train_x, train_y)
    accuracies.extend(clf.predict(test_x) == test_y)

100%|██████████| 10000/10000 [00:04<00:00, 2437.79it/s]


In [4]:
print(f"yielding us {sum(accuracies)/len(accuracies)} in a binary classification task")

yielding us 0.53865 in a binary classification task


Roughly 54% classification accuracy versus 56%. The first version seems to perform better, so we will use the first version of the data in this report.