# Problem 2: Higher-order visual representations and DNNs



In [None]:
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
import torchvision.transforms as transforms
# if you get an error re: torchvision, run this line first: 
# !pip install torchvision

## Helper functions

First we define some helper functions. You can skip down to the `Load data` section.

In [None]:
from scipy.stats import zscore

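def visualize_rdm(rdm, ax=None, title=None, vmax=1.75, cmap='viridis'):
    """Plot a 49 x 49 RDM with lines separating the 7 categories.

    Relies on the global `categories` array defined in the Load data cell.
    """
    if ax is None:
        fig, ax = plt.subplots(1, 1)

    im = ax.imshow(rdm, vmin=0, vmax=vmax, cmap=cmap)
    for i_cat in range(0, 49, 7):
        ax.plot([i_cat, i_cat], [0, 49], 'k')
        ax.plot([0, 49], [i_cat, i_cat], 'k')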
    ax.set(xticks = np.arange(4, 49, 7))
    ax.set_xticklabels(categories, rotation = 45, ha="right")
    ax.set(yticks = np.arange(4, 49, 7))
    ax.set_yticklabels(categories, rotation = 45, ha="right")
    ax.set(xlim = [0, 49], ylim = [49, 0])

    plt.colorbar(im, ax=ax, label='Dissimilarity')

    if title is not None:
        ax.set(title=title)
    return im


def compute_RDM(resp):
  """Compute the representational dissimilarity matrix (RDM)
  Args:
    resp (ndarray): S x N matrix with population responses to
      each stimulus in each row
  Returns:
    ndarray: S x S representational dissimilarity matrix
  """

  # z-score responses to each stimulus
  zresp = zscore(resp, axis=1)

  # Compute RDM
  RDM = 1 - (zresp @ zresp.T) / zresp.shape[1]

  return RDM

def visualize_image_and_filter(image, conv_filter):
    fig, axes = plt.subplots(1, 2)

    axes[0].imshow(image, vmin = -5, vmax = 5, cmap = 'gray')
    axes[0].set(xticks = [0.5, 1.5, 2.5],
                yticks = [0.5, 1.5, 2.5],
                xticklabels = '',
                yticklabels = '',
                title = 'Image')

    for r in range(4):
      for c in range(4):
        axes[0].annotate(image[c, r], (r, c), color = 'r', fontsize = 20)
    axes[0].grid(color = 'r')


    axes[1].imshow(conv_filter, vmin = -5, vmax = 5, cmap = 'gray')
    axes[1].set(xticks = [0.5, 1.5, 2.5],
                yticks = [0.5, 1.5, 2.5],
                xticklabels = '',
                yticklabels = '',
                title = 'Convolutional filter')

    for r in range(3):
      for c in range(3):
        axes[1].annotate(conv_filter[c, r], (r, c), color = 'r', fontsize = 20)
    axes[1].grid(color = 'r')


## Load data

We will be analyzing neural data like that used in the [Yamins 2014 paper](https://www.pnas.org/content/111/23/8619) mentioned in class.

The authors presented images to monkeys while recording from areas V4 and IT and compared the neural data to performance and activity of several artificial neural network models.

**Images:** They presented images from 7 different categories (Animals, Cars, Chairs, Faces, Fruits, Planes, and Tables). In each category, they had 7 separate objects. They created 40 configurations of each object, showing it on top of different backgrounds and in different sizes, shapes, and rotations. They presented each of these 1960 images (7 categories x 7 objects x 40 examples/configurations) around 40 times.

We won't load in the images as it will take too much memory, but you can [click here](https://drive.google.com/file/d/10PYDy_naIE88aVBEhrGsw8uIWpl-6uBs/view?usp=sharing) to view 5 examples for 4 of the 7 objects in the Animals category.

**Neural data**: While presenting the above images, the authors recorded 168 cells from area IT and 128 cells from area V4.

Execute the next cell to load in the data.


In [None]:
# Load data
IT_resps = np.load('IT_responses.npy')
V4_resps = np.load('V4_responses.npy')

categories = np.array(['Animals', 'Cars', 'Chairs', 'Faces', 'Fruits', 'Planes', 'Tables'])

print("IT_resps:" + str(IT_resps.shape))
print("V4_resps:" + str(V4_resps.shape))

<br>**Info about neural data:**
<br><br>
`IT_resps` is a NumPy array of shape 7 x 7 x 40 x 168. The first dimension corresponds to the 7 categories, the second to the 7 objects per category, the third to the 40 examples/configurations of each object, and the fourth to the 168 area IT neurons.

`V4_resps` is set up similarly, except there were 128 neurons recorded.

The numbers inside each array convey the average response to each image for each neuron. The units are NOT firing rates, but normalized, baseline-adjusted spike counts. Note that any temporal information within spiking responses is ignored by this approach.
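
To make the array layout concrete, here is a small sketch that uses a random stand-in with the same shape as `IT_resps`. The values are synthetic and the name `fake_resps` is made up for illustration; only the indexing pattern matters.

```python
import numpy as np

# Synthetic stand-in with the same layout as IT_resps:
# (category, object, configuration, neuron). Values are random, not real data.
rng = np.random.default_rng(0)
fake_resps = rng.normal(size=(7, 7, 40, 168))

# Responses of neuron 5 to all 40 configurations of object 2 in category 3
one_object = fake_resps[3, 2, :, 5]
print(one_object.shape)  # (40,)

# Population response (all neurons) to a single image
one_image = fake_resps[0, 0, 0, :]
print(one_image.shape)  # (168,)

# Flattening the first three axes gives one row per image, ordered so that
# image index = (category * 7 + object) * 40 + configuration
flat = fake_resps.reshape(-1, 168)
print(flat.shape)  # (1960, 168)
assert np.array_equal(flat[(3 * 7 + 2) * 40 + 10], fake_resps[3, 2, 10, :])
```

The same ordering applies when a single neuron's responses are flattened for plotting: all 40 configurations of one object are contiguous, and all 7 objects of one category are contiguous.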

If you're interested, here's the detailed description of the preprocessing procedure, from the [Supplemental Material](https://www.pnas.org/doi/10.1073/pnas.1403112111#supplementary-materials): <br>
>_"For each image repetition and electrode, scalar firing rates were obtained from spike trains by averaging spike counts in the period 70-170 ms after stimulus presentation, a measure of neural response that has recently been shown to match behavioral performance characteristics very closely. Background firing rate, defined as the mean within-block spike count for blank images, was subtracted from the raw response. Additionally, the signal was normalized such that its per-block variance is 1. Final neuron output responses were obtained for each image and site by averaging over image repetitions."_

<br><br>

# Part 1: Individual neural responses

### 2a: Thinking about selectivity

Execute the following cell to visualize an example neuron's responses to the 1960 images.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 4))

ax.plot(IT_resps[:, :, :, 1].reshape((-1,)), 'k')
ylim = ax.get_ylim()
for i in range(0, 1960, 7*40):
  ax.plot([i, i], ylim, 'r')

_ = ax.set(xticks = np.arange(7*40/2, 1960, 7*40),
       xticklabels = categories,
       ylabel = 'Neural Response');

<br>**2a. Interpret the plot and evaluate this neuron's selectivity.**

i) Does this neuron seem to be category selective? Why or why not?

ii) Does this neuron seem to be strongly object selective? Why or why not?


<font color="#2AAA8A">**Answer**


### 2b: Selectivity index

We will quantify the object selectivity of each neuron by using a selectivity index. First, we compute the mean activity per object from the neuron. Then we can compute the selectivity index as follows:

>selectivity = $\frac{\mu_{max} - \mu_{-max}}{\mu_{max} + \mu_{-max}}$

where $\mu_{max}$ is the highest object mean activity and $\mu_{-max}$ (read as mu _not_ max) is the mean activity across all the other objects. This gives us how much higher the mean activity for the preferred object is than for the rest of the objects, normalized by that neuron's overall amount of activity.
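
As a quick check of the formula, here is a minimal sketch for a single hypothetical neuron with made-up mean responses to four objects. The numbers and the variable names (`object_means`, `mu_not_max`) are illustrative assumptions, not values from the dataset.

```python
import numpy as np

# Hypothetical per-object mean responses of one neuron (toy numbers)
object_means = np.array([1.0, 3.0, 1.5, 0.5])

# Mean response to the preferred object
i_max = np.argmax(object_means)
mu_max = object_means[i_max]

# Mean response across all remaining objects
mu_not_max = np.delete(object_means, i_max).mean()

selectivity = (mu_max - mu_not_max) / (mu_max + mu_not_max)
print(selectivity)  # 0.5, since (3 - 1) / (3 + 1)
```

Note that the toy means here are all positive. The responses in this dataset are baseline-subtracted, so per-object means can be negative, which is worth keeping in mind when interpreting the index on the real data.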

i) If the neuron had the same mean activity for all objects, what would this selectivity index equal?

ii) If the neuron only responded to one object, what would this selectivity index equal?

iii) Would you expect V4 neurons to have lower or higher selectivity indices than area IT neurons, on average?

<font color="#2AAA8A">**Answer**


### 2c: Quantifying object selectivity (coding)


Compute this selectivity index for all of the IT neurons (stored in the array `selectivity_indices_IT`) and for all of the V4 neurons (stored in `selectivity_indices_V4`). Both of these arrays should be of shape (N neurons,).

Hint: You can code this with a for-loop or with vectorized operations (the latter is typically more efficient, but the former may be more intuitive).


In [None]:
# TODO: Compute selectivity indices

...


print("The median SI for IT is {:.3f} and the median SI for V4 is {:.3f}\n".format(
    np.median(selectivity_indices_IT), 
    np.median(selectivity_indices_V4)))

# Visualize results with histograms
fig, axes = plt.subplots(2, 1)
_, bins, _ = axes[1].hist(selectivity_indices_IT, 100, color='#3c649f');
axes[1].set(title='Selectivity indices for cells in IT', xlabel='Selectivity Index', ylabel='Count')
_ = axes[0].hist(selectivity_indices_V4, bins, color='#2c456b');
axes[0].set(title='Selectivity indices for cells in V4', ylabel='Count')
plt.tight_layout()


### 2d: Evaluate selectivity indices in V4 and IT populations

i) What are your conclusions about area V4 and IT object selectivity based on these histograms?

ii) Do these results support the idea of area IT containing "grandmother cells", or single object detectors? Why or why not?

<font color="#2AAA8A"><span style="font-size:larger;">
**Answer**<br>
<br>
i) 
<br><br>
ii) 
<br><br>

### 2e: Quantitative comparison of SI distributions

Write code to run a statistical test to quantitatively compare the selectivities in V4 and IT. Are the distributions significantly different?

Choose an appropriate statistical test! This might help: https://docs.scipy.org/doc/scipy/reference/stats.html 


In [None]:
### TODO



# Part 2: Population representations via RDMs

### 2f: RDM for area IT (coding)

Let's look at how the different objects and categories are represented in area IT by constructing a representational dissimilarity matrix (RDM).

**Step 1)** We will compute the dissimilarity using the mean response per object, after averaging over configurations. Thus the first step is to average the neural responses over the 40 different configurations of each object.

**Step 2)** Compute the representational dissimilarity matrix for this data and store it in a variable called `rdm`. This should be a 49 x 49 matrix where each entry equals 1 minus the correlation, across neurons, of the mean population responses to two objects (the object indexed by the row number and the object indexed by the column number). So the entry at row 4, column 8 is 1 minus the correlation of the responses to objects 4 and 8.

Try to vectorize this code! If you want, you can compute it in a loop first and then use that to check your vectorized answer. 
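
One way to check a vectorized computation against a loop is sketched below on a small random matrix (5 stimuli x 20 neurons, purely synthetic; `toy_resp` and the `rdm_*` names are illustrative). It also shows why the z-score formula in the `compute_RDM` helper gives 1 minus the Pearson correlation.

```python
import numpy as np
from scipy.stats import zscore

# Synthetic responses: 5 stimuli (rows) x 20 neurons (columns)
rng = np.random.default_rng(0)
toy_resp = rng.normal(size=(5, 20))

# Loop version: 1 - Pearson correlation for every pair of stimuli
n_stim = toy_resp.shape[0]
rdm_loop = np.zeros((n_stim, n_stim))
for i in range(n_stim):
    for j in range(n_stim):
        rdm_loop[i, j] = 1 - np.corrcoef(toy_resp[i], toy_resp[j])[0, 1]

# Vectorized version: np.corrcoef treats each row as one variable
rdm_vec = 1 - np.corrcoef(toy_resp)

# z-score formula, as in compute_RDM: the mean product of z-scored
# responses across neurons equals the Pearson correlation
z = zscore(toy_resp, axis=1)
rdm_z = 1 - (z @ z.T) / z.shape[1]

assert np.allclose(rdm_loop, rdm_vec)
assert np.allclose(rdm_vec, rdm_z)
```

The diagonal of each matrix is 0 (up to floating-point error), because every stimulus is perfectly correlated with itself, and the matrices are symmetric.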


In [None]:
# Compute RDM for IT responses

# TODO

# Step 1


# Step 2



# Visualize rdm (pre-defined in helper code at top)
visualize_rdm(rdm);


### 2g: Interpreting RDMs

**Answer the following questions. RDMs for both the V4 neural responses and the IT neural responses are shown in the cell below.**

i) What does the number in a given row and column of the RDM represent? Be specific about what the neural responses are, what we've averaged over, etc.

ii) Within a category, does the population of IT neurons often respond similarly for different objects? What category is this especially true of? What does this tell you about the type of coding in area IT, e.g. the complexity of the features encoded?

iii) Complete this sentence by filling in the appropriate object categories. "We can predict that a simple linear decoder trained on IT activity would be able to discriminate between [these categories] with high accuracy, while decoding accuracy between [these categories] would be much lower."

iv) How does the V4 RDM differ from the IT RDM? What does this tell you about V4 processing as compared to IT processing?

v) If you recorded from V1 neurons and constructed an RDM, what would you expect it to look like?




In [None]:
V4_rdm = compute_RDM(np.mean(V4_resps, axis = 2).reshape((49, -1)))
IT_rdm = compute_RDM(np.mean(IT_resps, axis = 2).reshape((49, -1)))

# Visualize RDMs
fig, (ax, ax2) = plt.subplots(ncols=2,figsize=(9,4),
                  gridspec_kw={"width_ratios":[1,1]})
fig.subplots_adjust(wspace=0.3)

im = visualize_rdm(V4_rdm, ax=ax, title='V4')
visualize_rdm(IT_rdm, ax=ax2, title='IT')

plt.tight_layout()

<font color="#2AAA8A"><span style="font-size:larger;">**Answer**<br>
<br>
i) 
<br><br>
ii) 
<br><br>
iii) 
<br><br>
iv) 
<br><br>
v) 