# Intro...
So, over the past couple of weeks I’ve been working on applying Bayesian RSA to our data set rather than the more classical RSA we’ve been using so far. I’m going to try to explain that method in broad terms, why it's better than our current one, and how I've been implementing it for our specific design.

# What we do…
Our application of RSA assumes that fMRI responses are related to task-related events through a general linear model (GLM): 

Y = X · β + ε. 

where:
- ε is, of course, error
- Y is the fMRI time series from an experiment, with a certain number of time points (rows) from a certain number of brain voxels (columns)
- X is the design matrix, i.e. the HRF-convolved event time courses for each task condition
- and β is the spatial response pattern (i.e. response across all voxels) to each experimental condition [each row represents a condition, each column represents a voxel]

The goal of RSA is just to estimate the similarity between the rows of β, which we collect in a similarity matrix we’ll call U.

Our strategy for doing so involves estimating β from Y and X with ordinary least squares (OLS) estimation, and then deriving our similarity measure U from that estimate. This is where RSA goes wrong according to Mingbo.
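To make the two-step strategy concrete, here's a minimal sketch in NumPy. Everything in it is made up for illustration: the sizes, the random stand-in for the design matrix (a real X would hold HRF-convolved event time courses), and the simulated noise level. It's not our actual pipeline, just the shape of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_timepoints, n_conditions, n_voxels = 200, 4, 50

# Stand-in design matrix X (time points x conditions).
X = rng.normal(size=(n_timepoints, n_conditions))

# Hypothetical "true" response patterns beta (conditions x voxels).
beta_true = rng.normal(size=(n_conditions, n_voxels))

# Simulated data Y = X @ beta + noise (time points x voxels).
Y = X @ beta_true + rng.normal(scale=2.0, size=(n_timepoints, n_voxels))

# Step 1: ordinary least squares estimate of beta.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Step 2: similarity matrix U as correlations between the rows of beta_hat.
U_hat = np.corrcoef(beta_hat)
```

Note that step 2 only ever sees `beta_hat`, never `beta_true`, so whatever noise ends up in the estimate is carried straight into `U_hat`.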

Without getting into the math, our estimate of β includes a bias term that depends on both our design matrix and the properties of our noise. So when we derive our similarity matrix from this biased estimate, issues like temporal autocorrelation in fMRI noise can introduce spurious structure into our measurements that doesn’t really exist. The big mistake of this classical approach to RSA is that it treats our noisy estimate of β as the real thing and then performs a secondary analysis (correlation) on this noisy estimate.
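For anyone who does want a little of the math: the OLS estimate equals β plus (XᵀX)⁻¹Xᵀε, so noise with covariance Σ over time adds (XᵀX)⁻¹XᵀΣX(XᵀX)⁻¹ to the covariance between estimated condition patterns. The sketch below computes that term for a toy design. The AR(1) noise model, the value of `rho`, and the boxcar design are my assumptions for illustration, not properties of our data.

```python
import numpy as np

def ols_noise_covariance(X, rho, sigma=1.0):
    """Covariance that noise alone adds to the OLS estimate of beta.

    Assumes AR(1) noise with autocorrelation rho and innovation s.d. sigma.
    """
    n_timepoints = X.shape[0]
    t = np.arange(n_timepoints)
    lags = np.abs(t[:, None] - t[None, :])
    # Stationary covariance of an AR(1) process.
    noise_cov = sigma**2 / (1.0 - rho**2) * rho**lags
    # OLS projector (X'X)^-1 X'.
    projector = np.linalg.inv(X.T @ X) @ X.T
    return projector @ noise_cov @ projector.T

# Toy design: two conditions whose events never overlap,
# but condition 1 always directly follows condition 0.
X = np.zeros((40, 2))
X[[5, 6, 7, 25, 26, 27], 0] = 1.0
X[[8, 9, 10, 28, 29, 30], 1] = 1.0

white = ols_noise_covariance(X, rho=0.0)           # no autocorrelation
autocorrelated = ols_noise_covariance(X, rho=0.5)  # AR(1) noise
```

With white noise the off-diagonal entry is zero for this design; with autocorrelated noise it becomes positive, even though the true patterns play no role in the calculation. That off-diagonal term is the kind of spurious similarity described above.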

# Mingbo’s idea…
Since the bias comes from inferring the similarity matrix indirectly from our estimate of β, what Mingbo’s Bayesian approach does is try to estimate the similarity matrix directly. It performs maximum likelihood estimation of the shared covariance structure of activity patterns directly from the data. 

The details are more complicated than our current strategy but can be summarized easily. It starts with a generative model based on the same GLM assumptions we’ve already discussed, but it also allows the noise term and the signal-to-noise ratio to vary across voxels. Then, under these constraints/allowances, it finds the similarity matrix that maximizes the likelihood of our observed time courses.
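Here's a heavily simplified sketch of what "the likelihood of our observed time courses given a similarity matrix" can look like. The simplifications are mine: the noise is white rather than autocorrelated, and the per-voxel noise s.d. (`sigma`) and signal-to-noise ratio (`snr`) are passed in as known rather than estimated. Each voxel's pattern is assumed to be drawn from a Gaussian with covariance proportional to U, so β is integrated out and never estimated. The function name and arguments are hypothetical and are not the real BRSA code.

```python
import numpy as np

def marginal_log_likelihood(Y, X, U, snr, sigma):
    """Log-likelihood of Y given U with beta integrated out.

    Simplified model (an assumption of this sketch): for voxel v,
    Y[:, v] ~ N(0, sigma[v]^2 * (snr[v]^2 * X U X' + I)).
    """
    n_timepoints, n_voxels = Y.shape
    signal_cov = X @ U @ X.T
    total = 0.0
    for v in range(n_voxels):
        cov = sigma[v] ** 2 * (snr[v] ** 2 * signal_cov + np.eye(n_timepoints))
        _, logdet = np.linalg.slogdet(cov)
        quad = Y[:, v] @ np.linalg.solve(cov, Y[:, v])
        total += -0.5 * (n_timepoints * np.log(2 * np.pi) + logdet + quad)
    return total

# Toy usage with simulated data (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
U_true = np.array([[1.0, 0.8, 0.0],
                   [0.8, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
beta = np.linalg.cholesky(U_true) @ rng.normal(size=(3, 20))
Y = X @ beta + rng.normal(size=(60, 20))
snr, sigma = np.ones(20), np.ones(20)

ll_true = marginal_log_likelihood(Y, X, U_true, snr, sigma)
ll_identity = marginal_log_likelihood(Y, X, np.eye(3), snr, sigma)
```

Fitting would then mean searching over U (and, in the full method, the noise parameters) for the values that make this quantity as large as possible.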

# A review of our design…
For a sec, let me review our design. This turns out, in fact, to be a review of an old version of our analysis strategy, but it reflects the way I've been approaching the problem so far.

EXPERIMENT DESIGN
There are four objects (bed, bench, chair, table). Participants viewed each object 20 times per run.
Runs 1 & 2 -- reserved for analyses like searchlight
Runs 3 & 4 -- pretest phase 
Four training runs involving practice drawing two of the four objects (the trained objects). 
Runs 5 & 6 -- posttest phase

GOAL
Compare similarity between Trained object representations before and after training (vs. Control) in several anatomically-defined ROIs. To do this, for each phase, we produce an 8 by 8 similarity matrix sorted such that each row/column corresponds to an object category for a specific run, and object categories are sorted such that the first two correspond to the object categories participants will practice (trained) and the other two correspond to the object categories they won’t (control).

e.g. (rows)
run 1 bed [trained]
run 1 chair [trained]
run 1 bench [control]
run 1 table [control]
run 2 bed [trained]
…

With this structure, we can consider the top right 4x4 block as M, and split that into four different 2x2 matrices that themselves can be divided into elements A, B, C, D:

[_A_|_B_]
[_C_|_D_]

We can call the top left 2x2 part of M, M-trained, and the bottom right part of M, M-control, because their elements reflect the representational similarity among the object categories within each of these two conditions. 
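In code, pulling out those blocks is just slicing. This sketch assumes the row order from the example above (both trained objects first, then both control objects, repeated for the second run) and uses random patterns as a stand-in for real response patterns.

```python
import numpy as np

# Row/column labels in the assumed order: trained objects first, then control.
labels = [f"run {run} {obj}"
          for run in (1, 2)
          for obj in ("bed", "chair", "bench", "table")]

# Stand-in response patterns (8 conditions x 30 voxels), purely illustrative.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(len(labels), 30))

# 8x8 similarity matrix between all run-by-object patterns.
S = np.corrcoef(patterns)

# Top-right 4x4 block: rows are first-run objects, columns second-run objects.
M = S[:4, 4:]

# 2x2 sub-blocks for the trained and control object pairs.
M_trained = M[:2, :2]
M_control = M[2:, 2:]
```

Each entry of `M` compares a pattern from one run with a pattern from the other, so nothing in `M` is a within-run comparison.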

For either the trained or control 2x2 submatrix, its diagonal elements (A, D) reflect the representational similarity of object categories with themselves (same-object similarity across runs), and its off-diagonal elements (C, B) reflect the similarity of object categories with one another (the similarity between different objects across runs). And so…

Mean(B, C) - Mean(A, D) = representational distance between objects in this phase for one condition. 

We compute this for our trained object categories before the practice phase and after the practice phase to understand how representational similarity has changed for each of our conditions. Our hypothesis is that the increase in representational distance will be higher for our trained object categories than our control ones.
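A small sketch of that comparison, using the formula exactly as written above. The function names are hypothetical, and the 2x2 blocks in the usage example are made-up numbers, not results.

```python
import numpy as np

def representational_distance(block):
    """Mean(B, C) - Mean(A, D) for a 2x2 block laid out as [[A, B], [C, D]]."""
    (a, b), (c, d) = np.asarray(block, dtype=float)
    return (b + c) / 2.0 - (a + d) / 2.0

def training_effect(pre_trained, post_trained, pre_control, post_control):
    """Change in distance for trained objects minus change for control objects."""
    trained_change = (representational_distance(post_trained)
                      - representational_distance(pre_trained))
    control_change = (representational_distance(post_control)
                      - representational_distance(pre_control))
    return trained_change - control_change

# Made-up similarity blocks, for illustration only.
pre_trained = [[0.8, 0.2], [0.4, 0.6]]
post_trained = [[0.8, 0.4], [0.6, 0.6]]
pre_control = [[0.7, 0.3], [0.3, 0.7]]
post_control = [[0.7, 0.3], [0.3, 0.7]]

effect = training_effect(pre_trained, post_trained, pre_control, post_control)
```

Under our hypothesis, `effect` computed on real data should come out positive.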

# How I fit BRSA into this…
This is the part I’m admittedly a bit unclear about after reflecting on our design the past few days.

The bias term that BRSA seeks to eliminate in similarity estimations is relatively negligible for designs in which many stimuli are presented many times in a well randomized order - the bias would only be reflected in the diagonal elements of our similarity matrices, which we don’t use for our analysis. 

That would seem to suggest that BRSA won’t really serve us much for our project. However, the kind of RSA we’ve traditionally been doing hasn’t *just* been between the 4 object categories that define our stimulus set. We’ve been doing comparisons of representational similarity between runs, which is why we derive M from an 8 by 8 matrix. This is strange, though, because these between-run comparisons have, at least originally, been within-phase, not the pre-post kind of comparison that is the main aspect of our analysis. The idea seems to have been to isolate the similarity between representations of object categories that does not consist of the similarity that comes with them both being objects.

BRSA has two required inputs: a design matrix and the time course of responses at each time point for each voxel. 

Were we to stick with between-run comparisons, though, generating the relevant correlation matrix involves treating the same object categories that are in different runs as different conditions altogether. There's no real trouble with this implementation-wise; it just requires changing the design matrix so that object categories between runs aren't treated as the same condition. 
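One way to make that design-matrix change, sketched in plain NumPy: give each run its own set of condition columns, so the combined design is block-diagonal. The helper name is hypothetical, the per-run designs here are placeholders, and this says nothing about what other arguments BRSA() might want.

```python
import numpy as np

def stack_runs_as_separate_conditions(run_designs):
    """Combine per-run design matrices into one block-diagonal design.

    Each run keeps its own columns, so "bed in run 1" and "bed in run 2"
    become different conditions.
    """
    n_rows = sum(d.shape[0] for d in run_designs)
    n_cols = sum(d.shape[1] for d in run_designs)
    design = np.zeros((n_rows, n_cols))
    row = col = 0
    for d in run_designs:
        design[row:row + d.shape[0], col:col + d.shape[1]] = d
        row += d.shape[0]
        col += d.shape[1]
    return design

# Placeholder per-run designs: 5 time points x 4 object categories each.
run_a = np.ones((5, 4))
run_b = np.ones((5, 4))
design = stack_runs_as_separate_conditions([run_a, run_b])  # shape (10, 8)
```

The time series from the runs would need to be concatenated in the same order as the blocks, so that each row of the design lines up with the right time point.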

At some point along the line, though, our project got a bit more complicated: the β from which we derived our similarity measure (which is itself derived in a somewhat more complicated way), and from which we then computed representational distance, turned out to be a splice of β matrices from runs in different phases. Should we try proceeding with that one? If so, it might be worth reviewing what that is and discussing the kind of input to BRSA() necessary to produce it.

Or maybe there's a different strategy we might like to take up?