# Project 2

Project 2 concerns predicting a multidimensional output from a multidimensional input. You will start with simple tools like regression and try to work your way up to something more sophisticated like a Kalman Filter or a Wiener Filter.

## <a id='sec_bci'>1 - Brain Computer Interfaces</a>

Neuroscientists study a field called Brain Computer Interfaces, in which brain activity is used to control a prosthetic or orthotic. There are many ways to achieve this, but a common one is to surgically place multiple electrode arrays in the brain. Each array contains multiple electrodes, and each electrode can record from multiple neurons. It's hard to know ahead of time how many neurons each electrode will be able to record from. It's typically 0-4, although more are sometimes possible, and the number even fluctuates from day to day.

The scientists use specialized equipment that can measure when each neuron _fires_. You don't need to understand what this means to complete this assignment, but it's sort of like an electrical twitch that occurs when the neuron receives enough input. A prevailing theory in neuroscience is that the brain's thoughts are encoded by neurons firing faster or slower. For example, Neuron 1 might fire faster if you plan on moving your hand left, while Neuron 2 might fire slower if you plan on moving your hand left. If you notice that Neuron 1 starts firing faster while Neuron 2 slows down, you might assume that the subject is planning to move his hand to the left.

The problem is that there are hundreds of neurons and the scientists don't know ahead of time how each neuron thinks. Furthermore, neuron firing is probabilistic. A neuron might be _more likely_ to fire more, but that doesn't mean it actually will. It's like saying a good team has a higher probability of winning a game, but they could still lose three in a row. Our goal is to build different mathematical models that can predict movement from neurons firing.

If you want more background and you have 18 minutes to spend, you might enjoy [watching this video](https://www.youtube.com/watch?v=HV-k7EwZVNQ "YouTube"). It tries to capture a bit of what I'm explaining here in words.

## <a id='sec_data'>2 - Data</a>

The data for this assignment is:

O'Doherty, Joseph E., Cardoso, Mariana M. B., Makin, Joseph G., & Sabes, Philip N. (2020). Nonhuman Primate Reaching with Multichannel Sensorimotor Cortex Electrophysiology [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3854034

The paper that describes the data collection and data analysis is:

Makin, J. G., O’Doherty, J. E., Cardoso, M. M. B., & Sabes, P. N. (2018). Superior arm-movement decoding from cortex with a new, unsupervised-learning algorithm. Journal of Neural Engineering, 15(2), 026010. https://doi.org/10.1088/1741-2552/aa9e95

The supplementary Jupyter Notebook will show you how to load and navigate the files. There are 47 `.mat` data files. You don't need to download all of them (it's 24GB all together). Downloading one or two should be more than sufficient to complete this assignment.

## <a id="sec_todo">3 - What to Do</a>

Download one or two `.mat` files and use the `examine_data.ipynb` notebook as a guide to familiarize yourself with the contents. Also, look through `refh_results.csv` (also from the [data site](https://zenodo.org/record/3854034)) to see the different experiments that the study's authors have attempted. Your goal will be to re-create some of the rows in that spreadsheet.

### <a id='sec_tvt'>3.1 - Training vs. Test</a>

In all kinematic prediction tasks, train on the first `num_training_samples` bins and then test on the remainder (see `refh_results.csv`). This is roughly equivalent to fitting a line-of-best-fit to the first 100 data points in a sequence and then measuring the $r^2$ or mean-squared-error on the remaining data points.
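
The split described above can be sketched as follows. This is only an illustration on synthetic stand-in data: the function name `split_train_test`, the array names, and the sizes (500 bins, 20 neurons, 400 training bins) are assumptions made for the example and do not come from the data files or from `refh_results.csv`.

```python
import numpy as np

def split_train_test(rates, kinematics, num_training_samples):
    """Split time-ordered bins into a leading training block and a trailing test block."""
    if not 0 < num_training_samples < len(rates):
        raise ValueError("num_training_samples must leave at least one test bin")
    train = (rates[:num_training_samples], kinematics[:num_training_samples])
    test = (rates[num_training_samples:], kinematics[num_training_samples:])
    return train, test

# Synthetic stand-in data (hypothetical sizes): 500 bins, 20 neurons, 2 kinematic variables.
rng = np.random.default_rng(0)
rates = rng.poisson(3.0, size=(500, 20)).astype(float)
kinematics = rng.normal(size=(500, 2))

(train_X, train_y), (test_X, test_y) = split_train_test(rates, kinematics, 400)
print(train_X.shape, test_X.shape)
```

Note that the bins are not shuffled: because the data is a time series, the test block always comes after the training block.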

### <a id='sec_regression'>3.2 - Regression</a>
Start with a _regression_. This is basically a multi-variable line-of-best-fit between neuron firing rates and a kinematic variable such as cursor x-position or cursor y-position. If there are, say, 100 neurons, then your regression essentially says $pos[t] = b_0 + \sum_{i=1}^{100} b_i x_i[t]$, where the $b_i$ values are coefficients that are learned during training and $x_i[t]$ is the spike rate of neuron $i$ at time $t$. Predict cursor position using neural firing rates, and compute the r-squared metric as a measure of how good the prediction is (it probably won't be great, but that's ok).
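
A minimal sketch of this kind of regression, using NumPy's least-squares solver on synthetic data, is shown here. The helper names (`fit_linear_regression`, `predict_linear`, `r_squared`), the sizes, and the noise level are all assumptions chosen for illustration. The $r^2$ shown is the usual coefficient of determination; check whether it matches the definition used for `refh_results.csv` before comparing numbers.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit of y ~= b0 + X @ b; the intercept b0 is the first row returned."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted coefficients (intercept first) to new firing rates."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(y_true, y_pred):
    """Coefficient of determination, computed separately for each output column."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (hypothetical): 600 bins, 10 neurons, x/y cursor position.
rng = np.random.default_rng(0)
reg_rates = rng.poisson(3.0, size=(600, 10)).astype(float)
true_b = rng.normal(size=(10, 2))
reg_pos = 0.5 + reg_rates @ true_b + 0.1 * rng.normal(size=(600, 2))

n_train = 400  # train on the first bins, test on the remainder
coef = fit_linear_regression(reg_rates[:n_train], reg_pos[:n_train])
reg_pred = predict_linear(coef, reg_rates[n_train:])
reg_r2 = r_squared(reg_pos[n_train:], reg_pred)
print(reg_r2)
```

The synthetic data is exactly linear plus a little noise, so the fit here is nearly perfect. Real neural data will score much lower.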

### <a id="sec_kf">3.3 - Kalman Filter</a>
A Kalman filter is yet another technique for making predictions about hidden quantities based on noisy measurements. It lets us take advantage of the fact that physics allows good predictions of the cursor's future position from its current position and velocity. Simple regression doesn't take advantage of this information. The Kalman Filter basically says _given my current position and velocity, my future predicted position and velocity are mostly known and therefore neural spike counts are only needed to refine this estimate_.

The other feature of Kalman Filters is that in addition to modeling the data, they also model how certain we are that the model is any good. If it thinks the model is good, it might choose to weight the neural-based prediction over the physics-based prediction. Conversely if it has no confidence in its neuron model, it will ignore it and make predictions mostly from the physics alone. This confidence can fluctuate in time.
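
The predict-then-refine idea can be sketched in a few lines of NumPy. This is a hedged illustration, not the method used by the study's authors: it assumes a linear model $x_t = A x_{t-1} + w$ for the state (position and velocity) and $z_t = H x_t + q$ for the neural observations, with all four matrices estimated by least squares from mean-centered training data. The function names, the one-dimensional simulated cursor, and every numeric constant are assumptions made for the example.

```python
import numpy as np

def fit_kalman(states, obs):
    """Least-squares estimates of A, W, H, Q from mean-centered training data."""
    prev, nxt = states[:-1], states[1:]
    A = np.linalg.lstsq(prev, nxt, rcond=None)[0].T    # x[t] ~= A x[t-1]
    W = np.cov((nxt - prev @ A.T).T)                   # process noise covariance
    H = np.linalg.lstsq(states, obs, rcond=None)[0].T  # z[t] ~= H x[t]
    Q = np.cov((obs - states @ H.T).T)                 # measurement noise covariance
    return A, W, H, Q

def kalman_filter(obs, A, W, H, Q, x0, P0):
    """Run predict/update over every bin and return the state estimates."""
    x, P = x0.copy(), P0.copy()
    identity = np.eye(len(x0))
    estimates = np.empty((len(obs), len(x0)))
    for t, z in enumerate(obs):
        # Predict: push the previous estimate through the physics model.
        x = A @ x
        P = A @ P @ A.T + W
        # Update: correct the prediction using the neural observation.
        S = H @ P @ H.T + Q
        K = np.linalg.solve(S, H @ P).T  # Kalman gain, P H^T S^-1
        x = x + K @ (z - H @ x)
        P = (identity - K @ H) @ P
        estimates[t] = x
    return estimates

# Synthetic stand-in data (hypothetical): a 1-D cursor with state [position, velocity].
rng = np.random.default_rng(1)
dt, num_bins, num_neurons, n_train = 0.1, 600, 15, 400
A_true = np.array([[1.0, dt], [0.0, 0.95]])
H_true = rng.normal(size=(num_neurons, 2))
states = np.zeros((num_bins, 2))
for t in range(1, num_bins):
    states[t] = A_true @ states[t - 1] + np.array([0.0, rng.normal()])
obs = 5.0 + states @ H_true.T + rng.normal(size=(num_bins, num_neurons))

# Center with training means, fit on the first bins, filter the remainder.
mu_x, mu_z = states[:n_train].mean(axis=0), obs[:n_train].mean(axis=0)
A, W, H, Q = fit_kalman(states[:n_train] - mu_x, obs[:n_train] - mu_z)
kf_estimates = kalman_filter(obs[n_train:] - mu_z, A, W, H, Q, np.zeros(2), np.eye(2)) + mu_x

true_pos, est_pos = states[n_train:, 0], kf_estimates[:, 0]
kf_r2 = 1.0 - np.sum((true_pos - est_pos) ** 2) / np.sum((true_pos - true_pos.mean()) ** 2)
print(kf_r2)
```

The gain `K` is where the confidence weighting happens: when the measurement noise `Q` is large relative to the predicted uncertainty `P`, the gain shrinks and the estimate leans on the physics prediction.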

There are many online resources for learning Kalman Filters. My favorite by far is [KalmanFilter.net](https://www.kalmanfilter.net/).
My advice is to read from the beginning up through the section titled "Kalman Filter in One Dimension".

Other good resources are:
- [Kalman Filters Explained Simply](https://thekalmanfilter.com/kalman-filter-explained-simply/)
- [Kalman Filtering Demo](https://www.codeproject.com/articles/326657/kalmandemo)
- [Kalman Filter For ~Dummies~ 5033 Students](http://bilgin.esme.org/BitsAndBytes/KalmanFilterforDummies)

Your goal is to attempt to predict cursor location from neural activity using a Kalman Filter. Try it on one or more signals and compute your performance in terms of $r^2$. How does performance compare to [Regression](#sec_regression)?


### <a id="sec_wf">3.4 - Wiener Filter</a>

Wiener Filters are another well-known signal estimation technique. Again, there are many good resources online, but I found [this one](https://webee.technion.ac.il/people/shimkin/Estimation09/ch3_Wiener.pdf) to be especially helpful.

Other good resources are:
- [Stanford EE264](https://web.stanford.edu/class/archive/ee/ee264/ee264.1072/mylecture12.pdf)
- [MIT 6.011](https://ocw.mit.edu/courses/6-011-introduction-to-communication-control-and-signal-processing-spring-2010/f135b328c7448bf21c4939ea9ff8f8fb_MIT6_011S10_chap11.pdf)

Attempt to implement a Wiener filter that can predict cursor position. Try it on one or more signals and compute your performance in terms of $r^2$. How does performance compare to [Regression](#sec_regression) and [Kalman Filters](#sec_kf)?
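
One common way to realize a Wiener filter on binned data is as a finite-length (FIR) filter: predict the cursor from the current bin plus a few previous bins, and solve the normal equations $Rw = p$ for the weights. The sketch below takes that approach on synthetic data. It is one possible interpretation, not necessarily the formulation in the resources above, and the function names, the number of lags, and the optional `ridge` term are assumptions made for the example.

```python
import numpy as np

def lagged_design(rates, num_lags):
    """Stack the current bin and the previous num_lags - 1 bins, plus an intercept column."""
    rows = rates.shape[0] - num_lags + 1
    cols = [rates[num_lags - 1 - k : num_lags - 1 - k + rows] for k in range(num_lags)]
    return np.hstack([np.ones((rows, 1))] + cols)

def fit_wiener(rates, target, num_lags, ridge=0.0):
    """Solve the normal equations R w = p for the FIR filter weights."""
    Z = lagged_design(rates, num_lags)
    y = target[num_lags - 1:]  # the first num_lags - 1 bins lack a full history
    R = Z.T @ Z                # (unnormalized) autocorrelation of the lagged inputs
    p = Z.T @ y                # (unnormalized) cross-correlation with the target
    return np.linalg.solve(R + ridge * np.eye(R.shape[0]), p)

def apply_wiener(weights, rates, num_lags):
    """Predict the target for every bin that has a full lag history."""
    return lagged_design(rates, num_lags) @ weights

# Synthetic stand-in data (hypothetical): the target depends on the last 3 bins of 8 neurons.
rng = np.random.default_rng(2)
num_bins, num_lags, n_train = 600, 3, 400
wf_rates = rng.poisson(3.0, size=(num_bins, 8)).astype(float)
true_w = rng.normal(size=(num_lags, 8))
wf_target = np.zeros(num_bins)
for k in range(num_lags):
    wf_target[k:] += wf_rates[: num_bins - k] @ true_w[k]
wf_target += 0.1 * rng.normal(size=num_bins)

weights = fit_wiener(wf_rates[:n_train], wf_target[:n_train], num_lags)
wf_pred = apply_wiener(weights, wf_rates[n_train:], num_lags)
wf_true = wf_target[n_train + num_lags - 1:]
wf_r2 = 1.0 - np.sum((wf_true - wf_pred) ** 2) / np.sum((wf_true - wf_true.mean()) ** 2)
print(wf_r2)
```

With a single lag this reduces to the plain regression from [Regression](#sec_regression), which makes the two easy to compare. On real data, `R` can be close to singular (for example, when a neuron is nearly silent), in which case a small positive `ridge` may help.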

## <a id="sec_turnin">4 - What to Turn In</a>

You can work in groups of two or three. 

At a bare minimum, I expect __everyone__ to be able to produce functional Linear Regression and to try to implement either Kalman or Wiener Filtering. __Strong__ teams will succeed at one or the other; __excellent__ teams will succeed at both. __Superlative__ teams can attempt some of the other more sophisticated filters described in [Makin 2018](https://doi.org/10.1088/1741-2552/aa9e95) such as Extended or Unscented Kalman Filters. 

All teams must prepare a 10-minute presentation for the last day of class (Monday December 5th). You don't need to prepare slides if you don't want to, but at a minimum be prepared to explain to the class what you did, what your results were, etc. If you are going to present a Jupyter Notebook, be sure it is uncluttered and clear. Your "presentation" notebook doesn't have to be the same as your "work" notebook. In particular, I'd be interested in hearing what assumptions you've made, how well your findings agree with the results in `refh_results.csv`, and any relevant observations or findings you've made along the way. Plots that show predicted versus true cursor position would be appreciated.

Finally, use Canvas to turn in one or more Jupyter Notebooks that show your work. Again, these should be clear and uncluttered.

## <a id="sec_ack">Acknowledgements</a>

Thanks to [Dr. Joey O'Doherty](http://neuroengineer.com/) for pointing me to this data and answering my question.