# Day 2 Exercises (NumPy + Matplotlib)

## Part 1: Basic NumPy Operations
a) Generate an array of numbers 0-24. Reshape to a 5x5 matrix.

In [4]:
import numpy as np
x = np.arange(25)
print(x)
y = x.reshape(5,5)
print(y)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


b) Extract the diagonal of this matrix.

In [8]:
z = np.diag(y)
print(z)

[ 0  6 12 18 24]


c) Multiply the matrix by an identity matrix of the same shape. Confirm that it is identical to the original.

Hint: Use `np.all` to confirm that all elements are equal.

In [20]:
w = np.identity(5)
#print(w)
m = y @ w 
print(m)
print(y)
#l = m == y
#np.all(l)
np.all(m==y)

[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


True

d) Join the matrix with itself and return a new matrix with shape (2,5,5).

In [29]:
np.concatenate([m,m])
np.ndim(np.concatenate([m,m]))
m3d = np.concatenate([m,m]).reshape(2,5,5)
np.ndim(m3d)
print(m3d)

[[[ 0.  1.  2.  3.  4.]
  [ 5.  6.  7.  8.  9.]
  [10. 11. 12. 13. 14.]
  [15. 16. 17. 18. 19.]
  [20. 21. 22. 23. 24.]]

 [[ 0.  1.  2.  3.  4.]
  [ 5.  6.  7.  8.  9.]
  [10. 11. 12. 13. 14.]
  [15. 16. 17. 18. 19.]
  [20. 21. 22. 23. 24.]]]


e) Compute the mean of the concatenated matrix along the first axis. Confirm that it is equal to the original matrix.

In [30]:
np.mean(m3d,axis=0)

array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., 24.]])
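
The cell above leaves the confirmation step of part e) to visual inspection. A minimal sketch of an explicit check follows. It assumes `np.stack` as an alternative way to build the (2,5,5) array and `np.allclose` for the floating-point comparison; neither is required by the exercise.

```python
import numpy as np

y = np.arange(25).reshape(5, 5)
m = y @ np.identity(5)          # float version of the original matrix

# np.stack joins the inputs along a new leading axis, giving shape (2, 5, 5) directly.
m3d = np.stack([m, m])

# Averaging two identical copies along axis 0 should recover the original matrix.
mean0 = np.mean(m3d, axis=0)
print(m3d.shape)
print(np.allclose(mean0, y))
```

`np.allclose` tolerates small rounding differences, which makes it a safer default than `==` when floats are involved.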

f) Return the indices of the matrix where the elements are greater than 15.

In [34]:
print(m)
np.where(m>15)

[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]]


(array([3, 3, 3, 3, 4, 4, 4, 4, 4]), array([1, 2, 3, 4, 0, 1, 2, 3, 4]))

g) Using `np.where`, set all elements of the matrix greater than 15 to 1, else 0.


In [37]:
np.where(m>15,1,0)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

h) Set all elements of the matrix greater than 15 to 2, less than 5 to 1, else 0.

Hint: `np.where` can be passed as an input to `np.where`.

In [40]:
print(m)
np.where(m>15,2,(np.where(m<5,1,0)))

[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]]


array([[1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 2, 2, 2, 2],
       [2, 2, 2, 2, 2]])

i) Return the lower triangle of the original matrix.

In [41]:
print(np.tril(m))

[[ 0.  0.  0.  0.  0.]
 [ 5.  6.  0.  0.  0.]
 [10. 11. 12.  0.  0.]
 [15. 16. 17. 18.  0.]
 [20. 21. 22. 23. 24.]]


j) Define a demean function.

In [42]:
def demean(arr, axis=1):
    # Subtract the mean taken along `axis`; keepdims=True lets the means broadcast against arr.
    mean_arr = np.mean(arr, axis=axis, keepdims=True)
    return arr - mean_arr

def demean2(sec):
    mean_sec2 = np.mean(sec)
    return sec - mean_sec2

k) Apply the demean function across each row of the matrix.

In [52]:
mean_sec = np.apply_along_axis(demean2, axis=1, arr=m).round(2)
print(mean_sec)
np.mean(mean_sec)

[[-2. -1.  0.  1.  2.]
 [-2. -1.  0.  1.  2.]
 [-2. -1.  0.  1.  2.]
 [-2. -1.  0.  1.  2.]
 [-2. -1.  0.  1.  2.]]


0.0
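
The row-wise demeaning in part k) can also be done without `np.apply_along_axis`. The sketch below is one possible vectorized version; it assumes the same 5x5 matrix as above and relies on `keepdims=True` so that the row means broadcast across columns.

```python
import numpy as np

m = np.arange(25, dtype=float).reshape(5, 5)

# keepdims=True keeps the row means as shape (5, 1), so they broadcast across the columns.
row_means = m.mean(axis=1, keepdims=True)
demeaned = m - row_means

print(demeaned)
print(demeaned.mean(axis=1))   # each row mean should now be zero
```

For large arrays, broadcasting usually runs faster than `np.apply_along_axis`, which calls the Python function once per row.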

## Part 2: Spike Detection

In the following exercises, you will be manipulating, analyzing, and visualizing preprocessed extracellular electrophysiological data. Specifically, the following 10 s recording was taken from the abdomen of a crayfish. Action potentials are readily apparent throughout the entire recording.

First, we load the data.

In [None]:
import numpy as np

## Load data.
npz = np.load('spikes.npz')
data = npz['data'] * 1e6      # Convert to uV
times = npz['times']

a) Plot the entire raw recording. Do multiple types of spikes appear to be present?

b) In a recent paper, [Rey et al. (2015)](https://www.sciencedirect.com/science/article/pii/S0361923015000684) suggest a simple spike detection technique via data-driven amplitude thresholding. Specifically, they propose an automated amplitude threshold that is defined as a multiple of an estimate of the standard deviation of the noise:

$$ \text{threshold} = k \cdot \hat{\sigma}_n $$

where $k$ is a constant typically between 3 and 5, and $\hat{\sigma}_n$ is an estimate of the standard deviation of the noise, defined as:

$$ \hat{\sigma}_n = \frac{\text{median} \left( |X| \right)}{0.6745} $$ 

where $|X|$ is the absolute value of the raw data.

Write a function that returns the amplitude threshold as defined above. The function should accept as arguments the raw data, $X$, and the constant, $k$. 
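
One possible shape for such a function is sketched below. The name `amplitude_threshold` is hypothetical, and the check uses synthetic Gaussian noise rather than the crayfish recording. For Gaussian noise the median of $|X|$ is about 0.6745 standard deviations, so the estimate should land close to the true value.

```python
import numpy as np

def amplitude_threshold(X, k):
    """Return k times the median-based estimate of the noise standard deviation."""
    sigma_n = np.median(np.abs(X)) / 0.6745
    return k * sigma_n

# Synthetic check (made-up data): Gaussian noise with a true SD of 2.0.
rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=2.0, size=100_000)
thresh = amplitude_threshold(noise, k=4)
print(thresh)   # should be close to 4 * 2.0
```

Because the median is insensitive to a small number of large values, this estimate is less inflated by the spikes themselves than a plain `np.std` would be.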

c) Next we need a function that can detect slices of the raw signal that exceed the threshold. This ultimately becomes a clustering problem (i.e. identifying "islands" of signal rising above an "ocean of noise"). Though this is definitely doable with core NumPy, the SciPy library has built-in functions specifically written for these purposes. 

Because these functions are beyond the scope of the bootcamp, we have provided a peak finding function for you. The function, `peak_finder`, accepts a raw data trace and a threshold. It then finds all clusters of samples above a threshold, and returns the index and signal magnitude corresponding to the peak of each cluster.

The function relies on the labeling and measurement functions in `scipy.ndimage` (`label`, `maximum_position`, and `maximum`). For a tutorial, see [here](https://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/).



In [None]:
def peak_finder(X, thresh):
    """Simple peak finding algorithm.
    
    Parameters
    ----------
    X : array_like, shape (n_times,)
        Raw data trace.
    thresh : float
        Amplitude threshold.
        
    Returns
    -------
    peak_loc : array_like, shape (n_clusters,)
        Index of peak amplitudes.
    peak_mag : array_like, shape (n_clusters,)
        Magnitude of peak amplitudes.
    """
    import numpy as np
    # The scipy.ndimage.measurements namespace is deprecated; the same functions live in scipy.ndimage.
    from scipy import ndimage
    
    ## Error-catching.
    assert X.ndim == 1
    
    ## Identify clusters of contiguous samples above threshold.
    clusters, ix = ndimage.label(X > thresh)
    
    ## Identify index of peak amplitudes. 
    peak_loc = np.concatenate(ndimage.maximum_position(X, labels=clusters, index=np.arange(ix)+1))
    
    ## Identify magnitude of peak amplitudes.
    peak_mag = ndimage.maximum(X, labels=clusters, index=np.arange(ix)+1)
    return peak_loc, peak_mag

d) Apply the peak detection algorithm to the raw data using a constant $k=6$. Plot a histogram of the spike amplitudes (try bins of 0-150 in increments of 5 uV). 

How many spikes are detected? How many types of spikes do there appear to be?

e) Plot the first second of the data. Using a scatterplot (or any other method you can think of), indicate the peak for each detected spike.

f) Remake the plot above, but repeating the procedure with a constant $k=2$. How trustworthy is the spike detection algorithm with this more liberal threshold?

g) Returning now to the detected spikes when $k=6$, define a set of boundaries that divides the spikes into three clusters. How many spikes are in each cluster?
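
As a rough illustration of how amplitude boundaries can turn into cluster labels, the sketch below uses `np.digitize` and `np.bincount`. The amplitudes and the two boundaries are made up for the example; real boundaries should be read off the histogram from part d).

```python
import numpy as np

# Made-up peak amplitudes (uV) and made-up boundaries, for illustration only.
peak_mag = np.array([32.0, 35.5, 61.0, 64.2, 110.3, 58.9, 33.1])
boundaries = np.array([50.0, 90.0])

# np.digitize returns 0 below the first boundary, 1 between the two, 2 above the last.
cluster = np.digitize(peak_mag, boundaries)

# Count how many spikes fall in each of the three clusters.
counts = np.bincount(cluster, minlength=3)
print(cluster)
print(counts)
```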

h) Action potentials last roughly 1-2 milliseconds. With this in mind, extract a 3 ms window around each detected spike; that is, extract 1.5 ms of samples on either side of the detected peak. Store each epoch according to its cluster. 

Hint: The data were recorded at 10 kHz, meaning there are 10 samples per millisecond.
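
At 10 samples per millisecond, 1.5 ms on either side corresponds to 15 samples before and 15 after each peak. The sketch below shows one way to cut such windows with broadcasting. The trace and the peak indices are synthetic stand-ins, and dropping peaks near the edges is an assumption about how to handle incomplete windows.

```python
import numpy as np

samples_per_ms = 10
half_win = int(round(1.5 * samples_per_ms))   # 1.5 ms -> 15 samples

# Stand-ins for the real recording and detected peaks (hypothetical values).
rng = np.random.default_rng(0)
trace = rng.normal(size=1000)
peak_loc = np.array([5, 100, 400, 995])

# Keep only peaks that have a full window on both sides.
valid = (peak_loc >= half_win) & (peak_loc + half_win < trace.size)
peak_loc = peak_loc[valid]

# Broadcasting a column of peak indices against a row of offsets
# yields an (n_spikes, n_samples) index array.
offsets = np.arange(-half_win, half_win + 1)
epochs = trace[peak_loc[:, None] + offsets]
print(epochs.shape)
```

Each row of `epochs` is one spike, with the peak sitting in the middle column, so averaging over rows (`epochs.mean(axis=0)`) gives a mean waveform.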

i) Plot each averaged spike waveform in a single plot. Add a legend denoting the spike cluster.

## RGB images (from MIT Lincoln Labs)

A digital image is simply an array of numbers, which instructs a grid of pixels on a monitor to shine light of specific colors, according to the numerical values in that array.

An RGB image can thus be stored as a 3D NumPy array of shape (V,H,3). V is the number of pixels along the vertical direction, H is the number of pixels along the horizontal, and the size-3 dimension stores the red, green, and blue color values for a given pixel. Thus a (32,32,3) array would be a 32x32 RGB image.

You often work with a collection of images. Suppose we want to store N images in a single array; thus we now consider a 4D shape-(N, V, H, 3) array. For the sake of convenience, let’s simply generate a 4D-array of random numbers as a placeholder for real image data.

Specifically:

* generate a 4D array that holds 500, 48x48 random RGB images (think about the shape this array should have, and use np.random.rand liberally)

* then, normalize those images (by dividing through by max intensity) so that the largest intensity within each color channel within each image is set to 1, but relative intensities are preserved.
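
The two steps above can be sketched as follows. This is one possible approach, assuming that "max intensity" means the maximum over the two pixel axes, computed separately for every image and every color channel. The variable names are illustrative.

```python
import numpy as np

images = np.random.rand(500, 48, 48, 3)   # axes: (N, V, H, color)

# Max over the two pixel axes gives shape (500, 1, 1, 3):
# one value per image per color channel, ready to broadcast.
channel_max = images.max(axis=(1, 2), keepdims=True)
normed = images / channel_max

print(normed.shape)
print(normed.max(axis=(1, 2)).min())   # every per-channel maximum should be 1
```

Dividing by a per-channel constant rescales each channel without changing the ratios between its pixels, which is what preserves the relative intensities.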

## EEG time series (adapted from Richard Gao at UCSD)

The file EEG_exp.mat contains binary MATLAB data that scipy.io.loadmat() (see below) will pull into a dictionary.  Pertinent keys in that dict:

* 'EEG' -- a 1D array of over 700K floats, representing EEG data (in microV) sampled at a rate given by...
* 'fs' -- the sampling frequency, in Hz
* 'trial_info' -- a 2D array (300x2), each row of which contains the time (in seconds) after the start of the experiment at which some stimulus was applied to the subject, followed by an integer 1, 2, or 3 denoting the kind of stimulus

Assume the trial_info timestamps and EEG data are both sampled at rate fs so that no "fudging" is necessary with timestamps.

Your goal is to do the following:

1) For each event in trial_info, find the index of the corresponding time in the EEG data

2) Pull out a subarray of EEG data in a window (epoch) around that time, that stretches from 0.5 seconds before the event to 1 second after the event

3) Make an array of times (can be the same for all events) that labels the event time within this window as t=0, with negative/positive t values for parts of the epoch before/after the event.

4) Subtract off a baseline from all the values in each epoch (baseline should be the mean of just the t<0 times in that epoch)

5) Now aggregate all the events of category 1 and generate a single "averaged" de-baselined signal over all category 1 events.  Do likewise for the category 2 & 3 events.

6) Make a single plot (with labelled axes and, ideally, a legend) showing the average signal for each of the three categories of stimulus.

The key is to try to leverage what you've learned about numpy and matplotlib to do these operations efficiently and end up with fairly streamlined (but still readable) code.

Good luck!


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import io # this submodule lets us load the signal we want
%matplotlib inline

In [None]:
# You'll need a scipy utility function to read this MATLAB data file
# scipy loads .mat file into a dictionary
# the details are not crucial, we just have to unpack them into python variables.

# So something like...
EEG_data = io.loadmat('EEG_exp.mat', squeeze_me = True)
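
Steps 1 through 5 of the goal list can be sketched on synthetic data before touching the real file. Everything below is made up for illustration (sampling rate, signal, event times, and categories), and the variable names are hypothetical. Whether the window includes the sample at exactly +1 s is a choice; this sketch excludes it.

```python
import numpy as np

# Synthetic stand-ins for the contents of the .mat file (values are made up).
fs = 100                                    # sampling frequency in Hz
rng = np.random.default_rng(0)
eeg = rng.normal(size=60 * fs)              # 60 s of fake signal
event_times = np.array([5.0, 12.5, 20.0, 33.0, 41.5, 50.0])   # seconds
event_types = np.array([1, 2, 3, 1, 2, 3])

# 1) Convert event times to sample indices.
event_ix = np.round(event_times * fs).astype(int)

# 2-3) Sample offsets from -0.5 s up to (not including) +1 s, and the
#      matching time axis with t=0 at the event.
offsets = np.arange(int(-0.5 * fs), int(1.0 * fs))
t = offsets / fs
epochs = eeg[event_ix[:, None] + offsets]   # shape (n_events, n_samples)

# 4) Subtract each epoch's own pre-event (t < 0) mean.
baseline = epochs[:, t < 0].mean(axis=1, keepdims=True)
epochs = epochs - baseline

# 5) Average the de-baselined epochs within each stimulus category.
averages = {k: epochs[event_types == k].mean(axis=0) for k in (1, 2, 3)}
print(epochs.shape, averages[1].shape)
```

For step 6, each entry of `averages` can be passed to `plt.plot(t, ...)` with a `label`, followed by `plt.legend()`.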