Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

# Neuro Data Analysis in Python: Problem Set 5

This is the fifth problem set. It has 2 problems, worth a total of 41 points. It is due before class (i.e. by 10:59 AM) on 11/20/2020. For late policy please see [the syllabus](https://github.com/alexhuth/ndap-fa2020/blob/master/README.md#late-homework--extension-policy). Partial credit will be awarded for partially correct solutions.


## Homework submission

When you've finished, rename the notebook file to `ndap-problem_set_5-YOUREID.ipynb`. For example, if your EID is `ab12345`, you should call it `ndap-problem_set_5-ab12345.ipynb`. Then upload your completed problem set to Canvas.

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

import IPython.display as ipd

---
# Problem 1. (12 pts)
You're being given a timeseries (audio) dataset, and your goal in this problem is to analyze it to find what it actually contains.

In [None]:
# Load the data
data = np.load('ps5_dataset_1.npz')['signal']

# The SAMPLING RATE for this data is 44100 Hz (or 44.1 kHz)
# this is typical for audio data
Fs = 44100

In [None]:
# evaluate this cell to create an audio object. hit the play button to hear the raw data
# (this should not sound good)
ipd.Audio(data, rate=Fs)

## (a) Plot the power spectral density (2 pts)
Plot the power spectral density of `data`. Set the sampling rate correctly in your call to the power spectral density function so that the x-axis has the correct range (from zero to the Nyquist frequency). (And remember that you can put a semicolon `;` at the end of a line in jupyter to make it not print the output from that line. This hides all the junk when you plot a power spectrum.)
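As a minimal sketch of how the `Fs` argument sets the frequency axis, the example below runs `plt.psd` on a synthetic signal rather than on `data`. The 1 kHz test tone, the noise level, the name `fs_demo`, and `NFFT=1024` are arbitrary choices made for illustration and are not part of the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt

fs_demo = 44100                      # assumed sampling rate for the synthetic signal (Hz)
t = np.arange(fs_demo) / fs_demo     # one second of sample times
rng = np.random.default_rng(0)
demo = np.sin(2 * np.pi * 1000 * t) + 0.1 * rng.standard_normal(t.size)

# plt.psd returns the power values and the frequency of each bin
power, freqs = plt.psd(demo, NFFT=1024, Fs=fs_demo);
plt.xlabel("Frequency (Hz)")

print(freqs[0], freqs[-1])           # axis runs from 0 Hz up to the Nyquist frequency, fs_demo / 2
print(freqs[np.argmax(power)])       # the strongest bin lies near the 1 kHz tone
```

If `Fs` were left at its default, the same curve would be drawn but the x-axis would no longer be in Hz.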

In [None]:
### YOUR CODE HERE ###

## (b) Plot the spectrogram (2 pts)
Now plot the spectrogram of `data`. Set the sampling rate so that the y-axis is scaled correctly. Add labels for the x- and y-axes.

In [None]:
### YOUR CODE HERE ###

## (c) Filter the signal to remove the noise (8 pts)
From looking at the PSD and spectrogram, you might conclude that `data` contains a lot of noise in the higher frequencies, but something that looks like signal in the low frequencies.

Use `signal.firwin` to create a low-pass filter, and then use `np.convolve` to apply it to `data`. You'll need to select the cutoff frequency and number of taps (the length of the filter), and also set the sampling rate in your call to `firwin`. Then, play the resulting audio and see if you can hear what it says.

Try a few different values for the cutoff frequency and the number of taps. Don't be shy to try large numbers of taps and see what happens.
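The sketch below shows the general pattern of designing a low-pass filter with `signal.firwin` and applying it with `np.convolve`, using a synthetic two-tone signal instead of `data`. The tone frequencies, the 1000 Hz cutoff, and the 301 taps are illustrative assumptions, not recommended values for this problem.

```python
import numpy as np
from scipy import signal

fs_demo = 44100
t = np.arange(fs_demo) / fs_demo                     # one second of sample times
low = np.sin(2 * np.pi * 200 * t)                    # component we want to keep
high = np.sin(2 * np.pi * 5000 * t)                  # component we want to remove
noisy = low + high

# 301 taps and a 1000 Hz cutoff are arbitrary illustrative values
taps = signal.firwin(301, 1000, fs=fs_demo)

# mode="same" keeps the output the same length as the input
cleaned = np.convolve(noisy, taps, mode="same")

# With exactly one second of data, FFT bin k corresponds to k Hz
spectrum = np.abs(np.fft.rfft(cleaned))
print(spectrum[200], spectrum[5000])                 # the 5000 Hz bin should be far smaller
```

Passing `fs` to `firwin` is what lets the cutoff be given in Hz; without it, the cutoff is interpreted relative to the Nyquist frequency.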

In [None]:
my_filter = ### YOUR CODE HERE ###

filtered_data = ### YOUR CODE HERE ###

ipd.Audio(filtered_data, rate=Fs)

So what does the audio say? Write the answer down here:

---
# Problem 2. (29 pts)

In this problem we'll be plotting and analyzing some EEG data.

In [None]:
# first, load the data
datafile = np.load("ps5_EEG_data.npz")

# what objects are inside this data file?
print(datafile.files)

# load the eeg_data
# this dataset is an EEG recording with 8 channels and 101638 timepoints
eeg_data = datafile["eeg_data"]
print("eeg_data shape:", eeg_data.shape)

# get the sampling rate (Fs) in Hz
eeg_Fs = datafile["Fs"]
print("sampling rate:", eeg_Fs)

## (a) Plot some of the EEG data timeseries (4 pts)

Make 4 plots of the EEG data timeseries:
* One plot showing half a second of data (how many samples is this?)
* One plot showing two seconds of data
* One plot showing 10 seconds of data
* One plot showing 100 seconds of data

You can start with just plotting one channel for each, but your finished plot should show all 8 channels on the same axis.

For each plot you need to figure out how many samples to include. You know that the sampling rate (the variable `eeg_Fs` that we loaded from the datafile) is 128 Hz, or 128 samples per second.

Please label _at least_ the x-axis of each plot.

**Bonus (+2pts):** Make the x-axis ticks show units of seconds instead of samples.
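One way to move between seconds and samples is sketched below on a synthetic stand-in array. The helper `seconds_to_samples`, the names `fs_demo` and `fake_eeg`, and the `(channels, timepoints)` layout are all assumptions made for illustration; check `eeg_data.shape` to see how the real data is laid out.

```python
import numpy as np

fs_demo = 128                                  # sampling rate in Hz, matching the value stated above
n_channels = 8
rng = np.random.default_rng(0)
# ASSUMPTION: synthetic stand-in laid out as (channels, timepoints)
fake_eeg = rng.standard_normal((n_channels, 60 * fs_demo))

def seconds_to_samples(duration_s, fs):
    """Hypothetical helper: number of samples spanning duration_s seconds."""
    return int(round(duration_s * fs))

n = seconds_to_samples(0.5, fs_demo)
segment = fake_eeg[:, :n]                      # first half second of every channel
time_s = np.arange(n) / fs_demo                # sample index divided by the rate gives seconds

print(n, time_s[0], time_s[-1])
```

Passing `time_s` as the first argument to `plt.plot`, for example `plt.plot(time_s, segment.T)`, puts the x-axis in seconds. The transpose is needed because `plt.plot` draws one line per column of a 2D array.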

In [None]:
# plot half a second

In [None]:
# plot 2 seconds

In [None]:
# plot 10 seconds

In [None]:
# plot 100 seconds

## (b) Plot the power spectrum (psd) of one channel of the EEG data (6 pts)

Use the function `plt.psd` to plot the power spectrum of one EEG channel. Set the sampling rate `Fs` correctly so that you get the correct units of frequency.

Then plot the power spectra for all 8 EEG channels in the same axis.

In [None]:
# plot one power spectrum

In [None]:
# plot the power spectra from each of the 8 channels on the same axis

## (c) Plot a spectrogram of the EEG data (4 pts)
Use the `plt.specgram` function to plot a spectrogram of the first 60 seconds of the EEG data from one channel. You'll need to set the parameter `Fs` appropriately. Label the x- and y-axes appropriately (with units).

(Ungraded) You can also try playing with the `NFFT` and `noverlap` parameters to `plt.specgram`. Some settings of these parameters are illegal and will make `specgram` error--specifically, `noverlap` needs to be smaller than `NFFT`. What effect do these parameters have?
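To get a feel for `NFFT` and `noverlap`, the following sketch draws spectrograms of a synthetic signal with three parameter settings and records the shape of the returned array. The signal, the name `fs_demo`, and the three `(NFFT, noverlap)` pairs are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

fs_demo = 128
rng = np.random.default_rng(0)
t = np.arange(60 * fs_demo) / fs_demo          # 60 seconds of sample times
demo = np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)

settings = [(128, 0), (128, 96), (512, 0)]     # (NFFT, noverlap) pairs, chosen arbitrarily
fig, axes = plt.subplots(1, len(settings), figsize=(12, 3))
shapes = []
for ax, (nfft, noverlap) in zip(axes, settings):
    spec, freqs, times, im = ax.specgram(demo, NFFT=nfft, Fs=fs_demo, noverlap=noverlap)
    ax.set_title(f"NFFT={nfft}, noverlap={noverlap}")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Frequency (Hz)")
    shapes.append(spec.shape)                  # (frequency bins, time windows)

print(shapes)
```

In this sketch a larger `NFFT` yields more frequency bins but fewer time windows, while a larger `noverlap` yields more time windows without changing the number of frequency bins.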

In [None]:
# plot a spectrogram

## (d) Filter the EEG data to remove noise (15 pts)

The big spike at 60 Hz is _definitely_ noise. Let's filter the EEG signal to remove it.

The simplest thing to do would be to low-pass filter just below 60 Hz (since there probably isn't much interesting signal in the 60-64 Hz range anyway, and 64 Hz is the highest frequency we can see here -- Nyquist!!).
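To illustrate why nothing above the Nyquist frequency is visible, the following sketch samples a 70 Hz cosine at 128 Hz and checks that its samples are identical to those of a 58 Hz cosine. The two test frequencies and the name `fs_demo` are arbitrary choices for illustration.

```python
import numpy as np

fs_demo = 128
nyquist = fs_demo / 2                      # highest frequency that can be represented
n = np.arange(fs_demo)                     # one second of sample indices

# A 70 Hz cosine (above Nyquist) and a 58 Hz cosine (128 - 70, below Nyquist)
above = np.cos(2 * np.pi * 70 * n / fs_demo)
alias = np.cos(2 * np.pi * (fs_demo - 70) * n / fs_demo)

print(nyquist)
print(np.allclose(above, alias))           # the two sampled signals are indistinguishable
```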

**First,** design a low-pass filter using `signal.firwin`. You should set the `cutoff` frequency to something like 55 Hz, and make sure to set the sampling rate `fs` so that `firwin` knows how to handle the cutoff frequency you give it. Look at the docs for `signal.firwin` and check out the demos and notes for lecture 27 to see a demo of how to use this function. You'll also need to choose the number of taps in the filter--remember that fewer taps means a "softer" filter, while more taps means a "sharper" filter. You can play with this parameter to get a result that looks good.

**Second,** plot your filter using `plt.plot` to see what it looks like. Label the x-axis, with units.

**Third,** use `signal.freqz` to get the frequency response of your filter, and plot it. Make sure to use `np.abs` to get the magnitude of the complex-valued numbers that `freqz` gives you. (And remember that the frequencies `freqz` gives you are "helpfully" in units of radians per sample. You should figure out how to convert these units to Hz, i.e. cycles per second.)

**Fourth,** apply the filter to the EEG data (use channel 0) using `np.convolve`. Plot the first 10 seconds of the resulting filtered timeseries as well as the first 10 seconds of the original timeseries on the same axis. How do they compare?

**Fifth,** plot the power spectrum of the filtered EEG signal. Make sure the units are correct and labeled.
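The first and third steps above can be sketched on a made-up filter as follows. This is only an illustration under stated assumptions: the 30 Hz cutoff, the tap counts of 11 and 201, and the name `fs_demo` are arbitrary and deliberately differ from the values suggested for the assignment.

```python
import numpy as np
from scipy import signal

fs_demo = 128
cutoff_hz = 30                                  # arbitrary illustrative cutoff

responses = {}
for numtaps in (11, 201):
    taps = signal.firwin(numtaps, cutoff_hz, fs=fs_demo)
    w, h = signal.freqz(taps)                   # w is in radians per sample, from 0 up to pi
    freqs_hz = w * fs_demo / (2 * np.pi)        # pi rad/sample maps to the Nyquist frequency
    responses[numtaps] = (freqs_hz, np.abs(h))  # magnitude of the complex response

# Compare the gain 5 Hz above the cutoff for the short and the long filter
for numtaps, (freqs_hz, gain) in responses.items():
    idx = np.argmin(np.abs(freqs_hz - (cutoff_hz + 5)))
    print(numtaps, freqs_hz[idx], gain[idx])
```

The longer filter should show a much smaller gain just above the cutoff, which is what a "sharper" filter means in practice. Plotting `gain` against `freqs_hz` for each entry of `responses` gives the frequency response curves in Hz.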

In [None]:
# design a low-pass filter

In [None]:
# plot the filter

In [None]:
# plot the frequency response of the filter

In [None]:
# filter the signal from one EEG channel

In [None]:
# plot filtered & original data in same axis to compare

In [None]:
# plot power spectrum of the filtered EEG data