# Notebook 3: comparison of linear and nonlinear decoders

In [None]:
!pip install scipy==1.7.3

!git clone https://github.com/Mike-boop/mldecoders.git

import os
os.chdir('mldecoders')

!python setup.py install

In [None]:
# Download the predictions from the CNSP web server 

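# Note: combining -r with -O would concatenate every downloaded file into a single file,
# so we mirror the remote directory into cnsp_workshop_tutorial/data/predictions instead.
!wget --no-parent -r -nH --cut-dirs=2 -R 'index.html*' -P 'cnsp_workshop_tutorial/data' 'https://www.data.cnspworkshop.net/data/thornton_data/predictions/'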
predictions_dir = 'cnsp_workshop_tutorial/data/predictions'
# predictions_dir = 'cnsp_workshop_tutorial/data/single-speaker-predictions'

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, zscore

# Intro

In part 1, you trained linear decoders to predict the speech envelope from EEG recordings. In part 2, each of you trained one of the DNNs on one participant's data and sent us the predicted speech envelope values. In this notebook, we will compare the performance of the linear models with that of the DNNs.

# Loading the predicted speech envelopes

The saved outputs of the linear models can be loaded as follows (using the first participant as an example):

In [None]:
participant = 0
ridge_filepath = os.path.join(predictions_dir, f"ridge_predictions_P{participant:02d}.npy")
ridge_predictions = np.load(ridge_filepath)

In [None]:
ground_truth = np.load(os.path.join(predictions_dir, 'ground_truth.npy'))
print(pearsonr(ridge_predictions, ground_truth))

Similarly, the predictions of the DNNs can be loaded like so:

In [None]:
fcnn_filepath = os.path.join(predictions_dir, f"fcnn_predictions_P{participant:02d}.npy")
fcnn_predictions = np.load(fcnn_filepath)
print(pearsonr(fcnn_predictions, ground_truth))
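
`pearsonr` returns two values: the correlation coefficient and a p-value. The coefficient itself is the dot product of the two mean-centred signals, divided by the product of their norms. The following minimal sketch reproduces it by hand on synthetic data; the helper `pearson_r` and the signals `envelope` and `reconstruction` are illustrative stand-ins, not part of the workshop code or data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation of two 1-D arrays, computed by hand (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # remove the mean of each signal
    yc = y - y.mean()
    # normalised dot product of the mean-centred signals
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

# Synthetic stand-ins (NOT the workshop data): an "envelope" and a noisy "reconstruction"
rng = np.random.default_rng(0)
envelope = rng.standard_normal(1000)
reconstruction = 0.2 * envelope + rng.standard_normal(1000)

r_manual = pearson_r(envelope, reconstruction)
r_numpy = np.corrcoef(envelope, reconstruction)[0, 1]
print(r_manual, r_numpy)  # the two values should agree up to floating-point error
```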

The time series can be compared visually:

In [None]:
cnn_filepath = os.path.join(predictions_dir, f"cnn_predictions_P{participant:02d}.npy")
cnn_predictions = np.load(cnn_filepath)

fs = 125
t = np.arange(len(cnn_predictions))/fs
plt.plot(t, ground_truth, label='envelope')
plt.plot(t, zscore(cnn_predictions), label='reconstruction')
plt.legend()

plt.xlim(10, 30)
plt.xlabel('Time [s]')

# report the correlation for the model that is actually plotted (the CNN)
plt.title(f'correlation: {pearsonr(ground_truth, cnn_predictions)[0]:.3f}')
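
Z-scoring the reconstruction before plotting only shifts and rescales it, which helps when comparing it visually with the envelope. It does not change the Pearson correlation, because the correlation is invariant to adding a constant or multiplying by a positive factor. A minimal check on synthetic data, where `zscore_np`, `target` and `prediction` are hypothetical names introduced for illustration:

```python
import numpy as np

def zscore_np(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    return (x - x.mean()) / x.std()

# Synthetic stand-ins (NOT the workshop data): the prediction has a small
# amplitude and a large offset, as decoder outputs often do.
rng = np.random.default_rng(1)
target = rng.standard_normal(500)
prediction = 0.05 * target + 0.1 * rng.standard_normal(500) + 3.0

r_raw = np.corrcoef(target, prediction)[0, 1]
r_z = np.corrcoef(target, zscore_np(prediction))[0, 1]
print(r_raw, r_z)  # identical up to floating-point error
```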

## Exercise: correlate the CNN predictions with the FCNN predictions. What do you notice? Also compare the DNN predictions with the predictions of the linear models.

# Population-level analysis

Let's collect the reconstruction accuracies (correlation coefficients) for each model and participant. We will also collect the null reconstruction scores.

In [None]:
correlations = {'ridge': [], 'cnn': [], 'fcnn':[]}
null_correlations = {'ridge': [], 'cnn': [], 'fcnn':[]}

for participant in range(13):

    for model in ['ridge', 'cnn', 'fcnn']:

        filepath = os.path.join(predictions_dir, f"{model}_predictions_P{participant:02d}.npy")
        predictions = np.load(filepath)

        score = pearsonr(ground_truth, predictions)[0]
        null_score = pearsonr(ground_truth[::-1], predictions)[0]

        correlations[model].append(score)
        null_correlations[model].append(null_score)
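
The loop above obtains each null score by correlating the predictions with the time-reversed envelope, which gives a single null value per model and participant. One common alternative, which is not what this notebook does, is to build a whole null distribution by circularly shifting the envelope by many random lags. The sketch below illustrates the idea on synthetic data; the function `circular_shift_null`, its parameters and the signals are assumptions made for this example.

```python
import numpy as np

def circular_shift_null(target, prediction, n_shifts=200, min_shift=125, seed=0):
    """Null correlations from circularly shifted copies of the target (illustrative)."""
    rng = np.random.default_rng(seed)
    n = len(target)
    # draw lags that keep the shifted target at least min_shift samples out of alignment
    shifts = rng.integers(min_shift, n - min_shift, size=n_shifts)
    return np.array([np.corrcoef(np.roll(target, s), prediction)[0, 1] for s in shifts])

# Synthetic stand-ins (NOT the workshop data)
rng = np.random.default_rng(2)
target = rng.standard_normal(5000)
prediction = 0.2 * target + rng.standard_normal(5000)

observed = np.corrcoef(target, prediction)[0, 1]
null = circular_shift_null(target, prediction)

# empirical one-sided p-value, with the usual +1 correction
p_value = (1 + np.sum(null >= observed)) / (1 + len(null))
print(observed, null.mean(), p_value)
```

With a distribution of null scores, the observed score can be compared against a percentile of the null rather than against a single value.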

In [None]:
fig, axs = plt.subplots(1,2, tight_layout=True)

axs[0].boxplot(correlations.values(), positions = [1,2,3])
axs[0].set_ylabel('reconstruction score')
axs[0].set_xticks([1,2,3])
axs[0].set_xticklabels(correlations.keys())
axs[0].set_ylim(-0.1, 0.4)

axs[1].boxplot(null_correlations.values(), positions = [1,2,3])
axs[1].set_ylabel('null reconstruction score')
axs[1].set_xticks([1,2,3])
axs[1].set_xticklabels(null_correlations.keys())
axs[1].set_ylim(-0.1, 0.4)

Clearly, the reconstruction scores are much greater than the null reconstruction scores, so we can be confident that the speech envelope reconstruction is working better than chance. Interestingly, the null reconstruction scores seem to be overall slightly negative. __Exercise: can you think of why this might be? Is this a problem?__

Additionally, the two DNNs appear to perform very similarly. A quick test shows whether they are significantly better than the linear models. Note that, because we run three tests here, a rigorous analysis would also correct the p-values for multiple comparisons:

In [None]:
# perform one-tailed paired t-tests (alternative hypothesis: the DNN scores are greater than the ridge scores)

from scipy.stats import ttest_rel

print('CNN vs ridge: p =', ttest_rel(correlations['cnn'], correlations['ridge'], alternative='greater')[1])
print('FCNN vs ridge: p =', ttest_rel(correlations['fcnn'], correlations['ridge'], alternative='greater')[1])

# additionally, compare the DNNs with a two-tailed paired t-test

print('FCNN vs CNN: p =', ttest_rel(correlations['fcnn'], correlations['cnn'], alternative='two-sided')[1])
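
Since three tests are run on the same data, the p-values can be adjusted for multiple comparisons. The sketch below shows one standard option, the Holm-Bonferroni step-down procedure, written with NumPy only. The function name `holm_bonferroni` and the example p-values are made up for illustration; they are not results from this notebook.

```python
import numpy as np

def holm_bonferroni(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)  # indices that sort the p-values in ascending order
    # the k-th smallest p-value (k = 0, ..., m-1) is multiplied by (m - k)
    scaled = p[order] * (m - np.arange(m))
    # enforce that adjusted values never decrease, and cap them at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # undo the sorting
    return adjusted

# Made-up p-values for three comparisons (NOT results from the notebook)
example_p = [0.01, 0.04, 0.03]
adjusted_p = holm_bonferroni(example_p)
print(adjusted_p)
```

A comparison is then called significant if its adjusted p-value falls below the chosen threshold, for example 0.05.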

# Extension: two-speaker data