# Electroencephalographic classification of human non-REM sleep stages
- Electroencephalography (EEG)
- Surface (scalp) measurement of summed post-synaptic potentials across millions of neurons
- Spatial resolution: approximately at the level of brain lobes

![Human brain lobar anatomy](img/brain_lobes_wiki.png)

EEG cap layout:
- Electrode names
  - even numbers are over the right hemisphere
  - odd numbers are over the left hemisphere
  - the letter 'z' (e.g. Fz, Cz, Pz) indicates an electrode on the midline (nasion to inion)
  - the letters **F, T, P, O** indicate the nearest brain lobe, **Fp** denotes fronto-polar, and **C** indicates the region around the central sulcus; the **A** electrodes are attached near the ears and are assumed to record _no_ relevant brain activity

- **Knowledge check**:
    - the precentral gyrus belongs to the __ lobe, and its function is __
    - the postcentral gyrus belongs to the __ lobe, and its function is __

![EEG sensor names](img/21_electrodes_of_International_10-20_system_for_EEG.svg.png)

In [None]:
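# course helper module: supplies show_random_data, make_features, sleep_stages and the numpy/matplotlib/scikit-learn names used below (np, plt, GridSearchCV, ...)
from ml_nrem import *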

# Load sample EEG data
- Every time you execute the code cell below, a pair of random 10-second EEG segments will be shown.
- Below the traces, the frequency spectra (power spectral densities) of the two segments are shown as well.
- One trace is taken from an EEG epoch that a human scorer labelled as wakefulness (**W**); the other was labelled as (light) non-REM sleep stage **N1**.
- **TASKS**
  - Can you identify which EEG trace is from wakefulness and which one is N1 sleep?
    1. During wakefulness, healthy adults show alpha oscillations (8-12 Hz) over posterior brain regions
    2. The American Academy of Sleep Medicine (AASM) Manual for the Scoring of Sleep and Associated Events (2007) instructs the human scorer to label an EEG epoch as N1 (light sleep) "_if alpha rhythm is replaced by low amplitude, mixed frequency activity for more than 50% of the epoch_"
  - The rules are applied to 30-second EEG epochs ("pages"). Explain why some of the 10-second segments are easier to classify than others.
  - Do all subjects have identical peak alpha frequencies?
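One way to estimate power spectral densities like those shown below the traces is Welch's method; the peak alpha frequency can then be read off as the frequency of maximal power between 8 and 12 Hz. The sketch below runs on synthetic signals, not on the course data: two hypothetical "subjects" with alpha rhythms at 9 and 11 Hz, sampled at an assumed 100 Hz. `show_random_data()` may compute its spectra differently.

```python
import numpy as np
from scipy.signal import welch


def peak_alpha_frequency(x, fs, band=(8.0, 12.0)):
    """Return the frequency (Hz) of maximal Welch PSD inside the alpha band."""
    f, pxx = welch(x, fs=fs, nperseg=int(4 * fs))  # 4-s windows -> 0.25 Hz resolution
    in_band = (f >= band[0]) & (f <= band[1])
    return f[in_band][np.argmax(pxx[in_band])]


fs = 100.0                          # hypothetical sampling rate (Hz)
t = np.arange(int(10 * fs)) / fs    # one 10-second segment
rng = np.random.default_rng(0)

# two synthetic "subjects" whose alpha rhythms differ only in frequency
subject_a = np.sin(2 * np.pi * 9.0 * t) + 0.5 * rng.standard_normal(t.size)
subject_b = np.sin(2 * np.pi * 11.0 * t) + 0.5 * rng.standard_normal(t.size)

paf_a = peak_alpha_frequency(subject_a, fs)
paf_b = peak_alpha_frequency(subject_b, fs)
print(f"peak alpha frequency: subject A {paf_a:.2f} Hz, subject B {paf_b:.2f} Hz")
# -> peak alpha frequency: subject A 9.00 Hz, subject B 11.00 Hz
```

A longer Welch window gives finer frequency resolution at the cost of a noisier estimate; 4-second windows are one common compromise for 10-second segments.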

In [None]:
# run this code cell several times to inspect random 10 sec EEG snippets
fig = show_random_data()
plt.show()
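The AASM criterion quoted in the tasks (alpha replaced for more than 50% of a 30-second epoch) can be turned into a crude toy rule: split the epoch into 1-second windows, call a window "alpha" if the 8-12 Hz band carries most of its 1-30 Hz power, and score N1 when fewer than half of the windows show alpha. The window length, the 0.5 threshold and the synthetic epoch are hypothetical choices for illustration; this is not the AASM scoring procedure.

```python
import numpy as np
from scipy.signal import welch


def alpha_fraction(epoch, fs, win_s=1.0, band=(8.0, 12.0), broad=(1.0, 30.0), rel_thresh=0.5):
    """Fraction of non-overlapping windows in which alpha power exceeds rel_thresh
    of the broadband power (hypothetical criterion, for illustration only)."""
    n_win = int(win_s * fs)
    has_alpha = []
    for start in range(0, len(epoch) - n_win + 1, n_win):
        f, pxx = welch(epoch[start:start + n_win], fs=fs, nperseg=n_win)
        alpha_pow = pxx[(f >= band[0]) & (f <= band[1])].sum()
        broad_pow = pxx[(f >= broad[0]) & (f <= broad[1])].sum()
        has_alpha.append(alpha_pow / broad_pow > rel_thresh)
    return float(np.mean(has_alpha))


fs_demo = 100.0                                   # hypothetical sampling rate (Hz)
t_demo = np.arange(int(30 * fs_demo)) / fs_demo   # one 30-second epoch
rng = np.random.default_rng(1)
epoch_demo = 0.3 * rng.standard_normal(t_demo.size)
# synthetic 10 Hz "alpha" only during the first 10 seconds
epoch_demo[t_demo < 10] += np.sin(2 * np.pi * 10.0 * t_demo[t_demo < 10])

frac = alpha_fraction(epoch_demo, fs_demo)
label = "N1" if frac < 0.5 else "W"  # toy rule: alpha in less than half of the epoch -> N1
print(f"alpha present in {100 * frac:.0f}% of windows -> {label}")
```

Looking only at a 10-second snippet of such an epoch could give either answer, depending on which third of the epoch it comes from.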

# Classifier

![Classifier illustration](https://miro.medium.com/v2/resize:fit:640/format:webp/1*i0o8mjFfCn-uD79-F1Cqkw.png)


## Make spectral features

In [None]:
# select which frequency bands to use as features (True = include, False = exclude)
delta = False
theta = True
alpha = True
beta = True

In [None]:
X, y, feature_names = make_features(delta, theta, alpha, beta)

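# image rows = features, columns = EEG epochs
plt.figure(figsize=(18,9))
plt.imshow(X.T, cmap=plt.cm.bwr, aspect='auto', interpolation='nearest')
plt.colorbar(label="feature value")
ax = plt.gca()
# vertical lines mark the epochs where the label y changes (meaningful if epochs are grouped by sleep stage)
bounds = 1 + np.where(np.diff(y))[0]
for b in bounds:
    ax.axvline(b, color='k')
ax.set_xlabel("EEG epoch")
ax.set_ylabel("feature index")
plt.title("EEG spectral features for classification")
plt.tight_layout()
plt.show()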
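`make_features()` lives in the course module and its internals are not shown here. One common way to build spectral features of this kind is the log band power per channel and frequency band, estimated from a Welch power spectral density. The sketch below assumes hypothetical band edges, a 2-second Welch window, and epochs stored as an array of shape (n_epochs, n_channels, n_samples); none of these details is taken from `ml_nrem`.

```python
import numpy as np
from scipy.signal import welch

# Hypothetical band edges (Hz); make_features() may use different ones
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 12.0), "beta": (12.0, 30.0)}


def band_power_features(epochs, fs, bands=BANDS):
    """epochs: (n_epochs, n_channels, n_samples) -> (features, names).

    features has shape (n_epochs, n_channels * n_bands) and holds log10 band power,
    ordered channel by channel, band by band.
    """
    f, pxx = welch(epochs, fs=fs, nperseg=int(2 * fs), axis=-1)  # pxx: (n_epochs, n_channels, n_freqs)
    df = f[1] - f[0]
    powers = np.stack(
        [pxx[..., (f >= lo) & (f < hi)].sum(axis=-1) * df for lo, hi in bands.values()],
        axis=-1,
    )  # (n_epochs, n_channels, n_bands): approximate band power (rectangle rule)
    n_epochs, n_channels, _ = powers.shape
    names = [f"ch{c}_{band}" for c in range(n_channels) for band in bands]
    return np.log10(powers).reshape(n_epochs, -1), names


rng = np.random.default_rng(2)
fs_demo = 100.0                                         # hypothetical sampling rate (Hz)
t_demo = np.arange(int(30 * fs_demo)) / fs_demo         # 30-second epochs
epochs_demo = rng.standard_normal((5, 2, t_demo.size))  # 5 epochs, 2 channels of noise
epochs_demo[:, 0, :] += 2 * np.sin(2 * np.pi * 10.0 * t_demo)  # add 10 Hz "alpha" to channel 0

feats_demo, names_demo = band_power_features(epochs_demo, fs_demo)
print(feats_demo.shape)  # (5, 8)
print(names_demo[:4])    # ['ch0_delta', 'ch0_theta', 'ch0_alpha', 'ch0_beta']
```

Taking the logarithm compresses the large range of EEG band powers, which helps when features are compared visually on one colour scale as in the plot above.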

In [None]:
#print(feature_names)

## Setup and optimize classifier
- We will split the input (features `X`) and output (targets `y`) variables into two data sets:
  - training data (80%)
  - test data (20%)
- Discuss: why might this be helpful?
- Discuss overfitting.
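Overfitting can be made visible on synthetic data: a decision tree grown without a depth limit memorises the training set, including its label noise, while a shallow tree cannot. The sketch uses scikit-learn's `make_classification` as a stand-in for the EEG features and a single decision tree rather than the random forest used below; the numbers are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# toy data in which 20% of the samples get a randomly assigned class (label noise)
X_toy, y_toy = make_classification(n_samples=400, n_features=20, n_informative=3,
                                   flip_y=0.2, random_state=0)
X_fit, X_held, y_fit, y_held = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

results = {}
for depth in (2, None):  # shallow tree vs. tree grown until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_fit, y_fit)
    results[depth] = (tree.score(X_fit, y_fit), tree.score(X_held, y_held))
    print(f"max_depth={depth}: train accuracy {results[depth][0]:.2f}, "
          f"test accuracy {results[depth][1]:.2f}")
```

The unlimited tree reaches perfect training accuracy, yet its test accuracy is lower: the gap between the two is the signature of overfitting, and it is only visible because part of the data was held out.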

In [None]:
# optional: load precomputed features from disk instead of calling make_features()
#tmp = np.load("spectral_features.npz")
#X = tmp['X']
#y = tmp['y']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32)
print("X_train: ", X_train.shape)
print("y_train: ", y_train.shape)
print("X_test: ", X_test.shape)
print("y_test: ", y_test.shape)
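The split above draws epochs at random without regard to class. `train_test_split` also accepts a `stratify=` argument that keeps the class proportions the same in the training and test subsets, which matters most when one class is rare. A minimal sketch on toy, imbalanced labels (not the EEG data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

y_demo = np.array([0] * 80 + [1] * 20)       # toy labels: 80% class 0, 20% class 1
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

_, _, ytr_demo, yte_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=32, stratify=y_demo
)
print(np.bincount(ytr_demo), np.bincount(yte_demo))  # [64 16] [16 4]
```

Both subsets keep the 80/20 ratio; without `stratify`, the test set's class counts would vary from one random split to the next.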

In [None]:
# hyperparameter values to search over
test_params = {
    'max_features': [5, 15, 25],
    # alternatives to try: 'max_features': ['sqrt', 3, 10],
    #                      'n_estimators': [100, 200, 300],
}

fixed_params = {
    'max_depth': 10,   
    'n_estimators': 100,
    'n_jobs': -1,
    'random_state': 42,
}

tuning = GridSearchCV(
    cv = 10,
    estimator = RandomForestClassifier(**fixed_params), 
    n_jobs = -1, 
    param_grid = test_params, 
    scoring = 'accuracy',
)
print("Searching for optimal classifier parameters, be patient...")
tuning.fit(X_train, y_train) # this can take some time...
print("Optimal parameters found: ", tuning.best_params_)
print(f"Best mean cross-validation accuracy (within the training data): {tuning.best_score_:.3f}")
clf_opt = tuning.best_estimator_

- Is there anything concerning about the optimal classifier parameters found?
- How would you address this problem?
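`best_score_` only reports the winning candidate. `GridSearchCV` also stores per-candidate results in `cv_results_`. The sketch below runs on synthetic data (`make_classification` stands in for the EEG features, with a grid mirroring `test_params`), prints the mean and spread of the cross-validation accuracy for every candidate, and, as a generic diagnostic, reports whether the winning value sits at the edge of the searched range.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=300, n_features=30, n_informative=5, random_state=0)
grid_demo = {"max_features": [5, 15, 25]}
search = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=0),
                      param_grid=grid_demo, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

res = search.cv_results_
for params, mean, std in zip(res["params"], res["mean_test_score"], res["std_test_score"]):
    print(f"{params}: mean CV accuracy {mean:.3f} +/- {std:.3f}")

values = grid_demo["max_features"]
best_value = search.best_params_["max_features"]
print("best value lies on the edge of the grid:", best_value in (min(values), max(values)))
```

If the standard deviations overlap heavily, the "best" value is not clearly better than its neighbours.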

## Analyze the optimized classifier

In [None]:
test_score = clf_opt.score(X_test, y_test)  # mean accuracy on the held-out test set
print(f"Accuracy on test data: {test_score:.3f}")

- Is the score on test data different from the (cross-validated) score obtained on the training data?
- Compare with other groups: is there a systematic difference? Explain the results.

In [None]:
# save the model to disk
import pickle

f_clf_opt = "./RFC_opt.pkl"
with open(f_clf_opt, 'wb') as fp:
    pickle.dump(clf_opt, fp)
print(f"Optimized classifier saved as: {f_clf_opt:s}")
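A saved model is only useful if it can be loaded back. A minimal round-trip sketch on a small synthetic random forest (the file name is hypothetical) checks that the reloaded model gives identical predictions. scikit-learn's model-persistence documentation cautions that pickle files should only be loaded from trusted sources and may not load correctly under a different scikit-learn version.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = make_classification(n_samples=100, n_features=10, random_state=0)
model_demo = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

path_demo = os.path.join(tempfile.mkdtemp(), "RFC_demo.pkl")  # hypothetical file name
with open(path_demo, "wb") as fp:
    pickle.dump(model_demo, fp)
with open(path_demo, "rb") as fp:  # only unpickle files you trust
    model_loaded = pickle.load(fp)

same = np.array_equal(model_demo.predict(X_demo), model_loaded.predict(X_demo))
print("identical predictions after reloading:", same)  # True
```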

Let's ask the classifier which features were the most important for classification!

In [None]:
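# impurity-based importances (mean decrease in impurity); they sum to 1 across features
idx_sort = np.argsort(clf_opt.feature_importances_)[::-1]  # most important first
feature_importances_sorted = clf_opt.feature_importances_[idx_sort]
feature_names_sorted = np.array(feature_names)[idx_sort]
feat_max = 20  # number of top features to list
for rank, (name, fimp) in enumerate(zip(feature_names_sorted[:feat_max], feature_importances_sorted[:feat_max]), start=1):
    print(f"{rank:d}, {name:s}: {100*fimp:.1f}%")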
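`feature_importances_` are impurity-based; scikit-learn's documentation notes that they are derived from the training data and can be biased towards features with many unique values. Permutation importance, a different technique swapped in here, instead measures how much held-out accuracy drops when a single feature column is shuffled. A sketch on synthetic data in which the informative columns are known in advance:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# with shuffle=False, make_classification puts the 3 informative features in columns 0-2;
# with n_redundant=0 (and the default n_repeated=0), columns 3-7 are pure noise
X_demo, y_demo = make_classification(n_samples=300, n_features=8, n_informative=3,
                                     n_redundant=0, shuffle=False, random_state=0)
X_fit, X_held, y_fit, y_held = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)
rf_demo = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fit, y_fit)

# drop in held-out accuracy when one column is randomly permuted, averaged over 10 repeats
result = permutation_importance(rf_demo, X_held, y_held, n_repeats=10, random_state=0)
order = result.importances_mean.argsort()[::-1]
for i in order:
    print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```

Noise columns should end up with importances close to zero; with the EEG features, the same call could be made with `clf_opt`, `X_test` and `y_test`.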

## Confusion matrix

In [None]:
y_predicted = clf_opt.predict(X_test)
conf_mat = confusion_matrix(y_test, y_predicted)
disp = ConfusionMatrixDisplay(confusion_matrix=conf_mat, display_labels=sleep_stages)
disp.plot()
plt.title("Confusion matrix\nRandom Forest classification of sleep stages (spectral features)")
plt.show()

- Give a verbal explanation of the confusion matrix: what does it tell you?
- Why is the term _confusion_ used?
- What are common confusions?
- Does the Random Forest Classifier find a similar relationship about which sleep stages are _neighbours_?
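In scikit-learn's convention, row i of the confusion matrix counts the samples whose true label is class i and column j those predicted as class j, so the diagonal holds correct classifications and every off-diagonal cell is a particular confusion. A small sketch with hand-picked toy labels (not classifier output) builds the matrix by hand and compares it with `confusion_matrix`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

stages_demo = ["W", "N1", "N2"]  # toy label set, for illustration only
y_true_demo = np.array(["W", "W", "W", "N1", "N1", "N1", "N2", "N2"])
y_pred_demo = np.array(["W", "N1", "W", "N1", "W", "N2", "N2", "N2"])

# count by hand: row = true stage, column = predicted stage
manual = np.zeros((len(stages_demo), len(stages_demo)), dtype=int)
for true_s, pred_s in zip(y_true_demo, y_pred_demo):
    manual[stages_demo.index(true_s), stages_demo.index(pred_s)] += 1

cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=stages_demo)
print(cm_demo)
# [[2 1 0]
#  [1 1 1]
#  [0 0 2]]
```

Here N1 is confused with both of its neighbours, W and N2, while W and N2 are never confused with each other.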

In [None]:
y_predicted = clf_opt.predict(X_test)
# per-class precision, recall, F1 score and support (number of test epochs per class)
class_report = classification_report(y_test, y_predicted)
print(class_report)

- Look up the definitions of precision and recall (have you heard of sensitivity and specificity?)
- Which sleep stage is the least likely to be confused with another sleep stage?
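Precision, recall (also called sensitivity) and specificity can all be read from the four cells of a two-class confusion matrix. A worked example with hand-picked toy labels (1 = positive class, 0 = negative class), with the same quantities recomputed by scikit-learn for comparison:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# toy binary labels: 1 = positive class, 0 = negative class
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 1, 1, 1, 1, 0, 1, 1, 1])

tp = np.sum((y_true_demo == 1) & (y_pred_demo == 1))  # 5 true positives
fp = np.sum((y_true_demo == 0) & (y_pred_demo == 1))  # 2 false positives
fn = np.sum((y_true_demo == 1) & (y_pred_demo == 0))  # 1 false negative
tn = np.sum((y_true_demo == 0) & (y_pred_demo == 0))  # 2 true negatives

precision = tp / (tp + fp)    # of all samples predicted positive, the fraction truly positive
recall = tp / (tp + fn)       # sensitivity: of all truly positive samples, the fraction found
specificity = tn / (tn + fp)  # of all truly negative samples, the fraction labelled negative
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.3f} recall={recall:.3f} specificity={specificity:.3f} F1={f1:.3f}")
# precision=0.714 recall=0.833 specificity=0.500 F1=0.769

# the same values via scikit-learn (specificity = recall of the negative class)
print(precision_score(y_true_demo, y_pred_demo), recall_score(y_true_demo, y_pred_demo),
      recall_score(y_true_demo, y_pred_demo, pos_label=0), f1_score(y_true_demo, y_pred_demo))
```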

## Cross-validate classifier

In [None]:
# F1-score cross-validation: true labels vs. shuffled labels
from sklearn.base import clone
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedShuffleSplit

n_cv = 10

def cv_f1_scores(X, y, n_splits=n_cv, random_state=0):
    """Refit an untrained copy of clf_opt on each training split and
    return the F1 score of class label 1 on each held-out split."""
    folds = StratifiedShuffleSplit(n_splits=n_splits, train_size=0.8, random_state=random_state)
    fold_scores = []
    for idx_train, idx_test in folds.split(X, y):
        clf = clone(clf_opt)  # same hyperparameters, fitted from scratch on this split only
        clf.fit(X[idx_train], y[idx_train])
        y_pred = clf.predict(X[idx_test])
        f1 = f1_score(y[idx_test], y_pred, average=None, labels=[1])[0]
        fold_scores.append(f1)
    return np.array(fold_scores)

print(f"\n[+] Cross-validation (N={n_cv:d}) on TRUE labels (wait...)")
scores = cv_f1_scores(X, y)
scores_mean = np.mean(scores)
scores_std = np.std(scores)
print(f"F1-scores: mean={scores_mean:.2f}, std={scores_std:.2f}")

print(f"\n[+] Cross-validation (N={n_cv:d}) on SHUFFLED labels (wait...)")
y_shuffled = np.random.permutation(y)  # breaks any relationship between features and labels
scores_shuffled = cv_f1_scores(X, y_shuffled)
scores_shuffled_mean = np.mean(scores_shuffled)
scores_shuffled_std = np.std(scores_shuffled)
print(f"F1-scores: mean={scores_shuffled_mean:.2f}, std={scores_shuffled_std:.2f}")

In [None]:
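from scipy.stats import mannwhitneyu

# two-sided Mann-Whitney U test on the two sets of F1 scores
# (scores from overlapping random splits are not independent, so treat p as approximate)
u_stat, p_mw = mannwhitneyu(scores, scores_shuffled)
print(f"\n[+] Mann-Whitney U test: U = {u_stat:.1f}, p = {p_mw:.4f}")
alpha_level = 0.05  # significance level; a new name so the `alpha` band flag is not overwritten
if scores_mean > scores_shuffled_mean and p_mw < alpha_level:
    print("Classifier performance IS statistically significant.")
else:
    print("Classifier performance IS NOT statistically significant.")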
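The comparison against shuffled labels is a form of permutation test. scikit-learn bundles this idea in `permutation_test_score`, which refits the model for every label permutation and returns an empirical p-value, computed as (C + 1) / (n_permutations + 1) where C counts permutations scoring at least as well as the true labels. A sketch on synthetic data, using logistic regression only to keep the 100 refits fast (the random forest could be passed instead for the real analysis):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

X_demo, y_demo = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)

score_demo, perm_scores_demo, p_value_demo = permutation_test_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo,
    cv=StratifiedKFold(n_splits=5), scoring="f1",
    n_permutations=100, random_state=0,
)
print(f"F1 (true labels) = {score_demo:.2f}, "
      f"mean F1 (permuted labels) = {perm_scores_demo.mean():.2f}, p = {p_value_demo:.3f}")
```

With 100 permutations the smallest attainable p-value is 1/101, about 0.01; more permutations are needed to resolve smaller p-values.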