# Using basic features within the same patient

> In this notebook we extract a very simple set of features and train a very simple classifier within each patient: the data from a single recording is split into train and test sets (here via 5-fold cross-validation), so every model is trained and evaluated on the same patient. The goal is to assess the differences that may arise between patients. We expect almost all patients to behave similarly, but we could be surprised.

In [None]:
#| hide
%load_ext autoreload
%autoreload 2

In [None]:
#| hide
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning) 

In [None]:
#| hide
from nbdev.showdoc import *

In [None]:
import os
from glob import glob
from collections import Counter
from typing import List, Dict

from rich.progress import track
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import mne
import yasa

from sklearn.model_selection import train_test_split, cross_validate
from sklearn.ensemble import RandomForestClassifier

from sleepstagingidal.data import *
from sleepstagingidal.dataa import *
from sleepstagingidal.feature_extraction import *

In [None]:
#| hide
path_data = "/media/2tbraid/antonia/PSG/"

In [None]:
path_files = glob(os.path.join(path_data, "*.edf"))

In [None]:
channels = ["C3", "C4", "A1", "A2", "O1", "O2", "LOC", "ROC", "LAT1", "LAT2", "ECGL", "ECGR", "CHIN1", "CHIN2"]

## Looping through patients

> As we only want to perform a very basic check, we loop through all the patients and apply the same steps to each recording.

In [None]:
#| output: false

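results = {}

for path in track(path_files):
    file_name = os.path.basename(path)
    # Read and clean the recording (resampling and band-pass filtering are set via the arguments)
    raw = read_clean_edf(path, resample=100, bandpass=(0.3, 49))
    epochs, sr = get_epochs(raw, channels=channels)
    # Band-power features, one row per epoch
    bandpowers = calculate_bandpower(epochs, sf=sr)
    # The last column of the events array holds the label of each epoch
    labels = epochs.events[:,-1]
    # Default cross_validate settings: 5 folds, scored with the classifier's accuracy
    results_cv = cross_validate(RandomForestClassifier(random_state=42), bandpowers, labels)
    results[file_name] = results_cv['test_score']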

ValueError: zero-size array to reduction operation maximum which has no identity
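
The `ValueError` above aborts the loop at the first recording that fails, so any file that comes after it is never scored. One way to keep going is to catch the exception per file and keep a record of the recordings that were skipped. The sketch below only illustrates that pattern: `run_all` and `process_recording` are hypothetical names, `process_recording` stands in for the read, epoch, feature and cross-validation steps of the loop, and the file names and scores are made up.

```python
from typing import Callable, Dict, List, Tuple


def run_all(
    paths: List[str],
    process: Callable[[str], List[float]],
) -> Tuple[Dict[str, List[float]], Dict[str, str]]:
    """Apply `process` to every path, collecting scores and failures separately."""
    results: Dict[str, List[float]] = {}
    failed: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = process(path)
        except ValueError as err:
            # Keep the message so the problematic recording can be inspected later
            failed[path] = str(err)
    return results, failed


def process_recording(path: str) -> List[float]:
    """Hypothetical stand-in for the per-recording pipeline."""
    if "empty" in path:
        raise ValueError("zero-size array to reduction operation")
    return [0.5, 0.6]


results_demo, failed_demo = run_all(["a.edf", "empty.edf", "b.edf"], process_recording)
print(sorted(results_demo), sorted(failed_demo))
```

Catching only `ValueError` is a deliberate choice in this sketch: other exceptions still surface immediately instead of being silently swallowed.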

In [None]:
#| hide
results

{'PSG29.edf': array([0.54861111, 0.47222222, 0.44444444, 0.32167832, 0.51748252]),
 'PSG12.edf': array([0.41176471, 0.39215686, 0.61437908, 0.39869281, 0.47368421]),
 'PSG17.edf': array([0.34969325, 0.5617284 , 0.53703704, 0.50617284, 0.20987654]),
 'PSG10.edf': array([0.65789474, 0.8013245 , 0.62913907, 0.34437086, 0.57615894]),
 'PSG23.edf': array([0.87341772, 0.81528662, 0.82165605, 0.87898089, 0.60509554]),
 'PSG25.edf': array([0.31788079, 0.52980132, 0.59602649, 0.37086093, 0.49006623]),
 'PSG35.edf': array([0.65294118, 0.5147929 , 0.53846154, 0.61538462, 0.62721893]),
 'PSG11.edf': array([0.62237762, 0.82517483, 0.76923077, 0.51748252, 0.21678322]),
 'PSG30.edf': array([0.59064327, 0.68421053, 0.71345029, 0.71764706, 0.61176471]),
 'PSG16.edf': array([0.88023952, 0.83832335, 0.94578313, 0.90963855, 0.81927711]),
 'PSG21.edf': array([0.52713178, 0.48062016, 0.51162791, 0.35658915, 0.48837209]),
 'PSG9.edf': array([0.42261905, 0.41916168, 0.35329341, 0.19161677, 0.21556886]),
 'PSG
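
Each array holds one score per fold. When no `cv` or `scoring` argument is given, scikit-learn's `cross_validate` uses 5 folds (stratified by label for a classifier) and the estimator's own `score` method, which is accuracy for a `RandomForestClassifier`. The sketch below reproduces that call on synthetic data so the shape of the output can be checked without the recordings; the feature matrix and labels are random stand-ins, not real band powers or sleep stages.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 "epochs", 6 "features", 4 "stages"
y_demo = rng.integers(0, 4, size=200)
X_demo = rng.normal(size=(200, 6))
X_demo[:, 0] += y_demo  # make one feature informative about the label

cv_out = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=42),
    X_demo,
    y_demo,
)
fold_scores = cv_out["test_score"]  # one accuracy per fold
print(fold_scores.shape, fold_scores.mean())
```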

We can put the logged results into a `DataFrame` and save them as `.csv` to avoid having to repeat the experiment:

In [None]:
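# One row per recording, one column per cross-validation fold
df = pd.DataFrame.from_dict(results, orient='index')
df.index.set_names("File", inplace=True)
df['Mean'] = df.mean(axis=1)
# Note: 'Mean' already exists at this point, so this std is taken over the five folds plus their mean
df['Std'] = df.std(axis=1)
df.head()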

| File | 0 | 1 | 2 | 3 | 4 | Mean | Std |
|---|---|---|---|---|---|---|---|
| PSG29.edf | 0.548611 | 0.472222 | 0.444444 | 0.321678 | 0.517483 | 0.460888 | 0.078328 |
| PSG12.edf | 0.411765 | 0.392157 | 0.614379 | 0.398693 | 0.473684 | 0.458136 | 0.083295 |
| PSG17.edf | 0.349693 | 0.561728 | 0.537037 | 0.506173 | 0.209877 | 0.432902 | 0.133771 |
| PSG10.edf | 0.657895 | 0.801325 | 0.629139 | 0.344371 | 0.576159 | 0.601778 | 0.148749 |
| PSG23.edf | 0.873418 | 0.815287 | 0.821656 | 0.878981 | 0.605096 | 0.798887 | 0.100312 |
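
Because `Std` is assigned after `Mean`, `df.std(axis=1)` runs over six columns (the five folds plus their mean), which gives a value somewhat smaller than the spread of the folds alone. A minimal sketch of restricting both statistics to the fold columns, using the PSG29 scores from the results above; `df_demo`, `fold_cols` and `std_with_mean` are names introduced here for illustration.

```python
import pandas as pd

# PSG29 fold scores copied from the results dictionary
scores = {"PSG29.edf": [0.54861111, 0.47222222, 0.44444444, 0.32167832, 0.51748252]}
df_demo = pd.DataFrame.from_dict(scores, orient="index")
df_demo.index.set_names("File", inplace=True)

# Fix the list of fold columns before any summary column is added
fold_cols = list(df_demo.columns)
df_demo["Mean"] = df_demo[fold_cols].mean(axis=1)
df_demo["Std"] = df_demo[fold_cols].std(axis=1)  # sample std over the 5 folds only

# For comparison: the std taken over the folds plus the Mean column
std_with_mean = df_demo[fold_cols + ["Mean"]].std(axis=1)
print(df_demo[["Mean", "Std"]], std_with_mean, sep="\n")
```

For this recording the fold-only standard deviation is roughly 0.088, against the 0.078 shown in the table.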


In [None]:
os.makedirs("Results", exist_ok=True)  # make sure the output folder exists
df.to_csv("Results/00_basic_features_own_patient.csv")
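
To reuse the saved scores later, the file can be read back with the `File` column restored as the index. A small sketch of that round trip, written against a temporary directory so it runs on its own; `df_saved`, `df_loaded` and the file name `scores.csv` are illustrative, and the two values are taken from the PSG29 row above. Note that the fold column labels come back as strings.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the results table: one recording, two folds
df_saved = pd.DataFrame(
    {0: [0.548611], 1: [0.472222]},
    index=pd.Index(["PSG29.edf"], name="File"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "scores.csv")
    df_saved.to_csv(csv_path)
    # index_col restores the recording names as the index
    df_loaded = pd.read_csv(csv_path, index_col="File")

print(df_loaded)
```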