# Packages: minimize for better structure

In [1]:
import numpy as np
import os
import pandas as pd
import sklearn
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
from matplotlib.patches import Patch

# Using Epochs to Classify Responders/Non-Responders

This notebook maps out and structures how nearby epochs can be classified together in order to make the classifications more accurate. It serves as a sandbox for epoch-based training.

Main idea (aim: individual level):
- Initialize the data.
- Use epochs? Should this happen during the training process or during data initialization?
- The epochs are classified as responder or non-responder.
  - How is this done? Maybe start with a simple architecture so it trains fast for testing.
  - It has to be unsupervised, since labels aren't given yet.
  - The assigned labels have to pass a threshold to be accepted (visualize this to evaluate).
    - A threshold value, in seconds, is chosen from the literature.
    - It could later become supervised: run the unsupervised step first, then use the approximated labels to train a model.
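One possible reading of the threshold step is sketched below: consecutive epochs with the same provisional label are grouped into runs, and a run of responder epochs only counts if it lasts at least a minimum number of seconds. The helper name `filter_short_runs`, the epoch length of 2 seconds, and the threshold of 6 seconds are placeholders for illustration, not values taken from the literature or from this project.

```python
import numpy as np

def filter_short_runs(labels, epoch_seconds, min_seconds):
    """Set runs of 1s that last less than min_seconds to 0 (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    out = labels.copy()
    # Pad with zeros so every run of 1s has both a start and an end edge.
    padded = np.concatenate(([0], labels, [0]))
    edges = np.diff(padded)
    starts = np.where(edges == 1)[0]   # index where a run of 1s begins
    ends = np.where(edges == -1)[0]    # index one past the end of the run
    for start, end in zip(starts, ends):
        if (end - start) * epoch_seconds < min_seconds:
            out[start:end] = 0
    return out

# Provisional labels per epoch (1 = responder, 0 = non-responder), made up here.
provisional = [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]
filtered = filter_short_runs(provisional, epoch_seconds=2, min_seconds=6)
print(filtered)
```

Plotting `provisional` and `filtered` on top of each other would be one way to do the visual evaluation mentioned in the list.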


# Step 1: Initialize the data, and make sure this is done correctly
Explanation:

To begin, a base folder and a subfolder name are specified to locate the files. They are joined with os.path.join() so that the path separators are handled correctly and the result points to the right directory. Two empty containers are then initialized: the patient_numbers list, which stores the ID of each patient (for example p1, p2, and so on), and the patient_data dictionary, which will hold one DataFrame per patient.

A for loop then iterates over each file in the directory, which corresponds to iterating over the patients. Each filename is passed through os.fsdecode(), and the patient number is extracted by splitting the filename on underscores and taking the first element. The patient number is then appended to the patient_numbers list initialized earlier.

Next, the full path of each patient file is constructed by joining the directory path with the filename, so that the data can be accessed. The data is read with pd.read_csv(), as the files are in CSV format.
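The filename handling described above can be tried in isolation, without access to the real data folder. The sketch below uses a made-up base directory and two example filenames that follow the `p3_features.csv` pattern; both are placeholders, and nothing is read from disk.

```python
import os

base_dir = os.path.join("some_folder", "CSV_features_NEW")  # hypothetical base path
example_files = ["p3_features.csv", "p10_features.csv"]     # assumed naming pattern

paths_by_patient = {}
for name in example_files:
    patient_id = name.split("_")[0]  # text before the first underscore
    paths_by_patient[patient_id] = os.path.join(base_dir, name)

for patient_id, path in paths_by_patient.items():
    print(patient_id, "->", path)
```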

After loading the data, some preprocessing steps are applied. The unnamed index column is renamed to Index, and the text-based Event codes (R, M, F) are mapped to the integers 0, 1, and 2 for easier handling. A fixed set of feature columns is then standardized with the StandardScaler from sklearn. This puts the features on the same scale and prevents features with large values from dominating the training process. After standardization, each of these columns has a mean of 0 and a standard deviation of 1. Because the scaler is fitted inside the loop, this holds per patient rather than across the whole dataset.
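The effect of standardization can be checked on a small synthetic matrix. The sketch below is not the project data; it only illustrates that StandardScaler applies $z = (x - \mu) / \sigma$ column by column, using the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix: 5 epochs x 2 features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [4.0, 500.0],
              [5.0, 400.0]])

X_scaled = StandardScaler().fit_transform(X)

# Manual version of the same transformation, for comparison.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # close to 0 for each column
print(X_scaled.std(axis=0))   # close to 1 for each column
```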


In [16]:
path = 'CSV_features_NEW'
folder = 'C:\\Users\\RJEN0307\\Desktop\\Bachelorprojekt\\Bachelor_project_2024\\'

# Combine them using os.path.join for proper path handling
full_path = os.path.join(folder, path)
#print(os.listdir(full_path))


patient_numbers = []
patient_data = {}

for file in os.listdir(full_path):
    filename = os.fsdecode(file)
    patient_number = filename.split('_')[0]  # This will give 'p3' from 'p3_features.csv'
    patient_numbers.append(patient_number)
    patient_file_dir = os.path.join(full_path, filename)
    data = pd.read_csv(patient_file_dir)
    data.rename(columns={'Unnamed: 0': 'Index'}, inplace=True)
    data['Event'] = data['Event'].map({'R': 0, 'M': 1, 'F': 2})
    standarize_list = ['PSD Delta', 'PSD Delta_N', 'PSD Theta', 'PSD Theta_N', 'PSD Alpha', 'PSD Alpha_N', 'PSD Beta', 'PSD Beta_N', 'PSD Gamma', 'PSD Gamma_N', 'PSD SE', 'PSD MSF', 'PSD Sef90', 'PSD Sef95', 'PE', 'wSMI', 'Kolmogorov', 'Mean RR', 'Std RR', 'Mean HR', 'Std HR', 'Min HR', 'Max HR', 'Freq_Slope mean', 'Freq_Slope std']

    # Fit a fresh scaler per patient; set_output makes transform return a DataFrame
    scaler = StandardScaler().set_output(transform='pandas')
    data[standarize_list] = scaler.fit_transform(data[standarize_list])

    patient_data[patient_number] = data

#print(patient_numbers)
#print(data)
print("Keys in patient_data dictionary:", list(patient_data.keys()))


Keys in patient_data dictionary: ['p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p27', 'p28', 'p29', 'p2', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p3', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p4', 'p50', 'p51', 'p52', 'p53', 'p54', 'p56', 'p57', 'p58', 'p59', 'p5', 'p61', 'p62', 'p63', 'p65', 'p66', 'p67', 'p68', 'p69', 'p6', 'p71', 'p72', 'p73', 'p74', 'p75', 'p76', 'p77', 'p78', 'p79', 'p7', 'p80', 'p8', 'p9']


# Step 2: Initialize the unsupervised learning
This is the most important step, because this is where the model gets its performance from.

For testing purposes, a simple model is initialized first. It is important that the model looks at each patient individually rather than at all the data at once. Since the data initialization stores each patient's data separately in the patient_data dictionary, the values can be extracted from there.
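A minimal sketch of what "one simple model per patient" could look like is given below. KMeans with two clusters is an assumption chosen only because it is fast to fit; the notebook does not commit to a specific model. The dictionary `toy_patient_data` and the column names `feat_a` and `feat_b` are synthetic stand-ins for patient_data and its feature columns.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for patient_data: two patients, 40 epochs, 2 features each.
feature_cols = ["feat_a", "feat_b"]  # hypothetical column names
toy_patient_data = {
    pid: pd.DataFrame(rng.normal(size=(40, 2)), columns=feature_cols)
    for pid in ["p1", "p2"]
}

cluster_labels = {}
for pid, df in toy_patient_data.items():
    # A separate model is fitted per patient, so nothing is shared between patients.
    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    cluster_labels[pid] = model.fit_predict(df[feature_cols])

for pid, labels in cluster_labels.items():
    print(pid, np.bincount(labels))  # number of epochs per cluster
```

The cluster indices 0 and 1 are arbitrary, so deciding which cluster corresponds to responder epochs would need a separate rule.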

In [None]:
# Use case (maybe delete this)
for patient in patient_data.keys():
    print(f"Patient: {patient}")
    print(patient_data[patient].head())


In [None]:
for patient in patient_data.keys():
    pass  # Placeholder: the per-patient unsupervised model will go here