RAW (Base repo) to CSV

This code converts datasets from RAW format to CSV format using MOABB.

It has been specifically conceived for BCI data.

This script handles the bi2014b dataset.

--- 

**Important Note: bi2014b**

The **bi2014b** database is a multiplayer version of bi2014a. Only the **solo sessions** (`group_XX_sujet_XX.csv`) are used here because they are the only ones that follow the oddball paradigm.

**How the files work:**

* The data is stored in **shared CSV files** containing two subjects at once.
* In `group_01_sujet_01.csv`, **Subject 1** is doing the task while Subject 2 is resting.
* In `group_01_sujet_02.csv`, **Subject 2** is doing the task while Subject 1 is resting.

**Processing:**

1. Files were moved manually from nested folders into a simple structure: `CSV_bi2014b/group_XX_sujet_XX.csv`.
2. The `extract_subject_data` function extracts only the active subject's data from these shared files.
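
The file naming convention above carries both the group and the active subject, so both can be recovered from the file name alone. A minimal sketch, assuming every solo-session file follows the `group_XX_sujet_XX.csv` pattern; `parse_solo_filename` is a hypothetical helper and is not part of ConvTools:

```python
import os
import re

# Hypothetical helper, not part of ConvTools.
_SOLO_PATTERN = re.compile(r"^group_(\d+)_sujet_(\d+)\.csv$")

def parse_solo_filename(path):
    """Return (group, subject_in_group) parsed from a solo-session file name."""
    name = os.path.basename(path)
    match = _SOLO_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a solo-session file name: {name!r}")
    return int(match.group(1)), int(match.group(2))

print(parse_solo_filename("CSV_bi2014b/group_01_sujet_02.csv"))  # (1, 2)
```

Raising on a non-matching name makes stray files in the folder fail loudly instead of being processed as if they were solo sessions.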

In [None]:
import os
import sys

import numpy as np
import pandas as pd

# Make the parent directory importable so the local ConvTools module can be found
sys.path.append(os.path.abspath('..'))
from ConvTools import decimate, df_to_mne, extract_subject_data

In [None]:
# test for 1 file
temp_file = "D:\\Travail\\backupPCgipsa\\taf\\officework\\gipsa bases\\CSV bi2014b\\group_01_sujet_01.csv"

In [None]:
# Read the data using the bi2014b-specific function; the second argument is the active subject in the group (1 or 2)
df = extract_subject_data(temp_file, 1)

In [None]:
# Downsampling
sfreq = 512
decimation_factor = 2
stim_name = 'STI'

raw = df_to_mne(df, sfreq)
raw_decimated = decimate(raw, sfreq, decimation_factor, stim_name)
data = raw_decimated.get_data()

# Transpose to shape (n_times, n_channels)
dataT = data.T
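
Downsampling a stimulation channel needs more care than downsampling EEG channels: if events are coded as single-sample markers, keeping every second sample can drop some of them. The sketch below illustrates one way to avoid this, by keeping the largest marker in each block of `factor` samples. It is only an assumption about the kind of handling `decimate` may apply to the `stim_name` channel, not the ConvTools implementation, and `decimate_stim` is a hypothetical name.

```python
import numpy as np

def decimate_stim(stim, factor):
    """Downsample a stim channel by `factor`, keeping the largest marker per block."""
    stim = np.asarray(stim)
    # Pad with zeros (no event) so the length is a multiple of `factor`
    n_pad = (-len(stim)) % factor
    padded = np.concatenate([stim, np.zeros(n_pad, dtype=stim.dtype)])
    # One row per output sample; the max keeps any non-zero marker in the block
    return padded.reshape(-1, factor).max(axis=1)

stim = np.array([0, 0, 0, 2, 0, 0, 1, 0])
print(stim[::2])               # naive slicing: [0 0 0 1]
print(decimate_stim(stim, 2))  # block-wise maximum: [0 2 0 1]
```

Naive slicing loses the marker `2` at index 3, while the block-wise maximum keeps it. One limitation of this approach: if two different markers fall in the same block, only the larger one survives.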

In [None]:
# Extract the last column (stim channel)
stim_col = dataT[:, -1]

# Count the unique values
unique_vals, counts = np.unique(stim_col, return_counts=True)

# Loop through unique values and their counts to print the results
for val, count in zip(unique_vals, counts):
    print(f"Value : {val}, Occurrence count : {count}")

In [None]:
# creating timestamps and header
n_times, n_channels = dataT.shape
timestamps = np.arange(n_times, dtype=int)
data_with_timestamp = np.column_stack((timestamps, dataT))
header = [""] + [str(i) for i in range(n_channels)]

# np.column_stack casts the integer timestamps to float; cast them back to int
df = pd.DataFrame(data_with_timestamp, columns=header)
df[""] = df[""].astype(int)
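
The same timestamp-and-header steps are needed for every file, so they could be wrapped in a helper. A minimal sketch, assuming the array is already shaped `(n_times, n_channels)`; `to_timestamped_frame` is a hypothetical name. Inserting the integer index as its own column avoids the float cast that `np.column_stack` introduces.

```python
import numpy as np
import pandas as pd

def to_timestamped_frame(data_t):
    """Build a DataFrame with an unnamed integer sample index as first column."""
    n_times, n_channels = data_t.shape
    frame = pd.DataFrame(data_t, columns=[str(ch) for ch in range(n_channels)])
    # The index column stays integer because it never passes through a float array
    frame.insert(0, "", np.arange(n_times, dtype=int))
    return frame

example = to_timestamped_frame(np.random.rand(4, 3))
print(example.columns.tolist())  # ['', '0', '1', '2']
```

Writing the result with `to_csv(..., index=False)` then gives the same layout as the cell above: an unnamed first column followed by numbered channel columns.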

In [None]:
# Write a test CSV file to check the output format
df.to_csv("data.csv", index=False)

In [None]:
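# Loop through all subjects
# Path to the directory containing all .csv files of the dataset
file_dir = "D:\\Travail\\backupPCgipsa\\taf\\officework\\gipsa bases\\CSV bi2014b\\"
# os.listdir returns names in arbitrary order, so sort them: the subject number
# assigned in the loop depends on the position of each file in this list
subject_list = [os.path.join(file_dir, file) for file in sorted(os.listdir(file_dir))
                if file.lower().endswith(".csv")]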

# parameters
sfreq = 512
decimation_factor = 2
stim_name = 'STI'

for i, subject in enumerate(subject_list):

    # Extract the subject number within the group (1 or 2) from the file name;
    # extract_subject_data needs it to select the active subject
    csv_name = os.path.splitext(os.path.basename(subject))[0]
    sub_num = csv_name.split('_')[3]

    true_sub = i + 1  # Dataset-wide subject number (position in the file list)

    # Read the data using the function specific for bi2014b
    df = extract_subject_data(subject, int(sub_num))

    # downsampling 
    raw = df_to_mne(df, sfreq)
    raw_decimated = decimate(raw, sfreq, decimation_factor, stim_name)
    data = raw_decimated.get_data()

    # Transpose
    dataT = data.T

    # creating timestamps and header
    n_times, n_channels = dataT.shape
    timestamps = np.arange(n_times, dtype=int)
    data_with_timestamp = np.column_stack((timestamps, dataT))
    header = [""] + [str(ch) for ch in range(n_channels)]

    # np.column_stack casts the integer timestamps to float; cast them back to int
    df = pd.DataFrame(data_with_timestamp, columns=header)
    df[""] = df[""].astype(int)

    # Construct the final filename
    subject_str = f"{true_sub:02d}"
    filename = f"subject_{subject_str}_session_01.csv"

    # Export the processed DataFrame to CSV
    df.to_csv(filename, index=False)
    print(f"Saved file: {filename}")

    # Display information
    events = df.iloc[:, -1]  # last column is the stim channel
    n_nt = int((events == 1).sum())
    n_t = int((events == 2).sum())
    print(f"Number of Non-Target (1): {n_nt}")
    print(f"Number of Target (2): {n_t}")