RAW (Base repo) to CSV

This code converts the datasets from RAW format to CSV format using MOABB.

It has been specifically conceived for BCI data.

This script handles the bi2013a-NAO database.

Important Note:

The original bi2013a files from the base repository are organized into one folder per subject_session. Each folder contains 4 .mat files and 4 .csv files, numbered from 1 to 4 (e.g., 1.csv, 2.csv, etc.).

To standardize the processing, a specific function has been designed to rearrange and separate this database into four distinct databases based on the file numbering:

- bi2013a-AT: Corresponds to 1.csv (Adaptive - Training).

- bi2013a-AO: Corresponds to 2.csv (Adaptive - Online).

- bi2013a-NAT: Corresponds to 3.csv (Non-Adaptive - Training).

- bi2013a-NAO: Corresponds to 4.csv (Non-Adaptive - Online).

Once separated, the data processing pipeline remains identical to the other databases in this project.
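
The rearrangement described above could be sketched as follows. This is a minimal illustration, not the actual `rearrange` from `ConvTools`, whose implementation is not shown in this notebook. The name `rearrange_sketch`, the assumed folder layout, and the output naming `<folder name>.csv` (inferred from `subject_01_session_01.csv` used further down) are all assumptions.

```python
import shutil
from pathlib import Path


def rearrange_sketch(csv_num, source, dest):
    """Copy <source>/<subject_session>/<csv_num>.csv to <dest>/<subject_session>.csv.

    Hypothetical stand-in for ConvTools.rearrange; the folder layout and the
    output file naming are assumptions.
    """
    source, dest = Path(source), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for folder in sorted(p for p in source.iterdir() if p.is_dir()):
        src_file = folder / f"{csv_num}.csv"
        if src_file.is_file():  # skip folders that lack this recording
            target = dest / f"{folder.name}.csv"
            shutil.copyfile(src_file, target)
            copied.append(target.name)
    return copied
```

Called with `csv_num=4`, such a function would collect every `4.csv` into a single flat directory, one file per subject_session, which is the layout the rest of this notebook expects.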


In [None]:
import os
import sys

import numpy as np
import pandas as pd

# Make the parent directory importable so the local ConvTools module can be found
sys.path.append(os.path.abspath('..'))
from ConvTools import decimate, rearrange, df_to_mne

In [None]:
# Rearrange bi2013a-NAO
source = "D:\\Travail\\backupPCgipsa\\taf\\officework\\gipsa bases\\CSV zenodo bi2013a\\"
file_dir = "D:\\Travail\\backupPCgipsa\\taf\\officework\\gipsa bases\\CSV bi2013a-NAO\\"
csv_num = 4
rearrange(csv_num, source, file_dir)

In [4]:
# test for 1 file
temp_file = "D:\\Travail\\backupPCgipsa\\taf\\officework\\gipsa bases\\CSV bi2013a-NAO\\subject_01_session_01.csv"

In [5]:
# Read the data
df = pd.read_csv(temp_file, header=0)

In [6]:
# Downsampling
sfreq = 512
decimation_factor = 2
stim_name = 'STI'

raw = df_to_mne(df, sfreq)
raw_decimated = decimate(raw, sfreq, decimation_factor, stim_name)
data = raw_decimated.get_data()

# Transpose
dataT = data.T
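
A note on why `stim_name` is presumably passed to `decimate`: if event markers occupy a single sample, keeping every second sample of the stimulation channel can silently drop them. The toy example below illustrates the issue. `downsample_stim` is a hypothetical helper, and taking the block-wise maximum is just one possible strategy; it is not necessarily what `ConvTools.decimate` does.

```python
import numpy as np


def downsample_stim(stim, factor):
    """Keep the largest code in each block of `factor` consecutive samples."""
    stim = np.asarray(stim)
    n_blocks = len(stim) // factor  # a trailing incomplete block is dropped
    blocks = stim[: n_blocks * factor].reshape(n_blocks, factor)
    return blocks.max(axis=1)


stim = np.array([0, 33285, 0, 0, 33286, 0, 0, 0])

naive = stim[::2]                # plain subsampling: the marker at index 1 is lost
safe = downsample_stim(stim, 2)  # block-wise maximum: both markers survive
```

The block-wise maximum assumes at most one non-zero marker per block; two markers falling in the same block would need a different rule.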

In [7]:
# Label standardization
# Convert 33285 to 2 and 33286 to 1 in stim column
dataT[:, -1] = np.where(dataT[:, -1] == 33285, 2, dataT[:, -1])
dataT[:, -1] = np.where(dataT[:, -1] == 33286, 1, dataT[:, -1])
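
The two `np.where` calls above can also be written as a single table-driven pass, which may be easier to extend if more marker codes ever need mapping. This is a minimal sketch; `remap_labels` and `MARKER_MAP` are names introduced here for illustration, and the code values are the ones used in this cell.

```python
import numpy as np

# Raw marker codes -> standardized labels (2 = target, 1 = non-target)
MARKER_MAP = {33285: 2, 33286: 1}


def remap_labels(stim, mapping):
    """Return a copy of `stim` with every key of `mapping` replaced by its value."""
    stim = np.asarray(stim)
    out = stim.copy()
    for raw_code, label in mapping.items():
        # Masks are computed on the original array, so a freshly written
        # label can never be re-mapped by a later entry.
        out[stim == raw_code] = label
    return out


stim = np.array([0.0, 33285.0, 0.0, 33286.0, 33286.0])
labels = remap_labels(stim, MARKER_MAP)
```

In the notebook this would be applied as `dataT[:, -1] = remap_labels(dataT[:, -1], MARKER_MAP)`.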

In [8]:
# Extract the last column (stim channel)
stim_col = dataT[:, -1]

# Count the unique values
unique_vals, counts = np.unique(stim_col, return_counts=True)

# Loop through unique values and their counts to print the results
for val, count in zip(unique_vals, counts):
    print(f"Value : {val}, Occurrence count : {count}")

In [9]:
# creating timestamps and header
n_times, n_channels = dataT.shape
timestamps = np.arange(n_times, dtype=int)
data_with_timestamp = np.column_stack((timestamps, dataT))
header = [""] + [str(i) for i in range(n_channels)]

# Build the DataFrame, then cast the timestamp column back to int
# (np.column_stack promoted the timestamps to float)
df = pd.DataFrame(data_with_timestamp, columns=header)
df[""] = df[""].astype(int)
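
The integer cast matters because `np.column_stack` promotes the integer timestamps to the floating-point dtype of the signal array, so they would otherwise be written as `0.0, 1.0, ...`. A toy example with made-up values (the names `toy`, `toy_df` and `csv_text` are illustrative only):

```python
import numpy as np
import pandas as pd

# 3 samples x 2 columns of made-up values (last column plays the stim role)
toy = np.array([[0.5, 0.0], [1.5, 2.0], [2.5, 1.0]])

timestamps = np.arange(toy.shape[0], dtype=int)
stacked = np.column_stack((timestamps, toy))  # the whole array becomes float64

header = [""] + [str(i) for i in range(toy.shape[1])]
toy_df = pd.DataFrame(stacked, columns=header)
toy_df[""] = toy_df[""].astype(int)           # timestamps back to integers

csv_text = toy_df.to_csv(index=False)         # CSV content as a string
```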

In [10]:
# Test to check csv file
df.to_csv("data.csv", index=False)

In [None]:
# Iterate through all subject CSV files in the directory (other files are skipped)
subject_list = sorted(
    os.path.join(file_dir, file)
    for file in os.listdir(file_dir)
    if file.lower().endswith(".csv")
)

for subject in subject_list:
    # Read the data
    df = pd.read_csv(subject, header=0)

    # Signal Processing Parameters
    sfreq = 512
    decimation_factor = 2
    stim_name = 'STI'

    # Convert DataFrame to MNE Raw object and apply decimation (filtering + resampling)
    raw = df_to_mne(df, sfreq)
    raw_decimated = decimate(raw, sfreq, decimation_factor, stim_name)
    data = raw_decimated.get_data()

    # Transpose data to (time_samples, channels) format
    dataT = data.T
    
    # Label standardization for the stimulation channel (last column)
    # Map raw hardware markers to target (2) and non-target (1) labels
    dataT[:, -1] = np.where(dataT[:, -1] == 33285, 2, dataT[:, -1])
    dataT[:, -1] = np.where(dataT[:, -1] == 33286, 1, dataT[:, -1])

    # Generate integer timestamps and prepare the CSV header
    n_times, n_channels = dataT.shape
    timestamps = np.arange(n_times, dtype=int)
    data_with_timestamp = np.column_stack((timestamps, dataT))
    header = [""] + [str(i) for i in range(n_channels)]

    # Format as DataFrame and ensure the timestamp column is integer type
    df = pd.DataFrame(data_with_timestamp, columns=header)
    df[""] = df[""].astype(int)

    # Extract the filename from the path
    filename = os.path.basename(subject)

    # Export the processed DataFrame to CSV
    df.to_csv(filename, index=False)
    print(f"Saved file: {filename}")

    # Display the event counts
    events = df.iloc[:, -1]
    n_nt = int((events == 1).sum())
    n_t = int((events == 2).sum())
    print(f"Number of Non-Target (1): {n_nt}")
    print(f"Number of Target (2): {n_t}")