# Dataset creation

This notebook uses pandas to turn a set of EEG (electroencephalography) recordings into a single labelled dataset.

The main steps are:

1. **Adjusting Size**: It finds the minimum number of rows among all the DataFrames and trims each DataFrame to that minimum length to ensure they all have the same number of rows.

2. **Dividing DataFrames**: It defines a function `divide_into` that divides a DataFrame into `partition_number` equal parts (150 in this run) and resets the index of each part. It then applies this function to each trimmed DataFrame and stores the parts in the dictionary `divided_dataframes`.

3. **Processing DataFrames**: For each entry in `divided_dataframes`, it applies the transposition function `transpose_eeg_dataframe` to each part, so that the `eeg` column of a part becomes one row. It then concatenates the transposed parts into a single DataFrame and assigns the corresponding target label.

4. **Final Combination**: It concatenates all the combined DataFrames with their respective targets into a single final DataFrame `final_combined_dataframe`.
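
The four steps above can be sketched end to end on synthetic data. This is a minimal illustration rather than the notebook's own code: the helper `build_dataset`, the labels `rest` and `task`, and the random values are assumptions made for the example, and a NumPy `reshape` stands in for the per-part transpose (it yields the same row layout).

```python
import numpy as np
import pandas as pd


def build_dataset(recordings, n_parts):
    """Trim, split, transpose, and label a dict of {label: DataFrame}."""
    # 1. Adjust size: trim every recording to the shortest one.
    min_length = min(len(df) for df in recordings.values())
    rows_per_part = min_length // n_parts

    labelled = []
    for label, df in recordings.items():
        eeg = df['eeg'].iloc[:min_length].to_numpy()
        # 2. Divide: keep n_parts equal chunks, dropping any remainder rows.
        chunks = eeg[:n_parts * rows_per_part].reshape(n_parts, rows_per_part)
        # 3. Process: each chunk becomes one row with columns row_0, row_1, ...
        part = pd.DataFrame(chunks, columns=[f'row_{i}' for i in range(rows_per_part)])
        part['target'] = label
        labelled.append(part)

    # 4. Final combination: stack all labelled rows.
    return pd.concat(labelled, ignore_index=True)


# Synthetic stand-ins for two recordings of different lengths (hypothetical data).
rng = np.random.default_rng(0)
recordings = {
    'rest': pd.DataFrame({'eeg': rng.integers(-100, 100, size=47)}),
    'task': pd.DataFrame({'eeg': rng.integers(-100, 100, size=53)}),
}
dataset = build_dataset(recordings, n_parts=4)
print(dataset.shape)  # (8, 12): 2 recordings x 4 parts, 11 samples + target
```

Both recordings are trimmed to 47 rows, so each of the 4 parts holds 11 samples and the last 3 rows of each trimmed recording are discarded.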


In [1]:
import pandas as pd

In [2]:
column_names = ['timestamp', 'counter', 'eeg', 'attention', 'meditation', 'blinking']
baseline = pd.read_csv('data/baseline.dat', delimiter=' ', names=column_names)
exhalar = pd.read_csv('data/exhalar.dat', delimiter=' ', names=column_names)
golpes1 = pd.read_csv('data/golpes1.dat', delimiter=' ', names=column_names)
golpes2 = pd.read_csv('data/golpes2.dat', delimiter=' ', names=column_names)
cerrados = pd.read_csv('data/cerrados.dat', delimiter=' ', names=column_names)
mentalimagery = pd.read_csv('data/mentalimagery.dat', delimiter=' ', names=column_names)
pestaneos = pd.read_csv('data/pestaneos.dat', delimiter=' ', names=column_names)
inhalar = pd.read_csv('data/inhalar.dat', delimiter=' ', names=column_names)

### Split dataset

In [3]:
partition_number = 150

In [4]:
# Trim every recording to the length of the shortest one.
dataframes = [baseline, exhalar, golpes1, golpes2, cerrados, mentalimagery, pestaneos, inhalar]
min_length = min(df.shape[0] for df in dataframes)
dataframes_trimmed = [df.iloc[:min_length] for df in dataframes]

def divide_into(df):
    # Split df into partition_number equal parts; rows left over after
    # the integer division are dropped.
    rows_per_df = len(df) // partition_number
    return [df.iloc[i*rows_per_df: (i+1)*rows_per_df].reset_index(drop=True) for i in range(partition_number)]

# Keys follow the order of `dataframes`: dataframe_1 is baseline, and so on.
divided_dataframes = {}
for i, df in enumerate(dataframes_trimmed):
    divided_dataframes[f'dataframe_{i+1}'] = divide_into(df)
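
Because `divide_into` uses integer division, any rows beyond `partition_number * rows_per_df` are dropped without warning. The sketch below illustrates this on a toy frame; `split_equal` is a hypothetical stand-in that takes the number of parts as an argument instead of reading the global `partition_number`.

```python
import pandas as pd


def split_equal(df, n_parts):
    """Split df into n_parts equal-length pieces, discarding leftover rows."""
    rows_per_part = len(df) // n_parts
    return [
        df.iloc[i * rows_per_part:(i + 1) * rows_per_part].reset_index(drop=True)
        for i in range(n_parts)
    ]


toy = pd.DataFrame({'eeg': range(10)})
parts = split_equal(toy, n_parts=3)

print([len(p) for p in parts])      # [3, 3, 3]
print(parts[-1]['eeg'].tolist())    # [6, 7, 8]  (row 9 is dropped)
```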

### Create dataset


In [5]:
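def transpose_eeg_dataframe(df):
    # Turn the 'eeg' column into a single row whose columns are
    # row_0, row_1, ... (one per sample in the partition).
    transposed_df = df['eeg'].to_frame().T
    transposed_df.columns = [f'row_{i}' for i in df.index]
    return transposed_df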

targets = {
    'dataframe_1': 'baseline',
    'dataframe_2': 'exhalar',
    'dataframe_3': 'golpes1',
    'dataframe_4': 'golpes2',
    'dataframe_5': 'cerrados',
    'dataframe_6': 'mentalimagery',
    'dataframe_7': 'pestaneos',
    'dataframe_8': 'inhalar'
}

all_combined_dataframes = []
for key, target in targets.items():
    dataframe_list = divided_dataframes[key]
    transposed_dataframes = []
    for df in dataframe_list:
        transposed_df = transpose_eeg_dataframe(df)
        transposed_dataframes.append(transposed_df)
    combined_dataframe = pd.concat(transposed_dataframes, ignore_index=True)
    combined_dataframe['target'] = target
    all_combined_dataframes.append(combined_dataframe)

final_combined_dataframe = pd.concat(all_combined_dataframes, ignore_index=True)

print(final_combined_dataframe)

      row_0  row_1  row_2  row_3  row_4  row_5  row_6  row_7  row_8  row_9  \
0        90    104    104    100    105    114     89     58     36     33   
1        24     38     66     72     56     39     22     18     26     27   
2         7     11     22     23     12      9     43     68     57     34   
3      -374   -342   -326   -339   -341   -331   -334   -347   -370   -387   
4        50     41     28     41     59     54     42     41     44     60   
...     ...    ...    ...    ...    ...    ...    ...    ...    ...    ...   
1195     -9    -26    -27    -13     16     25     34     27      5     -6   
1196     -1     -2    -12    -34    -42    -23      8     11    -20    -19   
1197     54     20     27     51     53     43     19      3     16     57   
1198     -3      3     11     20     20     21     19     21     27     44   
1199     67     60     64     82     91     59     18     26     55     66   

      ...  row_196  row_197  row_198  row_199  row_200  row_201

In [6]:
file_name = f'data/combined_dataset_{partition_number}_partitions.csv'
final_combined_dataframe.to_csv(file_name, index=False)
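
When the file is read back, it can be separated into a feature matrix and a label vector. A minimal sketch, assuming the CSV has the layout produced above (`row_*` columns followed by `target`); the two-row frame and the temporary path are stand-ins for the real dataset, not values from the notebook.

```python
import os
import tempfile

import pandas as pd

# Stand-in for final_combined_dataframe (hypothetical values).
sample = pd.DataFrame({
    'row_0': [90, 24],
    'row_1': [104, 38],
    'target': ['baseline', 'exhalar'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'combined_dataset_sample.csv')
    sample.to_csv(path, index=False)
    loaded = pd.read_csv(path)

X = loaded.drop(columns='target')   # EEG samples, one row per partition
y = loaded['target']                # class label for each row

print(X.shape, y.tolist())          # (2, 2) ['baseline', 'exhalar']
```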