## What we are working with

- The file *input_data.xlsx* contains the input data used by every algorithmic approach proposed in each study.

- The sheet _raw data_ contains one row per algorithmic approach, so we need to transform it into a matrix (1 if the input is used, 0 otherwise) of the form:


| Paper title | Input data 1 | Input data 2 | Input data 3 |
| ----------- | ------------ | ------------ | ------------ |
| Title 1     | 0            |  1           |   1          | 
| Title 2     | 1            |  0           |   1          | 
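
As a rough illustration of this transformation, the sketch below builds such a matrix with pandas' built-in `Series.str.get_dummies`. The toy data and the step that collapses approaches into one row per paper are assumptions made for the example; the notebook's own helper, `get_individual_occurrences`, is not shown here and may work differently.

```python
import pandas as pd

# Hypothetical stand-in for the 'raw data' sheet: one row per algorithmic approach.
raw = pd.DataFrame({
    "Title": ["Title 1", "Title 1", "Title 2"],
    "Input": ["raw EEG", "raw EEG and ToD", "EEG cyclic profile"],
})

# Split each 'Input' string on ' and ' and build one 0/1 column per input type.
indicators = raw["Input"].str.get_dummies(sep=" and ")

# Assumption: collapse to one row per paper, marking an input with 1
# if at least one approach in that paper uses it.
matrix = indicators.groupby(raw["Title"]).max().reset_index()
print(matrix)
```

Taking the maximum per paper is only one possible way to merge several approaches from the same study; summing instead would count how many approaches use each input.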


- We will also create a third sheet listing all the input data types; their categories must be filled in by hand. E.g.:

| Input data | Category |
| ----------- | ------------ |
| raw EEG     | raw physiological signal |
| EEG cyclic profile | cyclic profile from physiological signal |
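
One plausible use of such a table, once it has been filled in, is to collapse the per-input columns of the matrix into per-category columns. The sketch below is only an assumption about that downstream step: the toy matrix, the `raw ECG` input, and the variable names are hypothetical and do not come from the notebook.

```python
import pandas as pd

# Hypothetical paper-by-input matrix (1 = the paper uses that input).
matrix = pd.DataFrame({
    "Title": ["Title 1", "Title 2"],
    "raw EEG": [1, 0],
    "raw ECG": [1, 1],
    "EEG cyclic profile": [0, 1],
})

# Hypothetical hand-filled category table.
categories = pd.DataFrame({
    "Input data": ["raw EEG", "raw ECG", "EEG cyclic profile"],
    "Category": [
        "raw physiological signal",
        "raw physiological signal",
        "cyclic profile from physiological signal",
    ],
})

# Map each input type to its category.
to_category = categories.set_index("Input data")["Category"].to_dict()

# Rename the input columns to their categories, then merge columns that now
# share a name by taking the maximum (1 if any input of that category is used).
by_category = (
    matrix.set_index("Title")
    .rename(columns=to_category)
    .T.groupby(level=0).max().T
)
print(by_category)
```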


In [1]:
# third-party
import pandas as pd

# local
from get_matrix import get_individual_occurrences

-----
## Creating _'papers and inputs'_

In [2]:
filepath = 'aux_files/input_data.xlsx'
df = pd.read_excel(filepath, sheet_name='raw data')
df

Unnamed: 0,Title,Input
0,Seizure Forecasting from Subcutaneous EEG Usin...,raw physiological signal
1,Seizure Forecasting from Subcutaneous EEG Usin...,raw physiological signal and ToD
2,Seizure Forecasting by High-Frequency Activity...,cyclic profile of physiological signal
3,Seizure Forecasting by High-Frequency Activity...,cyclic profile of physiological signal
4,Prediction of seizure likelihood with a long-t...,features from physiological signal
5,Prediction of seizure likelihood with a long-t...,features from physiological signal
6,Comparison between epileptic seizure predictio...,raw physiological signal
7,Daily resting-state intracranial EEG connectiv...,features from physiological signal
8,Daily resting-state intracranial EEG connectiv...,features from physiological signal
9,Can heart rate variability identify a high-ris...,features from physiological signal


In [3]:
df_input = pd.concat([df['Title'], get_individual_occurrences(df['Input'], ' and ')], axis=1)

-----
## Creating _'input categories'_

In [4]:
try:
    # reuse the sheet if it already exists, so hand-filled categories are kept
    df_input_categories = pd.read_excel(filepath, sheet_name='input categories')

except ValueError:
    # sheet not found: start an empty table with one row per input data type
    df_input_categories = pd.DataFrame(columns=['Input data', 'Category'])
    df_input_categories['Input data'] = df_input.columns[1:].unique()
    
df_input_categories

Unnamed: 0,Input data,Category
0,raw physiological signal,raw physiological signal
1,ToD,ToD
2,cyclic profile of physiological signal,cyclic profile of physiological signal
3,features from physiological signal,features from physiological signal
4,seizure cyclic profile,seizure cyclic profile
5,seizure times,seizure times
6,other cyclic profiles,other cyclic profiles
7,other info,other info


-----
## Saving the results as sheets in the same Excel file

In [5]:
# mode='w' overwrites the whole file, so all three sheets are written again
with pd.ExcelWriter(filepath, engine='openpyxl', mode='w') as writer:
    df.to_excel(writer, sheet_name='raw data', index=False)
    df_input.to_excel(writer, sheet_name='papers and inputs', index=False)
    df_input_categories.to_excel(writer, sheet_name='input categories', index=False)