This notebook uses the methods in the module `extraction_classes.py` and the data sources in `csv` to generate demographic and time series embeddings.

In the first cell, we import all the constants and modules needed for this notebook. In the following cells, we create dataframes of extracted features using the methods mentioned above.

Finally, we concatenate these dataframes to create the fusion dataframe used for machine learning.

Note: in this notebook, we use a cohort of only 10 patients to keep data processing and extraction manageable for our tests. Class objects are instantiated inside each extraction line. For example:

``extraction_classes.Event_extraction(patient).extract_chart_events(patient)``

In this code, the loop creates an ``Event_extraction`` object for each ``patient`` in the cohort and applies the method ``extract_chart_events``.
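
The instantiate-and-call pattern described above can be sketched with a stand-in class. `ToyEventExtraction`, `toy_cohort`, and the `toy_feature` column are hypothetical names made up for this illustration; the real `Event_extraction` class reads from the project's data sources and may return something shaped differently.

```python
import pandas as pd


# Hypothetical stand-in for extraction_classes.Event_extraction.
# The real class is not reproduced here; this only mimics the call shape.
class ToyEventExtraction:
    def __init__(self, patient):
        self.patient = patient

    def extract_chart_events(self, patient):
        # Return a one-row dataframe labelled with the patient identifier.
        return pd.DataFrame({"toy_feature": [float(patient)]}, index=[patient])


toy_cohort = [101, 102, 103]  # made-up identifiers, not real patients

# A fresh object is created for each patient, used once, then discarded.
per_patient = [ToyEventExtraction(p).extract_chart_events(p) for p in toy_cohort]
```

Because each object is used for a single call, nothing is shared between patients, which keeps the loop simple at the cost of repeating any setup work done in the constructor.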


To generate embeddings for all patients, more processing time is required, especially for time series feature extraction. 



### Imports

In [None]:
import os
# Move to the parent directory so that the `src` package can be imported.
os.chdir('../')

from src.data import constants
from src.utils import extraction_classes

import pandas as pd


## Create chart events features for patient cohort

Start by creating an empty list that will be filled with the extracted features for chart events:

In [None]:
chart_fusion = []
for patient in constants.cohort:
    # One Event_extraction object per patient; collect that patient's chart event features.
    chart_fusion.append(extraction_classes.Event_extraction(patient).extract_chart_events(patient))

### Concatenate chart events fusion features

Concatenate features along the 0 axis for all patients:

In [None]:
chart_fusion = pd.concat(chart_fusion, axis=0)
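
As a minimal sketch of what row-wise concatenation does, the example below stacks two made-up single-patient frames. The column names, values, and identifiers are illustrative assumptions, not output of `extract_chart_events`.

```python
import pandas as pd

# Two made-up single-patient frames, each indexed by a toy patient identifier.
first = pd.DataFrame({"hr_mean": [80.0], "hr_max": [110.0]}, index=[101])
second = pd.DataFrame({"hr_mean": [72.0], "hr_max": [95.0]}, index=[102])

# axis=0 stacks the frames as rows; columns are matched by name.
stacked = pd.concat([first, second], axis=0)
```

The result has one row per patient and keeps each frame's index labels, so `stacked.loc[102]` returns the second patient's features.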

## Create lab events features for patient cohort

Start by creating an empty list that will be filled with the extracted features for lab events:

In [None]:
lab_fusion = []
for patient in constants.cohort:
    # One Event_extraction object per patient; collect that patient's lab event features.
    lab_fusion.append(extraction_classes.Event_extraction(patient).extract_lab_events(patient))

### Concatenate lab events fusion features

Concatenate features along the 0 axis for all patients:

In [None]:
lab_fusion = pd.concat(lab_fusion, axis=0)

## Create procedure events features for patient cohort

Start by creating an empty list that will be filled with the extracted features for procedure events:

In [None]:
procedure_fusion = []
for patient in constants.cohort:
    # One Event_extraction object per patient; collect that patient's procedure event features.
    procedure_fusion.append(extraction_classes.Event_extraction(patient).extract_procedure_events(patient))

### Concatenate procedure events fusion features

Concatenate features along the 0 axis for all patients:

In [None]:
procedure_fusion = pd.concat(procedure_fusion, axis=0)

## Create demographic features for patient cohort

Create a `Demographic_extraction` object and call its `extract_demographics` method:

In [None]:
demographics_fusion = extraction_classes.Demographic_extraction().extract_demographics()

Filter dataframe using patients in the `cohort`:

In [None]:
demographics_fusion = demographics_fusion[demographics_fusion['subject_id'].isin(constants.cohort)]
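
The filtering step can be illustrated on toy data. The frame, the `age` column, and `toy_cohort` below are made up for this sketch and do not reflect the real demographics table.

```python
import pandas as pd

# Made-up demographics table with one row per toy patient.
demo = pd.DataFrame({"subject_id": [101, 102, 103, 104], "age": [65, 47, 80, 52]})
toy_cohort = [101, 103]

# isin builds a boolean mask; indexing with it keeps only the matching rows.
filtered = demo[demo["subject_id"].isin(toy_cohort)]
```

Note that boolean filtering keeps the original row labels (here `0` and `2`); it does not renumber the index or move `subject_id` into it.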

## Concatenate fusion features horizontally for all patients

Concatenate the generated dataframes to combine all features:

In [None]:
fusion_dataframe = pd.concat([demographics_fusion, chart_fusion, lab_fusion, procedure_fusion], axis=1)
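
One thing worth checking at this step: `pd.concat(..., axis=1)` aligns rows by index label, not by position. The sketch below uses made-up toy frames (all names and values are illustrative, and how the project's dataframes are actually indexed is not shown in this notebook) to compare a frame that keeps `subject_id` as a column with one that uses it as the index.

```python
import pandas as pd

# Toy demographics frame with subject_id as a column and a default 0..n-1 index.
demo = pd.DataFrame({"subject_id": [101, 102], "age": [65, 47]})
# Toy event features indexed by the patient identifier.
chart = pd.DataFrame({"hr_mean": [80.0, 72.0]}, index=[101, 102])

# Index labels differ (0, 1 versus 101, 102), so the rows do not line up
# and the result is padded with missing values.
misaligned = pd.concat([demo, chart], axis=1)

# Using subject_id as the index on both sides gives one row per patient.
aligned = pd.concat([demo.set_index("subject_id"), chart], axis=1)
```

If the real frames already share the same patient-based index, no extra step is needed; if not, calling `set_index("subject_id")` before concatenating is one way to make them line up.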

### Renaming the index as subject_id

Rename the generated dataframe's index so that the index column is labelled when exported to CSV:

In [None]:
fusion_dataframe.index.names = ["subject_id"]

### Exporting the dataframe to a csv file

In [None]:
fusion_dataframe.to_csv("csvs/fusion_dataframe.csv", index=True)
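
To show how the named index carries through the export, here is a small round trip on a toy frame. It writes to an in-memory buffer instead of `csvs/fusion_dataframe.csv`, and the `age` column and identifiers are made up for the illustration.

```python
import io

import pandas as pd

# Toy frame indexed by made-up patient identifiers.
frame = pd.DataFrame({"age": [65, 47]}, index=[101, 102])
frame.index.names = ["subject_id"]

# Write to an in-memory buffer; index=True emits the index as the first column.
buffer = io.StringIO()
frame.to_csv(buffer, index=True)

# Read it back, restoring subject_id as the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="subject_id")
```

Passing `index_col="subject_id"` when reading the file back avoids ending up with the identifiers as an ordinary column next to a new default index.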