## Merging chart and lab event observations

To combine observations from the `chartevents` and `labevents` tables, we concatenate them into a single dataset. Rows that share the same (`subject_id`, `itemid`, `charttime`, `storetime`) are treated as duplicates and only one is kept. When the duplicates come from both tables, the `labevents` row takes priority over the `chartevents` row.
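The priority rule can be checked on a small synthetic example before running it on the full exports. This is a minimal sketch: the rows, the `valuenum` column values and the added `source` column are hypothetical, chosen only to make the outcome visible. Placing the `labevents` rows first in `pd.concat` and keeping the first occurrence of each key makes the lab row survive whenever the keys collide.

```python
import pandas as pd

KEYS = ["subject_id", "itemid", "charttime", "storetime"]

# Hypothetical toy rows: the first chart row shares all KEYS with the lab row
chart = pd.DataFrame({
    "subject_id": [1, 1],
    "itemid": [100, 200],
    "charttime": ["2150-01-01 08:00", "2150-01-01 09:00"],
    "storetime": ["2150-01-01 08:05", "2150-01-01 09:05"],
    "valuenum": [7.0, 3.0],
})
lab = pd.DataFrame({
    "subject_id": [1],
    "itemid": [100],
    "charttime": ["2150-01-01 08:00"],
    "storetime": ["2150-01-01 08:05"],
    "valuenum": [7.4],
})

# Tag provenance (hypothetical column) so the surviving row's origin is visible
chart = chart.assign(source="chartevents")
lab = lab.assign(source="labevents")

# labevents first, then keep the first occurrence of each key tuple
merged = (
    pd.concat([lab, chart], ignore_index=True)
    .drop_duplicates(subset=KEYS, keep="first")
    .sort_values(KEYS)
    .reset_index(drop=True)
)
print(merged[["itemid", "valuenum", "source"]])
```

Item 100 keeps the lab value 7.4, while item 200, which exists only in the chart data, passes through unchanged. The notebook cell that follows applies the same rule to the exported CSV files.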

```python
import gc
from pathlib import Path

import pandas as pd

# pathlib avoids the invalid "\E" / "\o" escape sequences in Windows-style string paths
EXPORT_DIR = Path("CSV") / "Exports"
KEYS = ["subject_id", "itemid", "charttime", "storetime"]

chartevents_df = pd.read_csv(EXPORT_DIR / "o04_icu_chartevent.csv")
labevents_df = pd.read_csv(EXPORT_DIR / "o04_icu_labevents.csv", low_memory=False)

# Put labevents first so that, among rows sharing the same KEYS,
# the labevents row is the first occurrence and survives keep='first'
merged_df = pd.concat([labevents_df, chartevents_df], ignore_index=True)
merged_df = merged_df.drop_duplicates(subset=KEYS, keep="first")

# Sort for readability; this does not affect which duplicate was kept
merged_df = merged_df.sort_values(by=KEYS).reset_index(drop=True)

# Save the merged chart + lab dataset to a CSV file
merged_df.to_csv(EXPORT_DIR / "o04_icu_chart_lab_merged.csv", index=False)

# Free RAM: drop references to the source frames before collecting
del chartevents_df, labevents_df
gc.collect()
```
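Because the priority rule only matters when the two tables actually share keys, it may be worth counting the collisions before relying on the merged file. The helper below, `count_key_overlap`, is a hypothetical name introduced here, and the toy data is illustrative only; it inner-joins the distinct key tuples of both tables. A result of zero would mean that no `chartevents` row is ever replaced, so the priority step has no effect on the output.

```python
import pandas as pd

KEYS = ["subject_id", "itemid", "charttime", "storetime"]


def count_key_overlap(chart_df: pd.DataFrame, lab_df: pd.DataFrame, keys=KEYS) -> int:
    """Count distinct key tuples present in both tables (hypothetical helper)."""
    chart_keys = chart_df[keys].drop_duplicates()
    lab_keys = lab_df[keys].drop_duplicates()
    # An inner join keeps only the key tuples found in both tables
    return len(chart_keys.merge(lab_keys, on=keys, how="inner"))


# Usage on toy data (values are illustrative only)
chart = pd.DataFrame({"subject_id": [1, 1], "itemid": [100, 200],
                      "charttime": ["t0", "t1"], "storetime": ["s0", "s1"]})
lab = pd.DataFrame({"subject_id": [1], "itemid": [100],
                    "charttime": ["t0"], "storetime": ["s0"]})
print(count_key_overlap(chart, lab))  # → 1
```

On the real exports, the helper would need to be called while `chartevents_df` and `labevents_df` are still in memory, i.e. right after the two `pd.read_csv` calls.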