# Goals of this assignment
#### 1. Get familiar with the JSON object format contained in each line of the four files:
        a. Patient.ndjson
        b. Condition.ndjson
        c. Encounter.ndjson
        d. EncounterICU.ndjson
        
#### 2. For each patient, build an array of the associated conditions. The expected output is a dictionary with patient_id as the key and an array of Condition JSON objects as the value.

#### 3. For each condition, assign an estimated time using the corresponding encounter in Encounter.ndjson or EncounterICU.ndjson.

#### Use the start time of the encounter (its `period.start` field) as the time of each condition.

#### 4. Finally, create a CSV file with the following columns:
        a. Patient ID (Column name: pid)
        b. Timestamp in Unix format (Column name: time)
        c. Condition code (Column name: code)
        d. Condition string (Column name: description)
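
The `time` column calls for a Unix timestamp, while timestamps in JSON records are typically ISO 8601 strings. The following is a minimal sketch of the conversion using only the standard library. The helper name `iso_to_unix` is hypothetical, and treating strings without a UTC offset as UTC is an assumption, not something the assignment specifies.

```python
from datetime import datetime, timezone

def iso_to_unix(iso_string):
    """Convert an ISO 8601 timestamp string to integer Unix seconds."""
    # Older Python versions do not accept a trailing "Z", so normalize it first.
    dt = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(iso_to_unix("2020-01-01T00:00:00+00:00"))  # 1577836800
print(iso_to_unix("2020-01-01T00:00:00-05:00"))  # 1577854800
```

The second call shows that the UTC offset is honored: midnight at `-05:00` is five hours (18000 seconds) later than midnight UTC.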

#### 5. You are required to submit the CSV file as well as the Jupyter notebook used to generate it.

# Libraries required for the assignment

In [1]:
import json
import pandas as pd
from datetime import datetime

In [3]:
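# Read an NDJSON file (one JSON object per line) and return a list of parsed objects.
# All input files are expected to be in the same folder as this notebook.
def read_ndjson(file_path):
    data = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            if line.strip():  # skip blank lines such as a trailing newline
                data.append(json.loads(line))
    return data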
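
To make the NDJSON layout concrete, here is a small sketch that parses a sample from memory instead of from disk. The helper name `parse_ndjson_lines` and the two sample records are hypothetical and invented for illustration; the real files carry many more fields per record.

```python
import io
import json

def parse_ndjson_lines(lines):
    """Parse an iterable of NDJSON lines, skipping blank ones."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # tolerate empty lines, e.g. a trailing newline at end of file
            records.append(json.loads(line))
    return records

# Hypothetical sample: one JSON object per line, with a blank line in between.
sample = io.StringIO(
    '{"id": "p1", "resourceType": "Patient"}\n'
    '\n'
    '{"id": "p2", "resourceType": "Patient"}\n'
)
records = parse_ndjson_lines(sample)
print([record["id"] for record in records])
```

Each line is parsed independently, so one malformed line raises `json.JSONDecodeError` for that line only and does not corrupt the records already read.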

In [4]:
# Read Patient.ndjson
patients = read_ndjson("Patient.ndjson")

In [5]:
# Read Condition.ndjson
conditions = read_ndjson("Condition.ndjson")

In [6]:
# Read Encounter.ndjson
encounters = read_ndjson("Encounter.ndjson")

In [7]:
# Read EncounterICU.ndjson
encounters_icu = read_ndjson("EncounterICU.ndjson")

In [8]:
# Step 2: Create dictionary for each patient with associated conditions
patient_conditions = {}
for condition in conditions:
    patient_id = condition["subject"]["reference"].split("/")[-1]
    if patient_id not in patient_conditions:
        patient_conditions[patient_id] = []
    patient_conditions[patient_id].append(condition)

In [11]:
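# Step 3: Assign estimated time for each condition
def get_encounter_start_time(condition, encounters):
    # Return the start time of the encounter referenced by the condition,
    # or None if the condition has no encounter or no match is found.
    reference = condition.get("encounter", {}).get("reference")
    if not reference:
        return None
    encounter_id = reference.split("/")[-1]
    for encounter in encounters:
        if encounter["id"] == encounter_id:
            return encounter.get("period", {}).get("start")
    return None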
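
The function above scans the whole encounter list for every condition, which can become slow on large files. One possible alternative, sketched here under the assumption that encounter ids are unique within each file, is to build a dictionary from encounter id to start time once and then look up each condition in constant time. The helper name `build_start_time_index` and the sample records are hypothetical.

```python
def build_start_time_index(*encounter_lists):
    """Map encounter id -> period.start across one or more encounter lists.

    Earlier lists take precedence when an id appears in more than one.
    """
    index = {}
    for encounter_list in encounter_lists:
        for encounter in encounter_list:
            start = encounter.get("period", {}).get("start")
            if start is not None:
                index.setdefault(encounter["id"], start)
    return index

# Hypothetical records shaped like the fields used in this notebook.
encounters = [{"id": "e1", "period": {"start": "2020-01-01T08:00:00+00:00"}}]
encounters_icu = [{"id": "e2", "period": {"start": "2020-01-02T09:30:00+00:00"}}]
condition = {"encounter": {"reference": "Encounter/e2"}}

index = build_start_time_index(encounters, encounters_icu)
encounter_id = condition["encounter"]["reference"].split("/")[-1]
print(index.get(encounter_id))  # 2020-01-02T09:30:00+00:00
```

Passing the regular encounters first mirrors the lookup order used in this notebook: the ICU list is consulted only when the regular list has no start time for that id.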

In [12]:
# Update conditions with estimated time
for patient_id, patient_condition_list in patient_conditions.items():
    for condition in patient_condition_list:
        start_time = get_encounter_start_time(condition, encounters)
        if not start_time:
            start_time = get_encounter_start_time(condition, encounters_icu)
        condition["estimated_time"] = start_time

In [13]:
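# Step 4: Create DataFrame
from datetime import timezone

def to_unix(iso_string):
    # Convert an ISO 8601 string to Unix seconds; None if no time was found
    if not iso_string:
        return None
    dt = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return int(dt.timestamp())

data = []
for patient_id, patient_condition_list in patient_conditions.items():
    for condition in patient_condition_list:
        data.append({
            "pid": patient_id,
            "time": to_unix(condition["estimated_time"]),
            "code": condition["code"]["coding"][0]["code"],
            "description": condition["code"]["coding"][0].get("display")
        })

df = pd.DataFrame(data, columns=["pid", "time", "code", "description"])
df["time"] = df["time"].astype("Int64")  # nullable integers, so missing times stay empty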

In [14]:
# Step 4 (continued): Write DataFrame to CSV
df.to_csv("patient_conditions.csv", index=False)
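
As a quick sanity check before submitting, the sketch below verifies that a CSV has the four required columns and that every non-empty `time` value is an integer. The helper name `check_conditions_csv` and the sample rows are hypothetical; it uses only the standard `csv` module and reads from an in-memory string, so it runs without the real data.

```python
import csv
import io

EXPECTED_COLUMNS = ["pid", "time", "code", "description"]

def check_conditions_csv(file_obj):
    """Return the row count after validating the header and the time column."""
    reader = csv.DictReader(file_obj)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    count = 0
    for row in reader:
        if row["time"]:  # an empty cell means no encounter time was found
            int(row["time"])  # raises ValueError if the value is not an integer
        count += 1
    return count

# Hypothetical sample rows, including one with a missing time.
sample = io.StringIO(
    "pid,time,code,description\n"
    "p1,1577836800,I10,Hypertension\n"
    "p2,,E11,Type 2 diabetes\n"
)
row_count = check_conditions_csv(sample)
print(row_count)  # 2
```

To check the real output, pass `open("patient_conditions.csv", newline="")` in place of the sample.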