
We read the CSV files and convert them to JSON.

In [1]:
import csv
import json

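def csv2json(archivo_csv, archivo_json):
    """Convert a CSV file into a JSON file holding a list of row dictionaries."""
    # newline='' lets the csv module handle line endings itself, as its docs recommend
    with open(archivo_csv, 'r', newline='') as csv_file:
        # DictReader maps each row to a dict keyed by the header; every value stays a string
        datos = list(csv.DictReader(csv_file))

    with open(archivo_json, 'w') as json_file:
        json.dump(datos, json_file, indent=4)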




In [2]:
# Call csv2json with the names of your CSV and JSON files
csv2json('patients.csv', 'patient.json')
csv2json('admissions.csv', 'admissions.json')
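
As a quick illustration of what the conversion produces, the sketch below runs the same `DictReader` step on an in-memory CSV. The sample rows and the `rows_from_csv_text` helper are made up for this example and are not part of the notebook. The point it tries to show is that every value comes back as a string, including `subject_id`, and that an empty cell becomes an empty string.

```python
import csv
import io
import json

def rows_from_csv_text(csv_text):
    """Hypothetical helper: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Made-up sample shaped like a small patients table; the first row has an empty dod
sample_csv = "subject_id,gender,dod\n1,F,\n2,M,2170-01-01\n"
rows = rows_from_csv_text(sample_csv)

print(json.dumps(rows, indent=4))
print(type(rows[0]["subject_id"]))  # <class 'str'>
print(repr(rows[0]["dod"]))         # ''
```

Because the IDs are strings, later lookups need to use string keys such as `"10039694"` rather than integers.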

We combine the data to build the patient resource.

In [6]:
# Load the data from a JSON file
def load_json(filename):
    with open(filename) as f:
        return json.load(f)

In [13]:
import json

# Dictionary that will hold the combined information, keyed by subject_id
combined_data = {}

# Index the patients by subject_id for fast lookups
patients_dict = {}

for patient in load_json('patient.json'):
    patients_dict[patient['subject_id']] = patient

# Iterate over the admissions and attach each one to its patient
for admission in load_json('admissions.json'):
    subject_id = admission['subject_id']
    patient = patients_dict.get(subject_id)
    # Skip admissions whose patient is missing from the patients JSON
    if patient:
        # Patient already in the combined dictionary: append the admission to their list
        if subject_id in combined_data:
            combined_data[subject_id]['admissions'].append(admission)
        # Otherwise create a new entry from a copy of the patient record
        else:
            combined_data[subject_id] = patient.copy()
            combined_data[subject_id]['admissions'] = [admission]

# Save the combined information as a JSON file
with open('combined_data.json', 'w') as json_file:
    json.dump(list(combined_data.values()), json_file, indent=2)
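
The combining step can be tried on a few in-memory records. The sketch below is a compact variant that uses `dict.setdefault`; the `attach_admissions` helper and the sample records are hypothetical. As in the cell above, only patients with at least one admission appear in the result, and admissions without a matching patient are skipped.

```python
def attach_admissions(patients, admissions):
    """Hypothetical helper: nest each admission under its patient (matched on subject_id)."""
    patients_by_id = {p["subject_id"]: p for p in patients}
    combined = {}
    for admission in admissions:
        subject_id = admission["subject_id"]
        patient = patients_by_id.get(subject_id)
        if patient is None:
            continue  # admission without a matching patient is skipped
        # Create the entry (a copy of the patient) on first sight, then append
        entry = combined.setdefault(subject_id, {**patient, "admissions": []})
        entry["admissions"].append(admission)
    return list(combined.values())

# Made-up records for illustration
patients = [{"subject_id": "1", "gender": "F"}, {"subject_id": "2", "gender": "M"}]
admissions = [
    {"subject_id": "1", "hadm_id": "100"},
    {"subject_id": "1", "hadm_id": "101"},
    {"subject_id": "3", "hadm_id": "102"},
]

combined = attach_admissions(patients, admissions)
print(combined)
```

Here patient `"2"` has no admissions and admission `"102"` has no patient, so the result holds a single entry for patient `"1"` with two admissions.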




We write the text description of each patient.

In [36]:
import json

# Generate a text description for each patient record in the combined JSON
def generate_text_from_instance(instance):
    text_by_id = {}
    for patient_data in instance:
        patient_id = patient_data["subject_id"]
        patient_text = "Patient Info:\n"
        # Patient-level fields first (everything except the nested admissions list)
        for key, value in patient_data.items():
            if key != 'admissions':
                patient_text += f"The {key} of the patient is {value}. "
        # Then the fields of each admission, with a line break after each one
        for admission in patient_data['admissions']:
            for key, value in admission.items():
                patient_text += f"The {key} is {value}. "
            patient_text += "\n"
        patient_text += "\n"
        # Store the text description by ID
        text_by_id[patient_id] = patient_text
    return text_by_id
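
To make the output format concrete, here is a minimal single-record sketch of the same serialization. The `describe_patient` name and the sample record are hypothetical. It collects the pieces in a list and joins them once, which is one common alternative to repeated string concatenation, and it is meant to produce the same layout as the function above.

```python
def describe_patient(record):
    """Hypothetical single-record variant of the text generation step."""
    parts = ["Patient Info:\n"]
    # Patient-level fields (everything except the nested admissions list)
    parts += [
        f"The {key} of the patient is {value}. "
        for key, value in record.items()
        if key != "admissions"
    ]
    # Fields of each admission, followed by a line break
    for admission in record["admissions"]:
        parts += [f"The {key} is {value}. " for key, value in admission.items()]
        parts.append("\n")
    parts.append("\n")
    return "".join(parts)

# Made-up record for illustration
record = {
    "subject_id": "1",
    "gender": "F",
    "admissions": [{"hadm_id": "100", "admission_type": "URGENT"}],
}
text = describe_patient(record)
print(text)
```

Note that the patient fields and the first admission end up on the same line, since the only line breaks are the one after the header and the one after each admission.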





In [27]:
# Read the data from the JSON file
with open('combined_data.json') as f:
    data = json.load(f)

We look up a specific patient.

In [30]:
# Generate the text for the JSON and store it by ID
text_dict = generate_text_from_instance(data)

In [31]:
# Look up a patient's text by ID
def search_by_id(patient_id):
    return text_dict.get(patient_id, "ID not found")

# Example lookup (IDs are strings because they were read from CSV)
patient_id_to_search = "10039694"
print(search_by_id(patient_id_to_search))

Patient Info:
The subject_id of the patient is 10039694. The gender of the patient is F. The anchor_age of the patient is 36. The anchor_year of the patient is 2170. The anchor_year_group of the patient is 2014 - 2016. The dod of the patient is . The subject_id is 10039694. The hadm_id is 20374452. The admittime is 2170-06-28 19:41:00. The dischtime is 2170-07-02 16:41:00. The deathtime is . The admission_type is URGENT. The admission_location is TRANSFER FROM HOSPITAL. The discharge_location is HOME. The insurance is Medicare. The language is ENGLISH. The marital_status is SINGLE. The ethnicity is WHITE. The edregtime is . The edouttime is . The hospital_expire_flag is 0. 
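
Since the IDs were read from CSV they are strings, so a lookup with an integer key would fall through to "ID not found". A minimal sketch of a more forgiving lookup follows; the `find_patient_text` name and the placeholder dictionary are assumptions for illustration. It takes the dictionary as an argument instead of relying on a global, and converts the ID to a string first.

```python
def find_patient_text(texts_by_id, patient_id, default="ID not found"):
    """Hypothetical lookup that accepts int or str IDs."""
    return texts_by_id.get(str(patient_id), default)

# Placeholder dictionary standing in for the generated texts
texts = {"10039694": "Patient Info:\n(example text)\n"}

print(find_patient_text(texts, 10039694))    # found, although the ID is an int
print(find_patient_text(texts, "00000000"))  # ID not found
```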




# Transfers + Admissions

In [4]:
# Call csv2json with the names of your CSV and JSON files
csv2json('transfers.csv', 'transfers.json')

In [32]:
# Dictionary that will hold the combined information, keyed by subject_id
combined_data = {}

# Index the transfers by subject_id for fast lookups.
# Note: a subject with several transfers keeps only the last row read.
transfers_dict = {}

for transfer in load_json('transfers.json'):
    transfers_dict[transfer['subject_id']] = transfer

# Iterate over the admissions and attach each one to the subject's transfer
for admission in load_json('admissions.json'):
    subject_id = admission['subject_id']
    transfer = transfers_dict.get(subject_id)
    # Skip admissions whose subject has no transfer in the transfers JSON
    if transfer:
        # Subject already in the combined dictionary: append the admission to the list
        if subject_id in combined_data:
            combined_data[subject_id]['admissions'].append(admission)
        # Otherwise create a new entry from a copy of the transfer record
        else:
            combined_data[subject_id] = transfer.copy()
            combined_data[subject_id]['admissions'] = [admission]

# Save the combined information as a JSON file
with open('combined_data_TA.json', 'w') as json_file:
    json.dump(list(combined_data.values()), json_file, indent=2)
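
One detail worth illustrating: a dictionary keyed by `subject_id` holds a single transfer per subject, so when a subject has several transfers only the last one read survives. The sketch below shows this on made-up rows and, as one possible alternative (not what the notebook does), groups all transfers per subject with `collections.defaultdict`.

```python
from collections import defaultdict

# Made-up transfer rows: subject "1" has two transfers
transfers = [
    {"subject_id": "1", "transfer_id": "t1", "hadm_id": "100"},
    {"subject_id": "1", "transfer_id": "t2", "hadm_id": "101"},
    {"subject_id": "2", "transfer_id": "t3", "hadm_id": ""},
]

# Keyed by subject_id: later rows overwrite earlier ones
last_transfer = {t["subject_id"]: t for t in transfers}
print(last_transfer["1"]["transfer_id"])  # t2

# Grouping alternative: keep every transfer of each subject
transfers_by_subject = defaultdict(list)
for t in transfers:
    transfers_by_subject[t["subject_id"]].append(t)
print([t["transfer_id"] for t in transfers_by_subject["1"]])  # ['t1', 't2']
```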


In [33]:
# Read the data from the JSON file
with open('combined_data_TA.json') as f:
    data = json.load(f)

In [37]:
# Generate the text for the JSON and store it by ID
text_dict_TA = generate_text_from_instance(data)

In [39]:
# Look up a subject's text by ID in the transfers + admissions dictionary
def search_by_id(patient_id):
    return text_dict_TA.get(patient_id, "ID not found")

# Example lookup
patient_id_to_search = "12964119"
print(search_by_id(patient_id_to_search))

Patient Info:
The subject_id of the patient is 12964119. The hadm_id of the patient is . The transfer_id of the patient is 31561509. The eventtype of the patient is ED. The careunit of the patient is Emergency Department. The intime of the patient is 2146-01-07 19:39:00. The outtime of the patient is 2146-01-07 19:45:00. The subject_id is 12964119. The hadm_id is 23948311. The admittime is 2145-06-20 22:46:00. The dischtime is 2145-06-23 20:32:00. The deathtime is . The admission_type is EU OBSERVATION. The admission_location is EMERGENCY ROOM. The discharge_location is . The insurance is Medicare. The language is ENGLISH. The marital_status is MARRIED. The ethnicity is WHITE. The edregtime is 2145-06-20 13:17:00. The edouttime is 2145-06-21 00:28:00. The hospital_expire_flag is 0. 
The subject_id is 12964119. The hadm_id is 29615974. The admittime is 2141-05-26 05:44:00. The dischtime is 2141-05-26 21:40:00. The deathtime is . The admission_type is EU OBSERVATION. The admission_locati