# ICU CXR Patients

The goal of this notebook is to find the patients who have both ICU stay data and chest X-ray (CXR) data.

We start by reading the following data sources:

- mimic_cxr_chexpert: the file containing chest pathology labels for every patient with a chest radiology image
- icustays: the file containing every patient with an ICU stay, i.e. every patient who has a stay_id in the icustays table

Then we take the intersection of the patient IDs (subject_id) in the two datasets.
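Here is a minimal sketch of the idea on small hypothetical tables. The `subject_id` values below are made up and are not MIMIC data. A patient can appear in several rows of either table, so the comparison is done on the sets of unique IDs:

```python
import pandas as pd

# Hypothetical toy tables standing in for mimic_cxr_chexpert and icustays
cxr = pd.DataFrame({"subject_id": [1, 2, 2, 3, 5]})  # patient 2 has two images
icu = pd.DataFrame({"subject_id": [2, 3, 3, 4]})     # patient 3 has two stays

# Unique patients in each table
cxr_patients = set(cxr["subject_id"])
icu_patients = set(icu["subject_id"])

# Patients present in both tables, sorted for display
both = sorted(cxr_patients & icu_patients)
print(both)
```

Patients 1 and 5 have only images and patient 4 has only an ICU stay, so only patients 2 and 3 remain.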

### Imports

In [3]:
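import os

# Move up one directory (presumably the project root) so that `src` is
# importable and relative paths such as csvs/ resolve from there
os.chdir('../')

import pandas as pd

from pandas import read_csv

# Project module holding the paths to the local data files
from src.data import constants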

### Read data from local source

In [4]:
# Load both tables; file paths come from src.data.constants
df_mimic_cxr_chexpert = read_csv(constants.mimic_cxr_chexpert)
df_icustays = read_csv(constants.icustays)
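This notebook only uses the `subject_id` column from either table. One way to save memory would be to load just that column with the `usecols` parameter of `read_csv`. This is an optional variation, not what the cell above does. The sketch below uses an in-memory CSV with made-up columns and values in place of the real files:

```python
import io

import pandas as pd

# Made-up CSV content; the real files are referenced via src.data.constants
csv_text = io.StringIO(
    "subject_id,study_id,Atelectasis\n"
    "10000032,50414267,1.0\n"
    "10000032,53189527,\n"
    "10000764,57375967,-1.0\n"
)

# Load only the column needed for the patient-level intersection
df = pd.read_csv(csv_text, usecols=["subject_id"])

print(df.columns.tolist())         # ['subject_id']
print(df["subject_id"].nunique())  # 2
```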

In [6]:
print('The number of patients with a chest radiology image is:')

print(len(df_mimic_cxr_chexpert["subject_id"].unique()))

print('The number of patients with an icu stay is:')

print(len(df_icustays["subject_id"].unique()))


The number of patients with a chest radiology image is:
65379
The number of patients with an icu stay is:
53150
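The counts above use `len(series.unique())`. pandas also provides `Series.nunique()`, which returns the same number except when values are missing. `unique()` keeps `NaN` as a distinct value, while `nunique()` drops it by default (`dropna=True`). Here is a small check on made-up IDs:

```python
import numpy as np
import pandas as pd

# Made-up IDs with a repeat and a missing value
ids = pd.Series([101, 102, 102, 103, np.nan])

print(len(ids.unique()))          # 4 (NaN counted as a value)
print(ids.nunique())              # 3 (NaN dropped by default)
print(ids.nunique(dropna=False))  # 4
```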


### Find the intersection of the two datasets

In [7]:
# Patients (subject_id) present in both the CXR labels and the ICU stays
icu_cxr_patients = pd.Series(list(set(df_mimic_cxr_chexpert["subject_id"]).intersection(set(df_icustays["subject_id"]))))
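Python sets have no guaranteed iteration order. The order of `icu_cxr_patients` therefore depends on how the set happens to be laid out in memory, and so does which rows `sample(10, random_state=0)` picks later. If an order-independent result is preferred, one option is `numpy.intersect1d`, which returns the sorted unique values common to both inputs. This is a swapped-in alternative, not what the cell above does. Switching to it would likely change the sampled cohort shown further down. The sketch uses made-up IDs:

```python
import numpy as np
import pandas as pd

# Made-up subject_id columns standing in for the two tables
cxr_ids = pd.Series([7, 3, 3, 9, 1])
icu_ids = pd.Series([9, 1, 4, 1])

# Sorted, de-duplicated IDs present in both inputs
common = pd.Series(np.intersect1d(cxr_ids, icu_ids), name="subject_id")

print(common.tolist())  # [1, 9]
```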


In [10]:
print('The number of patients with both a chest radiology image and an icu stay is:')

len(icu_cxr_patients)

The number of patients with both a chest radiology image and an icu stay is:

20245
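The worked calculation below gives a quick sanity check on these counts, using only the numbers printed above. The overlap cannot exceed the smaller of the two cohorts. By inclusion-exclusion, the number of patients found in at least one of the two sources is 65379 + 53150 - 20245 = 98284. The overlap is about 38.1% of ICU patients and about 31.0% of CXR patients:

```python
# Patient counts printed by the cells above
n_cxr = 65379
n_icu = 53150
n_both = 20245

# The intersection can be no larger than either set
assert n_both <= min(n_cxr, n_icu)

# Inclusion-exclusion: patients with a CXR, an ICU stay, or both
n_either = n_cxr + n_icu - n_both
print(n_either)  # 98284

# Share of each cohort that falls in the intersection
print(f"{n_both / n_icu:.1%} of ICU patients have a CXR")         # 38.1% ...
print(f"{n_both / n_cxr:.1%} of CXR patients have an ICU stay")  # 31.0% ...
```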

### Saving the result to a CSV file

In [None]:
icu_cxr_patients.to_csv("csvs/icu_cxr_patients.csv")
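`Series.to_csv` writes the index by default. Because `icu_cxr_patients` has no name, the ID column also gets a default header rather than `subject_id`. If you would rather have a single, explicitly named column, one option is to name the series and pass `index=False`. This assumes that format is what you want; it is not what the cell above does. The sketch uses made-up IDs and writes to a temporary directory instead of `csvs/`:

```python
import os
import tempfile

import pandas as pd

# Made-up patient IDs standing in for icu_cxr_patients
patients = pd.Series([101, 102, 103])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "icu_cxr_patients.csv")
    # Name the column and drop the positional index
    patients.rename("subject_id").to_csv(path, index=False)

    # Reading it back yields a single subject_id column
    loaded = pd.read_csv(path)

print(loaded.columns.tolist())      # ['subject_id']
print(loaded["subject_id"].tolist())  # [101, 102, 103]
```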

### Selecting a cohort of 10 patients and saving the result to a CSV file

This small cohort can be used for tests and for generating embeddings.

In [13]:
# Fixed seed so the same 10 patients are drawn on every run (for the same input order)
icu_cxr_patients_sample10 = icu_cxr_patients.sample(10, random_state=0)

In [14]:
icu_cxr_patients_sample10

8535     14734813
14315    11888614
4057     16934248
5473     17074638
17937    16762272
12032    14888240
14206    13198693
7539     18135918
19179    11658675
4843     14449150
dtype: int64
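`random_state=0` makes the draw repeatable for a given input. However, `sample` picks rows by position, so the cohort also depends on the order of `icu_cxr_patients`, which came from a set. One way to make the draw independent of input order is to sort before sampling. This is an optional variation. Applied to the real data, it would likely select a different 10 patients than the list above. The helper `sample_cohort` is hypothetical, and the IDs are made up:

```python
import pandas as pd

# Made-up IDs in two different orders, as a set might yield them
ids_a = pd.Series([5, 3, 9, 1, 7, 2])
ids_b = pd.Series([9, 7, 5, 3, 2, 1])


def sample_cohort(ids: pd.Series, n: int, seed: int = 0) -> pd.Series:
    """Hypothetical helper: sort IDs, reset the index, then draw n reproducibly."""
    ordered = ids.sort_values().reset_index(drop=True)
    return ordered.sample(n, random_state=seed)


cohort_a = sample_cohort(ids_a, 3)
cohort_b = sample_cohort(ids_b, 3)

# Same seed and same sorted input give the same cohort
print(cohort_a.tolist() == cohort_b.tolist())  # True
```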

In [16]:
icu_cxr_patients_sample10.to_csv("csvs/icu_cxr_patients_sample10.csv")