# (EX) Electronic medical record (as a SQL refresher)
This example uses a subset of the [MIMIC-III demo](https://physionet.org/content/mimiciii-demo/1.4/) dataset to illustrate how to query a database with DuckDB.

*Quick notes about DuckDB:*  

- DuckDB is a relational database for analytics processing (i.e., OLAP)
- DuckDB is column-oriented (it stores data by column rather than by row)
- DuckDB scales reasonably for *relatively* large datasets and works well for local development

For MIMIC-III, the [full schema summary](https://mit-lcp.github.io/mimic-schema-spy/) shows how the tables relate to one another.

In [1]:
!pip install duckdb==1.2.2
import duckdb



In [3]:
# establish connection
conn = duckdb.connect('dataset/mimic.db', read_only=True)

conn.sql('SHOW TABLES;')

┌────────────┐
│    name    │
│  varchar   │
├────────────┤
│ ADMISSIONS │
│ DRGCODES   │
│ D_ICDPROCS │
│ ICUSTAYS   │
│ PATIENTS   │
│ PROCS_ICD  │
└────────────┘

We will use a subset of the tables provided in the MIMIC-III database:


*   ADMISSIONS.csv
*   DRGCODES.csv
*   D_ICD_PROCEDURES.csv
*   ICUSTAYS.csv
*   PATIENTS.csv
*   PROCEDURES_ICD.csv
*   PRESCRIPTIONS.csv (new)



In [4]:
# metadata for a particular table
# (PRESCRIPTIONS is not among the tables listed by SHOW TABLES, so this returns 0 rows)
conn.sql(
    """
    SELECT column_name, data_type FROM information_schema.columns
    WHERE table_name = 'PRESCRIPTIONS';
    """
)

┌─────────────┬───────────┐
│ column_name │ data_type │
│   varchar   │  varchar  │
├─────────────┴───────────┤
│         0 rows          │
└─────────────────────────┘

In [None]:
# number of patients flagged as deceased (sum of expire_flag)
conn.sql(
    """
    SELECT sum(expire_flag) FROM PATIENTS;
    """
)

In [None]:
# number of patients
conn.sql(
    """
    SELECT COUNT(*) FROM PATIENTS;
    """
)

## Refresher for simple queries

1. How many records are in each of the available tables?
2. How many patients are female?
3. How many patients passed away during the hospital stay?
4. How many different admission types are there? What are they?
5. What is the earliest and the latest admission time in the database?
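
A minimal sketch for question 1: one `COUNT(*)` per table, stacked with `UNION ALL`. So that it runs without the MIMIC file, it uses Python's built-in `sqlite3` and a few made-up rows as a stand-in for DuckDB; the connection name `toy` and the data are hypothetical. The SQL itself is standard, so it should also work when passed to `conn.sql(...)` with the remaining tables added.

```python
import sqlite3

# Stand-in for the MIMIC database: in-memory SQLite with made-up rows.
toy = sqlite3.connect(":memory:")
toy.executescript(
    """
    CREATE TABLE PATIENTS (subject_id INTEGER, gender TEXT);
    CREATE TABLE ADMISSIONS (hadm_id INTEGER, subject_id INTEGER);
    INSERT INTO PATIENTS VALUES (1, 'F'), (2, 'M'), (3, 'F');
    INSERT INTO ADMISSIONS VALUES (10, 1), (11, 1), (12, 2), (13, 3);
    """
)

# One SELECT per table; UNION ALL returns all counts in a single result.
row_counts = toy.execute(
    """
    SELECT 'PATIENTS' AS table_name, COUNT(*) AS n_rows FROM PATIENTS
    UNION ALL
    SELECT 'ADMISSIONS', COUNT(*) FROM ADMISSIONS
    """
).fetchall()
print(row_counts)
```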

In [None]:
conn.sql(
    """
    SELECT COUNT(*) FROM PATIENTS
    WHERE gender = 'F'
    """
)

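In [None]:
# patients with a recorded in-hospital date of death
# (.show() prints this result; a cell only displays its last expression)
conn.sql(
    """
    SELECT COUNT(*) FROM PATIENTS
    WHERE dod_hosp IS NOT NULL
    """
).show()

# admissions that ended with the patient's death
conn.sql(
    """
    SELECT COUNT(*) FROM ADMISSIONS
    WHERE deathtime IS NOT NULL
    """
)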

In [None]:
conn.sql(
    """
    SELECT DISTINCT admission_type FROM admissions
    """
)
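
The query above lists the admission types; the "how many" part of question 4 can be answered with `COUNT(DISTINCT ...)`, and `GROUP BY` gives the number of admissions per type. The sketch below is only an illustration: it uses Python's built-in `sqlite3` with made-up rows as a stand-in for DuckDB, and `toy` is a hypothetical connection name. The same SQL should work through `conn.sql(...)`.

```python
import sqlite3

# Stand-in for the MIMIC database: in-memory SQLite with made-up rows.
toy = sqlite3.connect(":memory:")
toy.executescript(
    """
    CREATE TABLE ADMISSIONS (hadm_id INTEGER, admission_type TEXT);
    INSERT INTO ADMISSIONS VALUES
        (10, 'EMERGENCY'), (11, 'ELECTIVE'), (12, 'EMERGENCY'), (13, 'URGENT');
    """
)

# Number of different admission types.
n_types = toy.execute(
    "SELECT COUNT(DISTINCT admission_type) FROM ADMISSIONS"
).fetchone()[0]

# Number of admissions for each type, most frequent first.
per_type = toy.execute(
    """
    SELECT admission_type, COUNT(*) AS n_admissions
    FROM ADMISSIONS
    GROUP BY admission_type
    ORDER BY n_admissions DESC, admission_type
    """
).fetchall()
print(n_types, per_type)
```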

In [None]:
conn.sql(
    """
    SELECT MIN(admittime), MAX(admittime) FROM ADMISSIONS
    """
)

## Slightly more complicated queries
1. Create a table with all ICU stays with their respective patient information.
2. Create a table to show all unique DRG (diagnosis-related group) codes and the number of associated admissions.

In [None]:
conn.sql(
    """
    SELECT * FROM ICUSTAYS
    JOIN PATIENTS
    ON ICUSTAYS.subject_id = PATIENTS.subject_id
    """
)
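
For question 2, one possible approach is to group `DRGCODES` by code and count distinct admissions. This is a sketch under assumptions: the column names `hadm_id`, `drg_type` and `drg_code` follow the MIMIC-III schema but are not shown in this notebook, the rows are made up, and Python's built-in `sqlite3` stands in for DuckDB (`toy` is a hypothetical connection name). `COUNT(DISTINCT hadm_id)` is used so that an admission appearing in several rows with the same code is counted once.

```python
import sqlite3

# Stand-in for the MIMIC database: in-memory SQLite with made-up rows.
toy = sqlite3.connect(":memory:")
toy.executescript(
    """
    CREATE TABLE DRGCODES (hadm_id INTEGER, drg_type TEXT, drg_code TEXT);
    INSERT INTO DRGCODES VALUES
        (10, 'HCFA', '470'),
        (10, 'APR',  '470'),
        (11, 'HCFA', '470'),
        (12, 'HCFA', '871');
    """
)

# One row per DRG code with the number of distinct admissions that carry it.
drg_counts = toy.execute(
    """
    SELECT drg_code, COUNT(DISTINCT hadm_id) AS n_admissions
    FROM DRGCODES
    GROUP BY drg_code
    ORDER BY n_admissions DESC, drg_code
    """
).fetchall()
print(drg_counts)
```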

## Derivative queries

1. Calculate the age of each patient at the time of admission.  
    *Hint:* `cast(patients.dob as date)` allows for addition and subtraction of dates (in days)
2. Identify whether a patient passed away while in the ICU.
3. Calculate the average duration of admission.

In [None]:
conn.sql(
    """

    """
)
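
As a starting point for question 2, a death can be attributed to an ICU stay when the admission's `deathtime` falls between the stay's `intime` and `outtime`. The sketch below assumes those column names (from the MIMIC-III schema, not shown in this notebook) and joins on `hadm_id`. It uses Python's built-in `sqlite3` with made-up rows and ISO-formatted timestamp strings as a stand-in for DuckDB; `toy` is a hypothetical connection name. It does not cover the date arithmetic needed for questions 1 and 3.

```python
import sqlite3

# Stand-in for the MIMIC database: in-memory SQLite with made-up rows.
toy = sqlite3.connect(":memory:")
toy.executescript(
    """
    CREATE TABLE ADMISSIONS (hadm_id INTEGER, deathtime TEXT);
    CREATE TABLE ICUSTAYS (icustay_id INTEGER, hadm_id INTEGER, intime TEXT, outtime TEXT);
    INSERT INTO ADMISSIONS VALUES
        (10, '2130-01-02 09:30:00'),  -- died during the ICU stay
        (11, NULL),                   -- survived
        (12, '2130-03-05 10:00:00');  -- died after leaving the ICU
    INSERT INTO ICUSTAYS VALUES
        (100, 10, '2130-01-01 08:00:00', '2130-01-03 12:00:00'),
        (101, 11, '2130-02-01 08:00:00', '2130-02-02 08:00:00'),
        (102, 12, '2130-03-01 00:00:00', '2130-03-02 00:00:00');
    """
)

# A NULL deathtime makes the BETWEEN test NULL, so CASE falls through to 0.
died_in_icu = toy.execute(
    """
    SELECT i.icustay_id,
           CASE WHEN a.deathtime BETWEEN i.intime AND i.outtime
                THEN 1 ELSE 0 END AS died_in_icu
    FROM ICUSTAYS AS i
    JOIN ADMISSIONS AS a ON i.hadm_id = a.hadm_id
    ORDER BY i.icustay_id
    """
).fetchall()
print(died_in_icu)
```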

# Creating your own database

The example below shows how to create your own database and load data into tables.

In [None]:
# establish connection
conn = duckdb.connect('dataset/mimic_new.db', read_only=False)

conn.sql('SHOW TABLES;')