In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

In [3]:
%load_ext chapyter



In [4]:
import os
os.environ["NOTEBOOK_NAME"] = "09-CountingPatientsByAge.ipynb"

In [5]:
%%mimicSQL

Retrieve all data from the PATIENTS table in the patients database.

Sure, here is a SQL query to retrieve all data from the PATIENTS table:

```sql
SELECT * 
FROM mimic.mimiciii.patients;
```

This query will return all columns and rows from the PATIENTS table in the MIMIC-III database. Please replace 'mimic.mimiciii' with your actual database schema if it's different.


Unnamed: 0,row_id,subject_id,gender,dob,dod,dod_hosp,dod_ssn,expire_flag
0,234,249,F,2075-03-13 00:00:00.000,,,,0
1,235,250,F,2164-12-27 00:00:00.000,2188-11-22 00:00:00.000,2188-11-22 00:00:00.000,,1
2,236,251,M,2090-03-15 00:00:00.000,,,,0
3,237,252,M,2078-03-06 00:00:00.000,,,,0
4,238,253,F,2089-11-26 00:00:00.000,,,,0


In [8]:
%%mimicPython

How many men and women are there?

To find out the number of men and women in the dataframe, you can use the `value_counts()` function on the 'gender' column. Here is the Python code:

```python
df['gender'].value_counts()
```

This will return the count of each unique value in the 'gender' column, which should be 'M' for male and 'F' for female.


Result : gender
M    26088
F    20386
Name: count, dtype: int64


In [9]:
%%mimicSQL

Now retrieve the same dataset from MIMIC-III, except only patients over 18.

In the MIMIC-III database, the patient's age is not directly given. However, we can calculate it using the difference between the admission time (from the ADMISSIONS table) and the date of birth (from the PATIENTS table). 

Please note that for patients older than 89, dates of birth are shifted so that the computed age appears to be over 300 years, which de-identifies the data. Here is a SQL query to retrieve the data for patients over 18:

```sql
SELECT p.*
FROM mimic.mimiciii.patients p
JOIN mimic.mimiciii.admissions a ON p.subject_id = a.subject_id
WHERE DATE_PART('year', AGE(a.admittime, p.dob)) > 18;
```

This query will return all columns from the PATIENTS table for patients over 18 years old at the time of their admission. Please replace 'mimic.mimiciii' with your actual database schema if it's different.
Query failed! State: FAILED
Reason: FUNCTION_NOT_FOUND: line 4:25: Function 'age' not registered
SQL RETRIEVAL FAILED!
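
The age calculation described above (admission time minus date of birth) does not depend on any particular SQL dialect. As a minimal sketch, assuming the two columns have already been loaded into pandas, it could look like the following. The helper name `age_at_admission` and the toy rows are hypothetical and are not taken from MIMIC-III. The sketch works from calendar fields rather than subtracting timestamps, which may matter for the very large apparent ages of de-identified patients.

```python
import pandas as pd


def age_at_admission(dob: pd.Series, admittime: pd.Series) -> pd.Series:
    """Whole years between dob and admittime, computed from calendar fields.

    Hypothetical helper for illustration; not part of chapyter or MIMIC-III.
    """
    dob = pd.to_datetime(dob)
    admittime = pd.to_datetime(admittime)
    years = admittime.dt.year - dob.dt.year
    # Subtract one year when the birthday has not yet occurred
    # in the calendar year of the admission.
    before_birthday = (admittime.dt.month < dob.dt.month) | (
        (admittime.dt.month == dob.dt.month) & (admittime.dt.day < dob.dt.day)
    )
    return years - before_birthday.astype(int)


# Toy data (made up) with the same column names as PATIENTS and ADMISSIONS.
toy = pd.DataFrame(
    {
        "subject_id": [1, 2, 3],
        "dob": ["2075-03-13", "2090-03-15", "2100-06-01"],
        "admittime": ["2100-03-12", "2108-03-15", "2118-05-31"],
    }
)
toy["age"] = age_at_admission(toy["dob"], toy["admittime"])

# Same strict comparison as the SQL above: age 18 itself is excluded.
adults = toy[toy["age"] > 18]
```

Because the filter is a strict `> 18`, a patient admitted on their 18th birthday is excluded; use `>= 18` if adults aged exactly 18 should be kept.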


In [10]:
%%mimicSQL

Try again.

I apologize for the confusion. The AGE function is not available in AWS Athena. We can instead use the DATEDIFF function to calculate the age. Here is the corrected SQL query:

```sql
SELECT p.*
FROM mimic.mimiciii.patients p
JOIN mimic.mimiciii.admissions a ON p.subject_id = a.subject_id
WHERE DATEDIFF('year', p.dob, a.admittime) > 18;
```

This query will return all columns from the PATIENTS table for patients over 18 years old at the time of their admission. Please replace 'mimic.mimiciii' with your actual database schema if it's different.
Query failed! State: FAILED
Reason: FUNCTION_NOT_FOUND: line 4:7: Function 'datediff' not registered
SQL RETRIEVAL FAILED!


In [14]:
%%mimicSQL

Try again please.

I apologize for the confusion earlier. AWS Athena uses a different syntax for date operations. Here is the corrected SQL query:

```sql
SELECT p.*
FROM mimic.mimiciii.patients p
JOIN mimic.mimiciii.admissions a ON p.subject_id = a.subject_id
WHERE DATE_DIFF('year', DATE(p.dob), DATE(a.admittime)) > 18;
```

This query will return all columns from the PATIENTS table for patients over 18 years old at the time of their admission. Please replace 'mimic.mimiciii' with your actual database schema if it's different.


Unnamed: 0,row_id,subject_id,gender,dob,dod,dod_hosp,dod_ssn,expire_flag
0,234,249,F,2075-03-13 00:00:00.000,,,,0
1,234,249,F,2075-03-13 00:00:00.000,,,,0
2,234,249,F,2075-03-13 00:00:00.000,,,,0
3,235,250,F,2164-12-27 00:00:00.000,2188-11-22 00:00:00.000,2188-11-22 00:00:00.000,,1
4,236,251,M,2090-03-15 00:00:00.000,,,,0


In [15]:
%%mimicPython

How many men and women are there?

You can use the `value_counts()` function in pandas to count the number of men and women in the dataframe. Here is the Python code:

```python
df['gender'].value_counts()
```

This will return the count of each unique value in the 'gender' column, which should be 'M' for male and 'F' for female.


Result : gender
M    28402
F    22179
Name: count, dtype: int64


In [16]:
%%mimicPython

How many more people do we have in this than the last time we calculated the number of people?

To calculate the difference in the number of people from the last time, you can subtract the previous count from the current count. Here is the Python code:

```python
previous_count = 26088 + 20386  # M + F row counts from the first query
current_count = df['gender'].value_counts().sum()  # total rows in the current dataframe

difference = current_count - previous_count
difference
```

This will return the difference in the number of people from the last time we calculated.


Result : 4107


In [17]:
%%mimicPython

Why are there more people in that last query, if I put an additional restriction in?

The row count increased despite the age restriction because the same patient can have multiple admission records. Joining the PATIENTS table to the ADMISSIONS table returns one row per qualifying admission, so a patient with more than one admission is counted more than once.

To get the number of unique patients, count the distinct 'subject_id' values in the dataframe. Here is the Python code:

```python
current_unique_patients = df['subject_id'].nunique()
current_unique_patients
```

This will return the number of unique patients in the current dataframe.


Result : 38388
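
To make the duplication concrete, here is a small sketch with made-up tables that reproduces the effect: an inner join yields one row per admission, and de-duplicating on `subject_id` restores a per-patient count. The `hadm_id` values and the three toy patients are illustrative assumptions, not rows retrieved from MIMIC-III.

```python
import pandas as pd

# Toy stand-ins for PATIENTS and ADMISSIONS (values are made up).
patients = pd.DataFrame(
    {"subject_id": [249, 250, 251], "gender": ["F", "F", "M"]}
)
admissions = pd.DataFrame(
    {"subject_id": [249, 249, 249, 250, 251], "hadm_id": [1, 2, 3, 4, 5]}
)

# The inner join repeats a patient once for every admission they have.
joined = patients.merge(admissions, on="subject_id", how="inner")
rows_after_join = len(joined)

# Counting distinct subject_id values gives the number of patients.
unique_patients = joined["subject_id"].nunique()

# Keep one row per patient before counting by gender.
per_patient = joined.drop_duplicates(subset="subject_id")
gender_counts = per_patient["gender"].value_counts()
```

Running `value_counts()` on `per_patient` rather than on `joined` counts patients instead of admissions, which is the figure the earlier gender question was after.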
