# Overview

In this notebook we'll do a quick review of SQL.

# III. SQL
SQL stands for "Structured Query Language". It is used to retrieve data from relational databases, aggregate that data, and return the results in a useful format.

Here is a very quick overview of how SQL works:

## SQL Query
SQL code consists of **queries** which are executed to run commands against the database you're working in. At a very basic level, a query consists of a `select` clause, a `from` clause, and a number of optional clauses such as `where`, `order by`, and `limit`.

## `select` statement
This part of the query specifies which columns of data to return. For example, the following query will select the `employee_id`, `employee_first_name`, and `employee_last_name` values from an imaginary table:

```
select employee_id, employee_first_name, employee_last_name
...
```

## `from` statement
This part of the query tells SQL which tables to retrieve the data from. In this example, we want to get the employee ids and names from a table called `employees`:

```
select employee_id, employee_first_name, employee_last_name
from employees
...
```
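To try the `select` and `from` clauses without a database server, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `employees` table, its extra `department` column, and its three rows are made up for illustration; they are not part of MIMIC.

```python
import sqlite3

# In-memory SQLite database standing in for a real server (an assumption for this sketch)
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employees ("
    "employee_id integer, employee_first_name text, "
    "employee_last_name text, department text)"
)
# Made-up rows, for illustration only
conn.executemany(
    "insert into employees values (?, ?, ?, ?)",
    [
        (1, "Alex", "Nguyen", "Cardiology"),
        (2, "Sam", "Okafor", "Radiology"),
        (3, "Alex", "Baker", "Pharmacy"),
    ],
)

# Only the columns named in the select clause come back; department is left out
cursor = conn.execute(
    "select employee_id, employee_first_name, employee_last_name from employees"
)
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)

conn.close()
```

The table has four columns, but each returned row has only the three that were selected.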

## `where` statement
This filters the results of our query to only look at certain values. This query will only return data for employees whose first name is "Alex":
```
select employee_id, employee_first_name, employee_last_name
from employees
where employee_first_name = 'Alex'
```
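When the value to filter on comes from Python, a common pattern is to pass it as a query parameter rather than pasting it into the SQL string. The following is a minimal sketch, using an in-memory `sqlite3` database and a made-up `employees` table. Note that `sqlite3` uses `?` as its placeholder, whereas `pymysql` uses `%s`.

```python
import sqlite3

# In-memory SQLite database with a made-up employees table (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employees ("
    "employee_id integer, employee_first_name text, employee_last_name text)"
)
conn.executemany(
    "insert into employees values (?, ?, ?)",
    [
        (1, "Alex", "Nguyen"),
        (2, "Sam", "Okafor"),
        (3, "Alex", "Baker"),
    ],
)

# The value to filter on is passed separately from the SQL text
first_name = "Alex"
rows = conn.execute(
    "select employee_id, employee_first_name, employee_last_name "
    "from employees "
    "where employee_first_name = ?",
    (first_name,),
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Only the two rows whose first name is "Alex" are returned; the row for "Sam" is filtered out.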

## Other statements:
Only the `select` and `from` statements are needed to run a query. But there are many statements which can be very useful, such as:
- `limit 100`: Limit to the first 100 rows (or whatever number)
- `order by employee_last_name`: Sort the results in alphabetical order according to a specific column, such as `employee_last_name`

Here's an example of a query which puts all of these together:

```
select employee_id, employee_first_name, employee_last_name
from employees
where employee_first_name = 'Alex'
order by employee_last_name
limit 100;
```
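To see how the clauses interact, here is a small sketch that runs a combined query against an in-memory `sqlite3` database. The table contents are invented for illustration, and the limit is lowered to 2 so its effect is visible on a tiny table.

```python
import sqlite3

# In-memory SQLite database with a made-up employees table (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employees ("
    "employee_id integer, employee_first_name text, employee_last_name text)"
)
conn.executemany(
    "insert into employees values (?, ?, ?)",
    [
        (1, "Alex", "Nguyen"),
        (2, "Sam", "Okafor"),
        (3, "Alex", "Baker"),
        (4, "Alex", "Diaz"),
    ],
)

query = """
select employee_id, employee_first_name, employee_last_name
from employees
where employee_first_name = 'Alex'
order by employee_last_name
limit 2;
"""

# Rows are filtered first, then sorted, and only then cut off by the limit
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Three employees are named Alex, but after sorting by last name only the first two (Baker and Diaz) are returned.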

# IV. MIMIC-II
MIMIC is an openly available clinical database. It's **de-identified**, meaning that any information which would connect a patient to their data has been removed or altered. That means that we have access to it as researchers, students, and developers. 

MIMIC-II has since been updated to MIMIC-III, which also contains data for living patients, whereas MIMIC-II contains only deceased patients. MIMIC-III requires a data use agreement, so we will use the older version instead. The two versions are very similar and contain much of the same data.

Here is a description of MIMIC-III from the [MIMIC website](https://mimic.physionet.org/):

***
MIMIC-III (Medical Information Mart for Intensive Care III) is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital).

MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors:

- it is freely available to researchers worldwide
- it encompasses a diverse and very large population of ICU patients
- it contains high temporal resolution data including lab results, electronic documentation, and bedside monitor trends and waveforms.
***

We will use Python and SQL to access a MySQL database which is set up on Google Cloud. You'll need a password to access it; ask your instructor if you don't know it.

First, we'll import the libraries which will allow us to connect to the database:

In [None]:
# Pandas is a library which allows us to work with tabular data from a number of different formats,
# including SQL
import pandas as pd

# pymysql is a client library which lets Python connect to a MySQL server and run queries
import pymysql

# Finally, getpass will allow us to type our password in:
import getpass

The host name, username, and database name have been defined for you. When prompted, enter your password:

In [None]:
conn = pymysql.connect(host="35.233.174.193", port=3306,
                       user="jovyan",
                       password=getpass.getpass("Enter password for MIMIC2 database: "),
                       database="mimic2")

If you didn't get an error, then that means it worked! Let's run our first query against MIMIC to see what tables are in the database:

In [None]:
# Define a query as a string
query = """
show tables;
"""

# Pass the query and our MySQL connection to pandas.
# Store the result in a variable called df (DataFrame)
df = pd.read_sql(query, conn)
df
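Roughly speaking, `pd.read_sql` wraps a few steps: opening a cursor, executing the query, and fetching the rows. The sketch below shows what those steps can look like with the plain Python DB-API, which both `pymysql` and `sqlite3` follow. The helper name `run_query` and the `demo_patients` table are hypothetical, and an in-memory `sqlite3` database stands in for the MIMIC server so that the snippet runs without a password.

```python
import sqlite3


def run_query(connection, query):
    """Run a query on a DB-API connection and return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # cursor.description holds one entry per result column; the name comes first
        column_names = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return column_names, rows


# Stand-in database with a made-up table (not the real MIMIC schema)
conn = sqlite3.connect(":memory:")
conn.execute("create table demo_patients (subject_id integer, sex text)")
conn.executemany("insert into demo_patients values (?, ?)", [(1, "F"), (2, "M")])

column_names, rows = run_query(
    conn, "select subject_id, sex from demo_patients order by subject_id"
)
print(column_names)
print(rows)

conn.close()
```

Because the helper relies only on `cursor()`, `execute()`, `description`, and `fetchall()`, the same function should also accept the `pymysql` connection created earlier, though a DataFrame is usually more convenient for analysis.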