# Introduction to Relational Databases
## Relational Databases and SQLite



### Relational Databases?

- Data stored in **tables** with **fields** (columns) and **records** (rows)
- All values in a field have the same **type** (Number, Character, Date/Time)
- **Query** to view, sort, filter, and calculate


### Benefits

- **Safety** - Separate data from analysis
- **Speed** - RDBMSs are good at sorting and filtering
- **Quality Control** - Data types are enforced ([usually](https://www.sqlite.org/faq.html#q3))
- **Organization** - It's a _relational_ database: data is organized into related tables
- **Tuned** - Focused on data, with a dedicated language (SQL) to access it


### Example: CD4_Frame

<table width="100%">
    <tbody>
    <tr>
        <th>name</th>
        <th>cd4_baseline</th>
        <th>cd4_followup</th>
    </tr>
    <tr>
        <td>Jane</td>
        <td>364</td>
        <td>448</td>
    </tr>
    <tr>
        <td>Jill</td>
        <td>836</td>
        <td>NaN</td>
    </tr>
    <tr>
        <td>Joe</td>
        <td>2117</td>
        <td>1959</td>
    </tr>
</tbody>
</table>

### Structure

More rigid than a spreadsheet, but also more robust

- Database (Lowest level). Can be a file or a server
- Tables - Like a worksheet or a Data Frame
- Columns - Definition of the types of data. Not the data itself
- Rows - A set of values for the columns

SQLite Data Types ([sqlite.org](https://www.sqlite.org/datatype3.html))

- NULL
- **INTEGER**
- **REAL**
- **TEXT**
- BLOB
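
To see these storage classes in action, here is a minimal sketch using Python's built-in `sqlite3` module and SQLite's `typeof()` function. The table name `demo` and its values are made up for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (label TEXT, value REAL)")

# Hypothetical values chosen to exercise different storage classes.
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [
        ("integer input", 900),     # converted to REAL by the column type
        ("real input", 654.3),
        ("missing input", None),    # Python None becomes SQL NULL
        ("text input", "unknown"),  # not numeric, so SQLite keeps it as TEXT
    ],
)

# typeof() reports the storage class actually used for each value.
storage = dict(conn.execute("SELECT label, typeof(value) FROM demo"))
for label, kind in storage.items():
    print(f"{label}: {kind}")
conn.close()
```

The last row shows why type enforcement is only "usually" true in SQLite: a value that does not fit the declared type is stored anyway rather than rejected.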


### Relational Database Management Systems

Some are free/open source. They differ in some data types and advanced functionality, but all do tables and queries.

PostgreSQL, MySQL, Oracle, Microsoft SQL Server, **SQLite**


### Using SQLite

    SQLite version 3.8.5 2014-08-15 22:37:57
    Enter ".help" for usage hints.
    Connected to a transient in-memory database.
    Use ".open FILENAME" to reopen on a persistent database.
    sqlite>

Stuck (for example at a `...>` continuation prompt)? Type **;** to finish the statement, or press **Control-C**


### Creating a table

Looking at the stack, we already have the database.

Now we need a table and its columns:

- cd4
    - name
    - cd4_baseline
    - cd4_followup


Create table:

    sqlite> CREATE TABLE cd4 (
       ...>   name TEXT,
       ...>   cd4_baseline REAL,
       ...>   cd4_followup REAL
       ...> );
   
List tables:

    sqlite> .tables
    cd4

See definition with `.schema`
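
The same steps can be sketched from Python with the built-in `sqlite3` module. This is only an illustration, not part of the exercise: it uses an in-memory database and reads the `sqlite_master` table, which is where SQLite records table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory, so no file is touched
conn.execute(
    """
    CREATE TABLE cd4 (
      name TEXT,
      cd4_baseline REAL,
      cd4_followup REAL
    )
    """
)

# Roughly what .tables shows: the table names recorded in sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
print(tables)

# Roughly what .schema shows: the stored CREATE statement.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'cd4'"
).fetchone()[0]
print(schema)
conn.close()
```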

### Exercise: Start sqlite3

1. Open a command prompt or terminal
2. `cd` to the directory containing **cd4.db**
3. Open sqlite3:
    - `conda run sqlite3`
    - `sqlite3`

When you see 

    sqlite>

type **`.open cd4.db`**


### What's inside?

Tables. List them with **`.tables`**.

    sqlite> .tables
    cd4

To _query_ a table, you **SELECT** from it

    SELECT * FROM cd4;

    Jane|364.0|448.0
    Jill|836.0|
    Joe|2117.0|1959.0
    John|815.0|792.0

### Exercise: Change format

`.mode` and `.headers`

1. Turn on headers with **`.headers on`**.
2. Type **`.mode`** alone to see options.
3. Change the mode to `column`.

Try some other modes, like **line** or **tabs**

    sqlite> .mode column
    sqlite> .headers on
    sqlite> SELECT * FROM cd4;
    name        cd4_baseline  cd4_followup
    ----------  ------------  ------------
    Jane        364.0         448.0       
    Jill        836.0                     
    Joe         2117.0        1959.0      
    John        815.0         792.0   
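
Outside the shell there is no `.mode` or `.headers`, so a program has to format the output itself. A minimal sketch, assuming Python's built-in `sqlite3` module and the four rows shown above; the helper name `show` and the 12-character column width are arbitrary choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd4 (name TEXT, cd4_baseline REAL, cd4_followup REAL)")
conn.executemany(
    "INSERT INTO cd4 VALUES (?, ?, ?)",
    [("Jane", 364, 448), ("Jill", 836, None), ("Joe", 2117, 1959), ("John", 815, 792)],
)

cursor = conn.execute("SELECT * FROM cd4")
# cursor.description holds one entry per column; the name comes first.
headers = [column[0] for column in cursor.description]
rows = cursor.fetchall()
conn.close()

def show(value):
    """Render NULL (None in Python) as a blank cell."""
    return "" if value is None else str(value)

# Left-align every cell in a fixed-width column.
print("  ".join(f"{h:<12}" for h in headers))
for row in rows:
    print("  ".join(f"{show(v):<12}" for v in row))
```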


### Populating a database

- **INSERT** to add rows.
- Unlike spreadsheets, *rows* have no inherent order.
- *Column* order does matter!

    `sqlite> INSERT INTO cd4 VALUES ('Jimmy',900, 800);`

Put single quotes around text values, and end each statement with a semicolon.
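
Because column order matters, it can help to name the columns in the `INSERT` itself. The sketch below uses Python's built-in `sqlite3` module with `?` placeholders, so no manual quoting is needed; it is an illustration on an in-memory copy of the table, not a change to **cd4.db**.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd4 (name TEXT, cd4_baseline REAL, cd4_followup REAL)")

# Positional form: values must follow the column order of the table.
conn.execute("INSERT INTO cd4 VALUES (?, ?, ?)", ("Jimmy", 900, 800))

# Named-column form: the order is spelled out, so it cannot be mixed up.
conn.execute(
    "INSERT INTO cd4 (cd4_followup, cd4_baseline, name) VALUES (?, ?, ?)",
    (654.3, 500, "Jessie"),
)
conn.commit()

inserted = conn.execute("SELECT * FROM cd4 ORDER BY name").fetchall()
print(inserted)
conn.close()
```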

### Exercise: Add some rows

Inserting:

    INSERT INTO cd4 VALUES ('Jessie',500,654.3);

Selecting:

    SELECT * FROM cd4;

<table width="100%">
    <tbody>
<TR><TH>name</TH>
<TH>cd4_baseline</TH>
<TH>cd4_followup</TH>
</TR>
<TR><TD>Jane</TD>
<TD>364.0</TD>
<TD>448.0</TD>
</TR>
<TR><TD>Jill</TD>
<TD>836.0</TD>
<TD></TD>
</TR>
<TR><TD>Joe</TD>
<TD>2117.0</TD>
<TD>1959.0</TD>
</TR>
<TR><TD>John</TD>
<TD>815.0</TD>
<TD>792.0</TD>
</TR>
<TR><TD>Jimmy</TD>
<TD>900.0</TD>
<TD>800.0</TD>
</TR>
<TR><TD>Jessie</TD>
<TD>500.0</TD>
<TD>654.3</TD>
</TR>
</tbody>
</table>

### Sorting and Filtering

Sort with `ORDER BY` and a column name. Ascending (`ASC`) is the default; add `DESC` for descending.

    SELECT * FROM cd4 ORDER BY cd4_baseline;
    SELECT * FROM cd4 ORDER BY name DESC;

Filter rows with `WHERE`. This is like the filtering in a DataFrame 

Comparison operators: `=`, `<`, `>`, `LIKE`

    SELECT * FROM cd4 WHERE cd4_baseline < 850;
    SELECT * FROM cd4 WHERE name LIKE 'Ji%';
    SELECT * FROM cd4 WHERE cd4_baseline < 850 AND name LIKE 'Ji%';

**NULL** is the new **NaN**, and it's special: comparisons with `=` never match it, so test for it with `IS NULL` or `IS NOT NULL`:

    SELECT * FROM cd4 WHERE cd4_followup IS NULL;
    SELECT * FROM cd4 WHERE cd4_followup IS NOT NULL;    
    
**IN** for a list of values:

    SELECT * FROM cd4 WHERE name IN ('Jane','Jill');
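
A quick way to check what these queries return is to run them against a small in-memory copy of the table. This sketch uses Python's built-in `sqlite3` module and the four original rows; the helper `names` is hypothetical and only there to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd4 (name TEXT, cd4_baseline REAL, cd4_followup REAL)")
conn.executemany(
    "INSERT INTO cd4 VALUES (?, ?, ?)",
    [("Jane", 364, 448), ("Jill", 836, None), ("Joe", 2117, 1959), ("John", 815, 792)],
)

def names(sql):
    """Run a query and return only the first column of each row."""
    return [row[0] for row in conn.execute(sql)]

by_baseline = names("SELECT name FROM cd4 ORDER BY cd4_baseline DESC")
low_ji = names("SELECT name FROM cd4 WHERE cd4_baseline < 850 AND name LIKE 'Ji%'")

# "= NULL" is never true, so it matches nothing; IS NULL is the right test.
equals_null = names("SELECT name FROM cd4 WHERE cd4_followup = NULL")
is_null = names("SELECT name FROM cd4 WHERE cd4_followup IS NULL")

print(by_baseline, low_ji, equals_null, is_null)
conn.close()
```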

### Columns

Specify columns by naming them instead of `*`

    SELECT name FROM cd4;
    
Can reorder too:

    SELECT cd4_baseline, cd4_followup, name FROM cd4;

Or even rename:

    SELECT cd4_baseline AS baseline, cd4_followup AS followup, name FROM cd4;

### Calculations 

Generate calculated columns:

    SELECT cd4_followup - cd4_baseline FROM cd4;

Let's build that **percent_change** calculation we had in pandas. 

1. diff = followup - baseline
2. percent = (diff / baseline) * 100

We can use parentheses to group calculations

    SELECT *, cd4_followup - cd4_baseline FROM cd4;

    SELECT *, (cd4_followup - cd4_baseline) / cd4_baseline FROM cd4;

    SELECT *, (100 * (cd4_followup - cd4_baseline) / cd4_baseline) FROM cd4;

    SELECT *, (100 * (cd4_followup - cd4_baseline) / cd4_baseline) AS percent_change FROM cd4;
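
To check the arithmetic, the final query can be run from Python and compared with the two-step formula above. A minimal sketch, assuming Python's built-in `sqlite3` module and the four original rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd4 (name TEXT, cd4_baseline REAL, cd4_followup REAL)")
conn.executemany(
    "INSERT INTO cd4 VALUES (?, ?, ?)",
    [("Jane", 364, 448), ("Jill", 836, None), ("Joe", 2117, 1959), ("John", 815, 792)],
)

query = """
    SELECT name,
           (100 * (cd4_followup - cd4_baseline) / cd4_baseline) AS percent_change
    FROM cd4
"""
percent_change = dict(conn.execute(query))
conn.close()

# Same calculation in plain Python for one row: diff, then percent.
jane_diff = 448 - 364
jane_percent = (jane_diff / 364) * 100

for name, value in percent_change.items():
    print(name, value)
print("Jane, computed in Python:", jane_percent)
```

Jill has no follow-up value, so her `percent_change` is NULL: any calculation involving NULL yields NULL. The division is floating point here because the columns are `REAL`; with `INTEGER` columns SQLite would divide as integers.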

### Moving data between systems

- Existing data in a database
- Bringing data to analysis _(or analysis to data)_
- Plaintext or CSV is universal


### Export to CSV

Using `.headers` and `.mode` [again](https://www.sqlite.org/cli.html)

Just one new command: `.once`

    sqlite> .headers on
    sqlite> .mode csv
    sqlite> .once cd4_export.csv
    sqlite> SELECT * FROM cd4;
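
For comparison, the same export can be sketched in Python with the built-in `sqlite3` and `csv` modules. This is an illustration on an in-memory copy of the table, and it writes to a temporary folder so that no real file is overwritten.

```python
import csv
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd4 (name TEXT, cd4_baseline REAL, cd4_followup REAL)")
conn.executemany(
    "INSERT INTO cd4 VALUES (?, ?, ?)",
    [("Jane", 364, 448), ("Jill", 836, None), ("Joe", 2117, 1959), ("John", 815, 792)],
)

cursor = conn.execute("SELECT * FROM cd4")
headers = [column[0] for column in cursor.description]

with tempfile.TemporaryDirectory() as folder:
    export_path = os.path.join(folder, "cd4_export.csv")
    with open(export_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)   # like .headers on
        writer.writerows(cursor)   # NULL (None) is written as an empty field
    # Read the file back to confirm what was written.
    with open(export_path, newline="") as handle:
        exported = list(csv.reader(handle))
conn.close()

for line in exported:
    print(line)
```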

Exercise: Create a CSV file with data sorted by **cd4_baseline** that includes **percent_change**.

### Import from CSV

The `.import` command does this

    .import file.csv table_name

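Two cases: creating a new table or importing into an existing table. If the table does not exist yet, `.import` creates it and uses the first row of the CSV as column names; if it already exists, every row, including a header row, is imported as data.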

Let's import the **long_data.csv** file into a database


### Exercise: Importing a CSV

    sqlite> .open long_data.db
    sqlite> .mode csv
    sqlite> .import long_data_cleaned.csv long_data

Now we can query long_data. What's the schema?
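
The new-table case can also be sketched in Python with the built-in `csv` and `sqlite3` modules. The CSV text and its column names (`name`, `visit`, `cd4`) are hypothetical stand-ins, since the real file's columns are not shown here, and declaring every column as `TEXT` is a choice made for this sketch rather than a statement about what `.import` does in your version.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a CSV file in long format.
csv_text = "name,visit,cd4\nJane,baseline,364\nJane,followup,448\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # first row supplies the column names

conn = sqlite3.connect(":memory:")
columns = ", ".join(f'"{name}" TEXT' for name in header)
conn.execute(f"CREATE TABLE long_data ({columns})")

placeholders = ", ".join("?" for _ in header)
conn.executemany(f"INSERT INTO long_data VALUES ({placeholders})", reader)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(long_data)")]
row_count = conn.execute("SELECT COUNT(*) FROM long_data").fetchone()[0]
print(schema, row_count)
conn.close()
```

Because every column is `TEXT` in this sketch, numeric values would need a `CAST` (or a table created beforehand with numeric column types) before doing arithmetic on them.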