### SQL and Pandas Data Frames

- Pandas can read/write SQL databases to/from data frames
- Works with many databases
- SQLite3 support is built-in

First, import pandas and sqlite3

In [None]:
import pandas as pd
import sqlite3

Let's see what's in our directory

In [None]:
!ls *.db

### Reading Data Frame from SQL

First, you need a database connection. Pandas doesn't read the database file directly; it needs a connection object.

In [None]:
conn = sqlite3.connect('cd4.db')

Pandas can now issue SQL queries to that connection and create a **DataFrame**

In [None]:
pd.read_sql('select * from cd4 order by name',conn)

Notice that each NULL in the database has become NaN in the data frame.
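The NULL-to-NaN conversion can be reproduced in isolation. This is a minimal sketch that assumes a throwaway in-memory database; the `demo` table and its columns are hypothetical and are not part of `cd4.db`.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory table, not the cd4.db file used above
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('create table demo (name text, value real)')
demo_conn.executemany(
    'insert into demo values (?, ?)',
    [('a', 1.5), ('b', None)],  # Python None is stored as SQL NULL
)
demo_conn.commit()

demo = pd.read_sql('select * from demo order by name', demo_conn)
demo_conn.close()

# The NULL in the numeric column is reported as a missing value
null_count = int(demo['value'].isna().sum())
```

Because missing values are ordinary pandas missing values, methods such as `isna`, `dropna`, and `fillna` apply to them as usual.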

And these are Data Frames like any other. We can get their info or describe them:

In [None]:
cd4 = pd.read_sql('select * from cd4',conn)
cd4.info()

Or add a column:

In [None]:
cd4['diff'] = cd4['cd4_baseline'] - cd4['cd4_followup']

In [None]:
cd4

But it's a copy of the database: changing the data frame does not change the underlying database.

In [None]:
pd.read_sql('select * from cd4',conn)

This should not be surprising; a data frame read from CSV behaves the same way. To save the data frame with its new column, we'll use `to_sql` to write it to a new table, `cd4_diff`:

In [None]:
cd4.to_sql('cd4_diff', conn, if_exists='replace')

In [None]:
pd.read_sql('select * from cd4_diff', conn)

In [None]:
conn.close()
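What `to_sql` does when the target table already exists is controlled by its `if_exists` argument. The sketch below walks through the three modes against a hypothetical in-memory table named `demo`; it is an illustration under those assumptions, not a step of the `cd4` workflow.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(':memory:')  # hypothetical scratch database
df = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})

df.to_sql('demo', demo_conn, index=False)  # first write creates the table

# Default if_exists='fail': writing to an existing table raises ValueError
try:
    df.to_sql('demo', demo_conn, index=False)
    second_write_failed = False
except ValueError:
    second_write_failed = True

# 'append' adds the rows to the existing table
df.to_sql('demo', demo_conn, index=False, if_exists='append')
rows_after_append = len(pd.read_sql('select * from demo', demo_conn))

# 'replace' drops the table and recreates it from the data frame
df.to_sql('demo', demo_conn, index=False, if_exists='replace')
rows_after_replace = len(pd.read_sql('select * from demo', demo_conn))

demo_conn.close()
```

Writing to a new table name, or passing `if_exists='replace'`, are the two usual ways to avoid the error when re-running a notebook cell.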

## Interoperability with CSV

Start with a data frame, e.g. from CSV:

In [None]:
long_data = pd.read_csv('long_data_cleaned.csv', index_col=0)

In [None]:
long_data.info()

In [None]:
long_data[0:5]

We can take this CSV data and write it to a database.
Again, start by creating a connection.

In [None]:
long_data_conn = sqlite3.connect('long_data.db')
long_data.to_sql('long_data',long_data_conn, if_exists='replace')


Let's read that back to see how it compares

In [None]:
pd.read_sql("select * from long_data", long_data_conn)
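One difference to expect in the round trip: `to_sql` writes the data frame's index as a regular column by default, so it comes back as a column unless `read_sql` is told otherwise with `index_col`. The sketch below assumes a small hypothetical frame; the `sample_id` index name and `value` column are made up for illustration, and only `analyte` is taken from this notebook.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for long_data; index name and 'value' are illustrative
original = pd.DataFrame(
    {'analyte': ['p24', 'p31'], 'value': [0.5, 1.25]},
    index=pd.Index([10, 11], name='sample_id'),
)

demo_conn = sqlite3.connect(':memory:')
original.to_sql('demo', demo_conn, if_exists='replace')  # index stored as a column

# Without index_col, the old index comes back as an ordinary column
plain = pd.read_sql('select * from demo', demo_conn)
plain_columns = list(plain.columns)

# With index_col, the column is restored as the index
restored = pd.read_sql('select * from demo', demo_conn, index_col='sample_id')
restored_index = list(restored.index)

demo_conn.close()
```

If the index carries no information, passing `index=False` to `to_sql` keeps it out of the database altogether.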

## Exercise: Filter and export data

Write a new table containing just the long_data rows with the following analytes:

- **p31**
- **p24**

Hint: There is more than one way to do this, depending on what you choose to concatenate (`pd.concat`) or how you filter.

In [None]:
p31 = long_data[long_data['analyte'] == 'p31']
p24 = long_data[long_data['analyte'] == 'p24']
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
subset = pd.concat([p31, p24])

long_data_conn = sqlite3.connect('long_data.db')
subset.to_sql('long_data_subset',long_data_conn, if_exists='replace')
long_data_conn.close()
subset
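Two other ways to get the same rows are sketched here: a single `isin` mask in pandas, or a `where ... in` clause so the database does the filtering. The frame below is a hypothetical stand-in for `long_data`; only the `analyte` column name comes from the exercise, and the `other` analyte and `value` column are invented for the example.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for long_data
demo = pd.DataFrame({
    'analyte': ['p24', 'p31', 'other', 'p24'],
    'value': [1.0, 2.0, 3.0, 4.0],
})
wanted = ['p31', 'p24']

# Option 1: filter in pandas with one boolean mask
subset_pandas = demo[demo['analyte'].isin(wanted)]

# Option 2: let the database filter, passing the values as query parameters
demo_conn = sqlite3.connect(':memory:')
demo.to_sql('demo', demo_conn, index=False)
subset_sql = pd.read_sql(
    'select * from demo where analyte in (?, ?)',
    demo_conn,
    params=wanted,
)
demo_conn.close()
```

Unlike concatenating two filtered pieces, the `isin` mask keeps the rows in their original order. Filtering in SQL is worth considering when the table is too large to load in full.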
