# Using SQL and Python Together

In this notebook, we'll go over code that uses a SQL query to bring a table from a SQLite database into a pandas DataFrame.

First, we start as usual by loading the appropriate packages.

In [5]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import sqlite3

We're going to be using the `connect` function in `sqlite3` to connect to the database. Then, we'll use `pandas`, which has some of the SQL reading functionality built into it already, to bring in a table as a DataFrame.

## Creating a Connection

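We start by creating a connection to the database. Unlike a SQLAlchemy engine, which connects lazily, `sqlite3.connect` opens the connection right away and returns a connection object that we can hand to `pandas`. Note that if the file does not exist, `sqlite3.connect` creates a new, empty database instead of raising an error, so double-check the path.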

In [6]:
conn = sqlite3.connect("lodes.db")

Similar to when we read in CSV files, we're just pointing to a file path containing our database, `lodes.db`. In this case, we have it conveniently located in the same folder as this notebook, so you just need to specify the name of the database. 
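Since connecting to a misspelled path silently creates an empty database, it can help to check which tables the connection actually sees. The following is a minimal sketch, not part of the original notebook: `list_tables` is a hypothetical helper, and the demo uses a throwaway in-memory database with a made-up table instead of `lodes.db`.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    # sqlite_master is SQLite's built-in catalog of schema objects.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on an in-memory database; with the notebook's file you would
# call sqlite3.connect("lodes.db") instead.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE example_table (id INTEGER, value TEXT)")
print(list_tables(demo_conn))
demo_conn.close()
```

An empty list from `list_tables(conn)` is a strong hint that the path points at a new, empty file.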

If we were to use a different flavor of SQL (such as PostgreSQL), then we would create our connection to the database slightly differently. For example, we could use the `create_engine` function in the `sqlalchemy` package, with `psycopg2` as the underlying driver, to connect to a PostgreSQL database. After you create the connection, though, everything afterwards is the same: you can use that connection to write SQL code to do all the same things, such as bringing in a table as a DataFrame.
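As a rough sketch of the engine-based route, the example below builds a SQLAlchemy engine and reads a query result into a DataFrame. The PostgreSQL URL is a hypothetical placeholder (user, password, host, and database name are all made up) and is not used; an in-memory SQLite URL stands in so the code runs without a database server. The `demo` table is invented for illustration.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Hypothetical PostgreSQL URL (placeholder credentials and database name).
# It needs a running server and the psycopg2 driver, so it is not used below.
POSTGRES_URL = "postgresql+psycopg2://user:password@localhost:5432/mydb"

# An in-memory SQLite URL stands in so the sketch runs anywhere.
engine = create_engine("sqlite://")

# engine.begin() opens a transaction and commits it on exit.
with engine.begin() as connection:
    connection.execute(text("CREATE TABLE demo (id INTEGER, label TEXT)"))
    connection.execute(text("INSERT INTO demo VALUES (1, 'a'), (2, 'b')"))

# Reading works the same way regardless of the database behind the engine.
with engine.connect() as connection:
    demo_df = pd.read_sql(text("SELECT * FROM demo ORDER BY id"), connection)

print(demo_df)
```

Swapping in a real PostgreSQL URL should leave the `pd.read_sql` call unchanged, which is the point of going through an engine.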

## Reading SQL Tables Using pandas

Now that we've created our connection to the database, we can use the `read_sql` function in `pandas` to run SQL queries and get the results back as DataFrames.

In [8]:
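# In SQLite the table is referenced by its bare name; a `lodes.` prefix would name an attached database.
df = pd.read_sql("SELECT * FROM ca_wac_2015", conn)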

Here, `pd.read_sql()` runs the SQL query we passed as a string and returns the result as a DataFrame. In this case, that's simply the entire `ca_wac_2015` table. Of course, you can pass more complicated queries, such as joins, if you'd like.
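To illustrate a more complicated query, here is a small sketch that joins two tables, filters with a bound parameter, and aggregates, all inside the SQL string passed to `pd.read_sql`. The `jobs` and `blocks` tables and their contents are made up for the demo and live in an in-memory database; only the column names `w_geocode` and `c000` echo the table in this notebook.

```python
import sqlite3

import pandas as pd

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE jobs (w_geocode TEXT, c000 INTEGER);
    CREATE TABLE blocks (w_geocode TEXT, county TEXT);
    INSERT INTO jobs VALUES ('A1', 30), ('A2', 4), ('B1', 11);
    INSERT INTO blocks VALUES ('A1', 'Alpha'), ('A2', 'Alpha'), ('B1', 'Beta');
""")

# The ? placeholder is filled from `params`, which avoids pasting
# values directly into the SQL string.
query = """
    SELECT b.county, SUM(j.c000) AS total_jobs
    FROM jobs AS j
    JOIN blocks AS b ON j.w_geocode = b.w_geocode
    WHERE j.c000 >= ?
    GROUP BY b.county
    ORDER BY b.county
"""
joined = pd.read_sql(query, conn_demo, params=(5,))
print(joined)
conn_demo.close()
```

Doing the join and aggregation in SQL means only the summarized rows are loaded into memory, which can matter for large tables.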

Let's look at the data to make sure we got what we wanted.

In [4]:
df.head()

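| Unnamed: 0 | w_geocode | c000 | ca01 | ca02 | ca03 | ce01 | ce02 | ce03 | cns01 | cns02 | ... | cfa02 | cfa03 | cfa04 | cfa05 | cfs01 | cfs02 | cfs03 | cfs04 | cfs05 | createdate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60014001001007 | 30 | 2 | 16 | 12 | 4 | 2 | 24 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 |
| 1 | 60014001001008 | 4 | 0 | 1 | 3 | 0 | 0 | 4 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 |
| 2 | 60014001001011 | 3 | 2 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 |
| 3 | 60014001001017 | 11 | 3 | 3 | 5 | 2 | 2 | 7 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 |
| 4 | 60014001001024 | 10 | 3 | 3 | 4 | 7 | 1 | 2 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 |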
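Beyond eyeballing the first few rows, one possible sanity check is to compare the DataFrame's row count against a `COUNT(*)` on the source table. This is only a sketch: `row_count_matches` is a hypothetical helper, and the `demo` table below is made up and lives in an in-memory database.

```python
import sqlite3

import pandas as pd


def row_count_matches(conn, table_name, frame):
    """Return True if `frame` has as many rows as `table_name` in the database."""
    # Table names cannot be bound as query parameters, so the name is quoted
    # as an identifier here; only pass table names you trust.
    quoted = '"' + table_name.replace('"', '""') + '"'
    (n_rows,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    return n_rows == len(frame)


check_conn = sqlite3.connect(":memory:")
check_conn.execute("CREATE TABLE demo (x INTEGER)")
check_conn.executemany("INSERT INTO demo VALUES (?)", [(1,), (2,), (3,)])
demo = pd.read_sql("SELECT * FROM demo", check_conn)
print(demo.shape, row_count_matches(check_conn, "demo", demo))

# Close the connection once you are done reading from the database.
check_conn.close()
```

With the notebook's objects, the equivalent call would be `row_count_matches(conn, "ca_wac_2015", df)`, assuming the table has that name.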
