# Illinois Dashboard - Day 2

#### Description

In this notebook, you will begin building the fundamental components of the Illinois Dashboard. At the end, you will have a notebook that does the following:

- Queries economic data from a database using SQL
- Creates a static, geospatial plot of the data

## Python Setup

Before writing any of the code for queries or plotting, you'll need to import the necessary Python packages. Afterwards, you'll create a connection to the database from which you will query the data.

In [None]:
# Package for database connection
from sqlalchemy import create_engine

# Packages for data manipulation
import pandas as pd
import numpy as np
import geopandas as gpd

# Packages for visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress warnings so that notices from newer package versions do not clutter the output
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Database connection
engine = create_engine('postgresql://@10.10.2.10/appliedda')

## Geographic Components with `geopandas`

### Get County Shapes

In [None]:
# statefp '17' is the FIPS code for Illinois
qry = """
SELECT countyfp, name,
    ST_Transform(geom, 102698) geom 
FROM tl_2016_us_county 
WHERE statefp = '17'
"""
counties = gpd.read_postgis(qry, engine, geom_col='geom')
# Store one (x, y) point guaranteed to lie inside each county, e.g. for placing labels
counties['coords'] = counties.geometry.apply(lambda x: x.representative_point().coords[0])

## Number of Jobs

### Query Data

In [None]:
# The code below pulls from the entire wage data - it will take a while to run.
# One person from your group can run it but in what follows, use the Random Sample table.

# qry = """
# select cnty, count(*) as jobs
# from ada_18_uchi.dashboard_data_il
# where year = 2006 and qtr = 1
# group by cnty
# """
# df = pd.read_sql(qry, engine)

In [None]:
qry = """
select cnty, count(*) as jobs
from ada_18_uchi.dashboard_data_il_jobs_rs
where year = 2006 and qtr = 1
group by cnty
order by cnty
"""
df = pd.read_sql(qry, engine)

In [None]:
df.head()

In [None]:
df.tail()

Keep in mind that this is a random sample of the overall data: numbers here are much lower than the actual number of jobs in these counties.

### Merge with County Shapefile

In [None]:
cnty_df = pd.merge(counties, df, left_on=['countyfp'], right_on=['cnty'])

In [None]:
cnty_df.head()
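
`pd.merge` performs an inner join by default, so any county that has no rows in the query result is dropped and will appear as a gap in the map. The sketch below shows one way to check for that. It uses small made-up tables (`counties_demo`, `jobs_demo`, and the job counts are hypothetical stand-ins for `counties` and `df`), so treat it as an illustration rather than a required step.

```python
import pandas as pd

# Hypothetical stand-ins for `counties` and `df`; only the key columns matter here.
counties_demo = pd.DataFrame({
    'countyfp': ['001', '003', '005'],
    'name': ['Adams', 'Alexander', 'Bond'],
})
jobs_demo = pd.DataFrame({
    'cnty': ['001', '005'],
    'jobs': [120, 45],  # made-up counts
})

# The default inner join keeps only counties present in both tables.
inner = pd.merge(counties_demo, jobs_demo, left_on='countyfp', right_on='cnty')

# A left join keeps every county; indicator=True adds a `_merge` column
# recording whether each row found a match in the right-hand table.
left = pd.merge(counties_demo, jobs_demo, left_on='countyfp', right_on='cnty',
                how='left', indicator=True)
unmatched = left.loc[left['_merge'] == 'left_only', 'name'].tolist()

print(len(inner), len(left), unmatched)
```

If `unmatched` is empty for the real data, the inner join loses nothing and the merge above can be used as is.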

### Plot

In [None]:
# Configure plot settings
sns.set_style('white')
f, ax = plt.subplots(1, figsize=(6,8))
colmap = sns.cubehelix_palette(8, start=2.9, rot=0, dark=.1, light=.95, as_cmap=True)
cnty_df.plot('jobs', ax=ax, legend=True, edgecolor='black', cmap=colmap)
ax.axis('off')
plt.show()

## Average Earnings

### Query Data

In [None]:
qry = """
select cnty, avg(wage) as avg_wage
from ada_18_uchi.dashboard_data_il_jobs_rs
where year = 2006 and qtr = 1
group by cnty
order by cnty
"""
df = pd.read_sql(qry, engine)

In [None]:
df.head()

### Merge with County Shapefile

In [None]:
cnty_df = pd.merge(counties, df, left_on=['countyfp'], right_on=['cnty'])

### Plot

In [None]:
# Configure plot settings
sns.set_style('white')
f, ax = plt.subplots(1, figsize=(6,8))
colmap = sns.cubehelix_palette(8, start=2.9, rot=0, dark=.1, light=.95, as_cmap=True)
cnty_df.plot('avg_wage', ax=ax, legend=True, edgecolor='black', cmap=colmap)
ax.axis('off')
plt.show()

## Task 1
Using the above code, generate the plot visualizing the **number of jobs** by county **in Q3 of 2013**. 

Save the plot to your personal folder in the `Projects/ada_18_uchi/user/` folder.
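
The plotting cells above only display the figure. One possible way to write it to disk is `savefig` on the figure object returned by `plt.subplots`. In the sketch below, the output directory and file name are hypothetical placeholders (a temporary directory is used so the example runs anywhere), and a simple line stands in for the county map.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this sketch runs without a display; omit in the notebook
import matplotlib.pyplot as plt

# Hypothetical output location; in the notebook, point this at your own folder.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'jobs_2013q3.png')

f, ax = plt.subplots(1, figsize=(6, 8))
ax.plot([0, 1], [0, 1])  # placeholder for cnty_df.plot('jobs', ax=ax, ...)
ax.axis('off')

# Save through the figure object `f`, before any call to plt.show().
f.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(f)

print(os.path.exists(out_path))
```

Saving through `f` before `plt.show()` avoids the common problem of writing out a blank image.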

In [None]:
# Query:





In [None]:
# Merge County Shapefile





In [None]:
# Plot



# Save plot to your personal folder.

## Change in Number of Jobs

After observing the number of jobs in a given quarter, let's see how the number of jobs per county changed between two quarters: in this case, between Q1 of 2007 and Q1 of 2009.

### Query Data

In [None]:
qry = '''
select a.cnty
    , a.jobs as jobs_a
    , b.jobs as jobs_b
    , b.jobs - a.jobs as change_in_jobs
    , (b.jobs - a.jobs)/a.jobs as change_in_jobs_pct
from(
    select cnty, count(*) as jobs
    from ada_18_uchi.dashboard_data_il_jobs_rs
    where year = 2007 and qtr = 1 
    group by cnty
) as a
full join (
    select cnty, count(*) as jobs
    from ada_18_uchi.dashboard_data_il_jobs_rs
    where year = 2009 and qtr = 1 
    group by cnty
) as b
on a.cnty = b.cnty
order by cnty
'''
df = pd.read_sql(qry, engine)

In [None]:
df.head()

Does the above data look correct?

> The `change_in_jobs_pct` variable yields unexpected results (only whole numbers). This is because SQL treats a division between two integers as integer division and discards the fractional part. To get the correct result, we have to `cast` at least one of the two operands as a `decimal`.
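
The effect can be reproduced without the course database. The sketch below uses Python's built-in `sqlite3` module as a stand-in (an assumption made only so the example runs without a server; the notebook itself queries PostgreSQL), with made-up job counts of 100 and 105.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the course database.
conn = sqlite3.connect(':memory:')

# Integer / integer: the fractional part is discarded.
int_result = conn.execute('select (105 - 100) / 100').fetchone()[0]

# Casting one operand to a non-integer type keeps the fraction.
# SQLite uses `real` here; the notebook's PostgreSQL query uses `decimal`.
real_result = conn.execute('select cast(105 - 100 as real) / 100').fetchone()[0]

conn.close()
print(int_result, real_result)
```

The first query returns a whole number because both operands are integers; the second returns the fractional change.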

In [None]:
qry = '''
select a.cnty
    , a.jobs as jobs_a
    , b.jobs as jobs_b
    , b.jobs - a.jobs as change_in_jobs
    , cast(b.jobs - a.jobs as decimal)/a.jobs as change_in_jobs_pct
from(
    select cnty, count(*) as jobs
    from ada_18_uchi.dashboard_data_il_jobs_rs
    where year = 2007 and qtr = 1 
    group by cnty
) as a
full join (
    select cnty, count(*) as jobs
    from ada_18_uchi.dashboard_data_il_jobs_rs
    where year = 2009 and qtr = 1 
    group by cnty
) as b
on a.cnty = b.cnty
order by cnty
'''
df = pd.read_sql(qry, engine)

In [None]:
df.head()

That's more like it!
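
One thing to watch with a `full join`: a county that appears in only one of the two quarters still produces a row, with missing values on the other side. Because the query selects `a.cnty`, a county present only in the later quarter also has a missing `cnty`. The sketch below shows how such rows could be spotted in pandas; `df_demo` and its values are made up to mimic the shape of `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical result of the full join: the second county exists only in
# quarter b (so a.cnty is missing), the third only in quarter a.
df_demo = pd.DataFrame({
    'cnty': ['001', None, '005'],
    'jobs_a': [100, np.nan, 40],
    'jobs_b': [105, 12, np.nan],
})

# Rows where either quarter has no job count.
incomplete = df_demo[df_demo[['jobs_a', 'jobs_b']].isna().any(axis=1)]

print(len(incomplete))
```

If such rows show up in the real data, selecting `coalesce(a.cnty, b.cnty)` instead of `a.cnty` in the query would keep the county code for both cases.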

### Merge with County Shapefile

In [None]:
cnty_df = pd.merge(counties, df, left_on=['countyfp'], right_on=['cnty'])

### Plot

In [None]:
# Configure plot settings
sns.set_style('white')
f, ax = plt.subplots(1, figsize=(6,8))
colmap = sns.cubehelix_palette(8, start=2.9, rot=0, dark=.1, light=.95, as_cmap=True)
cnty_df.plot('change_in_jobs_pct', ax=ax, legend=True, edgecolor='black', cmap=colmap)
ax.axis('off')
plt.show()

## Task 2
Using the above code, generate the plot visualizing the **change in number of jobs** by county **between Q2 of 2010 and Q3 of 2013**. 

Save the plot to your personal folder in the `Projects/ada_18_uchi/user/` folder.

In [None]:
# Query:





In [None]:
# Merge County Shapefile





In [None]:
# Plot



# Save plot to your personal folder.

## Change in Average Earnings

The final metric we will explore today is the change in average earnings by county over a time period. Using the previous examples, write the relevant query below, and visualize the results by running the following cells.

### Query Data

In [None]:
# WRITE THE QUERY YOURSELF (use any two years and quarters)
qry = '''





'''
df = pd.read_sql(qry, engine)

In [None]:
df.head()

### Merge with County Shapefile

In [None]:
cnty_df = pd.merge(counties, df, left_on=['countyfp'], right_on=['cnty'])

### Plot

In [None]:
# Configure plot settings
sns.set_style('white')
f, ax = plt.subplots(1, figsize=(6,8))
colmap = sns.cubehelix_palette(8, start=2.9, rot=0, dark=.1, light=.95, as_cmap=True)
# NOTE: the column name passed here must match the alias used in the query you wrote above
cnty_df.plot('change_in_jobs_pct', ax=ax, legend=True, edgecolor='black', cmap=colmap)
ax.axis('off')
plt.show()

## Task 3
Generate the plot visualizing the **change in average earnings** by county **between Q2 of 2010 and Q3 of 2013**. 

Save the plot to your personal folder in the `Projects/ada_18_uchi/user/` folder.

Also save the **query you wrote** in your personal folder as a `.txt` file. 
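
Since the query already lives in a Python string, one simple option is to write that string to a text file. In the sketch below, the query text, the file name, and the temporary output directory are all hypothetical placeholders.

```python
import os
import tempfile

# Placeholder query; in the notebook this would be the `qry` string you wrote.
qry = '''
select 1
'''

# Hypothetical output location; in the notebook, point this at your own folder.
out_path = os.path.join(tempfile.mkdtemp(), 'task3_query.txt')

with open(out_path, 'w') as fh:
    fh.write(qry)

# Read the file back to confirm the contents were written unchanged.
with open(out_path) as fh:
    saved = fh.read()

print(saved == qry)
```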

In [None]:
# Query:





In [None]:
# Merge County Shapefile





In [None]:
# Plot



# Save plot to your personal folder.