# Illinois Dashboard - Day 4

#### Description

In this notebook, you will add a non-trivial feature to your dashboard: the number of jobs in "new"
firms (firms created less than 5 years ago). The steps are:
- Write a SQL query that computes this feature
- Incorporate the feature into your dashboard

## Python Setup

In [None]:
# Package for database connection
from sqlalchemy import create_engine

# Packages for data manipulation
import pandas as pd
import numpy as np
import geopandas as gpd

# Packages for visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress warnings so that non-essential package notices (e.g. deprecation warnings) don't clutter the output
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Database connection
engine = create_engine('postgresql://@10.10.2.10/appliedda')

## SQL Exploration

Let's start by taking a look at the data we have at our disposal:

In [None]:
# Dashboard Data (random sample)
query = '''
SELECT *
FROM ada_18_uchi.dashboard_data_il_jobs_rs
LIMIT 5;
'''
df = pd.read_sql(query, engine)

In [None]:
df.head()

The table has variables `ein_start_year` and `ein_start_qtr`, which together designate the first year and quarter in which the EIN is observed in the data. For every observation, let's create a `new_employer` flag indicating whether the employer was created less than 5 years before the date of the observation. An employer is a new employer if the EIN's first occurrence was within the last 5 years, i.e. the last 20 quarters.
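To make the quarter arithmetic concrete, here is a minimal pandas sketch of the same calculation, assuming a DataFrame with the `year`, `qtr`, `ein_start_year` and `ein_start_qtr` columns. The function name `add_new_employer_flag` and the toy rows are hypothetical, chosen to show the boundary at 20 quarters.

```python
import pandas as pd

def add_new_employer_flag(df: pd.DataFrame, window_qtrs: int = 20) -> pd.DataFrame:
    """Return a copy of df with qtrs_of_existence and new_employer columns.

    Mirrors the SQL expression (year - ein_start_year)*4 + (qtr - ein_start_qtr).
    """
    out = df.copy()
    # Quarters elapsed between the EIN's first appearance and this observation
    out["qtrs_of_existence"] = (
        (out["year"] - out["ein_start_year"]) * 4 + (out["qtr"] - out["ein_start_qtr"])
    )
    # 1 if the employer first appeared fewer than window_qtrs quarters ago, else 0
    out["new_employer"] = (out["qtrs_of_existence"] < window_qtrs).astype(int)
    return out

# Hypothetical toy rows (not taken from the actual table)
toy = pd.DataFrame({
    "year":           [2012, 2012, 2012],
    "qtr":            [1, 1, 1],
    "ein_start_year": [2011, 2007, 2006],
    "ein_start_qtr":  [3, 2, 4],
})
flagged = add_new_employer_flag(toy)
print(flagged[["qtrs_of_existence", "new_employer"]])
```

The second row is 19 quarters old and still counts as new; the third, at 21 quarters, does not.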

In [None]:
query = '''
SELECT *, 
    (year-ein_start_year)*4 + (qtr - ein_start_qtr) as qtrs_of_existence,
    CASE WHEN (year-ein_start_year)*4 + (qtr - ein_start_qtr) < 20 THEN 1 ELSE 0 END AS new_employer
FROM ada_18_uchi.dashboard_data_il_jobs_rs
LIMIT 5;
'''
df = pd.read_sql(query, engine)

In [None]:
df.head()

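One issue remains: the data only begins in 2005, so an employer that already existed before then is assigned a start date in 2005, as if it had just been created. Therefore, every observation from 2005 through 2009 is flagged as coming from a "new employer". We call this a right-censoring problem: for these employers, we only observe a lower bound on their true age.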

To avoid plotting misleading numbers, one solution is to exclude all observations from the first 5 years of data (all data before 2010). From 2010 onward, at least 20 quarters of history are available for every observation, so there is no doubt about whether the employer was created less than 5 years ago.
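The same logic with the censoring restriction added can be sketched in pandas; it mirrors the `CASE WHEN` expressions in the query that follows. The function `flag_new_employers`, the constant `FIRST_UNCENSORED_YEAR` and the toy rows (which assume the data starts in 2005) are illustrative assumptions.

```python
import pandas as pd

# Assumption: observations before 2010 (the first 5 years of data) are censored
FIRST_UNCENSORED_YEAR = 2010

def flag_new_employers(df: pd.DataFrame, window_qtrs: int = 20) -> pd.DataFrame:
    out = df.copy()
    qtrs = (out["year"] - out["ein_start_year"]) * 4 + (out["qtr"] - out["ein_start_qtr"])
    out["qtrs_of_existence"] = qtrs
    # 1 for observations in the censored period, where employer age is unreliable
    out["right_censorship_flag"] = (out["year"] < FIRST_UNCENSORED_YEAR).astype(int)
    # New only if first seen fewer than window_qtrs quarters ago AND not censored
    out["new_employer"] = (
        (qtrs < window_qtrs) & (out["year"] >= FIRST_UNCENSORED_YEAR)
    ).astype(int)
    return out

# Hypothetical toy rows, assuming the data begins in 2005
toy = pd.DataFrame({
    "year":           [2007, 2011],
    "qtr":            [2, 2],
    "ein_start_year": [2005, 2009],
    "ein_start_qtr":  [1, 1],
})
flagged = flag_new_employers(toy)
print(flagged[["qtrs_of_existence", "right_censorship_flag", "new_employer"]])
```

Both toy observations come 9 quarters after the employer's first appearance, but only the 2011 one is counted as new: the 2007 one falls in the censored period.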

In [None]:
query = '''
SELECT *, 
    (year-ein_start_year)*4 + (qtr-ein_start_qtr) as qtrs_of_existence,
    CASE WHEN year<2010 THEN 1 ELSE 0 END AS right_censorship_flag,
    CASE WHEN ((year-ein_start_year)*4+(qtr-ein_start_qtr)<20 AND year>=2010) THEN 1 ELSE 0 END as new_employer 
FROM ada_18_uchi.dashboard_data_il_jobs_rs
LIMIT 5;
'''
df = pd.read_sql(query, engine)

In [None]:
df.head()

## Query Metric for a Given Year

Now that we understand how to define this metric, let's query it for a given quarter (in this case, Q1 of 2012).
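Before running the aggregation on the database, it can help to check the logic on a few rows locally. The sketch below is a pandas equivalent of the county-level query, using hypothetical toy data and a hypothetical function name `county_metrics`. Here `size` counts every row like `count(*)`, and `mean` skips missing wages like SQL's `avg`.

```python
import pandas as pd

def county_metrics(df: pd.DataFrame, year: int, qtr: int) -> pd.DataFrame:
    """Per-county jobs, average wage and new-employer jobs for one quarter."""
    # Equivalent of: WHERE year = <year> AND qtr = <qtr>
    sel = df[(df["year"] == year) & (df["qtr"] == qtr)].copy()
    qtrs = (sel["year"] - sel["ein_start_year"]) * 4 + (sel["qtr"] - sel["ein_start_qtr"])
    # Same CASE WHEN logic as the SQL, including the censoring restriction
    sel["new_emp"] = ((qtrs < 20) & (sel["year"] >= 2010)).astype(int)
    # groupby sorts by cnty by default, like ORDER BY cnty
    return (
        sel.groupby("cnty")
        .agg(
            jobs=("wage", "size"),          # like count(*): counts every row
            avg_wage=("wage", "mean"),      # like avg(wage): ignores missing wages
            new_empr_jobs=("new_emp", "sum"),
        )
        .reset_index()
    )

# Hypothetical toy rows; the last one is from Q2 and is filtered out
toy = pd.DataFrame({
    "cnty":           ["001", "001", "003", "001"],
    "year":           [2012, 2012, 2012, 2012],
    "qtr":            [1, 1, 1, 2],
    "wage":           [1000.0, 3000.0, 2000.0, 5000.0],
    "ein_start_year": [2011, 2000, 2010, 2012],
    "ein_start_qtr":  [1, 1, 2, 1],
})
result = county_metrics(toy, 2012, 1)
print(result)
```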

In [None]:
query = '''
SELECT cnty
    , count(*) as jobs
    , avg(wage) as avg_wage
    , sum(CASE WHEN ((year-ein_start_year)*4+(qtr-ein_start_qtr)<20 AND year>=2010) THEN 1 ELSE 0 END) as new_empr_jobs
FROM ada_18_uchi.dashboard_data_il_jobs_rs
WHERE year = 2012 AND qtr = 1
GROUP BY cnty
ORDER BY cnty
'''
df = pd.read_sql(query, engine)

In [None]:
df.head()

## Incorporating in Dashboard

Now that we have defined this metric, let's add it to the `group by` queries that we previously used to generate the dashboard.
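A sketch of what a completed count query could look like, reusing the expression from the Q1 2012 query above; note the comma that must now follow `avg_wage`. The `{y}` and `{q}` fields are Python `str.format` placeholders; we assume `DashUI` fills them in a similar way, which we have not verified. The name `count_qry_sketch` is hypothetical.

```python
# Hypothetical completed version of count_qry with the new-employer metric
count_qry_sketch = """
select cnty,
    count(*) as jobs,
    avg(wage) as avg_wage,
    sum(CASE WHEN ((year-ein_start_year)*4+(qtr-ein_start_qtr)<20 AND year>=2010) THEN 1 ELSE 0 END) as new_empr_jobs
from ada_18_uchi.dashboard_data_il_jobs_rs
where year = {y} and qtr = {q}
group by cnty
order by cnty
"""

# Fill the placeholders for one quarter (Q1 2012)
sql_2012q1 = count_qry_sketch.format(y=2012, q=1)
print(sql_2012q1)
```

You can then run `pd.read_sql(sql_2012q1, engine)` and compare the result with the Q1 2012 output above.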

In [None]:
count_qry = """
select cnty, 
    count(*) as jobs, 
    avg(wage) as avg_wage
    
    -- ADD QUERY FOR ADDITIONAL METRIC HERE
    
from ada_18_uchi.dashboard_data_il_jobs_rs
where year = {y} and qtr = {q}
group by cnty
order by cnty
"""

In [None]:
change_qry = '''
select coalesce(a.cnty, b.cnty) as cnty,  -- keep counties observed in only one of the two quarters
    cast(b.jobs - a.jobs as decimal)/(a.jobs+1) as change_in_jobs_pct,
    cast(b.avg_wage - a.avg_wage as decimal)/(a.avg_wage+1) as change_in_avg_wage_pct
    
    -- ADD QUERY FOR ADDITIONAL METRIC HERE
    
from(
    select cnty, 
        count(*) as jobs, 
        avg(wage) as avg_wage
        
        -- ADD QUERY FOR ADDITIONAL METRIC HERE
        
    from ada_18_uchi.dashboard_data_il_jobs_rs
    where year = {y0} and qtr = {q0} 
    group by cnty
) as a
full join (
    select cnty, 
        count(*) as jobs, 
        avg(wage) as avg_wage
        
        -- ADD QUERY FOR ADDITIONAL METRIC HERE
        
    from ada_18_uchi.dashboard_data_il_jobs_rs
    where year = {y1} and qtr = {q1}
    group by cnty
) as b
on a.cnty = b.cnty
order by cnty
'''
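The change metrics in `change_qry` divide by the first-quarter value plus one. A small Python sketch of that formula (the function name `pct_change_smoothed` is ours) shows why: the `+1` keeps a county with zero jobs in the first quarter from causing a division by zero, at the cost of slightly shrinking the ratio toward zero for small counts. A change metric for new-employer jobs would follow the same pattern.

```python
def pct_change_smoothed(before: float, after: float) -> float:
    """Relative change as computed in change_qry: (after - before) / (before + 1)."""
    return (after - before) / (before + 1)

print(pct_change_smoothed(99, 110))  # (110 - 99) / 100
print(pct_change_smoothed(0, 5))     # 5 / 1, instead of a division-by-zero error
```

In the SQL version, if a county appears in only one of the two quarters, the `full join` leaves the other side's columns `NULL`, so the change for that county is `NULL` rather than a number.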

In [None]:
# Import Dashboard Functions
from ui import DashUI

In [None]:
# Define metrics to plot
statefp = '17' # 17 is the state FIPS code for Illinois
list_of_metrics = {'Jobs': 'jobs'
                   , 'Average Quarterly Earnings': 'avg_wage'

                   # Insert additional metric for New Jobs
                   
                  }
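Each value in `list_of_metrics` presumably has to match a column alias returned by `count_qry` (we have not inspected the `ui` module to confirm this). A quick, hypothetical check along these lines can catch a mismatch before building the dashboard; the label `"New Employer Jobs"` and the empty example result are only illustrations.

```python
import pandas as pd

def missing_metric_columns(metrics: dict, result: pd.DataFrame) -> list:
    """Return the metric column names that do not appear in a query result."""
    return [col for col in metrics.values() if col not in result.columns]

example_metrics = {
    "Jobs": "jobs",
    "Average Quarterly Earnings": "avg_wage",
    "New Employer Jobs": "new_empr_jobs",  # hypothetical label for the new metric
}
# Columns of a count query that does not yet include the new metric
example_result = pd.DataFrame(columns=["cnty", "jobs", "avg_wage"])
missing = missing_metric_columns(example_metrics, example_result)
print(missing)
```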

In [None]:
# Create Dashboard
dash = DashUI(statefp, list_of_metrics, count_qry, change_qry)

In [None]:
# Display the input panel and the output of the dashboard
display(dash.input_panel)
display(dash.output)

### Task 1

Save the two queries you have written above as `.sql` files in your personal folder.
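One way to save the queries from Python is sketched below with `pathlib`; the folder `~/my_folder` is a hypothetical placeholder for your own personal folder. Keep in mind that the saved files still contain the `{y}`, `{q}`, `{y0}`, `{q0}`, `{y1}` and `{q1}` placeholders, so they are templates rather than SQL you can run directly.

```python
from pathlib import Path

def save_query(query: str, path) -> Path:
    """Write a query string to a .sql file, creating parent folders as needed."""
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Strip the leading/trailing blank lines of the triple-quoted string
    path.write_text(query.strip() + "\n")
    return path

# Usage (hypothetical folder; replace with your own personal folder):
# save_query(count_qry, "~/my_folder/count_qry.sql")
# save_query(change_qry, "~/my_folder/change_qry.sql")
```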