<img style="float: center;" src="images/CI_horizontal.png" width="600">
<center>
    <span style="font-size: 1.5em;">
        <a href='https://www.coleridgeinitiative.org'>Website</a>
    </span>
</center>

Ghani, Rayid, Frauke Kreuter, Julia Lane, Adrianne Bradford, Alex Engler, Nicolas Guetta Jeanrenaud, Graham Henke, Daniela Hochfellner, Clayton Hunter, Brian Kim, Avishek Kumar, Jonathan Morgan, Ursula Kaczmarek, Benjamin Feder. 

_source to be updated when notebook added to GitHub_

This notebook contains code we wrote to create permanent tables in the `ada_tdc_2019` schema that we reference in the [Dataset Exploration](01_2_Dataset_Exploration_2019.ipynb) notebook.

In [None]:
# pandas-related imports
import pandas as pd

# database interaction imports
import sqlalchemy

In [None]:
# to create a connection to the database, 
# we need to pass the name of the database and host of the database

host = 'stuffed.adrf.info'
DB = 'appliedda'

connection_string = "postgresql://{}/{}".format(host, DB)
conn = sqlalchemy.create_engine(connection_string)
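
Several cells below create `temp` tables (`prim_recpt`, `il_ssns`, and so on) in one cell and read them in a later one. Note that `create_engine` returns a SQLAlchemy `Engine`, a pool of connections, even though the variable is named `conn`. In PostgreSQL a temporary table exists only for the session that created it and is dropped when that session ends, so these cells assume every `conn.execute` call runs on the same underlying session. If a later cell runs on a different session, for example after a dropped connection is replaced, it fails with a `relation ... does not exist` error. The sketch below illustrates the same scoping rule with Python's built-in `sqlite3`, which also scopes `TEMP` tables to a single connection. It is only an analogy; the `base` and `scratch` tables are hypothetical.

```python
import os
import sqlite3
import tempfile


def temp_table_visibility() -> tuple[bool, bool]:
    """Return (visible on the creating connection, visible on a second connection)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.db")
        first = sqlite3.connect(path)
        second = sqlite3.connect(path)
        try:
            # A regular table: visible to every connection once committed
            first.execute("CREATE TABLE base (x INTEGER)")
            first.execute("INSERT INTO base VALUES (1), (2)")
            first.commit()
            # A TEMP table, analogous to `create temp table prim_recpt as ...`
            first.execute("CREATE TEMP TABLE scratch AS SELECT x FROM base")
            same = first.execute("SELECT count(*) FROM scratch").fetchone()[0] == 2
            try:
                second.execute("SELECT count(*) FROM scratch")
                other = True
            except sqlite3.OperationalError:
                # "no such table": the second connection never sees the TEMP table
                other = False
            return same, other
        finally:
            first.close()
            second.close()


print(temp_table_visibility())  # (True, False)
```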

The cells below chart the steps we used to create `ada_tdc_2019.q42014_hoh`, which contains the Social Security numbers (SSNs) of primary TANF recipients whose spells ended in 2014 Q4, together with the state in which they received benefits and the start and end dates of their spells.

In [None]:
# find case ID and recipient number pairs for primary TANF recipients in Illinois whose spells ended in 2014 Q4
qry = '''
create temp table if not exists prim_recpt as 
select i.ch_dpa_caseid, i.recptno, i.start_date::date, i.end_date::date
from il_dhs.indcase_spells i, il_dhs.member_relation r
where benefit_type = 'tanf46' and end_date between '2014-10-01' and '2014-12-31' and
r.ch_dpa_caseid = i.ch_dpa_caseid and r.recptno = i.recptno and reltogte = 82
'''

conn.execute(qry)
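
The `prim_recpt` query keeps a spell only when the same (case ID, recipient number) pair appears in `member_relation` with `reltogte = 82`, the code this notebook uses to pick out the primary recipient, and only when the spell is a `tanf46` spell ending in 2014 Q4. The sketch below reproduces that join and those filters in an in-memory `sqlite3` database. The two miniature tables, every row, and the `other_benefit` value are hypothetical and are not drawn from the real `il_dhs` data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- hypothetical miniature versions of il_dhs.indcase_spells and il_dhs.member_relation
    CREATE TABLE indcase_spells (ch_dpa_caseid INT, recptno INT, benefit_type TEXT,
                                 start_date TEXT, end_date TEXT);
    CREATE TABLE member_relation (ch_dpa_caseid INT, recptno INT, reltogte INT);
    INSERT INTO indcase_spells VALUES
        (1, 1, 'tanf46', '2014-03-01', '2014-11-30'),        -- primary recipient, ends in Q4
        (1, 2, 'tanf46', '2014-03-01', '2014-11-30'),        -- other household member
        (2, 1, 'tanf46', '2013-01-01', '2014-06-30'),        -- primary recipient, ends in Q2
        (3, 1, 'other_benefit', '2014-01-01', '2014-12-31'); -- not a TANF spell
    INSERT INTO member_relation VALUES (1, 1, 82), (1, 2, 3), (2, 1, 82), (3, 1, 82);
""")

# Same join and filters as prim_recpt (ISO date strings compare correctly as text)
prim_recpt = con.execute("""
    SELECT i.ch_dpa_caseid, i.recptno, i.start_date, i.end_date
    FROM indcase_spells i
    JOIN member_relation r
      ON r.ch_dpa_caseid = i.ch_dpa_caseid AND r.recptno = i.recptno
    WHERE i.benefit_type = 'tanf46'
      AND i.end_date BETWEEN '2014-10-01' AND '2014-12-31'
      AND r.reltogte = 82
""").fetchall()
print(prim_recpt)  # [(1, 1, '2014-03-01', '2014-11-30')]
```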

In [None]:
# find the SSNs for these case ID / recipient number pairs (primary recipients in Illinois);
# 17 is the Illinois state FIPS code, and DISTINCT drops duplicate (SSN, spell) rows
qry = '''
create temp table if not exists il_ssns as 
select distinct ssn_hash, 17 as fips, start_date::date, end_date::date
from prim_recpt p , il_dhs."member" m
where m.recptno = p.recptno and m.ch_dpa_caseid = p.ch_dpa_caseid
'''

conn.execute(qry)
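
In the `il_ssns` and `in_ssns` queries, `DISTINCT` applies to the whole selected row, even when the first column is wrapped in parentheses as in `distinct(ssn_hash)`. A person with two different spells therefore contributes two rows; only exact duplicates collapse. The `sqlite3` sketch below shows this on hypothetical rows. In PostgreSQL, `DISTINCT ON (ssn_hash)` is the construct that would keep a single row per SSN, which is not what these queries do.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE matches (ssn_hash TEXT, start_date TEXT, end_date TEXT)")
con.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [
        ("a1", "2014-01-01", "2014-11-30"),
        ("a1", "2014-01-01", "2014-11-30"),  # exact duplicate row: collapsed
        ("a1", "2014-05-01", "2014-12-31"),  # same person, different spell: kept
        ("b2", "2014-02-01", "2014-10-15"),
    ],
)

# The parentheses do not restrict DISTINCT to ssn_hash; all three columns are compared
rows = con.execute(
    "SELECT DISTINCT(ssn_hash), start_date, end_date FROM matches ORDER BY 1, 2"
).fetchall()
print(len(rows))  # 3

# Counting people rather than rows needs COUNT(DISTINCT ...) on the SSN alone
people = con.execute("SELECT COUNT(DISTINCT ssn_hash) FROM matches").fetchone()[0]
print(people)  # 2
```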

In [None]:
# do the same in Indiana (here relat = '01' selects the primary recipient; 18 is the Indiana state FIPS code)
qry = '''
create temp table if not exists in_ssns as 
select distinct ssn, 18 as fips, tanf_start_date::date, tanf_end_date::date
from in_fssa.person_month
where tanf_end_date between '2014-10-01' and '2014-12-31' and relat = '01'
'''

conn.execute(qry)

In [None]:
# Union the tables for Illinois and Indiana together
qry = '''
create table if not exists ada_tdc_2019.q42014_hoh as 
select *
from il_ssns
union all
select *
from in_ssns
'''

conn.execute(qry)
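
`union all` stacks the rows of both selects without removing duplicates and takes its output column names from the first select. That is why the combined `q42014_hoh` table exposes Indiana's `ssn` values under the Illinois column name `ssn_hash`, which the wage queries below rely on, while the `fips` column (17 for Illinois, 18 for Indiana) records which source each row came from. The `sqlite3` sketch below, using hypothetical stand-ins for the two temp tables, shows the same naming rule.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- hypothetical stand-ins for the il_ssns and in_ssns temp tables
    CREATE TABLE il_ssns (ssn_hash TEXT, fips INT, start_date TEXT, end_date TEXT);
    CREATE TABLE in_ssns (ssn TEXT, fips INT, tanf_start_date TEXT, tanf_end_date TEXT);
    INSERT INTO il_ssns VALUES ('a1', 17, '2014-01-01', '2014-11-30');
    INSERT INTO in_ssns VALUES ('c3', 18, '2014-04-01', '2014-12-15');
""")

cur = con.execute("SELECT * FROM il_ssns UNION ALL SELECT * FROM in_ssns")
# Output column names come from the first (Illinois) select
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['ssn_hash', 'fips', 'start_date', 'end_date']
print(rows)
```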

The cells below describe the steps we used to create `ada_tdc_2019.q42014_cohort_wage`, which combines all jobs in Indiana and Illinois held between 2014 Q4 and 2016 Q1 by primary TANF recipients whose spells ended in 2014 Q4.

In [None]:
# find Indiana jobs held by the 2014 Q4 cohort between 2014 Q4 and 2016 Q1
qry = '''
create temp table if not exists in_wages as
select * 
from in_dwd.wage_by_employer 
where (year = 2015 or (year = 2014 and quarter = 4) or (year = 2016 and quarter = 1)) and 
ssn in (select distinct ssn_hash from ada_tdc_2019.q42014_hoh)
'''

conn.execute(qry)
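
Both wage extracts (`in_wages` and `il_wages`) keep the six quarters from 2014 Q4 through 2016 Q1 using an `or` of three conditions. The pure-Python sketch below checks that, for quarters numbered 1 to 4, this predicate selects the same quarters as a lexicographic comparison on `(year, quarter)` pairs. PostgreSQL supports the same idea with row-value comparisons, e.g. `(year, quarter) >= (2014, 4) and (year, quarter) <= (2016, 1)`. The function names are hypothetical.

```python
def in_window_original(year: int, quarter: int) -> bool:
    """Predicate as written in the in_wages / il_wages queries."""
    return year == 2015 or (year == 2014 and quarter == 4) or (year == 2016 and quarter == 1)


def in_window_tuple(year: int, quarter: int) -> bool:
    """Same window expressed as a lexicographic (year, quarter) range."""
    return (2014, 4) <= (year, quarter) <= (2016, 1)


all_quarters = [(y, q) for y in range(2013, 2018) for q in range(1, 5)]
kept = [yq for yq in all_quarters if in_window_original(*yq)]
assert kept == [yq for yq in all_quarters if in_window_tuple(*yq)]
print(kept)  # [(2014, 4), (2015, 1), (2015, 2), (2015, 3), (2015, 4), (2016, 1)]
```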

In [None]:
# do the same thing for Illinois
qry = '''
create temp table if not exists il_wages as 
select *
from il_des_kcmo.il_wage
where (year = 2015 or (year = 2014 and quarter = 4) or (year = 2016 and quarter = 1)) and 
ssn in (select distinct ssn_hash from ada_tdc_2019.q42014_hoh)
'''

conn.execute(qry)

In [None]:
# union Indiana and Illinois wage data together
qry = '''
create table if not exists ada_tdc_2019.q42014_cohort_wage as
select ssn, year, quarter, uiacct, wages, 18 as state, format('%s-%s-1', year, quarter*3-2)::date as job_yr_q
from in_wages
union all 
select ssn, year, quarter, empr_no, wage, 17 as state, format('%s-%s-1', year, quarter*3-2)::date as job_yr_q
from il_wages
'''

conn.execute(qry)
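
The expression `format('%s-%s-1', year, quarter*3-2)::date` turns each year and quarter into the first day of that quarter: quarter q starts in month 3q - 2, i.e. January, April, July, or October. The helper below is a hypothetical Python equivalent, which can be handy when checking `job_yr_q` values after pulling rows into pandas.

```python
from datetime import date


def quarter_start(year: int, quarter: int) -> date:
    """Mirror of format('%s-%s-1', year, quarter*3-2)::date in the queries."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    return date(year, quarter * 3 - 2, 1)


print([quarter_start(2015, q).isoformat() for q in (1, 2, 3, 4)])
# ['2015-01-01', '2015-04-01', '2015-07-01', '2015-10-01']
```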

The following cell describes the query used to create the table `ada_tdc_2019.all_wages`, which contains wage data for all employees in Illinois and Indiana between 2014 Q4 and 2016 Q1.

In [None]:
# all employees wage data between 2014Q4 and 2016Q1
qry = '''
create table if not exists ada_tdc_2019.all_wages as
select ssn, year, quarter, uiacct, wages, 18 as state, format('%s-%s-1', year, quarter*3-2)::date as job_yr_q
from in_dwd.wage_by_employer
where year = 2015 or (year = 2014 and quarter = 4) or (year = 2016 and quarter = 1)
union all 
select ssn, year, quarter, empr_no, wage, 17 as state, format('%s-%s-1', year, quarter*3-2)::date as job_yr_q
from il_des_kcmo.il_wage
where year = 2015 or (year = 2014 and quarter = 4) or (year = 2016 and quarter = 1)
'''

conn.execute(qry)

The following cells contain the queries used to create the table `ada_tdc_2019.all_employers`, which contains data about every employer in Indiana and Illinois in 2015 Q1 that employed at least one individual in the UI wage data.

In [None]:
# find the industry and size (distinct workers with 2015 Q1 wage records) of each Indiana employer
qry = '''
create temp table if not exists in_all_empl as 
select e.uiacct, e.naics, COUNT(DISTINCT(w.ssn)) as size, 18 as state
from in_dwd.in_qcew_employers e
join in_dwd.wage_by_employer w
on e.uiacct = w.uiacct
where e.uiacct in (select distinct uiacct from ada_tdc_2019.all_wages where state = 18) and 
e.year = 2015 and e.quarter = 1 and w.year = 2015 and w.quarter = 1
group by e.uiacct, e.naics;
'''

conn.execute(qry)
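
For Indiana, `size` is the number of distinct SSNs with a 2015 Q1 wage record at each employer. The join on `uiacct` repeats each QCEW row once per matching wage record, and `COUNT(DISTINCT w.ssn)` collapses people who have more than one record at the same employer. The `sqlite3` sketch below uses hypothetical miniature tables and NAICS codes, and omits the real query's extra filter restricting `uiacct` to employers found in `all_wages`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- hypothetical miniature QCEW and wage tables
    CREATE TABLE qcew (uiacct TEXT, naics TEXT, year INT, quarter INT);
    CREATE TABLE wages (ssn TEXT, uiacct TEXT, year INT, quarter INT, wages REAL);
    INSERT INTO qcew VALUES ('E1', '445', 2015, 1), ('E2', '722', 2015, 1);
    INSERT INTO wages VALUES
        ('a1', 'E1', 2015, 1, 3000), ('a1', 'E1', 2015, 1, 250),  -- two records, one person
        ('b2', 'E1', 2015, 1, 4100),
        ('c3', 'E2', 2015, 1, 1800),
        ('c3', 'E2', 2015, 2, 1900);  -- outside 2015 Q1, not counted
""")

sizes = con.execute("""
    SELECT e.uiacct, e.naics, COUNT(DISTINCT w.ssn) AS size, 18 AS state
    FROM qcew e
    JOIN wages w ON e.uiacct = w.uiacct
    WHERE e.year = 2015 AND e.quarter = 1 AND w.year = 2015 AND w.quarter = 1
    GROUP BY e.uiacct, e.naics
    ORDER BY e.uiacct
""").fetchall()
print(sizes)  # [('E1', '445', 2, 18), ('E2', '722', 1, 18)]
```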

In [None]:
# combine Indiana and Illinois employer data; for Illinois, size is the largest of the three monthly employment counts
qry = '''
create table if not exists ada_tdc_2019.all_employers as
select *
from in_all_empl
union all 
select empr_no, substring(naics_combined from 1 for 3), GREATEST(empl_month1, empl_month2, empl_month3)::integer, 17 as state
from il_des_kcmo.il_qcew_employers
where empr_no in (select distinct uiacct from ada_tdc_2019.all_wages where state = 17) and
year = 2015 and quarter = 1
group by empr_no, substring(naics_combined from 1 for 3), empl_month1, empl_month2, empl_month3
'''

conn.execute(qry)
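
For Illinois, `size` is not a worker count from the wage records: it is the largest of the three monthly employment counts reported in the QCEW file for 2015 Q1, and the industry code is cut to its first three characters (the 3-digit NAICS subsector). The two states' `size` values are therefore measured differently. The sketch below mirrors the Illinois half of the union for a single row; `IlQcewRow`, `il_employer_record`, and the sample values are hypothetical. Unlike PostgreSQL's `GREATEST`, which ignores NULL inputs, this sketch assumes all three monthly counts are present.

```python
from typing import NamedTuple


class IlQcewRow(NamedTuple):
    """Hypothetical subset of an il_qcew_employers row."""
    empr_no: str
    naics_combined: str
    empl_month1: int
    empl_month2: int
    empl_month3: int


def il_employer_record(row: IlQcewRow) -> tuple[str, str, int, int]:
    """Mirror of the Illinois select in the all_employers union."""
    naics3 = row.naics_combined[:3]  # substring(naics_combined from 1 for 3)
    size = max(row.empl_month1, row.empl_month2, row.empl_month3)  # GREATEST(...)
    return (row.empr_no, naics3, size, 17)  # 17 = Illinois FIPS code


print(il_employer_record(IlQcewRow("0012345", "722511", 14, 17, 16)))
# ('0012345', '722', 17, 17)
```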

The following cells contain the queries used to create the table `ada_tdc_2019.tanf_employers`, which contains data about every employer in Indiana and Illinois in 2015 Q1 that employed at least one individual who was a primary recipient of TANF benefits that ended in 2014 Q4 and who was present in the UI wage data in 2015 Q1.

In [None]:
# find sizes of employers of primary recipients of TANF benefits that ended in 2014 Q4 in Indiana
qry = '''
create temp table if not exists in_tanf_empl as 
select e.uiacct, e.naics, COUNT(DISTINCT(w.ssn)) as size, 18 as state
from in_dwd.in_qcew_employers e
join in_dwd.wage_by_employer w
on e.uiacct = w.uiacct
where e.uiacct in (select distinct uiacct from ada_tdc_2019.q42014_cohort_wage where state = 18) and 
e.year = 2015 and e.quarter = 1 and w.year = 2015 and w.quarter = 1
group by e.uiacct, e.naics;
'''

conn.execute(qry)

In [None]:
# combine Indiana and Illinois data for employers that
# employed primary recipients of TANF benefits that ended in 2014 Q4
qry = '''
create table if not exists ada_tdc_2019.tanf_employers as
select *
from in_tanf_empl
union all 
select empr_no, substring(naics_combined from 1 for 3), GREATEST(empl_month1, empl_month2, empl_month3)::integer, 17 as state
from il_des_kcmo.il_qcew_employers
where empr_no in (select distinct uiacct from ada_tdc_2019.q42014_cohort_wage where state = 17) and
year = 2015 and quarter = 1 and (multi_unit_code = '1' or multi_unit_code = '2')
group by empr_no, substring(naics_combined from 1 for 3), empl_month1, empl_month2, empl_month3
'''

conn.execute(qry)
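
Because the cohort wage table is drawn from the same state wage files and quarters as `all_wages`, and the Illinois half of `tanf_employers` adds a `multi_unit_code` filter on top, we would expect every `(uiacct, state)` key in `tanf_employers` to also appear in `all_employers`. The sketch below checks that expectation over rows already fetched into Python, for example with `pd.read_sql`. The helper name `missing_keys` and the sample rows are hypothetical; the row layout `(uiacct, naics, size, state)` follows the two tables created above.

```python
from typing import Iterable, Sequence


def missing_keys(
    subset_rows: Iterable[Sequence], superset_rows: Iterable[Sequence]
) -> set[tuple]:
    """Return (uiacct, state) keys in subset_rows that are absent from superset_rows.

    Rows follow the all_employers / tanf_employers layout: (uiacct, naics, size, state).
    """
    superset_keys = {(row[0], row[3]) for row in superset_rows}
    return {(row[0], row[3]) for row in subset_rows} - superset_keys


# Hypothetical rows standing in for the two tables
all_employers = [("E1", "445", 2, 18), ("E2", "722", 1, 18), ("0012345", "722", 17, 17)]
tanf_employers = [("E1", "445", 2, 18), ("0012345", "722", 17, 17)]
print(missing_keys(tanf_employers, all_employers))  # set()
```

An empty result is consistent with the expectation; any key it returns points to an employer worth inspecting.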