## QWI statistics

This notebook gives a brief example of how to recreate several Quarterly Workforce Indicators (QWI) from our data tables. The steps are:

1) Create cohort frame with QWI columns as placeholders

2) Fill in the t-4 through t+1 job flags for the cohort
> `emp_current_qrt`, `emp_4qtrs_ago`, `emp_3qtrs_ago`, `emp_2qtrs_ago`, `emp_prev_qtr`, `emp_next_qtr`

3) Calculate the flags for the remaining statistics
> `emp_begin_qtr`, `emp_end_qtr`, `emp_full_qtr`, `accessions_current`, `accessions_consecutive_qtr`, `accessions_full_qtr`, `separations`, `new_hires`, `recalls`

At this point, every job in your study quarter carries a 0/1 flag for each of the QWI stats above.

4) Optionally summarize QWI stats by employer

The following cell contains brief descriptions of some QWI statistics.

* **Flow Employment** (`emp_current_qrt`, `emp_4qtrs_ago`, `emp_3qtrs_ago`, `emp_2qtrs_ago`, `emp_prev_qtr`, `emp_next_qtr`): These are simple indicators for whether a job existed in each quarter. Recall that by construction, `emp_current_qrt` will always be 1. Changes across the other quarters do not relate directly to changes in employment rates, but rather to the longevity of jobs in the focal quarter. Hence, expect `emp_prev_qtr` to take value 1 more often than `emp_2qtrs_ago`, and so on.
* **Beginning of Quarter Employment** (`emp_begin_qtr`): Indicates a job that also existed in the prior quarter. Again, because the universe is jobs in the focal quarter, the average of `emp_begin_qtr` should be read as "percentage of jobs in the focal quarter that were carried over from the prior quarter." The same principle applies to all remaining indicators.
* **End of Quarter Employment** (`emp_end_qtr`): Indicates a job that continued to exist in the following quarter. 
* **Full Quarter Employment** (`emp_full_qtr`): Indicates a job that also existed in both the prior and following quarter. This indicator is also known as "stable" employment. Though flow employment can reflect a job that lasted only a short period of time, we will typically assume that a stable job existed for the entirety of the focal quarter. 
* **Accessions** (`accessions_current`): Indicates a job that did NOT exist in the prior quarter. This can include new hires and recalls, which are addressed separately below.
* **Accessions to Consecutive Quarter Status** (`accessions_consecutive_qtr`): Indicates accessions that continued to exist in the following quarter.
* **Accessions to Full Quarter Status** (`accessions_full_qtr`): Indicates an accession that occurred in the prior quarter and continued to exist in the following quarter. To be clear, `accessions_consecutive_qtr` is a subset of `accessions_current` but `accessions_full_qtr` is not. These jobs were absent two quarters ago (`m2`), present in the prior quarter (`m1`), present in the focal quarter (`t`), and present in the following quarter (`p1`).
* **Separations** (`separations`): Indicates a job that did not continue to exist in the following quarter. 
* **New Hires** (`new_hires`): Indicates an accession that did not exist in **any** observed prior quarter. Note that a recall after more than one year away from the job will be defined here as a new hire.
* **Recalls** (`recalls`): Indicates an accession that **did** exist in some observed prior quarter. To be clear, this is a job that was absent in the prior quarter (`m1`) but present four, three, or two quarters ago (`m4`, `m3`, or `m2`).
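
The definitions above can be sketched for a single job in plain Python. This is only an illustration: the function name `qwi_flags` and its arguments (`m4` through `p1`, the 0/1 employment flags for four quarters ago through the following quarter) are hypothetical and are not part of the notebook's SQL workflow.

```python
def qwi_flags(m4, m3, m2, m1, t, p1):
    """Return the QWI flags (0/1) for one job from six quarterly employment flags."""
    # An accession is a job present now that was absent in the prior quarter
    accession = int(m1 == 0 and t == 1)
    # A new hire is an accession with no record in any observed prior quarter
    new_hire = int(accession and m4 == 0 and m3 == 0 and m2 == 0)
    return {
        'emp_begin_qtr': int(m1 == 1 and t == 1),
        'emp_end_qtr': int(t == 1 and p1 == 1),
        'emp_full_qtr': int(m1 == 1 and t == 1 and p1 == 1),
        'accessions_current': accession,
        'accessions_consecutive_qtr': int(accession and p1 == 1),
        'accessions_full_qtr': int(m2 == 0 and m1 == 1 and t == 1 and p1 == 1),
        'separations': int(t == 1 and p1 == 0),
        'new_hires': new_hire,
        # A recall is any accession that is not a new hire
        'recalls': int(accession and not new_hire),
    }

# A job seen three quarters ago, absent since, present now and next quarter
example = qwi_flags(m4=0, m3=1, m2=0, m1=0, t=1, p1=1)
print(example)
```

In this toy case the job counts as an accession and a recall, but not as a new hire, because it appeared once within the four observed lags.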

In [None]:
import sqlalchemy
import math

In [None]:
# Find the year and quarter for each lag/lead:
# i=0 is the 4th lag (t-4), i=4 is the study quarter, i=5 is t+1
keyYr = 2010
keyQ = 2

# show quarter selection
for i in range(0,6):
    yr = int(keyYr - 1 + math.floor((keyQ+i-1)/4))
    q = int(keyQ + i - 4*math.floor((keyQ+i-1)/4))
    print('i={} | yr={} | q={}'.format(i, yr, q))
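
The same year/quarter arithmetic can be written as a small helper, which may be easier to check by eye. This is a sketch only; `shift_quarter` is a hypothetical name and is not used elsewhere in this notebook.

```python
def shift_quarter(year, quarter, offset):
    """Return the (year, quarter) that lies `offset` quarters from the given one."""
    # Count quarters from zero so divmod can split the index into year and quarter
    index = year * 4 + (quarter - 1) + offset
    new_year, q0 = divmod(index, 4)
    return new_year, q0 + 1

# i runs 0..5, so the offset i - 4 runs from t-4 to t+1
lags = [shift_quarter(2010, 2, i - 4) for i in range(6)]
print(lags)
```

For `keyYr = 2010` and `keyQ = 2` this should list the same year/quarter pairs as the loop above, from 2009 Q2 through 2010 Q3.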

In [None]:
host = 'stuffed'
db = 'appliedda'

conn = sqlalchemy.create_engine('postgresql://{}/{}'.format(host, db))

In [None]:
import time

In [None]:
start_time = time.time()
sql = '''
CREATE TEMP TABLE qwi_cohort AS
SELECT * 
FROM il_des_kcmo.il_wage
WHERE year={} AND quarter={}
'''.format(keyYr, keyQ)

conn.execute(sql)
print('run in {:.2f} secs'.format(time.time()-start_time))

In [None]:
qwi_cols = ['emp_current_qrt','emp_4qtrs_ago','emp_3qtrs_ago','emp_2qtrs_ago' ,
            'emp_prev_qtr', 'emp_next_qtr','emp_begin_qtr','emp_end_qtr' ,
            'emp_full_qtr','accessions_current', 'accessions_consecutive_qtr' ,
            'accessions_full_qtr','separations','new_hires','recalls']

In [None]:
for col in qwi_cols:
    sql='''
    ALTER TABLE qwi_cohort ADD COLUMN {} int
    '''.format(col)
    conn.execute(sql)
    print('{} added'.format(col))

In [None]:
import pandas as pd

In [None]:
# df = pd.read_sql('select * from qwi_cohort limit 50', conn)

In [None]:
# print(df.columns.tolist())

In [None]:
# update current quarter employment flag
start_time = time.time()
sql='''
UPDATE qwi_cohort SET emp_current_qrt = 
    CASE WHEN wage > 0 THEN 1 ELSE 0 END
'''
conn.execute(sql)
print('complete in {:.2f} secs'.format(time.time()-start_time))

In [None]:
## test update

# keyYr = 2010
# keyQ = 2

# i=1
# col = 'emp_3qtrs_ago'

# yr = int(keyYr - 1 + math.floor((keyQ+i-1)/4))
# q = int(keyQ + i - 4*math.floor((keyQ+i-1)/4))

# # update this quarter employment flag
# sql='''
# UPDATE qwi_cohort a SET {} = 
#     CASE WHEN b.wage IS NOT NULL AND b.wage > 0 THEN 1 ELSE 0 END
# FROM il_des_kcmo.il_wage b
# WHERE b.year={} AND b.quarter={} --grab correct quarter
#     AND a.ssn=b.ssn AND a.ein=b.ein --ensure same job
#     AND a.empr_no=b.empr_no AND a.seinunit=b.seinunit
# '''.format(col, yr, q)
# print(sql)

In [None]:
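# NOTE: while the test cell above is commented out, `sql` still holds the
# previous statement; uncomment the test cell before running this one
start_time = time.time()
conn.execute(sql)
print('complete in {:.2f} secs'.format(time.time()-start_time))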

In [None]:
# df = pd.read_sql('select * from qwi_cohort where emp_3qtrs_ago IS NOT NULL LIMIT 100', conn)

In [None]:
# df[['wage', 'emp_current_qrt', 'emp_4qtrs_ago', 'emp_3qtrs_ago', 'emp_2qtrs_ago']].head()

Did the test work? Check that records were returned and that `emp_3qtrs_ago` is populated.

In [None]:
# order employment flags properly for 0-5 index below
emp_flags = ['emp_4qtrs_ago', 'emp_3qtrs_ago', 'emp_2qtrs_ago', 'emp_prev_qtr', 
             'skip_curr_qtr', 'emp_next_qtr']

# loop through an integer list
# 0 is 4th lag (4 quarters ago)

for i in range(0,6):
    start_time = time.time()
    if i==4:
        continue # skip study quarter as already in our cohort
#     print(emp_flags[i]) # test/debug

    # select this column
    col = emp_flags[i]

    yr = int(keyYr - 1 + math.floor((keyQ+i-1)/4))
    q = int(keyQ + i - 4*math.floor((keyQ+i-1)/4))

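    # update this quarter employment flag
    sql='''
    UPDATE qwi_cohort a SET {} = 
        CASE WHEN b.wage IS NOT NULL AND b.wage > 0 THEN 1 ELSE 0 END
    FROM il_des_kcmo.il_wage b
    WHERE b.year={} AND b.quarter={} --grab correct quarter
        AND a.ssn=b.ssn AND a.ein=b.ein --ensure same job
        AND a.empr_no=b.empr_no AND a.seinunit=b.seinunit
    '''.format(col, yr, q)
    
    # update this column
    conn.execute(sql)

    # jobs with no matching record in that quarter are not touched by the
    # join above and stay NULL; set them to 0 so the "= 0" tests below work
    conn.execute('UPDATE qwi_cohort SET {0} = 0 WHERE {0} IS NULL'.format(col))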
    
    print('completed {} in {:.2f} seconds'.format(col, time.time()-start_time))
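
One point worth checking for the loop above: in PostgreSQL, `UPDATE ... FROM` only modifies rows that find a match in the joined table, so a cohort job with no record in the lagged quarter is not assigned a value by that statement. A way to give every cohort row an explicit 0 or 1 in a single statement is an `EXISTS` subquery. The sketch below illustrates this with an in-memory SQLite database (not the PostgreSQL server used in this notebook), made-up toy rows, and a reduced job key of `ssn` and `ein`; all of these are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE il_wage (ssn TEXT, ein TEXT, year INT, quarter INT, wage REAL)')
con.executemany('INSERT INTO il_wage VALUES (?, ?, ?, ?, ?)', [
    ('A', 'E1', 2010, 1, 5000.0),  # job A existed in the prior quarter
    ('A', 'E1', 2010, 2, 5200.0),
    ('B', 'E1', 2010, 2, 3000.0),  # job B has no prior-quarter record
])

# Toy cohort: all jobs in the study quarter, plus one placeholder column
con.execute('CREATE TABLE qwi_cohort AS SELECT * FROM il_wage WHERE year=2010 AND quarter=2')
con.execute('ALTER TABLE qwi_cohort ADD COLUMN emp_prev_qtr INT')

# EXISTS returns false for unmatched jobs, so they get 0 rather than NULL
con.execute('''
UPDATE qwi_cohort SET emp_prev_qtr = CASE WHEN EXISTS (
    SELECT 1 FROM il_wage b
    WHERE b.year=2010 AND b.quarter=1
        AND b.ssn=qwi_cohort.ssn AND b.ein=qwi_cohort.ein
        AND b.wage > 0
) THEN 1 ELSE 0 END
''')

toy_flags = dict(con.execute('SELECT ssn, emp_prev_qtr FROM qwi_cohort').fetchall())
print(toy_flags)
```

Whichever approach is used, the flags should hold only 0 or 1 before the statistics below are computed, since a comparison such as `emp_prev_qtr = 0` is never true for a NULL value.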

### QWI stats
- emp_begin_qtr
- emp_end_qtr
- emp_full_qtr
- accessions_current
- accessions_consecutive_qtr
- accessions_full_qtr
- separations
- new_hires
- recalls

In [None]:
#Beginning of Quarter Employment
# res['qwbt'] = 1*((res['qwmtm1']==1) & (res['qwmt']==1))

sql = '''
UPDATE qwi_cohort a SET emp_begin_qtr = 
    CASE WHEN emp_prev_qtr = 1 AND emp_current_qrt = 1 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#End of Quarter Employment
# res['qwet'] = 1*((res['qwmt']==1) & (res['qwmtp1']==1))

sql = '''
UPDATE qwi_cohort a SET emp_end_qtr = 
    CASE WHEN emp_current_qrt = 1 AND emp_next_qtr = 1 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#Full Quarter Employment
# res['qwft'] = 1*((res['qwmtm1']==1) & (res['qwmt']==1) & (res['qwmtp1']==1))

sql = '''
UPDATE qwi_cohort a SET emp_full_qtr = 
    CASE WHEN emp_prev_qtr = 1 AND emp_current_qrt = 1 AND emp_next_qtr = 1 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#Accessions
# res['qwat'] = 1*((res['qwmtm1']==0) & (res['qwmt']==1))

sql = '''
UPDATE qwi_cohort a SET accessions_current = 
    CASE WHEN emp_prev_qtr = 0 AND emp_current_qrt = 1 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#Accessions to Consecutive Quarter Status
# res['qwa2t'] = 1*((res['qwat']==1) & (res['qwmtp1']==1))

sql = '''
UPDATE qwi_cohort a SET accessions_consecutive_qtr = 
    CASE WHEN accessions_current = 1 AND emp_next_qtr = 1 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#Accessions to Full Quarter Status
# res['qwa3t'] = 1*((res['qwmtm2']==0) & (res['qwmtm1']==1) & (res['qwmt']==1) 
#                   & (res['qwmtp1']==1))

sql = '''
UPDATE qwi_cohort a SET accessions_full_qtr = 
    CASE WHEN emp_2qtrs_ago = 0 AND emp_prev_qtr = 1 AND emp_current_qrt = 1 
        AND emp_next_qtr = 1 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#Separations
# res['qwst'] = 1*((res['qwmt']==1) & (res['qwmtp1']==0))

sql = '''
UPDATE qwi_cohort a SET separations = 
    CASE WHEN emp_current_qrt = 1 AND emp_next_qtr = 0 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#New Hires
# res['qwht'] = 1*((res['qwmtm4']==0) & (res['qwmtm3']==0) & (res['qwmtm2']==0) 
#                  & (res['qwmtm1']==0) & (res['qwmt']==1))

sql = '''
UPDATE qwi_cohort a SET new_hires = 
    CASE WHEN emp_4qtrs_ago = 0 AND emp_3qtrs_ago = 0 
        AND emp_2qtrs_ago = 0 AND emp_prev_qtr = 0
        AND emp_current_qrt = 1  THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
#Recalls
# res['qwrt'] = 1*((res['qwmtm1']==0) & (res['qwmt']==1) & (res['qwht']==0))

sql = '''
UPDATE qwi_cohort a SET recalls = 
    CASE WHEN emp_prev_qtr = 0 AND emp_current_qrt = 1 AND new_hires = 0 THEN 1 ELSE 0 END
'''
conn.execute(sql)

In [None]:
# read QWI for this study quarter into the notebook to explore
# NOTE: may not be able to pull the entire cohort as it is 7M

In [None]:
# create summary SQL code


col_list = ['emp_current_qrt','emp_4qtrs_ago','emp_3qtrs_ago','emp_2qtrs_ago' ,
            'emp_prev_qtr', 'emp_next_qtr','emp_begin_qtr','emp_end_qtr' ,
            'emp_full_qtr','accessions_current', 'accessions_consecutive_qtr' ,
            'accessions_full_qtr','separations','new_hires','recalls']
avgs = ','.join(['avg('+c+') avg_'+c for c in col_list])
sums = ','.join(['sum('+c+') sum_'+c for c in col_list])


summ_sql = """
SELECT ein, {}, {}
FROM qwi_cohort
GROUP BY ein
""".format(avgs, sums)

print(summ_sql)
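
To see the shape of the generated query without the full column list, the same string-building can be wrapped in a small function and run on two columns. This is a sketch; `build_summary_sql` and its parameters are hypothetical names, and it uses explicit `AS` aliases, which are equivalent to the implicit aliases above.

```python
def build_summary_sql(cols, group_col='ein', table='qwi_cohort'):
    """Build a per-group summary query with an average and a sum per flag column."""
    # The average of a 0/1 flag is the share of jobs with that flag set
    avgs = ', '.join('avg({0}) AS avg_{0}'.format(c) for c in cols)
    # The sum of a 0/1 flag is the count of jobs with that flag set
    sums = ', '.join('sum({0}) AS sum_{0}'.format(c) for c in cols)
    return 'SELECT {}, {}, {} FROM {} GROUP BY {}'.format(
        group_col, avgs, sums, table, group_col)

demo_sql = build_summary_sql(['separations', 'new_hires'])
print(demo_sql)
```

Passing a different `group_col` is one way to summarize by something other than employer, provided that column exists in the cohort table.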

In [None]:
df = pd.read_sql(summ_sql, conn)

In [None]:
df.head()