<center><br><br>
    <h4>TANF Data Collaborative </h4>
    <h4>Applied Data Analytics Training | Spring 2022</h4>
    <h1>Creating Employer Measures</h1>
</center>
<center>
    <span style="font-size: 1.5em;">
        <a href='https://www.coleridgeinitiative.org'>Coleridge Initiative</a>
    </span>
    <center>Benjamin Feder, Maryah Garner, Allison Nunez, Rukhshan Mian</center>
    <a href="https://doi.org/10.5281/zenodo.7459730"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.7459730.svg" alt="DOI"></a>

    
</center>

<br>

***

Our final output from this notebook is a permanent table of employer-level statistics for Indiana employers in the UI wage records, aggregated to the calendar year. An employer is included in a quarter only if it has at least 5 employees in that quarter. The table covers the following statistics:

Firm characteristics
- Total payroll
- Total full-quarter employment
- Total employment
- NAICS

Measures of stability
- Number of new hires who become full-quarter employees (hired in $t-1$ and still employed in $t+1$)
- Ratio of full-quarter employees to all employees

Measures of opportunity
- Number of new hires
- Employment growth rate: $\frac{E_t - E_{t-1}}{(E_t + E_{t-1})/2}$, where $E_t$ is employment in quarter $t$

Measures of job quality
- Average earnings per employee
- Average full-quarter earnings per employee
- 25th percentile of earnings per employee
- 75th percentile of earnings per employee

In [None]:
options(warn=-1)

# Database interaction imports
suppressMessages(library(odbc))

# For data manipulation/visualization
suppressMessages(library(tidyverse))

# For faster date conversions
suppressMessages(library(lubridate))

# Use percent() function
suppressMessages(library(scales))

# For as.yearqtr()
suppressMessages(library(zoo))
options(warn=0)


In [None]:
# Connect to the server
con <- DBI::dbConnect(odbc::odbc(),
                     Driver = "SQL Server",
                     Server = "msssql01.c7bdq4o2yhxo.us-gov-west-1.rds.amazonaws.com",
                     Trusted_Connection = "True")

## Function for getting the range of quarters between a start and end year-quarter combination

In [None]:
create_quarters <- function(start_yq, end_yq) {
    # convert 'YYYY Qn' strings to Dates (the first day of each quarter)
    d1 <- yq(start_yq)
    d2 <- yq(end_yq)
    
    # Getting the range between d1 and d2
    dat <- format(seq(d1, d2, by="quarter"), "%Y-%m")
    
    # converting resulting range to a year-quarter format
    q_yr_input <- as.yearqtr(dat, "%Y-%m") #from zoo
    df <- data.frame(q_yr_input)
    names(df) <- c("yr_quarter")

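    # table-name form of each quarter, e.g. "q1_2005"
    df$qyr_req <- paste0(tolower(substring(df$yr_quarter, 6, 7)), "_", substring(df$yr_quarter, 1, 4))
    # calendar year of each quarter
    df$yrq = as.numeric(substring(q_yr_input, 1, 4))
    # name of the aggregated table for each quarter, e.g. "q1_2005_agg"
    df$title = paste0(df$qyr_req, "_agg")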

    return(df)
}

quarter_year <- create_quarters('2005 Q1', '2020 Q4')


In [None]:
head(quarter_year)
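The naming scheme produced by `create_quarters()` can be reproduced without lubridate or zoo. The helper `quarter_names()` below is hypothetical and not part of the notebook; it indexes quarters as integers, `year * 4 + (quarter - 1)`, so consecutive quarters differ by one, and it returns character versions of the same four columns.

```r
# Hypothetical base-R helper mirroring the columns of create_quarters()
quarter_names <- function(start_year, start_q, end_year, end_q) {
  # integer index: consecutive quarters differ by 1
  start_idx <- start_year * 4 + (start_q - 1)
  end_idx   <- end_year * 4 + (end_q - 1)
  idx  <- start_idx:end_idx
  year <- idx %/% 4
  q    <- idx %% 4 + 1
  data.frame(
    yr_quarter = paste0(year, " Q", q),            # e.g. "2005 Q1"
    qyr_req    = paste0("q", q, "_", year),        # e.g. "q1_2005"
    yrq        = year,                             # e.g. 2005
    title      = paste0("q", q, "_", year, "_agg"),# e.g. "q1_2005_agg"
    stringsAsFactors = FALSE
  )
}

qy <- quarter_names(2005, 1, 2020, 4)
head(qy)
```

For 2005 Q1 through 2020 Q4 this gives 64 rows, one per quarter.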

## Creating tables
We use a for loop to split up the UI Wage records by our quarters of interest. While doing so, we drop duplicate entries in wage records.

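> Note that this is one way to approach duplicate data. There are other ways of handling duplicates, such as keeping the largest wage or summing the wages. For the purpose of this notebook, we keep a single record for each `ssn`/`Empr_no`/quarter combination and drop the rest. Because the `row_number()` ordering does not distinguish between duplicates, which record is kept is arbitrary.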
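The three options in the note can be compared on a toy data frame. This is an illustrative base-R sketch with made-up values, not the notebook's SQL: `!duplicated()` keeps the first row per pair (the SQL version keeps an arbitrary one), while `aggregate()` keeps either the largest or the summed wage.

```r
# Toy wage records (made-up values); ssn A has two records at employer 1
wages <- data.frame(
  ssn     = c("A", "A", "B", "C"),
  Empr_no = c(1, 1, 1, 2),
  Wage    = c(500, 700, 300, 900)
)
pair <- paste(wages$ssn, wages$Empr_no)

# 1) keep one record per ssn/employer pair (here the first; the SQL keeps an arbitrary one)
keep_first <- wages[!duplicated(pair), ]
# 2) keep the largest wage per pair
keep_max <- aggregate(Wage ~ ssn + Empr_no, data = wages, FUN = max)
# 3) sum the wages per pair
keep_sum <- aggregate(Wage ~ ssn + Empr_no, data = wages, FUN = sum)
print(keep_sum)
```

All three return one row per pair; they differ only in which wage value ssn A ends up with (500, 700, or 1200).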


In [None]:
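quarters <- quarter_year$yr_quarter        # e.g. "2005 Q1", used to filter ui_wages
quarters_sql_save <- quarter_year$qyr_req  # per-quarter table names, e.g. "q1_2005"
quarter_agg_save <- quarter_year$title     # aggregated table names, e.g. "q1_2005_agg"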

In [None]:
for(i in seq_along(quarters)){
    ptm <- proc.time()
    # keep positive wages with a known employer for quarter i,
    # then keep a single row per ssn/Empr_no/quarter combination
    qry <- "with init_wages as (
                select *, CAST(Year as VARCHAR)  + ' Q' + CAST(Quarter as VARCHAR) as yr_quarter
                from ds_in_dwd.dbo.ui_wages
                where CAST(Year as VARCHAR)  + ' Q' + CAST(Quarter as VARCHAR) = '%s'
                and Empr_no is not null and Wage > 0
            ),
        dup as (
                select *, row_number() over (partition by SSN, Empr_no, yr_quarter order by yr_quarter) as rownumber_wg
                from init_wages
            )
                select Empr_no, ssn, Wage, yr_quarter
                into tr_tdc_2022.dbo.%s
                from dup
                where rownumber_wg = 1"
    full_qry = sprintf(qry, quarters[i], quarters_sql_save[i])
    DBI::dbExecute(con, full_qry)
    # report progress only after the table has been written
    print(paste(quarters[i], "done"))
    print(proc.time() - ptm)
}


    

In [None]:
# see example
qry = "
select top 5 * 
from tr_tdc_2022.dbo.q1_2020
"
dbGetQuery(con, qry)

In [None]:
# check_dup_df = dbGetQuery(con, 'SELECT * FROM ds_in_dwd.dbo.ui_wages WHERE Year = 2015 AND Quarter = 2')

## Creating columns for pre-post employment 

Then, we will add columns to track if each `Empr_no`/`ssn` combination within a given quarter exists in the wage record table the quarter before and/or the quarter after. This will be important in tracking full-quarter employment, as well as hiring and separation numbers.

In [None]:
# initialize pre and post employment columns
new_cols <- c('pre_emp', 'post_emp')

for(col in new_cols){
    for(i in 1:length(quarters)){
        qry='
        ALTER TABLE tr_tdc_2022.dbo."%s" ADD "%s" int
        '
        full_qry = sprintf(qry, quarters_sql_save[i], col)
        DBI::dbExecute(con, full_qry)
    }
}

In [None]:
# see example
qry = "
select top 5 * 
from tr_tdc_2022.dbo.q1_2020
"
dbGetQuery(con, qry)

After the `pre_emp` and `post_emp` columns are initialized in each of these tables, we can set them as indicator variables: `pre_emp` is 1 if the `ssn`/`Empr_no` combination that appears in the UI wage records for a given quarter also appears in the previous quarter, and `post_emp` is 1 if it also appears in the following quarter. Rows without a match keep a `NULL` value rather than 0.
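The indicator logic can be sketched in base R with `%in%` on a combined `ssn`/`Empr_no` key. The three toy quarters and the helper `pair_key()` are hypothetical; as in the SQL, rows without a match get a missing value (`NA`, standing in for `NULL`) rather than 0.

```r
# Toy quarters (made-up ssn/Empr_no pairs)
prev_q <- data.frame(ssn = c("A", "B"), Empr_no = c(1, 1))
this_q <- data.frame(ssn = c("A", "C"), Empr_no = c(1, 1), Wage = c(500, 800))
next_q <- data.frame(ssn = "C", Empr_no = 1)

# Hypothetical helper: one string key per ssn/employer pair
pair_key <- function(d) paste(d$ssn, d$Empr_no, sep = "_")

# 1 if the pair appears in the adjacent quarter, NA otherwise
this_q$pre_emp  <- ifelse(pair_key(this_q) %in% pair_key(prev_q), 1L, NA_integer_)
this_q$post_emp <- ifelse(pair_key(this_q) %in% pair_key(next_q), 1L, NA_integer_)
print(this_q)
```

Employee A was also seen last quarter but not next quarter; employee C is the reverse.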

### Updating pre-post employment columns

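The first quarter has no previous quarter, so the `pre_emp` loop starts at the second quarter. Likewise, the last quarter has no following quarter, so the `post_emp` loop stops one quarter before the end.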

In [None]:
ptm <- proc.time()
for(i in 2:length(quarters)){
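    # set pre_emp = 1 for rows whose ssn/Empr_no pair also appears in the previous quarter;
    # unmatched rows are not touched by the join and keep pre_emp = NULL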
    qry='
    UPDATE tr_tdc_2022.dbo."%s" SET pre_emp = 
        CASE WHEN b.Wage is null THEN 0 ELSE 1 END
    FROM tr_tdc_2022.dbo."%s" b
    where tr_tdc_2022.dbo."%s".ssn = b.ssn and 
        tr_tdc_2022.dbo."%s".Empr_no = b.Empr_no
    '
    full_qry = sprintf(qry, quarters_sql_save[i], quarters_sql_save[i-1], quarters_sql_save[i], quarters_sql_save[i])
    DBI::dbExecute(con, full_qry)
#     writeLines(full_qry)
    print(paste0("Pre-emp for: ", quarters[i], " done"))
    print(proc.time() - ptm)
    }

In [None]:
for(i in 1:(length(quarters)-1)){
    # set post_emp = 1 for rows whose ssn/Empr_no pair also appears in the next quarter;
    # unmatched rows keep post_emp = NULL
    qry='
    UPDATE tr_tdc_2022.dbo."%s" SET post_emp = 
        CASE WHEN b.Wage is null THEN 0 ELSE 1 END
    FROM tr_tdc_2022.dbo."%s" b
    where tr_tdc_2022.dbo."%s".ssn = b.ssn and 
        tr_tdc_2022.dbo."%s".Empr_no = b.Empr_no
    '
    full_qry = sprintf(qry, quarters_sql_save[i], quarters_sql_save[i+1], 
                       quarters_sql_save[i], quarters_sql_save[i])
    DBI::dbExecute(con, full_qry)
    print(paste0("Post-emp for: ", quarters_sql_save[i], " done"))
}

In [None]:
qry <- "
select top 5 *
from tr_tdc_2022.dbo.q4_2019
"
dbGetQuery(con, qry)

In [None]:
# see values of pre_emp
qry = "
select pre_emp, count(*)
from tr_tdc_2022.dbo.q4_2016 group by pre_emp
"
dbGetQuery(con, qry)

In [None]:
# see values of post_emp
qry = "
select post_emp, count(*)
from tr_tdc_2022.dbo.q4_2016 group by post_emp
"
dbGetQuery(con, qry)

Now that we have pre and post-quarter employment indicators for each `ssn`/`Empr_no` combination, we can add hiring and separation indicators into these tables.

## Creating columns that indicate separation or hiring
- separation: `sep`
- hiring: `hire`

In [None]:
# sep and hire are derived from pre_emp, which is only populated from the second quarter on
new_cols <- c('sep', 'hire')

for(col in new_cols){
    for(i in 2:length(quarters_sql_save)){
        qry='
        ALTER TABLE tr_tdc_2022.dbo."%s" ADD "%s" int
        '
        full_qry = sprintf(qry, quarters_sql_save[i], col)
        DBI::dbExecute(con, full_qry)
    }
}

In [None]:
# take a peek at one of the tables
qry <- "
select top 5 *
from tr_tdc_2022.dbo.q4_2019
"
dbGetQuery(con, qry)

### Updating columns for `sep` and `hire`
- `sep` = 1 if someone is not employed by the same employer in the next quarter (`post_emp` is `NULL`)
- `hire` = 1 if someone was not employed by the same employer in the previous quarter (`pre_emp` is `NULL`)
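In base R, the same rule is an `is.na()` check on each flag. A toy sketch with made-up employees: D, who appears in neither adjacent quarter, is counted as both a hire and a separation, just as the two `CASE` expressions would count them.

```r
# Toy flags for four employees at one employer (NA = not seen in the adjacent quarter)
flags <- data.frame(
  ssn      = c("A", "B", "C", "D"),
  pre_emp  = c(1L, NA, 1L, NA),
  post_emp = c(NA, 1L, 1L, NA)
)
# sep = 1 when the pair is absent next quarter; hire = 1 when it was absent last quarter
flags$sep  <- ifelse(is.na(flags$post_emp), 1L, 0L)
flags$hire <- ifelse(is.na(flags$pre_emp), 1L, 0L)
print(flags)
```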

In [None]:
for(i in 2:length(quarters_sql_save)){
    qry='
    UPDATE tr_tdc_2022.dbo."%s" 
    SET 
        sep = CASE WHEN post_emp is null THEN 1 ELSE 0 END,
        hire = CASE WHEN pre_emp is null THEN 1 ELSE 0 END
    '
    full_qry = sprintf(qry, quarters_sql_save[i])
    DBI::dbExecute(con, full_qry)
}

In [None]:
# look at different values of sep
qry = '
select count(*), sep
from tr_tdc_2022.dbo.q4_2019 group by sep
'

dbGetQuery(con, qry)

In [None]:
# look at different values of hire
qry = '
select count(*), hire
from tr_tdc_2022.dbo.q4_2018 group by hire
'

dbGetQuery(con, qry)

## Aggregate by Employer

At this point, we have all the information we need to aggregate on the `Empr_no` values. We will do these aggregations in separate steps, as they require different `WHERE` clauses. In the first, we compute every measure except those based on full-quarter employment.
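SQL Server's `percentile_cont` interpolates linearly between ordered values; to our understanding this matches R's default `quantile(type = 7)`, so the first aggregation can be sketched per employer in base R. The data are made up; the output column names follow the query below.

```r
# Toy quarter for two employers (made-up values)
quarter <- data.frame(
  Empr_no = c(1, 1, 1, 1, 2, 2),
  Wage    = c(100, 200, 300, 400, 50, 150),
  hire    = c(1, 0, 0, 1, 0, 1),
  sep     = c(0, 0, 1, 0, 1, 0)
)

# one output row per employer, mirroring the windowed SQL aggregates
summarise_employer <- function(d) {
  data.frame(
    Empr_no          = d$Empr_no[1],
    num_employed     = nrow(d),
    total_earnings   = sum(d$Wage),
    num_hire         = sum(d$hire),
    num_sep          = sum(d$sep),
    bottom_25_pctile = unname(quantile(d$Wage, 0.25, type = 7)),
    top_75_pctile    = unname(quantile(d$Wage, 0.75, type = 7))
  )
}

agg <- do.call(rbind, lapply(split(quarter, quarter$Empr_no), summarise_employer))
print(agg)
```

For employer 1, the 25th percentile sits three quarters of the way from 100 to 200, i.e. 175.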

In [None]:
qry <- "
    select distinct top 5 Empr_no, yr_quarter,
    count(ssn) over(partition by Empr_no, yr_quarter) as num_employed,
    sum(Wage) over(partition by Empr_no, yr_quarter) as total_earnings,
    sum(hire) over(partition by Empr_no, yr_quarter) as num_hire,
    sum(sep) over(partition by Empr_no, yr_quarter) as num_sep,
    percentile_cont(0.25) within group (order by Wage) over (partition by Empr_no, yr_quarter) as bottom_25_pctile,
    percentile_cont(0.75) within group (order by Wage) over (partition by Empr_no, yr_quarter) as top_75_pctile
    from tr_tdc_2022.dbo.q1_2015
"
dbGetQuery(con, qry)

The same query for a later quarter, Q1 2019:

In [None]:
qry <- "
    select distinct top 5 Empr_no, yr_quarter,
    count(ssn) over(partition by Empr_no, yr_quarter) as num_employed,
    sum(Wage) over(partition by Empr_no, yr_quarter) as total_earnings,
    sum(hire) over(partition by Empr_no, yr_quarter) as num_hire,
    sum(sep) over(partition by Empr_no, yr_quarter) as num_sep,
    percentile_cont(0.25) within group (order by Wage) over (partition by Empr_no, yr_quarter) as bottom_25_pctile,
    percentile_cont(0.75) within group (order by Wage) over (partition by Empr_no, yr_quarter) as top_75_pctile
    from tr_tdc_2022.dbo.q1_2019
"
dbGetQuery(con, qry)

In a separate table, we can find all of the statistics related to full-quarter employment.
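A full-quarter employee is one with both `pre_emp` and `post_emp` equal to 1. A base-R sketch on toy data; here an employer with no full-quarter employees gets 0 directly, which is what the later `CASE ... is null then 0` step achieves after the SQL left join.

```r
# Toy quarter (made-up values); NA = not seen in the adjacent quarter
quarter <- data.frame(
  Empr_no  = c(1, 1, 1, 2, 2),
  Wage     = c(100, 200, 300, 50, 150),
  pre_emp  = c(1, 1, NA, NA, 1),
  post_emp = c(1, NA, 1, 1, NA)
)
# keep rows seen in both the previous and the next quarter (NA fails, like NULL in SQL)
full <- quarter[!is.na(quarter$pre_emp) & !is.na(quarter$post_emp), ]

emps <- sort(unique(quarter$Empr_no))
stats <- data.frame(
  Empr_no             = emps,
  num_employed        = sapply(emps, function(e) sum(quarter$Empr_no == e)),
  full_num_employed   = sapply(emps, function(e) sum(full$Empr_no == e)),
  full_total_earnings = sapply(emps, function(e) sum(full$Wage[full$Empr_no == e]))
)
# share of employees who are full-quarter employees
stats$ratio_fullq_total <- stats$full_num_employed / stats$num_employed
print(stats)
```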

In [None]:
qry <- "
    select distinct top 5 Empr_no, yr_quarter,
    count(ssn) over(partition by Empr_no, yr_quarter) as full_num_employed,
    sum(Wage) over(partition by Empr_no, yr_quarter) as full_total_earnings
    from tr_tdc_2022.dbo.q1_2015
    where post_emp = 1 and pre_emp = 1
"
dbGetQuery(con, qry)

The same query for Q4 2019:

In [None]:
qry <- "
    select distinct top 5 Empr_no, yr_quarter,
    count(ssn) over(partition by Empr_no, yr_quarter) as full_num_employed,
    sum(Wage) over(partition by Empr_no, yr_quarter) as full_total_earnings
    from tr_tdc_2022.dbo.q4_2019
    where post_emp = 1 and pre_emp = 1
"
dbGetQuery(con, qry)

Finally, we need each employer's employment, hiring, and separation numbers from the prior quarter to calculate their growth rates.

In [None]:
qry <- "
    select top 5 Empr_no, yr_quarter,
    count(ssn) as num_employed_pre,
    sum(hire) as num_hire_pre,
    sum(sep) as num_sep_pre
    from tr_tdc_2022.dbo.q4_2014
    group by Empr_no, yr_quarter
"
dbGetQuery(con, qry)

The same query for Q3 2019, the quarter before Q4 2019:

In [None]:
qry <- "
    select top 5 Empr_no, yr_quarter,
    count(ssn) as num_employed_pre,
    sum(hire) as num_hire_pre,
    sum(sep) as num_sep_pre
    from tr_tdc_2022.dbo.q3_2019
    group by Empr_no, yr_quarter
"
dbGetQuery(con, qry)

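Now that we have all the information we need in three tables, we can join them together on the `Empr_no` values. As an example for Q4 2019, we first left join the full-quarter statistics onto the overall statistics, keep employers with at least 5 employees, and replace missing full-quarter values with 0. The prior-quarter statistics are joined in when we calculate growth rates.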
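The left join, the `>= 5` filter, and the null-to-zero replacement can be sketched with `merge(all.x = TRUE)` in base R. Toy values only, with `NA` standing in for SQL `NULL`.

```r
# Toy inputs: overall employment and full-quarter employment per employer
emp    <- data.frame(Empr_no = c(1, 2, 3), num_employed = c(8, 5, 3))
full_q <- data.frame(Empr_no = 1, full_num_employed_init = 6)

# left join: keep every employer in emp, NA where full_q has no match
tabs <- merge(emp, full_q, by = "Empr_no", all.x = TRUE)
# keep employers with at least 5 employees
tabs <- tabs[tabs$num_employed >= 5, ]
# employers with no full-quarter employees get 0 instead of NA
tabs$full_num_employed <- ifelse(is.na(tabs$full_num_employed_init), 0,
                                 tabs$full_num_employed_init)
print(tabs)
```

Employer 3 is dropped by the size filter, and employer 2 keeps a full-quarter count of 0.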

In [None]:
qry <- "
with full_q as (
    select distinct Empr_no, yr_quarter,
    count(ssn) over(partition by Empr_no, yr_quarter) as full_num_employed_init,
    sum(Wage) over(partition by Empr_no, yr_quarter) as full_total_earnings_init
    from tr_tdc_2022.dbo.q4_2019
    where post_emp = 1 and pre_emp = 1
),
emp as (
    select distinct Empr_no, yr_quarter,
    count(ssn) over(partition by  Empr_no, yr_quarter) as num_employed,
    sum(Wage) over(partition by  Empr_no, yr_quarter) as total_earnings,
    sum(hire) over(partition by  Empr_no, yr_quarter) as num_hire,
    sum(sep) over(partition by  Empr_no, yr_quarter) as num_sep,
    percentile_cont(0.25) within group (order by Wage) over (partition by  Empr_no, yr_quarter) as bottom_25_pctile,
    percentile_cont(0.75) within group (order by Wage) over (partition by  Empr_no, yr_quarter) as top_75_pctile
    from tr_tdc_2022.dbo.q4_2019
),
tabs as (
    select emp.*, full_q.full_num_employed_init,
    full_q.full_total_earnings_init
    from emp
    left join full_q
    on emp.Empr_no = full_q.Empr_no and emp.yr_quarter = full_q.yr_quarter
    where emp.num_employed >= 5
)
select top 5 Empr_no, yr_quarter, num_employed, total_earnings, num_hire, num_sep, bottom_25_pctile, top_75_pctile, case 
    when full_num_employed_init is null then 0
    else full_num_employed_init end as full_num_employed,
case
    when full_total_earnings_init is null then 0
    else full_total_earnings_init end as full_total_earnings
from tabs
"
dbGetQuery(con, qry)

## Defining growth rates

For the employment, separation, and hiring growth rates, we will use the following formula from <a href='https://academic.oup.com/qje/article-abstract/107/3/819/1873525'>Davis and Haltiwanger (1992)</a> to calculate 1) the employment growth rate, `emp_rate`; 2) the separation growth rate, `sep_rate`; and 3) the hiring growth rate, `hire_rate`.

$$ g_{et}=\frac{2(x_{et} - x_{e,t-1})}{(x_{et} + x_{e,t-1})} $$

In this function, $g_{et}$ represents employment/separation/hire growth rate of employer $e$ at time $t$. $x_{et}$ and $x_{e,t-1}$ are employer $e$'s employment/separation/hire at time $t$ and $t-1$, respectively. According to Davis and Haltiwanger (1992):

"*This growth rate measure is symmetric about zero, and it lies in the closed interval [-2,2] with deaths (births) corresponding to the left (right) endpoint. A virtue of this measure is that it facilitates an integrated treatment of births, deaths, and continuing establishments in the empirical analysis.*"

In other words, in terms of employment, a firm with $g_{et} = 2$ is a new firm, while a firm with $g_{et} = -2$ is a firm that exited the economy.
    
> Why do the two endpoints represent firms' deaths and births? Calculate the value of $g_{et}$ when $x_{et}=0$ and when $x_{e,t-1}=0$ and see what you get.

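In practice, we apply this formula to every `Empr_no`, with two exceptions. If an employer has no employment (or no hires, or no separations) in both the current and the previous quarter, the denominator is zero, so instead of dividing by zero we set the rate to 0. If an employer does not appear in the previous quarter's table at all but has a nonzero value in the current quarter, we set the rate to 2.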
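The rule above, including both exceptions, can be written as a small vectorized R function. `dh_growth()` is a hypothetical helper for checking values by hand, not part of the pipeline; an `NA` previous value plays the role of a missing row (`NULL`) in SQL.

```r
# Hypothetical helper: Davis-Haltiwanger growth rate with the notebook's edge cases
dh_growth <- function(x_t, x_prev) {
  no_prev <- is.na(x_prev)
  ifelse((no_prev | x_prev == 0) & x_t == 0, 0,          # nothing in either quarter
         ifelse(no_prev, 2,                               # absent last quarter, present now
                2 * (x_t - x_prev) / (x_t + x_prev)))     # 2(x_t - x_{t-1}) / (x_t + x_{t-1})
}

# an employer with 0, 5, 0, and 6 employees now vs. NA, NA, 4, and 4 last quarter
dh_growth(c(0, 5, 0, 6), c(NA, NA, 4, 4))  # 0, 2, -2, 0.4
```

The third case is a firm death ($x_{et} = 0$) and lands on the left endpoint, $-2$.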

## Calculating growth rates

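The query below puts these pieces together for Q4 2019. The `old_tabs` CTE holds each employer's Q3 2019 counts, used for the growth rates, and the `hired` CTE counts employees hired in Q3 2019 who are still with the same employer in Q4 2019 and Q1 2020, i.e. new hires who became full-quarter employees.

In [None]: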
qry <- "
with full_q as (
    select distinct Empr_no, yr_quarter,
    count(ssn) over(partition by Empr_no, yr_quarter) as full_num_employed_init,
    sum(Wage) over(partition by Empr_no, yr_quarter) as full_total_earnings_init
    from tr_tdc_2022.dbo.q4_2019
    where post_emp = 1 and pre_emp = 1
),
emp as (
    select distinct Empr_no, yr_quarter,
    count(ssn) over(partition by Empr_no, yr_quarter) as num_employed,
    sum(Wage) over(partition by Empr_no, yr_quarter) as total_earnings,
    sum(hire) over(partition by Empr_no, yr_quarter) as num_hire,
    sum(sep) over(partition by Empr_no, yr_quarter) as num_sep,
    percentile_cont(0.25) within group (order by Wage) over (partition by Empr_no, yr_quarter) as bottom_25_pctile,
    percentile_cont(0.75) within group (order by Wage) over (partition by Empr_no, yr_quarter) as top_75_pctile
    from tr_tdc_2022.dbo.q4_2019
),
tabs as (
    select emp.*, full_q.full_num_employed_init,
    full_q.full_total_earnings_init
    from emp
    left join full_q
    on emp.Empr_no = full_q.Empr_no and emp.yr_quarter = full_q.yr_quarter
    where emp.num_employed >= 5
),
joined as (
    select Empr_no, yr_quarter, num_employed, total_earnings, num_hire, num_sep, bottom_25_pctile, top_75_pctile, case 
        when full_num_employed_init is null then 0
        else full_num_employed_init end as full_num_employed,
    case
        when full_total_earnings_init is null then 0
        else full_total_earnings_init end as full_total_earnings
    from tabs
),
old_tabs as (
    select Empr_no,
    count(ssn) as num_employed_pre,
    sum(hire) as num_hire_pre,
    sum(sep) as num_sep_pre
    from tr_tdc_2022.dbo.q3_2019
    group by Empr_no
),
    hired as (
    select tmone.Empr_no,
    count(tmone.ssn) as new_hires_fullq
    from tr_tdc_2022.dbo.q3_2019 tmone
    join tr_tdc_2022.dbo.q4_2019 t on tmone.ssn = t.ssn and tmone.Empr_no = t.Empr_no
    where tmone.hire = 1 and t.post_emp = 1
    group by tmone.Empr_no
    )
select top 5 joined.Empr_no, joined.yr_quarter, joined.num_employed, joined.total_earnings, joined.bottom_25_pctile, 
    joined.top_75_pctile, joined.full_num_employed, joined.full_total_earnings, CAST(joined.full_num_employed AS FLOAT)/CAST(joined.num_employed AS FLOAT) as ratio_fullq_total, hired.new_hires_fullq,
    case 
    	when (old_tabs.num_employed_pre is null or old_tabs.num_employed_pre = 0) and joined.num_employed = 0 then 0
    	when old_tabs.num_employed_pre is null and joined.num_employed != 0 then 2
    	else (2.0 * (joined.num_employed - old_tabs.num_employed_pre))/(joined.num_employed + old_tabs.num_employed_pre) end as emp_rate,
    case
        when (old_tabs.num_hire_pre is null or old_tabs.num_hire_pre = 0) and joined.num_hire = 0 then 0
        when old_tabs.num_hire_pre is null and joined.num_hire != 0 then 2
        else (2.0 * (joined.num_hire - old_tabs.num_hire_pre))/(joined.num_hire + old_tabs.num_hire_pre) end as hire_rate, 
    case
        when (old_tabs.num_sep_pre is null or old_tabs.num_sep_pre = 0) and joined.num_sep = 0 then 0
        when old_tabs.num_sep_pre is null and joined.num_sep != 0 then 2
        else (2.0 * (joined.num_sep - old_tabs.num_sep_pre))/(joined.num_sep + old_tabs.num_sep_pre) end as sep_rate
from joined
left join old_tabs on joined.Empr_no = old_tabs.Empr_no
left join hired on joined.Empr_no = hired.Empr_no
"
dbGetQuery(con, qry)
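The `hired` CTE in the query above can be sketched in base R: join the previous quarter to the current one on `ssn` and `Empr_no`, then keep rows hired last quarter whose `post_emp` is set in the current quarter. Toy data only.

```r
# Toy quarters: t-1 with hire flags, t with post_emp flags (NA = not seen in t+1)
prev_q <- data.frame(ssn = c("A", "B", "C"), Empr_no = c(1, 1, 1), hire = c(1, 1, 0))
this_q <- data.frame(ssn = c("A", "B", "C"), Empr_no = c(1, 1, 1), post_emp = c(1, NA, 1))

# inner join on the ssn/employer pair, like the SQL join
m <- merge(prev_q, this_q, by = c("ssn", "Empr_no"))
# hired in t-1 and still employed in t+1
kept <- m[m$hire == 1 & !is.na(m$post_emp), ]
# count per employer
counts <- as.data.frame(table(Empr_no = kept$Empr_no), responseName = "new_hires_fullq")
print(counts)
```

Only A counts: B left before $t+1$, and C was not a new hire in $t-1$.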

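We then run the same query for every quarter and save each result as a `_agg` table. The loop starts at the third quarter because the `hired` CTE relies on the previous quarter's `hire` flag, which is only populated from the second quarter on, and it stops one quarter before the end because `post_emp` requires a following quarter.

In [None]: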
for(i in 3:(length(quarters)-1)){
    qry = '
    with full_q as (
        select distinct Empr_no, yr_quarter,
        count(ssn) over(partition by Empr_no, yr_quarter) as full_num_employed_init,
        sum(Wage) over(partition by Empr_no, yr_quarter) as full_total_earnings_init
        from tr_tdc_2022.dbo."%s"
        where post_emp = 1 and pre_emp = 1
    ),
    emp as (
        select distinct Empr_no, yr_quarter,
        count(ssn) over(partition by Empr_no, yr_quarter) as num_employed,
        sum(Wage) over(partition by Empr_no, yr_quarter) as total_earnings,
        sum(hire) over(partition by Empr_no, yr_quarter) as num_hire,
        sum(sep) over(partition by Empr_no, yr_quarter) as num_sep,
        percentile_cont(0.25) within group (order by Wage) over (partition by Empr_no, yr_quarter) as bottom_25_pctile,
        percentile_cont(0.75) within group (order by Wage) over (partition by Empr_no, yr_quarter) as top_75_pctile
        from tr_tdc_2022.dbo."%s"
    ),
    tabs as (
        select emp.*, full_q.full_num_employed_init,
        full_q.full_total_earnings_init
        from emp
        left join full_q
        on emp.Empr_no = full_q.Empr_no and emp.yr_quarter = full_q.yr_quarter
        where emp.num_employed >= 5
    ),
    joined as (
        select Empr_no, yr_quarter, num_employed, total_earnings, num_hire, num_sep, bottom_25_pctile, top_75_pctile, 
        case 
            when full_num_employed_init is null then 0
            else full_num_employed_init end as full_num_employed,
        case
            when full_total_earnings_init is null then 0
            else full_total_earnings_init end as full_total_earnings
        from tabs
    ),
    old_tabs as (
        select Empr_no, yr_quarter,
        count(ssn) as num_employed_pre,
        sum(hire) as num_hire_pre,
        sum(sep) as num_sep_pre
        from tr_tdc_2022.dbo."%s"
        group by Empr_no, yr_quarter
    ),
    hired as (
    select tmone.Empr_no,
    count(tmone.ssn) as new_hires_fullq
    from tr_tdc_2022.dbo.%s tmone
    join tr_tdc_2022.dbo.%s t on tmone.ssn = t.ssn and tmone.Empr_no = t.Empr_no
    where tmone.hire = 1 and t.post_emp = 1
    group by tmone.Empr_no
    )
    select joined.Empr_no, joined.yr_quarter, joined.num_employed, joined.total_earnings, joined.bottom_25_pctile, 
        joined.top_75_pctile, joined.full_num_employed, joined.full_total_earnings, CAST(joined.full_num_employed AS FLOAT)/CAST(joined.num_employed AS FLOAT) as ratio_fullq_total, 
        isnull(hired.new_hires_fullq, 0) as new_hires_fullq,
        case 
            when (old_tabs.num_employed_pre is null or old_tabs.num_employed_pre = 0) and joined.num_employed = 0 then 0
            when old_tabs.num_employed_pre is null and joined.num_employed != 0 then 2
            else (2.0 * (joined.num_employed - old_tabs.num_employed_pre))/(joined.num_employed + old_tabs.num_employed_pre) end as emp_rate,
        case
            when (old_tabs.num_hire_pre is null or old_tabs.num_hire_pre = 0) and joined.num_hire = 0 then 0
            when old_tabs.num_hire_pre is null and joined.num_hire != 0 then 2
            else (2.0 * (joined.num_hire - old_tabs.num_hire_pre))/(joined.num_hire + old_tabs.num_hire_pre) end as hire_rate, 
        case
            when (old_tabs.num_sep_pre is null or old_tabs.num_sep_pre = 0) and joined.num_sep = 0 then 0
            when old_tabs.num_sep_pre is null and joined.num_sep != 0 then 2
            else (2.0 * (joined.num_sep - old_tabs.num_sep_pre))/(joined.num_sep + old_tabs.num_sep_pre) end as sep_rate
    into tr_tdc_2022.dbo.%s
    from joined
    left join old_tabs on joined.Empr_no = old_tabs.Empr_no
    left join hired on joined.Empr_no = hired.Empr_no
    '
    full_qry = sprintf(qry, quarters_sql_save[i], quarters_sql_save[i], 
                       quarters_sql_save[i-1], quarters_sql_save[i-1], 
                       quarters_sql_save[i], quarter_agg_save[i])
    DBI::dbExecute(con, full_qry)
    }

### Example

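To aggregate to the calendar year, we stack the four quarterly tables for a year with `union all`, add average earnings per employee (overall and for full-quarter employees), and then average each measure across the quarters in which an employer appears. For 2015:

In [None]: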
# example for 2015
qry <- "
with tb_2015 as(
    select *, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q1_2015_agg
    union all
    select *, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q2_2015_agg
    union all
        select *, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q3_2015_agg
    union all
        select *, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q4_2015_agg
)
select top 5 Empr_no,
count(*) as num_quarters, avg(num_employed) as avg_num_employed,
avg(cast(total_earnings as bigint)) as avg_total_earnings,
avg(bottom_25_pctile) as avg_bottom_25_pctile,
avg(top_75_pctile) as avg_top_75_pctile,
avg(full_num_employed) as avg_full_num_employed,
avg(cast(full_total_earnings as bigint)) as avg_full_total_earnings,
avg(emp_rate) as avg_emp_rate,
avg(hire_rate) as avg_hire_rate,
avg(sep_rate) as avg_sep_rate, 
avg(avg_earnings) as avg_avg_earnings,
avg(full_avg_earnings) as avg_full_avg_earnings
from tb_2015
group by Empr_no
"
dbGetQuery(con, qry)
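Stacking the quarterly tables and averaging by employer and year can be sketched in base R with `rbind()` and `aggregate()`. Toy values only; note that R averages in floating point, whereas SQL Server's `avg()` over an integer column returns an integer.

```r
# Toy quarterly tables (made-up values); employer 2 only appears in Q1
q1_2015 <- data.frame(Empr_no = c(1, 2), num_employed = c(10, 5), total_earnings = c(50000, 20000))
q2_2015 <- data.frame(Empr_no = 1, num_employed = 12, total_earnings = 66000)

# stack the quarters (like UNION ALL) and tag each row with its year
stacked <- rbind(cbind(q1_2015, year = 2015), cbind(q2_2015, year = 2015))
stacked$avg_earnings <- stacked$total_earnings / stacked$num_employed

# average each measure across the quarters an employer appears in
yearly <- aggregate(cbind(num_employed, avg_earnings) ~ Empr_no + year, data = stacked, FUN = mean)
# count quarters per employer-year (same grouping, so the rows line up)
yearly$num_quarters <- aggregate(num_employed ~ Empr_no + year, data = stacked, FUN = length)$num_employed
print(yearly)
```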

In [None]:
# full code: every quarter from 2015 through 2019, saved as a permanent table
qry <- "
with tb_agg as(
    select *, 2015 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q1_2015_agg
    union all
    select *, 2015 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q2_2015_agg
    union all
    select *, 2015 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q3_2015_agg
    union all
    select *, 2015 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q4_2015_agg
    union all
    select *, 2016 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q1_2016_agg
    union all
    select *, 2016 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q2_2016_agg
    union all
    select *, 2016 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q3_2016_agg
    union all
    select *, 2016 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q4_2016_agg
    union all
    select *, 2017 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q1_2017_agg
    union all
    select *, 2017 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q2_2017_agg
    union all
    select *, 2017 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q3_2017_agg
    union all
    select *, 2017 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q4_2017_agg
    union all
    select *, 2018 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q1_2018_agg
    union all
    select *, 2018 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q2_2018_agg
    union all
    select *, 2018 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q3_2018_agg
    union all
    select *, 2018 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q4_2018_agg
    union all
    select *, 2019 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q1_2019_agg
    union all
    select *, 2019 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q2_2019_agg
    union all
    select *, 2019 as year,total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q3_2019_agg
    union all
    select *, 2019 as year, total_earnings/num_employed as avg_earnings, case 
        when full_num_employed = 0 then 0
        else full_total_earnings/full_num_employed 
        end as full_avg_earnings
    from tr_tdc_2022.dbo.q4_2019_agg
)
select Empr_no, year,
count(*) as num_quarters, avg(num_employed) as avg_num_employed,
avg(cast(total_earnings as bigint)) as avg_total_earnings,
avg(bottom_25_pctile) as avg_bottom_25_pctile,
avg(top_75_pctile) as avg_top_75_pctile,
avg(full_num_employed) as avg_full_num_employed,
avg(cast(full_total_earnings as bigint)) as avg_full_total_earnings,
avg(emp_rate) as avg_emp_rate, 
avg(hire_rate) as avg_hire_rate, 
avg(sep_rate) as avg_sep_rate,
avg(avg_earnings) as avg_avg_earnings,
avg(full_avg_earnings) as avg_full_avg_earnings
into tr_tdc_2022.dbo.employer_yearly_agg
from tb_agg
group by Empr_no, year
"
DBI::dbExecute(con, qry)

In [None]:
qry <- "
select year, count(*)
from tr_tdc_2022.dbo.employer_yearly_agg
group by year 
"
dbGetQuery(con, qry)

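## Creating a consolidated query to combine our aggregated tables

Rather than writing out every `union all` by hand, we can build the query in a loop over every quarter that has an aggregated table (Q3 2005 through Q3 2020).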

In [None]:
string = "
select *, %s as year, 
total_earnings/num_employed as avg_earnings, 
case when full_num_employed = 0 then 0 else full_total_earnings/full_num_employed end as full_avg_earnings 
from tr_tdc_2022.dbo.%s_agg"


In [None]:
# avg() over an int column returns an int in SQL Server, so counts are cast to float before averaging
end_qry = "select Empr_no, year,
count(*) as num_quarters, avg(cast(num_employed as float)) as avg_num_employed,
avg(cast(total_earnings as bigint)) as avg_total_earnings,
avg(bottom_25_pctile) as avg_bottom_25_pctile,
avg(top_75_pctile) as avg_top_75_pctile,
avg(cast(full_num_employed as float)) as avg_full_num_employed,
avg(cast(full_total_earnings as bigint)) as avg_full_total_earnings,
avg(ratio_fullq_total) as avg_ratio_fullq_total,
avg(cast(isnull(new_hires_fullq, 0) as float)) as avg_new_hires_fullq,
avg(emp_rate) as avg_emp_rate, 
avg(hire_rate) as avg_hire_rate, 
avg(sep_rate) as avg_sep_rate,
avg(avg_earnings) as avg_avg_earnings,
avg(full_avg_earnings) as avg_full_avg_earnings
into tr_tdc_2022.dbo.employer_yearly_agg
from tdc_comb
group by Empr_no, year"

In [None]:
quarters = quarter_year$title
yr = quarter_year$yrq

start_string = "with tdc_comb as ( "
for(i in 3:(length(quarter_year$title)-1)){
    query ="select *, %s as year, total_earnings/num_employed as avg_earnings, 
        case when full_num_employed = 0 then 0 else full_total_earnings/full_num_employed end as full_avg_earnings 
    from tr_tdc_2022.dbo.%s"
    full_qry = sprintf(query, yr[i], quarters[i])
    if (i == 3) {
        start_string = paste0(start_string, full_qry)        
    }
    else if (i == length(quarter_year$title)-1) {
        start_string = paste0(start_string, '\n UNION ALL \n', full_qry, '\n )', '\n', end_qry)        
    }
    else {
        start_string = paste0(start_string, ' \n UNION ALL \n', full_qry)
    }
}

writeLines(start_string)
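The `if`/`else` bookkeeping in the loop above can also be avoided by letting `sprintf()` vectorize over the table names and joining the pieces with `paste(collapse = ...)`. A minimal sketch; the three table names and the shortened `select` list are placeholders, not the full query.

```r
# placeholder inputs: aggregated table names and their calendar years
tables <- c("q3_2005_agg", "q4_2005_agg", "q1_2006_agg")
years  <- c(2005, 2005, 2006)

# one select per quarter (column list shortened for illustration)
piece   <- "select *, %s as year from tr_tdc_2022.dbo.%s"
selects <- sprintf(piece, years, tables)

# join the selects with UNION ALL and wrap them in the CTE
union_sql <- paste0("with tdc_comb as (\n",
                    paste(selects, collapse = "\nUNION ALL\n"),
                    "\n)")
writeLines(union_sql)
```

Appending `end_qry` to `union_sql` would then give the full statement, with no special cases for the first and last quarter.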


In [None]:
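# SELECT ... INTO fails if the table already exists (e.g. from the 2015-2019 query above), so drop it first
DBI::dbExecute(con, "IF OBJECT_ID('tr_tdc_2022.dbo.employer_yearly_agg', 'U') IS NOT NULL DROP TABLE tr_tdc_2022.dbo.employer_yearly_agg")
DBI::dbExecute(con, start_string)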