# Summarizing the Joined UI Wage Records File (UIFULLV) to One Record Per Person Per Quarter

Note to users: please read the instructions file in this folder (00_instructions) before using this Jupyter Notebook file.

## SQL Database Connection

This section loads the needed packages and connects the Jupyter Notebook to the SQL database. If you are running this code in your own environment, modify the SQL connection string to point the notebook to your own SQL server and database (see the 00_instructions file in this folder for more information). Our code uses the SQLAlchemy Python package to interface between Python and SQL, and uses Jupyter SQL ‘magic’ functions to make the code more concise.

In [10]:
# load sqlalchemy package to interface between Python and SQL databases 
import sqlalchemy

# Replace the SQL connection string below (in quotation marks) with your own SQL connection information 
connection_string = "mssql+pyodbc://@TDI"

# Create the engine for the database server (the connection itself is opened lazily, on first use)
engine = sqlalchemy.create_engine(connection_string)

# Load the ipython-sql library to use Jupyter ‘magic’ functions, which make your code more concise 
%load_ext sql

# Connect to the database server
%sql $connection_string

The sql extension is already loaded. To reload it, use:
  %reload_ext sql



## Purpose 
In the previous notebook (02_link_TANF_UI), we joined our TANF sample definition file to our UI wage data and created a person-quarter-level view of the data, called UIFULLV. The purpose of this notebook is to create a new view with only one record per person per quarter from UIFULLV. For each person, we will create a measure of total earnings in each quarter by summing records if the person has multiple employers in a quarter, and create a count of the number of employers the person has in each quarter. We will also look at quarterly earnings relative to the person’s participation in the program.

1. First, we create a “relative quarter” column that represents the number of quarters that have elapsed since a specific reference date. In the code below, the reference date is the date the participant started the program, so quarters before the program start date have negative values.

2. Then, we sum the earnings and count the employers per quarter for each person in the data file. For example, if someone worked for 2 different employers in the same quarter, we sum the wage amounts and count the employer IDs so that for each quarter, we know a person’s total earnings and how many jobs they held.

3. For people with no earnings in a quarter, we assume that they were not employed in a UI-covered job for that quarter. People who have a positive earnings amount were employed at some point during the quarter. We use that information to create a 0/1 employment indicator column, where 0 indicates no employment and 1 indicates that the person was employed during the quarter.
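The three steps above can be sketched in plain Python to make the logic concrete. This is only an illustration under stated assumptions, not the notebook's method (the actual work is done in SQL further down): the rows are shaped like UIFULLV records, and the names `quarter_index`, `summary`, and `result` are hypothetical helpers introduced for this example.

```python
from collections import defaultdict
from datetime import date

# Illustrative rows shaped like UIFULLV: (SSN, ProgStart, EarnQTR, WAGES, empid).
# empid is None when there is no wage record for the quarter.
rows = [
    ("109000560", date(2017, 6, 7), date(2017, 1, 1), 0, None),
    ("109000560", date(2017, 6, 7), date(2017, 4, 1), 1121, "560109000"),
    ("109000560", date(2017, 6, 7), date(2017, 7, 1), 11331, "560109000"),
    ("109000560", date(2017, 6, 7), date(2017, 7, 1), 5070, "601090005"),
]


def quarter_index(d):
    """Map a date to a running calendar-quarter number (hypothetical helper)."""
    return d.year * 4 + (d.month - 1) // 3


summary = defaultdict(lambda: {"earnings": 0, "employed": 0, "employers": set()})
for ssn, prog_start, earn_qtr, wages, empid in rows:
    # Step 1: quarters elapsed since the program start quarter
    rel_qtr = quarter_index(earn_qtr) - quarter_index(prog_start)
    rec = summary[(ssn, rel_qtr)]
    # Step 2: total earnings and the set of distinct employers in the quarter
    rec["earnings"] += wages
    if empid is not None:
        rec["employers"].add(empid)
    # Step 3: employment indicator is 1 if any record has positive wages
    rec["employed"] = max(rec["employed"], 1 if wages > 0 else 0)

# One record per person per relative quarter:
# (total earnings, employed 0/1, number of employers)
result = {
    key: (rec["earnings"], rec["employed"], len(rec["employers"]))
    for key, rec in summary.items()
}

for key in sorted(result):
    print(key, result[key])
```

In this sketch, the two records for the third calendar quarter of 2017 collapse into a single record with summed earnings and an employer count of 2.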



#### Before creating the view, take a look at a few examples of individuals who worked for 2 employers in a quarter in 2017.
1. The first client worked 2 jobs in quarter 1 of 2017.
2. The second client worked 2 jobs in quarter 3 of 2017.

In [11]:
%%sql

SELECT TOP 10 * FROM UIFULLV
WHERE YEAR(EarnQTR) =2017 and SSN IN  -- use the 2017 wage records as an example

(SELECT SSN -- this subquery identifies SSNs for clients with 2 employers in the same quarter 
FROM dbo.UIFULLV
WHERE YEAR(EarnQTR) = 2017
GROUP BY SSN, EarnQTR
HAVING COUNT(*) >=2) -- select the SSNs for clients with 2 or more records in any quarter

ORDER BY SSN, EarnQTR
;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,WAGES,empid
107200448,2017-01-06,2017-07-05,2017Q1,2017-01-01,11349,481072004.0
107200448,2017-01-06,2017-07-05,2017Q1,2017-01-01,13894,448107200.0
107200448,2017-01-06,2017-07-05,2017Q2,2017-04-01,0,
107200448,2017-01-06,2017-07-05,2017Q3,2017-07-01,8574,448107200.0
107200448,2017-01-06,2017-07-05,2017Q4,2017-10-01,0,
109000560,2017-06-07,2017-12-04,2017Q1,2017-01-01,0,
109000560,2017-06-07,2017-12-04,2017Q2,2017-04-01,1121,560109000.0
109000560,2017-06-07,2017-12-04,2017Q3,2017-07-01,11331,560109000.0
109000560,2017-06-07,2017-12-04,2017Q3,2017-07-01,5070,601090005.0
109000560,2017-06-07,2017-12-04,2017Q4,2017-10-01,0,


In [12]:
%%sql
-- This statement allows the notebook to be rerun if there is a problem with the view and it needs to be removed and recreated
DROP VIEW IF EXISTS dbo.UIQuarterlyMeasuresV;


 * mssql+pyodbc://@TDI
Done.


[]

#### For each quarter in our follow-up, this code creates a single record for an individual, summarizing the information across all their records for that quarter.

A relative quarter count is created for each record, reflecting the number of quarters that have elapsed since the individual’s program start date. The code then creates 1 record per individual per quarter by summing the individual's earnings across all records for the quarter, setting an employment indicator (0/1) based on whether the person had earnings, and counting the number of different employers the person had.

In [13]:
%%sql
CREATE VIEW  dbo.UIQuarterlyMeasuresV AS

-- The purpose of this code is to calculate total earnings and an employment indicator for each person in each quarter of 
-- data that is covered in the wage data file. The quarters are converted to periods relative to the person’s start date 
-- in the program.  

SELECT  SSN, 
        ProgStart, 
        ProgEnd,
        YR_QTR, 
        EarnQTR,
        MAX(DATEDIFF(QUARTER, ProgStart, EarnQTR)) 
            AS RelativeQTR, -- Number of quarters between the person’s program start date and the quarter on each record
        SUM(WAGES) AS QTR_Earnings, -- Add up wages from all jobs in this quarter
        MAX(CASE WHEN WAGES > 0 THEN 1 ELSE 0 END) 
            AS QTR_EMPLOYED, -- If WAGES = 0, we assume the person was not employed; if WAGES > 0, we assume the person was employed
        COUNT(DISTINCT empid) AS QTR_NUMEMPLOYERS -- Count distinct employer IDs (NULL empid values are not counted) to get the number of employers per quarter

FROM dbo.UIFULLV

GROUP BY SSN, ProgStart, ProgEnd, YR_QTR, EarnQTR
;


 * mssql+pyodbc://@TDI
Done.


[]

#### Checking the creation of the quarterly measures.

1. Note that the first person started in the program in quarter 1 of 2017, so 2017Q1 is relative quarter 0; 2017Q2 is relative quarter 1, and so on.

2. Note that the second person started in the program in quarter 2 of 2017, so for this person, 2017Q2 is relative quarter 0, 2017Q3 is relative quarter 1, and so on.

3. Note that the quarter before the person started in the program is designated relative quarter -1.
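The numbering in these notes follows from how `DATEDIFF(QUARTER, ...)` works in SQL Server: it counts the calendar-quarter boundaries crossed between the two dates, not the number of full three-month periods elapsed. A minimal Python sketch of that rule is below; the function name `relative_quarter` is a hypothetical helper for illustration, and the dates are taken from the two people described above.

```python
from datetime import date


def relative_quarter(prog_start, earn_qtr):
    """Count calendar-quarter boundaries crossed from prog_start to earn_qtr.

    Hypothetical helper that mirrors the boundary-counting rule of
    DATEDIFF(QUARTER, prog_start, earn_qtr).
    """
    start_idx = prog_start.year * 4 + (prog_start.month - 1) // 3
    earn_idx = earn_qtr.year * 4 + (earn_qtr.month - 1) // 3
    return earn_idx - start_idx


# Started 2017-01-06: the quarter beginning 2017-01-01 is the same calendar quarter
print(relative_quarter(date(2017, 1, 6), date(2017, 1, 1)))  # 0

# Started 2017-06-07: the quarter beginning 2017-07-01 is only 24 days later,
# but one quarter boundary has been crossed
print(relative_quarter(date(2017, 6, 7), date(2017, 7, 1)))  # 1

# Started 2017-06-07: the quarter beginning 2017-01-01 is the quarter before the start
print(relative_quarter(date(2017, 6, 7), date(2017, 1, 1)))  # -1
```

Because only the quarter containing each date matters, a person who starts late in a quarter and a person who starts early in the same quarter get the same relative quarter 0.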


In [14]:
%%sql
SELECT TOP 10 *
FROM dbo.UIQuarterlyMeasuresV
WHERE YEAR(EarnQTR) =2017 and SSN IN 
    (SELECT SSN
    FROM dbo.UIFULLV
    WHERE YEAR(EarnQTR) = 2017
    GROUP BY SSN, EarnQTR
    HAVING COUNT(*) >=2)
ORDER BY SSN, YR_QTR, EarnQTR
;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,RelativeQTR,QTR_Earnings,QTR_EMPLOYED,QTR_NUMEMPLOYERS
107200448,2017-01-06,2017-07-05,2017Q1,2017-01-01,0,25243,1,2
107200448,2017-01-06,2017-07-05,2017Q2,2017-04-01,1,0,0,0
107200448,2017-01-06,2017-07-05,2017Q3,2017-07-01,2,8574,1,1
107200448,2017-01-06,2017-07-05,2017Q4,2017-10-01,3,0,0,0
109000560,2017-06-07,2017-12-04,2017Q1,2017-01-01,-1,0,0,0
109000560,2017-06-07,2017-12-04,2017Q2,2017-04-01,0,1121,1,1
109000560,2017-06-07,2017-12-04,2017Q3,2017-07-01,1,16401,1,2
109000560,2017-06-07,2017-12-04,2017Q4,2017-10-01,2,0,0,0
115300952,2017-06-13,2017-12-10,2017Q1,2017-01-01,-1,0,0,0
115300952,2017-06-13,2017-12-10,2017Q2,2017-04-01,0,251,1,1


#### Look at employment and earnings trends across calendar quarters.

Percent employed and average quarterly earnings might be expected to correlate to some degree with local seasonal employment trends.

In [15]:
%%sql
SELECT  YR_QTR, 
        count(*) as NumberOfClients,
        avg(QTR_Earnings) as MeanEarnings,
        avg(QTR_EMPLOYED*100) as PercentEmp,
        avg(CASE QTR_Earnings WHEN 0 THEN NULL ELSE QTR_Earnings END) AS EmployedWageAvg
          
FROM dbo.UIQuarterlyMeasuresV
GROUP BY YR_QTR
ORDER BY YR_QTR;

 * mssql+pyodbc://@TDI
Done.


YR_QTR,NumberOfClients,MeanEarnings,PercentEmp,EmployedWageAvg
2017Q1,1012,4219,48,8731
2017Q2,1012,4049,48,8278
2017Q3,1012,3882,47,8236
2017Q4,1012,3804,46,8156
2018Q1,1012,3920,45,8605
2018Q2,1012,4070,48,8338
2018Q3,1012,4046,46,8713
2018Q4,1012,4013,49,8058
2019Q1,1012,3779,46,8120
2019Q2,1012,3945,46,8405
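The three averages in the query above differ only in which records they include. `AVG` skips NULL values, so replacing zero earnings with NULL restricts `EmployedWageAvg` to people who were employed in the quarter. The sketch below reproduces that logic in Python. It assumes the underlying columns are integer-typed, which the whole-number output suggests; in that case SQL Server's `AVG` also returns an integer and drops the fractional part. The earnings values and the helper `int_avg` are hypothetical and are not taken from the notebook's data.

```python
# Hypothetical quarterly earnings for four people in one calendar quarter
qtr_earnings = [0, 8000, 0, 9001]
qtr_employed = [1 if e > 0 else 0 for e in qtr_earnings]


def int_avg(values):
    """Average the non-None values and drop the fractional part.

    Hypothetical helper: mimics an integer AVG that ignores NULLs.
    """
    kept = [v for v in values if v is not None]
    if not kept:
        return None
    return int(sum(kept) / len(kept))


# Mean over everyone, including people with zero earnings
mean_earnings = int_avg(qtr_earnings)

# Average of 0/100 values gives the percent employed
percent_emp = int_avg([flag * 100 for flag in qtr_employed])

# Zero earnings become None (NULL), so only employed people are averaged
employed_wage_avg = int_avg([e if e != 0 else None for e in qtr_earnings])

print(mean_earnings, percent_emp, employed_wage_avg)
```

If fractional percentages are wanted, one option is to average a decimal expression instead (for example, multiplying by `100.0` rather than `100`), at the cost of output that no longer matches the whole numbers shown above.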


#### Look at employment and earnings trends across relative quarters. 

Since people have different program start dates, the same calendar quarter of earnings can correspond to different relative quarters for different people. In our dataset, everyone has data for the quarter they started participating in the program and at least one quarter beyond that (relative quarters 0 and 1). Fewer people have data for relative quarters far before or far after their program start date. Statistics (percent employed and average quarterly earnings) would be expected to be more stable for relative quarters where data are available for the full sample, and less stable for relative quarters where they are not, since outliers affect averages more when sample sizes are small.

In [16]:
%%sql
SELECT  RelativeQTR, 
        count(*) as NumberOfClients,
        avg(QTR_Earnings) as MeanEarnings,
        avg(QTR_EMPLOYED*100) as PercentEmp,
        avg(CASE QTR_Earnings WHEN 0 THEN NULL ELSE QTR_Earnings END) AS EmployedWageAvg
          
FROM dbo.UIQuarterlyMeasuresV
GROUP BY RelativeQTR
ORDER BY RelativeQTR;

 * mssql+pyodbc://@TDI
Done.


RelativeQTR,NumberOfClients,MeanEarnings,PercentEmp,EmployedWageAvg
-15,18,840,22,3780
-14,36,4071,44,9160
-13,58,5144,56,9042
-12,92,4491,55,8102
-11,126,4405,51,8540
-10,159,3959,46,8508
-9,193,5162,55,9226
-8,226,4872,54,8880
-7,326,4824,55,8738
-6,451,4286,50,8553
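One way to act on the sample-size caveat is to flag relative quarters that cover only a small share of the full sample before interpreting their averages. The sketch below uses the `RelativeQTR` and `NumberOfClients` values from the output above and the 1,012 clients per calendar quarter from the earlier output. The 25 percent cutoff (`MIN_SHARE`) is an arbitrary, hypothetical threshold chosen for illustration and is not part of the original analysis.

```python
FULL_SAMPLE = 1012  # clients per calendar quarter in the earlier output
MIN_SHARE = 0.25    # hypothetical cutoff; choose one that suits your analysis

# (RelativeQTR, NumberOfClients) pairs copied from the output above
counts = [
    (-15, 18), (-14, 36), (-13, 58), (-12, 92), (-11, 126),
    (-10, 159), (-9, 193), (-8, 226), (-7, 326), (-6, 451),
]

# Relative quarters whose client count is below the cutoff share of the sample
flagged = [rel for rel, n in counts if n / FULL_SAMPLE < MIN_SHARE]

for rel, n in counts:
    note = "small sample" if rel in flagged else ""
    print(f"{rel:>4} {n:>5} {n / FULL_SAMPLE:6.1%} {note}")
```

With this cutoff, relative quarters -15 through -8 are flagged, which is consistent with the more erratic averages in those rows (for example, relative quarter -15 is based on only 18 clients).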
