# Relative Annual Measures: What happened after program enrollment?

In [1]:
# this code loads needed packages and connect to database.
# load sqlalchemy package
import sqlalchemy

# Define connection string (Projects is the corpmdrc DSN)
connection_string = "mssql+pyodbc://@TDI"

# Create the engine connecting to the database server
sqlalchemy.create_engine(connection_string)

# Load sql magicks 
%load_ext sql

# Connect to the database server
%sql $connection_string

## Purpose:

Analysis questions often focus on what happened after the client enrolled in the program. This code will create a number of measures using the data for the 4 quarters AFTER the client started in the program.  For example:

1. Was the client employed at any time in the year after program enrollment?
2. How many quarters was the client employed (or percent of quarters employed) in the year after enrollment?
3. How much did the client earn during the year after enrollement?   
4. How many quarters did the client earn more than $3500 in the year after enrolling in the program?
5. What was the change in earnings over time for our clients?

### Starting Point

Our source file for these relative measures already has a record for every possible quarter a person could be employed. Each record has information about the earnings reported, a yes/no indicator of employment, and the number of employers who reported earnings for the person during the quarter. We will keep and use only the quarter just before program enrollment, the quarter of program enrollment, and 4 quarters after program enrollment. We have already created a column (RelativeQTR) that indicates how close each quarter is to the client's program enrollment, but we need to add a column that groups the quarters in years because we want "relative yearly" measures.

Let's take a look at the transaction file that we will use as input and view all records for the first client (17 total) and the first record for the second client. Notice:
1. The first client below enrolled in the program in quarter 2 of 2017 (06/01/2017)
2. Earnings reported in quarter 1 of 2017 (1 quarter prior to enrollment) are assigned the RelativeQTR value of -1
3. Earnings reported in quarter 2 of 2017 (the first quarter enrolled) are assigned the RelativeQTR value of 0
4. The second client below enrolled in the program in quarter 1 of 2017
5. Quarter 1 of 2017 is the first quarter of UI wage data we received. Therefore, we have no data for the quarter before this client started the program and there is no record with RelativeQTR value -1 for the second client.

In [2]:
%%sql
CREATE or ALTER VIEW RelYearV as
SELECT *,
(((
    (CASE WHEN RelativeQTR >0 then RelativeQTR END)
  -1)/4)+1) as RelYear -- add column that indicates relative year the quarter belongs to
FROM UIQuarterlyMeasuresV;

SELECT TOP 18 * -- print records for first case
FROM RelYearV
order by SSN, YR_QTR;

 * mssql+pyodbc://@TDI
Done.
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,RelativeQTR,QTR_Earnings,QTR_EMPLOYED,QTR_NUMEMPLOYERS,RelYear
100000000,2017-06-01,2017-11-28,2017Q1,2017-01-01,-1,9214,1,1,
100000000,2017-06-01,2017-11-28,2017Q2,2017-04-01,0,8561,1,1,
100000000,2017-06-01,2017-11-28,2017Q3,2017-07-01,1,12550,1,1,1.0
100000000,2017-06-01,2017-11-28,2017Q4,2017-10-01,2,0,0,0,1.0
100000000,2017-06-01,2017-11-28,2018Q1,2018-01-01,3,0,0,0,1.0
100000000,2017-06-01,2017-11-28,2018Q2,2018-04-01,4,0,0,0,1.0
100000000,2017-06-01,2017-11-28,2018Q3,2018-07-01,5,0,0,0,2.0
100000000,2017-06-01,2017-11-28,2018Q4,2018-10-01,6,0,0,0,2.0
100000000,2017-06-01,2017-11-28,2019Q1,2019-01-01,7,3804,1,1,2.0
100000000,2017-06-01,2017-11-28,2019Q2,2019-04-01,8,17082,1,1,2.0


###  First, a word about data coverage and missing data

Remember that the last quarter of UI Wage data that we received is Q1 2021. We do not have data for Q2 2021 for any client.  Let's take a look at the data for the last clients to enroll in our program. Notice:
1. This client started in our program in December 2020.
2. We only have information about what happend the quarter this client started in our program and 1 quarter after program start.
4. Therefore, any measures we create after relative quarter 1 should exclude this client from the calculations because the data is incomplete.

In [3]:
%%sql
SELECT *
FROM RelYearV
where ProgStart=  -- select the client with a start date equal to the last start date on the file
    (select MAX(ProgStart)
     FROM RelYearV) -- finding the last program start date on the file
;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,RelativeQTR,QTR_Earnings,QTR_EMPLOYED,QTR_NUMEMPLOYERS,RelYear
919951016,2020-12-26,,2017Q1,2017-01-01,-15,0,0,0,
919951016,2020-12-26,,2017Q2,2017-04-01,-14,841,1,1,
919951016,2020-12-26,,2017Q3,2017-07-01,-13,9851,1,1,
919951016,2020-12-26,,2017Q4,2017-10-01,-12,0,0,0,
919951016,2020-12-26,,2018Q1,2018-01-01,-11,6340,1,1,
919951016,2020-12-26,,2018Q2,2018-04-01,-10,5243,1,1,
919951016,2020-12-26,,2018Q3,2018-07-01,-9,0,0,0,
919951016,2020-12-26,,2018Q4,2018-10-01,-8,10271,1,1,
919951016,2020-12-26,,2019Q1,2019-01-01,-7,0,0,0,
919951016,2020-12-26,,2019Q2,2019-04-01,-6,223,1,1,


#### Careful: summary measures will be calculated if ANY data is available 

1. To demonstrate what happens when we create summary measures including missing data, we created just 1 measure below: total earnings for year 1, and did so just for the last client enrolled. 
2. We only keep records that are for relative quarters 1 through 4 (the 4 quarters after the enrollment quarter).
3. We sum earnings for those quarters by SSN.
4. Notice that total earnings for year 1 after enrollment is calculated even though this client has only 1 quarter out of the 4

In [4]:
%%sql
SELECT 
SSN, RelYear,

   SUM(QTR_Earnings) AS EARNy  -- total earning in year 1 after start

FROM RelYearV /* using quarter table view */

where RelativeQTR >= 1 and -- we only want to include records for the quarters AFTER the program enrollment quarter
    ProgStart=  (select MAX(ProgStart)
     FROM RelYearV)  -- we just want look at the last client enrolled as an example of missing data
    
GROUP BY SSN,RelYear /* group by is SQL way of reducing multiple record to 1 record per person */
;

 * mssql+pyodbc://@TDI
Done.


SSN,RelYear,EARNy
919951016,1,0


### Controlling the calculations by checking for the maximum number of quarters of data the client has 

There are lots for ways of managing calculations for those who don't have enough data. If you want every client to have a record in your result set, even though it will have NULL values, you can do this by testing that the count of the number of records is 4 AS you are coding your summary calculations (CASE WHEN COUNT(\*)=4 THEN 1 END). If the count is not 4 then the expression returns NULL. This approach is demonstrated below.

In [5]:
%%sql
CREATE or ALTER VIEW RelYearSumV as
SELECT SSN, RelYear,

/* relative year measures*/
-- employment measures
MAX(QTR_Employed)*(CASE WHEN COUNT(*)=4 THEN 1 END) AS EMPy, -- was employed sometime in year 1
SUM(QTR_Employed)*(CASE WHEN COUNT(*)=4 THEN 1 END) AS KEMPy, -- number of quarter employed
AVG(CAST(QTR_Employed AS DECIMAL(4,2)))*(CASE WHEN COUNT(*)=4 THEN 1 END) AS PerQEMPy, -- Percent of quarter employed
    
-- earnings measures
SUM(QTR_Earnings)*(CASE WHEN COUNT(*)=4 THEN 1 END) AS EARNy, -- total earning in year 1 after start
SUM(CASE WHEN QTR_Earnings >= 3500 THEN 1 ELSE 0 END)*(CASE WHEN COUNT(*)=4 THEN 1 END)
AS KQGT3500y, -- Number of QTRs w/earning ge 3500

(SUM(CASE WHEN QTR_Earnings >= 3500 THEN 1.0 ELSE 0 END)/4)*(CASE WHEN COUNT(*)=4 THEN 1 END)
AS PerQGT3500y, -- Percent of QTRs w/earning ge 3500

-- creating  measures of how much data relative data a client has
COUNT(*) AS NumQTR,
MAX( RelativeQTR) AS LastRelQTR

FROM RelYearV /* using quarter table view */
where RelativeQTR >=1
GROUP BY SSN, RelYear; /* group by is SQL way of reducing multiple record to 1 record per person */

SELECT TOP 8 *
FROM RelYearSumV
ORDER BY SSN, RelYear;

 * mssql+pyodbc://@TDI
Done.
Done.


SSN,RelYear,EMPy,KEMPy,PerQEMPy,EARNy,KQGT3500y,PerQGT3500y,NumQTR,LastRelQTR
100000000,1,1.0,1.0,0.25,12550.0,1.0,0.25,4,4
100000000,2,1.0,2.0,0.5,20886.0,2.0,0.5,4,8
100000000,3,1.0,1.0,0.25,1881.0,0.0,0.0,4,12
100000000,4,,,,,,,3,15
100900056,1,1.0,2.0,0.5,15363.0,2.0,0.5,4,4
100900056,2,1.0,1.0,0.25,7582.0,1.0,0.25,4,8
100900056,3,1.0,3.0,0.75,13006.0,2.0,0.5,4,12
100900056,4,1.0,1.0,0.25,8172.0,1.0,0.25,4,16


### FLATTEN FILE WITH GROUP BY and CASE: Relative Year 1 and Year 2 Measures

1. We reduce the multiple records (displayed above) for an SSN to 1 record for the SSN by using the GROUP BY statement. The GROUP BY statement requires the use of an aggregate function so we use SUM.

2. The CASE statement determines if the record is for the year of interest and if so assigns the value to the column. If the record is not for the quarters of interest the record is skipped.  

3. We use the relative calendar indicator (RelativeQTR) in the WHERE statement to select the rows that we want included in the calculation. As mentioned above, we do not want to include the quarter that the client started in the program in these measures so we start with relative quarter 1.

In [6]:
%%sql
CREATE or ALTER VIEW FlatUIV as
SELECT SSN,

/* relative year measures*/
-- employment measures
SUM(CASE RelYear WHEN 1 THEN EMPy END) AS EMPy1, -- was employed sometime in year 1
SUM(CASE RelYear WHEN 1 THEN KEMPy END) AS KEMPy1, -- number of quarter employed
SUM(CASE RelYear WHEN 1 THEN PerQEMPy END) AS PQEMPy1, -- Percent of quarter employed
    
-- earnings measures
SUM(CASE RelYear WHEN 1 THEN EARNy END) AS EARNy1, -- total earning in year 1 after start
SUM(CASE RelYear WHEN 1 THEN KQGT3500y END) AS KQGT3500y1, -- Number of QTRs w/earning ge 3500
SUM(CASE RelYear WHEN 2 THEN PerQGT3500y END) AS PQGT3500y1, -- Number of QTRs w/earning ge 3500


/* relative year measures*/
-- employment measures
SUM(CASE RelYear WHEN 2 THEN EMPy END) AS EMPy2, -- was employed sometime in year 2
SUM(CASE RelYear WHEN 2 THEN KEMPy END) AS KEMPy2, -- number of quarter employed
SUM(CASE RelYear WHEN 2 THEN PerQEMPy END) AS PQEMPy2, -- Percent of quarter employed
    
-- earnings measures
SUM(CASE RelYear WHEN 2 THEN EARNy END) AS EARNy2, -- total earning in year 2 after start
SUM(CASE RelYear WHEN 2 THEN KQGT3500y END) AS KQGT3500y2, -- Number of QTRs w/earning ge 3500
SUM(CASE RelYear WHEN 2 THEN PerQGT3500y END) AS PQGT3500y2, -- Number of QTRs w/earning ge 3500

-- keep track of how much quarterly data each person had
MAX(LastRelQTR) AS LastQTR

FROM RelYearSumV /* using quarter table view */
GROUP BY SSN;/* group by is SQL way of reducing multiple record to 1 record per person */

SELECT TOP 10 *
FROM FlatUIV
ORDER BY SSN;


 * mssql+pyodbc://@TDI
Done.
Done.


SSN,EMPy1,KEMPy1,PQEMPy1,EARNy1,KQGT3500y1,PQGT3500y1,EMPy2,KEMPy2,PQEMPy2,EARNy2,KQGT3500y2,PQGT3500y2,LastQTR
100000000,1,1,0.25,12550,1,0.5,1,2,0.5,20886,2,0.5,15
100900056,1,2,0.5,15363,2,0.25,1,1,0.25,7582,1,0.25,16
101800112,1,3,0.75,46458,3,0.25,1,2,0.5,11535,1,0.25,14
102700168,1,1,0.25,1324,0,0.25,1,2,0.5,12074,1,0.25,15
103600224,1,2,0.5,14720,2,0.25,1,2,0.5,5117,1,0.25,15
104500280,1,1,0.25,9720,1,0.25,1,3,0.75,15078,1,0.25,15
105400336,1,1,0.25,8254,1,0.5,1,2,0.5,19027,2,0.5,15
106300392,1,4,1.0,27910,3,0.75,1,3,0.75,24574,3,0.75,15
107200448,1,2,0.5,17414,2,0.25,1,2,0.5,8565,1,0.25,16
108100504,1,3,0.75,21789,2,0.25,1,3,0.75,12107,1,0.25,14


### Check the year 2 calculations for clients with less than 2 years (or 8 quarters) of data
The check below is used to demonstrate that clients without the full year of data for relative year 2 are set to NULL.

In [7]:
%%sql
SELECT TOP 5 * 
FROM FlatUIV /* using quarter table view */
WHERE LastQTR < 8 -- LOOKING AT CLIENTS WITH TRUNCATED DATA
ORDER BY SSN
;

 * mssql+pyodbc://@TDI
Done.


SSN,EMPy1,KEMPy1,PQEMPy1,EARNy1,KQGT3500y1,PQGT3500y1,EMPy2,KEMPy2,PQEMPy2,EARNy2,KQGT3500y2,PQGT3500y2,LastQTR
710237968,1,2,0.5,7550,1,,,,,,,,7
712038080,0,0,0.0,0,0,,,,,,,,7
713838192,1,2,0.5,22656,2,,,,,,,,7
716538360,1,1,0.25,1392,0,,,,,,,,7
718338472,1,1,0.25,10751,1,,,,,,,,7


## Automating the code above

The code presented above has been written manually for demonstration purposes. However, many of the measures we have created are yearly measures that are coded the same way for each time snapshot, and you may want to follow longer-term employment trends for people over many years. To create 5 measures for a 4 year follow-up period would have us typing many more lines of logic and summation code. This section demonstrates how to generate and run automated SQL queries and creates relative annual measures dynamically. 

This type of SQL coding is known as **Dynamic SQL**.  The result of our coding will be a query that can be run (or executed). So, we are not looking to generate SQL result sets below, rather we are generating SQL query code. 

##### Note about Dynamic SQL: 
*Jupyter Notebooks, the file type we are using to share this code, does not support dynamic SQL code. The code below therefore produces errors when it is executed here. To use this code, you should copy and paste it into your respective SQL Server software.*

Review the Create View code above that created 2 years of earning and employment measures but suppose you wanted  all the years we had data for. Notice that all the lines above are similar but   
1. The value that the WHEN clause test changes for every relative year: 1, 2
2. The name of measure being created changes for every relative year: EARNy1, EMPy1, EARNy2, EMPy2 ...

Below, the distinct values of the relative year variable will be stored in a iterative variable (@i) to drive the creation of our SQL query. By using this iterative variable, the only thing that changes on each line is value of the relative year. The value of the year is tested in the WHEN clause and is suffixed at the end of the column names.

In [None]:
%%sql

DECLARE @CMD NVARCHAR(MAX) ; -- create a temporary variable to store our automated sql code
SELECT @CMD = 
'SELECT SSN,'; -- store the start of the query code

DECLARE @i INT=1; -- create a variable for looping

WHILE @i <=4
BEGIN

SELECT @CMD=@CMD + ' 

SUM(CASE RelYear WHEN ' + cast(@i as nvarchar(10)) + ' THEN EMPy END) AS EMPy' + cast(@i as nvarchar(10)) + ',
SUM(CASE RelYear WHEN ' + cast(@i as nvarchar(10)) + ' THEN KEMPy END) AS KEMPy' + cast(@i as nvarchar(10))+ ',
SUM(CASE RelYear WHEN ' + cast(@i as nvarchar(10)) + ' THEN PerQEMPy END) AS PerQEMPy' + cast(@i as nvarchar(10))+ ',
SUM(CASE RelYear WHEN ' + cast(@i as nvarchar(10)) + ' THEN EARNy END) AS EARNy' + cast(@i as nvarchar(10))+ ',
SUM(CASE RelYear WHEN ' + cast(@i as nvarchar(10)) + ' THEN KQGT3500y END) AS KQGT3500y' + cast(@i as nvarchar(10))+ ','


SELECT @i= @i+1;
END;


SELECT @CMD=@CMD+ '
MAX(LastRelQTR) AS LastRelQTR
FROM  RelYearSumV
GROUP BY SSN;'
; -- append the end of the query code

PRINT @CMD; -- print the query code we generated
EXEC sp_executesql @CMD; -- execute the code we generated


#### The dynamic SQL coding above generated and executed the SQL Query code below:

In [None]:
SELECT SSN, 

SUM(CASE RelYear WHEN 1 THEN EMPy END) AS EMPy1,
SUM(CASE RelYear WHEN 1 THEN KEMPy END) AS KEMPy1,
SUM(CASE RelYear WHEN 1 THEN PerQEMPy END) AS PerQEMPy1,
SUM(CASE RelYear WHEN 1 THEN EARNy END) AS EARNy1,
SUM(CASE RelYear WHEN 1 THEN KQGT3500y END) AS KQGT3500y1, 

SUM(CASE RelYear WHEN 2 THEN EMPy END) AS EMPy2,
SUM(CASE RelYear WHEN 2 THEN KEMPy END) AS KEMPy2,
SUM(CASE RelYear WHEN 2 THEN PerQEMPy END) AS PerQEMPy2,
SUM(CASE RelYear WHEN 2 THEN EARNy END) AS EARNy2,
SUM(CASE RelYear WHEN 2 THEN KQGT3500y END) AS KQGT3500y2, 

SUM(CASE RelYear WHEN 3 THEN EMPy END) AS EMPy3,
SUM(CASE RelYear WHEN 3 THEN KEMPy END) AS KEMPy3,
SUM(CASE RelYear WHEN 3 THEN PerQEMPy END) AS PerQEMPy3,
SUM(CASE RelYear WHEN 3 THEN EARNy END) AS EARNy3,
SUM(CASE RelYear WHEN 3 THEN KQGT3500y END) AS KQGT3500y3, 

SUM(CASE RelYear WHEN 4 THEN EMPy END) AS EMPy4,
SUM(CASE RelYear WHEN 4 THEN KEMPy END) AS KEMPy4,
SUM(CASE RelYear WHEN 4 THEN PerQEMPy END) AS PerQEMPy4,
SUM(CASE RelYear WHEN 4 THEN EARNy END) AS EARNy4,
SUM(CASE RelYear WHEN 4 THEN KQGT3500y END) AS KQGT3500y4,
MAX(LastRelQTR) AS LastRelQTR
FROM  RelYearSumV
GROUP BY SSN;


## Any Change in Earnings Over Time?

We are often interested in comparing a client's quarterly earnings in the first year in the program to later years following program participation. A common approach is to look at the change in maximum quarterly earnings after four years of program participation. Below, we find the highest (or maximum) quarterly earnings during the first year of the program and compare it to the highest quarterly earnings during the fourth year. We create a variable storing the difference between the earnings (ChangeY1ToY4) and a 1/0 variable for whether the difference is greater than $250 (ChngY1Y4GE250).

In [8]:
%%sql
SELECT TOP 10
SSN,

/* relative quarter measures */
SUM(CASE RelativeQTR WHEN 1 THEN QTR_Earnings END) AS EARN1, -- displaying year 1 quarterly earnings
SUM(CASE RelativeQTR WHEN 2 THEN QTR_Earnings END) AS EARN2,
SUM(CASE RelativeQTR WHEN 3 THEN QTR_Earnings END) AS EARN3,
SUM(CASE RelativeQTR WHEN 4 THEN QTR_Earnings END) AS EARN4,
SUM(CASE RelativeQTR WHEN 13 THEN QTR_Earnings END) AS EARN13, -- displaying year 4 quarterly earnings
SUM(CASE RelativeQTR WHEN 14 THEN QTR_Earnings END) AS EARN14, 
SUM(CASE RelativeQTR WHEN 15 THEN QTR_Earnings END) AS EARN15,
SUM(CASE RelativeQTR WHEN 16 THEN QTR_Earnings END) AS EARN16,

MAX(CASE WHEN RelativeQTR in(1,2,3,4) THEN QTR_Earnings END) AS MAXEARNy1, -- Max quarterly earnings in year 1 
MAX(CASE WHEN RelativeQTR in(13,14,15,16) THEN QTR_Earnings END) AS MAXEARNy4, -- Max quarterly earnings in year 4

/* calculate the difference between year 1 max and year 4 max*/
(
MAX(CASE WHEN RelativeQTR in(13,14,15,16) THEN QTR_Earnings END) -
MAX(CASE WHEN RelativeQTR in(1,2,3,4) THEN QTR_Earnings END)
)  AS ChangeY1ToY4,

case when(
MAX(CASE WHEN RelativeQTR in(13,14,15,16) THEN QTR_Earnings END) -
MAX(CASE WHEN RelativeQTR in(1,2,3,4) THEN QTR_Earnings END)
) >= 250 then 1 else 0 end AS ChngY1Y4GE250, -- is the change $250 or more

MAX(RelativeQTR) AS LastQTR -- check each client has enough data to be included in these measures

FROM UIQuarterlyMeasuresV /* using quarter table view */
GROUP BY SSN /* group by is SQL way of reducing multiple record to 1 record per person */
HAVING MAX(RelativeQTR) >= 16 -- client needs at least 4 years of data to be appropriate for these calculations
ORDER BY SSN
;

 * mssql+pyodbc://@TDI
Done.


SSN,EARN1,EARN2,EARN3,EARN4,EARN13,EARN14,EARN15,EARN16,MAXEARNy1,MAXEARNy4,ChangeY1ToY4,ChngY1Y4GE250,LastQTR
100900056,4371,10992,0,0,0,8172,0,0,10992,8172,-2820,0,16
107200448,0,8574,0,8840,3482,27033,3350,16783,8840,27033,18193,1,16
113500840,0,0,703,6523,0,6331,0,7971,6523,7971,1448,1,16
119801232,3543,7184,8780,6773,0,0,1251,3121,8780,3121,-5659,0,16
126101624,30137,0,0,18499,0,0,0,0,30137,0,-30137,0,16
131501960,0,0,0,0,6871,0,3573,0,0,6871,6871,1,16
137802352,783,20722,9053,19942,0,0,0,0,20722,0,-20722,0,16
144102744,0,7350,0,0,12343,3082,0,1232,7350,12343,4993,1,16
150403136,0,0,0,2033,1552,0,0,0,2033,1552,-481,0,16
156703528,7801,0,281,0,0,0,7713,0,7801,7713,-88,0,16
