# Relative Quarterly Measures: What happened after program enrollment?

Note to users: please read the instructions file in this folder (00_instructions) before using this Jupyter Notebook file.

## SQL Database Connection

This section loads the needed packages and connects the IPython Jupyter Notebook to the SQL database. If you are running this code in your own environment, remember to modify the SQL connection string to route the notebook to your own SQL server and database (see the 00_instructions file in this folder for more information). Our code uses the SQLAlchemy Python package to interface between the Python and SQL languages, and uses Jupyter SQL ‘magic’ functions to make the code more concise.


In [1]:
# load sqlalchemy package to interface between Python and SQL databases
import sqlalchemy

# Replace the SQL connection string below (in quotation marks) with your own SQL connection information to run the program
connection_string = "mssql+pyodbc://@TDI"

# Create the engine connecting to the database server
engine = sqlalchemy.create_engine(connection_string)

# Load the ipython-sql library to use Jupyter 'magic' functions, which make your code more concise
%load_ext sql

# Connect to the database server
%sql $connection_string

## Purpose:

Analysis questions often focus on what happened after a certain event, such as enrollment in a program. This code will create quarterly employment and earnings measures for the quarter of program enrollment and each quarter afterwards. These are called "relative" measures because they are relative to the date of enrollment, unlike the simple calendar measures we created in the 04_calendar_measures_qtr notebook.

Relative measures can also be created to line up outcomes for people after different events, such as entering TANF, exiting TANF, or completion of a specific program.

At the end of this notebook, you will find a section on using Dynamic SQL to streamline generating iterative code that can then be executed.

### Starting Point

On the source file, each client already has a record for every possible quarter in our data follow-up period. Each record has information about the earnings reported, a yes/no (0/1) indicator of employment, and the number of employers who reported earnings for the person during the quarter. Printed below are 6 quarters of data: the quarter just before program enrollment, the quarter of program enrollment, and 4 quarters after program enrollment. The source file already has a column (RelativeQTR), created in the 02_restructure_person_quarter file, that shows how many quarters before or after the program enrollment date the wages on that record were earned.

Let's take a look at the input file. For the first client below, notice that:
1.	The client enrolled in the program in quarter 2 of 2017 (06/01/2017)
2.	Earnings reported for quarter 1 of 2017 (1 quarter prior to enrollment) are assigned the RelativeQTR value of -1
3.	Earnings reported for quarter 2 of 2017 (the quarter of enrollment) are assigned the RelativeQTR value of 0

For the second client below, notice that:
1.	The client enrolled in the program in quarter 1 of 2017
2.	Quarter 1 of 2017 is the first quarter of UI wage data we received from our data provider
3.	Therefore, we do not have earnings information for quarters before this client started the program 
4.	This means there is no record with RelativeQTR value -1 for the second client
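The RelativeQTR values described above can be reproduced with simple quarter arithmetic. The sketch below is only an illustration: the actual logic lives in the 02_restructure_person_quarter file, which is not shown here, and the function names `quarter_index` and `relative_quarter` are hypothetical. It assumes that RelativeQTR is the number of calendar quarters between the quarter of ProgStart and the quarter of EarnQTR, which is consistent with the rows printed in this notebook.

```python
from datetime import date


def quarter_index(d: date) -> int:
    """Place a date on a running scale of calendar quarters (hypothetical helper)."""
    return d.year * 4 + (d.month - 1) // 3


def relative_quarter(prog_start: date, earn_qtr_start: date) -> int:
    """Quarters from the enrollment quarter to the earnings quarter (negative = before)."""
    return quarter_index(earn_qtr_start) - quarter_index(prog_start)


# First client above: enrolled 2017-06-01, which falls in 2017Q2
print(relative_quarter(date(2017, 6, 1), date(2017, 1, 1)))  # -1
print(relative_quarter(date(2017, 6, 1), date(2017, 4, 1)))  # 0
print(relative_quarter(date(2017, 6, 1), date(2018, 4, 1)))  # 4
```

Because only the quarter of each date matters, a client who enrolls on the last day of a quarter gets the same RelativeQTR values as one who enrolls on the first day of that quarter.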

In [2]:
%%sql
SELECT TOP 12 *
FROM UIQuarterlyMeasuresV
where RelativeQTR in(-1,0,1,2,3,4) -- selecting just 6 quarters to show what happened just before and just after
order by SSN, YR_QTR;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,RelativeQTR,QTR_Earnings,QTR_EMPLOYED,QTR_NUMEMPLOYERS
100000000,2017-06-01,2017-11-28,2017Q1,2017-01-01,-1,9214,1,1
100000000,2017-06-01,2017-11-28,2017Q2,2017-04-01,0,8561,1,1
100000000,2017-06-01,2017-11-28,2017Q3,2017-07-01,1,12550,1,1
100000000,2017-06-01,2017-11-28,2017Q4,2017-10-01,2,0,0,0
100000000,2017-06-01,2017-11-28,2018Q1,2018-01-01,3,0,0,0
100000000,2017-06-01,2017-11-28,2018Q2,2018-04-01,4,0,0,0
100900056,2017-01-01,2017-06-30,2017Q1,2017-01-01,0,5624,1,1
100900056,2017-01-01,2017-06-30,2017Q2,2017-04-01,1,4371,1,1
100900056,2017-01-01,2017-06-30,2017Q3,2017-07-01,2,10992,1,1
100900056,2017-01-01,2017-06-30,2017Q4,2017-10-01,3,0,0,0


### FLATTEN FILE WITH GROUP BY and CASE: Relative Quarterly Measures

1.	We will reduce the multiple records per person (displayed above) to 1 record per person by using the GROUP BY statement.

2.	The CASE statement determines if the record is for the quarter of interest. If it is, it adds the earnings for that quarter into a new summary column. If not, the CASE expression returns NULL, which SUM ignores, so the record is skipped.

3.	Instead of pivoting using calendar quarters (YR_QTR) we will use the relative position indicator (RelativeQTR) to name and populate the columns with data from the appropriate row.

4.	Where an individual has data on the quarter prior to program enrollment (RelativeQTR is equal to -1), we can calculate PEARN1 and PEMP1, where “P” stands for prior to program enrollment.
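The steps above can be tried out on a toy table before running them against the real view. The sketch below uses Python's built-in `sqlite3` module with an in-memory database instead of SQL Server, and the table name `quarters` is made up for the illustration. The rows are copied from the two clients printed earlier (earnings only, relative quarters -1 through 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quarters (SSN INTEGER, RelativeQTR INTEGER, QTR_Earnings INTEGER)"
)
conn.executemany(
    "INSERT INTO quarters VALUES (?, ?, ?)",
    [
        (100000000, -1, 9214), (100000000, 0, 8561), (100000000, 1, 12550),
        (100900056, 0, 5624), (100900056, 1, 4371),  # no RelativeQTR -1 record
    ],
)

# One output row per SSN; each CASE picks out the record for one relative quarter
flat_rows = conn.execute(
    """
    SELECT SSN,
           SUM(CASE RelativeQTR WHEN -1 THEN QTR_Earnings END) AS PEARN1,
           SUM(CASE RelativeQTR WHEN 0 THEN QTR_Earnings END) AS EARN0,
           SUM(CASE RelativeQTR WHEN 1 THEN QTR_Earnings END) AS EARN1
    FROM quarters
    GROUP BY SSN
    ORDER BY SSN
    """
).fetchall()

for row in flat_rows:
    print(row)
# (100000000, 9214, 8561, 12550)
# (100900056, None, 5624, 4371)
```

The second client has no record for relative quarter -1, so PEARN1 comes back as NULL (shown as None in Python) rather than 0.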

In [3]:
%%sql
-- this code allows this program to be rerun if there is a problem with the view and it needs to be removed and recreated
DROP VIEW IF EXISTS dbo.FlatUIv;

 * mssql+pyodbc://@TDI
Done.


[]

In [4]:
%%sql
CREATE VIEW FlatUIv as
SELECT SSN,

/* 1. the GROUP BY statement is what flattens the file to 1 record per SSN */
/* 2. every selected column that is not listed in the GROUP BY must be wrapped in a summary function (e.g. SUM below) */

/* relative quarter measures */
/*earnings*/
SUM(CASE RelativeQTR WHEN -1 THEN QTR_Earnings END) AS PEARN1, -- this is the quarter prior to program enrollment
SUM(CASE RelativeQTR WHEN 0 THEN QTR_Earnings END) AS EARN0, -- this is the quarter of program enrollment
SUM(CASE RelativeQTR WHEN 1 THEN QTR_Earnings END) AS EARN1, -- when RelativeQTR is 1, the value goes into the 1st follow-up column
SUM(CASE RelativeQTR WHEN 2 THEN QTR_Earnings END) AS EARN2, -- when RelativeQTR is 2, the value goes into the 2nd follow-up column
SUM(CASE RelativeQTR WHEN 3 THEN QTR_Earnings END) AS EARN3,
SUM(CASE RelativeQTR WHEN 4 THEN QTR_Earnings END) AS EARN4,

/*employment*/
SUM(CASE RelativeQTR WHEN -1 THEN QTR_Employed END) AS PEMP1,
SUM(CASE RelativeQTR WHEN 0 THEN QTR_Employed END) AS EMP0,
SUM(CASE RelativeQTR WHEN 1 THEN QTR_Employed END) AS EMP1,
SUM(CASE RelativeQTR WHEN 2 THEN QTR_Employed END) AS EMP2,
SUM(CASE RelativeQTR WHEN 3 THEN QTR_Employed END) AS EMP3,
SUM(CASE RelativeQTR WHEN 4 THEN QTR_Employed END) AS EMP4


FROM UIQuarterlyMeasuresV /* using quarter table view */
GROUP BY SSN /* GROUP BY is SQL's way of reducing multiple records to 1 record per person */
;


 * mssql+pyodbc://@TDI
Done.


[]

#### Print a Few Cases

In [5]:
%%sql
SELECT TOP 20 * FROM FlatUIv ORDER BY SSN;

 * mssql+pyodbc://@TDI
Done.


SSN,PEARN1,EARN0,EARN1,EARN2,EARN3,EARN4,PEMP1,EMP0,EMP1,EMP2,EMP3,EMP4
100000000,9214.0,8561,12550,0,0,0,1.0,1,1,0,0,0
100900056,,5624,4371,10992,0,0,,1,1,1,0,0
101800112,0.0,0,0,7513,12523,26422,0.0,0,0,1,1,1
102700168,0.0,0,0,1324,0,0,0.0,0,0,1,0,0
103600224,0.0,7301,6670,8050,0,0,0.0,1,1,1,0,0
104500280,0.0,6673,0,9720,0,0,0.0,1,0,1,0,0
105400336,5122.0,0,8254,0,0,0,1.0,0,1,0,0,0
106300392,0.0,0,4794,13253,1530,8333,0.0,0,1,1,1,1
107200448,,25243,0,8574,0,8840,,1,0,1,0,1
108100504,8552.0,18603,6484,2192,13113,0,1.0,1,1,1,1,0


#### Check to make sure there are not duplicate SSNs

In [6]:
%%sql
SELECT COUNT(*) as NumRecs, COUNT(distinct SSN) as NumSSNs
FROM FlatUIv;

 * mssql+pyodbc://@TDI
Done.


NumRecs,NumSSNs
1012,1012


### A word about data coverage and missing data for clients who start the program late

Remember that the last quarter of UI wage data that we received was Q1 2021. We do not have data for Q2 2021 or later for anyone. So clients who start in the program close to the end of the time frame for data coverage will have fewer relative quarters of follow-up data and more relative quarters of historical data.

Let's take a look at the data for the last client to enroll in our program. Notice:

1.	This person started in our program in December 2020.
2.	We only have information about wages for this person during the quarter of program enrollment and one quarter afterwards.
3.	Any measures we create after relative quarter 1 will be missing for this person.
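The number of follow-up quarters available for a client can be worked out from the enrollment date and the last quarter of data. The sketch below is a hedged illustration: the function name `last_relative_quarter` is hypothetical, and it assumes the same quarter arithmetic that the RelativeQTR column appears to follow in the printed rows.

```python
from datetime import date

LAST_DATA_QTR = (2021, 1)  # (year, quarter) of the last UI wage data received, per the text above


def last_relative_quarter(prog_start: date, last_data_qtr=LAST_DATA_QTR) -> int:
    """Highest RelativeQTR value that can have data for a client (hypothetical helper)."""
    start_index = prog_start.year * 4 + (prog_start.month - 1) // 3
    last_index = last_data_qtr[0] * 4 + (last_data_qtr[1] - 1)
    return last_index - start_index


print(last_relative_quarter(date(2020, 12, 26)))  # 1
print(last_relative_quarter(date(2017, 6, 1)))    # 15
```

For the client who enrolled in December 2020, only relative quarters 0 and 1 fall inside the data coverage period, so any measure for relative quarter 2 or later has no source records.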

In [7]:
%%sql
SELECT *
FROM UIQuarterlyMeasuresV
where ProgStart= -- selecting the last client with a start date equal to the last start date on the file
    (select MAX(ProgStart)
     FROM UIQuarterlyMeasuresV) -- finding the last start date on the file
;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,RelativeQTR,QTR_Earnings,QTR_EMPLOYED,QTR_NUMEMPLOYERS
919951016,2020-12-26,,2017Q1,2017-01-01,-15,0,0,0
919951016,2020-12-26,,2017Q2,2017-04-01,-14,841,1,1
919951016,2020-12-26,,2017Q3,2017-07-01,-13,9851,1,1
919951016,2020-12-26,,2017Q4,2017-10-01,-12,0,0,0
919951016,2020-12-26,,2018Q1,2018-01-01,-11,6340,1,1
919951016,2020-12-26,,2018Q2,2018-04-01,-10,5243,1,1
919951016,2020-12-26,,2018Q3,2018-07-01,-9,0,0,0
919951016,2020-12-26,,2018Q4,2018-10-01,-8,10271,1,1
919951016,2020-12-26,,2019Q1,2019-01-01,-7,0,0,0
919951016,2020-12-26,,2019Q2,2019-04-01,-6,223,1,1


### Now let's look at the longitudinal record we created for this person

Notice that the last 3 relative quarter measures (relative quarters 2 through 4) are NULL (or "None") because data for those periods are not yet available. (Recall that the absence of earnings data for quarters that fall between Q1 2017 and Q1 2021 is presumed to be due to the individual not having been employed, and we imputed $0 earnings in those instances.)
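The difference between NULL (no data yet) and 0 (presumed not employed) matters when the measures are summarized. The sketch below uses Python's built-in `sqlite3` module and a made-up four-person table named `flat` to show that AVG and COUNT skip NULL values, so an employment rate is computed only over clients whose quarter falls inside the coverage period. Treating NULL as 0 gives a different, lower rate. Note that in SQL Server, AVG of an integer column returns an integer, so the column would likely need to be cast to a decimal type there first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat (SSN INTEGER, EMP2 INTEGER)")
conn.executemany(
    "INSERT INTO flat VALUES (?, ?)",
    [(1, 1), (2, 0), (3, None), (4, None)],  # clients 3 and 4 have no data for quarter 2
)

rate_observed, n_observed, rate_null_as_zero = conn.execute(
    """
    SELECT AVG(EMP2),               -- NULLs are skipped
           COUNT(EMP2),             -- counts only non-NULL values
           AVG(COALESCE(EMP2, 0))   -- NULLs forced to 0
    FROM flat
    """
).fetchone()

print(rate_observed, n_observed, rate_null_as_zero)  # 0.5 2 0.25
```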

In [8]:
%%sql
SELECT TOP 20
*
FROM FlatUIv

where SSN IN -- IN (rather than =) still works if more than one client shares the last start date
  (select distinct SSN
     FROM UIQuarterlyMeasuresV
     where ProgStart= (select MAX(ProgStart) FROM UIQuarterlyMeasuresV))   
ORDER BY SSN
;


 * mssql+pyodbc://@TDI
Done.


SSN,PEARN1,EARN0,EARN1,EARN2,EARN3,EARN4,PEMP1,EMP0,EMP1,EMP2,EMP3,EMP4
919951016,0,0,0,,,,0,0,0,,,


## Automating the code above using Dynamic SQL

The code presented above has been written manually for demonstration purposes. However, many of the measures we have created are relative measures that are coded the same way for each time point, and you may want to follow employment trends for people over many years. Creating relative measures for a 3-year follow-up period, for example, would take 36 lines of logic and summation code. This section demonstrates how to generate and run automated SQL queries that create relative quarterly measures dynamically.

This type of SQL coding is known as **Dynamic SQL**. The result will be query language that can be run (or executed). So, we are not looking to generate SQL result sets below; instead, we use this code to generate SQL query code that can then be run.

**Note about Dynamic SQL:**

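*The Jupyter Notebook setup we are using to share this code does not support this dynamic SQL code. The error message printed below the cell (Must declare the table variable "@QTR") suggests that the statements are sent to the server one at a time, so a variable declared in one statement is no longer in scope for the next. The code below therefore produces errors when it is executed here. To use this code, copy and paste it into your own SQL Server query tool.*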

Review the CREATE VIEW code above, which created earnings and employment measures for 5 quarters (relative quarters 0 through 4), plus the quarter before enrollment. Suppose you wanted to create measures for all the quarters we have data for. Every line of that code follows the same pattern; only two things change from line to line:

1.	The value that the WHEN clause tests for changes for every relative quarter: 0,1,2,3,4
2.	The name of the measure created changes for every relative quarter: EARN0, EMP0, EARN1, EMP1 ...

Below, the distinct relative quarter values will be stored in a variable to drive the creation of our SQL query. By using this iterative variable, the only thing that changes in each line is the value of the relative quarter. The value of the relative quarter is tested in the WHEN clause and is added as a suffix to the end of the column names.
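The same idea can be sketched in Python, which does run inside a notebook. This is a swapped-in technique (building the query string in Python rather than in T-SQL), not the approach used in the cell that follows. The function name `build_flat_query` is hypothetical, and the example runs against a small in-memory `sqlite3` table that borrows the view's name and column names purely for illustration.

```python
import sqlite3


def build_flat_query(quarters, table="UIQuarterlyMeasuresV"):
    """Build the flattening query, one earnings and one employment line per quarter."""
    columns = [
        "SUM(CASE RelativeQTR WHEN -1 THEN QTR_Earnings END) AS PEARN1",
        "SUM(CASE RelativeQTR WHEN -1 THEN QTR_Employed END) AS PEMP1",
    ]
    for q in sorted(quarters):  # numeric sort, so 10 comes after 9
        q = int(q)  # only integers are ever placed into the query text
        columns.append(f"SUM(CASE RelativeQTR WHEN {q} THEN QTR_Earnings END) AS EARN{q}")
        columns.append(f"SUM(CASE RelativeQTR WHEN {q} THEN QTR_Employed END) AS EMP{q}")
    # join() puts commas only between lines, so there is no trailing comma to remove
    return "SELECT SSN,\n" + ",\n".join(columns) + f"\nFROM {table}\nGROUP BY SSN;"


# Toy stand-in for the real view (illustrative data only)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UIQuarterlyMeasuresV "
    "(SSN INTEGER, RelativeQTR INTEGER, QTR_Earnings INTEGER, QTR_Employed INTEGER)"
)
conn.executemany(
    "INSERT INTO UIQuarterlyMeasuresV VALUES (?, ?, ?, ?)",
    [(1, -1, 100, 1), (1, 0, 0, 0), (1, 1, 250, 1), (2, 0, 300, 1)],
)

# The distinct relative quarters on or after program start drive the query text
quarters = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT RelativeQTR FROM UIQuarterlyMeasuresV WHERE RelativeQTR >= 0"
    )
]
query = build_flat_query(quarters)
print(query)

cursor = conn.execute(query)
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(column_names)
```

Building the column lines as a list and joining them avoids the step of trimming the last comma, which the T-SQL version handles with SUBSTRING.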


In [9]:
%%sql
DECLARE @QTR TABLE (QTR varchar(3), QTRN INT);-- create temporary table variables, character and numeric versions of same

INSERT INTO @QTR (QTR,QTRN)  -- Store the quarter values in the table variables.
SELECT DISTINCT RelativeQTR ,  RelativeQTR FROM dbo.UIQuarterlyMeasuresV
where RelativeQTR >= 0;  -- WHERE selects all quarters on or after program start.

DECLARE @CMD NVARCHAR(MAX); -- create a temporary variable to store our automated sql code

SELECT @CMD = 
'SELECT SSN, 
SUM(CASE RelativeQTR WHEN -1 THEN QTR_Earnings END) AS PEARN1, 
SUM(CASE RelativeQTR WHEN -1 THEN QTR_Employed END) AS PEMP1, 
'; -- store the start of the query code
SELECT @CMD=@CMD + ' 
SUM(CASE RelativeQTR WHEN ' + QTR + ' THEN QTR_Earnings END) AS EARN'+QTR+ ' ,'+'
SUM(CASE RelativeQTR WHEN ' + QTR + ' THEN QTR_Employed END) AS EMP'+QTR+ ' ,'
FROM @QTR ORDER BY QTRN; -- append the code for each quarter to the query code; ORDER BY the numeric version so that 10 sorts after 9

SELECT @CMD=SUBSTRING(@CMD,1,LEN(@CMD)-1); -- remove the , at the end of the last quarter line

SELECT @CMD=@CMD+'
FROM dbo.UIQuarterlyMeasuresV
GROUP BY SSN;'; -- append the end of the query code

PRINT @CMD; -- print the query code we generated
EXEC sp_executesql @CMD; -- execute the code we generated


 * mssql+pyodbc://@TDI
Done.
(pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Must declare the table variable "@QTR". (1087) (SQLExecDirectW)')
[SQL: INSERT INTO @QTR (QTR,QTRN)  -- Store the quarter values in the table variables.
SELECT DISTINCT RelativeQTR ,  RelativeQTR FROM dbo.UIQuarterlyMeasuresV
where RelativeQTR >= 0;  -- WHERE selects all quarters on or after program start.]
(Background on this error at: http://sqlalche.me/e/14/f405)


#### For reference, the dynamic SQL coding above generates and executes the SQL Query code below (remember, this code will produce errors when run in Jupyter Notebooks. See the note above about dynamic SQL):

In [10]:

(17 rows affected)
SELECT SSN, 
SUM(CASE RelativeQTR WHEN -1 THEN QTR_Earnings END) AS PEARN1, 
SUM(CASE RelativeQTR WHEN -1 THEN QTR_Employed END) AS PEMP1, 
SUM(CASE RelativeQTR WHEN 0 THEN QTR_Earnings END) AS EARN0 ,
SUM(CASE RelativeQTR WHEN 0 THEN QTR_Employed END) AS EMP0 , 
SUM(CASE RelativeQTR WHEN 1 THEN QTR_Earnings END) AS EARN1 ,
SUM(CASE RelativeQTR WHEN 1 THEN QTR_Employed END) AS EMP1 , 
SUM(CASE RelativeQTR WHEN 2 THEN QTR_Earnings END) AS EARN2 ,
SUM(CASE RelativeQTR WHEN 2 THEN QTR_Employed END) AS EMP2 , 
SUM(CASE RelativeQTR WHEN 3 THEN QTR_Earnings END) AS EARN3 ,
SUM(CASE RelativeQTR WHEN 3 THEN QTR_Employed END) AS EMP3 , 
SUM(CASE RelativeQTR WHEN 4 THEN QTR_Earnings END) AS EARN4 ,
SUM(CASE RelativeQTR WHEN 4 THEN QTR_Employed END) AS EMP4 , 
SUM(CASE RelativeQTR WHEN 5 THEN QTR_Earnings END) AS EARN5 ,
SUM(CASE RelativeQTR WHEN 5 THEN QTR_Employed END) AS EMP5 , 
SUM(CASE RelativeQTR WHEN 6 THEN QTR_Earnings END) AS EARN6 ,
SUM(CASE RelativeQTR WHEN 6 THEN QTR_Employed END) AS EMP6 , 
SUM(CASE RelativeQTR WHEN 7 THEN QTR_Earnings END) AS EARN7 ,
SUM(CASE RelativeQTR WHEN 7 THEN QTR_Employed END) AS EMP7 , 
SUM(CASE RelativeQTR WHEN 8 THEN QTR_Earnings END) AS EARN8 ,
SUM(CASE RelativeQTR WHEN 8 THEN QTR_Employed END) AS EMP8 , 
SUM(CASE RelativeQTR WHEN 9 THEN QTR_Earnings END) AS EARN9 ,
SUM(CASE RelativeQTR WHEN 9 THEN QTR_Employed END) AS EMP9 , 
SUM(CASE RelativeQTR WHEN 10 THEN QTR_Earnings END) AS EARN10 ,
SUM(CASE RelativeQTR WHEN 10 THEN QTR_Employed END) AS EMP10 , 
SUM(CASE RelativeQTR WHEN 11 THEN QTR_Earnings END) AS EARN11 ,
SUM(CASE RelativeQTR WHEN 11 THEN QTR_Employed END) AS EMP11 , 
SUM(CASE RelativeQTR WHEN 12 THEN QTR_Earnings END) AS EARN12 ,
SUM(CASE RelativeQTR WHEN 12 THEN QTR_Employed END) AS EMP12 , 
SUM(CASE RelativeQTR WHEN 13 THEN QTR_Earnings END) AS EARN13 ,
SUM(CASE RelativeQTR WHEN 13 THEN QTR_Employed END) AS EMP13 , 
SUM(CASE RelativeQTR WHEN 14 THEN QTR_Earnings END) AS EARN14 ,
SUM(CASE RelativeQTR WHEN 14 THEN QTR_Employed END) AS EMP14 , 
SUM(CASE RelativeQTR WHEN 15 THEN QTR_Earnings END) AS EARN15 ,
SUM(CASE RelativeQTR WHEN 15 THEN QTR_Employed END) AS EMP15 , 
SUM(CASE RelativeQTR WHEN 16 THEN QTR_Earnings END) AS EARN16 ,
SUM(CASE RelativeQTR WHEN 16 THEN QTR_Employed END) AS EMP16 
FROM dbo.UIQuarterlyMeasuresV
GROUP BY SSN;

SyntaxError: invalid syntax (<ipython-input-10-3aca9bb24a3d>, line 1)