# Summarizing a Person-Quarter Level File to a Person-Level File: Quarterly Calendar Measures

In [1]:
# This code loads the needed packages and connects to the database.
# Load the sqlalchemy package
import sqlalchemy

# Define the connection string (TDI is the ODBC DSN used to reach the database server)
connection_string = "mssql+pyodbc://@TDI"

# Create the engine connecting to the database server (kept in `engine` so it can be reused)
engine = sqlalchemy.create_engine(connection_string)

# Load the sql magic extension
%load_ext sql

# Connect to the database server
%sql $connection_string


## Purpose: 

The purpose of this code is to demonstrate how to convert a person-quarter level file to a person-level file and add a series of quarterly employment and earnings outcomes.

In the previous notebook (02_restructure_person_quarter), we created a quarterly file with one record per person per quarter (UIQuarterlyMeasuresV). Here, we will summarize the file to 1 record per person (by distinct SSN) with all the information about each person’s history of employment and earnings on 1 record.


#### Starting Point 

Our starting point is UIQuarterlyMeasuresV, the person-quarter view created in the previous notebook (02_restructure_person_quarter). We will pivot (transpose) it so that each person's quarterly values become separate columns on a single record per SSN.

Our source file for the pivot has a record for every possible quarter in which a person could be employed. Each record has the earnings reported, a 0/1 (no/yes) indicator of employment, and the number of employers who reported earnings for the person during the quarter. The code presented here is limited to the 4 calendar quarters in 2017 but can be expanded and adapted for any time frame.

Let's take a look at the transaction file that we will use as input.

In [2]:
%%sql
SELECT TOP 12 *
FROM UIQuarterlyMeasuresV
where LEFT(YR_QTR,4) ='2017' -- selecting just 2017 quarters 
order by SSN, YR_QTR;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,RelativeQTR,QTR_Earnings,QTR_EMPLOYED,QTR_NUMEMPLOYERS
100000000,2017-06-01,2017-11-28,2017Q1,2017-01-01,-1,9214,1,1
100000000,2017-06-01,2017-11-28,2017Q2,2017-04-01,0,8561,1,1
100000000,2017-06-01,2017-11-28,2017Q3,2017-07-01,1,12550,1,1
100000000,2017-06-01,2017-11-28,2017Q4,2017-10-01,2,0,0,0
100900056,2017-01-01,2017-06-30,2017Q1,2017-01-01,0,5624,1,1
100900056,2017-01-01,2017-06-30,2017Q2,2017-04-01,1,4371,1,1
100900056,2017-01-01,2017-06-30,2017Q3,2017-07-01,2,10992,1,1
100900056,2017-01-01,2017-06-30,2017Q4,2017-10-01,3,0,0,0
101800112,2017-08-01,2018-01-28,2017Q1,2017-01-01,-2,0,0,0
101800112,2017-08-01,2018-01-28,2017Q2,2017-04-01,-1,0,0,0


### FLATTEN FILE WITH GROUP BY and CASE: Quarterly Calendar Measures

1. We will reduce multiple records for an SSN to 1 record for the SSN by using the GROUP BY statement, which indicates the level at which you want your created outcomes to be summarized.

2. The CASE statement is a conditional expression. Here, CASE selects which values of YR_QTR to process and which to skip, and SUM adds the earnings for each relevant quarter into a new summary column.
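The two steps above can be tried on a small in-memory example before running them against the real view. The sketch below is illustrative only: it uses Python's built-in `sqlite3` module rather than SQL Server, and the table name `person_quarter` and its rows are made up for the demonstration.

```python
import sqlite3

# Toy person-quarter rows (SSN, YR_QTR, QTR_Earnings); values are illustrative only.
rows = [
    ("A", "2017Q1", 100), ("A", "2017Q2", 0),
    ("B", "2017Q1", 50), ("B", "2017Q2", 75),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_quarter (SSN TEXT, YR_QTR TEXT, QTR_Earnings INTEGER)")
conn.executemany("INSERT INTO person_quarter VALUES (?, ?, ?)", rows)

# GROUP BY collapses the rows to one record per SSN. Each CASE keeps only the
# matching quarter; non-matching rows yield NULL, which SUM ignores.
flat = conn.execute("""
    SELECT SSN,
           SUM(CASE YR_QTR WHEN '2017Q1' THEN QTR_Earnings END) AS EARN2017Q1,
           SUM(CASE YR_QTR WHEN '2017Q2' THEN QTR_Earnings END) AS EARN2017Q2
    FROM person_quarter
    GROUP BY SSN
    ORDER BY SSN
""").fetchall()

print(flat)  # [('A', 100, 0), ('B', 50, 75)]
```

Each person ends up on one row, with one earnings column per quarter.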

In [4]:
%%sql
-- Drop the view if it already exists so that this notebook can be rerun and the view recreated
DROP VIEW IF EXISTS dbo.FlatUIv;

 * mssql+pyodbc://@TDI
Done.


[]

In [5]:
%%sql
CREATE VIEW FlatUIv as
SELECT
SSN,
/* 1. GROUP BY collapses the multiple rows for each SSN into one record */
/* 2. every selected column that is not listed in GROUP BY must be wrapped in an aggregate (summary) function (e.g. SUM below) */

/* calendar quarter earnings measures*/
SUM(CASE YR_QTR WHEN '2017Q1' THEN QTR_Earnings END) AS EARN2017Q1, -- when the WHEN condition is true, the value goes into the 1st column
SUM(CASE YR_QTR WHEN '2017Q2' THEN QTR_Earnings END) AS EARN2017Q2, -- when the WHEN condition is true, the value goes into the 2nd column
SUM(CASE YR_QTR WHEN '2017Q3' THEN QTR_Earnings END) AS EARN2017Q3,
SUM(CASE YR_QTR WHEN '2017Q4' THEN QTR_Earnings END) AS EARN2017Q4,

/* calendar quarter employment measures*/
SUM(CASE YR_QTR WHEN '2017Q1' THEN QTR_Employed END) AS EMP2017Q1,
SUM(CASE YR_QTR WHEN '2017Q2' THEN QTR_Employed END) AS EMP2017Q2,
SUM(CASE YR_QTR WHEN '2017Q3' THEN QTR_Employed END) AS EMP2017Q3,
SUM(CASE YR_QTR WHEN '2017Q4' THEN QTR_Employed END) AS EMP2017Q4


FROM UIQuarterlyMeasuresV /* using quarter table view */
GROUP BY SSN /* this summarizes selected measures by person */
;

 * mssql+pyodbc://@TDI
Done.


[]

#### Print a few cases from view

In [6]:
%%sql
SELECT TOP 10 * FROM FlatUIv
ORDER BY SSN;

 * mssql+pyodbc://@TDI
Done.


SSN,EARN2017Q1,EARN2017Q2,EARN2017Q3,EARN2017Q4,EMP2017Q1,EMP2017Q2,EMP2017Q3,EMP2017Q4
100000000,9214,8561,12550,0,1,1,1,0
100900056,5624,4371,10992,0,1,1,1,0
101800112,0,0,0,0,0,0,0,0
102700168,0,0,0,1324,0,0,0,1
103600224,0,7301,6670,8050,0,1,1,1
104500280,0,6673,0,9720,0,1,0,1
105400336,5122,0,8254,0,1,0,1,0
106300392,0,0,4794,13253,0,0,1,1
107200448,25243,0,8574,0,1,0,1,0
108100504,944,8552,18603,6484,1,1,1,1


#### Check to make sure there are not duplicate SSNs

In [7]:
%%sql
SELECT COUNT(*) as NumRecs, COUNT(distinct SSN) as NumSSNs
FROM FlatUIv;

 * mssql+pyodbc://@TDI
Done.


NumRecs,NumSSNs
1012,1012
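If the two counts ever differ, the next question is which SSNs are repeated. A minimal sketch of that follow-up check is shown below, again using Python's built-in `sqlite3` with a made-up table (`flat`) and made-up rows. The same `GROUP BY ... HAVING` pattern should carry over to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat (SSN TEXT, EARN2017Q1 INTEGER)")
# Illustrative rows only: SSN 'B' appears twice on purpose.
conn.executemany("INSERT INTO flat VALUES (?, ?)", [("A", 100), ("B", 50), ("B", 75)])

# Any SSN that occurs on more than one record is a duplicate.
duplicates = conn.execute("""
    SELECT SSN, COUNT(*) AS NumRecs
    FROM flat
    GROUP BY SSN
    HAVING COUNT(*) > 1
    ORDER BY SSN
""").fetchall()

print(duplicates)  # [('B', 2)]
```

An empty result means every SSN appears exactly once.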


### Automating the code

The code presented above has been written manually for demonstration purposes. However, many of the measures we have created are quarterly measures that are coded the same way for each time snapshot, and you may want to follow longer-term employment trends for people over many years. Creating a single quarterly measure for a 3-year follow-up period, for example, would mean typing 12 nearly identical lines of logic and summation code, one per quarter. This section demonstrates how to generate and run SQL queries automatically, creating the calendar quarterly measures dynamically.

This type of SQL coding is known as **Dynamic SQL**. The result of our coding will be a query that can be run (or executed). In other words, the code below does not produce a SQL result set directly; it produces SQL query code.

##### Note about Dynamic SQL: 
*Jupyter Notebook, the file format we are using to share this code, does not support dynamic SQL code. The code below therefore produces errors when it is executed here. To use this code, copy and paste it into your SQL Server client software.*

Review the CREATE VIEW code above, which created 4 quarters of earnings and employment measures for 2017. Suppose you wanted all the quarters in 2017 and 2018. Notice that all the lines above are similar, but two things change:

1. The value that the WHEN clause tests changes for every quarter: '2017Q1'
2. The name of the measure being created changes for every quarter: EARN2017Q1, EMP2017Q1

Below, the distinct quarter values will be stored in a table variable to drive the creation of our SQL query. By iterating over this variable, the only thing that changes in each line is the value of the quarter. The value of the quarter is tested in the WHEN clause and is appended as a suffix to the column names.
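Because this notebook already runs Python, the same query text can also be built with ordinary string handling, as an alternative to T-SQL dynamic SQL. The sketch below is only an illustration: `build_flatten_query` is a hypothetical helper that is not part of the original code, and the list of quarters is typed in by hand rather than read from the database.

```python
def build_flatten_query(quarters, source="dbo.UIQuarterlyMeasuresV"):
    """Return a SELECT that creates one EARN/EMP column pair per quarter."""
    columns = []
    for qtr in quarters:
        # Quarter labels are pasted into the SQL text, so accept only the
        # expected YYYYQn shape (for example '2017Q1').
        if not (len(qtr) == 6 and qtr[:4].isdigit() and qtr[4] == "Q" and qtr[5] in "1234"):
            raise ValueError(f"unexpected quarter label: {qtr!r}")
        columns.append(f"SUM(CASE YR_QTR WHEN '{qtr}' THEN QTR_Earnings END) AS EARN{qtr}")
        columns.append(f"SUM(CASE YR_QTR WHEN '{qtr}' THEN QTR_Employed END) AS EMP{qtr}")
    # Joining with commas avoids having to trim a trailing comma afterwards.
    return "SELECT SSN,\n" + ",\n".join(columns) + f"\nFROM {source}\nGROUP BY SSN;"


# Hand-written list of the 8 quarters in 2017 and 2018.
quarters = [f"{year}Q{q}" for year in (2017, 2018) for q in range(1, 5)]
query = build_flatten_query(quarters)
print(query)
```

The printed text can be reviewed and then pasted into a SQL client. Building the column list first and joining it with commas is a design choice that removes the need for a separate trailing-comma cleanup step.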

In [None]:
%%sql
/* automation code */
DECLARE @QTR TABLE (QTR VARCHAR(6));-- create a temporary table variable

INSERT INTO @QTR (QTR)
SELECT DISTINCT YR_QTR FROM dbo.UIQuarterlyMeasuresV
WHERE EarnQTR <= '20181231';  -- store the quarter values of interest in the table variable; here we select all quarters in 2018 and earlier

DECLARE @CMD NVARCHAR(MAX); -- create a temporary variable to store our automated sql code

SELECT @CMD = 
'SELECT SSN,'; -- store the start of the query code
SELECT @CMD=@CMD + ' 
SUM(CASE YR_QTR WHEN ''' + QTR + ''' THEN QTR_Earnings END) AS EARN'+QTR+ ' ,'+'
SUM(CASE YR_QTR WHEN ''' + QTR + ''' THEN QTR_Employed END) AS EMP'+QTR+ ' ,'
FROM @QTR; -- append each quarterly code to the query code

SELECT @CMD=SUBSTRING(@CMD,1,LEN(@CMD)-1); -- remove the , at the end of the last quarter line

SELECT @CMD=@CMD+'
FROM dbo.UIQuarterlyMeasuresV
GROUP BY SSN;'; -- append the end of the query code

PRINT @CMD; -- print the query code we generated
EXEC sp_executesql @CMD; -- execute the code we generated


#### The dynamic SQL coding above generated and executed the SQL Query code below:

In [None]:
SELECT SSN, 
SUM(CASE YR_QTR WHEN '2017Q1' THEN QTR_Earnings END) AS EARN2017Q1 ,
SUM(CASE YR_QTR WHEN '2017Q1' THEN QTR_Employed END) AS EMP2017Q1 , 
SUM(CASE YR_QTR WHEN '2017Q2' THEN QTR_Earnings END) AS EARN2017Q2 ,
SUM(CASE YR_QTR WHEN '2017Q2' THEN QTR_Employed END) AS EMP2017Q2 , 
SUM(CASE YR_QTR WHEN '2017Q3' THEN QTR_Earnings END) AS EARN2017Q3 ,
SUM(CASE YR_QTR WHEN '2017Q3' THEN QTR_Employed END) AS EMP2017Q3 , 
SUM(CASE YR_QTR WHEN '2017Q4' THEN QTR_Earnings END) AS EARN2017Q4 ,
SUM(CASE YR_QTR WHEN '2017Q4' THEN QTR_Employed END) AS EMP2017Q4 , 
SUM(CASE YR_QTR WHEN '2018Q1' THEN QTR_Earnings END) AS EARN2018Q1 ,
SUM(CASE YR_QTR WHEN '2018Q1' THEN QTR_Employed END) AS EMP2018Q1 , 
SUM(CASE YR_QTR WHEN '2018Q2' THEN QTR_Earnings END) AS EARN2018Q2 ,
SUM(CASE YR_QTR WHEN '2018Q2' THEN QTR_Employed END) AS EMP2018Q2 , 
SUM(CASE YR_QTR WHEN '2018Q3' THEN QTR_Earnings END) AS EARN2018Q3 ,
SUM(CASE YR_QTR WHEN '2018Q3' THEN QTR_Employed END) AS EMP2018Q3 , 
SUM(CASE YR_QTR WHEN '2018Q4' THEN QTR_Earnings END) AS EARN2018Q4 ,
SUM(CASE YR_QTR WHEN '2018Q4' THEN QTR_Employed END) AS EMP2018Q4 
FROM dbo.UIQuarterlyMeasuresV
GROUP BY SSN;