# Employer-Based Measures

## Purpose

Sometimes we want to focus on a client's employers or on a particular employment sector. While our simulated data does not have employment sector codes, we can create a few measures focused on a client's employers, such as:

1. How many employers did the client work for during the quarter?
2. What are the earnings from each employer?
3. Which employer was the client's top employer based on earnings?
4. Which was the top employer in terms of amount of time worked in the year?

In [1]:
# This code loads the needed packages and connects to the database.
# Load the sqlalchemy package
import sqlalchemy

# Define connection string (TDI is the ODBC DSN name)
connection_string = "mssql+pyodbc://@TDI"

# Create the engine connecting to the database server
engine = sqlalchemy.create_engine(connection_string)

# Load sql magics
%load_ext sql

# Connect to the database server
%sql $connection_string

## Finding a good example case.
The code below hunts for a good case to use as a demonstration. It looks for a client who worked for two or more employers in the same quarter during at least four quarters.

In [2]:
%%sql
DROP VIEW IF EXISTS dbo.TESTCASE 
/* the drop above is executed when the view needs to be recreated */
;

 * mssql+pyodbc://@TDI
Done.


[]

In [3]:
%%sql 
CREATE VIEW dbo.TESTCASE AS 

-- THIS CODE SEARCHES FOR A CLIENT WITH AT LEAST 4 QUARTERS WITH 2 OR MORE EMPLOYERS

WITH MULTIqEMP AS ( -- FIND QUARTERS WITH 2 OR MORE EMPLOYERS
    SELECT SSN, YR_QTR, COUNT(*) AS NUMEMPID FROM dbo.UIFULLV GROUP BY SSN, YR_QTR HAVING COUNT(*)>1
),

MULTIyEMP AS ( -- FIND CLIENTS WITH AT LEAST 4 SUCH QUARTERS
    SELECT SSN,COUNT(*) AS SEVERAL FROM MULTIqEMP GROUP BY SSN HAVING COUNT(*)>3
)

SELECT SSN FROM MULTIyEMP;


 * mssql+pyodbc://@TDI
Done.


[]
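
The two-step filter in the view can be traced in plain Python. The sketch below is only an illustration on made-up rows (the client labels `"A"` and `"B"` and the employer IDs are hypothetical, not values from `dbo.UIFULLV`); it mirrors the `HAVING COUNT(*)>1` and `HAVING COUNT(*)>3` conditions.

```python
from collections import Counter

# Hypothetical sample rows: (ssn, yr_qtr, empid); empid is None when no employer was reported
rows = [
    ("A", "2019Q1", 1), ("A", "2019Q1", 2),
    ("A", "2019Q2", 1),
    ("A", "2019Q3", 1), ("A", "2019Q3", 2),
    ("A", "2020Q3", 1), ("A", "2020Q3", 2),
    ("A", "2020Q4", 1), ("A", "2020Q4", 2),
    ("B", "2019Q1", 1), ("B", "2019Q1", 3),
    ("B", "2019Q2", None),
]

# Step 1 (MULTIqEMP): count records per client-quarter and keep those with more than one
records_per_quarter = Counter((ssn, qtr) for ssn, qtr, _ in rows)
multi_quarters = [key for key, n in records_per_quarter.items() if n > 1]

# Step 2 (MULTIyEMP): count such quarters per client and keep clients with more than three
quarters_per_client = Counter(ssn for ssn, _ in multi_quarters)
test_cases = sorted(ssn for ssn, n in quarters_per_client.items() if n > 3)

print(test_cases)
```

Client `"A"` has four quarters with two records each and is kept; client `"B"` has only one such quarter and is dropped.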

## Starting Point

Let's look at a client with more than one employer. Notice that quarters 1 and 3 of 2019 and quarters 3 and 4 of 2020 have two different employers within the same quarter. 

In [4]:
%%sql
SELECT *
FROM  dbo.UIFULLV
WHERE SSN IN (SELECT SSN FROM dbo.TESTCASE) 
ORDER BY EarnQTR;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,WAGES,empid
250309352,2017-10-10,2018-04-08,2017Q1,2017-01-01,8554,352250309.0
250309352,2017-10-10,2018-04-08,2017Q2,2017-04-01,0,
250309352,2017-10-10,2018-04-08,2017Q3,2017-07-01,9734,352250309.0
250309352,2017-10-10,2018-04-08,2017Q4,2017-10-01,5864,352250309.0
250309352,2017-10-10,2018-04-08,2018Q1,2018-01-01,0,
250309352,2017-10-10,2018-04-08,2018Q2,2018-04-01,823,352250309.0
250309352,2017-10-10,2018-04-08,2018Q3,2018-07-01,0,
250309352,2017-10-10,2018-04-08,2018Q4,2018-10-01,0,
250309352,2017-10-10,2018-04-08,2019Q1,2019-01-01,7091,352250309.0
250309352,2017-10-10,2018-04-08,2019Q1,2019-01-01,936,522503093.0


In [5]:
%%sql
DROP TABLE IF EXISTS dbo.EmpRANK /* if table needs to be recreated */;

 * mssql+pyodbc://@TDI
Done.


[]

## Creating Quarterly Measures
### Creating a file with rankings for employers in each quarter based on wages
Here we compute the total wages for each quarter and rank each client's employers within the quarter by wages paid, highest first.

In [6]:
%%sql 
-- create a file with the rankings

select SSN, 
       YR_QTR, 
       empid,
       WAGES,
       ROW_NUMBER() OVER(PARTITION BY SSN, YR_QTR ORDER BY WAGES DESC) AS EmpRANK, -- rank employers within the quarter by wages
       SUM(WAGES) OVER(PARTITION BY SSN, YR_QTR) AS QTR_WAGES -- total wage per Quarter
INTO dbo.EmpRANK -- creating a file
FROM dbo.UIFULLV 
;

 * mssql+pyodbc://@TDI
17764 rows affected.


[]
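
For readers less familiar with window functions, the following is a rough Python equivalent of the ranking step. It is a sketch rather than the notebook's method: the three input rows are copied from the example client's output, and the variable names are chosen only for illustration.

```python
from collections import defaultdict

# Rows for the example client: (ssn, yr_qtr, empid, wages)
rows = [
    (250309352, "2018Q2", 352250309, 823),
    (250309352, "2019Q1", 522503093, 936),
    (250309352, "2019Q1", 352250309, 7091),
]

# Group rows by the partition key (SSN, YR_QTR)
partitions = defaultdict(list)
for ssn, qtr, empid, wages in rows:
    partitions[(ssn, qtr)].append((empid, wages))

ranked = []
for (ssn, qtr), members in sorted(partitions.items()):
    qtr_wages = sum(w for _, w in members)            # like SUM(WAGES) OVER (PARTITION BY ...)
    members.sort(key=lambda m: m[1], reverse=True)    # like ORDER BY WAGES DESC
    for rank, (empid, wages) in enumerate(members, start=1):  # like ROW_NUMBER()
        ranked.append((ssn, qtr, empid, wages, rank, qtr_wages))

for row in ranked:
    print(row)
```

Every row in a partition carries the same quarterly total, while the rank restarts at 1 for each client-quarter.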

#### Let's Print Our Example Case
Notice:
1. Now quarters with more than 1 employer have the employers ranked by wages (Quarters 1 and 3 of 2019 and Quarters 3 and 4 of 2020 are ranked).
2. We also now have a total wage for each quarter.

In [7]:
%%sql
-- print our test case on the ranking file
SELECT * 
FROM dbo.EmpRANK
WHERE SSN IN (SELECT SSN FROM dbo.TESTCASE)
order by ssn,YR_QTR
;

 * mssql+pyodbc://@TDI
Done.


SSN,YR_QTR,empid,WAGES,EmpRANK,QTR_WAGES
250309352,2017Q1,352250309.0,8554,1,8554
250309352,2017Q2,,0,1,0
250309352,2017Q3,352250309.0,9734,1,9734
250309352,2017Q4,352250309.0,5864,1,5864
250309352,2018Q1,,0,1,0
250309352,2018Q2,352250309.0,823,1,823
250309352,2018Q3,,0,1,0
250309352,2018Q4,,0,1,0
250309352,2019Q1,352250309.0,7091,1,8027
250309352,2019Q1,522503093.0,936,2,8027


### Summarizing/Standardizing the File
It is best to have a predictable number of records per client, so let's first standardize the file to 1 record per quarter per client.

In [8]:
%%sql
SELECT 
    SSN, 
    YR_QTR, 
    MAX(CASE WHEN empid IS NOT NULL THEN EmpRANK else 0 END) AS NumberOfEmployers,
    MAX(CASE WHEN  EmpRANK= 1 and empid IS NOT NULL THEN WAGES END) AS EarningsPrimaryEmp,
    MAX(CASE WHEN  EmpRANK= 1 and empid IS NOT NULL and WAGES>0 THEN 1 else 0 END) AS EmployedPrimary,
    MAX(CASE EmpRANK WHEN 2 THEN WAGES END) AS EarningsSecondaryEmp,
    MAX(CASE WHEN  EmpRANK=2 and WAGES>0 THEN 1 else 0 END) AS EmployedSecondary,
    MAX(CASE EmpRANK WHEN 1 THEN empid END) AS PrimaryEmpID,
    MAX(CASE EmpRANK WHEN 2 THEN empid END) AS SecondaryEmpID

FROM dbo.EmpRank
WHERE SSN IN (SELECT SSN FROM dbo.TESTCASE)
GROUP BY SSN, YR_QTR
ORDER BY SSN, YR_QTR

;

 * mssql+pyodbc://@TDI
Done.


SSN,YR_QTR,NumberOfEmployers,EarningsPrimaryEmp,EmployedPrimary,EarningsSecondaryEmp,EmployedSecondary,PrimaryEmpID,SecondaryEmpID
250309352,2017Q1,1,8554.0,1,,0,352250309.0,
250309352,2017Q2,0,,0,,0,,
250309352,2017Q3,1,9734.0,1,,0,352250309.0,
250309352,2017Q4,1,5864.0,1,,0,352250309.0,
250309352,2018Q1,0,,0,,0,,
250309352,2018Q2,1,823.0,1,,0,352250309.0,
250309352,2018Q3,0,,0,,0,,
250309352,2018Q4,0,,0,,0,,
250309352,2019Q1,2,7091.0,1,936.0,1,352250309.0,522503093.0
250309352,2019Q2,1,7694.0,1,,0,352250309.0,
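
The `MAX(CASE ...)` pattern used to flatten the file can be tried locally without a SQL Server connection. The sketch below swaps in Python's built-in `sqlite3` module and a hypothetical in-memory table named `emp_rank_sample`, loaded with three rows from the example client; only a subset of the measures is reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_rank_sample "
    "(SSN INTEGER, YR_QTR TEXT, empid INTEGER, WAGES INTEGER, EmpRANK INTEGER)"
)
conn.executemany(
    "INSERT INTO emp_rank_sample VALUES (?, ?, ?, ?, ?)",
    [
        (250309352, "2018Q1", None, 0, 1),            # quarter with no employer
        (250309352, "2019Q1", 352250309, 7091, 1),    # top employer by wages
        (250309352, "2019Q1", 522503093, 936, 2),     # second employer
    ],
)

# Each CASE returns a value only for the matching rank; MAX ignores the NULLs,
# so every rank collapses into its own column on a single row per quarter.
query = """
SELECT SSN, YR_QTR,
       MAX(CASE WHEN empid IS NOT NULL THEN EmpRANK ELSE 0 END) AS NumberOfEmployers,
       MAX(CASE WHEN EmpRANK = 1 AND empid IS NOT NULL THEN WAGES END) AS EarningsPrimaryEmp,
       MAX(CASE WHEN EmpRANK = 2 THEN WAGES END) AS EarningsSecondaryEmp
FROM emp_rank_sample
GROUP BY SSN, YR_QTR
ORDER BY SSN, YR_QTR
"""
flat = conn.execute(query).fetchall()
for row in flat:
    print(row)
conn.close()
```

The quarter without an employer yields a count of 0 and empty earnings columns, and the two 2019Q1 rows collapse into one record.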


### Automating the Code Above

The code presented above has been written manually for demonstration purposes. However, many of the measures we have created are coded the same way for each employer, and you may want to follow multiple employers. This section demonstrates how to generate and run automated SQL queries and create employer-based measures dynamically.

This type of SQL coding is known as **Dynamic SQL**. The code we write below does not describe a result set directly; instead, it builds the text of a query, which can then be run (or executed).

##### Note about Dynamic SQL: 
*Jupyter Notebooks, the file type we are using to share this code, do not support dynamic SQL code. The code below therefore produces errors when it is executed here. To use this code, copy and paste it into your own SQL Server software.*

The only thing that changes on each line is the value of the employer rank. The rank is tested in the WHEN clause and is also appended as a suffix to the column names. Below, a loop counter variable (@i) steps through the employer rank values to drive the creation of our SQL query.

In [None]:
%%sql
DECLARE @sql NVARCHAR(MAX);  -- a temporary variable to store the query code we generate
DECLARE @i int = 1;  -- the loop counter, starting at 1

SELECT @sql = 
'SELECT 
    SSN, 
    YR_QTR, 
    SUM(CASE WHEN empid IS NOT NULL THEN 1 ELSE 0 END) AS NumberOfEmployers,'; -- the top of the query, which is the same for every employer
    
/* looping to create 3 measures for each of 3 employers */
WHILE @i<=3  
BEGIN
    SELECT @sql = @sql +'
    MAX(CASE WHEN  EmpRANK=' +cast(@i as NCHAR(1)) + ' THEN WAGES END) AS EarningsEmployer'+cast(@i as NCHAR(1))+','+'
    MAX(CASE WHEN  EmpRANK=' +cast(@i as NCHAR(1)) + ' and WAGES>0 THEN 1 ELSE 0 END) AS EmployedEmployer'+cast(@i as NCHAR(1))+','+'
    MAX(CASE EmpRANK WHEN ' +cast(@i as NCHAR(1)) +' THEN empid END) AS EmpIDEmployer'+cast(@i as NCHAR(1))+',';
    SELECT @i=@i+1;
END;
SELECT @sql=SUBSTRING(@sql,1,LEN(@sql)-1); -- remove the trailing comma from the last line

SELECT @sql=@sql+'
FROM dbo.EmpRank
GROUP BY SSN, YR_QTR;'; -- append the end of the query

PRINT @sql; -- print the query that was generated
EXEC sp_executesql @sql; -- execute the generated query

#### The dynamic SQL coding above generated and executed the SQL Query code below:

In [None]:
SELECT 
    SSN, 
    YR_QTR, 
    SUM(CASE WHEN empid IS NOT NULL THEN 1 ELSE 0 END) AS NumberOfEmployers,
    MAX(CASE WHEN  EmpRANK=1 THEN WAGES END) AS EarningsEmployer1,
    MAX(CASE WHEN  EmpRANK=1 and WAGES>0 THEN 1 ELSE 0 END) AS EmployedEmployer1,
    MAX(CASE EmpRANK WHEN 1 THEN empid END) AS EmpIDEmployer1,
    MAX(CASE WHEN  EmpRANK=2 THEN WAGES END) AS EarningsEmployer2,
    MAX(CASE WHEN  EmpRANK=2 and WAGES>0 THEN 1 ELSE 0 END) AS EmployedEmployer2,
    MAX(CASE EmpRANK WHEN 2 THEN empid END) AS EmpIDEmployer2,
    MAX(CASE WHEN  EmpRANK=3 THEN WAGES END) AS EarningsEmployer3,
    MAX(CASE WHEN  EmpRANK=3 and WAGES>0 THEN 1 ELSE 0 END) AS EmployedEmployer3,
    MAX(CASE EmpRANK WHEN 3 THEN empid END) AS EmpIDEmployer3
FROM dbo.EmpRank
GROUP BY SSN, YR_QTR;
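
Because this notebook already runs Python, one possible alternative is to assemble the same query text in Python instead of T-SQL. The sketch below is an assumption-laden illustration: the function name `build_employer_query` and its parameters `max_rank` and `table` are hypothetical, and the result is only printed, not executed.

```python
def build_employer_query(max_rank: int = 3, table: str = "dbo.EmpRank") -> str:
    """Assemble the flattening query for employer ranks 1..max_rank (hypothetical helper)."""
    columns = [
        "SUM(CASE WHEN empid IS NOT NULL THEN 1 ELSE 0 END) AS NumberOfEmployers"
    ]
    # The rank value appears in the comparison and as the column-name suffix
    for i in range(1, max_rank + 1):
        columns.append(f"MAX(CASE WHEN EmpRANK={i} THEN WAGES END) AS EarningsEmployer{i}")
        columns.append(
            f"MAX(CASE WHEN EmpRANK={i} AND WAGES>0 THEN 1 ELSE 0 END) AS EmployedEmployer{i}"
        )
        columns.append(f"MAX(CASE EmpRANK WHEN {i} THEN empid END) AS EmpIDEmployer{i}")
    # Joining with commas means no trailing comma has to be trimmed afterwards
    select_list = ",\n    ".join(["SSN", "YR_QTR"] + columns)
    return f"SELECT\n    {select_list}\nFROM {table}\nGROUP BY SSN, YR_QTR;"


sql_text = build_employer_query(3)
print(sql_text)
```

Joining a list of column expressions replaces the `SUBSTRING` step that strips the last comma in the T-SQL version. Only an integer is interpolated into the text here; values that come from users should not be formatted into SQL this way.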

## Yearly Measures
Let's also summarize the records by year to look at the amount of time our client was employed by each employer during each year.

To do this, we must first:
1. Count the number of quarters our client worked for each employer each year.
2. Calculate the percent of the year during which our client worked for each employer.

In [9]:
%%sql
SELECT  SSN, 
        Year(EarnQTR) as Year, 
        empid,
        SUM(CASE WHEN WAGES>0 THEN 1 ELSE 0 END) AS EMPQTRS, -- count the number of quarters in year worked per employer
        cast(SUM(CASE WHEN WAGES>0 THEN 1 ELSE 0 END) as float)/4*100.0 AS PercentQemp -- Percent of year worked per emp
FROM  dbo.UIFULLV
WHERE SSN IN (SELECT SSN FROM dbo.TESTCASE) 
GROUP BY SSN, Year(EarnQTR), empid
ORDER BY Year, EMPQTRS desc;

 * mssql+pyodbc://@TDI
Done.


SSN,Year,empid,EMPQTRS,PercentQemp
250309352,2017,352250309.0,3,75.0
250309352,2017,,0,0.0
250309352,2018,352250309.0,1,25.0
250309352,2018,,0,0.0
250309352,2019,352250309.0,3,75.0
250309352,2019,522503093.0,2,50.0
250309352,2019,,0,0.0
250309352,2020,352250309.0,2,50.0
250309352,2020,522503093.0,2,50.0
250309352,2020,,0,0.0
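
The arithmetic behind `EMPQTRS` and `PercentQemp` is simple enough to check by hand. The sketch below redoes it in Python for the example client's four 2017 rows; the variable names are illustrative only.

```python
from collections import defaultdict

# 2017 rows for the example client: (yr_qtr, empid, wages); None means no employer reported
rows = [
    ("2017Q1", 352250309, 8554),
    ("2017Q2", None, 0),
    ("2017Q3", 352250309, 9734),
    ("2017Q4", 352250309, 5864),
]

# Count quarters with positive wages per (year, employer)
quarters_worked = defaultdict(int)
for yr_qtr, empid, wages in rows:
    year = int(yr_qtr[:4])
    quarters_worked[(year, empid)] += 1 if wages > 0 else 0

# Percent of the year worked = quarters worked / 4 * 100
summary = {key: (n, n / 4 * 100.0) for key, n in quarters_worked.items()}
for key, value in summary.items():
    print(key, value)
```

Three quarters with wages out of four gives 75.0 percent for the employer, and the row without an employer counts zero quarters.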


#### Rank and Flatten
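The code below then ranks each client's employers for the year by quarters worked (not wages) and flattens the file to one record per client per year. Because `ROW_NUMBER()` is given no tie-breaker here, employers with the same number of quarters (as in 2020 for our example case) are ranked in an arbitrary order.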

In [10]:
%%sql
with YearEmp as(
SELECT  SSN, 
        Year(EarnQTR) as Year, 
        empid,
        SUM(CASE WHEN WAGES>0 THEN 1 ELSE 0 END) AS EMPQTRS, -- count the number of quarters in year worked per employer
        cast(SUM(CASE WHEN WAGES>0 THEN 1 ELSE 0 END) as float)/4*100.0 AS PercentQemp -- Percent of year worked per emp
FROM  dbo.UIFULLV
WHERE SSN IN (SELECT SSN FROM dbo.TESTCASE) 
GROUP BY SSN, Year(EarnQTR), empid
),
EmpRank as (
SELECT *,
   ROW_NUMBER() OVER(PARTITION BY SSN, Year ORDER BY EMPQTRS DESC) AS EmpRANK
FROM YearEmp
)
SELECT 
    SSN, 
    Year, 
    MAX(CASE WHEN empid IS NOT NULL THEN EmpRANK else 0 END) AS NumberOfEmployers,
    MAX(CASE WHEN  EmpRANK= 1 and empid IS NOT NULL THEN PercentQemp END) AS PercentYEmpPrimaryEmp,
    MAX(CASE EmpRANK WHEN 2 THEN  PercentQemp END) AS PercentYEmpSecondaryEmp,
    MAX(CASE EmpRANK WHEN 1 THEN empid END) AS PrimaryEmpID,
    MAX(CASE EmpRANK WHEN 2 THEN empid END) AS SecondaryEmpID
FROM EmpRank
WHERE SSN IN (SELECT SSN FROM dbo.TESTCASE)
GROUP BY SSN, Year
;

 * mssql+pyodbc://@TDI
Done.


SSN,Year,NumberOfEmployers,PercentYEmpPrimaryEmp,PercentYEmpSecondaryEmp,PrimaryEmpID,SecondaryEmpID
250309352,2017,1,75.0,0.0,352250309,
250309352,2018,1,25.0,0.0,352250309,
250309352,2019,2,75.0,50.0,352250309,522503093.0
250309352,2020,2,50.0,50.0,352250309,522503093.0
250309352,2021,1,25.0,,352250309,
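
When two employers have the same number of quarters, the yearly ranking depends on how ties are broken. The Python sketch below ranks the example client's 2019 and 2020 employers and adds an explicit tie-breaker; sorting ties by `empid` is an assumption made purely for illustration (in the SQL it would correspond to adding `empid` to the `ORDER BY` of the window), not a rule taken from the notebook.

```python
# Yearly quarter counts from the example output: (year, empid, quarters_worked)
year_emp = [
    (2019, 352250309, 3),
    (2019, 522503093, 2),
    (2020, 522503093, 2),
    (2020, 352250309, 2),
]

flattened = {}
for year in sorted({y for y, _, _ in year_emp}):
    members = [(e, q) for y, e, q in year_emp if y == year]
    # Most quarters first; ties go to the lower empid (illustrative assumption)
    members.sort(key=lambda m: (-m[1], m[0]))
    flattened[year] = {
        "NumberOfEmployers": len(members),
        "PrimaryEmpID": members[0][0],
        "PercentYEmpPrimaryEmp": members[0][1] / 4 * 100.0,
        "SecondaryEmpID": members[1][0] if len(members) > 1 else None,
    }

for year, record in flattened.items():
    print(year, record)
```

With a tie-breaker in place the 2020 result no longer depends on the order in which the rows happen to be read.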
