# Join the TANF Cross-Reference and UI Wage Data to Create a SQL View 

Note to users: please read the instructions file in this folder (00_instructions) before using this Jupyter Notebook file.

## SQL Database Connection

This section loads the needed packages and connects the IPython Jupyter Notebook to the SQL database. If you are running this code in your own environment, remember to modify the SQL connection string to route the notebook to your own SQL server and database (see the 00_instructions file in this folder for more information). Our code uses the SQLAlchemy Python package to interface between Python and SQL, and uses Jupyter SQL ‘magic’ functions to make the code more concise.

In [17]:
# load sqlalchemy package to interface between Python and SQL databases
import sqlalchemy

# Replace the SQL connection string below (in quotation marks) with your own SQL connection information 
connection_string = "mssql+pyodbc://@TDI"

# Create the engine connecting to the database server
sqlalchemy.create_engine(connection_string)

# Load the ipython-sql library to use Jupyter ‘magic’ functions, which make your code more concise 
%load_ext sql

# Connect to the database server
%sql $connection_string

The sql extension is already loaded. To reload it, use:
  %reload_ext sql



## Purpose

The purpose of this code is to join the simulated UI wage records file from a state department of labor to a TANF sample definition file (which we named XRef). The sample definition file includes some of the information typically found in a full TANF cross-reference file, which links individual-level identifiers (like SSNs) to TANF case information. In this scenario, the file contains a list of identifiers for all of our sample members, each with a program start and end date. We use SSNs to link the two files (XRef dataset and UI dataset) so that everyone in our sample is included in the resulting file, even if they did not appear in the UI wage data file. You will need adults who did not work in a given quarter to appear in your data view so that you can account for non-employment in specific quarters when creating your employment-related measures. Since UI wage records files are typically structured as one record per individual-quarter-employer, linking the two files creates a view that has at least one record per quarter for each person in our cross-reference file.

Recall that our simulated sample includes adult TANF recipients who entered the program between January 2017 and December 2020, and we have simulated quarterly wage data for the period Q1 2017 - Q1 2021. Because this follow-up period spans 17 quarters, each SSN will end up with at least 17 records. Even people who were rarely or never employed should end up with at least 17 records; any quarter with no UI wage data will have $0 earnings imputed. People who worked more than one job in any quarter will have more than 17 records. This is accomplished with a two-step join across three inputs: first, we join the TANF cross-reference file to the distinct quarters on the UI wage data file to produce a file (or view) in which each person in our sample has one record per quarter. Then, we join the resulting data set to the full set of data on the UI wage data file.
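
The two-step join described above can be sketched on a toy data set. The example below is a minimal illustration, not the notebook's actual code: it assumes Python's built-in `sqlite3` module with an in-memory database in place of SQL Server, and the three people, two quarters, and wage amounts are invented for illustration.

```python
import sqlite3

# Toy stand-ins for dbo.XRef and dbo.UI (all values are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE XRef (SSN INTEGER, ProgStart TEXT, ProgEnd TEXT);
    CREATE TABLE UI (SSN INTEGER, YR_QTR TEXT, WAGES INTEGER, empid INTEGER);
    INSERT INTO XRef VALUES (1, '2017-01-01', '2017-06-30'),
                            (2, '2017-02-01', '2017-07-31'),
                            (3, '2017-03-01', '2017-08-31');
    -- Person 1 has two employers in 2017Q1; person 2 works only in 2017Q2; person 3 never works.
    INSERT INTO UI VALUES (1, '2017Q1', 500, 10),
                          (1, '2017Q1', 300, 11),
                          (2, '2017Q2', 700, 12);
""")

# Step 1: CROSS JOIN every person to every distinct quarter.
# Step 2: LEFT OUTER JOIN the wage records, imputing 0 where none exist.
rows = conn.execute("""
    SELECT XREF.SSN, QUARTERS.YR_QTR, COALESCE(WAGES.WAGES, 0) AS WAGES, WAGES.empid
    FROM XRef AS XREF
    CROSS JOIN (SELECT DISTINCT YR_QTR FROM UI) AS QUARTERS
    LEFT OUTER JOIN UI AS WAGES
        ON XREF.SSN = WAGES.SSN AND QUARTERS.YR_QTR = WAGES.YR_QTR
    ORDER BY XREF.SSN, QUARTERS.YR_QTR, WAGES.empid
""").fetchall()

for row in rows:
    print(row)
```

With two quarters in the toy wage table, the result has 7 rows: three for person 1 (two employers in 2017Q1 plus one zero-wage row for 2017Q2) and two each for persons 2 and 3. Person 3 never appears in the wage table but still receives one zero-wage row per quarter.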


#### Print a few records from our sample definition file before we start.

XRef is our sample definition file, which includes the person identifier and other relevant information that is important for matching to the UI wage data file and creating relative follow-up time frames (this will be covered in the 04_create_outcomes folder of this repository). This synthetic file includes SSN, a program start date, and a program exit date for each person. ProgStart is the date the client started in the program and ProgEnd is the date the client left the program.


In [18]:
%%sql
SELECT TOP 10 *
FROM dbo.XRef
ORDER BY SSN
;


 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00
100900056,2017-01-01 00:00:00,2017-06-30 00:00:00
101800112,2017-08-01 00:00:00,2018-01-28 00:00:00
102700168,2017-06-02 00:00:00,2017-11-29 00:00:00
103600224,2017-06-03 00:00:00,2017-11-30 00:00:00
104500280,2017-06-04 00:00:00,2017-12-01 00:00:00
105400336,2017-06-05 00:00:00,2017-12-02 00:00:00
106300392,2017-06-06 00:00:00,2017-12-03 00:00:00
107200448,2017-01-06 00:00:00,2017-07-05 00:00:00
108100504,2017-08-07 00:00:00,2018-02-03 00:00:00


#### Count the number of records on the TANF sample definition file. This file has 1 record per client

Make sure that the number of SSNs and the number of records are equal. If they are not, your TANF cross-reference file may contain missing or duplicate SSNs.

In [19]:
%%sql
SELECT count (distinct SSN) as NumberSSN,
        count (*) as NumberRecs
FROM dbo.XRef
/* count the number of SSNs and records on the XRef before we start */
;

 * mssql+pyodbc://@TDI
Done.


NumberSSN,NumberRecs
1012,1012
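
If the two counts ever differ, a query along the following lines can list the offending SSNs. This is a hedged sketch run against a toy in-memory SQLite table with invented values. The `GROUP BY ... HAVING` pattern should carry over to `dbo.XRef` on SQL Server, but it has not been run there.

```python
import sqlite3

# Toy XRef table with one duplicated SSN and one missing SSN (invented values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE XRef (SSN INTEGER, ProgStart TEXT, ProgEnd TEXT);
    INSERT INTO XRef VALUES (1, '2017-01-01', '2017-06-30'),
                            (2, '2017-02-01', '2017-07-31'),
                            (2, '2018-01-01', '2018-06-30'),
                            (NULL, '2017-03-01', '2017-08-31');
""")

# SSNs that appear on more than one record.
duplicate_ssns = conn.execute("""
    SELECT SSN, COUNT(*) AS NumRecs
    FROM XRef
    WHERE SSN IS NOT NULL
    GROUP BY SSN
    HAVING COUNT(*) > 1
    ORDER BY SSN
""").fetchall()

# Records with no SSN at all (COUNT(DISTINCT SSN) ignores these).
missing_ssn_count = conn.execute(
    "SELECT COUNT(*) FROM XRef WHERE SSN IS NULL"
).fetchone()[0]

print(duplicate_ssns)
print(missing_ssn_count)
```

Here the duplicate query returns SSN 2 with two records, and one record has a missing SSN.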


#### Print a few cases from our UI wage file before we start so that we can follow these observations through the data preparation process. For illustrative purposes, we just print 2017 quarters.

In [20]:
%%sql
SELECT TOP 12 * -- This selects the first 12 wage records that appear in the data file with the specified parameters below (i.e., in year 2017)
FROM dbo.UI
WHERE LEFT(YR_QTR,4)='2017'
ORDER BY SSN,YR_QTR
;

 * mssql+pyodbc://@TDI
Done.


SSN,YR_QTR,WAGES,empid
100000000,2017Q1,9214,100000
100000000,2017Q2,8561,100000
100000000,2017Q3,12550,100000
100900056,2017Q1,5624,56100900
100900056,2017Q2,4371,56100900
100900056,2017Q3,10992,56100900
102700168,2017Q4,1324,168102700
103600224,2017Q2,7301,224103600
103600224,2017Q3,6670,224103600
103600224,2017Q4,8050,224103600


#### Count the number of different units (e.g., records, people, employers, quarters, earnings categories, etc.) available in the UI wage data file to make sure they meet our expectations of what is on the file, and as a way to check our code as we create outcomes. 

From the numbers below, for example, we know that we have 912 individuals in our file. This should be fewer than the number of people in the TANF cross-reference file, since some people in the TANF file did not work in a UI-covered job between 2017 and 2021. The 912 figure represents the number of people who received any earnings from UI-covered jobs between Q1 2017 and Q1 2021.

In [21]:
%%sql
SELECT count (distinct YR_QTR) as NumQTR, -- check how many quarters of wages were sent to us on the UI wage file
       count (distinct SSN) as NumSSN, -- check how many people are on UI wage file
       count (distinct empid) as NumEmployers, -- check how many employers reported wages for our clients
       sum (case when wages > 0 then 1 end ) as NumWRKQTR, -- check how many non-zero earnings records we have 
       count (*) as NumRecs -- count how many records are on the wage file. If you received UI wage data from the state, the number should be same as non-zero earnings records. (Note that UI wage data from the National Directory of New Hires (NDNH) may include $0 earnings.)
FROM dbo.UI
;

 * mssql+pyodbc://@TDI
Done.


NumQTR,NumSSN,NumEmployers,NumWRKQTR,NumRecs
17,912,1567,8792,8792


#### Confirm what quarters of data were sent to us on the UI wage data file and that the number of records per quarter are fairly consistent across quarters, with variations that might be expected with local economic conditions. 

As an example, we often see more wage records in the fourth quarter of each year due to seasonal increases in labor demand in the retail sector for the holidays. You might also expect to see dramatic reductions in the number of wage records through most of 2020 due to the pandemic. (Since the example below is based on synthetic data, these expected trends are not apparent.)

The query below produces a list of the quarters that are on the synthetic data file. In this case the file covers quarter 1 of 2017 through quarter 1 of 2021.


In [22]:
%%sql
SELECT YR_QTR, COUNT(*) as NumRecs
FROM dbo.UI
GROUP BY YR_QTR /* GROUP BY collapses multiple records into 1 row per group (here, 1 row per quarter) */
ORDER BY YR_QTR
/* These distinct quarters will next be joined to every SSN on the x-ref file
to create a record for every quarter for each person*/

 * mssql+pyodbc://@TDI
Done.


YR_QTR,NumRecs
2017Q1,530
2017Q2,523
2017Q3,516
2017Q4,507
2018Q1,495
2018Q2,518
2018Q3,504
2018Q4,536
2019Q1,504
2019Q2,506


### Create a person-quarter-level view of UI-covered earnings for everyone in the TANF sample definition file

To create this view, multiple stages of merging, or joining, data views are necessary. This can be done in one SQL statement. However, in order to demonstrate what each of these stages of joins is doing, we first show the SQL code in separate code segments. Then, we demonstrate how to nest these joins to create the desired view using one SQL statement.

#### In this first code segment, we join the TANF sample definition file to the distinct quarters on the UI wage data file to produce a file (or view) in which each person in our sample has at least 1 record per quarter.

In [23]:
%%sql
select top 17 * from -- this creates one row per person for each of the 17 quarters of follow-up that are in the UI wage data file
-- ADDING all RECORDS From the Cross Reference file. 
 ( SELECT SSN, ProgStart, ProgEnd
  FROM  dbo.XRef) AS XREF
  
 CROSS JOIN -- CREATING A CARTESIAN OF EVERY SSN AND EVERY QUARTER. 
           -- EVEN SSNS NOT IN THE UI WAGE DATA FILE WILL HAVE 1 RECORD FOR EVERY QUARTER

-- ADDING 1 RECORD FOR EVERY QUARTER IN THE FOLLOW-UP. USING DATES FROM WAGE FILE.
  (SELECT DISTINCT YR_QTR
  FROM TDI.dbo.UI) AS QUARTERS
    
ORDER BY SSN, YR_QTR;


 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q1
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q2
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q3
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q4
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q1
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q2
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q3
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q4
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2019Q1
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2019Q2


#### The next code segment shows how we join the above records to the full set of data on the UI wage records file.

This join produces a person-employer-quarter-level view. Everyone who did not work, or who worked for at most one employer in each quarter, will have 17 rows of data (one row per quarter). Anyone who had more than one employer in a quarter will have more than 17 rows, since those quarters will have multiple rows, one wage amount for each distinct employer.

In [24]:
%%sql
Select top 17 * from 
-- ADDING all RECORDS From the Cross Reference files. 
 ( SELECT SSN, ProgStart, ProgEnd
  FROM  dbo.XRef) AS XREF
  
 CROSS JOIN -- CREATING A CARTESIAN OF EVERY SSN AND EVERY QUARTER. 
           -- EVEN SSNS NOT IN THE UI WAGE FILE WILL HAVE 1 RECORD FOR EVERY QUARTER

-- ADDING 1 RECORD FOR EVERY QUARTER IN THE FOLLOW-UP. USING DATES FROM WAGE FILE.
  (SELECT DISTINCT YR_QTR
  FROM TDI.dbo.UI) AS QUARTERS
  
-- NOW THAT WE HAVE ONE RECORD FOR EACH PERSON-QUARTER, WE MERGE ON THE UI WAGES AND EMPLOYER IDS
 LEFT OUTER JOIN
 dbo.UI as WAGES 
  on XREF.SSN=WAGES.SSN AND QUARTERS.YR_QTR=WAGES.YR_QTR

ORDER BY XREF.SSN, QUARTERS.YR_QTR
;


 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,SSN_1,YR_QTR_1,WAGES,empid
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q1,100000000.0,2017Q1,9214.0,100000.0
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q2,100000000.0,2017Q2,8561.0,100000.0
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q3,100000000.0,2017Q3,12550.0,100000.0
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2017Q4,,,,
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q1,,,,
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q2,,,,
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q3,,,,
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2018Q4,,,,
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2019Q1,100000000.0,2019Q1,3804.0,100000.0
100000000,2017-06-01 00:00:00,2017-11-28 00:00:00,2019Q2,100000000.0,2019Q2,17082.0,1000000.0


In [25]:
%%sql
DROP VIEW IF EXISTS dbo.UIFULLV 
/* the drop above is executed when the view needs to be recreated */
;

 * mssql+pyodbc://@TDI
Done.


[]

#### Here, we show a longer code segment that performs the steps from the two code segments above and converts variable types to facilitate our analysis. The ProgStart and ProgEnd columns are converted from a datetime type to a date type, and because our YR_QTR column is a text column, we also need to create a date version of that column to perform certain types of calculations. The step below takes care of these column type changes, imputes 0 earnings for any quarter in which a person does not have a UI wage record, and then saves a view.

In [26]:
%%sql
-- THE WAGE FILE ALWAYS CONTAINS ONLY THOSE WHO ARE EMPLOYED. 
-- WE NEED TO CREATE A FILE THAT CONTAINS EVERYONE IN OUR CROSS REFERENCE FILE, 
-- INCLUDING INDIVIDUALS WHO ARE NOT IN THE UI WAGE DATA FILE.
-- THE CODE BELOW IMPUTES ZERO EARNINGS IF THE INDIVIDUAL IS NOT ON THE UI WAGE DATA FILE.

CREATE or ALTER VIEW dbo.UIFULLV as 

SELECT  XREF.SSN, -- The XREFERENCE FILE has DISTINCT SSNs for everyone enrolled
        CAST(XREF.ProgStart AS DATE) AS ProgStart, -- converting from datetime to date data type
        CAST (XREF.ProgEnd AS DATE) AS ProgEnd,  -- converting from datetime to date data type
          QUARTERS.YR_QTR,  -- DISTINCT LIST OF ALL THE WAGE QUARTERS WE COLLECTED
          CASE RIGHT(QUARTERS.YR_QTR,1) -- TO CREATE A DATE FIELD VERSION OF THE COLUMN, SELECT JUST THE QUARTER PORTION OF THE ORIGINAL, CHARACTER VERSION.
        -- THEN, WE CONVERT THE QUARTER TO A DATE BY ADDING ON THE FIRST CALENDAR DATE OF THAT QUARTER (Quarter 1 is set to January 1, etc.)
            WHEN '1' THEN CAST((LEFT(QUARTERS.YR_QTR,4)+'-01-01') AS DATE)  -- WHEN THE 1ST QUARTER SET TO JANUARY
            WHEN '2' THEN CAST((LEFT(QUARTERS.YR_QTR,4)+'-04-01') AS DATE)  -- WHEN 2ND QUARTER SET TO APRIL
            WHEN '3' THEN CAST((LEFT(QUARTERS.YR_QTR,4)+'-07-01') AS DATE)  -- WHEN 3RD QUARTER SET TO JULY
            WHEN '4' THEN CAST((LEFT(QUARTERS.YR_QTR,4)+'-10-01') AS DATE)  -- WHEN 4TH QUARTER SET TO OCTOBER
          END AS EarnQTR, -- creating a date data type version from character column:YR_QTR
          COALESCE(WAGES.WAGES,0) AS WAGES, -- if the person is missing wages for this quarter, we want to impute $0.
          WAGES.empid,  -- if the person is missing wages for this quarter, the empid COLUMN will be NULL.
        inXRef, inQTR, inWage

FROM

-- ADDING all RECORDS From the sample definition file. 
 ( SELECT SSN, ProgStart, ProgEnd, inXRef=1
  FROM  dbo.XRef) AS XREF
  
 CROSS JOIN -- CREATING A CARTESIAN OF EVERY SSN AND EVERY QUARTER. 
           -- EVEN SSNS NOT IN THE UI WAGE FILE WILL HAVE 1 RECORD FOR EVERY QUARTER

-- ADDING 1 RECORD FOR EVERY QUARTER IN THE FOLLOW-UP. USING DATES FROM WAGE FILE.
  (SELECT DISTINCT YR_QTR, inQTR=1
  FROM TDI.dbo.UI) AS QUARTERS
  
-- NOW THAT WE HAVE ONE RECORD FOR EACH QUARTER FOR EACH SSN WE MERGE ON THE ADMIN WAGES AND EMPLOYER IDS
 LEFT OUTER JOIN
 (SELECT *, inWage=1 FROM dbo.UI) as WAGES 
  on XREF.SSN=WAGES.SSN AND QUARTERS.YR_QTR=WAGES.YR_QTR

;



 * mssql+pyodbc://@TDI
Done.


[]
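
For readers who want to check the quarter-to-date logic outside of SQL, the `CASE` expression in the view can be mirrored in a few lines of Python. This is an illustrative sketch only; the function name `yr_qtr_to_date` and the lookup table are hypothetical helpers, not part of the repository.

```python
from datetime import date

# First month of each calendar quarter, matching the CASE expression in the view.
QUARTER_START_MONTH = {"1": 1, "2": 4, "3": 7, "4": 10}


def yr_qtr_to_date(yr_qtr: str) -> date:
    """Convert a label such as '2017Q3' to the first day of that quarter."""
    year = int(yr_qtr[:4])   # same role as LEFT(YR_QTR, 4)
    quarter = yr_qtr[-1]     # same role as RIGHT(YR_QTR, 1)
    return date(year, QUARTER_START_MONTH[quarter], 1)


print(yr_qtr_to_date("2017Q3"))
```

One difference worth noting: the SQL `CASE` expression returns NULL for a label whose last character is not 1 through 4, whereas this sketch raises a `KeyError`.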

#### Print a few records to check creation of our wage view.

In [27]:
%%sql

SELECT TOP 12 *
FROM dbo.UIFULLV
WHERE LEFT(YR_QTR,4) = '2017'
ORDER BY SSN, YR_QTR
/* the rows below show only the 2017 quarters, but every client has a record for each quarter in the follow-up period, whether or not they had any earnings in that quarter */
;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,WAGES,empid,inXRef,inQTR,inWage
100000000,2017-06-01,2017-11-28,2017Q1,2017-01-01,9214,100000.0,1,1,1.0
100000000,2017-06-01,2017-11-28,2017Q2,2017-04-01,8561,100000.0,1,1,1.0
100000000,2017-06-01,2017-11-28,2017Q3,2017-07-01,12550,100000.0,1,1,1.0
100000000,2017-06-01,2017-11-28,2017Q4,2017-10-01,0,,1,1,
100900056,2017-01-01,2017-06-30,2017Q1,2017-01-01,5624,56100900.0,1,1,1.0
100900056,2017-01-01,2017-06-30,2017Q2,2017-04-01,4371,56100900.0,1,1,1.0
100900056,2017-01-01,2017-06-30,2017Q3,2017-07-01,10992,56100900.0,1,1,1.0
100900056,2017-01-01,2017-06-30,2017Q4,2017-10-01,0,,1,1,
101800112,2017-08-01,2018-01-28,2017Q1,2017-01-01,0,,1,1,
101800112,2017-08-01,2018-01-28,2017Q2,2017-04-01,0,,1,1,


#### Check the results of the join:
Make sure: 
1. The number of people stayed the same (1,012) from the sample definition file (XRef).
2. The number of employers stayed the same (1,567) from the wage file (UI).
3. The number of quarters with wages greater than zero stayed the same (8,792) from the wage file.

In [28]:
%%sql
-- CHECKING HOW MANY RECORDS, SSNS, EMPLOYERS AND QUARTERS WE HAVE FROM THIS NEW FULL DATA FILE.
SELECT  COUNT(*) AS NumWageRec, -- count how many records we have now
        COUNT(DISTINCT SSN) AS NumSAMPLE, -- count that we still have the 1,012 clients we started with on the XRef file
        COUNT(DISTINCT empid) AS NumEMPLOYERS, -- count how many employers are on the file
        COUNT(CASE WHEN WAGES >0 THEN 1 END) AS WagesGT0, -- count how many quarters with earnings we have
        COUNT(DISTINCT YR_QTR) AS NumQTR,  -- count how many distinct quarters are on the file
        SUM(inXRef) as NumRecInXREF, -- number of records in XRef
        SUM(inQTR) as NumRecInQTR, -- number of records in Quarters
        SUM(inWage) as NumRecInWage -- number of records in Wage
FROM dbo.UIFULLV


;

 * mssql+pyodbc://@TDI
Done.


NumWageRec,NumSAMPLE,NumEMPLOYERS,WagesGT0,NumQTR,NumRecInXREF,NumRecInQTR,NumRecInWage
17764,1012,1567,8792,17,17764,17764,8792
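
The total record count can also be reconciled with simple arithmetic. The sketch below uses only the counts reported by the queries in this notebook and assumes that every UI wage record belongs to someone in XRef, which the NumRecInWage value of 8,792 suggests. The variable names are illustrative.

```python
# Counts reported by the queries in this notebook.
num_people = 1012       # distinct SSNs in XRef
num_quarters = 17       # distinct YR_QTR values in UI
num_view_rows = 17764   # COUNT(*) on dbo.UIFULLV
num_wage_rows = 8792    # COUNT(*) on dbo.UI

# Rows we would see if nobody held more than one job in a quarter.
person_quarters = num_people * num_quarters

# Additional rows contributed by second (or later) employers in a quarter.
extra_employer_rows = num_view_rows - person_quarters

# Person-quarters with at least one wage record, and rows with imputed $0 wages.
employed_person_quarters = num_wage_rows - extra_employer_rows
zero_wage_rows = num_view_rows - num_wage_rows

print(person_quarters, extra_employer_rows, employed_person_quarters, zero_wage_rows)
```

Under that assumption, 1,012 people times 17 quarters gives 17,204 person-quarters, so 560 of the 17,764 rows come from additional employers within a quarter. The remaining split is 8,232 person-quarters with earnings and 8,972 zero-wage rows, which together account for all 17,204 person-quarters.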


#### Print one example case of a person who was never employed
This person would be on the sample definition file but not on the wage file.

In [29]:
%%sql
select TOP 1 * FROM dbo.XRef -- LOOKING FOR 1 example SSN IN XREF BUT NOT IN WAGE FILE
WHERE SSN NOT IN
(SELECT SSN FROM dbo.UI)  -- ANY SSN NOT ON UI WAGE FILE
ORDER BY SSN;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd
200000009,2017-08-21 00:00:00,2018-02-17 00:00:00


#### Next print how this case looks on the new VIEW created.

In [30]:
%%sql
SELECT TOP 17 *
FROM dbo.UIFULLV -- PRINTING SSNS THAT ARE IN THE XREF BUT NOT IN THE WAGE FILE TO SEE HOW THEY LOOK ON OUR NEW VIEW
WHERE SSN IN
(select SSN
from dbo.UIFULLV
GROUP BY SSN
HAVING SUM(WAGES)= 0) -- SUB-QUERY THAT ONLY SELECTS THOSE WITH TOTAL WAGES OF ZEROS
ORDER BY SSN, YR_QTR
;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,WAGES,empid,inXRef,inQTR,inWage
200000009,2017-08-21,2018-02-17,2017Q1,2017-01-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2017Q2,2017-04-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2017Q3,2017-07-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2017Q4,2017-10-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2018Q1,2018-01-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2018Q2,2018-04-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2018Q3,2018-07-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2018Q4,2018-10-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2019Q1,2019-01-01,0,,1,1,
200000009,2017-08-21,2018-02-17,2019Q2,2019-04-01,0,,1,1,


#### Now check that all the cases not on the original UI wage data file have 0 wages on the new view file.

In [31]:
%%sql
select SUM(WAGES) as "Total Wages for Those Not On UI Wage file"
FROM dbo.UIFULLV 
WHERE SSN in
(select SSN FROM dbo.XRef /* SELECT SSN on XREF file that are not on UI wage file*/
EXCEPT
select SSN FROM dbo.UI)
;

 * mssql+pyodbc://@TDI
Done.


Total Wages for Those Not On UI Wage file
0


#### Print 2 example cases of people who worked for more than one employer in the same quarter

In [32]:
%%sql
SELECT TOP 10 * FROM dbo.UIFULLV 
WHERE LEFT(YR_QTR,4) = '2017' AND SSN IN
(select SSN
from dbo.UI
WHERE LEFT(YR_QTR,4) = '2017'
GROUP BY SSN, YR_QTR -- find cases with multiple records in the same quarter in 2017
HAVING COUNT(*) >1)
ORDER BY SSN, YR_QTR
;

 * mssql+pyodbc://@TDI
Done.


SSN,ProgStart,ProgEnd,YR_QTR,EarnQTR,WAGES,empid,inXRef,inQTR,inWage
107200448,2017-01-06,2017-07-05,2017Q1,2017-01-01,11349,481072004.0,1,1,1.0
107200448,2017-01-06,2017-07-05,2017Q1,2017-01-01,13894,448107200.0,1,1,1.0
107200448,2017-01-06,2017-07-05,2017Q2,2017-04-01,0,,1,1,
107200448,2017-01-06,2017-07-05,2017Q3,2017-07-01,8574,448107200.0,1,1,1.0
107200448,2017-01-06,2017-07-05,2017Q4,2017-10-01,0,,1,1,
109000560,2017-06-07,2017-12-04,2017Q1,2017-01-01,0,,1,1,
109000560,2017-06-07,2017-12-04,2017Q2,2017-04-01,1121,560109000.0,1,1,1.0
109000560,2017-06-07,2017-12-04,2017Q3,2017-07-01,11331,560109000.0,1,1,1.0
109000560,2017-06-07,2017-12-04,2017Q3,2017-07-01,5070,601090005.0,1,1,1.0
109000560,2017-06-07,2017-12-04,2017Q4,2017-10-01,0,,1,1,


#### Those with more than 1 job in a quarter should have more than 17 records on the new view file. Note that in the example above, SSN 107200448 has two wage records in 2017Q1, and SSN 109000560 has two wage records in 2017Q3. The following output identifies all individuals who have more than one wage record in at least one quarter.

In [33]:
%%sql
SELECT distinct SSN, COUNT(*) as "Number of Records for those who work more than one job in a quarter"
FROM dbo.UIFULLV
WHERE SSN IN
    (SELECT distinct SSN -- identify SSN with more than 1 employer in a quarter
    FROM dbo.UI
    GROUP BY SSN, YR_QTR
    HAVING COUNT(*) > 1)
GROUP BY SSN
order by 1;

 * mssql+pyodbc://@TDI
Done.


SSN,Number of Records for those who work more than one job in a quarter
100000000,18
101800112,18
105400336,18
107200448,19
109000560,18
110800672,19
111700728,18
112600784,18
115300952,18
116201008,18
