# Basic Monthly Current Population Survey (CPS) Data Analysis, Changes in Income and Employment Based on Race
- Ursaminor Jupyter Notebook
- 20 November 2020
- By Barnett Yang

## Goals
- Analyze raw CPS data and partition data according to race.
- Track changes in income and employment based on race across the COVID-19 pandemic (the time period of interest is January to September 2020).
- Form conclusions regarding the impacts of the economic fallout of the COVID-19 pandemic on academic performance based on race.

## Miscellaneous Notes and Libraries
- CPS datasets can be found on this website under the CSV tab: https://data.nber.org/data/cps-basic2/.
- The data dictionary for the relevant datasets can be found here: https://www2.census.gov/programs-surveys/cps/datasets/2020/basic/2020_Basic_CPS_Public_Use_Record_Layout_plus_IO_Code_list.txt.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Indicators and Relevant Variables

### Demographic Variables
- ptdtrace: Race
    - For simplicity, and to be consistent with the analysis done for the Household Pulse Survey and High School Longitudinal Study, we exclude respondents of mixed race for now and will include them only if time permits.
    - 01: White only.
    - 02: Black only.
    - 03: American Indian, Alaskan Native only.
    - 04: Asian only.
    - 05: Hawaiian/Pacific Islander only.
- pehspnon: Hispanic or non-Hispanic
    - For simplicity, and to be consistent with the analysis done for the Household Pulse Survey and High School Longitudinal Study, we exclude respondents of mixed race for now and will include them only if time permits.
    - 1: Hispanic.
    - 2: Non-Hispanic.
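
The codes above could be turned into readable labels along the following lines. This is a minimal sketch, not the notebook's own method: `RACE_LABELS`, `label_race_ethnicity`, and the toy DataFrame are hypothetical names introduced for illustration. It assumes both columns are read as integers and that any `ptdtrace` code above 5 is one of the multiracial combinations in the data dictionary, which the sketch drops in line with the note above.

```python
import pandas as pd

# Hypothetical label map for the single-race codes listed above.
RACE_LABELS = {
    1: "White",
    2: "Black",
    3: "American Indian/Alaskan Native",
    4: "Asian",
    5: "Hawaiian/Pacific Islander",
}


def label_race_ethnicity(df: pd.DataFrame) -> pd.DataFrame:
    """Keep single-race rows and add readable race and Hispanic columns."""
    # Rows whose race code is not in the map (e.g. multiracial) are dropped.
    out = df[df["ptdtrace"].isin(list(RACE_LABELS))].copy()
    out["race"] = out["ptdtrace"].map(RACE_LABELS)
    # pehspnon: 1 = Hispanic, 2 = Non-Hispanic.
    out["hispanic"] = out["pehspnon"].eq(1)
    return out


# Toy data for illustration only; 6 stands in for a multiracial code.
toy = pd.DataFrame({"ptdtrace": [1, 2, 6, 4], "pehspnon": [2, 1, 2, 2]})
labeled = label_race_ethnicity(toy)
```

Keeping race and Hispanic origin in separate columns leaves open how the two are combined later, since that choice is not fixed by the variable definitions themselves.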

### Income and Employment Variables
- hefaminc: Family income, in dollars (combined income of all family members during the last 12 months; includes money from jobs, net income from business, farm, or rent, pensions, dividends, interest, Social Security payments, and any other money income received by family members who are 15 years of age or older)
    - 1: less than 5000
    - 2: 5000 to 7499
    - 3: 7500 to 9999
    - 4: 10000 to 12499
    - 5: 12500 to 14999
    - 6: 15000 to 19999
    - 7: 20000 to 24999
    - 8: 25000 to 29999
    - 9: 30000 to 34999
    - 10: 35000 to 39999
    - 11: 40000 to 49999
    - 12: 50000 to 59999
    - 13: 60000 to 74999
    - 14: 75000 to 99999
    - 15: 100000 to 149999
    - 16: 150000 or more
- puwk: Last week, did you do any work for (either) pay (or profit)?
    - 1: Yes
    - 2: No
    - 3: Retired
    - 4: Disabled
    - 5: Unable to work
    - Do not consider 3, 4, or 5 since those are outside of the labor force.
    - Do not consider negative values, since those likely indicate nonrespondents.
- puabsot: Last week, did you have a job either full or part-time?
    - 1: Yes
    - 2: No
    - 3: Retired
    - 4: Disabled
    - 5: Unable to work
    - Do not consider 3, 4, or 5 since those are outside of the labor force.
    - Do not consider negative values, since those likely indicate nonrespondents.
    - May not be a good indicator: about 90% of values are negative.
- pulay: Last week, were you on layoff from a job?
    - 1: Yes
    - 2: No
    - 3: Retired
    - 4: Disabled
    - 5: Unable to work
    - Do not consider 3, 4, or 5 since those are outside of the labor force.
    - Do not consider negative values, since those likely indicate nonrespondents.
    - May not be a good indicator: about 90% of values are negative.
- Can consider other factors later if time permits.
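
The filtering rules above (keep only Yes/No answers, drop codes 3 to 5 and negative values) and the `hefaminc` bracket codes can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the notebook's actual implementation: `HEFAMINC_LOWER_BOUND`, `share_worked`, and the toy data are hypothetical, and the share computed here is unweighted.

```python
import pandas as pd

# Lower bound, in dollars, of each hefaminc bracket code from the list above.
HEFAMINC_LOWER_BOUND = {
    1: 0, 2: 5000, 3: 7500, 4: 10000, 5: 12500, 6: 15000, 7: 20000,
    8: 25000, 9: 30000, 10: 35000, 11: 40000, 12: 50000, 13: 60000,
    14: 75000, 15: 100000, 16: 150000,
}


def share_worked(df: pd.DataFrame, col: str = "puwk") -> float:
    """Unweighted share answering Yes (1) among Yes/No (1 or 2) responses."""
    # Codes 3-5 (outside the labor force) and negative codes are excluded.
    valid = df[df[col].isin([1, 2])]
    if valid.empty:
        return float("nan")
    return float((valid[col] == 1).mean())


# Toy data for illustration only.
toy = pd.DataFrame({
    "puwk": [1, 2, 1, 3, -1, 1],
    "hefaminc": [1, 16, 9, 12, 4, -1],
})
worked = share_worked(toy)  # 3 of the 4 valid responses are Yes
# Codes not in the map (e.g. negative values) become NaN.
toy["income_floor"] = toy["hefaminc"].map(HEFAMINC_LOWER_BOUND)
```

The same helper applies to `puabsot` and `pulay` by passing `col`, though with roughly 90% negative values the valid sample for those columns would be small.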

## Upload Data and Constants

In [20]:
# Demographic variables
dem_vars = ['ptdtrace', 'pehspnon']
# Income and Employment variables
eco_vars = ['hefaminc', 'puwk', 'puabsot', 'pulay']

all_vars = dem_vars + eco_vars

jan = pd.read_csv('../data/Current_Population_Survey/cpsb202001.csv', usecols=all_vars)
feb = pd.read_csv('../data/Current_Population_Survey/cpsb202002.csv', usecols=all_vars)
mar = pd.read_csv('../data/Current_Population_Survey/cpsb202003.csv', usecols=all_vars)
apr = pd.read_csv('../data/Current_Population_Survey/cpsb202004.csv', usecols=all_vars)
may = pd.read_csv('../data/Current_Population_Survey/cpsb202005.csv', usecols=all_vars)
jun = pd.read_csv('../data/Current_Population_Survey/cpsb202006.csv', usecols=all_vars)
jul = pd.read_csv('../data/Current_Population_Survey/cpsb202007.csv', usecols=all_vars)
aug = pd.read_csv('../data/Current_Population_Survey/cpsb202008.csv', usecols=all_vars)
sep = pd.read_csv('../data/Current_Population_Survey/cpsb202009.csv', usecols=all_vars)
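
For tracking changes across months, it may be convenient to stack the monthly frames into one long table with a month column. The sketch below is one possible approach, not something the notebook is known to do; `stack_months` and the toy frames are hypothetical names introduced here.

```python
import pandas as pd


def stack_months(frames: dict) -> pd.DataFrame:
    """Concatenate monthly frames, tagging each row with its month label."""
    tagged = [df.assign(month=label) for label, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)


# Toy stand-ins for the monthly frames, for illustration only.
jan_toy = pd.DataFrame({"hefaminc": [3, 12], "ptdtrace": [1, 2]})
feb_toy = pd.DataFrame({"hefaminc": [7], "ptdtrace": [4]})
combined = stack_months({"2020-01": jan_toy, "2020-02": feb_toy})
```

With the real data, the call might look like `stack_months({'2020-01': jan, '2020-02': feb, ...})`, after which a `groupby` on `month` and a race column gives one row per group per month.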

## Changes in Family Income Based on Race

### Hispanic