In [1]:
##Import pandas
import pandas as pd

In [2]:
##Basic info for BLS data:
##I have four separate Excel documents from the BLS that I combined into one dataset in Excel.
##I tried cleaning and combining the data in pandas but could not, so the cleaning steps below were done in Excel.

##The goal was to make an index that compares the LFPR recovery of people with and without disabilities.

##The files all measure labor force participation rate (LFPR)
###FILE 1 - LFPR for men without a disability
###FILE 2 - LFPR for women without a disability
###FILE 3 - LFPR for men with a disability
###FILE 4 - LFPR for women with a disability

##I got the data here: https://www.bls.gov/webapps/legacy/cpsatab6.htm
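
##A minimal sketch of how the four files could be combined in pandas instead of Excel.
##Assumption: each BLS file is a wide table with one row per year and one column per month.
##The small frames below stand in for the real files and reuse Jan-Mar 2020 values from this notebook.
##The helper name wide_to_long is hypothetical.

```python
import pandas as pd

def wide_to_long(wide, name):
    """Turn a Year x month table into a monthly Series called `name`."""
    long = wide.melt(id_vars="Year", var_name="MonthName", value_name=name)
    # Build a real date from the year and the month abbreviation.
    long["Month"] = pd.to_datetime(
        long["Year"].astype(str) + "-" + long["MonthName"], format="%Y-%b"
    )
    return long.set_index("Month")[name].sort_index()

# Stand-ins for the four downloads (assumed layout, values from this notebook).
files = {
    "Men with Disability": [35.6, 35.9, 36.5],
    "Women with Disability": [31.5, 31.7, 33.3],
    "Men without Disability": [82.7, 83.0, 82.4],
    "Women without Disability": [72.2, 72.5, 71.6],
}
series = []
for name, values in files.items():
    wide = pd.DataFrame({"Year": [2020], "Jan": values[0:1],
                         "Feb": values[1:2], "Mar": values[2:3]})
    series.append(wide_to_long(wide, name))

# Line the four series up side by side on their shared Month index.
combined = pd.concat(series, axis=1)
print(combined)
```

##With the real files, each `wide` frame would come from pd.read_excel; the header rows to skip depend on the download.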

In [59]:
##Cleaning the data - Step 1:

##Created 7 columns:
###One tracks time in the form of year and month from 01/2020 to present.
###Four track LFPR from each file.
###Two are for the calculation of totals for LFPR for people with disabilities and without disabilities.

##Transposed the data in Excel to make LFPR vertical, combining LFPR from 01/2020 onward into one column for each file.
##Calculated the average of the men's and women's rates for people with and without disabilities.

xls = pd.ExcelFile('data/BLS DATA.xlsx')
df_all_LFPR = pd.read_excel(xls, 'LFPR_s1')
df_all_LFPR

Unnamed: 0,Month,Men with Disability,Women with Disability,People with Disability,Men without Disability,Women without Disability,People without Disability
0,2020-01-01,35.6,31.5,33.55,82.7,72.2,77.45
1,2020-02-01,35.9,31.7,33.8,83.0,72.5,77.75
2,2020-03-01,36.5,33.3,34.9,82.4,71.6,77.0
3,2020-04-01,35.4,30.0,32.7,79.3,68.2,73.75
4,2020-05-01,36.8,31.5,34.15,80.5,69.2,74.85
5,2020-06-01,36.4,32.3,34.35,81.9,70.6,76.25
6,2020-07-01,35.5,30.4,32.95,82.1,70.8,76.45
7,2020-08-01,36.1,31.1,33.6,81.9,70.4,76.15
8,2020-09-01,34.4,31.0,32.7,81.5,70.1,75.8
9,2020-10-01,35.3,31.3,33.3,81.9,70.7,76.3
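
##A sketch of how the two total columns (and the Step 2 trimming described below) could be reproduced in pandas.
##It uses the first three rows above and, like the Excel version, takes a simple unweighted average of the men's and women's rates.

```python
import pandas as pd

df = pd.DataFrame({
    "Month": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "Men with Disability": [35.6, 35.9, 36.5],
    "Women with Disability": [31.5, 31.7, 33.3],
    "Men without Disability": [82.7, 83.0, 82.4],
    "Women without Disability": [72.2, 72.5, 71.6],
})

# Row-wise mean of the two gender columns in each group.
df["People with Disability"] = df[
    ["Men with Disability", "Women with Disability"]
].mean(axis=1)
df["People without Disability"] = df[
    ["Men without Disability", "Women without Disability"]
].mean(axis=1)

# Keep only the totals and drop the pre-pandemic January row.
totals = df[["Month", "People with Disability", "People without Disability"]]
totals = totals[totals["Month"] >= "2020-02-01"].reset_index(drop=True)
print(totals)
```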


In [31]:
##Cleaning the data - Step 2:

##Eliminated the columns for each gender. We're just focusing on the totals.
##Deleted the row for 01/2020 because we're just looking at what happened after the pandemic began.

df_total_LFPR = pd.read_excel(xls, 'LFPR_s2')
df_total_LFPR

Unnamed: 0,Month,People with a Disability,People without a Disability
0,2020-01-01,33.55,77.45
1,2020-02-01,33.8,77.75
2,2020-03-01,34.9,77.0
3,2020-04-01,32.7,73.75
4,2020-05-01,34.15,74.85
5,2020-06-01,34.35,76.25
6,2020-07-01,32.95,76.45
7,2020-08-01,33.6,76.15
8,2020-09-01,32.7,75.8
9,2020-10-01,33.3,76.3


In [32]:
##Creating the index - Step 1:

##Divided the values in each column (except Month) by that column's starting value (02/2020).
##This gives us an indexed ratio, but it's incomplete.

df_ratio_LFPR = pd.read_excel(xls, 'LFPR_s3')
df_ratio_LFPR

Unnamed: 0,Month,People with a Disability,People without a Disability
0,2020-02-01,1.0,1.0
1,2020-03-01,1.032544,0.990354
2,2020-04-01,0.967456,0.948553
3,2020-05-01,1.010355,0.962701
4,2020-06-01,1.016272,0.980707
5,2020-07-01,0.974852,0.98328
6,2020-08-01,0.994083,0.979421
7,2020-09-01,0.967456,0.97492
8,2020-10-01,0.985207,0.98135
9,2020-11-01,0.992604,0.976849


In [33]:
##Creating the index - Step 2:

##Multiplied the values in each column (except Month) by 100.
##This yields a proper index where 100 (that is, 100%) equals the value in 02/2020, at the start of the pandemic.

df_index_LFPR = pd.read_excel(xls, 'LFPR_s4')
df_index_LFPR

Unnamed: 0,Month,People with a Disability,People without a Disability
0,2020-02-01,100.0,100.0
1,2020-03-01,103.254438,99.03537
2,2020-04-01,96.745562,94.855305
3,2020-05-01,101.035503,96.270096
4,2020-06-01,101.627219,98.07074
5,2020-07-01,97.485207,98.327974
6,2020-08-01,99.408284,97.942122
7,2020-09-01,96.745562,97.491961
8,2020-10-01,98.52071,98.135048
9,2020-11-01,99.260355,97.684887
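
##A sketch of both index steps done directly in pandas rather than in Excel.
##It uses the first four months of the totals; the names ratio and index_100 are only for illustration.

```python
import pandas as pd

totals = pd.DataFrame(
    {
        "People with a Disability": [33.8, 34.9, 32.7, 34.15],
        "People without a Disability": [77.75, 77.0, 73.75, 74.85],
    },
    index=pd.to_datetime(["2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01"]),
)
totals.index.name = "Month"

# Step 1: divide every row by the first row (02/2020), column by column.
ratio = totals / totals.iloc[0]

# Step 2: scale so that the starting month equals 100.
index_100 = ratio * 100
print(index_100.round(6))
```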


In [74]:
##Time to analyze the data...

In [38]:
##People with a disability

##Notes:
###The index for people with disabilities peaked at 111.69, meaning their LFPR reached 11.7% above its level when the pandemic began.
###At its lowest it dipped to 96.75% of what it was when the pandemic began, which isn't very much.
###Standard deviation is comparatively high at about 4.97 index points.

df_index_LFPR["People with a Disability"].describe()

count     26.000000
mean     102.708239
std        4.970197
min       96.745562
25%       98.742604
50%      101.183432
75%      107.137574
max      111.686391
Name: People with a Disability, dtype: float64

In [40]:
##People without a disability

##Notes:
###The maximum value is 100, so LFPR has never risen above its level at the start of the pandemic; it has not fully recovered.
###The min is 94.86, meaning that LFPR took a bigger hit among people without a disability (the min for people with a disability is 96.75).
###Standard deviation is much lower by comparison, at about 1.07 index points.
###The mean is below the initial index value of 100 for people without a disability, unlike the mean for people with a disability.

df_index_LFPR["People without a Disability"].describe()

count     26.000000
mean      98.231511
std        1.069034
min       94.855305
25%       97.893891
50%       98.295820
75%       98.826367
max      100.000000
Name: People without a Disability, dtype: float64

In [None]:
##Analysis:
###People with disabilities have had a better recovery from the pandemic in terms of their LFPR.
###Is that because their initial LFPR was already so low?
###Their population is also smaller ... but by how much?
###That could explain the volatility in the data.
###The LFPR for people without a disability has not fully recovered yet.


In [60]:
##What if we shifted the index point forward by a month, to March 2020, for people with disabilities?
##We might want to do this because March looks more like the pre-pandemic peak for this group.

df_index_march_LFPR_disabled = pd.read_excel(xls, 'LFPR_s5')
df_index_march_LFPR_disabled

Unnamed: 0,Month,People with a Disability
0,2020-02-01,96.848138
1,2020-03-01,100.0
2,2020-04-01,93.696275
3,2020-05-01,97.851003
4,2020-06-01,98.424069
5,2020-07-01,94.412607
6,2020-08-01,96.275072
7,2020-09-01,93.696275
8,2020-10-01,95.415473
9,2020-11-01,96.131805
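
##A sketch of how the same re-basing could be done in pandas from the February-based index.
##Dividing by the March value and multiplying by 100 moves the 100 point to March; the function name rebase is hypothetical.

```python
import pandas as pd

# First four months of the February-based index for people with a disability.
index_feb = pd.Series(
    [100.0, 103.254438, 96.745562, 101.035503],
    index=pd.to_datetime(["2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01"]),
    name="People with a Disability",
)

def rebase(index_series, base_month):
    """Re-express an index so that `base_month` equals 100."""
    base_value = index_series.loc[pd.Timestamp(base_month)]
    return index_series / base_value * 100

index_march = rebase(index_feb, "2020-03-01")
print(index_march.round(6))
```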


In [67]:
##Let's compare the two different indexes for people with disabilities:

##Remember: This one is indexed at March of 2020
df_index_march_LFPR_disabled["People with a Disability"].describe()

count     26.000000
mean      99.471016
std        4.813544
min       93.696275
25%       95.630372
50%       97.994269
75%      103.760745
max      108.166189
Name: People with a Disability, dtype: float64

In [68]:
##Remember: This one is indexed at February of 2020
df_index_LFPR["People with a Disability"].describe()

count     26.000000
mean     102.708239
std        4.970197
min       96.745562
25%       98.742604
50%      101.183432
75%      107.137574
max      111.686391
Name: People with a Disability, dtype: float64

In [75]:
##Analysis:
###It doesn't change the data too much, but it does in some important ways:
####The minimum value is lower (93.70 instead of 96.75), suggesting more of a drop.
####Moving the index position to March decreases the peak percentage gain from 11.7% to 8.2%.
####The mean for the data indexed to March (99.47) is closer to the mean for people without a disability (98.23).

In [87]:
##We will now shift our focus to the data from Indeed's Hiring Lab...

##Basic info for Indeed data:
###This data measures job postings advertising fair chance hiring (jobs open to candidates with criminal records) as a percentage of all job postings on Indeed.
###I did not need to clean the data.

xls2 = pd.ExcelFile('data/Fair_chance_Indeed_data.xlsx')
df_indeed_num = pd.read_excel(xls2, '% of all job postings')
df_indeed_num

Unnamed: 0,Date,Job postings on Indeed advertising fair chance hiring as a percent of all job postings on Indeed.
0,2019-01-01,1.764743
1,2019-02-01,1.956251
2,2019-03-01,1.889566
3,2019-04-01,1.847995
4,2019-05-01,1.910809
5,2019-06-01,2.004699
6,2019-07-01,2.081449
7,2019-08-01,2.120451
8,2019-09-01,2.310358
9,2019-10-01,2.430686


In [94]:
##Looking for a general description of the data.
df_indeed_num.describe()

Unnamed: 0,Job postings on Indeed advertising fair chance hiring as a percent of all job postings on Indeed.
count,39.0
mean,2.366711
std,0.276337
min,1.764743
25%,2.217986
50%,2.405907
75%,2.547615
max,2.826617


In [95]:
##Analysis:
###The max value for the data is very recent.
###This supports the thesis behind my story:
###More businesses are opening the door to people with criminal records.
###The minimum value is also from early in 2019, which supports the upward trend.
###I'm not sure exactly why, but the standard deviation (0.28) seems low to me compared to the mean (2.37).
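
##A sketch of how to check when the extremes occur and how large the spread is relative to the mean.
##Assumptions: only the ten rows displayed above are used, so idxmax here is NOT the true maximum of the full 39 rows;
##the short name pct_fair_chance is hypothetical; the coefficient of variation uses the std and mean from describe().

```python
import pandas as pd

pct_fair_chance = pd.Series(
    [1.764743, 1.956251, 1.889566, 1.847995, 1.910809,
     2.004699, 2.081449, 2.120451, 2.310358, 2.430686],
    index=pd.date_range("2019-01-01", periods=10, freq="MS"),
    name="pct_fair_chance",
)

# Dates of the lowest and highest readings in this ten-row sample.
date_of_min = pct_fair_chance.idxmin()
date_of_max = pct_fair_chance.idxmax()
print(date_of_min.date(), date_of_max.date())

# Coefficient of variation: std as a share of the mean (full-data values).
cv = 0.276337 / 2.366711
print(round(cv, 3))
```

##On the full sheet, the same idxmin and idxmax calls would confirm whether the maximum really is recent.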

In [96]:
##Trust, but verify:
###I was given some statistics in terms of growth from Indeed, but it doesn't hurt to make sure they're accurate.
df_indeed_growth = pd.read_excel(xls2, 'Growth by year')
df_indeed_growth

Unnamed: 0,Date,Job postings on Indeed advertising fair chance hiring: Growth since March 2022
0,2021-03-01,0.048
1,2020-03-01,0.05
2,2019-03-01,0.337


In [99]:
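##Checking for 2019:
##a is the value the growth figures are measured against (presumably March 2022, per the column header); b is the March 2019 value.
a=2.525643
b=1.889566
c=(a/b)
print(c)
##It checks out: a ratio of about 1.337 is 33.7% growth, matching Indeed's 0.337.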

1.3366259765469954


In [101]:
##Checking for 2020:
a=2.525643
b=2.405907
c=(a/b)
print(c)
##It checks out: a ratio of about 1.050 is 5.0% growth, matching Indeed's 0.05.

1.0497675097167098


In [102]:
##Checking for 2021:
a=2.525643
b=2.410303
c=(a/b)
print(c)
##It checks out: a ratio of about 1.048 is 4.8% growth, matching Indeed's 0.048.

1.0478529048007659
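
##The three checks above can also be sketched as one loop that compares each ratio with Indeed's reported growth.
##Assumption: the values typed into the cells above are the March readings for each year, with 2.525643 as the March 2022 value.

```python
# March readings, copied from the verification cells above (assumed to be March values).
march = {2019: 1.889566, 2020: 2.405907, 2021: 2.410303, 2022: 2.525643}

# Growth figures reported by Indeed in the 'Growth by year' sheet.
reported = {2019: 0.337, 2020: 0.05, 2021: 0.048}

checks = {}
for year, claimed in reported.items():
    # Growth from March of `year` to March 2022, rounded like Indeed's figures.
    computed = round(march[2022] / march[year] - 1, 3)
    checks[year] = computed
    print(year, computed, "matches" if computed == claimed else "differs")
```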


In [None]:
##This concludes my analysis.