# H-2A Experience Requirements Analysis

This notebook analyzes data on certified H-2A applications, and calculates the proportion that require candidates to have prior work experience. The data come from two main sources: 

- The Office of Foreign Labor Certification's [FY 2015 disclosure data for the H-2A program](http://www.foreignlaborcert.doleta.gov/performancedata.cfm). (You can download the specific Excel spreadsheet [here](http://www.foreignlaborcert.doleta.gov/docs/py2015q4/H-2A_Disclosure_Data_FY15_Q4.xlsx).)


- The OFLC's [Labor Certification Registry](https://icert.doleta.gov/index.cfm?event=ehLCJRExternal.dspAdvCertSearch), which lets us compare the 2015 findings to (less detailed) data from previous years. You can find the script used to download the case-counts [here](../scripts/get-lcr-experience-data.py).

## Methodology

The methodology is relatively straightforward. The main steps:

- Load the FY 2015 visa-decision data.


- Among the applications that OFLC "certified" for at least one H-2A position, calculate the proportion for which `EMP_EXPERIENCE_REQD` equals "Y" — as opposed to "N" or the (rare) blank.


- Repeat this calculation for each `WORKSITE_STATE`.


- Load the data collected from the Labor Certification Registry, which has reliable (but less-detailed) data for 2015 and 2014, and some data from 2013 of unknown completeness (according to the OFLC). Repeat the calculations above, primarily to ensure that the 2015 data is not an anomaly.
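The first three steps can be sketched on a toy table. The rows below are invented for illustration and are not taken from the OFLC file; only the column names match the disclosure data.

```python
import numpy as np
import pandas as pd

# Invented toy rows; only the column names match the disclosure data.
toy = pd.DataFrame({
    "WORKSITE_STATE": ["ID", "ID", "NC", "NC", "NC"],
    "NBR_WORKERS_CERTIFIED": [5, 0, 12, 3, 8],
    "EMP_EXPERIENCE_REQD": ["Y", "Y", "N", np.nan, "Y"],
})

# Keep only applications certified for at least one position.
certified = toy[toy["NBR_WORKERS_CERTIFIED"] > 0]

# Comparing to "Y" counts both "N" and blank (NaN) as False.
requires_exp = certified["EMP_EXPERIENCE_REQD"] == "Y"

# Share of certified applications requiring experience, overall and by state.
overall_share = requires_exp.mean()
share_by_state = requires_exp.groupby(certified["WORKSITE_STATE"]).mean()
```

Because blanks are counted as "not required," they slightly lower the reported share rather than being dropped from the denominator.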

## Load FY 2015 disclosure data

... and convert the "Y"/"N" experience-required column to `True`/`False` values.

In [1]:
import pandas as pd

In [2]:
decisions_fy2015 = pd.read_excel("../data/H-2A_Disclosure_Data_FY15_Q4.xlsx")

In [3]:
decisions_fy2015["EMP_EXPERIENCE_REQD"].fillna("[blank]").value_counts()

Y          8003
N          2334
[blank]       2
Name: EMP_EXPERIENCE_REQD, dtype: int64

In [4]:
decisions_fy2015["EMP_EXPERIENCE_REQD_TRUE"] = (decisions_fy2015["EMP_EXPERIENCE_REQD"] == "Y")

## Calculate prevalence of experience requirements

Employers can apply for H-2A visas on their own, or through an umbrella association's "[master application](http://www.foreignlaborcert.doleta.gov/h_2a_details.cfm)." All of a master application's sub-applications are included in the OFLC data, and receive the same `CASE_NUMBER` as the master application.

Below, we calculate the proportion of employers that require prior experience two ways: counting each sub-application separately, and aggregating first by `CASE_NUMBER`.
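To see why the two approaches can give different answers, consider a small invented example (not OFLC data): one master application with three sub-applications that all require experience, and one standalone application that does not.

```python
import pandas as pd

# Invented rows: case "A" is a master application with three
# sub-applications; case "B" is a standalone application.
toy = pd.DataFrame({
    "CASE_NUMBER": ["A", "A", "A", "B"],
    "EMP_EXPERIENCE_REQD_TRUE": [True, True, True, False],
})

# Counting every sub-application separately: 3 of 4 rows require experience.
per_application = toy["EMP_EXPERIENCE_REQD_TRUE"].mean()

# Aggregating by case first: each case contributes one value (1.0 and 0.0).
per_case = (
    toy.groupby("CASE_NUMBER")["EMP_EXPERIENCE_REQD_TRUE"].mean().mean()
)
```

Large master applications pull the per-application figure toward their own value, while the per-case figure weights every case equally.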

In [5]:
fy2015_cert_rate = decisions_fy2015[
    decisions_fy2015["NBR_WORKERS_CERTIFIED"] > 0
]["EMP_EXPERIENCE_REQD_TRUE"].mean()
fy2015_cert_rate

0.77924433249370273

In [6]:
def get_case_exp_req_pct(subset):
    case_means = subset.groupby("CASE_NUMBER")["EMP_EXPERIENCE_REQD_TRUE"].mean()
    return case_means.mean()

In [7]:
fy2015_case_rate = get_case_exp_req_pct(decisions_fy2015[
    decisions_fy2015["NBR_WORKERS_CERTIFIED"] > 0
])
fy2015_case_rate

0.75952182374200727

In [8]:
print("Depending on whether you first aggregate by case, approximately"
      " {0:.0f}% or {1:.0f}% of H-2A applications"
      " certified in FY 2015 required prior work experience."
      .format(fy2015_cert_rate * 100, fy2015_case_rate * 100))

Depending on whether you first aggregate by case, approximately 78% or 76% of H-2A applications certified in FY 2015 required prior work experience.


## Calculate prevalence of experience requirements *by state*

Here, we aggregate certifications by the `WORKSITE_STATE` field.

In [9]:
grp_state = decisions_fy2015[
    decisions_fy2015["NBR_WORKERS_CERTIFIED"] > 0
].groupby("WORKSITE_STATE")

by_state = pd.DataFrame({
    "cases_certified": grp_state["CASE_NUMBER"].nunique(),
    "applications_certified": grp_state.size(),
    "n_workers_certified": grp_state["NBR_WORKERS_CERTIFIED"].sum(),
    "pct_cases": grp_state.apply(get_case_exp_req_pct).round(3) * 100,
    "pct_applications": grp_state["EMP_EXPERIENCE_REQD_TRUE"].mean().round(3) * 100
})

# Show only states with at least 100 certified cases
by_state[by_state["cases_certified"] >= 100]\
    .sort_values("pct_cases", ascending=False)

| WORKSITE_STATE | applications_certified | cases_certified | n_workers_certified | pct_applications | pct_cases |
|---|---|---|---|---|---|
| ID | 414 | 414 | 2356 | 100.0 | 100.0 |
| MT | 238 | 238 | 614 | 99.2 | 99.2 |
| ND | 395 | 395 | 1340 | 98.7 | 98.7 |
| MA | 118 | 118 | 459 | 98.3 | 98.3 |
| NY | 340 | 331 | 5104 | 97.9 | 98.2 |
| WA | 337 | 114 | 19408 | 98.2 | 96.5 |
| NC | 1654 | 220 | 29350 | 99.5 | 96.4 |
| UT | 136 | 136 | 790 | 96.3 | 96.3 |
| SD | 156 | 156 | 751 | 96.2 | 96.2 |
| KS | 115 | 115 | 773 | 90.4 | 90.4 |


## Repeat the calculations with Labor Certification Registry data

These data are less detailed overall, and possibly incomplete for FY 2013. They are useful primarily as a check against the possibility that the FY 2015 data represent an anomaly.

Note: The FY 2015 data above won't match up perfectly with the 2015 data below because the two time frames capture different events. Above, the year is the fiscal year in which the application was certified. Below, it is the year in which the employer's certified period begins.
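As a rough illustration of how the same application can land in different years, the sketch below maps dates to U.S. federal fiscal years (Oct. 1 through Sept. 30, named for the calendar year in which they end). The helper `federal_fiscal_year`, the column names, and the dates are all hypothetical, and the sketch assumes both sources label years by fiscal year, which the registry does not confirm here.

```python
import pandas as pd

def federal_fiscal_year(dates):
    """Map dates to U.S. federal fiscal years (Oct. 1 through Sept. 30)."""
    dates = pd.to_datetime(dates)
    # October-December dates fall in the fiscal year named for the next
    # calendar year.
    return dates.dt.year + (dates.dt.month >= 10).astype(int)

# Hypothetical application: certified in September 2014, with a work
# period that starts in November 2014. Column names are illustrative.
example = pd.DataFrame({
    "decision_date": ["2014-09-15"],
    "period_start": ["2014-11-01"],
})

fy_certified = federal_fiscal_year(example["decision_date"])
fy_period_start = federal_fiscal_year(example["period_start"])
```

Under these assumptions the application would count toward FY 2014 by certification date but toward FY 2015 by period start.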

In the data below, `total` refers to the total number of certified H-2A "cases," while `req_experience` refers to the number of those cases that require prior experience.

In [10]:
lcr_overall = pd.read_csv("../data/H-2A-experience-requirements.csv")
lcr_by_state = pd.read_csv("../data/H-2A-experience-requirements-by-state.csv")

Overall, the ~75% prevalence of experience requirements goes back at least a few years:

In [11]:
lcr_overall.set_index("year")

| year | req_experience | total | prop_req_experience |
|---|---|---|---|
| 2013 | 4482 | 5914 | 0.758 |
| 2014 | 4956 | 6675 | 0.742 |
| 2015 | 5496 | 7248 | 0.758 |
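As a quick consistency check, the proportions can be recomputed from the counts. This sketch assumes `prop_req_experience` is simply `req_experience` divided by `total`, rounded to three decimals; the column names suggest this, but the download script is not shown here. The values are copied from the output above.

```python
import pandas as pd

# Values copied from the registry summary shown above.
lcr_check = pd.DataFrame({
    "year": [2013, 2014, 2015],
    "req_experience": [4482, 4956, 5496],
    "total": [5914, 6675, 7248],
    "prop_req_experience": [0.758, 0.742, 0.758],
})

# Assumed definition: share of certified cases requiring experience.
recomputed = (lcr_check["req_experience"] / lcr_check["total"]).round(3)

# True for each year whose reported proportion matches the recomputed one.
matches = (recomputed - lcr_check["prop_req_experience"]).abs() < 1e-9
```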


Top state-years:

In [12]:
lcr_by_state[
    (lcr_by_state["total"] >= 50) &
    (lcr_by_state["prop_req_experience"] >= 0.9)
].set_index([ "state_name", "year" ])\
    .sort_values("prop_req_experience", ascending=False)

| state_name | year | req_experience | total | prop_req_experience |
|---|---|---|---|---|
| WYOMING | 2015 | 99 | 99 | 1.0 |
| WYOMING | 2013 | 83 | 83 | 1.0 |
| CONNECTICUT | 2015 | 53 | 53 | 1.0 |
| IDAHO | 2014 | 413 | 413 | 1.0 |
| IDAHO | 2015 | 419 | 419 | 1.0 |
| NEVADA | 2015 | 67 | 67 | 1.0 |
| IDAHO | 2013 | 402 | 405 | 0.993 |
| MONTANA | 2015 | 241 | 243 | 0.992 |
| MONTANA | 2013 | 197 | 199 | 0.99 |
| WYOMING | 2014 | 89 | 90 | 0.989 |

