# **Mapping AI Exposure to Employment (2014–2024): A Data-Driven Study of U.S. Occupations**
*(Final Submission Introduction)*

## **1. Introduction**

Generative artificial intelligence has rapidly become embedded in modern knowledge work. Large language models now assist with writing, problem-solving, customer service, programming, design, and many other tasks traditionally performed by white-collar professionals. As these tools expand in capability and adoption, important questions arise about how AI may be reshaping employment—especially for recent graduates entering degree-intensive occupations.

This project investigates how **AI exposure**—measured using GPT-based occupational exposure scores—is associated with changes in U.S. labor-market outcomes from **2014 to 2024**. To explore this, we integrate several major datasets:

- **O*NET SOC codes** for standardized occupation mapping  
- **AI exposure scores** from the GPTs-are-GPTs study  
- **OEWS** employment counts and wages  
- **JOLTS** job openings, hires, and separations  
- **CPS** unemployment rates by educational attainment (2014–2023), focusing on workers with a bachelor’s degree or higher  

We merge the occupation-level sources (AI exposure scores and OEWS) on SOC codes and attach the year-level JOLTS and CPS series by calendar year. The result is a unified panel that lets us compare how **high-, medium-, and low-exposure occupations** have changed over time. It lets us test whether more AI-exposed occupations experienced different employment growth than less-exposed ones, and read those differences alongside year-to-year shifts in job openings and in unemployment among college-educated workers.
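A minimal sketch of the two join types, assuming every occupation-level table has already been reduced to a shared 6-digit `soc` column and every year-level series to one row per `year`. The column names (`soc`, `exposure`, `tot_emp`, `openings_avg`, `unemp_rate_ba_plus`) and all values are hypothetical toy data, not project results.

```python
import pandas as pd

# Hypothetical toy inputs; codes and numbers are illustrative only.
# Occupation level: one exposure score per 6-digit SOC code.
exposure = pd.DataFrame({"soc": ["11-1011", "15-1252"], "exposure": [0.2, 0.8]})

# Occupation-by-year level: OEWS-style employment counts.
oews = pd.DataFrame({
    "soc": ["11-1011", "11-1011", "15-1252", "15-1252"],
    "year": [2014, 2024, 2014, 2024],
    "tot_emp": [100, 110, 200, 260],
})

# Year level: one value per year (e.g., a JOLTS annual average, a CPS rate).
yearly = pd.DataFrame({
    "year": [2014, 2024],
    "openings_avg": [1.0, 2.0],
    "unemp_rate_ba_plus": [3.0, 2.5],
})

# Occupation tables join on SOC; validate= raises if exposure has duplicate codes.
panel = oews.merge(exposure, on="soc", how="inner", validate="many_to_one")
# Year-level series are repeated for every occupation row in that year.
panel = panel.merge(yearly, on="year", how="left", validate="many_to_one")
print(panel)
```

With `validate="many_to_one"`, pandas raises a `MergeError` if a right-hand key is duplicated, which would otherwise silently multiply rows in the panel.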

Our goal is not to claim that AI alone drives labor-market change, but to measure whether AI exposure correlates with observable differences in employment trends—especially in occupations relevant to university graduates.

---

## **2. Project Workflow**

To maintain a clear and reproducible structure, the analysis follows the steps below.

### **Step 1 — Load all datasets**
- Import AI exposure scores  
- Load OEWS employment data (2014–2024)  
- Load JOLTS job-openings, hires, and separations  
- Load CPS unemployment data for workers with a bachelor’s degree or higher  

### **Step 2 — Clean and standardize the data**
- Normalize SOC formats (e.g., O*NET-SOC `11-1011.00` → OEWS `11-1011`)  
- Extract relevant variables  
- Fix multi-row CPS headers  
- Handle missing and inconsistent values  
- Convert JOLTS monthly data into annual averages  
- Merge occupation-level tables (AI exposure, OEWS) on the normalized SOC codes, and attach year-level JOLTS and CPS series by year  

### **Step 3 — Assign AI-exposure categories**
- Use exposure scores to divide occupations into **High**, **Medium**, and **Low** AI-exposure groups  
- Verify balanced representation across occupation families  
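One way to form the three groups is a tercile split on a single exposure score. This sketch assumes `dv_rating_beta` is that score (chosen only for illustration; the notebook has not yet fixed the column), uses `pd.qcut` so each group holds roughly a third of occupations, and checks balance by SOC major group (the first two digits of the code). The scores are toy values.

```python
import pandas as pd

# Hypothetical exposure scores for six occupations (toy values, not occ_level.csv).
occ = pd.DataFrame({
    "soc": ["11-1011", "11-2021", "13-1111", "13-2011", "15-1211", "15-1252"],
    "dv_rating_beta": [0.10, 0.90, 0.20, 0.70, 0.40, 0.50],
})

# Tercile split: lowest third -> Low, middle third -> Medium, top third -> High.
occ["exposure_group"] = pd.qcut(
    occ["dv_rating_beta"], q=3, labels=["Low", "Medium", "High"]
)

# Occupation family = SOC major group (first two digits of the code).
occ["family"] = occ["soc"].str[:2]

# Balance check: occupations from each family in each exposure group.
print(pd.crosstab(occ["family"], occ["exposure_group"]))
```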

### **Step 4 — Analyze employment outcomes**
- Compare employment growth across exposure groups  
- Track unemployment trends for university-educated workers  
- Examine job-opening and mobility changes using JOLTS  
- Generate comparisons of high- vs. low-exposure trajectories  
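A sketch of the group-level growth comparison between 2014 and 2024, assuming a long panel with hypothetical columns `soc`, `year`, `tot_emp`, and `exposure_group`; the numbers are toy values.

```python
import pandas as pd

# Hypothetical toy panel: two occupations per group, 2014 and 2024 only.
panel = pd.DataFrame({
    "soc": ["a", "b", "c", "d"] * 2,
    "year": [2014] * 4 + [2024] * 4,
    "tot_emp": [100, 300, 200, 200, 120, 330, 180, 190],
    "exposure_group": ["Low", "Low", "High", "High"] * 2,
})

# Total employment per exposure group and year (years become columns).
by_group = panel.pivot_table(
    index="exposure_group", columns="year", values="tot_emp", aggfunc="sum"
)

# Percent change from 2014 to 2024 within each group.
by_group["pct_change"] = (by_group[2024] / by_group[2014] - 1) * 100
print(by_group.round(1))
```

Summing employment within a group weights large occupations more heavily; averaging each occupation's own growth rate instead would weight every occupation equally, so the two summaries can disagree.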

### **Step 5 — Visualize the patterns**
- Line charts for employment change  
- Scatterplots for correlation strength  
- Heatmaps showing exposure vs. unemployment  
- Grouped bar charts for job-opening changes  

### **Step 6 — Interpret the findings**
- Identify whether highly exposed occupations show:
  - slower or negative growth,  
  - higher unemployment, or  
  - different job-demand patterns  
- Evaluate whether AI exposure is a meaningful predictor of labor-market shifts  
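As a first, descriptive check on whether exposure tracks growth, one could compute a rank correlation between each occupation's exposure score and its employment growth. The sketch computes Spearman's rho as the Pearson correlation of average-tied ranks, which needs only pandas; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical per-occupation values (toy data, not project results).
occ = pd.DataFrame({
    "exposure": [0.1, 0.3, 0.5, 0.7, 0.9],
    "emp_growth_pct": [8.0, 5.0, 6.0, 1.0, -2.0],
})

# Spearman's rho: rank each variable (ties get the average rank),
# then take the ordinary Pearson correlation of the ranks.
rho = occ["exposure"].rank().corr(occ["emp_growth_pct"].rank())
print(f"Spearman rho = {rho:.2f}")  # prints: Spearman rho = -0.90
```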

### **Step 7 — Conclusions**
- Summarize how AI exposure correlates with employment outcomes  
- Discuss implications for students, graduates, educators, and policymakers  

---



```python
# === 0. Imports and basic setup ===

import pandas as pd
import numpy as np
import zipfile
from pathlib import Path

# Google Colab: mount Google Drive so we can read the /data folder
from google.colab import drive
drive.mount('/content/drive')

# Base directory where the project data folder is stored in Drive
BASE_DIR = Path('/content/drive/MyDrive/2704-Final-Project-main/data')

print("Base directory:", BASE_DIR)
print("Contents:", list(BASE_DIR.iterdir()))
```


```
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Base directory: /content/drive/MyDrive/2704-Final-Project-main/data
Contents: [PosixPath('/content/drive/MyDrive/2704-Final-Project-main/data/occ_level.csv'), PosixPath('/content/drive/MyDrive/2704-Final-Project-main/data/bls-oews-national'), PosixPath('/content/drive/MyDrive/2704-Final-Project-main/data/jolts'), PosixPath('/content/drive/MyDrive/2704-Final-Project-main/data/CPS DATA')]
```
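The setup cell runs only inside Google Colab. A sketch of a fallback so the same notebook can also run on a local machine, assuming a local copy of the project with its data in a `data` folder next to the notebook (that local path is a hypothetical choice):

```python
from pathlib import Path


def resolve_base_dir() -> Path:
    """Return the project data folder: Google Drive inside Colab, ./data elsewhere."""
    try:
        from google.colab import drive  # importable only inside Google Colab
    except ImportError:
        # Hypothetical local layout: a "data" folder next to the notebook.
        return Path("data")
    drive.mount("/content/drive")
    return Path("/content/drive/MyDrive/2704-Final-Project-main/data")


BASE_DIR = resolve_base_dir()
print("Base directory:", BASE_DIR)
```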


```python
# === 1. Load AI exposure dataset (GPTs-are-GPTs occ_level.csv) ===

occ_level_path = BASE_DIR / 'occ_level.csv'
ai_exposure_df = pd.read_csv(occ_level_path)

print("AI exposure data loaded. Shape:", ai_exposure_df.shape)
display(ai_exposure_df.head())
```


```
AI exposure data loaded. Shape: (923, 8)
```


| | O*NET-SOC Code | Title | dv_rating_alpha | dv_rating_beta | dv_rating_gamma | human_rating_alpha | human_rating_beta | human_rating_gamma |
|---|---|---|---|---|---|---|---|---|
| 0 | 11-1011.00 | Chief Executives | 0.1 | 0.46 | 0.82 | 0.18 | 0.35 | 0.52 |
| 1 | 11-1011.03 | Chief Sustainability Officers | 0.166667 | 0.555556 | 0.944444 | 0.055556 | 0.388889 | 0.722222 |
| 2 | 11-1021.00 | General and Operations Managers | 0.0 | 0.480769 | 0.961538 | 0.115385 | 0.384615 | 0.653846 |
| 3 | 11-1031.00 | Legislators | 0.033333 | 0.4 | 0.766667 | 0.266667 | 0.516667 | 0.766667 |
| 4 | 11-2011.00 | Advertising and Promotions Managers | 0.0 | 0.476744 | 0.953488 | 0.255814 | 0.546512 | 0.837209 |
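The exposure file uses O*NET-SOC codes with a two-digit suffix (`11-1011.00`, `11-1011.03`), while OEWS uses 6-digit SOC codes (`11-1011`). A minimal sketch of the normalization, using three rows from the output above and assuming that O*NET occupations sharing a SOC code are combined with an unweighted mean (one reasonable choice among several):

```python
import pandas as pd

# Three rows copied from the occ_level.csv output above.
ai = pd.DataFrame({
    "O*NET-SOC Code": ["11-1011.00", "11-1011.03", "11-1021.00"],
    "dv_rating_beta": [0.46, 0.555556, 0.480769],
})

# Drop the O*NET ".XX" suffix to get the 6-digit SOC code used by OEWS.
ai["soc"] = ai["O*NET-SOC Code"].str.strip().str.slice(0, 7)

# Several O*NET occupations can map to one SOC code; average their scores.
ai_soc = ai.groupby("soc", as_index=False)["dv_rating_beta"].mean()
print(ai_soc)
```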


```python
# === 2. Load OEWS national employment data (2014–2024) ===

oews_folder = BASE_DIR / 'bls-oews-national'
oews_years = range(2014, 2025)      # 2014–2024 inclusive

oews_frames = []

for year in oews_years:
    yy = str(year)[-2:]             # '14', '15', ..., '24'
    zip_path = oews_folder / f'oesm{yy}nat.zip'
    print(f"Loading OEWS for {year} from {zip_path.name} ...")

    with zipfile.ZipFile(zip_path, 'r') as z:
        # Find the main national Excel file inside the ZIP (e.g., national_M2014_dl.xlsx)
        candidates = [
            name for name in z.namelist()
            if name.lower().endswith('.xlsx') and 'national_' in name.lower()
        ]
        if not candidates:
            raise ValueError(f"No national Excel file found in {zip_path}.")

        inner_name = candidates[0]
        print("  -> Found file inside zip:", inner_name)

        with z.open(inner_name) as f:
            df_year = pd.read_excel(f)

    df_year["year"] = year
    oews_frames.append(df_year)

oews_df = pd.concat(oews_frames, ignore_index=True)
print("Combined OEWS shape:", oews_df.shape)
display(oews_df.head())
```


```
Loading OEWS for 2014 from oesm14nat.zip ...
  -> Found file inside zip: oesm14nat/national_M2014_dl.xlsx
Loading OEWS for 2015 from oesm15nat.zip ...
  -> Found file inside zip: oesm15nat/national_M2015_dl.xlsx
Loading OEWS for 2016 from oesm16nat.zip ...
  -> Found file inside zip: oesm16nat/national_M2016_dl.xlsx
Loading OEWS for 2017 from oesm17nat.zip ...
  -> Found file inside zip: oesm17nat/national_M2017_dl.xlsx
Loading OEWS for 2018 from oesm18nat.zip ...
  -> Found file inside zip: oesm18nat/national_M2018_dl.xlsx
Loading OEWS for 2019 from oesm19nat.zip ...
  -> Found file inside zip: oesm19nat/national_M2019_dl.xlsx
Loading OEWS for 2020 from oesm20nat.zip ...
  -> Found file inside zip: oesm20nat/national_M2020_dl.xlsx
Loading OEWS for 2021 from oesm21nat.zip ...
  -> Found file inside zip: oesm21nat/national_M2021_dl.xlsx
Loading OEWS for 2022 from oesm22nat.zip ...
  -> Found file inside zip: oesm22nat/national_M2022_dl.xlsx
Loading OEWS for 2023 from oesm23nat.zip ...
```

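| | OCC_CODE | OCC_TITLE | OCC_GROUP | TOT_EMP | EMP_PRSE | H_MEAN | A_MEAN | MEAN_PRSE | H_PCT10 | H_PCT25 | ... | PRIM_STATE | NAICS | NAICS_TITLE | I_GROUP | OWN_CODE | O_GROUP | JOBS_1000 | LOC_QUOTIENT | PCT_TOTAL | PCT_RPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00-0000 | All Occupations | total | 135128260 | 0.1 | 22.71 | 47230 | 0.1 | 8.82 | 11.04 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 11-0000 | Management Occupations | major | 6741640 | 0.2 | 54.08 | 112490 | 0.1 | 22.33 | 32.25 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 11-1000 | Top Executives | minor | 2351130 | 0.2 | 58.68 | 122060 | 0.2 | 20.94 | 31.86 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 11-1010 | Chief Executives | broad | 246240 | 0.8 | 86.88 | 180700 | 0.4 | 34.97 | 53.25 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 11-1011 | Chief Executives | detailed | 246240 | 0.8 | 86.88 | 180700 | 0.4 | 34.97 | 53.25 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |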
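The combined OEWS frame carries both `OCC_GROUP` (filled for the 2014 rows shown) and `O_GROUP` (empty for those rows), which suggests the occupation-group column was renamed in later files. A sketch of harmonizing the two and keeping only `detailed` rows, so employment is not double-counted across the `total`/`major`/`minor`/`broad` aggregates visible above. It also assumes suppressed estimates may appear as non-numeric strings (shown here as a hypothetical `"**"`) and coerces them to `NaN`; the rows are toy data.

```python
import pandas as pd

# Hypothetical rows: early files fill OCC_GROUP, a later file fills O_GROUP.
oews = pd.DataFrame({
    "OCC_CODE": ["11-0000", "11-1011", "11-1011"],
    "OCC_GROUP": ["major", "detailed", None],
    "O_GROUP": [None, None, "detailed"],
    "TOT_EMP": ["500", "100", "**"],
    "year": [2014, 2014, 2024],
})

# Coalesce the two group columns into one, then drop the duplicate.
oews["OCC_GROUP"] = oews["OCC_GROUP"].combine_first(oews["O_GROUP"])
oews = oews.drop(columns="O_GROUP")

# Keep detailed occupations only, to avoid double-counting aggregates.
detailed = oews[oews["OCC_GROUP"].str.lower() == "detailed"].copy()

# Non-numeric placeholders become NaN so employment can be summed.
detailed["TOT_EMP"] = pd.to_numeric(detailed["TOT_EMP"], errors="coerce")
print(detailed)
```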


```python
# === 3. Load JOLTS data (job openings, hires, separations) ===

jolts_folder = BASE_DIR / 'jolts'

# Seasonally adjusted series (main series used in analysis)
jolts_openings_sa_path    = jolts_folder / 'jolts-openings-seasonally-adjusted.xlsx'
jolts_hires_sa_path       = jolts_folder / 'jolts-hires-seasonally-adjusted.xlsx'
jolts_separations_sa_path = jolts_folder / 'jolts-totseparations-seasonally-adjusted.xlsx'

jolts_openings_sa_df    = pd.read_excel(jolts_openings_sa_path)
jolts_hires_sa_df       = pd.read_excel(jolts_hires_sa_path)
jolts_separations_sa_df = pd.read_excel(jolts_separations_sa_path)

print("JOLTS openings (SA) shape:", jolts_openings_sa_df.shape)
print("JOLTS hires (SA) shape:", jolts_hires_sa_df.shape)
print("JOLTS separations (SA) shape:", jolts_separations_sa_df.shape)

# Optional: not seasonally adjusted versions (kept for reference)
jolts_openings_nsa_path    = jolts_folder / 'jolts-openings-not-seasonally-adjusted.xlsx'
jolts_hires_nsa_path       = jolts_folder / 'jolts-hires-not-seasonally-adjusted.xlsx'
jolts_separations_nsa_path = jolts_folder / 'jolts-totseparations-not-seasonally-adjusted.xlsx'

try:
    jolts_openings_nsa_df    = pd.read_excel(jolts_openings_nsa_path)
    jolts_hires_nsa_df       = pd.read_excel(jolts_hires_nsa_path)
    jolts_separations_nsa_df = pd.read_excel(jolts_separations_nsa_path)

    print("JOLTS openings (NSA) shape:", jolts_openings_nsa_df.shape)
    print("JOLTS hires (NSA) shape:", jolts_hires_nsa_df.shape)
    print("JOLTS separations (NSA) shape:", jolts_separations_nsa_df.shape)
except FileNotFoundError:
    print("Some NSA JOLTS files were not found. Proceeding with SA series only.")
```




```
JOLTS openings (SA) shape: (24, 13)
JOLTS hires (SA) shape: (24, 13)
JOLTS separations (SA) shape: (24, 13)
```




```
JOLTS openings (NSA) shape: (24, 13)
JOLTS hires (NSA) shape: (24, 13)
JOLTS separations (NSA) shape: (24, 13)
```
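Each JOLTS sheet loads as 24 rows × 13 columns, which is consistent with a `Year` column plus twelve month columns sitting under a block of metadata rows. A sketch of turning one sheet into annual averages, assuming the header row is the first row whose first cell reads `Year` and that month columns are labeled `Jan` through `Dec` (both are assumptions about the export layout that the shapes above do not confirm); the helper name and toy frame are hypothetical.

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def jolts_annual_average(raw: pd.DataFrame) -> pd.DataFrame:
    """Find the 'Year' header row, then average the month columns per year."""
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    matches = first_col.index[first_col == "Year"]
    if len(matches) == 0:
        raise ValueError("No row whose first cell is 'Year' was found.")
    pos = raw.index.get_loc(matches[0])

    # Rows below the header hold the data; the header row supplies the names.
    table = raw.iloc[pos + 1:].copy()
    table.columns = [str(c).strip() for c in raw.iloc[pos]]
    table["Year"] = pd.to_numeric(table["Year"], errors="coerce")
    table = table.dropna(subset=["Year"])

    months = [m for m in MONTHS if m in table.columns]
    values = table[months].apply(pd.to_numeric, errors="coerce")
    # Mean over the months present (a partial year averages fewer months).
    return pd.DataFrame({
        "year": table["Year"].astype(int).to_numpy(),
        "annual_avg": values.mean(axis=1).to_numpy(),
    })


# Hypothetical toy sheet: metadata, blank row, header, then two years of data.
raw = pd.DataFrame([
    ["(metadata row)"] + [None] * 12,
    [None] * 13,
    ["Year"] + MONTHS,
    [2023] + [1.0] * 12,
    [2024] + [2.0] * 6 + [None] * 6,
])
annual = jolts_annual_average(raw)
print(annual)
```

If the real sheets match that layout, `jolts_annual_average(jolts_openings_sa_df)` would return one row per year. A partial final year is averaged over only the months present, which is worth flagging when it is compared with complete years.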




```python
# === 4. Load CPS unemployment by education (2014–2023) ===

# Folder that holds the CPS HTML files
cps_folder = BASE_DIR / 'CPS DATA'

# Years we downloaded (2014–2023)
cps_years = range(2014, 2024)

cps_frames = []

for year in cps_years:
    html_file = cps_folder / f'CPS {year}.html'

    try:
        # Read all tables on the page
        tables = pd.read_html(html_file)
    except (FileNotFoundError, ValueError) as e:
        # File missing or no readable tables
        print(f"Skipping {year}: could not read {html_file} ({e})")
        continue

    if not tables:
        print(f"Skipping {year}: no tables found in {html_file}")
        continue

    # First table is the education / sex / race breakdown
    cps_year = tables[0].copy()
    cps_year["year"] = year

    cps_frames.append(cps_year)
    print(f"Loaded CPS table for {year}: shape {cps_year.shape}")

if cps_frames:
    cps_df = pd.concat(cps_frames, ignore_index=True)
    print("Combined CPS data shape:", cps_df.shape)
    display(cps_df.head())
else:
    print("No CPS tables loaded. Check CPS file names and paths.")
```


```
Loaded CPS table for 2014: shape (64, 14)
Loaded CPS table for 2015: shape (58, 10)
Loaded CPS table for 2016: shape (58, 10)
Loaded CPS table for 2017: shape (58, 10)
Loaded CPS table for 2018: shape (58, 10)
Loaded CPS table for 2019: shape (58, 10)
Loaded CPS table for 2020: shape (58, 10)
Loaded CPS table for 2021: shape (58, 10)
Loaded CPS table for 2022: shape (58, 10)
Loaded CPS table for 2023: shape (58, 10)
Combined CPS data shape: (586, 86)
```


```
Unnamed: 0_level_0,"Employment status, sex, race, and Hispanic or Latino ethnicity",Less than a high school diploma,Less than a high school diploma,"High school graduates, no college(1)","High school graduates, no college(1)",Some college or associate degree,Some college or associate degree,Some college or associate degree,Some college or associate degree,Some college or associate degree,...,2022,2022,2023,2023,2023,2023,2023,2023,2023,2023
Unnamed: 0_level_1,"Employment status, sex, race, and Hispanic or Latino ethnicity",2013,2014,2013,2014,Total,Total,"Some college, no degree","Some college, no degree",Associate degree,...,Bachelor's degree and higher,Bachelor's degree and higher,Less than a high school diploma,"High school graduates, no college(1)",Some college or associate degree,Some college or associate degree,Some college or associate degree,Bachelor's degree and higher,Bachelor's degree and higher,Bachelor's degree and higher
Unnamed: 0_level_2,"Employment status, sex, race, and Hispanic or Latino ethnicity",2013,2014,2013,2014,2013,2014,2013,2014,2013,...,Bachelor's degree only,Advanced degree,Less than a high school diploma,"High school graduates, no college(1)",Total,"Some college, no degree",Associate degree,Total(2),Bachelor's degree only,Advanced degree
0,TOTAL,,,,,,,,,,...,,,,,,,,,,
1,Civilian noninstitutional population,24424.0,24143.0,61949.0,62060.0,55038.0,55695.0,34532.0,34856.0,20506.0,...,,,,,,,,,,
2,Civilian labor force,11005.0,10828.0,36359.0,36033.0,37294.0,37320.0,22488.0,22518.0,14806.0,...,,,,,,,,,,
3,Participation rate,45.1,44.9,58.7,58.1,67.8,67.0,65.1,64.6,72.2,...,,,,,,,,,,
4,Employed,9798.0,9852.0,33619.0,33865.0,34925.0,35299.0,20914.0,21159.0,14011.0,...,,,,,,,,,,
```
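The combined CPS frame has 86 columns because the 2014 table (64 × 14) and the later tables (58 × 10) have different three-level headers, and `pd.concat` takes the union of column labels. Each file also reports two adjacent years (the 2014 file shows 2013 and 2014 columns). A sketch of extracting one value per file before combining, assuming (1) row labels sit in the first column, (2) the target is the first row labeled `Unemployment rate` (the TOTAL block), and (3) the target column's header contains both `Bachelor's degree and higher` and the file's own year. The helper names and toy values are hypothetical.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Join MultiIndex header levels into one label per column,
    skipping repeated parts and pandas' 'Unnamed: ...' placeholders."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        flat = []
        for levels in out.columns:
            parts = []
            for part in map(str, levels):
                if part.startswith("Unnamed") or part in parts:
                    continue
                parts.append(part)
            flat.append(" | ".join(parts))
        out.columns = flat
    return out


def bachelors_unemployment(df: pd.DataFrame, year: int) -> float:
    """Return the first 'Unemployment rate' value in the first column whose
    header mentions "Bachelor's degree and higher" and the given year."""
    flat = flatten_columns(df)
    labels = flat[flat.columns[0]].astype(str).str.strip()
    row = flat[labels == "Unemployment rate"].iloc[0]
    matches = [c for c in flat.columns
               if "Bachelor's degree and higher" in c and str(year) in c.split(" | ")]
    if not matches:
        raise KeyError(f"No bachelor's-and-higher column found for {year}.")
    return float(row[matches[0]])


# Hypothetical toy table with a three-level header (made-up values).
cols = pd.MultiIndex.from_tuples([
    ("Status", "Status", "Status"),
    ("Bachelor's degree and higher", "Total", "2013"),
    ("Bachelor's degree and higher", "Total", "2014"),
])
toy = pd.DataFrame(
    [["Employed", 10.0, 11.0], ["Unemployment rate", 3.0, 2.0]],
    columns=cols,
)
rate_2014 = bachelors_unemployment(toy, 2014)
print(rate_2014)
```

Applied inside the loading loop (for example `bachelors_unemployment(tables[0], year)`), this would yield one rate per year. When several columns match (such as a `Total(2)` column and a `Bachelor's degree only` column under the same heading), the first one is taken, so the selected column should be checked by eye against the source table.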
