
Code for the paper "Big City Bias: Evaluating the Impact of Metropolitan Size on Computational Job Market Abilities of Language Models" (NLP4HR '24)

charlie-campanella/big-city-bias


Big City Bias: Evaluating the Impact of Metropolitan Size on Computational Job Market Abilities of Language Models

Code for the corresponding paper submitted to the NLP4HR Workshop at EACL 2024 (https://aclanthology.org/2024.nlp4hr-1.6/)

Abstract

Large language models (LMs) have emerged as a useful technology for job matching, for both candidates and employers. Job matching is often based on a particular geographic location, such as a city or region. However, LMs have known biases, commonly derived from their training data. In this work, we aim to quantify the metropolitan size bias encoded within large language models, evaluating zero-shot salary, employer presence, and commute duration predictions in 384 of the United States’ metropolitan regions. Across all benchmarks, we observe correlations between metropolitan population and the accuracy of predictions, with the smallest 10 metropolitan regions showing upwards of 300% worse benchmark performance than the largest 10.
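The headline comparison above can be made concrete with a small calculation. The sketch below is a minimal illustration, not the repository's analysis code: it assumes a hypothetical `MetroScore` record holding a region's population and the absolute error of one zero-shot prediction, and it reads "X% worse" as the relative increase in mean error of the 10 smallest regions over the 10 largest. Under that reading, 300% worse means the smallest regions' mean error is four times that of the largest.

```typescript
// Hypothetical record shape: one row per metropolitan region for one benchmark.
interface MetroScore {
  metro: string;
  population: number;
  absError: number; // |predicted - actual| for this region
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Compares the mean error of the k smallest vs. the k largest regions and
// returns how much worse (in percent) the smallest group performs.
function sizeGap(scores: MetroScore[], k = 10): number {
  if (scores.length < 2 * k) {
    throw new Error("need at least 2k regions");
  }
  // Sort a copy by population, ascending, so the input is left untouched.
  const byPop = [...scores].sort((a, b) => a.population - b.population);
  const smallest = mean(byPop.slice(0, k).map((s) => s.absError));
  const largest = mean(byPop.slice(-k).map((s) => s.absError));
  return ((smallest - largest) / largest) * 100;
}
```

A positive result means the small-metro group is less accurate; a negative one would mean the opposite.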

System Requirements

Note: macOS is the preferred development environment.

  1. Node.js: brew install node
  2. npm: included with the Node.js installation
  3. A web browser (in-browser visualizations only)
  4. Python 3 (PDF visualizations only)

Quick Start

Note: Run all commands from the project root.

Install dependencies:

    npm install

Render in-browser visualizations:

    npm run viz

Render PDF visualizations:

    python3 viz/data.py && python3 viz/correlation.py

Running Evaluations

To re-run evaluations, set the following variables in the .env file:

    CACHE="FALSE"
    OPENAI_API_KEY="<YOUR_KEY_HERE>"
    REPLICATE_API_KEY="<YOUR_KEY_HERE>"
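
How these variables might be consumed is sketched below in TypeScript. The function names `readConfig` and `cachePath` are hypothetical, and the semantics are assumptions rather than documented behavior: any `CACHE` value other than `"FALSE"` is treated as "reuse cached outputs", and both API keys are required only for fresh runs. The cache path mirrors the example file name listed under Project Structure.

```typescript
// Hypothetical shape of the settings the evaluation reads from .env.
interface EvalConfig {
  useCache: boolean;
  openaiApiKey?: string;
  replicateApiKey?: string;
}

// Assumption: only CACHE="FALSE" (case-insensitive) forces fresh model calls;
// anything else, including an unset CACHE, reuses the outputs in /cache.
function readConfig(env: Record<string, string | undefined>): EvalConfig {
  const useCache = (env.CACHE ?? "TRUE").toUpperCase() !== "FALSE";
  const config: EvalConfig = {
    useCache,
    openaiApiKey: env.OPENAI_API_KEY,
    replicateApiKey: env.REPLICATE_API_KEY,
  };
  // Assumption: a fresh run queries both providers, so both keys are needed.
  if (!useCache && (!config.openaiApiKey || !config.replicateApiKey)) {
    throw new Error('OPENAI_API_KEY and REPLICATE_API_KEY are required when CACHE="FALSE"');
  }
  return config;
}

// Hypothetical helper following the /cache naming example,
// e.g. evaluation-gpt-3.5-turbo.json.
function cachePath(model: string): string {
  return `cache/evaluation-${model}.json`;
}
```

In Node.js you could call `readConfig(process.env)` after loading the .env file with a loader of your choice.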

You can then run:

    npm run eval:viz

Note: You must provide your own salary data in data/metro/us_metro_salary.csv.

Note: A complete evaluation fires thousands of completion requests to each model. Use at your own (financial) risk!
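
Before a full run, a back-of-the-envelope estimate can help. The sketch below is an assumption-laden calculator, not part of the repository: `promptsPerRegion`, the token count per request, and the per-token price are hypothetical inputs you would replace with your own figures and your provider's current pricing.

```typescript
// Hypothetical description of one evaluation run.
interface RunPlan {
  regions: number;          // 384 metropolitan regions in the paper
  benchmarks: number;       // salary, employer presence, commute duration
  promptsPerRegion: number; // hypothetical: prompts per region per benchmark
  models: number;           // number of models evaluated
}

// Total completion requests, assuming every prompt is sent to every model.
function estimateRequests(plan: RunPlan): number {
  return plan.regions * plan.benchmarks * plan.promptsPerRegion * plan.models;
}

// Rough cost in USD from an average token count and a per-1k-token price.
function estimateCostUSD(requests: number, tokensPerRequest: number, usdPer1kTokens: number): number {
  return ((requests * tokensPerRequest) / 1000) * usdPer1kTokens;
}
```

With 384 regions, three benchmarks, and one prompt per region, that is 1,152 requests per model; repeated sampling or multiple prompts per region multiply this further.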

Project Structure

  • /cache => contains cached evaluation outputs, one file per model (e.g. evaluation-gpt-3.5-turbo.json)
  • /data/metro => contains CSV data sources for evaluations (e.g. us_metro_commute.csv)
  • /src/index.ts => root evaluation file; everything starts here
  • /src/Analysis.ts => statistical analysis of the evaluation output. Writes to viz/data.json for visualization.
  • /viz => contains browser and PDF visualization files. All charts render data from viz/data.json.
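
Analysis.ts computes statistics over the evaluation output. As an illustration only (not a transcription of Analysis.ts), the sketch below computes a Pearson correlation between metropolitan population and prediction error. Correlating against log10(population) is an assumption, chosen because metro populations span several orders of magnitude; `populationErrorCorrelation` is a hypothetical name.

```typescript
// Pearson correlation coefficient between two equal-length samples.
// Returns NaN if either sample has zero variance.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two equal-length samples with at least 2 points");
  }
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy);
}

// Hypothetical helper: correlation of log10(population) with absolute error.
// A negative value means larger metros tend to get smaller errors.
function populationErrorCorrelation(population: number[], absError: number[]): number {
  return pearson(population.map((p) => Math.log10(p)), absError);
}
```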

TODO: Describe visualizations

Data Sources
