Job Application Email Classifier — NLP Project (CS 7347)

Authors

Clark Pfohl
Michael Zimmerman
Oleksandra Zolotarevych

Overview

This project reads job application confirmation emails and classifies them into actionable categories using NLP techniques. Two classification pipelines are provided:

Pipeline	Model	Requires GPU?
Baseline (SVM)	TF-IDF → Linear SVM	No
Transformer (DeBERTa)	Zero-shot `microsoft/deberta-v3-large`	Recommended

Classification Labels

Label	Meaning
`acceptance`	Offer letter / congratulations
`rejection`	Application declined
`interview`	Interview invitation or scheduling
`action_required`	Assessment, test, or task to complete
`in_process`	Application received / under review
`unrelated`	Newsletter, marketing, account setup

Quick Start

Ensemble (works on any OS, no GPU needed)

pip install -r requirements.txt
python main.py --input data/job_app_confirmation_emails_anonymized.csv

DeBERTa — macOS / Linux

Requires Python 3.12 (PyTorch does not support Python 3.13 on macOS).

python3.12 -m venv venv
source venv/bin/activate
pip install torch "transformers>=4.30,<5.0" sentencepiece protobuf "numpy<2"
pip install -r requirements.txt
python main.py --input data/job_app_confirmation_emails_anonymized.csv --model deberta

DeBERTa — Windows

PyTorch supports Python 3.12 on Windows — install it from python.org if needed.

python -m venv venv
venv\Scripts\activate
pip install torch "transformers>=4.30,<5.0" sentencepiece protobuf "numpy<2"
pip install -r requirements.txt
python main.py --input data/job_app_confirmation_emails_anonymized.csv --model deberta

DeBERTa model presets

# Balanced — recommended (~20 min CPU)
python main.py --input data/... --model deberta --deberta-model balanced

# Fast — for testing (~12 min CPU)
python main.py --input data/... --model deberta --deberta-model fast

# Best accuracy (~60 min CPU)
python main.py --input data/... --model deberta --deberta-model best

Project Structure

nlp_project/
├── main.py                  # Entry point — runs full pipeline
├── README.md
├── data/                    # Input CSV lives here
├── models/
│   ├── rule_labeler.py      # Heuristic labeler (generates training labels)
│   ├── svm_classifier.py    # TF-IDF + SVM pipeline
│   ├── deberta_classifier.py# Zero-shot DeBERTa classifier
│   └── sentiment.py         # Lexicon-based sentiment analysis
├── utils/
│   ├── preprocessing.py     # Text cleaning & tokenization
│   ├── entity_extraction.py # Company / contact / date detection
│   └── summarizer.py        # Extractive email summarizer
├── results/                 # Output CSVs, plots, metrics
└── requirements.txt

Pipeline Diagram

Raw Emails (CSV)
       │
       ▼
  Preprocessing (clean HTML, normalize, tokenize)
       │
       ├──► Sentiment Analysis (lexicon scores)
       ├──► Entity Extraction (company, role, contact, dates)
       ├──► Extractive Summarization
       │
       ▼
  Classification
       ├── Rule-based labeler (heuristic pseudo-labels)
       ├── TF-IDF + SVM (trained on pseudo-labels)
       └── DeBERTa zero-shot (no training required)
       │
       ▼
  Evaluation & Results
       ├── Classification report (precision, recall, F1)
       ├── Confusion matrix
       ├── Sentiment distribution
       └── Final labeled CSV

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
models		models
results		results
utils		utils
.gitignore		.gitignore
README.md		README.md
SESSION_NOTES.md		SESSION_NOTES.md
claude.md		claude.md
generate_synthetic_data.py		generate_synthetic_data.py
main.py		main.py
requirements.txt		requirements.txt
train_with_synthetic.py		train_with_synthetic.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Job Application Email Classifier — NLP Project (CS 7347)

Authors

Overview

Classification Labels

Quick Start

Ensemble (works on any OS, no GPU needed)

DeBERTa — macOS / Linux

DeBERTa — Windows

DeBERTa model presets

Project Structure

Pipeline Diagram

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Job Application Email Classifier — NLP Project (CS 7347)

Authors

Overview

Classification Labels

Quick Start

Ensemble (works on any OS, no GPU needed)

DeBERTa — macOS / Linux

DeBERTa — Windows

DeBERTa model presets

Project Structure

Pipeline Diagram

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages