Policy Proposal Labeler

A machine learning pipeline for detecting policy violations in social media posts related to transgender topics. This system combines LLM-based claim extraction and fact verification with Perspective API toxicity scoring to automatically label posts as policy-violating or non-violating.

Project Overview

This project implements an automated content moderation pipeline that:

Extracts scientific/biological claims from social media posts using OpenAI's GPT models
Verifies factual accuracy of extracted claims using LLM-based reasoning
Scores toxicity using Google's Perspective API (toxicity, insult, identity attack)
Combines signals to produce a final policy_violation label
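
As an illustration of the toxicity-scoring step, the sketch below builds a Perspective API `comments:analyze` request body for the three attributes listed above and reads the summary scores back out of a response. The helper names `build_request` and `parse_scores` are hypothetical; the notebook's own `get_perspective_scores()` may be organized differently. No network call is made here.

```python
# Hypothetical helpers; the notebook's get_perspective_scores() may differ.
REQUESTED = ("TOXICITY", "INSULT", "IDENTITY_ATTACK")


def build_request(text: str) -> dict:
    """Build the JSON body for a Perspective comments:analyze call."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in REQUESTED},
    }


def parse_scores(response: dict) -> dict:
    """Pull each attribute's summary score; missing attributes default to 0.0."""
    scores = response.get("attributeScores", {})
    return {
        name.lower(): scores.get(name, {}).get("summaryScore", {}).get("value", 0.0)
        for name in REQUESTED
    }
```

Defaulting a missing attribute to 0.0 mirrors the fallback described under Important Notes; a stricter variant could raise instead.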

Policy Violation Logic

A post is flagged as a policy violation if:

toxic == True (high toxicity/insult/identity attack scores), OR
fact == False (contains clearly false or misleading scientific claims)
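
The rule above can be sketched as a single boolean expression. The function name `policy_violation_label` is an assumption made for illustration; the label strings follow the pipeline's `violate` / `no_violate` output.

```python
def policy_violation_label(toxic: bool, fact: bool) -> str:
    """Return 'violate' if the post is toxic OR its claims were judged false."""
    return "violate" if (toxic or not fact) else "no_violate"
```

Because the two signals are combined with OR, either one alone is enough to flag a post; a post is labeled `no_violate` only when it is both non-toxic and factually accurate.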

Project Structure

Assignment_3/
├── policy_proposal_labeler.ipynb   # Main Jupyter notebook with full pipeline
├── data.csv                        # Full dataset (178 posts with human labels)
├── test.csv                        # Small test dataset (4 posts) for quick testing
├── result.csv                      # Output file generated after running the pipeline
├── .env                            # API keys (you need to create this)
└── README.md                       # This file

File Descriptions

File	Description
`policy_proposal_labeler.ipynb`	Main notebook containing all pipeline code, from setup to evaluation
`data.csv`	Full dataset with 178 labeled posts (columns: `post_id`, `post_text`, `human_label`, `post_type`)
`test.csv`	Small 4-row test dataset for quick pipeline validation
`result.csv`	Generated output with model predictions and scores
`.env`	Environment file for storing API keys (not included, must be created)

Environment Setup

Step 1: Create Conda Environment

Create a new conda environment with Python 3.11:

conda create -n hw3 python=3.11
conda activate hw3

Step 2: Install Required Packages

Run the following command (or execute the first cell in the notebook):

pip install requests pandas scikit-learn tqdm openai python-dotenv google-api-python-client

Step 3: Select Kernel

Open policy_proposal_labeler.ipynb in Jupyter/VS Code/Cursor and select the hw3 conda environment as the kernel.

API Keys Configuration

This project uses three API keys, two of which are required (see API Key Sources). Create a .env file in the project root directory:

touch .env

Add the following content to .env:

OPENAI_API_KEY=your_openai_api_key_here
PERSPECTIVE_API_KEY=your_perspective_api_key_here
FACT_CHECK_API_KEY=your_google_fact_check_api_key_here

API Key Sources

API Key	Source	Required
`OPENAI_API_KEY`	OpenAI Platform	✅ Yes
`PERSPECTIVE_API_KEY`	Google Perspective API	✅ Yes
`FACT_CHECK_API_KEY`	Google Fact Check Tools API	⚠️ Optional*

*Note: The Google Fact Check API was found to be unreliable for this use case and is not used in the final pipeline. The code remains for demonstration purposes.
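
A quick way to confirm the setup before running the notebook is to check that the required keys are present. This is a minimal sketch, assuming the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); the helper name `missing_required_keys` is hypothetical.

```python
import os

# Required vs. optional keys, as listed in the table above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PERSPECTIVE_API_KEY")
OPTIONAL_KEYS = ("FACT_CHECK_API_KEY",)


def missing_required_keys(env=None):
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

An empty list means both required keys are set; `FACT_CHECK_API_KEY` is deliberately not checked because the final pipeline does not use it.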

Running the Code

Quick Test (Recommended First)

Open policy_proposal_labeler.ipynb
Run Cells 1-9 sequentially
This processes the small test.csv dataset (4 posts) to validate your setup

Full Dataset Processing

Run Cell 10 to process all 178 posts in data.csv
Results are saved to result.csv
Run Cells 11-12 for evaluation metrics and analysis

Cell-by-Cell Guide

Cell	Description
1	Install required packages
2	Load and preview `test.csv`
3	Define `extract_claims()` - LLM-based claim extraction
4	Define `lookup_fact_check()` - Google Fact Check API (not used in the final pipeline)
5	Define `llm_verdict_for_post()` - LLM-based fact verification
6	Define `get_perspective_scores()` - Toxicity scoring
7	Compute `policy_violation` and evaluate on test set
8	Visualize results
9	Preview final dataframe
10	Full processing on `data.csv` → saves to `result.csv`
11	Evaluation metrics (accuracy, precision, recall, F1, confusion matrix)
12	Inspect potential over-flagging cases

Pipeline Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         INPUT: Post Text                        │
└─────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
         ┌────────────────────────┴────────────────────────┐
         │                                                 │
         ▼                                                 ▼
┌─────────────────────┐                      ┌─────────────────────┐
│  CLAIM EXTRACTION   │                      │  TOXICITY SCORING   │
│  (OpenAI GPT)       │                      │  (Perspective API)  │
│                     │                      │                     │
│  Extract scientific │                      │  • toxicity_score   │
│  /biological claims │                      │  • insult_score     │
└─────────────────────┘                      │  • identity_attack  │
         │                                   └─────────────────────┘
         ▼                                             │
┌─────────────────────┐                                │
│  FACT VERIFICATION  │                                │
│  (OpenAI GPT)       │                                │
│                     │                                │
│  Verify if claims   │                                │
│  are factually      │                                │
│  accurate           │                                │
└─────────────────────┘                                │
         │                                             │
         ▼                                             ▼
┌─────────────────────┐                      ┌─────────────────────┐
│    fact = True/     │                      │   toxic = True if:  │
│           False     │                      │   • identity > 0.65 │
└─────────────────────┘                      │   • toxicity > 0.65 │
         │                                   │   • insult > 0.65   │
         │                                   │   • (id>0.5 & tox>  │
         │                                   │     0.55)           │
         │                                   └─────────────────────┘
         │                                             │
         └──────────────────┬──────────────────────────┘
                            ▼
              ┌─────────────────────────┐
              │    POLICY VIOLATION     │
              │                         │
              │  = toxic OR (NOT fact)  │
              └─────────────────────────┘
                            │
                            ▼
              ┌─────────────────────────┐
              │  OUTPUT: violate /      │
              │          no_violate     │
              └─────────────────────────┘
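
The toxicity thresholds in the diagram can be written out as follows. This is a sketch of the rule as drawn, with a hypothetical function name `is_toxic`; the comparisons are strict, so a score exactly at a threshold does not trigger the flag.

```python
def is_toxic(toxicity: float, insult: float, identity_attack: float) -> bool:
    """Apply the diagram's thresholds to the three Perspective scores."""
    return (
        identity_attack > 0.65
        or toxicity > 0.65
        or insult > 0.65
        # Combined rule: moderate identity attack together with moderate toxicity.
        or (identity_attack > 0.5 and toxicity > 0.55)
    )
```

The combined rule catches posts where no single score crosses 0.65 but identity attack and toxicity are both elevated, for example `is_toxic(0.56, 0.10, 0.51)`.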

Evaluation Metrics

After running the full pipeline on data.csv, the following metrics are computed against human labels:

Expected Results (from notebook output)

Metric	Value
Accuracy	87.1%
Precision (violate)	100%
Recall (violate)	66%
F1 (violate)	0.79

Confusion Matrix

	Pred: no_violate	Pred: violate
True: no_violate	111	0
True: violate	23	44

The model produces zero false positives (no non-violating post is flagged as a violation) but 23 false negatives: 23 of the 67 violating posts are missed, which is why recall is 66% while precision is 100%.
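
The reported metrics follow directly from the confusion matrix. The short check below recomputes them from the four cell counts; the function name is an assumption for illustration, and the notebook itself may compute these with scikit-learn instead.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Compute accuracy, precision, recall, and F1 for the positive ('violate') class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the confusion matrix above.
m = metrics_from_confusion(tn=111, fp=0, fn=23, tp=44)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.871, 'precision': 1.0, 'recall': 0.657, 'f1': 0.793}
```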

Important Notes

⚠️ Perspective API Rate Limits

The Perspective API has rate limits (~60 requests/minute). If you encounter 429 errors:

Wait a few minutes and re-run the cell
Run during off-peak hours (early morning or late night)
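On request errors the pipeline does not crash; it sets the affected scores to 0.0. Posts scored during rate limiting will therefore look non-toxic, so re-run them once the limit clears.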
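
One common way to cope with 429 responses is to retry with exponential backoff instead of re-running the cell by hand. The sketch below is not taken from the notebook: `RateLimitError` is a hypothetical stand-in for whatever exception the HTTP client raises on a 429, and `call_with_backoff` is an illustrative helper.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error from the API client."""


def call_with_backoff(fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

With a limit of roughly 60 requests per minute, pausing about one second between calls would also keep the request rate under the limit in the first place.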

⚠️ Google Fact Check API

The Google Fact Check API was found to be unreliable for this use case (rarely returns results for scientific claims). The code remains in the notebook for demonstration, but the final pipeline relies on LLM-based fact verification instead.

📁 Output Files

Cell 10 appends to result.csv each time it runs
To start fresh, delete result.csv before re-running

Troubleshooting

Issue	Solution
`OPENAI_API_KEY environment variable not set`	Create `.env` file with your API key
`Perspective API HttpError 429`	Rate limited - wait a few minutes, then restart the kernel and re-run from the beginning
`ModuleNotFoundError`	Run `pip install` command from Step 2
Kernel not found	Ensure conda env `hw3` is activated and selected

Authors

CS5342 Trust and Safety - Assignment 3

License

For educational purposes only.
