A machine learning pipeline for detecting policy violations in social media posts related to transgender topics. This system combines LLM-based claim extraction and fact verification with Perspective API toxicity scoring to automatically label posts as policy-violating or non-violating.
- Project Overview
- Project Structure
- Environment Setup
- API Keys Configuration
- Running the Code
- Pipeline Architecture
- Evaluation Metrics
- Important Notes
This project implements an automated content moderation pipeline that:
- Extracts scientific/biological claims from social media posts using OpenAI's GPT models
- Verifies factual accuracy of extracted claims using LLM-based reasoning
- Scores toxicity using Google's Perspective API (toxicity, insult, identity attack)
- Combines signals to produce a final `policy_violation` label
A post is flagged as a policy violation if either condition holds:
- `toxic == True` (high toxicity, insult, or identity attack scores), OR
- `fact == False` (contains clearly false or misleading scientific claims)
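The rule above can be sketched as a single function. This is a minimal illustration, not the notebook's own code: the function name `policy_violation` and its signature are assumptions, while the output labels `violate` and `no_violate` are the ones used elsewhere in this README.

```python
def policy_violation(toxic: bool, fact: bool) -> str:
    """Combine the two signals into a final label.

    toxic: True when the Perspective scores cross the toxicity thresholds.
    fact:  True when the extracted claims were judged factually accurate.
    """
    # A post violates the policy if it is toxic OR contains a false claim.
    return "violate" if (toxic or not fact) else "no_violate"


if __name__ == "__main__":
    for toxic in (False, True):
        for fact in (False, True):
            print(f"toxic={toxic!s:5} fact={fact!s:5} -> {policy_violation(toxic, fact)}")
```

Only the combination `toxic=False, fact=True` yields `no_violate`; either signal alone is enough to flag a post.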
Assignment_3/
├── policy_proposal_labeler.ipynb # Main Jupyter notebook with full pipeline
├── data.csv # Full dataset (178 posts with human labels)
├── test.csv # Small test dataset (4 posts) for quick testing
├── result.csv # Output file generated after running the pipeline
├── .env # API keys (you need to create this)
└── README.md # This file
| File | Description |
|---|---|
| `policy_proposal_labeler.ipynb` | Main notebook containing all pipeline code, from setup to evaluation |
| `data.csv` | Full dataset with 178 labeled posts (columns: `post_id`, `post_text`, `human_label`, `post_type`) |
| `test.csv` | Small 4-row test dataset for quick pipeline validation |
| `result.csv` | Generated output with model predictions and scores |
| `.env` | Environment file for storing API keys (not included, must be created) |
Create a new conda environment with Python 3.11:

    conda create -n hw3 python=3.11
    conda activate hw3

Run the following command (or execute the first cell in the notebook):

    pip install requests pandas scikit-learn tqdm openai python-dotenv google-api-python-client

Open `policy_proposal_labeler.ipynb` in Jupyter/VS Code/Cursor and select the `hw3` conda environment as the kernel.
This project uses three API keys, two of which are required. Create a `.env` file in the project root directory:

    touch .env

Add the following content to `.env`:

    OPENAI_API_KEY=your_openai_api_key_here
    PERSPECTIVE_API_KEY=your_perspective_api_key_here
    FACT_CHECK_API_KEY=your_google_fact_check_api_key_here

| API Key | Source | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI Platform | ✅ Yes |
| `PERSPECTIVE_API_KEY` | Google Perspective API | ✅ Yes |
| `FACT_CHECK_API_KEY` | Google Fact Check Tools API | Optional* |

*Note: The Google Fact Check API was found to be unreliable for this use case and is not used in the final pipeline. The code remains for demonstration purposes.
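Before running the notebook, it can help to confirm that the two required keys are actually visible to Python. The sketch below is an assumption about how such a check might look, not code from the notebook: `REQUIRED_KEYS` and `missing_keys()` are hypothetical names, and `load_dotenv()` is the standard entry point of the `python-dotenv` package installed earlier.

```python
import os

# The two keys the final pipeline needs; FACT_CHECK_API_KEY is left out
# because the Fact Check lookup is not used in the final pipeline.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PERSPECTIVE_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv
        load_dotenv()  # looks for a .env file and loads it into os.environ
    except ImportError:
        pass  # python-dotenv not installed; fall back to the shell environment

    missing = missing_keys()
    print("All required keys found." if not missing else f"Missing: {missing}")
```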
- Open `policy_proposal_labeler.ipynb`
- Run Cells 1-9 sequentially
- This processes the small `test.csv` dataset (4 posts) to validate your setup
- Run Cell 10 to process all 178 posts in `data.csv`
- Results are saved to `result.csv`
- Run Cells 11-12 for evaluation metrics and analysis
| Cell | Description |
|---|---|
| 1 | Install required packages |
| 2 | Load and preview test.csv |
| 3 | Define extract_claims() - LLM-based claim extraction |
| 4 | Define lookup_fact_check() - Google Fact Check API (deprecated) |
| 5 | Define llm_verdict_for_post() - LLM-based fact verification |
| 6 | Define get_perspective_scores() - Toxicity scoring |
| 7 | Compute policy_violation and evaluate on test set |
| 8 | Visualize results |
| 9 | Preview final dataframe |
| 10 | Full processing on data.csv → saves to result.csv |
| 11 | Evaluation metrics (accuracy, precision, recall, F1, confusion matrix) |
| 12 | Inspect potential over-flagging cases |
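To make the toxicity-scoring step (Cell 6) more concrete, the sketch below shows the shape of a Perspective API `comments.analyze` request body for the three attributes this project uses, and how the summary scores can be read from a response. The helper names `build_request()` and `parse_scores()` and the lowercase output keys are assumptions for illustration; the notebook's `get_perspective_scores()` may be structured differently. No network call is made here.

```python
# The three Perspective attributes this project scores.
ATTRIBUTES = ("TOXICITY", "INSULT", "IDENTITY_ATTACK")


def build_request(text):
    """Build the JSON body for a Perspective comments.analyze call."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in ATTRIBUTES},
    }


def parse_scores(response):
    """Pull each attribute's summary score out of a response dict.

    Attributes missing from the response default to 0.0, mirroring the
    README's note that failed lookups are scored as 0.0.
    """
    scores = response.get("attributeScores", {})
    return {
        name.lower(): scores.get(name, {}).get("summaryScore", {}).get("value", 0.0)
        for name in ATTRIBUTES
    }


if __name__ == "__main__":
    # Illustrative response fragment with made-up values, not real API output.
    sample = {"attributeScores": {"TOXICITY": {"summaryScore": {"value": 0.9}}}}
    print(build_request("example post"))
    print(parse_scores(sample))
```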
┌─────────────────────────────────────────────────────────────────┐
│ INPUT: Post Text │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────┴────────────────────────┐
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ CLAIM EXTRACTION │ │ TOXICITY SCORING │
│ (OpenAI GPT) │ │ (Perspective API) │
│ │ │ │
│ Extract scientific │ │ • toxicity_score │
│ /biological claims │ │ • insult_score │
└─────────────────────┘ │ • identity_attack │
│ └─────────────────────┘
▼ │
┌─────────────────────┐ │
│ FACT VERIFICATION │ │
│ (OpenAI GPT) │ │
│ │ │
│ Verify if claims │ │
│ are factually │ │
│ accurate │ │
└─────────────────────┘ │
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ fact = True/ │ │ toxic = True if: │
│ False │ │ • identity > 0.65 │
└─────────────────────┘ │ • toxicity > 0.65 │
│ │ • insult > 0.65 │
│ │ • (id>0.5 & tox> │
│ │ 0.55) │
│ └─────────────────────┘
│ │
└──────────────────┬──────────────────────────┘
▼
┌─────────────────────────┐
│ POLICY VIOLATION │
│ │
│ = toxic OR (NOT fact) │
└─────────────────────────┘
│
▼
┌─────────────────────────┐
│ OUTPUT: violate / │
│ no_violate │
└─────────────────────────┘
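The toxicity thresholds in the diagram can be written as one predicate. This is a sketch under the assumption that `id` and `tox` in the diagram abbreviate the identity attack and toxicity scores; the function name `is_toxic` is hypothetical.

```python
def is_toxic(toxicity: float, insult: float, identity_attack: float) -> bool:
    """Apply the thresholds shown in the pipeline diagram."""
    return (
        identity_attack > 0.65
        or toxicity > 0.65
        or insult > 0.65
        # Combined rule: moderately high identity attack AND toxicity.
        or (identity_attack > 0.5 and toxicity > 0.55)
    )


if __name__ == "__main__":
    print(is_toxic(toxicity=0.60, insult=0.10, identity_attack=0.52))  # combined rule
    print(is_toxic(toxicity=0.60, insult=0.10, identity_attack=0.40))  # below all thresholds
```

The combined rule catches posts that fall just under every single-score threshold but score moderately high on both identity attack and toxicity.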
After running the full pipeline on data.csv, the following metrics are computed against human labels:
| Metric | Value |
|---|---|
| Accuracy | 87.1% |
| Precision (violate) | 100% |
| Recall (violate) | 66% |
| F1 (violate) | 0.79 |

Confusion matrix (rows are human labels, columns are predictions):

| | Pred: no_violate | Pred: violate |
|---|---|---|
| True: no_violate | 111 | 0 |
| True: violate | 23 | 44 |
The model produces zero false positives (no posts incorrectly flagged as violations) but 23 false negatives (violating posts it missed), which is why recall is 66% while precision is 100%.
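The reported metrics follow directly from the confusion matrix. The short calculation below reproduces them from the four cell counts; the variable names are illustrative and this is not the notebook's evaluation cell.

```python
# Cell counts from the confusion matrix, with "violate" as the positive class.
tn, fp = 111, 0   # true label: no_violate
fn, tp = 23, 44   # true label: violate

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 155 / 178
precision = tp / (tp + fp)                   # 44 / 44
recall = tp / (tp + fn)                      # 44 / 67
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.871 precision=1.000 recall=0.657 f1=0.793
```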
The Perspective API has rate limits (~60 requests/minute). If you encounter 429 errors:
- Wait a few minutes and re-run the cell
- Run during off-peak hours (early morning or late night)
- The pipeline does not crash on API errors; it sets the affected scores to 0.0, so a post whose request failed is treated as non-toxic
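An alternative to re-running cells by hand is to retry rate-limited calls with exponential backoff. The sketch below is not part of the notebook: `call_with_backoff()` and `is_rate_limited()` are hypothetical helpers, and the check assumes the exception exposes the HTTP status as `exc.resp.status`, as `googleapiclient.errors.HttpError` does.

```python
import time


def is_rate_limited(exc):
    """True if the exception carries an HTTP 429 status on exc.resp.status."""
    return getattr(getattr(exc, "resp", None), "status", None) == 429


def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call fn, retrying on HTTP 429 with delays of 1s, 2s, 4s, ...

    Any other error, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With a limit of roughly 60 requests per minute, a `base_delay` of one second or more gives the quota time to recover between attempts. Re-raising after the last attempt also keeps a failed request visible instead of silently recording it as 0.0.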
The Google Fact Check API was found to be unreliable for this use case (rarely returns results for scientific claims). The code remains in the notebook for demonstration, but the final pipeline relies on LLM-based fact verification instead.
- `result.csv` is appended to each time Cell 10 runs
- To start fresh, delete `result.csv` before re-running
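Clearing the old output can also be done from a notebook cell before Cell 10. This is a small sketch using only the standard library; `reset_output()` is a hypothetical helper, not something the notebook defines.

```python
from pathlib import Path


def reset_output(path="result.csv"):
    """Delete the output file if present; return True if one was removed."""
    target = Path(path)
    existed = target.exists()
    target.unlink(missing_ok=True)  # no error if the file is already gone
    return existed


if __name__ == "__main__":
    print("Removed old result.csv" if reset_output() else "No result.csv to remove")
```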
| Issue | Solution |
|---|---|
| `OPENAI_API_KEY environment variable not set` | Create a `.env` file with your API key |
| Perspective API `HttpError 429` | Rate limited: restart the entire kernel and retry from the beginning |
| `ModuleNotFoundError` | Run the `pip install` command from Environment Setup |
| Kernel not found | Ensure the conda env `hw3` is activated and selected |
CS5342 Trust and Safety - Assignment 3
For educational purposes only.