bymle/Assignment-3

Policy Proposal Labeler

A machine learning pipeline for detecting policy violations in social media posts related to transgender topics. This system combines LLM-based claim extraction and fact verification with Perspective API toxicity scoring to automatically label posts as policy-violating or non-violating.


Table of Contents

  1. Project Overview
  2. Project Structure
  3. Environment Setup
  4. API Keys Configuration
  5. Running the Code
  6. Pipeline Architecture
  7. Evaluation Metrics
  8. Important Notes

Project Overview

This project implements an automated content moderation pipeline that:

  1. Extracts scientific/biological claims from social media posts using OpenAI's GPT models
  2. Verifies factual accuracy of extracted claims using LLM-based reasoning
  3. Scores toxicity using Google's Perspective API (toxicity, insult, identity attack)
  4. Combines signals to produce a final policy_violation label

Policy Violation Logic

A post is flagged as a policy violation if:

  • toxic == True (high toxicity/insult/identity attack scores), OR
  • fact == False (contains clearly false or misleading scientific claims)
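The rule above can be sketched as a small function. The `violate` / `no_violate` strings come from the pipeline's output labels; the function name `label_post` is hypothetical and this is not the notebook's own code.

```python
def label_post(toxic: bool, fact: bool) -> str:
    """Combine the two signals into the final label.

    toxic: True when the Perspective scores cross the toxicity thresholds.
    fact:  True when the extracted claims were judged accurate.
    """
    policy_violation = toxic or (not fact)
    return "violate" if policy_violation else "no_violate"
```

A post with no toxic language and no false claims, `label_post(toxic=False, fact=True)`, is the only combination that yields `no_violate`.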

Project Structure

Assignment_3/
├── policy_proposal_labeler.ipynb   # Main Jupyter notebook with full pipeline
├── data.csv                        # Full dataset (178 posts with human labels)
├── test.csv                        # Small test dataset (4 posts) for quick testing
├── result.csv                      # Output file generated after running the pipeline
├── .env                            # API keys (you need to create this)
└── README.md                       # This file

File Descriptions

| File | Description |
| --- | --- |
| policy_proposal_labeler.ipynb | Main notebook containing all pipeline code, from setup to evaluation |
| data.csv | Full dataset with 178 labeled posts (columns: post_id, post_text, human_label, post_type) |
| test.csv | Small 4-row test dataset for quick pipeline validation |
| result.csv | Generated output with model predictions and scores |
| .env | Environment file for storing API keys (not included, must be created) |

Environment Setup

Step 1: Create Conda Environment

Create a new conda environment with Python 3.11:

conda create -n hw3 python=3.11
conda activate hw3

Step 2: Install Required Packages

Run the following command (or execute the first cell in the notebook):

pip install requests pandas scikit-learn tqdm openai python-dotenv google-api-python-client

Step 3: Select Kernel

Open policy_proposal_labeler.ipynb in Jupyter/VS Code/Cursor and select the hw3 conda environment as the kernel.


API Keys Configuration

This project uses up to three API keys: two are required and one is optional (see API Key Sources). Create a .env file in the project root directory:

touch .env

Add the following content to .env:

OPENAI_API_KEY=your_openai_api_key_here
PERSPECTIVE_API_KEY=your_perspective_api_key_here
FACT_CHECK_API_KEY=your_google_fact_check_api_key_here
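A minimal sketch of loading these keys and failing early when one is missing. It assumes the `python-dotenv` package from Step 2 and treats only the OpenAI and Perspective keys as required. The helper names `missing_keys` and `load_and_check` are hypothetical and are not taken from the notebook.

```python
import os

# Assumption: only these two keys are needed by the final pipeline.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PERSPECTIVE_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


def load_and_check():
    """Load .env from the working directory and raise if a key is missing."""
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv()
    missing = missing_keys(os.environ)
    if missing:
        raise RuntimeError("Missing API keys in .env: " + ", ".join(missing))
```

Calling `load_and_check()` at the top of the notebook would surface a missing key before any API request is made.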

API Key Sources

| API Key | Source | Required |
| --- | --- | --- |
| OPENAI_API_KEY | OpenAI Platform | ✅ Yes |
| PERSPECTIVE_API_KEY | Google Perspective API | ✅ Yes |
| FACT_CHECK_API_KEY | Google Fact Check Tools API | ⚠️ Optional* |

*Note: The Google Fact Check API was found to be unreliable for this use case and is not used in the final pipeline. The code remains for demonstration purposes.


Running the Code

Quick Test (Recommended First)

  1. Open policy_proposal_labeler.ipynb
  2. Run cells 1-9 sequentially
  3. This processes the small test.csv dataset (4 posts) to validate your setup

Full Dataset Processing

  1. Run Cell 10 to process all 178 posts in data.csv
  2. Results are saved to result.csv
  3. Run Cells 11-12 for evaluation metrics and analysis

Cell-by-Cell Guide

| Cell | Description |
| --- | --- |
| 1 | Install required packages |
| 2 | Load and preview test.csv |
| 3 | Define extract_claims() - LLM-based claim extraction |
| 4 | Define lookup_fact_check() - Google Fact Check API (kept for demonstration, not used in the final pipeline) |
| 5 | Define llm_verdict_for_post() - LLM-based fact verification |
| 6 | Define get_perspective_scores() - Toxicity scoring |
| 7 | Compute policy_violation and evaluate on test set |
| 8 | Visualize results |
| 9 | Preview final dataframe |
| 10 | Full processing on data.csv → saves to result.csv |
| 11 | Evaluation metrics (accuracy, precision, recall, F1, confusion matrix) |
| 12 | Inspect potential over-flagging cases |
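The README does not show the body of get_perspective_scores() from Cell 6. As an illustration only, the sketch below builds a Perspective `comments.analyze` request for the three attributes the pipeline uses and flattens the response into scores. The helper names, the output key names, and the English-language setting are assumptions.

```python
ATTRIBUTES = ("TOXICITY", "INSULT", "IDENTITY_ATTACK")


def build_request(text: str) -> dict:
    """Build the JSON body for a Perspective comments.analyze call."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in ATTRIBUTES},
        "languages": ["en"],  # assumption: posts are in English
    }


def parse_scores(response: dict) -> dict:
    """Extract summary scores; an attribute missing from the response becomes 0.0."""
    scores = response.get("attributeScores", {})
    return {
        name.lower() + "_score": scores.get(name, {})
        .get("summaryScore", {})
        .get("value", 0.0)
        for name in ATTRIBUTES
    }
```

With google-api-python-client, the body would typically be sent as `client.comments().analyze(body=build_request(text)).execute()`, and the returned dict passed to `parse_scores`.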

Pipeline Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         INPUT: Post Text                        │
└─────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
         ┌────────────────────────┴────────────────────────┐
         │                                                 │
         ▼                                                 ▼
┌─────────────────────┐                      ┌─────────────────────┐
│  CLAIM EXTRACTION   │                      │  TOXICITY SCORING   │
│  (OpenAI GPT)       │                      │  (Perspective API)  │
│                     │                      │                     │
│  Extract scientific │                      │  • toxicity_score   │
│  /biological claims │                      │  • insult_score     │
└─────────────────────┘                      │  • identity_attack  │
         │                                   └─────────────────────┘
         ▼                                             │
┌─────────────────────┐                                │
│  FACT VERIFICATION  │                                │
│  (OpenAI GPT)       │                                │
│                     │                                │
│  Verify if claims   │                                │
│  are factually      │                                │
│  accurate           │                                │
└─────────────────────┘                                │
         │                                             │
         ▼                                             ▼
┌─────────────────────┐                      ┌─────────────────────┐
│    fact = True/     │                      │   toxic = True if:  │
│           False     │                      │   • identity > 0.65 │
└─────────────────────┘                      │   • toxicity > 0.65 │
         │                                   │   • insult > 0.65   │
         │                                   │   • (id>0.5 & tox>  │
         │                                   │     0.55)           │
         │                                   └─────────────────────┘
         │                                             │
         └──────────────────┬──────────────────────────┘
                            ▼
              ┌─────────────────────────┐
              │    POLICY VIOLATION     │
              │                         │
              │  = toxic OR (NOT fact)  │
              └─────────────────────────┘
                            │
                            ▼
              ┌─────────────────────────┐
              │  OUTPUT: violate /      │
              │          no_violate     │
              └─────────────────────────┘
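The toxicity branch of the diagram can be written as a single predicate. This sketch assumes the four conditions are joined by OR, as the bullet layout suggests. The function name `is_toxic` is hypothetical.

```python
def is_toxic(toxicity: float, insult: float, identity_attack: float) -> bool:
    """Apply the thresholds from the diagram to one post's Perspective scores."""
    return (
        identity_attack > 0.65
        or toxicity > 0.65
        or insult > 0.65
        # Combined rule: moderate identity attack plus moderate toxicity.
        or (identity_attack > 0.5 and toxicity > 0.55)
    )
```

The combined rule catches posts that stay under every single-score threshold, for example `is_toxic(toxicity=0.6, insult=0.1, identity_attack=0.55)`.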

Evaluation Metrics

After running the full pipeline on data.csv, the following metrics are computed against human labels:

Expected Results (from notebook output)

| Metric | Value |
| --- | --- |
| Accuracy | 87.1% |
| Precision (violate) | 100% |
| Recall (violate) | 66% |
| F1 (violate) | 0.79 |

Confusion Matrix

| | Pred: no_violate | Pred: violate |
| --- | --- | --- |
| True: no_violate | 111 | 0 |
| True: violate | 23 | 44 |

The model produces zero false positives (no posts incorrectly flagged as violations) but 23 false negatives (violating posts it missed).
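As a check, the reported metrics can be recomputed from the confusion matrix counts alone. The variable names are illustrative.

```python
# Counts from the confusion matrix, with "violate" as the positive class.
tn, fp, fn, tp = 111, 0, 23, 44

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 155 / 178
precision = tp / (tp + fp)                  # 44 / 44
recall = tp / (tp + fn)                     # 44 / 67
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.871 precision=1.000 recall=0.657 f1=0.793
```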


Important Notes

⚠️ Perspective API Rate Limits

The Perspective API has rate limits (~60 requests/minute). If you encounter 429 errors:

  1. Wait a few minutes and re-run the cell
  2. Run during off-peak hours (early morning or late night)
  3. The pipeline handles errors by setting the affected scores to 0.0, so a post whose request failed is treated as non-toxic
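Instead of retrying by hand, a call can be wrapped in exponential backoff. This is a generic sketch, not the notebook's error handling. `call_with_backoff` and its parameters are hypothetical, and the caller supplies the check that decides whether an exception is a rate-limit error.

```python
import time


def call_with_backoff(fn, is_rate_limited, max_attempts=5, base_delay=2.0,
                      sleep=time.sleep):
    """Call fn(), retrying with doubling delays while it is rate limited."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not a rate limit, or the final failure.
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```

With google-api-python-client, the check could be something like `lambda e: isinstance(e, HttpError) and e.resp.status == 429`, assuming `HttpError` is imported from `googleapiclient.errors`.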

⚠️ Google Fact Check API

The Google Fact Check API was found to be unreliable for this use case (rarely returns results for scientific claims). The code remains in the notebook for demonstration, but the final pipeline relies on LLM-based fact verification instead.

📁 Output Files

  • Cell 10 appends to result.csv each time it runs, so repeated runs accumulate rows
  • To start fresh, delete result.csv before re-running
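Deleting the previous output can also be done from a notebook cell before re-running Cell 10. This is a small sketch. The helper name `reset_results` is hypothetical, and the default path assumes result.csv sits in the working directory.

```python
from pathlib import Path


def reset_results(path="result.csv") -> bool:
    """Delete the previous output file; return True if one was removed."""
    p = Path(path)
    existed = p.exists()
    p.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    return existed
```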

Troubleshooting

| Issue | Solution |
| --- | --- |
| OPENAI_API_KEY environment variable not set | Create a .env file containing your API key |
| Perspective API HttpError 429 | Rate limited; restart the kernel and rerun from the first cell |
| ModuleNotFoundError | Run the pip install command from Step 2 |
| Kernel not found | Ensure the hw3 conda environment is activated and selected as the kernel |

Authors

CS5342 Trust and Safety - Assignment 3


License

For educational purposes only.
