# 🚀 Fine-Tuning GPT-2 to Generate Conspiracy Theories

This notebook shows how to fine-tune GPT-2 as a causal language model on a dataset of conspiracy-related news articles and Reddit posts, using Hugging Face Transformers and Datasets.

👉 **Requirements:** a GPU runtime, such as a free Google Colab GPU or a local CUDA-capable GPU

In [None]:
# 📦 Install Dependencies
!pip install transformers datasets accelerate --quiet

In [None]:
# 🧠 Load Pretrained GPT-2
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

In [None]:
# 📁 Load and Prepare Dataset
import json

# Replace with the path to your dataset: a JSON list of objects,
# each with 'title' and 'summary' fields
with open('/content/conspiracy_data.json', encoding='utf-8') as f:
    data = json.load(f)

# Join each title and summary into a single training text
texts = [entry['title'] + ' ' + entry['summary'] for entry in data]
dataset = {'text': texts}
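The list comprehension above raises `KeyError` if any record lacks a `title` or `summary`, and fails on `None` values. A minimal, more defensive sketch, assuming the same record shape (a list of dicts that may carry `title` and/or `summary`); the helper name `build_texts` is hypothetical:

```python
from typing import Iterable, List, Mapping, Optional


def build_texts(entries: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Join title and summary per record, skipping records with neither.

    Hypothetical helper: assumes each record may carry 'title' and/or 'summary'.
    """
    texts = []
    for entry in entries:
        # .get() tolerates missing keys; `or ""` also covers explicit None values
        parts = [(entry.get("title") or "").strip(), (entry.get("summary") or "").strip()]
        text = " ".join(p for p in parts if p)
        if text:  # drop empty records so they don't become all-padding examples
            texts.append(text)
    return texts


sample = [
    {"title": "Title A", "summary": "Summary A."},
    {"title": "Title B"},             # no summary
    {"summary": None, "title": ""},   # nothing usable
]
print(build_texts(sample))  # ['Title A Summary A.', 'Title B']
```

Swapping it in is a one-line change: `dataset = {'text': build_texts(data)}`.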

In [None]:
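# ✏️ Tokenize
from datasets import Dataset

# GPT-2 ships without a padding token; reuse <|endoftext|> so padding='max_length' works
tokenizer.pad_token = tokenizer.eos_token

ds = Dataset.from_dict(dataset)

def tokenize(batch):
    # Truncate or pad every example to exactly 128 tokens
    return tokenizer(batch['text'], truncation=True, padding='max_length', max_length=128)

# Drop the raw text column; training only needs input_ids and attention_mask
tokenized_ds = ds.map(tokenize, batched=True, remove_columns=['text'])
# Hold out 10% for evaluation; the seed makes the split reproducible
tokenized_ds = tokenized_ds.train_test_split(test_size=0.1, seed=42)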
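The tokenized examples contain `input_ids` and `attention_mask` but no `labels`, which `GPT2LMHeadModel` needs in order to return a loss. Transformers' `DataCollatorForLanguageModeling` with `mlm=False` builds them per batch: labels are a copy of `input_ids` with padding positions set to `-100` (ignored by the loss), and the model shifts them internally so each position predicts the next token. A pure-Python sketch of that bookkeeping (an illustration, not the library's code), using made-up token IDs and assuming the pad token is `<|endoftext|>` (ID 50256):

```python
from typing import List, Tuple

PAD_ID = 50256       # GPT-2's <|endoftext|> id, assumed to be reused as the pad token
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def make_labels(input_ids: List[int], pad_id: int = PAD_ID) -> List[int]:
    # Copy input_ids, masking padding so it contributes no loss
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


def next_token_pairs(input_ids: List[int], labels: List[int]) -> List[Tuple[int, int]]:
    # The model shifts internally: the prediction at position i is scored
    # against labels[i + 1]; pairs are (last context token, target)
    return [
        (input_ids[i], labels[i + 1])
        for i in range(len(input_ids) - 1)
        if labels[i + 1] != IGNORE_INDEX
    ]


ids = [464, 3290, 318, PAD_ID, PAD_ID]  # made-up 3-token text, right-padded to length 5
labels = make_labels(ids)
print(labels)                          # [464, 3290, 318, -100, -100]
print(next_token_pairs(ids, labels))   # [(464, 3290), (3290, 318)]
```

One side effect of reusing `<|endoftext|>` as padding is that any genuine end-of-text tokens in the data are masked too; GPT-2's tokenizer does not append one by default, so this rarely matters here.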

In [None]:
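# ⚙️ Training
import torch
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# mlm=False gives causal-LM labels: a copy of input_ids with pad positions set to -100
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="gpt2_conspiracy_model",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=500,
    eval_strategy="epoch",            # called evaluation_strategy in older releases
    fp16=torch.cuda.is_available(),   # mixed precision requires a GPU
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds['train'],
    eval_dataset=tokenized_ds['test'],
    data_collator=data_collator,
)

trainer.train()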
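How often does `save_steps=500` actually fire? On a single device without gradient accumulation, each epoch runs ceil(n_train / per_device_train_batch_size) optimizer steps. A quick calculation, assuming a hypothetical 900 training examples (the function name `training_schedule` is also hypothetical):

```python
import math
from typing import Dict


def training_schedule(n_train: int, batch_size: int, epochs: int,
                      save_steps: int, logging_steps: int) -> Dict[str, int]:
    # One optimizer step per batch (single device, no gradient accumulation);
    # the last batch of an epoch may be smaller, hence ceil
    steps_per_epoch = math.ceil(n_train / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Checkpoints are written when the global step is a multiple of save_steps
        "checkpoints": total_steps // save_steps,
        # Roughly how many training-loss log entries to expect
        "logging_events": total_steps // logging_steps,
    }


# Hypothetical dataset size; the other settings match the TrainingArguments above
print(training_schedule(n_train=900, batch_size=2, epochs=3,
                        save_steps=500, logging_steps=10))
# {'steps_per_epoch': 450, 'total_steps': 1350, 'checkpoints': 2, 'logging_events': 135}
```

With only a few hundred examples the run can end before step 500 and never write a checkpoint, so `save_strategy="epoch"` or an explicit `trainer.save_model()` after training may be the safer choice.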

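✅ Once training finishes, you can sample new conspiracy-style texts by tokenizing a prompt and calling `model.generate()` with `do_sample=True`; `top_k`, `top_p`, and `temperature` control how varied the output is.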
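With `do_sample=True`, each next token is drawn from the model's softmax distribution instead of taking the argmax; `temperature` divides the logits before the softmax, and `top_k` restricts the draw to the k highest-scoring tokens. A pure-Python sketch of one such sampling step over a made-up logit vector (the function name is hypothetical, and this is an illustration of the technique rather than the library's implementation):

```python
import math
import random
from typing import List, Optional


def sample_next_token(logits: List[float], temperature: float = 1.0,
                      top_k: int = 0, rng: Optional[random.Random] = None) -> int:
    """Return one token index sampled with temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    rng = rng or random.Random()
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    scaled = [x / temperature for x in logits]
    # Rank indices by score and keep only the top_k (0 means no filtering)
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        candidates = candidates[:top_k]
    # Softmax over surviving candidates (subtract the max for numerical stability);
    # random.choices normalizes the weights itself
    m = max(scaled[i] for i in candidates)
    weights = [math.exp(scaled[i] - m) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
print(sample_next_token(logits, top_k=1, rng=rng))  # top_k=1 is greedy: always 0
# Only indices 0 and 1 can appear when top_k=2
print([sample_next_token(logits, temperature=0.8, top_k=2, rng=rng) for _ in range(10)])
```

In the notebook, the corresponding knobs are keyword arguments such as `model.generate(**inputs, do_sample=True, top_k=50, temperature=0.8, max_new_tokens=100)`.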