
---

# 📘 Notebook: AI Text Detector (RoBERTa Fine-Tuning)

**Objective:** Create a binary classifier to detect if text is Human-written (Label 0) or AI-generated (Label 1).

**Base Model:** `FacebookAI/roberta-base`



### **Step 1: Install Dependencies**

We need the Hugging Face ecosystem libraries (`transformers`, `datasets`), `peft` for the LoRA adapter used in Step 5, and `evaluate` for metrics.

In [2]:

!pip install -q transformers datasets evaluate accelerate scikit-learn peft


### **Step 2: Import Libraries & Configure**

We set up the device (GPU/CPU) and define our model checkpoints.

In [3]:
# [Cell 2]
import torch
import numpy as np
from datasets import load_dataset
from transformers import (
    RobertaTokenizer,
    RobertaForSequenceClassification,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding
)
import evaluate

# Configuration
MODEL_CHECKPOINT = "FacebookAI/roberta-base"
DATASET_NAME = "artem9k/ai-text-detection-pile"
BATCH_SIZE = 16
MAX_LENGTH = 512

# Check for GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cuda


### **Step 3: Load and Prepare Dataset**

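The `artem9k/ai-text-detection-pile` dataset stores one document per row (about 1.39 million rows in total), with a `source` column marking each text as `human` or `ai`. We draw the same number of rows from each class and convert `source` into an integer label, so each row is just `(text, label)`.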

* **Label 0:** Human
* **Label 1:** AI

In [4]:
# [Cell 3] - Load, balance, and split the dataset
# The remaining imports, the configuration constants, and `device` come from
# the Step 2 cell; only the extra dataset helper is imported here.
from datasets import load_dataset, concatenate_datasets

print(f"Using device: {device}")

print("Loading dataset...")
raw_dataset_dict = load_dataset("artem9k/ai-text-detection-pile")

# Access the 'train' split from the DatasetDict
raw_dataset = raw_dataset_dict['train']

# 1. Separate Human and AI data
#    We filter the huge raw_dataset to get two separate piles
print("Filtering for 'human' source...")
human_ds = raw_dataset.filter(lambda x: x['source'] == 'human')

print("Filtering for 'ai' source...")
ai_ds = raw_dataset.filter(lambda x: x['source'] == 'ai')

# 2. Select 9,000 random samples from each class
#    .shuffle(seed=42) ensures we get a random mix, not just the first 9,000 rows
human_sample = human_ds.shuffle(seed=42).select(range(9000))
ai_sample = ai_ds.shuffle(seed=42).select(range(9000))

# 3. Combine them into one balanced dataset (18,000 rows total)
balanced_dataset = concatenate_datasets([human_sample, ai_sample])

# 4. Convert 'source' (string) to 'labels' (integer)
#    The model requires numbers: 0 for Human, 1 for AI.
def map_labels(example):
    # If source is 'human', label = 0. If 'ai', label = 1.
    example['labels'] = 0 if example['source'] == 'human' else 1
    return example

balanced_dataset = balanced_dataset.map(map_labels)

# 5. Final Shuffle & Split
#    We shuffle again so the model doesn't see 9,000 human rows followed by 9,000 AI rows
final_dataset = balanced_dataset.shuffle(seed=42)

#    Remove 'source' and 'id' columns as the model doesn't need them anymore
final_dataset = final_dataset.remove_columns(['source', 'id'])

#    Split: 14,400 Train, 3,600 Test (80/20 split)
dataset_split = final_dataset.train_test_split(test_size=0.2, seed=42)

print("------------------------------------------------")
print("SUCCESS! Dataset is now balanced and formatted.")
print(f"Total Train Samples: {len(dataset_split['train'])}")
print(f"Total Test Samples:  {len(dataset_split['test'])}")
print(f"Sample Entry: {dataset_split['train'][0]}")
print("------------------------------------------------")

Using device: cuda
Loading dataset...



Filtering for 'human' source...



Filtering for 'ai' source...



------------------------------------------------
SUCCESS! Dataset is now balanced and formatted.
Total Train Samples: 14400
Total Test Samples:  3600
Sample Entry: {'text': 'This is a rush transcript. Copy may not be in its final form.\n\nAMY GOODMAN: We end today\'s show with a look at the ongoing protests in Iraq and Syria, where a group called Al Qaeda in Iraq is pushing for an Islamic caliphate with no borders or boundaries. The group, which calls itself the Islamic State of Iraq and al-Sham, has claimed responsibility for the beheading of American reporter James Foley and the attack against the Syrian city of Kobani. Both videos of the beheadings were posted on an Islamic State militant Web page this morning. The Al Qaeda-backed group also beheaded two other Americans, Steven Sotloff and Peter Kassig, at point-blank range. The group\'s media arm posted a statement on their Web site that read, quote, "Our mujahidin in the land of jihad today announced the establishment of the Islam
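
The balancing and splitting steps above can be sketched without the `datasets` library. This is a minimal stdlib illustration on hypothetical toy records; the `balanced_split` helper and the toy data are assumptions made for the example, not part of the notebook. It shows where the 80/20 sizes come from.

```python
import random

def balanced_split(records, per_class, test_fraction, seed=42):
    """Hypothetical helper: sample per_class rows per source, label, shuffle, split."""
    rng = random.Random(seed)
    humans = [r for r in records if r["source"] == "human"]
    ais = [r for r in records if r["source"] == "ai"]

    # Equal-sized random samples from each class
    sample = rng.sample(humans, per_class) + rng.sample(ais, per_class)

    # Same mapping as the notebook: 0 for human, 1 for AI
    labelled = [
        {"text": r["text"], "labels": 0 if r["source"] == "human" else 1}
        for r in sample
    ]

    # Shuffle so the classes are interleaved, then cut off the test portion
    rng.shuffle(labelled)
    n_test = int(len(labelled) * test_fraction)
    return labelled[n_test:], labelled[:n_test]

# Toy corpus: 200 "human" rows and 100 "ai" rows (made-up data)
records = [
    {"text": f"doc {i}", "source": "human" if i % 3 else "ai"}
    for i in range(300)
]
train, test = balanced_split(records, per_class=50, test_fraction=0.2)
print(len(train), len(test))
```

With 9,000 rows per class, the same arithmetic gives 18,000 rows, of which 20% (3,600) go to the test split and 14,400 to training. A plain random split keeps the overall dataset balanced but does not guarantee an exact 50/50 ratio inside each split.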

### **Step 4: Tokenization**

We convert each text into token IDs with `RobertaTokenizer`, truncating to `MAX_LENGTH` (512) tokens and padding shorter texts up to that length.

In [5]:
# [Cell 4] - Tokenization
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained(MODEL_CHECKPOINT)

def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=MAX_LENGTH
    )

# 1. Tokenize the text
print("Tokenizing dataset...")
tokenized_datasets = dataset_split.map(tokenize_function, batched=True)

# 2. Remove the raw 'text' column
# (The model only needs 'input_ids', 'attention_mask', and 'labels' now)
tokenized_datasets = tokenized_datasets.remove_columns(["text"])

# 3. Set format to PyTorch
tokenized_datasets.set_format("torch")

# Verify
print("------------------------------------------------")
print(f"Columns ready for model: {tokenized_datasets['train'].column_names}")
print("Expected: ['labels', 'input_ids', 'attention_mask']")
print("------------------------------------------------")


Tokenizing dataset...



------------------------------------------------
Columns ready for model: ['labels', 'input_ids', 'attention_mask']
Expected: ['labels', 'input_ids', 'attention_mask']
------------------------------------------------
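
To make the `input_ids` and `attention_mask` columns concrete, here is a minimal pure-Python sketch of padding. The token IDs and the `pad_batch` helper are hypothetical; the padding ID of 1 is taken from the `padding_idx=1` shown in the model printout in Step 5. The sketch compares padding every row to a fixed length (as `padding="max_length"` does) with padding only to the longest row in the batch (the behaviour `DataCollatorWithPadding` provides when the inputs are not already padded).

```python
PAD_ID = 1  # RoBERTa padding token id (matches padding_idx=1 in the model printout)

def pad_batch(sequences, pad_id=PAD_ID, max_length=None):
    """Hypothetical helper: pad to max_length, or to the longest sequence if None."""
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = target - len(seq)
        input_ids.append(seq + [pad_id] * pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two short, made-up token-id sequences
batch = [[0, 713, 16, 2], [0, 31414, 2]]

fixed = pad_batch(batch, max_length=512)   # every row becomes 512 long
dynamic = pad_batch(batch)                 # rows become as long as the longest one

print(len(fixed["input_ids"][0]), len(dynamic["input_ids"][0]))
print(dynamic["attention_mask"])
```

Because the notebook already pads to 512 during tokenization, the collator has nothing left to pad. Dropping `padding="max_length"` and letting the collator pad per batch is a common way to reduce compute on short texts, at the cost of variable batch shapes.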


### **Step 5: Initialize Model**

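We load `RobertaForSequenceClassification`, which adds a newly initialized classification head on top of the base RoBERTa model, and then wrap it with a LoRA adapter from `peft`. Only the adapter weights and the classification head are trained (under 1% of all parameters); the pretrained RoBERTa weights stay frozen.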

In [6]:
# [Cell 5 - PEFT VERSION]
from transformers import RobertaForSequenceClassification
from peft import get_peft_model, LoraConfig, TaskType

# 1. Load Base Model (Same as before)
model = RobertaForSequenceClassification.from_pretrained(
    MODEL_CHECKPOINT,
    num_labels=2
)

# 2. Define PEFT (LoRA) Configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS, # Sequence Classification
    inference_mode=False,
    r=16,                       # Rank (size of the adapter). 8 or 16 is standard.
    lora_alpha=32,              # Scaling factor
    lora_dropout=0.1            # Helps prevent overfitting
)

# 3. Wrap the model
model = get_peft_model(model, peft_config)

# 4. Verify how efficient it is
# This prints how many parameters we are actually training.
# You will see it drop from ~125 million to about 1.2 million (under 1%).
model.print_trainable_parameters()

# Move to GPU
model.to(device)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at FacebookAI/roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 1,181,954 || all params: 125,829,124 || trainable%: 0.9393


PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): RobertaForSequenceClassification(
      (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
          (word_embeddings): Embedding(50265, 768, padding_idx=1)
          (position_embeddings): Embedding(514, 768, padding_idx=1)
          (token_type_embeddings): Embedding(1, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
          (layer): ModuleList(
            (0-11): 12 x RobertaLayer(
              (attention): RobertaAttention(
                (self): RobertaSdpaSelfAttention(
                  (query): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.1, inplace=False)
                    )
                    (lora_A): Mod
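
The trainable-parameter count printed above can be reproduced with simple arithmetic. The sketch below assumes that LoRA is attached to the `query` and `value` projections in each of the 12 layers (the printout shows `query` wrapped in `lora.Linear`; `value` is an assumption based on PEFT's usual default for RoBERTa) and that the classification head stays fully trainable.

```python
hidden = 768        # RoBERTa-base hidden size (from the model printout)
layers = 12         # number of encoder layers (from the model printout)
r = 16              # LoRA rank from LoraConfig
num_labels = 2
targets_per_layer = 2  # assumption: query and value projections

# Each adapted Linear gets two small matrices: A (r x hidden) and B (hidden x r)
lora_per_module = r * hidden + hidden * r
lora_total = lora_per_module * targets_per_layer * layers

# Classification head: dense (768 -> 768) plus out_proj (768 -> 2), both with bias
classifier = (hidden * hidden + hidden) + (hidden * num_labels + num_labels)

trainable = lora_total + classifier
all_params = 125_829_124  # "all params" value copied from the printed output

print(f"LoRA matrices:   {lora_total:,}")
print(f"Classifier head: {classifier:,}")
print(f"Trainable total: {trainable:,}")
print(f"Trainable share: {100 * trainable / all_params:.4f}%")
```

Under these assumptions the total matches the `trainable params: 1,181,954` line, with roughly half of the trainable weights in the LoRA matrices and half in the classification head.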

### **Step 6: Define Metrics & Trainer**

We measure success with accuracy and set the training hyperparameters. Evaluation and checkpointing run once per epoch, and the best checkpoint is reloaded when training ends.

In [7]:
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

# Initialize DataCollatorWithPadding here
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

training_args = TrainingArguments(
    output_dir="./roberta-ai-detector",
    learning_rate=5e-6,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    num_train_epochs=3,  # Increased from 1 to 3 for better results
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    # processing_class=tokenizer, # This is not needed when data_collator is provided
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
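
The `compute_metrics` function above takes the argmax over the two logits of each example and reports accuracy. Here is a minimal pure-Python sketch of that calculation, extended with F1 for the AI class as an illustration. The logits and labels are made-up values, and `accuracy_and_f1` is a hypothetical helper, not part of `evaluate`.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy_and_f1(logits, labels, positive=1):
    """Hypothetical helper: accuracy plus F1 for the positive (AI) class."""
    preds = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))

    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(labels), "f1": f1}

# Four made-up examples: [human_logit, ai_logit]
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
labels = [0, 1, 1, 1]

metrics = accuracy_and_f1(logits, labels)
print(metrics)
```

Accuracy is a reasonable headline metric here because the dataset is balanced. F1 becomes more informative if the detector is later evaluated on data where one class is rare.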


### **Step 7: Train the Model**

This step will take time depending on your GPU. In the run recorded below, 3 epochs over the 14,400 training samples took about 56 minutes (3,359 seconds) on a Colab GPU.

In [8]:
# [Cell 7]
print("Starting Training...")
trainer.train()

Starting Training...


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6711 | 0.415162 | 0.774722 |
| 2 | 0.3877 | 0.341359 | 0.843611 |
| 3 | 0.3351 | 0.327147 | 0.854722 |


TrainOutput(global_step=2700, training_loss=0.4393119020815249, metrics={'train_runtime': 3359.2653, 'train_samples_per_second': 12.86, 'train_steps_per_second': 0.804, 'total_flos': 1.15232551796736e+16, 'train_loss': 0.4393119020815249, 'epoch': 3.0})
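
The numbers in `TrainOutput` follow directly from the dataset size and batch size. The short calculation below is a sanity check, assuming a single GPU and no gradient accumulation (neither is set in `TrainingArguments` above).

```python
import math

train_samples = 14_400   # from the dataset split output
batch_size = 16          # BATCH_SIZE
epochs = 3               # num_train_epochs
runtime_s = 3359.2653    # train_runtime from TrainOutput

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
steps_per_second = total_steps / runtime_s
samples_per_second = train_samples * epochs / runtime_s

print(f"Steps per epoch:    {steps_per_epoch}")
print(f"Total steps:        {total_steps}")
print(f"Steps per second:   {steps_per_second:.3f}")
print(f"Samples per second: {samples_per_second:.2f}")
print(f"Runtime in minutes: {runtime_s / 60:.1f}")
```

The results agree with the reported `global_step=2700`, `train_steps_per_second` of 0.804, and `train_samples_per_second` of 12.86.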

### **Step 8: Save the Model**

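Save the fine-tuned adapter and tokenizer so you can reload them later without training again. Because the model is wrapped with PEFT, `save_pretrained` writes the adapter files rather than a full copy of the base model, so `FacebookAI/roberta-base` is still needed at load time.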

In [9]:
# [Cell 8]
save_path = "./saved_peft"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model saved to {save_path}")

Model saved to ./saved_peft


In [10]:
from peft import PeftModel, PeftConfig

# 1. Load the Base Model first
base_model = RobertaForSequenceClassification.from_pretrained("FacebookAI/roberta-base", num_labels=2)

# 2. Load the PEFT Adapter on top of it
model = PeftModel.from_pretrained(base_model, "./saved_peft")

# Now use 'model' for prediction as usual!

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at FacebookAI/roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
# [Cell 10] - Save and Zip
import shutil

# 1. Define where to save
save_directory = "./my_ai_detector_final"

# 2. Save Model and Tokenizer
# 'model' is a PEFT model here, so this writes the LoRA adapter files,
# not a full standalone checkpoint.
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)

# To load from this directory, the base 'FacebookAI/roberta-base' checkpoint
# must also be available (see the PeftModel loading cell above).

print(f"Model saved to {save_directory}")

# 3. Zip it for download
shutil.make_archive("ai_detector_model", 'zip', save_directory)
print("Zip file created: ai_detector_model.zip")

Model saved to ./my_ai_detector_final
Zip file created: ai_detector_model.zip


### **Step 9: Inference System (The Final Detector)**

This is the actual function you will use in your application to detect text.

In [None]:
# [Cell 9]
import torch
import torch.nn.functional as F
from peft import PeftModel
from transformers import RobertaTokenizer, RobertaForSequenceClassification

def detect_ai(text, path_to_model):
    # The saved directory holds a LoRA adapter, not a full checkpoint, so we
    # load the base model first and attach the adapter on top of it.
    loaded_tokenizer = RobertaTokenizer.from_pretrained(path_to_model)
    base_model = RobertaForSequenceClassification.from_pretrained(
        "FacebookAI/roberta-base", num_labels=2
    )
    loaded_model = PeftModel.from_pretrained(base_model, path_to_model)
    loaded_model.eval() # Set to evaluation mode

    # Prepare input
    inputs = loaded_tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )

    # Get prediction
    with torch.no_grad():
        outputs = loaded_model(**inputs)
        logits = outputs.logits

    # Convert logits to probabilities
    probs = F.softmax(logits, dim=-1)
    confidence = torch.max(probs).item()
    predicted_class = torch.argmax(probs).item()

    label_map = {0: "HUMAN", 1: "AI-GENERATED"}

    return {
        "label": label_map[predicted_class],
        "confidence": f"{confidence:.2%}",
        "raw_probabilities": {
            "Human": f"{probs[0][0]:.4f}",
            "AI": f"{probs[0][1]:.4f}"
        }
    }

# --- TEST THE SYSTEM ---
sample_text = "Deep learning is a subset of machine learning based on artificial neural networks."
# "./saved_peft" was written in Step 8 and holds both the adapter and the tokenizer.
result = detect_ai(sample_text, "./saved_peft")

print("-----------------------------")
print(f"Input Text: {sample_text}")
print(f"Result: {result['label']}")
print(f"Confidence: {result['confidence']}")
print("-----------------------------")
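
The step from logits to a label and confidence can be illustrated without loading the model. This is a minimal pure-Python sketch of what `F.softmax`, `torch.max`, and `torch.argmax` compute for a single input. The logit values are made up for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

label_map = {0: "HUMAN", 1: "AI-GENERATED"}

# Hypothetical logits for one text: [human_logit, ai_logit]
logits = [-1.2, 2.3]

probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted_class]

print(f"Result: {label_map[predicted_class]}")
print(f"Confidence: {confidence:.2%}")
```

With two classes, the reported confidence is always at least 50%, so a value only slightly above 50% means the model is close to undecided rather than confident.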

In [None]:
from peft import PeftModel

# 1. Save the adapter and tokenizer first, so the directory exists before loading
model.save_pretrained("./my_saved_detector_with_peft")
tokenizer.save_pretrained("./my_saved_detector_with_peft")

# 2. Load the Base Model
base_model = RobertaForSequenceClassification.from_pretrained("FacebookAI/roberta-base", num_labels=2)

# 3. Load the PEFT Adapter on top of it
model = PeftModel.from_pretrained(base_model, "./my_saved_detector_with_peft")

# Now use 'model' for prediction as usual!