In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import sklearn
from sklearn.utils import check_random_state
import tensorflow as tf

In [10]:
# Set random states for reproducibility
RandomState = 42
random.seed(RandomState)
np.random.seed(RandomState)
skl_rand = check_random_state(RandomState)
tf.random.set_seed(RandomState)


TO DO:
- Dataset Loading/Choosing ✅
- Dataset Cleaning
- Exploratory Data Analysis
- Baseline (TBD)
- BERT fine-tuning to classify text
- Error Analysis / Robustness Testing

# Dataset Loading/Choosing

- LLM - Detect AI Generated Text Dataset (28k essays): https://www.kaggle.com/datasets/sunilthite/llm-detect-ai-generated-text-dataset
- Dataset Card for AI Text Dectection Pile (1.4 million essays): https://huggingface.co/datasets/artem9k/ai-text-detection-pile
- RAID (10+ million essays from 10 genres): https://github.com/liamdugan/raid

# Dataset Cleaning

- Lowercasing (optional with BERT: cased checkpoints preserve case, while uncased checkpoints lowercase the input in their own tokenizer)
- Removing HTML tags and extra spaces
- Filtering by length (exclude very short texts)
- Removing duplicates
- Language detection, if only English text is needed
- Where possible, tagging which model generated each AI text
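
The cleaning steps above could be sketched roughly as follows. This is a minimal sketch and not a final pipeline: the helper names `clean_text` and `clean_frame`, the regex-based tag stripping, and the `min_words=50` threshold are all assumptions made for illustration. The `generation` column name comes from the RAID schema; language detection is left out because it needs an extra library.

```python
import re
import pandas as pd

TAG_RE = re.compile(r"<[^>]+>")   # naive HTML tag matcher (assumption: no full HTML parser needed)
WS_RE = re.compile(r"\s+")        # any run of whitespace

def clean_text(text: str) -> str:
    """Strip HTML tags and collapse runs of whitespace into single spaces."""
    text = TAG_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()

def clean_frame(df: pd.DataFrame, text_col: str = "generation",
                min_words: int = 50) -> pd.DataFrame:
    """Hypothetical helper: clean text, drop very short texts, drop exact duplicates."""
    out = df.dropna(subset=[text_col]).copy()
    out[text_col] = out[text_col].astype(str).map(clean_text)
    # Filter by length, measured in whitespace-separated words.
    n_words = out[text_col].str.split().str.len()
    out = out[n_words >= min_words]
    # Remove duplicates after cleaning, so texts that differ only in markup collapse.
    out = out.drop_duplicates(subset=[text_col])
    return out.reset_index(drop=True)
```

Lowercasing is deliberately not applied here, so that a cased checkpoint can still be tried later.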

In [20]:
from datasets import load_dataset
ds = load_dataset("liamdugan/raid", "raid")


In [26]:
data = ds["train"]
len(data)  # last expression in the cell, so the row count is displayed

In [36]:
data.features

{'id': Value(dtype='string', id=None),
 'adv_source_id': Value(dtype='string', id=None),
 'source_id': Value(dtype='string', id=None),
 'model': Value(dtype='string', id=None),
 'decoding': Value(dtype='string', id=None),
 'repetition_penalty': Value(dtype='string', id=None),
 'attack': Value(dtype='string', id=None),
 'domain': Value(dtype='string', id=None),
 'title': Value(dtype='string', id=None),
 'prompt': Value(dtype='string', id=None),
 'generation': Value(dtype='string', id=None)}

In [54]:
data.select(range(10)).to_pandas()[["id","adv_source_id","source_id","model","decoding","repetition_penalty","attack","domain","title","prompt"]]

Unnamed: 0,id,adv_source_id,source_id,model,decoding,repetition_penalty,attack,domain,title,prompt
0,e5e058ce-be2b-459d-af36-32532aaba5ff,e5e058ce-be2b-459d-af36-32532aaba5ff,e5e058ce-be2b-459d-af36-32532aaba5ff,human,,,none,abstracts,FUTURE-AI: Guiding Principles and Consensus Re...,
1,f95b107b-d176-4af5-90f7-4d0bb20caf93,f95b107b-d176-4af5-90f7-4d0bb20caf93,f95b107b-d176-4af5-90f7-4d0bb20caf93,human,,,none,abstracts,EdgeFlow: Achieving Practical Interactive Segm...,
2,856d8972-9e3d-4544-babc-0fe16f21e04d,856d8972-9e3d-4544-babc-0fe16f21e04d,856d8972-9e3d-4544-babc-0fe16f21e04d,human,,,none,abstracts,Semi-supervised Contrastive Learning for Label...,
3,fbc8a5ea-90fa-47b8-8fa7-73dd954f1524,fbc8a5ea-90fa-47b8-8fa7-73dd954f1524,fbc8a5ea-90fa-47b8-8fa7-73dd954f1524,human,,,none,abstracts,Combo Loss: Handling Input and Output Imbalanc...,
4,72c41b8d-0069-4886-b734-a4000ffca286,72c41b8d-0069-4886-b734-a4000ffca286,72c41b8d-0069-4886-b734-a4000ffca286,human,,,none,abstracts,Attention-Based 3D Seismic Fault Segmentation ...,
5,72fe360b-cce6-4daf-b66a-1d778f5964f8,72fe360b-cce6-4daf-b66a-1d778f5964f8,72fe360b-cce6-4daf-b66a-1d778f5964f8,human,,,none,abstracts,Segmenter: Transformer for Semantic Segmentation,
6,df594cf4-9a0c-4488-bcb3-68f41e2d5a16,df594cf4-9a0c-4488-bcb3-68f41e2d5a16,df594cf4-9a0c-4488-bcb3-68f41e2d5a16,human,,,none,abstracts,Mining Contextual Information Beyond Image for...,
7,853c0e51-7dd5-4bb5-8286-e4aa8820173b,853c0e51-7dd5-4bb5-8286-e4aa8820173b,853c0e51-7dd5-4bb5-8286-e4aa8820173b,human,,,none,abstracts,Comprehensive Multi-Modal Interactions for Ref...,
8,1649f195-8f98-4c79-92b6-54a5ca9261fa,1649f195-8f98-4c79-92b6-54a5ca9261fa,1649f195-8f98-4c79-92b6-54a5ca9261fa,human,,,none,abstracts,Few-Shot Segmentation with Global and Local Co...,
9,5e23ab14-b85f-48e8-9aa3-15452e73524e,5e23ab14-b85f-48e8-9aa3-15452e73524e,5e23ab14-b85f-48e8-9aa3-15452e73524e,human,,,none,abstracts,Efficient and Generic Interactive Segmentation...,
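
In the preview above, human-written rows have `model` set to `human`. Assuming that holds for the whole split, a binary target for the classifier could be derived as in the sketch below. The helper name `add_binary_label` and the `label` column are hypothetical, and `some-llm` is a placeholder value, not a real entry from the dataset.

```python
import pandas as pd

def add_binary_label(df: pd.DataFrame, model_col: str = "model") -> pd.DataFrame:
    """Hypothetical helper: label 0 for human text, 1 for anything else.

    The original `model` column is kept, so AI text stays tagged with its generator.
    """
    out = df.copy()
    out["label"] = (out[model_col] != "human").astype(int)
    return out

# Tiny illustrative frame; "some-llm" is a placeholder generator name.
preview = pd.DataFrame({"model": ["human", "some-llm", "human"]})
labeled = add_binary_label(preview)
```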


Possible extra feature engineering; usefulness uncertain.
BERT embeddings + handcrafted features like:
- Average sentence length
- N-gram repetition
- Ratio of stopwords

Then feed these into a LightGBM/XGBoost model to compare.
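
One possible way to compute the three handcrafted features is sketched below. The definitions are assumptions rather than settled choices: sentences are split on `.`, `!` and `?`, repetition is the share of trigrams that repeat an earlier trigram, and `STOPWORDS` is a tiny illustrative list. Combining the result with BERT embeddings and training LightGBM/XGBoost is not shown.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real run would likely use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "on", "for", "with", "as", "was", "at", "by"}

def tokenize(text: str) -> list:
    """Lowercase word tokenizer based on a simple regex."""
    return re.findall(r"[a-z0-9']+", text.lower())

def avg_sentence_length(text: str) -> float:
    """Mean number of tokens per sentence, splitting on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

def ngram_repetition(text: str, n: int = 3) -> float:
    """Share of n-grams that repeat an n-gram seen earlier in the text."""
    tokens = tokenize(text)
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)

def stopword_ratio(text: str) -> float:
    """Fraction of tokens that appear in STOPWORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in STOPWORDS for t in tokens) / len(tokens)

def handcrafted_features(text: str) -> dict:
    """Bundle the three features into one dict per text."""
    return {
        "avg_sentence_length": avg_sentence_length(text),
        "ngram_repetition": ngram_repetition(text),
        "stopword_ratio": stopword_ratio(text),
    }
```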

# Exploratory Data Analysis (EDA)
- Text length distributions
- Vocabulary richness (e.g. unique words)
- POS tag distribution (maybe AI uses more nouns, fewer adjectives?)
- Visualizations: word clouds, frequency plots
- Clustering to check for separability of classes
- Comparing perplexity distributions of AI-generated and human text; given the variety of generators, this can help gauge how hard the task is.
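
As a starting point for the first two EDA items, text length and vocabulary richness could be summarised per source roughly like this. It is a sketch under assumptions: words are whitespace-separated tokens, richness is measured as the type-token ratio (unique words divided by total words), and the helper name `length_and_richness` is hypothetical.

```python
import pandas as pd

def length_and_richness(df: pd.DataFrame, text_col: str = "generation",
                        group_col: str = "model") -> pd.DataFrame:
    """Hypothetical helper: mean word count and type-token ratio per group."""
    tokens = df[text_col].fillna("").str.lower().str.split()
    stats = pd.DataFrame({
        group_col: df[group_col],
        "n_words": tokens.str.len(),
        # Type-token ratio: unique words / total words (0.0 for empty texts).
        "type_token_ratio": tokens.map(lambda t: len(set(t)) / len(t) if t else 0.0),
    })
    return stats.groupby(group_col)[["n_words", "type_token_ratio"]].mean()
```

Type-token ratio falls as texts get longer, so comparing it across classes is more meaningful once texts are bucketed or truncated to similar lengths.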

# Baseline (TBD)

- Simple baseline such as logistic regression (might not be relevant)
- Basic deep learning architecture
- Base BERT
- Possibly both a simple baseline and base BERT, to see how much BERT adds over the baseline and how much fine-tuning adds on top of that.
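
If the logistic regression baseline is kept, one common shape for it is TF-IDF features fed into scikit-learn's `LogisticRegression`. The sketch below assumes that setup; the n-gram range, `sublinear_tf`, `max_iter`, and the helper name `build_baseline` are illustrative choices, not tuned or decided values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_baseline(random_state: int = 42):
    """Hypothetical helper: TF-IDF (uni- and bigrams) + logistic regression."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        LogisticRegression(max_iter=1000, random_state=random_state),
    )

# Toy usage with made-up texts; 0 = human, 1 = AI-generated.
toy_texts = [
    "the cat sat on the mat",
    "a dog ran in the park",
    "furthermore it is important to note",
    "in conclusion it is crucial to consider",
]
toy_labels = [0, 0, 1, 1]
baseline = build_baseline().fit(toy_texts, toy_labels)
toy_proba = baseline.predict_proba(["it is important to consider"])
```

The same pipeline object can later be scored on the held-out split used for BERT, which keeps the comparison like for like.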

# BERT fine-tuning to classify text

- BERT vs RoBERTa vs DistilBERT
- RoBERTa is often reported to outperform BERT on classification benchmarks, so it is worth including in the comparison

# Error Analysis / Robustness Testing

- What types of errors does it make (confusion matrix)?
- Is the model biased toward longer/shorter texts?
- Attention analysis (using tools like BertViz)
- Check if BERT overfits to text length or formatting
- Does it misclassify texts on certain topics?
- Could it unfairly flag texts written by non-native speakers?
- Does it perform better on outputs from specific models?
- How do small edits (punctuation, synonyms) affect the model?
- Test synonym replacements (e.g., "happy" → "joyful") with slight paraphrasing
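
The small-edit checks in the list above could be measured as a "flip rate": the fraction of texts whose predicted label changes after a perturbation. The sketch below is model-agnostic and assumes only a `predict` callable that maps a list of texts to a list of labels. All helper names and the synonym dictionary are hypothetical.

```python
import re
from typing import Callable, Dict, List, Sequence

def strip_punctuation(text: str) -> str:
    """Perturbation: remove every character that is not a word character or whitespace."""
    return re.sub(r"[^\w\s]", "", text)

def swap_synonyms(text: str, synonyms: Dict[str, str]) -> str:
    """Perturbation: replace whole words using a lowercase synonym lookup."""
    def repl(match):
        word = match.group(0)
        return synonyms.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", repl, text)

def flip_rate(predict: Callable[[List[str]], Sequence[int]],
              texts: Sequence[str],
              perturb: Callable[[str], str]) -> float:
    """Fraction of texts whose predicted label changes after `perturb` is applied."""
    if not texts:
        return 0.0
    original = predict(list(texts))
    perturbed = predict([perturb(t) for t in texts])
    flips = sum(o != p for o, p in zip(original, perturbed))
    return flips / len(texts)
```

For a fitted classifier, `predict` would wrap tokenization and inference. A synonym test could then look like `flip_rate(predict, texts, lambda t: swap_synonyms(t, {"happy": "joyful"}))`.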

# Explainability

- Attention Heatmap (with bertviz or transformers-interpret)
- Visualize token importance
- SHAP map
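
Before wiring up bertviz or SHAP, token importance can be approximated with plain occlusion: drop one token at a time and record how much the score changes. This is a swapped-in technique, not SHAP or attention, and it assumes a `score` callable that returns the probability of the AI class for one text. The helper name `occlusion_importance` is hypothetical.

```python
from typing import Callable, List, Tuple

def occlusion_importance(score: Callable[[str], float],
                         text: str) -> List[Tuple[str, float]]:
    """Leave-one-token-out importance: score drop when each token is removed.

    Tokens are whitespace-separated words here, not BERT word pieces.
    """
    tokens = text.split()
    base = score(text)
    importances = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        importances.append((tok, base - score(reduced)))
    return importances
```

A positive value means removing the token lowered the AI-class score, so the token pushed the prediction toward "AI". The resulting pairs can be plotted as a bar chart or heatmap.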

# (Optional, if time allows) Train our own text generator and compare the classifier's predictions on its outputs with those on other models' outputs