
Team Members:

Aksh Minesh Patel (G01418113: apatel66@gmu.edu)

Sreeja Venkatakrishnan (G01438914: svenka7@gmu.edu)

# Implementation of "[GPT-who: An Information Density-based Machine-Generated Text Detector](https://arxiv.org/abs/2310.06202)"

This notebook provides code to calculate the 4 UID-based features and UID minimum and maximum span features described in the paper for efficient and accurate machine text detection.

![image](https://github.com/saranya-venkatraman/gpt-who/assets/8631382/eee122f4-8578-4f88-b2ca-d01b597a1bde)

Code Reference: https://github.com/saranya-venkatraman/gpt-who/tree/main

Running `pip install -r requirements.txt` installs the dependencies this notebook needs:

- **numpy 1.24.3**: numerical computation and manipulation of multi-dimensional arrays.
- **pandas 2.0.2**: loading and managing structured (tabular) data.
- **scikit-learn 1.2.2**: machine learning tools for classification, regression, and clustering.
- **torch 2.0.1**: a deep learning framework for building and training neural networks.
- **tqdm**: progress bars for tracking long-running tasks.
- **transformers 4.30.0**: pre-trained language models for natural language processing, including text generation.

Installing these pinned versions sets up the project environment the notebook was written against.
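To confirm that an environment matches these pins, a check along the following lines can be run after installation. This is a minimal sketch: the helper name `find_mismatches` is hypothetical, the pin list is copied from the paragraph above (tqdm has no stated version, so it is left out), and local build suffixes such as `+cu118` are ignored on the assumption that they do not matter here.

```python
from importlib import metadata

# Versions quoted in the dependency description; tqdm is unpinned there, so it is omitted.
PINS = {
    "numpy": "1.24.3",
    "pandas": "2.0.2",
    "scikit-learn": "1.2.2",
    "torch": "2.0.1",
    "transformers": "4.30.0",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed or None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build suffix such as "+cu118" before comparing.
            installed = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(PINS)` returns an empty dictionary when every pinned package is installed at the stated version.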



In [1]:
 pip install -r requirements.txt



The script below processes text data and computes the features the **GPT-who** detector needs, in particular the **UID-related features**. It loads input data from a CSV file called `text_test.csv`, with columns such as `text`, `prompt_id`, and `score`.

The data comes from the **ArguGPT dataset**, a balanced collection of more than 4,000 human-written essays and 4,000 essays generated by seven recent large language models, including multiple variants of **ChatGPT**. Its prompts are drawn from English argumentative-writing sources such as **TOEFL11** and **WECCL**, which allows AI-generated arguments to be compared with human-written ones on logical coherence, persuasiveness, and structure.

For every text entry, the script passes the text through the pretrained **GPT-2 XL model** and computes the surprisal (negative log probability) of each token. From these token-level surprisals it derives the **UID-related features**: the **variance of surprisal** (`uid_var`), the mean absolute and mean squared **differences** between consecutive surprisals (`uid_diff` and `uid_diff2`), and the mean and sum of the **surprisal** values. These features describe how evenly information is spread across a text, that is, how predictable and fluent it is, which is the signal **GPT-who** uses to tell human-written from machine-generated text.
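The feature computation can be illustrated without loading GPT-2 XL. The sketch below uses NumPy and toy logits; the function names `token_surprisals` and `uid_features` are hypothetical. It follows the normalisation used in the notebook's code cell (sums divided by the number of surprisal values), which is an implementation detail of this notebook rather than a statement about the paper's formulas.

```python
import numpy as np


def token_surprisals(logits, token_ids):
    """Surprisal -log p(token_t | tokens_<t) for every token after the first.

    logits: (T, V) array; row t scores the token at position t + 1.
    token_ids: length-T sequence of token ids (position 0 is the prepended EOS).
    """
    logits = np.asarray(logits, dtype=np.float64)
    token_ids = np.asarray(token_ids)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Row t predicts token t + 1, so pair rows 0..T-2 with tokens 1..T-1.
    return -log_probs[np.arange(len(token_ids) - 1), token_ids[1:]]


def uid_features(surprisals):
    """Scalar UID features from a sequence of token surprisals."""
    s = np.asarray(surprisals, dtype=np.float64)
    steps = np.diff(s)  # differences between consecutive surprisals
    return {
        "uid_var": float(np.mean((s - s.mean()) ** 2)),
        "uid_diff": float(np.abs(steps).sum() / len(s)),
        "uid_diff2": float((steps ** 2).sum() / len(s)),
        "mean": float(s.mean()),
        "sum": float(s.sum()),
    }


# Toy input: uniform logits over a 4-token vocabulary give surprisal log(4) everywhere.
toy = token_surprisals(np.zeros((3, 4)), [0, 1, 2])
print(toy, uid_features([1.0, 3.0, 2.0, 4.0]))
```

A perfectly uniform surprisal sequence has `uid_var`, `uid_diff`, and `uid_diff2` all equal to zero; the more unevenly information is distributed, the larger these values become.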

The script writes the computed UID features, together with the original metadata columns (such as `score` and `temperature`), to an output CSV file named `uid_test.csv`. This file is later used as input to the **GPT-who** classifier for prediction and further analysis. The precomputed features can also be reused in other analyses of machine-generated argumentative text.
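Each output row stores the full list of token surprisals in a single CSV cell. The notebook writes Python's string form of that list and parses it back later with string manipulation. As an alternative that is not what the notebook does, the cell could be JSON-encoded so it round-trips without custom parsing. The sketch below assumes this alternative; `write_row` and `read_rows` are hypothetical helpers, and adopting it would mean changing the parsing step in the classifier cell as well.

```python
import csv
import io
import json


def write_row(handle, row_id, surprisals):
    """Write one CSV row whose last cell is a JSON-encoded list of floats."""
    writer = csv.writer(handle)
    writer.writerow([row_id, json.dumps([float(s) for s in surprisals])])


def read_rows(handle):
    """Yield (row_id, surprisals) pairs from rows written by write_row."""
    for row_id, cell in csv.reader(handle):
        yield row_id, json.loads(cell)


# Round trip through an in-memory file; a real run would use open(path, "a", newline="").
buffer = io.StringIO(newline="")
write_row(buffer, "essay-1", [3.25, 0.5, 7.125])
buffer.seek(0)
print(list(read_rows(buffer)))
```

Opening CSV files with `newline=""`, as the `csv` module documentation recommends, also avoids stray blank lines on platforms that translate line endings.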

In [12]:
from collections import Counter
import transformers
import pandas as pd
import numpy as np
import csv
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
import torch.nn.functional as F
from tqdm import tqdm
import gc
import re

# Set file paths (replace these paths with your actual file paths on Colab)
input_path = './data/text_test.csv'    # Path to your input CSV file
cache_path = './model_cache/gpt2-xl'     # Path to cache directory for GPT-2 XL
output_path = './scores/uid_test.csv' # Path to save the output CSV file

# Load data to DataFrame
df = pd.read_csv(input_path)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("CUDA status:", torch.cuda.is_available())

# 'text' holds the essays to score; the other columns are copied through to the output
all_sents = df['text']  # Text column
all_labels = df['score']  # Essay score (not used below; the classifier later predicts 'model')

# Functions to get UID_Diff & UID_Diff2 features
def local_diff(x):
    d = 0
    for i in range(len(x)-1):
        d += abs(x[i+1]-x[i])
    return d/len(x)

def local_diff2(x):
    d = 0
    for i in range(len(x)-1):
        d += (x[i+1]-x[i])**2
    return d/len(x)

# Load GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl", cache_dir=cache_path)
model = GPT2LMHeadModel.from_pretrained("gpt2-xl", cache_dir=cache_path).to(device)
tokenizer.pad_token = tokenizer.eos_token

# Getting UID_var and all surprisals for "UID spans" features
def get_line_uid_surp(lines):
    with torch.no_grad():
        lines = tokenizer.eos_token + lines
        tok_res = tokenizer(lines, return_tensors='pt')
        input_ids = tok_res['input_ids'][0].to(device)
        attention_mask = tok_res['attention_mask'][0].to(device)
        lines_len = torch.sum(tok_res['attention_mask'], dim=1)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
        loss, logits = outputs[:2]
    line_log_prob = torch.tensor([0.0])
    word_probs = []
    for token_ind in range(lines_len - 1):
        # logits[t] is the model's prediction for the token at position t + 1
        token_prob = F.softmax(logits[token_ind], dim=0).cpu()
        token_id = input_ids[token_ind + 1].cpu()
        # Surprisal of the observed token: -log p(token | preceding tokens)
        target_log_prob = -torch.log(token_prob[token_id]).detach().numpy()
        line_log_prob += target_log_prob
        word_probs.append(target_log_prob)

    # mu = average surprisal
    mu = line_log_prob / (lines_len - 1)
    uid = torch.tensor([0.0])

    for i in range(len(word_probs)):
        uid += (word_probs[i] - mu) ** 2 / (len(word_probs))
    sentence_uids = uid.detach().numpy()[0]
    sentence_surprisal = np.mean(word_probs)
    sentence_length = lines_len.detach().numpy()[0]
    torch.cuda.empty_cache()
    return (sentence_uids, sentence_surprisal, word_probs, sentence_length)

# Write output to CSV file
with open(output_path, "w") as f:
    writer = csv.writer(f)
    row = ["id", "prompt_id", "prompt", "text", "model", "temperature", "exam_type", "score", "score_level", "uid_var", "uid_diff", "uid_diff2", "mean", "sum", "surps", "n_token"]
    writer.writerow(row)

for i in tqdm(range(len(all_sents))):
    # Retrieve relevant columns for each row
    row_data = df.iloc[i]
    text = row_data['text']
    prompt_id = row_data['prompt_id']
    prompt = row_data['prompt']
    model_type = row_data['model']
    temperature = row_data['temperature']
    exam_type = row_data['exam_type']
    score = row_data['score']
    score_level = row_data['score_level']

    # Compute UID-related features
    uids, surps, probs, lens = get_line_uid_surp(text)
    uid_diff1 = local_diff(probs)
    uid_diff2 = local_diff2(probs)
    sum_probs = np.sum(probs)

    # Free memory and clear cache
    gc.collect()
    torch.cuda.empty_cache()

    # Write results to CSV
    row_ = [row_data['id'], prompt_id, prompt, text, model_type, temperature, exam_type, score, score_level, uids, uid_diff1, uid_diff2, surps, sum_probs, probs, lens]

    with open(output_path, "a") as f:
        writer = csv.writer(f)
        writer.writerow(row_)


CUDA status: True




100%|██████████| 350/350 [04:04<00:00,  1.43it/s]


The script below is the same as the one above, except that it reads the training dataset `text_train.csv` and saves its output to `uid_train.csv`.

In [2]:
from collections import Counter
import transformers
import pandas as pd
import numpy as np
import csv
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
import torch.nn.functional as F
from tqdm import tqdm
import gc
import re

# Set file paths (replace these paths with your actual file paths on Colab)
input_path = './data/text_train.csv'    # Path to your input CSV file
cache_path = './model_cache/gpt2-xl'     # Path to cache directory for GPT-2 XL
output_path = './scores/uid_train.csv' # Path to save the output CSV file

# Load data to DataFrame
df = pd.read_csv(input_path)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("CUDA status:", torch.cuda.is_available())

# 'text' holds the essays to score; the other columns are copied through to the output
all_sents = df['text']  # Text column
all_labels = df['score']  # Essay score (not used below; the classifier later predicts 'model')

# Functions to get UID_Diff & UID_Diff2 features
def local_diff(x):
    d = 0
    for i in range(len(x)-1):
        d += abs(x[i+1]-x[i])
    return d/len(x)

def local_diff2(x):
    d = 0
    for i in range(len(x)-1):
        d += (x[i+1]-x[i])**2
    return d/len(x)

# Load GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl", cache_dir=cache_path)
model = GPT2LMHeadModel.from_pretrained("gpt2-xl", cache_dir=cache_path).to(device)
tokenizer.pad_token = tokenizer.eos_token

# Getting UID_var and all surprisals for "UID spans" features
def get_line_uid_surp(lines):
    with torch.no_grad():
        lines = tokenizer.eos_token + lines
        tok_res = tokenizer(lines, return_tensors='pt')
        input_ids = tok_res['input_ids'][0].to(device)
        attention_mask = tok_res['attention_mask'][0].to(device)
        lines_len = torch.sum(tok_res['attention_mask'], dim=1)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
        loss, logits = outputs[:2]
    line_log_prob = torch.tensor([0.0])
    word_probs = []
    for token_ind in range(lines_len - 1):
        # logits[t] is the model's prediction for the token at position t + 1
        token_prob = F.softmax(logits[token_ind], dim=0).cpu()
        token_id = input_ids[token_ind + 1].cpu()
        # Surprisal of the observed token: -log p(token | preceding tokens)
        target_log_prob = -torch.log(token_prob[token_id]).detach().numpy()
        line_log_prob += target_log_prob
        word_probs.append(target_log_prob)

    # mu = average surprisal
    mu = line_log_prob / (lines_len - 1)
    uid = torch.tensor([0.0])

    for i in range(len(word_probs)):
        uid += (word_probs[i] - mu) ** 2 / (len(word_probs))
    sentence_uids = uid.detach().numpy()[0]
    sentence_surprisal = np.mean(word_probs)
    sentence_length = lines_len.detach().numpy()[0]
    torch.cuda.empty_cache()
    return (sentence_uids, sentence_surprisal, word_probs, sentence_length)

# Write output to CSV file
with open(output_path, "w") as f:
    writer = csv.writer(f)
    row = ["id", "prompt_id", "prompt", "text", "model", "temperature", "exam_type", "score", "score_level", "uid_var", "uid_diff", "uid_diff2", "mean", "sum", "surps", "n_token"]
    writer.writerow(row)

for i in tqdm(range(len(all_sents))):
    # Retrieve relevant columns for each row
    row_data = df.iloc[i]
    text = row_data['text']
    prompt_id = row_data['prompt_id']
    prompt = row_data['prompt']
    model_type = row_data['model']
    temperature = row_data['temperature']
    exam_type = row_data['exam_type']
    score = row_data['score']
    score_level = row_data['score_level']

    # Compute UID-related features
    uids, surps, probs, lens = get_line_uid_surp(text)
    uid_diff1 = local_diff(probs)
    uid_diff2 = local_diff2(probs)
    sum_probs = np.sum(probs)

    # Free memory and clear cache
    gc.collect()
    torch.cuda.empty_cache()

    # Write results to CSV
    row_ = [row_data['id'], prompt_id, prompt, text, model_type, temperature, exam_type, score, score_level, uids, uid_diff1, uid_diff2, surps, sum_probs, probs, lens]

    with open(output_path, "a") as f:
        writer = csv.writer(f)
        writer.writerow(row_)


CUDA status: True


100%|██████████| 3338/3338 [33:14<00:00,  1.67it/s]


This Python script trains a text classifier: a logistic regression model that predicts the `model` label (the language model that generated each text) from the **UID-related features** and **surprisal spans**. It first loads the training and test data from the CSV files `uid_train.csv` and `uid_test.csv`. It then pre-processes the `surps` column, which stores each text's raw token-level **surprisal values** as a string; a series of string manipulations converts that string into a list of floats (written back to `surps`, with a copy in `probs`). The script also defines the function `spans`, which computes the **span of surprisals**: it splits the surprisal sequence into consecutive, non-overlapping chunks of length `n=20`, calculates the **variance (UID)** of each full chunk, and finds the chunks with the minimum and maximum variance. It then concatenates these two spans (40 values in total) and uses them as features for the model. Texts with 20 or fewer surprisal values yield no spans and are dropped.
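The span selection described above can be tried on a toy sequence. The sketch below is a compact restatement, not the notebook's exact code: the function name `min_max_variance_spans` is hypothetical, and it uses `argmin`/`argmax` over the chunk variances, which picks the first chunk in case of ties.

```python
import numpy as np


def min_max_variance_spans(surprisals, n=20):
    """Return the lowest-variance chunk followed by the highest-variance chunk.

    The sequence is cut into consecutive chunks of length n; a trailing chunk
    shorter than n is ignored. Sequences of length <= n return None.
    """
    if len(surprisals) <= n:
        return None
    chunks = [surprisals[i:i + n] for i in range(0, len(surprisals), n)]
    full = [chunk for chunk in chunks if len(chunk) == n]
    variances = [np.var(chunk) for chunk in full]
    lowest = int(np.argmin(variances))
    highest = int(np.argmax(variances))
    return list(full[lowest]) + list(full[highest])


# Toy sequence: a flat chunk, a strongly alternating chunk, a gentle ramp,
# and seven leftover values that do not fill a chunk.
toy = [1.0] * 20 + [0.0, 10.0] * 10 + [0.1 * i for i in range(20)] + [5.0] * 7
features = min_max_variance_spans(toy)
print(len(features), features[:3], features[20:23])
```

Here the flat chunk is selected as the minimum-variance span and the alternating chunk as the maximum-variance span, so the result has 40 values.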

Next, the script prepares the training and test data: it joins the scalar **UID-related features** (`uid_var`, `uid_diff`, `uid_diff2`, and `mean`) with the **spans** data to create the feature matrices `X_train` and `X_test`, giving 44 features per text. The target variable (`y_train` and `y_test`) is `model`, that is, the language model used to generate the text.
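The shape of the resulting feature matrix follows directly from this description: four scalar features plus two spans of 20 surprisals each. The sketch below shows the assembly with NumPy on placeholder values; `build_feature_matrix` is a hypothetical helper, whereas the notebook itself performs the join with `pd.concat`.

```python
import numpy as np


def build_feature_matrix(scalar_features, span_features):
    """Stack per-text scalar features and span features side by side."""
    scalars = np.asarray(scalar_features, dtype=float)  # shape (rows, 4)
    spans = np.asarray(span_features, dtype=float)      # shape (rows, 2 * n)
    if scalars.shape[0] != spans.shape[0]:
        raise ValueError("scalar and span features must describe the same texts")
    return np.hstack([scalars, spans])


# Placeholder values for three texts: 4 scalar features and 40 span values each.
scalars = np.arange(12, dtype=float).reshape(3, 4)
spans = np.ones((3, 40))
X = build_feature_matrix(scalars, spans)
print(X.shape)
```

Every row must have the same number of span values, which is why texts without a full 20-token span are removed before the matrices are built.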

The script then fits the logistic regression model on the training data (`X_train`, `y_train`) and makes predictions on the test set (`X_test`). Finally, it evaluates the model with scikit-learn's `classification_report`, which reports precision, recall, and F1-score for each class on the test set.

In sum, this code classifies text samples by their source model, using features that capture the predictability of the text, with logistic regression as the classifier.



In [5]:
import pandas as pd
import numpy as np
from collections import Counter
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report as cr
import re

# Set file paths directly
train_file = "./scores/uid_train.csv"
test_file = "./scores/uid_test.csv"

# Load data
train = pd.read_csv(train_file)
test = pd.read_csv(test_file)

print(len(train), len(test))
k = Counter(train['model'])
label_list = list(k.keys())

# Define function to transform probs column
def transform_probs(df):
    df['probs'] = df['surps']
    df['probs'] = df['probs'].apply(lambda x: re.sub(' +', ',', x))
    df['probs'] = df['probs'].apply(lambda x: re.sub('\n', '', x))
    df['probs'] = df['probs'].apply(lambda x: x.strip('][').split(','))
    df['probs'] = df['probs'].apply(lambda x: list(filter(None, x)))
    df['probs'] = df['probs'].apply(lambda x: list(map(float, x)))
    return df['probs']

# Convert raw surprisal values to numerical format
train['surps'] = transform_probs(train)
test['surps'] = transform_probs(test)

# Define spans function
def spans(lst, n=20):
    max_uid, min_uid = -1, 10000
    span_max, span_min = [], []
    if len(lst) <= n:
        return None
    for i in range(0, len(lst), n):
        span = lst[i:i + n]
        if len(span) == n:
            uid = np.var(span)
            if uid > max_uid:
                max_uid = uid
                span_max = span
            if uid < min_uid:
                min_uid = uid
                span_min = span
    return span_min + span_max

# Calculate spans and clean NaNs
train['spans'] = list(map(spans, train['surps']))
test['spans'] = list(map(spans, test['surps']))
train = train[train['spans'].notna()].reset_index()
test = test[test['spans'].notna()].reset_index()

# Concatenate UID features
features = ['uid_var', 'uid_diff', 'uid_diff2', 'mean']
X_1 = train[features]
X_2 = pd.DataFrame(np.stack(train['spans']))
X_train = pd.concat((X_1, X_2), axis=1, ignore_index=True)

Z_1 = test[features]
Z_2 = pd.DataFrame(np.stack(test['spans']))
X_test = pd.concat((Z_1, Z_2), axis=1, ignore_index=True)

y_train = train['model']
y_test = test['model']

# Model and evaluation
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(cr(y_test, pred))


3338 350
                  precision    recall  f1-score   support

   gpt-3.5-turbo       0.59      0.56      0.57        86
         gpt2-xl       0.69      0.46      0.55        59
text-babbage-001       0.59      0.38      0.46        60
  text-curie-001       0.44      0.32      0.37        60
text-davinci-001       0.00      0.00      0.00         0
text-davinci-002       0.00      0.00      0.00         0
text-davinci-003       0.44      0.36      0.40        85

        accuracy                           0.42       350
       macro avg       0.39      0.30      0.34       350
    weighted avg       0.55      0.42      0.47       350



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


The warnings appear because some classes, namely **text-davinci-001** and **text-davinci-002**, have no true samples in the test set (their support is 0). Since they still appear in the report, the classifier must have predicted them for some test texts, which makes their precision 0. Their recall, however, requires dividing by the number of true samples, which is zero, so it is undefined. By default, scikit-learn's `classification_report` sets such metrics to zero and emits an `UndefinedMetricWarning`. The `zero_division` parameter controls this behaviour: it sets the value used for undefined metrics (zero or one) and suppresses the warning. For instance, `classification_report(y_test, pred, zero_division=0)` keeps the values shown above without the warnings, whereas `zero_division=1` would report a recall of one for classes that have no test samples and so inflate the macro-averaged recall.
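Why a class without true samples has no defined recall can be seen by computing the metrics by hand. The sketch below is plain Python on made-up labels (`"a"`, `"b"`, `"c"` are placeholders, and `precision_recall` is a hypothetical helper, not scikit-learn's implementation); its `zero_division` argument plays the role described above.

```python
def precision_recall(y_true, y_pred, label, zero_division=0.0):
    """Per-class precision, recall, and support with explicit zero-division handling."""
    true_positives = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)  # denominator of precision
    support = sum(t == label for t in y_true)    # denominator of recall
    precision = true_positives / predicted if predicted else zero_division
    recall = true_positives / support if support else zero_division
    return precision, recall, support


# Class "c" is predicted once but never occurs in the ground truth.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "c", "b", "b"]
print(precision_recall(y_true, y_pred, "c"))
print(precision_recall(y_true, y_pred, "c", zero_division=1.0))
```

For class `"c"` the precision is a genuine 0 (one prediction, none correct), while the recall is whatever `zero_division` dictates, because there are no true samples to recover.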

The classification report shows that the logistic regression model reaches an overall accuracy of 42%, with performance varying considerably across classes. For **gpt-3.5-turbo** (precision 0.59, recall 0.56), precision and recall are close, whereas for **gpt2-xl** (precision 0.69, recall 0.46), precision is noticeably higher than recall: the model's predictions for that class are usually correct, but it misses more than half of the true instances. **text-davinci-001** and **text-davinci-002** have no instances in the test set, so their rows are all zeros, and they pull down the macro averages, which weight all seven classes equally. Precision exceeds recall in both averages: 0.39 versus 0.30 for the macro average and 0.55 versus 0.42 for the weighted average, giving moderate F1-scores of 0.34 and 0.47. Possible improvements include handling class imbalance, tuning hyperparameters, and more advanced feature engineering.
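The averages in the report can be re-derived from the per-class rows, which also shows how much the two empty classes affect the macro average. The sketch below copies the rounded per-class values from the report, so the recomputed averages can differ from the printed ones in the last digit; the helper names are hypothetical.

```python
# (precision, recall, support) per class, copied from the classification report.
REPORT = {
    "gpt-3.5-turbo": (0.59, 0.56, 86),
    "gpt2-xl": (0.69, 0.46, 59),
    "text-babbage-001": (0.59, 0.38, 60),
    "text-curie-001": (0.44, 0.32, 60),
    "text-davinci-001": (0.00, 0.00, 0),
    "text-davinci-002": (0.00, 0.00, 0),
    "text-davinci-003": (0.44, 0.36, 85),
}


def macro_average(values):
    """Unweighted mean: every class counts equally."""
    values = list(values)
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean weighted by class support."""
    values, weights = list(values), list(weights)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precisions = [p for p, _, _ in REPORT.values()]
recalls = [r for _, r, _ in REPORT.values()]
supports = [s for _, _, s in REPORT.values()]

macro_precision = macro_average(precisions)
macro_recall = macro_average(recalls)
weighted_precision = weighted_average(precisions, supports)
weighted_recall = weighted_average(recalls, supports)

# Macro averages restricted to the five classes that occur in the test set.
present = [(p, r) for p, r, s in REPORT.values() if s > 0]
macro_precision_present = macro_average(p for p, _ in present)
macro_recall_present = macro_average(r for _, r in present)

print(f"macro:    precision {macro_precision:.3f}, recall {macro_recall:.3f}")
print(f"weighted: precision {weighted_precision:.3f}, recall {weighted_recall:.3f}")
print(f"macro over present classes: precision {macro_precision_present:.3f}, "
      f"recall {macro_recall_present:.3f}")
```

Classes with zero support contribute nothing to the weighted averages but count fully in the macro averages, which is why the macro figures sit well below the weighted ones.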