# Capstone Project Spotlight: Distinguishing AI-Generated News from Human-Written Articles

## Introduction
##### In an era where artificial intelligence (AI) has become adept at producing text that closely mimics human writing, the ability to accurately differentiate between content created by humans and that generated by machines is more crucial than ever. Our project centers on developing and comparing various machine learning models to effectively make this distinction, which is a fundamental step in addressing the broader challenge of fake news detection.


### The spread of online misinformation poses a serious threat to democracies in the 21st century. It erodes trust in public institutions and increases political polarization, weakening the foundation that democratic systems are built upon.

### Initial Setup

In [1]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/kchoi22/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /Users/kchoi22/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /Users/kchoi22/nltk_data...


True

In [2]:
import json
import pandas as pd
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

In [3]:
# # Load the JSON data into a pandas DataFrame
# # Load Human real news
# with open('data/HR.json', 'r') as file:
#     data = json.load(file)

# HR_df = pd.DataFrame.from_dict(data, orient='index')

## Loading Data

### Purpose
##### We need to load our data, stored in JSON format, into a structured DataFrame. This process is essential as it converts raw data into a workable format, allowing for more efficient manipulation and analysis.
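As a rough illustration of what `orient='index'` does, the sketch below parses a hypothetical two-record payload in which the outer keys become row labels and the inner dictionaries become columns. The record contents and the name `sample_df` are assumptions made for this example; the real files are not reproduced here.

```python
import json

import pandas as pd

# Hypothetical payload: outer keys are row labels, inner dicts hold the columns.
raw = (
    '{"0": {"id": "a-1", "title": "T1", "text": "Body one"},'
    ' "1": {"id": "a-2", "title": "T2", "text": "Body two"}}'
)
data = json.loads(raw)

# orient="index" tells pandas to treat each outer key as one row.
sample_df = pd.DataFrame.from_dict(data, orient="index")
print(sample_df.shape)          # (2, 3)
print(list(sample_df.columns))  # ['id', 'title', 'text']
```

Note that JSON object keys are always strings, so the resulting index holds `"0"` and `"1"` rather than integers.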

In [4]:
# Function to load JSON data into a DataFrame
def load_json_to_df(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)
    return pd.DataFrame.from_dict(data, orient='index')

# List of JSON files
json_files = ['HR.json', 'HF.json', 'MR.json', 'MF.json']
dfs = []

# Load each JSON file into a DataFrame and store them in a list
for json_file in json_files:
    df = load_json_to_df('data/' + json_file)
    dfs.append(df)

# Naming the DataFrames
HR_df, HF_df, MR_df, MF_df = dfs

In [5]:
# Human Real News
HR_df

Unnamed: 0,id,text,title,description
0,gossipcop-951329,11 Summer Camp Movies That'll Make You Nostalg...,11 Summer Camp Movies That'll Make You Nostalg...,Nothing says summer like watching a movie all ...
1,gossipcop-861360,Info Category: Richest Business › Executives N...,Charrisse Jackson Jordan Net Worth,What is Charrisse Jackson Jordan's net worth?
2,gossipcop-911046,Warning: This story contains major spoilers fr...,Raúl Esparza exits Law & Order: SVU after six ...,The actor reveals why he decided to leave the ...
3,gossipcop-899120,Lil Peep died of an overdose of fentanyl and g...,Lil Peep Cause of Death Revealed,Pima County Office of the Medical Examiner con...
4,gossipcop-919455,Goop is kicking off its weekly podcast in a bi...,"Gwyneth Paltrow, Oprah talk Weinstein, #MeToo’...",Goop is kicking off its weekly podcast in a bi...
...,...,...,...,...
8163,gossipcop-875489,For free real time breaking news alerts sent s...,The top interior design trends for millennials,From hand-baked clay tiles to LED lights that ...
8164,gossipcop-844263,Gilmore Girls: A Year in the Life made its Net...,"Gilmore Girls Video: Lauren Graham, Alexis Ble...",Gilmore Girls: A Year in the Life made its Net...
8165,gossipcop-917467,Why Is It Airing Now?\n\nAccording to the exec...,"The O.J. Simpson Interview on Fox: Gripping, G...",On Sunday Fox aired “O.J. Simpson: The Lost Co...
8166,gossipcop-924877,Just when you thought this season of Vanderpum...,Kristen Doute and James Kennedy Hooked Up Rumo...,Just when you thought this season of Vanderpum...


In [6]:
# Human Fake News
HF_df

Unnamed: 0,id,text,title,description
0,gossipcop-1991455469,✕ Close Meghan Markle and Prince Harry have an...,As it happened: Prince Harry and Meghan Markle...,The wedding will take place in spring 2018
1,gossipcop-7798039260,Kim Kardashian and Kanye West are pulling out ...,Kim & Kanye Install At-Home Panic Room After P...,'Keeping the kids safe is the couples number o...
2,gossipcop-7817725290,Prince Harry and Meghan currently live at Kens...,£1.4million spent renovating Prince Harry and ...,Prince Harry and Meghan might not be living in...
3,gossipcop-5111151830,They can't get enough of the Biebs on this sho...,Photos from Dancing With the Stars: Special Gu...,Photos from Dancing With the Stars: Special Gu...
4,gossipcop-9658632569,Ben Affleck is keeping life with his three kid...,Jennifer Garner ‘Doesn’t Want’ Her Kids Around...,Jennifer Garner ‘doesn’t want’ her three kids ...
...,...,...,...,...
4079,gossipcop-7065786957,There was no shortage of celebrity beefs in 20...,The Biggest Celebrity Feuds of 2017,There was no shortage of celebrity beefs in 2017.
4080,gossipcop-1188213997,Kim Kardashian and her sisters seem pretty uni...,Kim Kardashian Criticizes Scott Disick for Dat...,See what Kim said on 'KUWTK' inside!
4081,gossipcop-9024002184,"When John and I got together, I found my love ...",Chrissy Teigen Opens Up for the First Time Abo...,"""The mental pain of knowing I let so many peop..."
4082,gossipcop-3520745692,Yikes! Less than 3 months after giving birth t...,Kylie Jenner Suffers Pregnancy Scare 3 Months ...,Yikes! Less than 3 months after giving birth t...


In [7]:
# AI Real News
MR_df

Unnamed: 0,id,description,text,title
0,gossipcop-951329,Nothing says summer like watching a movie all ...,"With summer just around the corner, it's the p...",11 Summer Camp Movies That'll Make You Nostalg...
1,gossipcop-861360,What is Charrisse Jackson Jordan's net worth?,"Charrisse Jackson Jordan, an American reality ...",Charrisse Jackson Jordan Net Worth
2,gossipcop-911046,The actor reveals why he decided to leave the ...,Warning: This story contains major spoilers fr...,Raúl Esparza Exits Law & Order: SVU After Six ...
3,gossipcop-899120,Pima County Office of the Medical Examiner con...,The Pima County Office of the Medical Examiner...,Lil Peep's Cause of Death Revealed
4,gossipcop-919455,Goop is kicking off its weekly podcast in a bi...,Goop is kicking off its weekly podcast in a bi...,"Gwyneth Paltrow, Oprah Discuss Weinstein and #..."
...,...,...,...,...
4164,gossipcop-849360,Kailyn Lowry revealed she was recently 'hookin...,"Kailyn Lowry, star of Teen Mom 2, recently ope...",Kailyn Lowry Reveals Regrets About Relationshi...
4165,gossipcop-923609,"Farrah Abraham, one of the stars of the MTV sh...","Farrah Abraham, star of MTV's Teen Mom OG, has...",Farrah Abraham Drops $5 Million 'Sex Shaming' ...
4166,gossipcop-933361,Kim DePaola can't say enough good things about...,"The Real Housewives of New Jersey star, Kim De...","Real Housewives' Kim DePaola on Botched, Terry..."
4167,gossipcop-902565,See the red carpet looks (and Time's Up black ...,The 2018 Golden Globes red carpet is one unlik...,Black but not boring! See the red carpet looks...


In [8]:
# AI Fake News
MF_df

Unnamed: 0,id,text,title,description
0,gossipcop-1991455469,Excitement and anticipation are in the air as ...,Royal Family prepares to welcome modern bride ...,The wedding will take place in spring 2018
1,gossipcop-7798039260,In the wake of Kim Kardashian's traumatic Pari...,Kim and Kanye's At-Home Panic Room Sparks Outr...,'Keeping the kids safe is the couples number o...
2,gossipcop-7817725290,"uke and Duchess of Sussex, Prince Harry and Me...",£1.4 Million Renovation for Prince Harry and M...,Prince Harry and Meghan might not be living in...
3,gossipcop-5111151830,"In a surprise turn of events, former President...",Former President Obama and Beyoncé grace the D...,Photos from Dancing With the Stars: Special Gu...
4,gossipcop-9658632569,"In an unexpected turn of events, Hollywood act...",Jennifer Garner Caught Banning Lindsay Shookus...,Jennifer Garner ‘doesn’t want’ her three kids ...
...,...,...,...,...
4079,gossipcop-7065786957,As we bid farewell to the drama-filled year th...,The Most Anticipated Celebrity Feuds of 2018,There was no shortage of celebrity beefs in 2017.
4080,gossipcop-1188213997,Reality television star Kim Kardashian is faci...,Kim Kardashian Accused of Hypocrisy After Crit...,See what Kim said on 'KUWTK' inside!
4081,gossipcop-9024002184,"Chrissy Teigen, the popular model and social m...",Chrissy Teigen Reveals Secret Struggle with Po...,"""The mental pain of knowing I let so many peop..."
4082,gossipcop-3520745692,Kylie Jenner and Travis Scott's relationship m...,Kylie Jenner and Travis Scott's Relationship o...,Yikes! Less than 3 months after giving birth t...


## Data Cleaning

### Purpose
##### The raw text data often contains noise and irrelevant information. Cleaning the text by removing special characters and transforming all text to lowercase standardizes the input for further processing, ensuring consistency across data samples.


In [9]:
def clean_text(text):
    # Remove everything except ASCII letters and whitespace (digits, punctuation, accented letters)
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    # Convert text to lowercase
    text = text.lower()
    return text

HR_df['cleaned_text'] = HR_df['text'].apply(clean_text)
HF_df['cleaned_text'] = HF_df['text'].apply(clean_text)
MR_df['cleaned_text'] = MR_df['text'].apply(clean_text)
MF_df['cleaned_text'] = MF_df['text'].apply(clean_text)

# Print the cleaned text for each DataFrame
print("Human Real News:")
print(HR_df['cleaned_text'])
print("\nHuman Fake News:")
print(HF_df['cleaned_text'])
print("\nAI Real News:")
print(MR_df['cleaned_text'])
print("\nAI Fake News:")
print(MF_df['cleaned_text'])

Human Real News:
0        summer camp movies thatll make you nostalgic ...
1       info category richest business  executives net...
3       lil peep died of an overdose of fentanyl and g...
4       goop is kicking off its weekly podcast in a bi...
                              ...                        
8163    for free real time breaking news alerts sent s...
8164    gilmore girls a year in the life made its netf...
8165    why is it airing now\n\naccording to the execu...
8166    just when you thought this season of vanderpum...
8167    a cringeworthy video of katie couric talking a...
Name: cleaned_text, Length: 8168, dtype: object

Human Fake News:
0        close meghan markle and prince harry have ann...
1       kim kardashian and kanye west are pulling out ...
2       prince harry and meghan currently live at kens...
3       they cant get enough of the biebs on this show...
4       ben affleck is keeping life with his three kid...
                              ...              
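
One side effect of this cleaning rule is worth seeing in isolation. The notebook's later outputs contain entries such as `beyonc knowles`, which appears to happen because the regular expression drops accented letters entirely. The sketch below reproduces that behaviour and shows one possible workaround; `strip_accents` is a hypothetical helper and is not part of the original pipeline.

```python
import re
import unicodedata


def clean_text(text):
    # Same rule as the cleaning cell: keep ASCII letters and whitespace only.
    return re.sub(r"[^a-zA-Z\s]", "", text).lower()


def strip_accents(text):
    # Hypothetical helper: decompose accented characters, then drop the
    # combining marks so that "\u00e9" becomes a plain "e".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


sample = "Beyonc\u00e9 welcomed twins"
print(clean_text(sample))                 # beyonc welcomed twins
print(clean_text(strip_accents(sample)))  # beyonce welcomed twins
```

Whether this normalization matters for classification accuracy is an open question; it mainly keeps names intact.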

## Text Preprocessing

### Purpose
##### Beyond basic cleaning, text data requires deeper preprocessing to be suitable for machine learning. This includes tokenization, stop word removal, and lemmatization, which simplify the text and reduce it to its meaningful essence. This step is critical for highlighting the textual features that are most informative for our classification task.


In [27]:
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Remove stop words
    filtered_tokens = [token for token in tokens if token not in stop_words]
    # Lemmatize the tokens
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]
    # Join the tokens back into a single string
    processed_text = ' '.join(lemmatized_tokens)
    return processed_text

HR_df['preprocessed_text'] = HR_df['cleaned_text'].apply(preprocess_text)
HF_df['preprocessed_text'] = HF_df['cleaned_text'].apply(preprocess_text)
MR_df['preprocessed_text'] = MR_df['cleaned_text'].apply(preprocess_text)
MF_df['preprocessed_text'] = MF_df['cleaned_text'].apply(preprocess_text)

print("Human Real News:")
print(HR_df['preprocessed_text'])
print("\nHuman Fake News:")
print(HF_df['preprocessed_text'])
print("\nAI Real News:")
print(MR_df['preprocessed_text'])
print("\nAI Fake News:")
print(MF_df['preprocessed_text'])

Human Real News:
0       summer camp movie thatll make nostalgic childh...
1       info category richest business executive net w...
3       lil peep died overdose fentanyl generic xanax ...
4       goop kicking weekly podcast big way oprah big ...
                              ...                        
8163    free real time breaking news alert sent straig...
8164    gilmore girl year life made netflix debut six ...
8165    airing according executive producer terry wron...
8166    thought season vanderpump rule couldnt get sca...
8167    cringeworthy video katie couric talking matt l...
Name: preprocessed_text, Length: 8168, dtype: object

Human Fake News:
0       close meghan markle prince harry announced eng...
1       kim kardashian kanye west pulling stop keep fa...
2       prince harry meghan currently live kensington ...
3       cant get enough biebs show back first week sea...
4       ben affleck keeping life three kid relationshi...
                              ...         
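
The preprocessed output still contains tokens such as `thatll`, `couldnt`, and `cant`. A plausible explanation is that cleaning strips apostrophes before stop-word filtering, so contractions no longer match stop-word entries that are stored with an apostrophe. The sketch below illustrates the effect with a deliberately tiny, hypothetical stop-word set and a plain `split()` in place of `word_tokenize`; it is not the NLTK list.

```python
import re

# Hypothetical miniature stop-word set, for illustration only.
STOP_WORDS = {"that'll", "couldn't", "you", "this", "get"}


def clean_text(text):
    # Same rule as the cleaning cell: keep ASCII letters and whitespace only.
    return re.sub(r"[^a-zA-Z\s]", "", text).lower()


def drop_stop_words(text):
    # Whitespace tokenization stands in for word_tokenize here.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


raw = "Movies that'll make you nostalgic"
print(drop_stop_words(raw.lower()))      # ['movies', 'make', 'nostalgic']
print(drop_stop_words(clean_text(raw)))  # ['movies', 'thatll', 'make', 'nostalgic']
```

If these leftovers turn out to matter, one option is to filter stop words before removing punctuation.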

In [11]:
# Save the preprocessed data to a new JSON file 
# Do this only once!
# HR_df.to_json('data/HR_prep.json', orient='index')
# HF_df.to_json('data/HF_prep.json', orient='index')
# MF_df.to_json('data/MF_prep.json', orient='index')
# MR_df.to_json('data/MR_prep.json', orient='index')

In [12]:
# # Load the JSON data into a pandas DataFrame
# # Load Human real news
# with open('data/HR_prep.json', 'r') as file:
#     data = json.load(file)

# HR_prep_df = pd.DataFrame.from_dict(data, orient='index')
# HR_df = HR_prep_df['preprocessed_text']
# HR_df

## Preparing Processed Data

### Purpose
##### With text data now preprocessed, we reorganize it by loading the cleaned and structured text back into new DataFrames. This step ensures our data remains organized and accessible for model training and evaluation.


In [13]:
# Function to load JSON data into a DataFrame
def load_json_to_df(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)
    return pd.DataFrame.from_dict(data, orient='index')

# List of JSON files
json_files = ['HR_prep.json', 'HF_prep.json', 'MR_prep.json', 'MF_prep.json']
dfs = []

# Load each JSON file into a DataFrame and store them in a list
for json_file in json_files:
    df = load_json_to_df('data/' + json_file)
    dfs.append(df)

# Naming the DataFrames
HR_df, HF_df, MR_df, MF_df = dfs

In [14]:
HR = HR_df['preprocessed_text']
HR

0       summer camp movie thatll make nostalgic childh...
1       info category richest business executive net w...
3       lil peep died overdose fentanyl generic xanax ...
4       goop kicking weekly podcast big way oprah big ...
                              ...                        
8163    free real time breaking news alert sent straig...
8164    gilmore girl year life made netflix debut six ...
8165    airing according executive producer terry wron...
8166    thought season vanderpump rule couldnt get sca...
8167    cringeworthy video katie couric talking matt l...
Name: preprocessed_text, Length: 8168, dtype: object

In [15]:
HF = HF_df['preprocessed_text']
HF

0       close meghan markle prince harry announced eng...
1       kim kardashian kanye west pulling stop keep fa...
2       prince harry meghan currently live kensington ...
3       cant get enough biebs show back first week sea...
4       ben affleck keeping life three kid relationshi...
                              ...                        
4079    shortage celebrity beef whether fight costars ...
4080    kim kardashian sister seem pretty unimpressed ...
4081    john got together found love cooking one earli...
4082    yikes le month giving birth baby stormi kylie ...
4083    beyonc knowles jay z welcomed twin according r...
Name: preprocessed_text, Length: 4084, dtype: object

In [16]:
MR = MR_df['preprocessed_text']
MR

0       summer around corner perfect time take trip me...
1       charrisse jackson jordan american reality tele...
3       pima county office medical examiner confirmed ...
4       goop kicking weekly podcast big way oprah winf...
                              ...                        
4164    kailyn lowry star teen mom recently opened reg...
4165    farrah abraham star mtvs teen mom og decided d...
4166    real housewife new jersey star kim depaola rec...
4167    golden globe red carpet one unlike time initia...
4168    kanye west renowned rapper time grammy winner ...
Name: preprocessed_text, Length: 4169, dtype: object

In [17]:
MF = MF_df['preprocessed_text']
MF

0       excitement anticipation air royal family get r...
1       wake kim kardashians traumatic paris robbery r...
2       uke duchess sussex prince harry meghan markle ...
3       surprise turn event former president barack ob...
4       unexpected turn event hollywood actress jennif...
                              ...                        
4079    bid farewell dramafilled year anticipation wha...
4080    reality television star kim kardashian facing ...
4081    chrissy teigen popular model social medium per...
4082    kylie jenner travis scott relationship may jeo...
4083    according exclusive report tmz beyonc jay z na...
Name: preprocessed_text, Length: 4084, dtype: object

## Data Preparation for Modeling

In [18]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

### Data Aggregation

#### Purpose
##### Before we can train our machine learning models, we combine the four preprocessed text Series into a single collection. Because each input is a pandas Series, `pd.concat` returns one combined Series (stored under the name `df`), and `ignore_index=True` renumbers the rows from 0. Working from one unified collection ensures that all text samples are processed and split consistently.


In [19]:
# Concatenate the four text Series into one Series (kept under the name `df`)
df = pd.concat([HR, HF, MR, MF], ignore_index=True)

### Label Assignment

#### Purpose
##### Each text sample in our dataset needs a corresponding label to indicate whether it's human-written or AI-generated. This step is critical as it prepares our dataset for supervised learning, where each input (text) must have an associated output (label) for the model to learn from.
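Because the labels are built as a separate list, they only line up with the texts if both follow the same order. The sketch below shows one way to make that alignment explicit by storing text and label side by side. The sample strings and the names `human_texts`, `ai_texts`, and `labeled` are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Hypothetical stand-ins for the human and AI text Series.
human_texts = pd.Series(["human article one", "human article two", "human article three"])
ai_texts = pd.Series(["ai article one", "ai article two"])

# Same pattern as the notebook: concatenate, then build labels in the same order.
texts = pd.concat([human_texts, ai_texts], ignore_index=True)
labels = [0] * len(human_texts) + [1] * len(ai_texts)

# Keeping both in one frame makes a misalignment harder to introduce later.
labeled = pd.DataFrame({"text": texts, "label": labels})
print(labeled["label"].value_counts().to_dict())  # {0: 3, 1: 2}
```

A quick `value_counts()` check like this also exposes class imbalance before training.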


In [20]:
# Assigning labels: 0 for human news, 1 for AI news
labels = [0]*len(HR) + [0]*len(HF) + [1]*len(MR) + [1]*len(MF)

### Data Splitting

#### Purpose
##### To evaluate the effectiveness of our machine learning models, we must test them on unseen data. This is accomplished by splitting our dataset into two parts: training and testing sets. The training set is used to train the models, teaching them to recognize patterns between features (text) and labels (human or AI). The testing set, which the models have not seen during training, is used to assess how well the models generalize to new, unseen data. This split ensures that our performance metrics reflect the model's capability to perform in real-world scenarios, not just on the data it has learned.

#### Parameters:
- **test_size=0.2**: Holds out 20% of the dataset for testing, a common split ratio. With 20,505 articles in total (8,168 + 4,084 + 4,169 + 4,084), this leaves 16,404 for training and 4,101 for testing, which matches the support of 4,101 in the evaluation reports.
- **random_state=42**: Ensures reproducibility of the split. This setting acts as a seed for the random number generator used in splitting the dataset, allowing us and others to reproduce the exact split in the future.
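
The two parameters above can be tried on toy data as follows. This is a minimal sketch: the `stratify=labels` argument is an optional variation that the notebook does not use, added here because it keeps the class ratio the same in both splits. The document strings are placeholders.

```python
from sklearn.model_selection import train_test_split

texts = [f"doc {i}" for i in range(50)]
labels = [0] * 30 + [1] * 20  # toy 60/40 class balance

# stratify=labels is an assumption of this sketch, not part of the notebook.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
print(len(X_tr), len(X_te))  # 40 10
```

Calling the function again with the same `random_state` returns the same split.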


In [21]:
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df, labels, test_size=0.2, random_state=42)

### Feature Extraction with TF-IDF

#### Purpose
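##### After dividing the dataset, we use the Term Frequency-Inverse Document Frequency (TF-IDF) technique to transform text into numerical vectors. This method up-weights words that are frequent in a document but rare across the corpus, and down-weights words that appear in most documents and therefore carry little information. The vectorizer is fitted on the training set only and then applied to the test set, so no vocabulary or document-frequency statistics leak from the test data.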

#### Why TF-IDF?
- **Relevance**: It enhances model accuracy by emphasizing words that provide the most informational value about the document.
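
For reference, the weighting can be written out as follows, assuming scikit-learn's default `TfidfVectorizer` settings (`smooth_idf=True`, `norm='l2'`). Here $n$ is the number of training documents, $\mathrm{df}(t)$ is the number of training documents containing term $t$, and $\mathrm{tf}(t, d)$ is the raw count of $t$ in document $d$.

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)

% Worked example with n = 4 documents:
% a term found in 1 document:  idf = \ln(5/2) + 1 \approx 1.92
% a term found in all 4:       idf = \ln(5/5) + 1 = 1
```

Each document vector is then scaled to unit Euclidean length, so long and short articles become comparable.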


#### Application in Our Project
TF-IDF converts our text corpus into a sparse numerical matrix that both Logistic Regression and Naive Bayes can train on directly. We cap the vocabulary with `max_features=5000`, which keeps the 5,000 terms with the highest frequency across the training corpus and keeps the feature matrix manageable.
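
The fit-on-train, transform-on-test pattern can be sketched on a toy corpus as follows. The documents below are invented for illustration; the point is that a word seen only at test time (`zebra`) has no column and is silently ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = [
    "kim kardashian new show",
    "prince harry royal wedding",
    "new royal baby",
]
test_docs = ["royal zebra wedding"]  # "zebra" never appears in training

vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_docs)  # learns vocabulary and IDF weights
X_test = vectorizer.transform(test_docs)        # reuses them; unseen words are dropped

print(X_train.shape, X_test.shape)        # (3, 9) (1, 9)
print("zebra" in vectorizer.vocabulary_)  # False
```

Both matrices share the same nine columns, which is what allows a model trained on one to score the other.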


In [22]:
# Creating TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)

### Training the Logistic Regression Model

#### Purpose
##### We now apply Logistic Regression, a widely used statistical method for binary classification tasks. This model is known for its simplicity and effectiveness in predicting categorical outcomes.

#### Model Training
- **Configuration**: We set `max_iter=1000` (the scikit-learn default is 100) to give the solver enough iterations to converge during training.
- **Training**: The model is trained on the TF-IDF vectors of the training data (`X_train_tfidf`), learning to associate the vectors with the corresponding labels (`y_train`).

#### Model Prediction
- **Testing**: After training, the Logistic Regression model is used to make predictions on the TF-IDF vectors of the test dataset (`X_test_tfidf`). These predictions allow us to evaluate the model's performance in terms of accuracy, helping us understand how well the model can generalize to new, unseen data.

#### Significance
Using Logistic Regression provides a baseline for the performance of text classification, allowing us to compare its results with more complex models if needed. This comparison can highlight the strengths and limitations of different approaches in the context of our text classification task.


In [23]:
# Logistic Regression model
logistic_regression_model = LogisticRegression(max_iter=1000)
logistic_regression_model.fit(X_train_tfidf, y_train)
logistic_regression_predictions = logistic_regression_model.predict(X_test_tfidf)

### Training the Naive Bayes Model

#### Purpose
We continue our analysis by implementing the Naive Bayes classifier, renowned for its efficacy in text classification due to its simplicity and efficient handling of large datasets.

#### Model Training
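- **Approach**: Naive Bayes estimates the probability of each class from the TF-IDF features under the assumption that features are conditionally independent given the class. `MultinomialNB` is designed for count data, but fractional TF-IDF values also work in practice.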

#### Model Prediction
- **Testing**: After training, the model predicts labels for the unseen TF-IDF vectors from our test set, which lets us measure how well it generalizes to new data.

#### Evaluation
- **Metrics**: We assess model performance using key metrics such as accuracy and F1-score, which help quantify its effectiveness in differentiating human-written from AI-generated texts.
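
For a single class, these metrics are defined in terms of true positives ($TP$), false positives ($FP$), and false negatives ($FN$). The worked line uses the values reported for the AI class in the Logistic Regression evaluation (precision 0.97, recall 0.87).

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}

F_1 = \frac{2 \cdot 0.97 \cdot 0.87}{0.97 + 0.87} \approx 0.92
```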

#### Significance
Deploying Naive Bayes allows for a comparison with Logistic Regression, helping identify the most suitable model for our needs based on empirical evidence.
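
A side-by-side comparison can also be written as a loop over candidate models. The sketch below wraps each classifier in a scikit-learn `Pipeline` so that vectorization and training stay together. The toy texts and the `candidates` and `results` names are assumptions for illustration, and this is not how the notebook itself is organized.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: label 0 = human, label 1 = AI.
train_texts = [
    "yikes kylie jenner scare", "yikes kim kardashian feud",
    "yikes prince harry wedding", "renowned rapper kanye west",
    "renowned actress jennifer garner", "renowned model chrissy teigen",
]
train_labels = [0, 0, 0, 1, 1, 1]
test_texts = ["yikes another feud", "renowned singer returns"]
test_labels = [0, 1]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}

results = {}
for name, clf in candidates.items():
    # Each pipeline fits its own vectorizer on the training texts only.
    pipe = make_pipeline(TfidfVectorizer(max_features=5000), clf)
    pipe.fit(train_texts, train_labels)
    preds = pipe.predict(test_texts)
    results[name] = {
        "accuracy": accuracy_score(test_labels, preds),
        "f1_ai": f1_score(test_labels, preds, pos_label=1, zero_division=0),
    }

for name, scores in results.items():
    print(name, scores)
```

Keeping the vectorizer inside the pipeline also makes it harder to fit it on test data by accident.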



In [24]:
# Naive Bayes model
naive_bayes_model = MultinomialNB()
naive_bayes_model.fit(X_train_tfidf, y_train)
naive_bayes_predictions = naive_bayes_model.predict(X_test_tfidf)

### Evaluating the Logistic Regression Model
##### After training our Logistic Regression model, we evaluated its performance on the test dataset to determine how accurately it can identify human-written versus AI-generated texts.


In [25]:
# Evaluate Logistic Regression model
print("Logistic Regression Accuracy:", accuracy_score(y_test, logistic_regression_predictions))
print("Logistic Regression Classification Report:")
print(classification_report(y_test, logistic_regression_predictions))

Logistic Regression Accuracy: 0.9366008290660814
Logistic Regression Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.98      0.95      2452
           1       0.97      0.87      0.92      1649

    accuracy                           0.94      4101
   macro avg       0.94      0.93      0.93      4101
weighted avg       0.94      0.94      0.94      4101



#### Qualitative Analysis
##### With 93.7% accuracy and F1-scores of 0.95 (human) and 0.92 (AI), Logistic Regression is a strong candidate for our task. Its precision of 0.97 on the AI class means that it rarely labels a human-written article as AI-generated.

### Implications
##### These results suggest that TF-IDF features combined with a linear classifier separate the two classes well on this dataset. The high precision for AI-generated texts is particularly promising: when the model flags an article as AI-generated, it is correct about 97% of the time on the test set, which makes it a plausible building block for filtering AI-generated content.

### Limitations and Considerations
##### While the results are promising, the lower recall for AI-generated texts (0.87) means that roughly 13% of the AI-generated articles in the test set are classified as human-written. This leaves room for improvement, possibly by further tuning the model or exploring alternative feature engineering techniques.

### Evaluating the Naive Bayes Model
##### We evaluate the Naive Bayes model on the same test set to measure how well it classifies texts as human-written or AI-generated.


In [26]:
# Evaluate Naive Bayes model
print("Naive Bayes Accuracy:", accuracy_score(y_test, naive_bayes_predictions))
print("Naive Bayes Classification Report:")
print(classification_report(y_test, naive_bayes_predictions))

Naive Bayes Accuracy: 0.9163618629602536
Naive Bayes Classification Report:
              precision    recall  f1-score   support

           0       0.89      0.99      0.93      2452
           1       0.97      0.81      0.89      1649

    accuracy                           0.92      4101
   macro avg       0.93      0.90      0.91      4101
weighted avg       0.92      0.92      0.91      4101



#### Qualitative Analysis

##### Like the Logistic Regression model, the Naive Bayes model performed well, especially in minimizing false negatives for human-written texts (recall of 0.99). However, its recall of 0.81 for AI-generated texts means that nearly one in five AI-generated articles was missed, which could be critical depending on the application context.
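
To put the recall figures in concrete terms, the sketch below converts the rounded precision and recall for the AI class (support 1,649) into approximate article counts for both models. Because the inputs are two-decimal report values, the counts are rough estimates rather than the exact confusion matrix; `estimate_counts` is a hypothetical helper.

```python
def estimate_counts(precision, recall, support):
    """Approximate TP, FN, FP for one class from rounded report values."""
    tp = round(recall * support)                  # correctly flagged AI articles
    fn = support - tp                             # AI articles that were missed
    fp = round(tp * (1 - precision) / precision)  # human articles flagged as AI
    return tp, fn, fp


reports = [("Logistic Regression", 0.97, 0.87), ("Naive Bayes", 0.97, 0.81)]
for name, precision, recall in reports:
    tp, fn, fp = estimate_counts(precision, recall, support=1649)
    print(f"{name}: ~{fn} AI articles missed, ~{fp} human articles flagged")
# Logistic Regression: ~214 AI articles missed, ~44 human articles flagged
# Naive Bayes: ~313 AI articles missed, ~41 human articles flagged
```

On these estimates, Naive Bayes misses roughly 100 more AI-generated articles than Logistic Regression on the same test set.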

### Implications
##### This performance indicates that while Naive Bayes is quite reliable, it may require additional adjustments or supplementary techniques to improve its detection rates for AI-generated content, especially in scenarios where failing to detect such content could have significant consequences.

### Limitations and Considerations
##### The main limitation observed is the trade-off between precision and recall in detecting AI-generated texts: precision is high (0.97) while recall is lower (0.81). This could potentially be addressed by exploring more complex models or by lowering the classification threshold to better balance recall and precision.
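
Threshold adjustment can be sketched with `predict_proba`, which both `LogisticRegression` and `MultinomialNB` provide. The example below uses synthetic one-feature data rather than the project's TF-IDF matrix, and scores on the training data purely for brevity. The threshold of 0.3 is an arbitrary illustration, not a recommended value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic, overlapping classes: 200 "human" (0) and 100 "AI" (1) samples.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.5, 1.0, 100)]).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 100)

model = LogisticRegression(max_iter=1000).fit(X, y)
proba_ai = model.predict_proba(X)[:, 1]  # estimated probability of class 1

scores = {}
for threshold in (0.5, 0.3):
    # Lowering the threshold flags more samples as AI.
    preds = (proba_ai >= threshold).astype(int)
    scores[threshold] = (precision_score(y, preds), recall_score(y, preds))
    print(threshold, scores[threshold])
```

Lowering the threshold can only keep or raise recall for the AI class, typically at the cost of some precision. In practice the threshold would be chosen on a separate validation split.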