### Fake headlines generated by ChatGPT

We asked ChatGPT to generate headlines in the style of our two news sources, Onion and HuffPost.
We gathered 510 fake headlines for each source, 1020 in total. While prompting, we used techniques intended to make the model explicitly consider the style and characteristic properties of each news source, and to keep the generated headlines diverse.

All generated headlines are stored in the text file `data/chatgpt_onionstyle_data/chatgpt_onionstyle.txt`.
Here are some examples:
* Like Onion:
    * local dad hopes to be remembered as the guy who never saw ‘the fast and furious’
    * area man insists world is flat, despite never leaving his basement
    * scientists warn new virus could spread via passive-aggressive texts
    * area siblings finally agree to stop faking peace during family gatherings
    * survey finds the majority of couples argue over whether to watch tv or just stare into space
    * couple celebrates 10 years together by agreeing they still don’t know how to do the dishes
    * report finds 68% of workplace meetings could be emails, and 32% could be screams
    * new candle company promises scents that smell exactly like regret
    * local dog suspiciously eyeing amazon package
    * new gym introduces revolutionary workout where you just watch people exercise
* Like HuffPost:
    * how ai is quietly shaping the future of democracy
    * this common food might be sabotaging your weight loss goals
    * the heartwarming story of a dog who saved its owner’s life
    * why millennials are obsessed with vintage tupperware
    * how a group of teens built an app that’s saving lives
    * this grandmother’s daily walks are inspiring an entire community
    * why experts say you should declutter your mind before your closet
    * how tech layoffs are creating new opportunities for start-ups
    * how to stop comparing yourself to others – for good
    * this single mom’s side hustle turned into a $1 million business


In our opinion, ChatGPT did a reasonable job of imitating the headlines of these two sources.
Many of the fake Onion headlines are genuinely funny, with a deadpan tone, and many of the fake HuffPost headlines are serious, inspirational, or practical.
Each set seems to follow the style of its respective source.

Now let's analyze...

In [1]:
import spacy
import os
import pandas as pd
from sklearn.model_selection import train_test_split as split

from src.preprocessing import convert_txt_to_json, convert_to_conllu
from src.data_util import load_data
from src.naive_bayes import NaiveBayesClassifier
from src.patterns import fit_patterns

Let's first convert the data to the CoNLL-U format:

In [2]:
# convert txt file to JSON file
input_file = "../data/chatgpt_onionstyle_data/chatgpt_onionstyle.txt"
output_file = "../data/chatgpt_onionstyle_data/chatgpt_onionstyle.json"
convert_txt_to_json(
    input_file=input_file, output_file=output_file, link="https://chatgpt.com/"
)

# convert JSON file to conllu dataset
output_file = "../data/chatgpt_onionstyle_data/chatgpt_onionstyle.conllu"
nlp = spacy.load("en_core_web_sm")
file_path = "../data/chatgpt_onionstyle_data/chatgpt_onionstyle.json"
data = pd.read_json(file_path, lines=True)
convert_to_conllu(data, output_file, nlp)

100%|██████████| 1020/1020 [00:08<00:00, 116.16it/s]


Next, we test our Naive Bayes Bag of Words model on this data:

In [3]:
# load the data
headlines = load_data("../data/headline_data/headlines.conllu")
fake_headlines = load_data("../data/chatgpt_onionstyle_data/chatgpt_onionstyle.conllu")

# split into training and test sets
SEED = 42
train_headlines, other_headlines = split(headlines, test_size=0.3, random_state=SEED)
_, test_headlines = split(other_headlines, test_size=0.5, random_state=SEED)
print(
    "Number of headlines for training, testing, "
    f"and testing fake headlines is {len(train_headlines)}, {len(test_headlines)}, "
    f"and {len(fake_headlines)} resp."
)

Number of headlines for training, testing, and testing fake headlines is 20033, 4293, and 1020 resp.
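The two calls to `split` above amount to a 70/15/15 partition in which one 15% part is discarded. The following is a minimal sketch of that arithmetic. It assumes that a float `test_size` is rounded up to a whole number of test samples (our understanding of scikit-learn's behaviour), and the total of 28619 headlines is inferred from the printed sizes rather than stated anywhere in this notebook. `split_sizes` is a hypothetical helper, not part of `src`.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a float test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# Assumption: total dataset size inferred from the sizes printed above.
TOTAL = 28619

# First split: 70% for training, 30% held out.
n_train, n_other = split_sizes(TOTAL, 0.3)

# Second split: half of the held-out part is discarded, half is the test set.
n_discarded, n_test = split_sizes(n_other, 0.5)

print(n_train, n_other, n_discarded, n_test)
```

Under these assumptions the sketch reproduces the 20033 training and 4293 test headlines reported above.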


In [4]:
# fit the Naive Bayes Bag of Words model to training data
naive_bayes = NaiveBayesClassifier(ngram_range=(1, 1))
naive_bayes.fit(train_headlines)

100%|██████████| 20033/20033 [00:23<00:00, 850.41it/s] 
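For readers unfamiliar with the technique, here is a minimal sketch of a unigram (bag-of-words) multinomial Naive Bayes classifier with add-one smoothing. It is not the implementation in `src.naive_bayes`; the class name `TinyNaiveBayes`, the whitespace tokenization, and the integer labels (1 for sarcastic, 0 for non-sarcastic) are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical sketch: multinomial Naive Bayes over unigram counts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of headlines
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # log prior
            for word in text.split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set built from example headlines quoted in this notebook.
train_texts = [
    "area man insists world is flat, despite never leaving his basement",
    "local dog suspiciously eyeing amazon package",
    "how ai is quietly shaping the future of democracy",
    "why millennials are obsessed with vintage tupperware",
]
train_labels = [1, 1, 0, 0]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("local man feels profound sense of accomplishment"))
print(model.predict("how a pet chicken became a mascot"))
```

Even on this toy data, single words such as "local", "man", or "how" are enough to tip the decision, which is the kind of surface cue a bag-of-words model relies on.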


In [5]:
# test on test data (of the original dataset, not fake)
_, _ = naive_bayes.test(test_headlines)

100%|██████████| 4293/4293 [00:01<00:00, 3103.64it/s]


               precision    recall  f1-score   support

Non-sarcastic       0.84      0.86      0.85      2237
    Sarcastic       0.84      0.82      0.83      2056

     accuracy                           0.84      4293
    macro avg       0.84      0.84      0.84      4293
 weighted avg       0.84      0.84      0.84      4293



In [6]:
# now test on fake headlines generated by ChatGPT and get false positives and false negatives
fp, fn = naive_bayes.test(fake_headlines)

100%|██████████| 1020/1020 [00:00<00:00, 3137.04it/s]

               precision    recall  f1-score   support

Non-sarcastic       0.89      0.95      0.92       510
    Sarcastic       0.95      0.89      0.92       510

     accuracy                           0.92      1020
    macro avg       0.92      0.92      0.92      1020
 weighted avg       0.92      0.92      0.92      1020






As can be seen, our model, although trained only on the original dataset, classifies the fake headlines generated by ChatGPT very well.
In fact, it scores higher on all three metrics (precision, recall, and f1-score) for both classes than it did on our actual test set, with accuracy rising from 0.84 to 0.92.
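As a reminder of what the three metrics in these reports measure, the sketch below computes precision, recall, and f1-score for one class from lists of true and predicted labels. The function name `per_class_metrics` and the toy labels are hypothetical; the reports above come from `naive_bayes.test`, not from this code.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (assumption): 1 = sarcastic, 0 = non-sarcastic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

precision, recall, f1 = per_class_metrics(y_true, y_pred, positive=1)
print(precision, recall, f1)
```

Precision asks how many headlines predicted as a class really belong to it, recall asks how many headlines of a class were found, and f1 is the harmonic mean of the two.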

We believe this is because ChatGPT overuses the most common patterns of each source when generating fake headlines.
Consider some familiar Onion patterns, the ones that include expressions like
* local man...
* nation believes...
* study shows...

These patterns are abundant in Onion headlines.
But a look at the datasets makes it clear that ChatGPT uses them even more often than Onion does.
The same is true for some familiar patterns used by HuffPost:
* how...
* why...
* here's...

Now let's test these hypotheses by matching a few regular-expression patterns against each dataset:

In [7]:
patterns = [
    ".*area man.*",
    ".*nation.*",
    ".*local.*",
    ".*study.*",
]

In [8]:
_ = fit_patterns(test_headlines, patterns)

-- class = 1: precision=0.7926267281105991, recall=0.08365758754863813
-- examples that fit these patterns: 
bigot annoyed local mosque already vandalized before he got there
study finds over 5 million birds die annually from head-on collisions with clouds
area man may have lied about having sex
study: american spiritual epiphanies increasingly juice-based
white nationalists have been saying 'diversity is not our strength' for years
local welder suffering from welder's block
guy from sopranos drops by local pizza parlor for free slice
government shutdown forces national zoo to turn off panda suicide cam
you can't study college coaches without looking at the players
study exposes risks of conducting research while driving


In [9]:
_ = fit_patterns(fake_headlines, patterns)

-- class = 1: precision=0.9408866995073891, recall=0.37450980392156863
-- examples that fit these patterns: 
area man starts company that specializes in making people feel uncomfortable on zoom
study finds majority of bees too busy to explain how pollination works
local cat declares war on ceiling fan, promises total victory
nation’s weathermen announce bold new plan to blame every wrong forecast on wind
study finds most horoscopes just general enough to be creepy
local man feels profound sense of accomplishment after completing a single task
local museum announces new exhibit: the history of museum announcements
area man announces he’s not watching tv, just resting his eyes on the screen
local influencer claims moral high ground after liking charity post first
nation agrees: captcha tests now harder than college entrance exams


We can see that these four patterns alone match more than a third (about 37%) of the Onion-style fake headlines generated by ChatGPT!
Recall on the sarcastic class is roughly 4.5 times that of the original test set (0.37 versus 0.08), and precision is also clearly higher (0.94 versus 0.79)!
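To make the reported numbers concrete, here is a minimal sketch of how the precision and recall of a set of patterns can be computed for one class. It is not the code of `fit_patterns`; the name `pattern_precision_recall` is hypothetical, and the labels of the toy headlines are assigned by hand as an assumption (1 for sarcastic, 0 for non-sarcastic).

```python
import re


def pattern_precision_recall(headlines, labels, patterns, target):
    """Treat 'matches any pattern' as a prediction of class `target`."""
    compiled = [re.compile(p) for p in patterns]
    matched = [any(c.fullmatch(h) for c in compiled) for h in headlines]
    tp = sum(m and y == target for m, y in zip(matched, labels))
    n_matched = sum(matched)
    n_target = sum(y == target for y in labels)
    precision = tp / n_matched if n_matched else 0.0
    recall = tp / n_target if n_target else 0.0
    return precision, recall


# Headlines quoted in this notebook; labels are hand-assigned assumptions.
toy_headlines = [
    "area man may have lied about having sex",
    "local welder suffering from welder's block",
    "study exposes risks of conducting research while driving",
    "newly sworn-in north korean official wondering how he'll eventually be executed",
    "white nationalists have been saying 'diversity is not our strength' for years",
    "why trevor noah thinks hillary clinton will never connect with people",
]
toy_labels = [1, 1, 1, 1, 0, 0]
onion_patterns = [".*area man.*", ".*nation.*", ".*local.*", ".*study.*"]

p, r = pattern_precision_recall(toy_headlines, toy_labels, onion_patterns, target=1)
print(p, r)
```

In this reading, recall is the share of sarcastic headlines that match at least one pattern, and precision is the share of matching headlines that are actually sarcastic.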

Now let's do the same with some HuffPost patterns:

In [10]:
patterns = [
    ".*why.*",
    ".*how.*",
    ".*here's.*",
]

In [11]:
_ = fit_patterns(test_headlines, patterns, label="0")

-- class = 0: precision=0.7516339869281046, recall=0.10281627179257935
-- examples that fit these patterns: 
why trevor noah thinks hillary clinton will never connect with people
newly sworn-in north korean official wondering how he'll eventually be executed
why and how to eliminate mortgage charges by third parties
you won't believe why this man's license was suspended
the history of how salt and pepper became the world's most popular pairing
dwight howard responds to lebron james' full-court shot with one of his own
why did the dying grandma shred $1 million?
upper-middle-class woman worries there's better coffee she doesn't know about
grandfather seems proud of how many people polio killed
this chinese video explains why beijing rejects the south china sea ruling


In [12]:
_ = fit_patterns(fake_headlines, patterns, label="0")

-- class = 0: precision=0.8615384615384616, recall=0.4392156862745098
-- examples that fit these patterns: 
how a group of friends turned a hobby into a successful side hustle
why more doctors are saying you should skip the diet and focus on this
why co-living spaces are the future of affordable housing
how tiktok changed the way we discover music forever
why this indie actor is suddenly a household name
how the latest election results are shaking up the senate
why gen z is rejecting traditional career paths in droves
how a pet chicken became a small town’s mascot
why more people are turning to forest therapy for mental health


The same holds for the non-sarcastic class.
These three very simple patterns match about 44 percent of the HuffPost-style fake headlines by ChatGPT, compared with about 10 percent of the non-sarcastic headlines in the original test set, where they are also noticeably less precise (0.75 versus 0.86).
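Part of the lower precision on the original test set comes from the patterns being plain substring matches: among the examples printed above, "dwight howard" matches `.*how.*` and "there's" matches `.*here's.*`. The sketch below contrasts the loose patterns with word-boundary variants. The `\b` variants are our suggestion for illustration and were not used to produce the results in this notebook.

```python
import re

loose_how = re.compile(r".*how.*")
strict_how = re.compile(r".*\bhow\b.*")
loose_heres = re.compile(r".*here's.*")
strict_heres = re.compile(r".*\bhere's\b.*")

howard = "dwight howard responds to lebron james' full-court shot with one of his own"
tiktok = "how tiktok changed the way we discover music forever"
coffee = "upper-middle-class woman worries there's better coffee she doesn't know about"

# Substring patterns fire on "howard" and "there's"; word-boundary ones do not.
print(bool(loose_how.fullmatch(howard)), bool(strict_how.fullmatch(howard)))
print(bool(loose_how.fullmatch(tiktok)), bool(strict_how.fullmatch(tiktok)))
print(bool(loose_heres.fullmatch(coffee)), bool(strict_heres.fullmatch(coffee)))
```

Word boundaries only remove accidental substring hits. A sarcastic headline that genuinely contains the word "how" would still match.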

Therefore, one can conclude that ChatGPT reinforces the stereotypes and styles associated with the headlines of these two sources, using their common patterns even more than the sources themselves do.
At the same time, these are the easier patterns to detect, so ChatGPT still has a lot of room for improvement in covering the full range of styles, not just the more common and easier ones.