In [1]:
import os
%cd /content
!git clone --recursive https://github.com/BiteKirby3/FakeNewsChallenge
root_dir = "/content/FakeNewsChallenge/fnc-1-baseline"
os.chdir(root_dir)

/content
Cloning into 'FakeNewsChallenge'...
remote: Enumerating objects: 97, done.
remote: Counting objects: 100% (97/97), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 97 (delta 40), reused 71 (delta 17), pack-reused 0
Unpacking objects: 100% (97/97), 5.06 MiB | 5.74 MiB/s, done.


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import sklearn
import tensorflow as tf
import tqdm
import scipy 
import nltk
from datetime import date
import csv

In this notebook, we use the [OpenAI](https://platform.openai.com/docs/api-reference) API to solve Stage 1 of the Fake News Challenge ([FNC-1](http://www.fakenewschallenge.org/)), the stance detection task. Calling the OpenAI API requires your own secret API key.

In [3]:
API_KEY = '<YOUR_API_KEY>'
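Hard-coding the key in a shared notebook risks leaking it. A minimal sketch of one alternative, assuming the key has been exported in an environment variable named `OPENAI_API_KEY`; the helper name `load_api_key` is hypothetical and not part of the original notebook:

```python
import os

def load_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is missing.

    `environ` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With the variable set, `API_KEY = load_api_key()` would replace the hard-coded assignment.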

#Data preprocessing

##Data Loading

We load the stances and article bodies into two separate containers.

In [4]:
from utils.dataset import DataSet

In [5]:
dataset_train = DataSet("train")
dataset_test = DataSet("test")

Reading dataset
Total stances: 49972
Total bodies: 1683
Reading dataset
Total stances: 25413
Total bodies: 904


The data is accessible through the *.stances* and *.articles* attributes: *.stances* is a list of dictionaries (headline, body ID, stance), and *.articles* is a dictionary of article bodies indexed by body ID.

In [6]:
dataset_test.stances[0]

{'Headline': 'Ferguson riots: Pregnant woman loses eye after cops fire BEAN BAG round through car window',
 'Body ID': 2008,
 'Stance': 'unrelated'}

In [7]:
print(dataset_test.articles[dataset_test.stances[0]['Body ID']])

A RESPECTED senior French police officer investigating the Charlie Hebdo magazine massacre took his own life mere hours after the horrific attacks stunned the world.

Commissioner Helric Fredou, 45, turned a gun on himself in his police office in Limoges last Wednesday night, reported France 3.

A colleague found his body at 1am on Thursday, the day after three gunmen fired at the satirical magazine's office and left 12 people dead.

Speaking to our sister publication Mirror Online, the Union of Commissioners of the National Police confirmed Mr Fredou had taken his own life.


In a statement released after his death, a union spokesman said: "It is with great sadness that we were informed this morning of the death of our colleague Helric Fredou, assigned as Deputy Director of the Regional Service Judicial Police in Limoges.

"On this particular day of national mourning, police commissioners are hit hard by the tragic death of one of their own.

"The Union of Commissioners of the Nationa

##Construct a 1000-instance test dataset

We construct a 1000-instance test dataset, where we have 731 unrelated, 178 discuss, 17 disagree and 74 agree stance examples.

In [8]:
NB_UNRELATED = 731
NB_DISCUSS = 178
NB_DISAGREE = 17
NB_AGREE = 74
cpt_unrelated,cpt_discuss,cpt_disagree,cpt_agree = 0,0,0,0
testset_index = []

In [9]:
#Walk the test stances in order and keep each one until the quota of its class is full
for i in range(len(dataset_test.stances)):
  stance = dataset_test.stances[i]['Stance']
  if(stance=='unrelated' and cpt_unrelated<NB_UNRELATED):
    testset_index.append(i)
    cpt_unrelated += 1
  elif(stance=='discuss' and cpt_discuss<NB_DISCUSS):
    testset_index.append(i)
    cpt_discuss += 1
  elif(stance=='agree' and cpt_agree<NB_AGREE):
    testset_index.append(i)
    cpt_agree += 1
  elif(stance=='disagree' and cpt_disagree<NB_DISAGREE):
    testset_index.append(i)
    cpt_disagree += 1
  #Stop as soon as all four quotas are filled
  if len(testset_index) == 1000:
    break
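A minimal sketch of quota-based selection as a standalone function, run on toy data so the resulting class balance can be checked. The name `select_with_quotas` and the toy list are illustrative assumptions, not part of the notebook or of `utils.dataset`:

```python
from collections import Counter

def select_with_quotas(stances, quotas):
    """Return indices of the first instances that fit the per-class quotas."""
    counts = Counter()
    selected = []
    total = sum(quotas.values())
    for i, item in enumerate(stances):
        label = item["Stance"]
        if counts[label] < quotas.get(label, 0):
            selected.append(i)
            counts[label] += 1
        if len(selected) == total:
            break  # every quota is full
    return selected

# Toy data standing in for dataset_test.stances
toy = [{"Stance": s} for s in
       ["unrelated", "unrelated", "agree", "unrelated", "discuss", "agree"]]
picked = select_with_quotas(toy, {"unrelated": 2, "agree": 1, "discuss": 1})
picked_counts = Counter(toy[i]["Stance"] for i in picked)
print(picked)         # [0, 1, 2, 4]
print(picked_counts)
```

On the real data, the same `Counter` check over `testset_index` should report 731, 178, 17 and 74 instances for the four classes.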

#Pre-train, Prompt and Predict Paradigm

The idea is intuitive and simple to apply: in our case, news stance classification can be framed as asking ChatGPT a question, for example,

**What's the stance of the news body:**
$<Body>$

**to the news headline:**
$<Headline>$? 

**Choose a stance from "unrelated, discuss, agree, disagree". The stance is**

We then let ChatGPT complete the sentence.

This can be implemented through OpenAI's official API. 
We define a prompt template that includes the news headline and the news body and asks for the stance of the body towards the headline. We then filter the generated response to extract the stance. (See OpenAI's [Chat Completion API](https://platform.openai.com/docs/api-reference/chat) for detailed usage.)

Note that the quality of the classification depends on the quality of the prompt, so the prompt may need some tuning to obtain accurate results.

We choose the `gpt-3.5-turbo` model, which is the most capable model of the [GPT-3.5](https://platform.openai.com/docs/models/gpt-3-5) family and is optimized for chat at 1/10th the cost of `text-davinci-003`. Since OpenAI limits the API request rate for normal users (3 requests/minute), we test on a smaller dataset of only 1000 instances that keeps approximately the same class proportions as the original test dataset.
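The rate limit translates directly into a minimum pause between calls: 60 s / 3 requests = 20 s, plus a small safety margin. A minimal sketch of such a throttle, assuming each request is wrapped in a zero-argument callable; `throttled`, `margin_s` and the injectable `sleep` parameter are hypothetical names introduced for illustration:

```python
import time

def throttled(calls, requests_per_minute=3, margin_s=1.0, sleep=time.sleep):
    """Run zero-argument callables one by one, pausing between consecutive calls."""
    interval = 60.0 / requests_per_minute + margin_s  # 3 requests/minute -> 21 s
    results = []
    for n, call in enumerate(calls):
        if n > 0:
            sleep(interval)  # no pause is needed before the first call
        results.append(call())
    return results

# Dry run with a fake sleep so the example finishes instantly
pauses = []
results = throttled([lambda: "a", lambda: "b", lambda: "c"], sleep=pauses.append)
print(results)  # ['a', 'b', 'c']
print(pauses)   # [21.0, 21.0]
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting.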

In [10]:
!pip install --upgrade openai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai
  Downloading openai-0.27.6-py3-none-any.whl (71 kB)
Collecting aiohttp (from openai)
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting multidict<7.0,>=4.5 (from aiohttp->openai)
  Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)
Collecting async-timeout<5.0,>=4.0.0a3 (from aiohttp->openai)
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting yarl<2.0,>=1.0 (from aiohttp->openai)
  Downloadin

In [11]:
import os
import openai

In [12]:
openai.api_key = API_KEY

##Zero-Shot prompt

Zero-shot prompting means classifying previously unseen text directly into categories that the model has never been explicitly trained to identify. A zero-shot prompt-based classifier usually needs careful prompt engineering to achieve high accuracy.

In [13]:
def classify_stance_zero_shot(news_body, news_headline):
    # Define prompt template
    headline_template = "Given the following news headline: '{}'."
    body_template = "Given the following news body: '{}'."
    question_template = "What is the stance of this news body towards this news headline? Please choose one of the following stances: unrelated, discuss, agree, disagree. The stance is "

    # Generate prompt
    news_body = news_body.replace("\n\n","\n")
    news_body = news_body.replace("\n\n","\n")
    headline_template = headline_template.format(news_headline)
    body_template = body_template.format(news_body)

    # Generate possible stances using ChatGPT
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": headline_template},
                  {"role": "user", "content": body_template},
                  {"role": "user", "content": question_template}
                  ],
        temperature = 0.5
    )

    # Extract the answer text, dropping surrounding whitespace and any trailing period
    # (also safe when the response is empty)
    stance = response.choices[0].message.content.strip()
    stance = stance.rstrip(".")
    #if stance in ["unrelated", "discuss", "agree", "disagree"]:
    #    return stance
    #else:
    #    return None
    return stance

In [None]:
#Test 
print(dataset_test.stances[248])
print("The stance given by GPT3.5 is: "+classify_stance_zero_shot(dataset_test.articles[dataset_test.stances[248]['Body ID']], dataset_test.stances[248]['Headline']))

{'Headline': '‘Crabzilla’ spotted off the coast of Britain', 'Body ID': 893, 'Stance': 'disagree'}
The stance given by GPT3.5 is: disagree


In [None]:
#Classify the test dataset and write the GPT responses to a csv file.
import time
filename = "/content/FakeNewsChallenge/result/"+"zero_shot_prompt_prediction"+str(date.today())+".csv"
with open(filename, 'w', newline='') as csvfile:
    fieldnames = ["STANCE_INDEX","ACTUAL_STANCE","PREDICT_STANCE"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i in testset_index:
      actual_stance = dataset_test.stances[i]['Stance']
      headline = dataset_test.stances[i]['Headline']
      body = dataset_test.articles[dataset_test.stances[i]['Body ID']]
      predict_stance = classify_stance_zero_shot(body, headline)
      writer.writerow({'STANCE_INDEX': str(i), 'ACTUAL_STANCE': str(actual_stance), 'PREDICT_STANCE': str(predict_stance)})
      csvfile.flush() #persist each row so an interrupted run keeps its results
      time.sleep(21) #wait 21s between calls to stay under the 3 requests/minute limit

##Few-Shot prompt

Few-shot prompting is a technique in which the model is given a small number of examples (typically between two and five) so that it can quickly adapt to a new context. Although it requires little data, this technique usually allows the model to generate more versatile and adaptive text.

In our case, in addition to the test instance, we also feed the model one `discuss`, one `unrelated`, one `agree` and one `disagree` example from the training set so that it can adapt to this context.

In [31]:
def classify_stance_few_shot(news_body, news_headline):
    # Define prompt template
    #One-shot for each stance
    #unrelated example
    unrelated_example = "News Headline: {}, News body: {}, Stance: unrelated".format(dataset_train.stances[0]['Headline'], dataset_train.articles[dataset_train.stances[0]['Body ID']].replace("\n\n","\n").replace("\n\n","\n"))
    #discuss example
    discuss_example = "News Headline: {}, News body: {}, Stance: discuss".format(dataset_train.stances[10]['Headline'], dataset_train.articles[dataset_train.stances[10]['Body ID']].replace("\n\n","\n").replace("\n\n","\n"))
    #disagree example
    disagree_example = "News Headline: {}, News body: {}, Stance: disagree".format(dataset_train.stances[4]['Headline'], dataset_train.articles[dataset_train.stances[4]['Body ID']].replace("\n\n","\n").replace("\n\n","\n"))
    #agree example
    agree_example = "News Headline: {}, News body: {}, Stance: agree".format(dataset_train.stances[1]['Headline'], dataset_train.articles[dataset_train.stances[1]['Body ID']].replace("\n\n","\n").replace("\n\n","\n"))

    #test example
    headline_template = "Given the following news headline: '{}'."
    body_template = "Given the following news body: '{}'."
    question_template = "What is the stance of this news body towards this news headline? Please choose one of the following stances: unrelated, discuss, agree, disagree. The stance is "

    # Generate prompt
    news_body = news_body.replace("\n\n","\n")
    news_body = news_body.replace("\n\n","\n")
    headline_template = headline_template.format(news_headline)
    body_template = body_template.format(news_body)

    # Generate possible stances using ChatGPT
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": unrelated_example},
                  {"role": "user", "content": discuss_example},
                  {"role": "user", "content": disagree_example},
                  {"role": "user", "content": agree_example},
                  {"role": "user", "content": headline_template},
                  {"role": "user", "content": body_template},
                  {"role": "user", "content": question_template}
                  ],
        temperature = 0.5
    )

    # Extract the answer text, dropping surrounding whitespace and any trailing period
    # (also safe when the response is empty)
    stance = response.choices[0].message.content.strip()
    stance = stance.rstrip(".")
    #if stance in ["unrelated", "discuss", "agree", "disagree"]:
    #    return stance
    #else:
    #    return None
    return stance
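The function above sends every demonstration as a `user` message. One alternative layout, which is not what the notebook does and whose effect on accuracy was not measured here, presents each demonstration as a `user` question followed by an `assistant` answer. A minimal sketch that only builds the message list and makes no API call; `build_few_shot_messages` and the placeholder texts are hypothetical:

```python
def build_few_shot_messages(examples, headline, body):
    """Build a chat message list: one user/assistant pair per example, then the query."""
    question = ("News headline: '{}'\nNews body: '{}'\n"
                "What is the stance of this news body towards this news headline? "
                "Choose one of: unrelated, discuss, agree, disagree.")
    messages = []
    for ex_headline, ex_body, ex_stance in examples:
        messages.append({"role": "user", "content": question.format(ex_headline, ex_body)})
        messages.append({"role": "assistant", "content": ex_stance})
    # The final user turn is the instance to classify
    messages.append({"role": "user", "content": question.format(headline, body)})
    return messages

# Placeholder texts, not taken from the dataset
examples = [("Headline A", "Body A", "agree"), ("Headline B", "Body B", "unrelated")]
messages = build_few_shot_messages(examples, "Headline C", "Body C")
print([m["role"] for m in messages])
# ['user', 'assistant', 'user', 'assistant', 'user']
```

The resulting list has the same shape as the `messages` argument used in the functions above.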

In [None]:
#Test Few-shot prompt
print(dataset_test.stances[248])
print("The stance given by GPT3.5 is: "+classify_stance_few_shot(dataset_test.articles[dataset_test.stances[248]['Body ID']], dataset_test.stances[248]['Headline']))

{'Headline': '‘Crabzilla’ spotted off the coast of Britain', 'Body ID': 893, 'Stance': 'disagree'}
The stance given by GPT3.5 is: disagree


In [None]:
#Classify the test dataset and write the GPT responses to a csv file.
import time
filename = "/content/FakeNewsChallenge/result/"+"few_shot_prompt_prediction"+str(date.today())+".csv"
with open(filename, 'w', newline='') as csvfile:
    fieldnames = ["STANCE_INDEX","ACTUAL_STANCE","PREDICT_STANCE"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i in testset_index:
      actual_stance = dataset_test.stances[i]['Stance']
      headline = dataset_test.stances[i]['Headline']
      body = dataset_test.articles[dataset_test.stances[i]['Body ID']]
      predict_stance = classify_stance_few_shot(body, headline)
      writer.writerow({'STANCE_INDEX': str(i), 'ACTUAL_STANCE': str(actual_stance), 'PREDICT_STANCE': str(predict_stance)})
      csvfile.flush() #persist each row so an interrupted run keeps its results
      time.sleep(21) #wait 21s between calls to stay under the 3 requests/minute limit

#Scoring classifier

In [14]:
!pip install transformers datasets evaluate

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.29.0-py3-none-any.whl (7.1 MB)
Collecting datasets
  Downloading datasets-2.12.0-py3-none-any.whl (474 kB)
Collecting evaluate
  Downloading evaluate-0.4.0-py3-none-any.whl (81 kB)
Collecting huggingface-hub<1.0,>=0.11.0 (from transformers)
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from

In [15]:
from utils.score import report_score
import evaluate

In [50]:
#read the generated csv
output = pd.read_csv('/content/FakeNewsChallenge/result/zero_shot_prompt_prediction2023-04-26.csv')  

In [51]:
output['PREDICT_STANCE'].value_counts()

discuss                                                                                                     602
unrelated                                                                                                   173
agree                                                                                                       154
disagree                                                                                                     64
Discuss                                                                                                       6
unrelated. (There is no mention of Obama ordering the Fed to adopt the Euro currency in this news body.)      1
Name: PREDICT_STANCE, dtype: int64

We need to map the raw model responses to stance labels.

In [52]:
output.loc[ output['PREDICT_STANCE'] == "Discuss", 'PREDICT_STANCE'] = 'discuss'
output.loc[ output['PREDICT_STANCE'] == "unrelated. (There is no mention of Obama ordering the Fed to adopt the Euro currency in this news body.)", 'PREDICT_STANCE'] = 'unrelated'
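The two manual replacements above cover the variants seen in this run. A more general sketch, assuming that the first stance word appearing in a response is the intended answer, is shown below; `normalize_stance` is a hypothetical helper, and a response such as "not unrelated" would be misread by it:

```python
import re

# Word boundaries keep "agree" from matching inside "disagree"
STANCE_PATTERN = re.compile(r"\b(unrelated|discuss|agree|disagree)\b")

def normalize_stance(response_text):
    """Return the first stance label mentioned in a response, or None if there is none."""
    match = STANCE_PATTERN.search(str(response_text).lower())
    return match.group(1) if match else None

print(normalize_stance("Discuss"))                                # discuss
print(normalize_stance("unrelated. (There is no mention of it)")) # unrelated
print(normalize_stance("disagree."))                              # disagree
print(normalize_stance("I cannot tell"))                          # None
```

It could be applied to the whole column with `output['PREDICT_STANCE'].map(normalize_stance)`; rows that come back as `None` would still need manual inspection.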

In [58]:
print("Confusion matrix for zero-shot learning:")
report_score(output["ACTUAL_STANCE"], output["PREDICT_STANCE"])

Confusion matrix for zero-shot learning:
-------------------------------------------------------------
|           |   agree   | disagree  |  discuss  | unrelated |
-------------------------------------------------------------
|   agree   |    25     |     5     |    44     |     0     |
-------------------------------------------------------------
| disagree  |     5     |     8     |     4     |     0     |
-------------------------------------------------------------
|  discuss  |    39     |    11     |    125    |     3     |
-------------------------------------------------------------
| unrelated |    85     |    40     |    435    |    171    |
-------------------------------------------------------------
Score: 227.75 out of 451.75	(50.41505257332595%)


50.41505257332595
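The score can be re-derived from the confusion matrix. The sketch below assumes the FNC-1 weighting of 0.25 for getting the related/unrelated decision right plus 0.75 for the exact stance of a related pair; `fnc_score` is a hypothetical helper written for illustration, not the `report_score` function from `utils.score`:

```python
LABELS = ["agree", "disagree", "discuss", "unrelated"]
# Rows are actual stances, columns are predicted stances (zero-shot run)
CONFUSION = [
    [25, 5, 44, 0],
    [5, 8, 4, 0],
    [39, 11, 125, 3],
    [85, 40, 435, 171],
]

def fnc_score(confusion, labels=LABELS):
    """Return (score, best possible score) under the assumed FNC-1 weighting."""
    score = best = 0.0
    for actual, row in zip(labels, confusion):
        actual_related = actual != "unrelated"
        for predicted, count in zip(labels, row):
            predicted_related = predicted != "unrelated"
            if actual_related == predicted_related:
                score += 0.25 * count      # related/unrelated decision is right
                if actual_related and actual == predicted:
                    score += 0.75 * count  # exact stance is right as well
        # A perfect prediction earns 1.0 per related pair and 0.25 per unrelated pair
        best += (1.0 if actual_related else 0.25) * sum(row)
    return score, best

score, best = fnc_score(CONFUSION)
print(score, best)  # 227.75 451.75
print(100 * score / best)
```

Under this weighting, the 171 correct `unrelated` predictions contribute 42.75 points, the 266 related pairs predicted as related contribute 66.5, and the 158 exact stance matches add 118.5, which sums to the reported 227.75.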

In [18]:
f1 = evaluate.load("f1")

STANCE_LABELS = ["unrelated", "discuss", "agree", "disagree"]

def compute_metric(df):
  #One-vs-rest F1 per stance: binarize both columns (1 = this stance, 0 = any other)
  scores = {}
  for label in STANCE_LABELS:
    references = (df["ACTUAL_STANCE"] == label).astype(int).tolist()
    predictions = (df["PREDICT_STANCE"] == label).astype(int).tolist()
    scores["f1_" + label] = f1.compute(references=references, predictions=predictions)["f1"]

  #f1 macro: unweighted mean of the four per-stance scores
  f1_macro = sum(scores.values()) / len(STANCE_LABELS)

  return {"f1_macro": f1_macro, **scores}

Downloading builder script:   0%|          | 0.00/6.77k [00:00<?, ?B/s]

In [57]:
print('Other metrics for zero-shot learning:')
compute_metric(output)

Other metrics for zero-shot learning:


{'f1_macro': 0.27819895501464204,
 'f1_unrelated': 0.37790055248618787,
 'f1_discuss': 0.31806615776081426,
 'f1_agree': 0.21929824561403508,
 'f1_disagree': 0.19753086419753085}
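These values follow directly from the confusion matrix reported by `report_score`: for each class, F1 = 2·TP / (number of actual instances + number of predicted instances). A minimal plain-Python sketch of that arithmetic; `f1_per_class` is a hypothetical helper, not part of `evaluate`:

```python
LABELS = ["agree", "disagree", "discuss", "unrelated"]
# Rows are actual stances, columns are predicted stances (zero-shot run)
CONFUSION = [
    [25, 5, 44, 0],
    [5, 8, 4, 0],
    [39, 11, 125, 3],
    [85, 40, 435, 171],
]

def f1_per_class(confusion, labels):
    """Compute one-vs-rest F1 for each class from a square confusion matrix."""
    scores = {}
    for k, label in enumerate(labels):
        true_positives = confusion[k][k]
        actual_total = sum(confusion[k])                    # TP + FN
        predicted_total = sum(row[k] for row in confusion)  # TP + FP
        denominator = actual_total + predicted_total
        scores[label] = 2 * true_positives / denominator if denominator else 0.0
    return scores

scores = f1_per_class(CONFUSION, LABELS)
macro_f1 = sum(scores.values()) / len(scores)
print(scores)
print(macro_f1)
```

For example, `unrelated` has 171 true positives, 731 actual and 174 predicted instances, giving 342 / 905 ≈ 0.378. The low macro F1 is driven by the 435 `unrelated` pairs that the model labelled `discuss`.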