<a href="https://colab.research.google.com/github/cecileloge/cs224v-truthsleuth-trendbender/blob/main/notebooks/TruthSleuth_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Truth Sleuth AI: Fact-Checking Agent for YouTube Videos | EVALUATION**
**Cecile Loge ep. Baccari** | ceciloge@stanford.edu | cecileloge@google.com \\
**Mohammad Rehan Ghori** | rghori@stanford.edu | rehang@google.com

---

**Motivation:** Misinformation is one of the most pressing threats of our time, and YouTube videos are a major vehicle for its spread [1]. Providing fact-checked information to address misleading content has been shown to be more effective than simply removing it [2].

**Project:** Can we build an application that takes a YouTube video as input and not only generates a list of the main claims made in the video but also fact-checks them?

---

* [1] "An open letter to YouTube’s CEO from the world’s fact-checkers." Poynter (poynter.org), 2022.
* [2] Ecker, Ullrich K. H., et al. "The effectiveness of short-format refutational fact-checks." British Journal of Psychology 111.1 (2020): 36-54.

---
## **Setting Up Everything**
Installing and importing libraries, configuring the API keys, and downloading the prompt templates.

---

In [None]:
# Install & Import Libraries
import random
from IPython.display import clear_output
# Youtube Extractors
!pip install youtube-transcript-api
!pip install pytube
!pip install -U yt-dlp
!apt install ffmpeg
from youtube_transcript_api import YouTubeTranscriptApi
from pytube import extract

# Assembly AI
!pip install assemblyai
import assemblyai as aai

# Data & Tools
import pandas as pd
from google.colab import userdata
from PIL import Image
import json
import os
current_dir = os.getcwd()
from datetime import datetime, date
import time

# Google API
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Markdown for the final output
from IPython.display import display, Markdown, Latex
import textwrap

# Gemini API
!pip install -q -U google-generativeai
import google.generativeai as genai

# LangChain Prompting (recent LangChain releases expose PromptTemplate from langchain_core)
!pip install langchain
from langchain_core.prompts import PromptTemplate

# For Web Scraping
# `googlesearch` (used below as g.search(..., stop=..., lang=...)) is provided by the `google` package
!pip install requests
!pip install beautifulsoup4
!pip install wikipedia
!pip install google
from bs4 import BeautifulSoup
import requests
import wikipedia
import googlesearch as g

clear_output()
print("done!")

done!


In [None]:
# YouTube API Key (Google for Developers Platform)
DEVELOPER_KEY=userdata.get('DEVELOPER_KEY')

# Google Developer API Key for GenAI
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY2')
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel(model_name="gemini-1.5-flash")
GEM_SAFETY_SETTINGS = [
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_NONE"
    },
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_NONE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_NONE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_NONE"
    }
    ]

# Assembly AI API Key
AAI_API_KEY = userdata.get('AAI_API_KEY')
aai.settings.api_key = AAI_API_KEY

In [None]:
# Prompt Templates
!mkdir -p prompts
!curl -L -o prompts/reformat.prompt "https://drive.google.com/uc?export=download&id=1aykUMXxUR1gWOil5X83aF7aem1s5rjIT"
!curl -L -o prompts/claims.prompt "https://drive.google.com/uc?export=download&id=1AlKLI-IP05jMps4Ol7BN2RPcmAMH2H-J"
!curl -L -o prompts/factcheck.prompt "https://drive.google.com/uc?export=download&id=1cevbk44t7ZWJr3rwpu-b1ncqHYdq-p8x"



---
## **[EVALUATION] STEP 3 | Fact-checking the claims by cross-checking reliable sources**

Cross-reference each claim with reliable sources and classify it as true, partly true, partly false, false, or unsure, ideally with links to the supporting sources.


This step uses the Google Fact Check Tools Claim Search API, the Wikipedia API, and Google Search.

The LLM (Gemini 1.5 Flash) is called several times in this step: once per Google Search result to summarize the retrieved page, and once per claim with the `factcheck.prompt` template to cross-reference all the collected evidence and return a verdict.

---


In [None]:
# Our reliable sources for Fact-Checking
# 1 - Google Fact-Check API
GOOGLE_FACT_CHECK_API_KEY = userdata.get('GFC_API_KEY')
GOOGLE_FACT_CHECK_URL = 'https://factchecktools.googleapis.com/v1alpha1/claims:search'

# 2 - Wikipedia
wikipedia.set_lang('en')

# 3 - Google Search
GOOGLE_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"

In [None]:
def call_googlefacts(claim):
  """
  Function to call the Google Fact Check API
  Returns the claim reviews in the Claim Review structured format
  https://developers.google.com/search/docs/appearance/structured-data/factcheck
  """

  params = {
      'query': claim,
      'key': GOOGLE_FACT_CHECK_API_KEY
  }
  response = requests.get(GOOGLE_FACT_CHECK_URL, params=params)
  if response.status_code == 200:
      data = response.json()
      return data.get('claims', [])
  else:
      return None

def check_googlefacts(claim):
  """
  Function to process the Google Fact Check claim reviews
  Returns a string = concatenation of relevant results
  """
  results = call_googlefacts(claim)
  summary = ''
  if results:
      for i, r in enumerate(results):
          # Only the first ClaimReview of each claim is used; default to {} if it is missing or empty
          review = (r.get('claimReview') or [{}])[0]
          source = (review.get('publisher') or {}).get('name')
          url = review.get('url')
          claimant = r.get('claimant')
          text = r.get('text')
          truthfulness = review.get('textualRating')
          summary += f"Source #{i+1}: {source} at {url}.\n     Claimant: {claimant}\n     Description: {text}\n     Truthfulness: {truthfulness}\n\n"

  return summary

def check_googlesearch(claim, limit=2):
  """
  Function to search Google for the claim and summarize the top results
  Calls Gemini on each result page to extract its key information
  Returns a string = concatenation of the per-page summaries
  """
  # `stop=limit` caps the number of result URLs returned by googlesearch.search
  urls = list(g.search(claim, stop=limit, lang='en'))
  headers = {"user-agent": GOOGLE_USER_AGENT}
  summary = ''
  with requests.Session() as session:
    for url in urls:
      try:
        website = session.get(url, headers=headers, timeout=20)
      except requests.RequestException:
        continue  # skip pages that cannot be fetched instead of aborting the evaluation
      web_soup = BeautifulSoup(website.text, 'html.parser')
      summary_prompt = (
          "I need you to give me the key information contained in a specific webpage article. "
          "I will provide you with HTML code. "
          "Do not describe the webpage or the article. Focus on extracting the key claims, and summarizing "
          "the information the page is giving in a precise, comprehensive yet concise way. "
          "Your answer should ideally help answer the question: " + claim + ".\n"
          "HTML Code: <<< " + str(web_soup) + " >>>"
      )
      time.sleep(random.randint(1, 3))  # spread out Gemini calls (rate limiting)
      for _ in range(3):
        try:
          response = model.generate_content(summary_prompt, safety_settings=GEM_SAFETY_SETTINGS)
          summary += f"Source: {url}.\nDescription: {response.text}\n"
          break
        except Exception:
          time.sleep(random.randint(2, 7))  # back off before retrying
  return summary

def check_wikipedia(claim):
  """
  Function to call the Wikipedia API and process the results
  Returns a string = concatenation/summary of relevant Wikipedia articles
  """
  summary = ''
  search_results = wikipedia.search(claim)
  for r in search_results[:4]:
    try:
      # Titles come straight from wikipedia.search, so auto-suggestion (which can jump to an unrelated page) is disabled
      call = wikipedia.page(r, auto_suggest=False)
      summary += f"From the \"{r}\" Wikipedia page ({call.url}): " + call.content + "\n\n"
    except Exception:
      # Disambiguation pages, missing pages and other Wikipedia API errors are skipped
      continue
  return summary
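To make the Claim Search response format concrete, here is a minimal sketch that formats a hand-written mock response in a format similar to `check_googlefacts`. The mock payload and the helper name `format_claim_reviews` are hypothetical; the field names (`text`, `claimant`, `claimReview`, `publisher.name`, `url`, `textualRating`) are the ones `check_googlefacts` reads. Missing reviews or publishers fall back to placeholder values instead of raising.

```python
def format_claim_reviews(claims):
    """Hypothetical helper: turn Fact Check `claims` entries into a "Source #i" text report."""
    lines = []
    for i, claim in enumerate(claims):
        # claimReview is a list; fall back to an empty review if it is missing or empty
        review = (claim.get("claimReview") or [{}])[0]
        publisher = review.get("publisher") or {}
        lines.append(
            f"Source #{i + 1}: {publisher.get('name', 'unknown')} at {review.get('url', 'n/a')}.\n"
            f"     Claimant: {claim.get('claimant', 'unknown')}\n"
            f"     Description: {claim.get('text', '')}\n"
            f"     Truthfulness: {review.get('textualRating', 'n/a')}\n"
        )
    return "\n".join(lines)


# Mock entries, made up for illustration (not real API output)
mock_claims = [
    {
        "text": "The Eiffel Tower is in Berlin.",
        "claimant": "Example claimant",
        "claimReview": [
            {
                "publisher": {"name": "Example Fact Checker"},
                "url": "https://example.org/review",
                "textualRating": "False",
            }
        ],
    },
    {"text": "A claim with no review attached."},
]

print(format_claim_reviews(mock_claims))
```

The second mock entry shows why defensive access matters: chaining `.get('publisher').get('name')` on a claim without a review would raise an `AttributeError` and stop the whole evaluation loop.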

In [None]:
def get_claim_summary(claim, questions):
  """
  Function to get the final say on a specific claim.
  Gathers evidence from Google Fact Check and Google Search (one query per question)
  and from Wikipedia (queried with the claim itself), then calls Gemini with the
  factcheck.prompt template.
  Returns Gemini's raw answer (a string expected to hold a JSON object with fields
  `claim`, `verdict`, `reason`, `sources`), or None if every attempt failed.
  """
  summary_gfc = ''
  summary_search = ''
  for q in questions:
    if not q.strip():
      continue
    gfc = check_googlefacts(q)
    if gfc:
      summary_gfc += gfc + "\n"
    summary_search += check_googlesearch(q) + "\n"
  summary_wiki = check_wikipedia(claim)

  with open("prompts/factcheck.prompt", "r") as f:
    text = f.read()
  prompt_template = PromptTemplate.from_template(template=text, template_format="jinja2")
  # Google Search summaries share the report_GFC slot with the Fact Check results
  fact_check_prompt: str = prompt_template.format(
      claim=claim,
      report_GFC=summary_gfc + "\n" + summary_search,
      report_wiki=summary_wiki,
  )

  time.sleep(random.randint(1, 3))  # spread out Gemini calls (rate limiting)
  for _ in range(3):
    try:
      response = model.generate_content(fact_check_prompt, safety_settings=GEM_SAFETY_SETTINGS)
      return response.text
    except Exception:
      time.sleep(random.randint(4, 8))  # back off before retrying
  return None
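Both `check_googlesearch` and `get_claim_summary` retry failed Gemini calls up to three times with random pauses. A common alternative is exponential backoff with jitter; the sketch below shows that swapped-in technique as a generic wrapper. `call_with_retries` and its parameters are hypothetical, and catching every `Exception` mirrors the original loops rather than any documented Gemini error type.

```python
import random
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, max_delay=8.0):
    """Hypothetical helper: call fn() until it succeeds or max_attempts is reached.

    Between attempts it waits base_delay * 2**attempt seconds (capped at max_delay)
    plus random jitter. Returns fn()'s result, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                break  # no point sleeping after the last attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))  # exponential backoff with jitter
    return None


# Usage with a stand-in for a rate-limited API: fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: rate limited")
    return "ok"

result = call_with_retries(flaky, base_delay=0.01)
print(result, calls["n"])
```

In the notebook it could wrap a call such as `call_with_retries(lambda: model.generate_content(fact_check_prompt, safety_settings=GEM_SAFETY_SETTINGS).text)`, which returns None after the last failed attempt, matching what `get_claim_summary` returns today.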

---
## **FEVER Dataset**

Evaluating the fact-checking step on FEVER claims labeled SUPPORTS or REFUTES (claims labeled NOT ENOUGH INFO are excluded), using a fixed slice of 50 claims.


Thorne, James, et al. "FEVER: a large-scale dataset for fact extraction and VERification." arXiv preprint arXiv:1803.05355 (2018).

---


In [None]:
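# Load the FEVER claims (paper_test.jsonl must be available in the working directory)
with open('paper_test.jsonl') as f:
    data = [json.loads(line) for line in f]

# Keep only the claim text and its gold label
for element in data:
  del element["id"]
  del element["verifiable"]
  del element["evidence"]

# Drop NOT ENOUGH INFO claims and evaluate on a fixed 50-claim slice
fever = [x for x in data if x['label'] != 'NOT ENOUGH INFO']
fever = fever[50:100]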

In [None]:
for i, element in enumerate(fever):
  print(f"element {i+1}")
  print(element)
  p = f"Output a list of 2 or 3 simple factual questions rephrasing the following claim: << {element['claim']} >>. \
  Questions should be factual and simple such that each single question contains only one single fact and can be answered through public records. \
  Each question should be able to be understood and provide enough context on its own. Write nothing more than the questions."
  response = model.generate_content(p, safety_settings=GEM_SAFETY_SETTINGS)
  k = random.randint(1, 5)
  time.sleep(k)
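  # Drop empty lines so that only actual questions are passed to the search APIs
  element["questions"] = [q for q in response.text.split("\n") if q.strip()]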
  clear_output()
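Splitting the answer on newlines also keeps any numbering or bullets Gemini adds, which then end up in the Google Search and Fact Check queries. Below is a minimal sketch of a cleaner parser, assuming the model answers with one question per line; `parse_questions` is a hypothetical helper, and the set of list markers it strips is an assumption about typical model output.

```python
import re

# List markers assumed to appear in model output: "1.", "2)", "-", "*", "•"
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_questions(text):
    """Hypothetical helper: turn a multi-line model answer into a clean list of questions."""
    questions = []
    for line in text.splitlines():
        question = _LIST_MARKER.sub("", line).strip()
        if question:  # drop blank lines and lines that held only a marker
            questions.append(question)
    return questions


sample = "1. When was the Eiffel Tower built?\n\n2) Where is the Eiffel Tower located?\n* Who designed it?\n"
print(parse_questions(sample))
```

Numbers are only stripped when followed by `.` or `)`, so a question starting with a year such as "1990 census ..." is left intact.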


In [None]:
for i, element in enumerate(fever):
  print(f"element {i+1}")
  print(element)

In [None]:
for i, element in enumerate(fever):
  print(f"element {i+1}")
  print(element)
  text = get_claim_summary(element["claim"], element["questions"])
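  if text is None:
    element["verdict"] = "unsure"
    continue
  try:
    verdict = json.loads(text)
  except json.JSONDecodeError:
    # Gemini may wrap its JSON answer in a ```json ... ``` fence: strip it before parsing
    try:
      verdict = json.loads(text[8:-4])
    except json.JSONDecodeError:
      element["verdict"] = "unsure"  # unparseable answer: count it as "unsure"
      continue

  # Lower-case the verdict so it matches the keys of the `capped` mapping below
  element["verdict"] = str(verdict.get('verdict') or 'unsure').lower()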
  clear_output()
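The fallback `json.loads(text[8:-4])` only works when the answer is wrapped in a `json` code fence with no extra whitespace or prose. A more tolerant parser is sketched below; it assumes the prompt asks for a JSON object with a `verdict` key (as the `get_claim_summary` docstring states) and normalizes the value to lower case so it matches the keys of the `capped` mapping. `parse_verdict` is a hypothetical helper that could replace the try/except blocks in both evaluation loops.

```python
import json
import re

# Captures the body of a fenced block such as ```json ... ``` (language tag optional)
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def parse_verdict(text, default="unsure"):
    """Hypothetical helper: extract the `verdict` field from a model's JSON answer.

    Accepts plain JSON, JSON inside a Markdown code fence, or JSON surrounded by
    extra prose; returns `default` when nothing parseable is found.
    """
    if not text:
        return default
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text
    # Keep only the outermost {...} span to drop any surrounding prose
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        return default
    try:
        data = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return default
    verdict = data.get("verdict") if isinstance(data, dict) else None
    return verdict.strip().lower() if isinstance(verdict, str) else default


print(parse_verdict('```json\n{"claim": "X", "verdict": "Partly True"}\n```'))
print(parse_verdict('Here is my answer: {"verdict": "false"}'))
print(parse_verdict("not json at all"))
```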

In [None]:
# Map the five verdicts onto FEVER's labels ("unsure" -> NOT ENOUGH INFO)
capped = {"true": "SUPPORTS", "partly true": "SUPPORTS", "partly false": "REFUTES", "false": "REFUTES", "unsure": "NOT ENOUGH INFO"}
# SUPPORTS is the positive class, REFUTES the negative class
TP = 0  # gold SUPPORTS, predicted SUPPORTS
TN = 0  # gold REFUTES, predicted REFUTES
FP = 0  # gold REFUTES, predicted SUPPORTS
FN = 0  # gold SUPPORTS, predicted REFUTES
NE = 0  # predicted NOT ENOUGH INFO ("unsure" or unrecognized verdict)

for element in fever:
  element["final"] = capped.get(element.get("verdict"), "NOT ENOUGH INFO")
  if element["label"] == element["final"]:
    if element["label"] == "REFUTES":
      TN += 1
    else:
      TP += 1
  elif element["label"] == "REFUTES" and element["final"] == "SUPPORTS":
    FP += 1
  elif element["label"] == "SUPPORTS" and element["final"] == "REFUTES":
    FN += 1
  else:
    NE += 1

print(f"TP: {TP}, TN: {TN}, FP: {FP}, FN: {FN}, NE: {NE}")


In [None]:
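# SUPPORTS is the positive class; claims with an "unsure" verdict (NE) are excluded from all four metrics
Precision = TP / (TP + FP)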
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1 = 2 * (Precision * Recall) / (Precision + Recall)

print(f"Precision: {Precision}, Recall: {Recall}, Accuracy: {Accuracy}, F1: {F1}")

Precision: 0.9523809523809523, Recall: 0.9523809523809523, Accuracy: 0.9574468085106383, F1: 0.9523809523809523
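The cell above divides by `TP + FP` and `TP + FN` directly, so it raises `ZeroDivisionError` whenever one of those counts is zero (easy to hit on small samples such as the five AVeriTeC claims evaluated below), and it silently leaves out the "unsure" verdicts counted in `NE`. Below is a minimal sketch that guards the divisions and reports coverage alongside the four metrics; `compute_metrics` and `safe_div` are hypothetical names, and reporting 0.0 for an undefined ratio is a convention chosen here, not something the original notebook specifies.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def compute_metrics(tp, tn, fp, fn, ne=0):
    """Hypothetical helper: binary metrics with SUPPORTS/Supported as the positive class.

    As in the cell above, precision, recall, accuracy and F1 ignore "unsure" verdicts (ne);
    `coverage` is the share of claims that received a true/false-style verdict.
    """
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "coverage": safe_div(tp + tn + fp + fn, tp + tn + fp + fn + ne),
    }


# Illustrative counts only (not results from this notebook)
print(compute_metrics(tp=8, tn=6, fp=2, fn=2, ne=2))
```

Reporting coverage next to accuracy makes it visible when a high score comes from the model answering "unsure" on the hardest claims.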


---
## **AVeriTeC Dataset**

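Evaluating the fact-checking step on AVeriTeC dev-set claims labeled Supported or Refuted. In this run, only five claims (positions 27 to 31 of the selected 50-claim slice) were fact-checked.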


Schlichtkrull, Michael, Zhijiang Guo, and Andreas Vlachos. "AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web."

---


In [None]:
# Load the AVeriTeC dev set (data_dev.json must be available in the working directory)
with open('data_dev.json') as f:
    data = json.load(f)

# Keep only the claim text and its gold label. The gold "questions" are dropped too,
# since the questions used below are generated by Gemini.
UNUSED_FIELDS = ["required_reannotation", "justification", "claim_date", "speaker",
                 "original_claim_url", "fact_checking_article", "reporting_source",
                 "location_ISO_code", "claim_types", "fact_checking_strategies",
                 "questions", "cached_original_claim_url"]
for element in data:
  for field in UNUSED_FIELDS:
    element.pop(field, None)  # tolerate fields that are absent from some entries

# Keep Supported/Refuted claims only and use a fixed 50-claim slice
averitec = [x for x in data if x['label'] in ('Supported', 'Refuted')]
averitec = averitec[50:100]

In [None]:
for i, element in enumerate(averitec[27:32]):
  print(f"element {i+1}")
  print(element)
  p = f"Output a list of 1 or 2 simple factual questions rephrasing the following claim: << {element['claim']} >>. \
  Questions should be factual and simple such that each single question contains only one single fact and can be answered through public records. \
  Each question should be able to be understood and provide enough context on its own. Write nothing more than the questions."
  response = model.generate_content(p, safety_settings=GEM_SAFETY_SETTINGS)
  k = random.randint(1, 5)
  time.sleep(k)
  q_list = response.text.split("\n")
  element["questions"] = [x for x in q_list if x != '']
  clear_output()

In [None]:
for i, element in enumerate(averitec[27:32]):
  print(f"element {i+1}")
  print(element)
  text = get_claim_summary(element["claim"], element["questions"])
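  if text is None:
    element["verdict"] = "unsure"
    continue
  try:
    verdict = json.loads(text)
  except json.JSONDecodeError:
    # Gemini may wrap its JSON answer in a ```json ... ``` fence: strip it before parsing
    try:
      verdict = json.loads(text[8:-4])
    except json.JSONDecodeError:
      element["verdict"] = "unsure"  # unparseable answer: count it as "unsure"
      continue

  # Lower-case the verdict so it matches the keys of the `capped` mapping below
  element["verdict"] = str(verdict.get('verdict') or 'unsure').lower()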
  clear_output()

In [None]:
# Map the five verdicts onto AVeriTeC's labels ("unsure" -> NOT ENOUGH INFO)
capped = {"true": 'Supported', "partly true": 'Supported', "partly false": 'Refuted', "false": 'Refuted', "unsure": "NOT ENOUGH INFO"}
# Supported is the positive class, Refuted the negative class
TP = 0   # gold Supported, predicted Supported
TN = 0   # gold Refuted, predicted Refuted
FP = 0   # gold Refuted, predicted Supported
FN = 0   # gold Supported, predicted Refuted
NEP = 0  # gold Supported, predicted NOT ENOUGH INFO
NEN = 0  # gold Refuted, predicted NOT ENOUGH INFO

for element in averitec[27:32]:
  if "verdict" not in element:
    continue  # claim was not fact-checked
  element["final"] = capped.get(element["verdict"], "NOT ENOUGH INFO")
  if element["label"] == element["final"]:
    if element["label"] == 'Refuted':
      TN += 1
    else:
      TP += 1
  elif element["label"] == 'Refuted' and element["final"] == 'Supported':
    FP += 1
  elif element["label"] == 'Supported' and element["final"] == 'Refuted':
    FN += 1
  elif element["label"] == 'Refuted' and element["final"] == "NOT ENOUGH INFO":
    NEN += 1
  elif element["label"] == 'Supported' and element["final"] == "NOT ENOUGH INFO":
    NEP += 1

print(f"TP: {TP}, TN: {TN}, FP: {FP}, FN: {FN}, NEP: {NEP}, NEN: {NEN}")
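The FEVER and AVeriTeC scoring cells repeat the same confusion-count logic with different label strings. Below is a sketch of a single function that could serve both, assuming `element["verdict"]` holds one of the lower-case verdicts used as keys in `capped`; `count_outcomes`, `FEVER_MAP` and `AVERITEC_MAP` are hypothetical names. Like the AVeriTeC cell, it skips claims that were never fact-checked and splits "unsure" (or unrecognized) verdicts by gold label into `NEP` and `NEN`.

```python
from collections import Counter

# Hypothetical verdict -> label mappings, mirroring the `capped` dictionaries above
FEVER_MAP = {"true": "SUPPORTS", "partly true": "SUPPORTS",
             "partly false": "REFUTES", "false": "REFUTES"}
AVERITEC_MAP = {"true": "Supported", "partly true": "Supported",
                "partly false": "Refuted", "false": "Refuted"}


def count_outcomes(elements, mapping, positive, negative):
    """Hypothetical helper: confusion counts (TP, TN, FP, FN, NEP, NEN) for one dataset."""
    counts = Counter()
    for element in elements:
        if "verdict" not in element:
            continue  # claim was never fact-checked
        gold = element["label"]
        if gold not in (positive, negative):
            continue  # only binary gold labels are scored
        predicted = mapping.get(element["verdict"])  # None for "unsure" or unknown verdicts
        if predicted is None:
            counts["NEP" if gold == positive else "NEN"] += 1
        elif gold == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["TN" if predicted == negative else "FP"] += 1
    return counts


mock_elements = [
    {"label": "Supported", "verdict": "true"},
    {"label": "Supported", "verdict": "false"},
    {"label": "Refuted", "verdict": "partly false"},
    {"label": "Refuted", "verdict": "true"},
    {"label": "Refuted", "verdict": "unsure"},
    {"label": "Supported"},  # no verdict: skipped
]
print(dict(count_outcomes(mock_elements, AVERITEC_MAP, "Supported", "Refuted")))
```

The FEVER cell would then become `count_outcomes(fever, FEVER_MAP, "SUPPORTS", "REFUTES")`, with `NE` equal to `NEP + NEN`.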