<a href="https://colab.research.google.com/github/cecileloge/cs224v-truthsleuth-trendbender/blob/main/notebooks/%5BCS224V%5D_Truth_Sleuth_%2B_Trend_Bender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Truth Sleuth & Trend Bender AI: Multimodal Agent for YouTube Videos**
**Mohammad Rehan Ghori** | rghori@stanford.edu | rehang@google.com \\
**Cecile Loge ep. Baccari** | ceciloge@stanford.edu | cecileloge@google.com

---

**Motivation:** Misinformation is one of the most pressing threats of our time, and YouTube videos serve as a major platform through which it can spread [1]. Providing fact-checked information to address misleading content has been shown to be more effective than simply removing it [2].



**Project:** Can we build an application that takes a YouTube video as input and not only generates a list of the main claims made in the video but also fact-checks them? Ultimately, could that even change people's minds?

In particular, could our conversational agent influence the comment section of a video, keep the discussion away from hate/conspiracy and protect people from scams?

**Theme 1**: Dangerous Diet Recommendations & Health Claims.
**Theme 2**: Manosphere & Misogynistic videos.


---

* [1] An open letter to YouTube’s CEO from the world’s fact-checkers (on poynter.org), 2022. \\
* [2] Ecker, Ullrich KH, et al. "The effectiveness of short‐format refutational fact‐checks." British journal of psychology 111.1 (2020): 36-54.

---
## **Setting Up Everything**
Choosing the YouTube video URL and installing/importing libraries.

---

In [None]:
# Provide the video url
VIDEO_URL = "https://www.youtube.com/watch?v=8Jl4zm5ftCM"

In [None]:
# Install & Import Libraries
import random
from IPython.display import clear_output

# Youtube Extractors
!pip install youtube-transcript-api
!pip install pytube
!pip install -U yt-dlp
!apt install ffmpeg
from youtube_transcript_api import YouTubeTranscriptApi
from pytube import extract

# Assembly AI
!pip install assemblyai
import assemblyai as aai

# Data & Tools
import pandas as pd
import csv
from google.colab import userdata
from PIL import Image
import json
import os
current_dir = os.getcwd()
from datetime import datetime, date
import time
import warnings
warnings.filterwarnings('ignore')
from html import unescape

# Google API
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Markdown for the final output
from IPython.display import display, Markdown, Latex
import textwrap

# Gemini API
!pip install -q -U google-generativeai
import google.generativeai as genai

# LangChain Prompting
!pip install langchain
from langchain_core.prompts import PromptTemplate

# For Web Scraping
!pip install requests
!pip install beautifulsoup4
!pip install wikipedia
!pip install google  # provides the `googlesearch` module (search(..., stop=..., lang=...))
from bs4 import BeautifulSoup
import requests
import wikipedia
import googlesearch as g

clear_output()
print("Libraries have been imported / installed! \n>> You can proceed! :)")

Libraries have been imported / installed! 
>> You can proceed! :)


In [None]:
# YouTube API Key (Google for Developers Platform)
DEVELOPER_KEY=userdata.get('DEVELOPER_KEY')

# Google Developer API Key for GenAI
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY2')
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel(model_name="gemini-1.5-flash")  # alternative: "gemini-1.5-pro"
# Gemini safety settings: do not block any harm category
GEM_SAFETY_SETTINGS = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]
# Assembly AI API Key
AAI_API_KEY = userdata.get('AAI_API_KEY')
aai.settings.api_key = AAI_API_KEY

print("API keys are now set!")

API keys are now set!


In [None]:
# Fact-Check Prompt Templates
!mkdir -p prompts
!curl -L -o prompts/reformat.prompt "https://drive.google.com/uc?export=download&id=1aykUMXxUR1gWOil5X83aF7aem1s5rjIT"
!curl -L -o prompts/claims.prompt "https://drive.google.com/uc?export=download&id=1AlKLI-IP05jMps4Ol7BN2RPcmAMH2H-J"
!curl -L -o prompts/factcheck.prompt "https://drive.google.com/uc?export=download&id=1cevbk44t7ZWJr3rwpu-b1ncqHYdq-p8x"
!curl -L -o prompts/trend.prompt "https://drive.google.com/uc?export=download&id=1GylnTltwIXGoVZ-s11EIsMdSuxKIloo6"
clear_output()
print("Fact-check prompts have been downloaded into the 'prompts' folder and are ready to be used!")

Fact-check prompts have been downloaded into the 'prompts' folder and are ready to be used!


---
## **STEP 1 | Extracting info & audio from Video URL**

Functions to process a video from a provided YouTube link. They output a text transcript of the audio (YouTube subtitles when available, otherwise an Assembly AI transcription), the video's details (title, channel, tags, likes, views, thumbnail, publication date) and its top comments.
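`get_comments` reads a single page of comment threads (at most 100, the per-page maximum of `commentThreads.list`). To collect more, the YouTube Data API returns a `nextPageToken` that is passed back as `pageToken` on the next request. A minimal sketch of that loop, assuming a hypothetical `fetch_page(page_token)` callable that returns a dict shaped like a `commentThreads.list` response, so the pagination logic can be tested without an API key:

```python
from typing import Callable, Optional


def collect_comment_threads(
    fetch_page: Callable[[Optional[str]], dict],
    max_results: int = 300,
) -> list[dict]:
    """Follow nextPageToken until max_results comments are gathered or pages run out.

    fetch_page(page_token) is a hypothetical callable returning a dict shaped like
    a commentThreads.list response: {"items": [...], "nextPageToken": "..."}.
    """
    comments: list[dict] = []
    page_token: Optional[str] = None  # None means "first page"
    while len(comments) < max_results:
        response = fetch_page(page_token)
        for item in response.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append({
                "comment": snippet["textDisplay"],
                "num_of_likes": snippet.get("likeCount", 0),
            })
        page_token = response.get("nextPageToken")
        if not page_token:  # no token: this was the last page
            break
    return comments[:max_results]
```

With the real client, `fetch_page` could be `lambda tok: youtube.commentThreads().list(part="snippet", videoId=video_id, maxResults=100, order="relevance", pageToken=tok).execute()`; as in `get_individual_comments`, the first request passes `pageToken=None`.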

---

In [None]:
def get_comments(video_url):
  """
  Function to get up to 100 top-level comments from a YouTube video
  (first page of results, ordered by relevance).
  Saves them into comments.csv, sorted by likes. Returns a pandas DataFrame
  with columns 'comment' and 'num_of_likes', or None on an HTTP error.
  """
  youtube = build("youtube", "v3", developerKey=DEVELOPER_KEY)
  video_id = extract.video_id(video_url)

  try:
    # Retrieve one page of comment threads (100 is the API maximum per page)
    response = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=100,
        order="relevance"
    ).execute()
  except HttpError as error:
    print(f"An HTTP error {error.resp.status} occurred:\n {error.content}")
    return None

  comments = []
  for item in response["items"]:
    snippet = item["snippet"]["topLevelComment"]["snippet"]
    comments.append({"comment": unescape(snippet["textDisplay"]),
                     "num_of_likes": snippet["likeCount"]})

  # Build the DataFrame once, after the loop; an empty result still gets the expected columns
  comments_df = pd.DataFrame(comments, columns=["comment", "num_of_likes"])
  comments_df = comments_df.sort_values(by=["num_of_likes"], ascending=False, ignore_index=True)
  comments_df.to_csv("comments.csv", index=False)
  return comments_df

In [None]:
def get_video_details(video_id):
  """
  Function to get details from a YouTube video.
  Returns a tuple: (channel, title, tags, likes, views, thumbnail_url, videodate).
  """
  youtube = build('youtube', 'v3', developerKey=DEVELOPER_KEY)
  request = youtube.videos().list(part='snippet,statistics', id=video_id)
  item = request.execute()['items'][0]
  snippet, stats = item['snippet'], item['statistics']
  thumbnail_url = snippet['thumbnails']['high']['url']
  channel = snippet['channelTitle']
  title = snippet['title']
  tags = snippet.get('tags')
  # likeCount may be absent (e.g. when the like count is hidden)
  likes = int(stats.get('likeCount', 0))
  views = int(stats['viewCount'])
  videodate = snippet['publishedAt']
  return channel, title, tags, likes, views, thumbnail_url, videodate


In [None]:
def get_captions(video_url, video_id):
  """
  Function to get audio captions in 'en' (English) from a YouTube video.
  Either from the YouTube subtitles if they exist, or from Assembly AI.
  Uses Gemini to format the raw audio captions, and returns a string.
  """
  try:
    yt = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
    captions = " ".join(segment['text'] for segment in yt)
  except Exception as e:
    print(f"Error: {e}")
    print("Using Assembly AI instead...")
    # Use the function argument (not the global VIDEO_URL) so any video can be processed
    !yt-dlp --get-url -f bestaudio "$video_url" > audio.txt
    with open('audio.txt', 'r') as file:
      audio_url = file.read().strip()  # drop the trailing newline
    config = aai.TranscriptionConfig(auto_highlights=True)
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_url, config)
    captions = transcript.text

  # Using Gemini to format the raw audio captions
  with open("prompts/reformat.prompt", "r") as f:
    text = f.read()
  prompt_template = PromptTemplate.from_template(template=text, template_format="jinja2")
  prompt: str = prompt_template.format(captions=captions)
  response = model.generate_content(prompt, safety_settings=GEM_SAFETY_SETTINGS)
  audio_captions_formatted = response.text

  return audio_captions_formatted

---
## **STEP 2 | Extracting the claims to fact-check from Video audio**

Functions to extract the top claims made in the video, using Google's Gemini with robust prompt engineering (prompt templates are rendered with the LangChain library).
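Gemini often returns JSON wrapped in a markdown code fence (three backticks, optionally followed by `json`), which `json.loads` rejects as-is. A minimal sketch of a tolerant parser using only `re` and `json`; `parse_llm_json` is a hypothetical helper, not part of the notebook. The google-generativeai SDK should also be able to request raw JSON from Gemini 1.5 models via `generation_config={"response_mime_type": "application/json"}`, which would make fence stripping unnecessary, though the notebook does not use that option.

```python
import json
import re

# Optional opening fence (``` or ```json, any case), the payload, then a closing fence.
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n?(.*?)\n?\s*```\s*$", re.DOTALL | re.IGNORECASE)


def parse_llm_json(text: str):
    """Parse JSON returned by an LLM, tolerating a surrounding markdown code fence."""
    match = _FENCE_RE.match(text)
    payload = match.group(1) if match else text  # unfenced text is parsed as-is
    return json.loads(payload)
```

Raising on anything that is still not valid JSON after fence removal keeps failures visible instead of silently returning partial data.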

---

In [None]:
def extract_claims(video_url):
  """
  Function to extract the top claims that should be fact-checked.
  Uses Gemini with the claims.prompt prompt.
  Returns a tuple with: title, channel, thumbnail_url, claims, captions.
      claims is a dict whose "claims" key holds a list of objects with fields
      'claim', 'questions', 'passage', 'relevance'.
  """
  video_id = extract.video_id(video_url)
  channel, title, _, _, _, thumbnail_url, videodate = get_video_details(video_id)
  videodate = datetime.strptime(videodate[:10], '%Y-%m-%d').strftime("%Y-%m-%d")
  audio_captions_formatted = get_captions(video_url, video_id)

  # Using the prompt template and calling Gemini
  with open("prompts/claims.prompt", "r") as f:
    text = f.read()

  prompt_template = PromptTemplate.from_template(template=text, template_format="jinja2")
  claims_prompt: str = prompt_template.format(
      todaydate=date.today().strftime("%Y-%m-%d"),
      videodate=videodate,
      channel=channel,
      title=title,
      captions=audio_captions_formatted,
      )
  response = model.generate_content(claims_prompt, safety_settings=GEM_SAFETY_SETTINGS)

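  json_prompt = ("Make sure the following text can be read directly by json.loads(): <<< "
                 + response.text
                 + " >>>. Don't output anything other than your version of the text.")
  response = model.generate_content(json_prompt, safety_settings=GEM_SAFETY_SETTINGS)
  # Strip the markdown code fence (```json ... ```) Gemini usually wraps around JSON
  raw = response.text.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
  claims = json.loads(raw)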

  return title, channel, thumbnail_url, claims, audio_captions_formatted

---
## **STEP 3 | Fact-checking the claims by cross-checking reliable sources**

Cross-reference claims with reliable sources, and classify each claim as true, partly true, partly false, false or unsure (ideally with links / sources).


This step leverages Data Commons [3], the Google Fact Check Tools Claim Search API, the Wikipedia API and Google Search.
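`check_googlesearch` sends each scraped page to Gemini as raw HTML, scripts and styles included. One way to shrink those prompts, offered here as an assumption rather than what the notebook does, is to keep only the visible text. BeautifulSoup already offers this through `get_text(separator=" ", strip=True)`; the sketch uses the standard-library `html.parser` instead so it runs on its own. `html_to_text` is a hypothetical name, and the 20,000-character cap is an arbitrary budget, not a Gemini limit.

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping everything inside non-visible tags."""

    SKIP = {"script", "style", "noscript", "head"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive already decoded
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str, max_chars: int = 20000) -> str:
    """Return the page's visible text joined by spaces, truncated to max_chars."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)[:max_chars]
```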

The LLM is called several times throughout this step via robust prompt engineering to interpret, cross-reference, and ultimately classify the claims.  
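Several of these Gemini calls sit inside ad hoc retry loops with fixed or random sleeps. A generic helper with exponential backoff and random jitter could replace them. `call_with_retries` is hypothetical, not part of the notebook, and its `sleep` parameter exists only so tests can skip real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); after failure i (0-based), wait base_delay * 2**i plus up to
    base_delay of random jitter, then retry. Re-raises the last exception."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the real error
            sleep(base_delay * 2 ** i + random.uniform(0, base_delay))
    raise ValueError("attempts must be >= 1")
```

In the notebook it could be called as `call_with_retries(lambda: model.generate_content(prompt, safety_settings=GEM_SAFETY_SETTINGS).text)`, so that a response whose `.text` accessor raises (for example, one with no text part) is retried too.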

---

* [3] Radhakrishnan, Prashanth, et al. "Knowing When to Ask--Bridging Large Language Models and Data." arXiv preprint arXiv:2409.13741 (2024).

---

In [None]:
# Our reliable sources for Fact-Checking
# 1 - Google Fact-Check API
GOOGLE_FACT_CHECK_API_KEY = userdata.get('GFC_API_KEY')
GOOGLE_FACT_CHECK_URL = 'https://factchecktools.googleapis.com/v1alpha1/claims:search'

# 2 - Wikipedia
wikipedia.set_lang('en')

# 3 - Google Search
GOOGLE_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"

In [None]:
def call_googlefacts(claim):
  """
  Function to call the Google Fact Check API
  Returns the claim reviews in the Claim Review structured format
  https://developers.google.com/search/docs/appearance/structured-data/factcheck
  """

  params = {
      'query': claim,
      'key': GOOGLE_FACT_CHECK_API_KEY
  }
  response = requests.get(GOOGLE_FACT_CHECK_URL, params=params)
  if response.status_code == 200:
      data = response.json()
      return data.get('claims', [])
  else:
      return None

def check_googlefacts(claim):
  """
  Function to process the Google Fact Check claim reviews
  Returns a string = concatenation of relevant results
  """
  results = call_googlefacts(claim)
  summary = ''
  for i, r in enumerate(results or []):
    # A claim can carry several reviews; only the first one is used
    review = (r.get('claimReview') or [{}])[0]
    source = review.get('publisher', {}).get('name')
    url = review.get('url')
    claimant = r.get('claimant')
    text = r.get('text')
    truthfulness = review.get('textualRating')
    summary += f"Source #{i+1}: {source} at {url}.\n     Claimant: {claimant}\n     Description: {text}\n     Truthfulness: {truthfulness}\n\n"

  return summary

def check_googlesearch(claim, limit=2):
  """
  Function to search Google for the claim and scrape the top `limit` results
  Calls Gemini to process each page and generate a summary
  Returns a string = concatenation of relevant results
  """
  urls = list(g.search(claim, stop=limit, lang='en'))
  headers = {"user-agent": GOOGLE_USER_AGENT}
  summary = ''
  with requests.Session() as session:
    for url in urls:
      try:
        website = session.get(url, headers=headers, timeout=15)
      except requests.RequestException:
        continue  # skip pages that cannot be fetched
      web_soup = BeautifulSoup(website.text, 'html.parser')
      summary_prompt = (
          "I need you to give me the key information contained in a specific webpage article. "
          "I will provide you with html code. Do not describe the webpage or the article. "
          "Focus on extracting the key claims, and summarizing the information the page is giving "
          "in a precise, comprehensive yet concise way. "
          "Your answer should ideally help answer the question: " + claim + ".\n"
          "HTML Code: <<< " + str(web_soup) + " >>>"
      )
      time.sleep(random.randint(0, 1))  # short random pause between Gemini calls
      for attempt in range(3):
        try:
          response = model.generate_content(summary_prompt, safety_settings=GEM_SAFETY_SETTINGS)
          summary += f"Source: {url}.\nDescription: {response.text}\n"
          break
        except Exception:
          time.sleep(2)
  return summary

def check_wikipedia(claim):
  """
  Function to call the Wikipedia API and process the results
  Returns a string = concatenation/summary of relevant Wikipedia articles
  """
  summary = ''
  search_results = wikipedia.search(claim)
  for r in search_results[:2]:
    try:
      call = wikipedia.page(r)
      summary += f"From the \"{r}\" Wikipedia page ({call.url}): " + call.content + "\n\n"
    except Exception:
      # Skip disambiguation pages, missing pages and other Wikipedia API errors
      continue
  return summary

In [None]:
def get_claim_summary(claim, questions):
  """
  Function to get the final say on a specific claim.
  Calls Gemini with the factcheck.prompt prompt.
  Returns json object with fields: `claim`, `verdict`, `reason`, `sources`.
  """
  summary_gfc = ''
  summary_search = ''
  for q in questions:
    gfc = check_googlefacts(q)
    if gfc:
      summary_gfc += gfc + "\n"
  for q in questions:
    search = check_googlesearch(q)
    summary_search += search + "\n"
  summary_wiki = check_wikipedia(claim)

  with open("prompts/factcheck.prompt", "r") as f:
        text = f.read()
  prompt_template = PromptTemplate.from_template(template=text, template_format="jinja2")
  fact_check_prompt: str = prompt_template.format(
      claim=claim,
      report_GFC=summary_gfc + "\n" + summary_search,
      report_wiki=summary_wiki,
      )
  k = random.randint(3, 5)
  time.sleep(k)
  retry_count = 0
  while retry_count < 3:
      try:
         response = model.generate_content(fact_check_prompt, safety_settings=GEM_SAFETY_SETTINGS)
         return response.text
      except Exception as e:
         #print(f"Error: {e}")
         k = random.randint(1, 3)
         time.sleep(k)
         retry_count += 1
  return None

---
## **TRUTH SLEUTH AGENT | Putting it all together & Generating the Fact-Check Report**

Generating the final report, using Markdown for formatting. With `skip_unsure=True` (the default), claims whose verdict is "unsure" are left out of the report.
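The report expects each verdict to be one of five lowercase labels (true, partly true, partly false, false, unsure), the keys of the notebook's `color` and `capped` dictionaries. A model may instead answer "Partly-True" or an unexpected label. A minimal sketch of normalizing the label, falling back to "unsure", and applying the `skip_unsure` filter; `normalize_verdict` and `visible_verdicts` are hypothetical names, and the colors are copied from the notebook.

```python
VERDICT_COLORS = {"true": "MediumSpringGreen", "partly true": "LightGreen",
                  "partly false": "lightcoral", "false": "red", "unsure": "grey"}


def normalize_verdict(raw) -> str:
    """Map a model-produced verdict to a known label, defaulting to 'unsure'."""
    label = str(raw or "").strip().lower().replace("-", " ").replace("_", " ")
    label = " ".join(label.split())  # collapse repeated spaces
    return label if label in VERDICT_COLORS else "unsure"


def visible_verdicts(verdicts: list[dict], skip_unsure: bool = True) -> list[dict]:
    """Normalize each verdict dict, attach its color, optionally drop 'unsure' ones."""
    out = []
    for v in verdicts:
        label = normalize_verdict(v.get("verdict"))
        if skip_unsure and label == "unsure":
            continue
        out.append({**v, "verdict": label, "color": VERDICT_COLORS[label]})
    return out
```

Treating unknown labels as "unsure" means a malformed answer is hidden by default rather than crashing the report with a `KeyError`.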

---


In [None]:
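def generate_report(video_url, skip_unsure = True, print_report = True):
  """
  Function to run the Truth Sleuth agent end to end on a video.
  Extracts the claims, fact-checks each one, and optionally displays a Markdown report.
  Returns: title, channel, claims, captions, plain-text report, Markdown-formatted report.
  """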

  # Extracting the claims
  title, channel, thumbnail_url, claims, captions = extract_claims(video_url)

  # Formatting
  color = {"question": "black", "true": "MediumSpringGreen", "partly true": "LightGreen", "partly false": "lightcoral", "false": "red", "unsure": "grey"}
  capped = {"true": "TRUE", "partly true": "PARTLY TRUE", "partly false": "PARTLY FALSE", "false": "FALSE", "unsure": "UNSURE"}
  delimiter = "\n"+"_"*100+"\n"

  # Generating Report Header
  thumb = Image.open(requests.get(thumbnail_url, stream=True).raw)
  formatted_title1 = f"<font size='+2' color='Bisque'><blockquote>📓📓 🔍 **TRUTH SLEUTH FACT-CHECK REPORT** 🔍 📓📓</blockquote></font>"
  formatted_title2 = f"<font size='+2' color='white'><blockquote>**{title}** by {channel}</blockquote></font>"

  # Getting each claim's verdict + Formatting again
  report = ""
  formatted_report = ""
  for i, c in enumerate(claims["claims"]):
    clear_output()
    print(f"Going over claim #{i+1}: {c['claim']}")
    text = get_claim_summary(c['claim'], c['questions'])
    if text is None:
      continue
    text = text.replace("$", "USD ")

    # The verdict may come back wrapped in a ```json ... ``` fence
    raw = text.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
    verdict = json.loads(raw)
    label = str(verdict.get('verdict', 'unsure')).strip().lower()
    if label not in capped:
      label = "unsure"  # unknown labels are treated as unsure
    if label == "unsure" and skip_unsure:
      continue

    sources = verdict.get('sources') or []
    reason = verdict.get('reason', '')
    formatted_claim = f"<font size='+1' color='white'><blockquote>**• {c['claim']}**</blockquote></font>"
    formatted_verdict = f"<font size='+1' color='{color[label]}'><blockquote>**{capped[label]}**\n\n</blockquote></font>"
    links = "".join("\n\n• " + l for l in sources)
    report += "Claim: " + c['claim'] + "\nThis is: " + capped[label] + "\n" + reason + "\nSources: " + str(sources) + "\n\n"
    formatted_report += delimiter + formatted_claim + formatted_verdict + reason + links

  clear_output()
  if print_report:
    display(Markdown(formatted_title1 + formatted_title2))
    display(thumb)
    display(Markdown(formatted_report))
    display(Markdown(delimiter))

  return title, channel, claims, captions, report, formatted_report

In [None]:
warnings.filterwarnings('ignore')
title, channel, claims, captions, fact_check_report, formatted_fact_check_report = generate_report(VIDEO_URL, True, True)

---
## **TREND BENDER AGENT | Influencing the comment section**

Ultimately, could we even change people's minds? In particular, could our conversational agent influence the comment section of a video, keep the discussion away from hate/conspiracy and protect people from scams?

**Theme 1**: Dangerous Diet Culture / Weight Loss videos.
**Theme 2**: Manosphere & Misogynistic videos.  
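The comment generators in this section follow a draft, critique, revise pattern: Gemini writes a comment, a grader prompt (`evaluation.prompt`) critiques it, and a third prompt (`comment_ABCD_feedback.prompt`) asks for a revision that addresses the critique. A minimal sketch of that loop with a stand-in `generate` callable (hypothetical; in the notebook this role is played by `model.generate_content(...).text`). The inline prompt strings are placeholders, not the contents of the downloaded `.prompt` files, and the `rounds` parameter, which repeats the critique over several passes, is an assumption: the notebook runs a single pass.

```python
from typing import Callable


def draft_grade_improve(task_prompt: str, generate: Callable[[str], str],
                        rounds: int = 1) -> dict:
    """Run draft -> critique -> revision, `rounds` times.

    `generate` maps a prompt string to model text. The grader and reviser
    prompts here are illustrative placeholders.
    """
    draft = generate(task_prompt)
    history = []
    for _ in range(rounds):
        feedback = generate(
            f"Critique this comment for accuracy, tone and persuasiveness:\n{draft}"
        )
        revised = generate(
            f"{task_prompt}\n\nPrevious draft:\n{draft}\n\nFeedback:\n{feedback}\n\n"
            "Rewrite the draft to address the feedback."
        )
        history.append({"draft": draft, "feedback": feedback, "revised": revised})
        draft = revised  # the next round critiques the revision
    return {"final": draft, "history": history}
```

Keeping every draft and critique in `history` mirrors what the experiment functions print, which makes it easier to compare first drafts against their revisions.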


---

In [None]:
# Choose your theme:
THEME = "manosphere"  # options: "manosphere", "health", "diet"
CONTEXT_WIKI_PAGES = {
    "manosphere": ["gender role", "intimate relationship", "violence against women"],
    "health": ["Quackery", "Pseudoscience", "Alternative medicine"],
    "diet": ["Eating disorder", "Body image"]
}

CONTEXT_SEARCH = {
    "manosphere": ["understanding Andrew Tate’s appeal to lost men", "Do Men Actually Not Want to Date Intelligent Women?", "The Power Couple Effect: Why Working Together Yields Success"],
    "health": ["The Menace of Wellness Influencers", "Health misinformation is rampant on social media", "Is fear mongering just as bad as diet culture?"],
    "diet": ["Diet Culture", "A review of current knowledge about the impact of diet on mental health", "Can What We Eat Determine How We Think?", "How misinformation is making us fear our food"]
}

# Get context from Wikipedia and Google Search
def get_context(theme):
  """
  Function to gather background context for a theme:
  the summaries of the theme's Wikipedia pages (CONTEXT_WIKI_PAGES)
  and Gemini summaries of the top Google results for its key searches (CONTEXT_SEARCH).
  Returns a string = concatenation of these summaries.
  """
  summary = ''
  for r in CONTEXT_WIKI_PAGES[theme]:
    try:
      call = wikipedia.page(r)
      summary += f"From the \"{r}\" Wikipedia page ({call.url}): " + call.summary + "\n" + "_"*150 + "\n\n"
    except Exception:
      # Skip disambiguation pages, missing pages and other Wikipedia API errors
      continue
  for r in CONTEXT_SEARCH[theme]:
    summary += check_googlesearch(r) + "\n" + "_"*150 + "\n\n"
  return summary

In [None]:
def extract_trend(channel, title, captions, comments, context):
  """
  Function to extract the trends from the video's comment section.
  Returns a string.
  """
  # Using the prompt template and calling Gemini
  with open("prompts/trend.prompt", "r") as f:
    text = f.read()

  prompt_template = PromptTemplate.from_template(template=text, template_format="jinja2")
  trend_prompt: str = prompt_template.format(
      channel=channel,
      title=title,
      captions=captions,
      comments=comments,
      context=context,
      )
  response = model.generate_content(trend_prompt, safety_settings=GEM_SAFETY_SETTINGS)

  return response.text

In [None]:
warnings.filterwarnings('ignore')

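context = get_context(THEME)
comments = get_comments(VIDEO_URL)
# Pass the comments as plain text: str(DataFrame) would truncate long comments and hide rows
comments_text = "\n".join(f"({likes} likes) {text}" for text, likes in zip(comments.comment, comments.num_of_likes))
trends = extract_trend(channel, title, captions, comments_text, context)

sample_comments = ''
for i, c in enumerate(comments.comment[:10]):
  sample_comments += f"Comment #{i+1}: {c}\n"

clear_output()
print(context)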

In [None]:
# Comment Prompt Templates

!curl -L -o prompts/comment_1A.prompt "https://drive.google.com/uc?export=download&id=1Wc_CbJNHffAuh6xhAsOK329iLbqeWrsb"
!curl -L -o prompts/comment_1AB.prompt "https://drive.google.com/uc?export=download&id=1MO3kA51ukw8w8jVtH7IzdYVArlXREzAJ"
!curl -L -o prompts/comment_1ABC.prompt "https://drive.google.com/uc?export=download&id=1FEZ_ZOhAkLCIfn4wi9avsqwwnbGMVnbl"
!curl -L -o prompts/comment_1ABCD.prompt "https://drive.google.com/uc?export=download&id=10PFYXZ5DDA74IqIH-0bv3Ew8SI_GajOS"
!curl -L -o prompts/comment_2A.prompt "https://drive.google.com/uc?export=download&id=1p2alhX0BPgK9ghCF42RyKjcV7D4IcFtF"
!curl -L -o prompts/comment_2AB.prompt "https://drive.google.com/uc?export=download&id=1Y7l4xO1Rc4vAHx1_oyNI81mSnF2bIgg7"
!curl -L -o prompts/comment_2ABC.prompt "https://drive.google.com/uc?export=download&id=1XZGlLU1NAUOkWJOkycBwlIV3oOGhL3R5"
!curl -L -o prompts/comment_2ABCD.prompt "https://drive.google.com/uc?export=download&id=1nXzRNnQkg7bDVjRyyZvWvWjqJp2v5DeO"
!curl -L -o prompts/comment_3ABC.prompt "https://drive.google.com/uc?export=download&id=1fqDj5UOdNLnOk641qRnWDzKSDJxwonJc"
!curl -L -o prompts/comment_3ABCD.prompt "https://drive.google.com/uc?export=download&id=1QqLxOJsqkM_VktHAb-sbr0Vi85GIjsnX"

!curl -L -o prompts/comment_reply_2ABCD.prompt "https://drive.google.com/uc?export=download&id=16Un9vw2J4cf_k1nCiOBLB7j4-3np9P7M"
clear_output()
print("Comment prompts have been downloaded into the 'prompts' folder and are ready to be used!")

Comment prompts have been downloaded into the 'prompts' folder and are ready to be used!


In [None]:
# Evaluation & Feedback Prompt Templates
!curl -L -o prompts/comment_ABCD_feedback.prompt "https://drive.google.com/uc?export=download&id=1hSh6f6KiIrviL59JkglNM1chBKQnlSGi"
!curl -L -o prompts/evaluation.prompt "https://drive.google.com/uc?export=download&id=1ELsBKmxR7y9eXqkX11UjVpaxmqqR6sac"

!curl -L -o prompts/comment_reply_ABCD_feedback.prompt "https://drive.google.com/uc?export=download&id=1n_AdEgi_HO96gE4dRzeKBQZFUPt3BDLq"
!curl -L -o prompts/evaluation_reply.prompt "https://drive.google.com/uc?export=download&id=1kXxNT0ZI5QMWV_wrVBequpVtVghPku6N"

clear_output()
print("Self-evaluation prompts have been downloaded into the 'prompts' folder and are ready to be used!")

Self-evaluation prompts have been downloaded into the 'prompts' folder and are ready to be used!


In [None]:
def prompting(fileloc, channel, title, captions, theme, comments, trends, fact_check_report, context, response=None, feedback=None, specific_comment=None):
  """
  Function to turn the downloaded prompt file into a usable prompt string, using LangChain's prompt template.
  Returns a string.
  """
  with open(fileloc, "r") as f:
    text = f.read()
  prompt_template = PromptTemplate.from_template(template=text, template_format="jinja2")
  prompt: str = prompt_template.format(
      theme=theme,
      channel=channel,
      title=title,
      captions=captions,
      trends=trends,
      comments=comments,
      fact_check=fact_check_report,
      context=context,
      response=response,
      feedback=feedback,
      comment_to_focus=specific_comment,
      )
  return prompt

In [None]:
# Experimenting with different kinds of prompts...
PROMPT_LOCS = [
               #"prompts/comment_1A.prompt", "prompts/comment_1AB.prompt", "prompts/comment_1ABC.prompt",
               #"prompts/comment_1ABCD.prompt",
               #"prompts/comment_2A.prompt", "prompts/comment_2AB.prompt",
               #"prompts/comment_2ABC.prompt",
               "prompts/comment_2ABCD.prompt",
               #"prompts/comment_3ABC.prompt",
               "prompts/comment_3ABCD.prompt"
               ]

def experiment_generate_comments(channel, title, captions, theme, comments, context, trends, fact_check_report):
  """
  Function to generate three candidate comments for the video's comment section
  with each prompt in PROMPT_LOCS.
  Returns all of them concatenated into a single string.
  """
  # Using the prompt template and calling Gemini
  all_prompts = {}
  for loc in PROMPT_LOCS:
    prompt = prompting(loc, channel, title, captions, theme, comments, trends, fact_check_report, context)
    all_prompts[loc.split("/")[1].split(".")[0]] = prompt

  result = ""
  for p, prompt in all_prompts.items():
    print("Prompt: " + p)
    result += "Prompt: " + p + "\n"
    for i in range(3):
      response = model.generate_content(prompt, safety_settings=GEM_SAFETY_SETTINGS)
      print(i+1)
      print(response.text + "\n")
      result += str(i+1) + "\n" + response.text + "\n"
      time.sleep(2)
  return result

def experiment_generate_comments_with_feedback(channel, title, captions, theme, comments, context, trends, fact_check_report):
  """
  Function to generate three candidate comments for the video's comment section
  with each prompt in PROMPT_LOCS, each followed by a version improved with the Grader Agent's feedback.
  Returns two strings: one with the comments, one with the feedback.
  """
  # Using the prompt template and calling Gemini
  all_prompts = {}
  for loc in PROMPT_LOCS:
    prompt = prompting(loc, channel, title, captions, theme, comments, trends, fact_check_report, context)
    all_prompts[loc.split("/")[1].split(".")[0]] = prompt

  result = ""
  feedback = ""
  for p, prompt in all_prompts.items():
    print("Prompt: " + p)
    result += "Prompt: " + p + "\n"
    feedback += "Prompt: " + p + "\n"
    for i in range(3):
      response = model.generate_content(prompt, safety_settings=GEM_SAFETY_SETTINGS)
      print(i+1)
      #print(response.text + "\n")
      result += str(i+1) + "\n" + response.text + "\n"
      time.sleep(2)
      evaluation_prompt = prompting("prompts/evaluation.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context, response=response.text)
      grading = model.generate_content(evaluation_prompt, safety_settings=GEM_SAFETY_SETTINGS)
      improve_prompt = prompting("prompts/comment_ABCD_feedback.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context, response=response.text, feedback=grading.text)
      improved_response = model.generate_content(improve_prompt, safety_settings=GEM_SAFETY_SETTINGS)
      print("After feedback: \n" + improved_response.text + "\n")
      result += "After feedback: " + improved_response.text + "\n"
      feedback += str(i+1) + "\n" + grading.text + "\n"
  return result, feedback

In [None]:
_ = experiment_generate_comments(channel, title, captions, THEME, sample_comments, context, trends, fact_check_report)

In [None]:
_, feedback = experiment_generate_comments_with_feedback(channel, title, captions, THEME, sample_comments, context, trends, fact_check_report)

---
## **COMMENT EXPERIMENTS | Interacting with *real* users**

Now let's actually post under some YouTube videos and observe *real* users' reactions!

**Theme 1**: Dangerous Diet Culture / Weight Loss videos.
**Theme 2**: Manosphere & Misogynistic videos.  
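Each posting run is logged to a CSV with the columns `video_url`, `video_id`, `user_comment_id`, `user_comment_text` and `generated_comment` (see `initiate_csv` and `populate_csv`). Since `initiate_csv` opens the file in append mode, calling it on an existing log adds a second header row. A sketch of an alternative that writes the header only when the file is new or empty, using `csv.DictWriter`; `append_log_row` is a hypothetical name. Missing fields are written as empty strings, which could suit a general comment that is not a reply to any user comment.

```python
import csv
import os

FIELDS = ["video_url", "video_id", "user_comment_id", "user_comment_text", "generated_comment"]


def append_log_row(path: str, row: dict) -> None:
    """Append one row to the CSV log, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)  # missing keys -> "" (default restval)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```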


---

In [None]:
# Getting the top comments...

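def get_individual_comments(video_id, comments_order, max_results=5):
    """
    Function to get up to `max_results` top-level comments from a YouTube video,
    following pagination. `comments_order` is "time" or "relevance".
    Returns a list of (comment_thread_id, comment_text) tuples.
    """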
    youtube = build("youtube", "v3", developerKey=DEVELOPER_KEY)
    comments = []
    next_page_token = None
    while len(comments) < max_results:
        request = youtube.commentThreads().list(
            part="snippet",
            videoId=video_id,
            maxResults=min(100, max_results - len(comments)),
            order=comments_order,
            pageToken=next_page_token,
            textFormat="plainText",
        )
        try:
            response = request.execute()
            comments.extend(
                [
                    (
                        item["id"],  # Extract the comment thread ID
                        unescape(item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
                    )
                    for item in response["items"]
                ]
            )
            next_page_token = response.get("nextPageToken")
            if not next_page_token:
                break

        except HttpError as error:
            print(f"An HTTP error {error.resp.status} occurred:\n {error.content}")
            break  # stop instead of requesting the same page again forever

    return comments[:max_results]

def get_top_recent_comments(video_id, max_results):
    order="time"
    return get_individual_comments(video_id, order, max_results)

def get_top_liked_comments(video_id, max_results):
    order="relevance"
    return get_individual_comments(video_id, order, max_results)

In [None]:
def generate_general_comment_with_feedback(channel, title, captions, theme, comments, context, trends, fact_check_report):
  """
  Function to generate an influential comment for the video's comment section.
  Interacts with Grader Agent to improve on first output.
  Returns string of final improved response that should be used for posting.
  """
  prompt = prompting("prompts/comment_2ABCD.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context)

  # First response
  response = model.generate_content(prompt, safety_settings=GEM_SAFETY_SETTINGS)
  time.sleep(2)

  # Self-evaluating
  evaluation_prompt = prompting("prompts/evaluation.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context, response=response.text)
  grading = model.generate_content(evaluation_prompt, safety_settings=GEM_SAFETY_SETTINGS)

  # Improving on first response
  improve_prompt = prompting("prompts/comment_ABCD_feedback.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context, response=response.text, feedback=grading.text)
  improved_response = model.generate_content(improve_prompt, safety_settings=GEM_SAFETY_SETTINGS)

  return improved_response.text

def generate_reply_to_comment_with_feedback(channel, title, captions, theme, comments, context, trends, fact_check_report, comment_to_reply):
  """
  Function to generate an influential comment for the video's comment section, as a reply to a chosen comment.
  Interacts with Grader Agent to improve on first output.
  Returns string of final improved response that should be used for posting.
  """
  prompt = prompting("prompts/comment_reply_2ABCD.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context, specific_comment=comment_to_reply)

  # First response
  response = model.generate_content(prompt, safety_settings=GEM_SAFETY_SETTINGS)
  time.sleep(2)

  # Self-evaluating
  evaluation_prompt = prompting("prompts/evaluation_reply.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context, response=response.text, specific_comment=comment_to_reply)
  grading = model.generate_content(evaluation_prompt, safety_settings=GEM_SAFETY_SETTINGS)

  # Improving on first response
  improve_prompt = prompting("prompts/comment_reply_ABCD_feedback.prompt", channel, title, captions, theme, comments, trends, fact_check_report, context, response=response.text, feedback=grading.text, specific_comment=comment_to_reply)
  improved_response = model.generate_content(improve_prompt, safety_settings=GEM_SAFETY_SETTINGS)

  return improved_response.text
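Both functions above follow the same single round of self-refinement: draft a comment, ask a grader prompt to critique it, then ask for a revision that uses the critique. The sketch below isolates that pattern in a model-agnostic form. It assumes a hypothetical `generate(prompt) -> str` callable standing in for `model.generate_content(prompt, safety_settings=GEM_SAFETY_SETTINGS).text`, and hypothetical prompt-builder callables standing in for the `.prompt` templates. It is an illustration, not the notebook's implementation.

```python
from typing import Callable


def refine_once(
    generate: Callable[[str], str],
    draft_prompt: str,
    grade_prompt: Callable[[str], str],
    revise_prompt: Callable[[str, str], str],
) -> str:
    """One round of draft -> grade -> revise (hypothetical helper)."""
    # 1. First draft from the task prompt
    draft = generate(draft_prompt)
    # 2. A grader call critiques the draft
    feedback = generate(grade_prompt(draft))
    # 3. A final call rewrites the draft using that feedback
    return generate(revise_prompt(draft, feedback))


# Usage with a fake model (stand-in for Gemini) that records every prompt
calls = []


def fake_generate(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("GRADE:"):
        return "too long"
    if prompt.startswith("REVISE:"):
        return "short comment"
    return "a very long first comment"


final = refine_once(
    fake_generate,
    "WRITE: comment on the video",
    lambda draft: f"GRADE: {draft}",
    lambda draft, fb: f"REVISE: {draft} | feedback: {fb}",
)
print(final)  # short comment
```

Because the prompt builders are passed in, the same loop serves both the general-comment and the reply case; running more than one round would simply repeat the grade and revise steps on the latest draft.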


In [None]:
VIDEO_THEMES = {"YddQ66BcYIU": "manosphere",
                "8Jl4zm5ftCM": "manosphere",
                "VYvIenW1XzE": "diet",
                "Ou2IsE7BoFI": "diet"}

VIDEOS = ["https://www.youtube.com/watch?v=YddQ66BcYIU", "https://www.youtube.com/watch?v=Q0wKXhhOZZU",
          "https://www.youtube.com/watch?v=VYvIenW1XzE", "https://www.youtube.com/watch?v=Ou2IsE7BoFI"]

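import csv
import os

def initiate_csv(name_csv):
    # Write the header only when the file is new, so re-running the pipeline on
    # the same video appends rows without duplicating the header.
    if not os.path.exists(name_csv):
        with open(name_csv, "w", newline="") as csvfile:
            csv.writer(csvfile).writerow(["video_url", "video_id", "user_comment_id", "user_comment_text", "generated_comment"])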

def populate_csv(name_csv, row_list):
    with open(name_csv, "a", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(row_list)

def from_start_to_finish(video_url):
  """
  This function does three things:
    - Generates a general comment on the video (opinion)
    - Reads the 5 most recent comments and generates a reply to each
    - Reads the 5 most-liked comments and generates a reply to each
  All generated text is logged to generated_comment_<video_id>.csv.

  To do so, it:
    - Extracts basic video data (title, channel, comments, captions)
    - Generates a Fact-Check report using Truth Sleuth
    - Gathers articles into a themed corpus for context
    - Generates convincing comments using Trend Bender
  """
  ### --- ### --- ### BASICS + TRUTH SLEUTH AGENT ### --- ### --- ###
  # Extract basics & Fact-Check report
  video_id = extract.video_id(video_url)
  channel, title, _, _, _, _, _ = get_video_details(video_id)
  print(f"Video: {title} from {channel} ... ... ...")
  initiate_csv(f"generated_comment_{video_id}.csv")

  print("Truth Sleuth generating Fact-Check report ... ... ...")
  _, _, claims, captions, fact_check_report, report = generate_report(video_url, True, False)
  print("Fact-Check report extracted!!")

  comments = get_comments(video_url)
  sample_comments = ''
  for i, c in enumerate(comments.comment[:10]):
    sample_comments += f"Comment #{i+1}: {c}\n"
  clear_output()


  # Extract trends & context
  print("Now putting together themed corpus ... ... ...")
  theme = VIDEO_THEMES[video_id]
  context = get_context(theme)
  trends = extract_trend(channel, title, captions, comments, context)
  clear_output()

  #comment_rows = []
  ### --- ### --- ###     TREND BENDER AGENT     ### --- ### --- ###
  print("Over to Trend Bender agent to generate comments ... ... ...")
  # Part 1: Generate a general comment (improved version) and log it to the CSV for posting
  print("One general comment ... ... ...")
  comment_for_video = generate_general_comment_with_feedback(channel, title, captions, theme, sample_comments, context, trends, fact_check_report)
  populate_csv(f"generated_comment_{video_id}.csv", [video_url, video_id, "general", "", comment_for_video])
  #comment_rows.append([video_url, video_id, "general", "", comment_for_video])

  # Part 2: Generate replies to the 5 most recent comments and log them (one reply per comment)
  print("Five recent comments ... ... ...")
  comments_response = get_top_recent_comments(video_id, max_results=5)
  for comment_id, comment_text in comments_response:
      reply_to_comment = generate_reply_to_comment_with_feedback(channel, title, captions, theme, sample_comments, context, trends, fact_check_report, comment_text)
      populate_csv(f"generated_comment_{video_id}.csv", [video_url, video_id, comment_id, comment_text, reply_to_comment])
      k = random.randint(1, 3)
      time.sleep(k)

  # Part 3: Generate replies to the 5 most-liked comments and log them (one reply per comment)
  print("Five top comments ... ... ...")
  comments_response = get_top_liked_comments(video_id, max_results=5)
  for comment_id, comment_text in comments_response:
      reply_to_comment = generate_reply_to_comment_with_feedback(channel, title, captions, theme, sample_comments, context, trends, fact_check_report, comment_text)
      populate_csv(f"generated_comment_{video_id}.csv", [video_url, video_id, comment_id, comment_text, reply_to_comment])
      k = random.randint(1, 3)
      time.sleep(k)

  clear_output()
  print(f"Comments were generated for {title} from {channel} with the theme: '{theme}'!")
  display(Markdown(report))
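`initiate_csv` and `populate_csv` open the log in append mode, so results accumulate across runs. One possible consolidation, sketched below under the assumption that a single helper is acceptable, is a hypothetical `append_row` that writes the header only when the file is first created. `csv.writer` already quotes generated comments that contain commas, quotes, or newlines, and `csv.DictReader` reads them back intact. The file name and row values are placeholders.

```python
import csv
import os
import tempfile

HEADER = ["video_url", "video_id", "user_comment_id", "user_comment_text", "generated_comment"]


def append_row(path, row):
    """Append one row, writing the header first only if the file is new (hypothetical helper)."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(HEADER)
        writer.writerow(row)


def read_rows(path):
    """Read the log back as a list of dicts keyed by the header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Usage: two appends to the same file still yield a single header line
path = os.path.join(tempfile.mkdtemp(), "generated_comment_demo.csv")
append_row(path, ["demo_url", "demo_id", "general", "", "first comment"])
append_row(path, ["demo_url", "demo_id", "demo_comment_id", "Hold the line", "a reply, with a comma"])
rows = read_rows(path)
print(len(rows), rows[1]["generated_comment"])  # 2 a reply, with a comma
```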



In [None]:
def from_start_to_finish_reply(video_url, comment_to_reply_to, fact_check_report=None):
  """
  This function generates a reply to a given comment (or pasted comment thread).

  To do so, it:
    - Extracts basic video data (title, channel, comments, captions)
    - Generates a Fact-Check report using Truth Sleuth, unless one is passed in
    - Gathers articles into a themed corpus for context
    - Generates a convincing reply using Trend Bender
  """
  ### --- ### --- ### BASICS + TRUTH SLEUTH AGENT ### --- ### --- ###
  # Extract basics & Fact-Check report
  video_id = extract.video_id(video_url)
  channel, title, _, _, _, _, _ = get_video_details(video_id)
  print(f"Video: {title} from {channel} ... ... ...")

  if fact_check_report is None:
    print("Truth Sleuth generating Fact-Check report ... ... ...")
    _, _, claims, captions, fact_check_report, _ = generate_report(video_url, True, False)
    print("Fact-Check report extracted!!")
  else:
    captions = get_captions(video_url, video_id)

  comments = get_comments(video_url)
  sample_comments = ''
  for i, c in enumerate(comments.comment[:10]):
    sample_comments += f"Comment #{i+1}: {c}\n"
  clear_output()


  # Extract trends & context
  print("Now putting together themed corpus ... ... ...")
  theme = VIDEO_THEMES[video_id]
  context = get_context(theme)
  trends = extract_trend(channel, title, captions, comments, context)
  clear_output()

  # Generate response
  print("Responding to comment ... ... ...")
  reply_to_comment = generate_reply_to_comment_with_feedback(channel, title, captions, theme, sample_comments, context, trends, fact_check_report, comment_to_reply_to)

  clear_output()
  print(f"Comments were generated for {title} from {channel} with the theme: '{theme}'!")
  print(reply_to_comment)

  return reply_to_comment
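`from_start_to_finish` and `from_start_to_finish_reply` both build `sample_comments` inline from the first 10 entries of `comments.comment`. A hypothetical helper could factor that duplication out. The sketch assumes only that the comments can be iterated as strings (a pandas Series, which `comments.comment` appears to be, iterates over its values), and it puts an explicit `: ` between the number and the text so the two do not run together in the prompt.

```python
from itertools import islice
from typing import Iterable


def format_sample_comments(comments: Iterable[str], n: int = 10) -> str:
    """Number the first n comments, one per line, for use in a prompt (hypothetical helper)."""
    lines = [
        f"Comment #{i}: {text}"
        for i, text in enumerate(islice(comments, n), start=1)
    ]
    # Keep a trailing newline, as the original loop does, unless there are no comments
    return "\n".join(lines) + ("\n" if lines else "")


sample = format_sample_comments(["Hold the line gentlemen", "Great video", "Nope"], n=2)
print(sample)
```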

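`from_start_to_finish_reply` forwards the whole pasted thread to the prompt as one string, in the `@handle:` followed by text layout used by `comment_vid` below. If the individual turns were needed (for instance, to log which user is being answered), a hypothetical parser like this one could split them. It assumes each turn begins with a line consisting only of `@handle:`, and that a comment body never contains such a line; inline mentions such as `@SUSleuth what's ...` stay part of the text.

```python
import re

# A turn header is a line consisting solely of "@handle:" (assumed format)
HEADER_RE = re.compile(r"^@([^\s:]+):\s*$")


def parse_thread(thread: str) -> list[tuple[str, str]]:
    """Split a pasted thread into (handle, text) turns (hypothetical helper)."""
    turns: list[tuple[str, str]] = []
    handle = None
    body: list[str] = []
    for line in thread.strip().splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            # Close the previous turn before starting a new one
            if handle is not None:
                turns.append((handle, "\n".join(body).strip()))
            handle, body = match.group(1), []
        elif handle is not None:
            body.append(line)
    if handle is not None:
        turns.append((handle, "\n".join(body).strip()))
    return turns


thread = """
@Venom_Byte:
Hold the line gentlemen

@JoseLopez-eo4ze:
@SUSleuth what's so complex about hypergamy & basic economics?
"""
turns = parse_thread(thread)
print(turns[-1])
```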

In [None]:
from_start_to_finish(VIDEO_URL)

In [None]:
comment_vid = '''
@Venom_Byte:
Hold the line gentlemen

@SUSleuth:
Holding the line on what, exactly? The video's claim that men have "STOPPED Pursuing Modern Women!" ignores the complexities of modern relationships. Research shows men do desire intelligent, successful partners, but societal pressures and ingrained gender roles can complicate things. This video oversimplifies a nuanced issue. Let's examine the broader context of gender roles and relationships.

@MoltenMetalGod7:
@SUSleuth there’s nothing complex about it even if its desired if the ROI isn’t there it’s not worth it. There’s nothing to examine. Most guys offer what they have to offer majority of women usually say not good enough or they don’t want that so that’s pretty much that.

@SUSleuth:
@MoltenMetalGod7  I hear you –  "ROI isn't there, it's not worth it" is a very clear and understandable perspective.  The video frames "success" very narrowly, though.  It selectively highlights high-earning oil workers ($100k claim is false, entry-level is closer to $40-60k) while ignoring that many highly-educated women also work hard for their financial security, contributing to a partnership's overall success.  Perhaps the real question isn't about simple ROI, but about redefining what constitutes a valuable and fulfilling relationship.  Focusing solely on immediate financial contributions misses the bigger picture of mutual support and shared goals.

@JoseLopez-eo4ze:
@SUSleuth what's so complex about hypergamy & basic economics?
'''
reply = from_start_to_finish_reply(VIDEO_URL, comment_vid, fact_check_report)

Comments were generated for Why Men Have STOPPED Pursuing Modern Women! from Rational Male Clips with the theme: 'manosphere'!
@JoseLopez-eo4ze  Hypergamy and economics offer *part* of the picture, but the video simplifies things by focusing on "marrying up" educationally and financially, ignoring emotional connection. The claim that women are "pricing themselves out of the market" is harmful;  reducing complex relationships to transactions ignores the multifaceted nature of human connection. The video also falsely claims entry-level oil worker salaries are $100k+ (fact-check: closer to $40-60k).  A nuanced understanding of gender roles and relationship dynamics reveals a more complex reality. [https://en.wikipedia.org/wiki/Gender_role](https://en.wikipedia.org/wiki/Gender_role) [https://en.wikipedia.org/wiki/Intimate_relationship](https://en.wikipedia.org/wiki/Intimate_relationship) [https://albtriallawyers.com/how-much-do-oil-rig-workers-make/](https://albtriallawyers.com/how-much-do