# Building a VeraCT Scan-Like Fake News Detection System with RAG

We'll structure the system to ensure accurate, real-time fact verification by retrieving and analyzing credible sources. Here's the roadmap:

**1️⃣ Understanding the Core System Components**

A VeraCT Scan-like system has three major steps:

- Fact Extraction – Identify key claims in the news article.
- Information Retrieval – Search for corroborating/conflicting sources (using RAG).
- Fact Verification & Reasoning – Compare retrieved evidence and determine credibility.

**2️⃣ System Architecture Breakdown**

Your system will have the following key modules:


- `News Ingestion` --> Users submit news articles for fact-checking
- `Fact Extraction` --> Extract key claims using NLP
- `Retrieval-Augmented Generation (RAG)` --> Search the web for supporting/opposing evidence
- `Source Credibility Analysis` --> Assess trustworthiness of retrieved sources
- `Evidence Aggregation & Decision` --> Compare claims with retrieved evidence to classify as TRUE, FALSE, or MIXED
- `User Interface (Web App)` --> Show results visually, provide reasoning
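
The `Evidence Aggregation & Decision` module can be sketched in a few lines. This is only an illustration under stated assumptions: the `Evidence` record, the stance labels, the credibility weights, the `margin` threshold, and the extra `UNVERIFIED` outcome are all hypothetical and not part of the original design.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    url: str
    stance: str         # "supports", "refutes", or "neutral" (hypothetical label set)
    credibility: float  # 0.0 to 1.0, hypothetical output of source credibility analysis


def aggregate_verdict(evidence, margin=0.5):
    """Combine stance-labelled evidence into TRUE, FALSE, MIXED, or UNVERIFIED."""
    support = sum(e.credibility for e in evidence if e.stance == "supports")
    refute = sum(e.credibility for e in evidence if e.stance == "refutes")
    if support == 0 and refute == 0:
        return "UNVERIFIED"  # assumption: no usable evidence is its own outcome
    if support - refute >= margin:
        return "TRUE"
    if refute - support >= margin:
        return "FALSE"
    return "MIXED"


evidence = [
    Evidence("https://example.com/a", "supports", 0.9),
    Evidence("https://example.com/b", "refutes", 0.2),
]
print(aggregate_verdict(evidence))
```

Weighting each source by credibility lets one trusted outlet outweigh several weak ones; the `margin` value would need tuning on real data.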


**Step 1: Extracting Key Claims from News Articles**

**Goal**

We need to extract the key claims (who, what, where, when) from a news article using Named Entity Recognition (NER), keyword extraction, and summarization.

**Tools We'll Use:**

- ✅ spaCy (for NER)
- ✅ KeyBERT (for keyword extraction)
- ✅ TextRank (for summarization)

In [3]:
import spacy
from keybert import KeyBERT
from summa import summarizer

# Load spaCy model for Named Entity Recognition
nlp = spacy.load("en_core_web_sm")

# Sample news article
news_article = """Elon Musk announced that Tesla will build a new factory in Nigeria,
creating thousands of jobs. The Nigerian government confirmed this deal."""


# 1️⃣ Extract Named Entities
doc = nlp(news_article)
entities = [(ent.text, ent.label_) for ent in doc.ents]
print("Named Entities:", entities)

# 2️⃣ Extract Keywords using KeyBERT
kw_model = KeyBERT()
keywords = kw_model.extract_keywords(news_article, keyphrase_ngram_range=(1,2), stop_words='english')
print("keywords:", keywords)

# 3️⃣ Summarize the article using TextRank
summary = summarizer.summarize(news_article, ratio=0.5)
print("Summary:", summary)


Named Entities: [('Elon Musk', 'PERSON'), ('Tesla', 'ORG'), ('Nigeria', 'GPE'), ('thousands', 'CARDINAL'), ('Nigerian', 'NORP')]
keywords: [('tesla build', 0.6023), ('factory nigeria', 0.5619), ('announced tesla', 0.4734), ('tesla', 0.4663), ('nigeria creating', 0.4631)]
Summary: Elon Musk announced that Tesla will build a new factory in Nigeria,
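
The entity tuples printed above can be grouped into the who/where/when slots named in the goal. The following is a minimal sketch, assuming a hypothetical `LABEL_TO_SLOT` mapping from spaCy labels to slots; the mapping and the function name are illustrative choices, not part of the original notebook.

```python
from collections import defaultdict

# Hypothetical mapping from spaCy entity labels to claim slots
LABEL_TO_SLOT = {
    "PERSON": "who",
    "ORG": "who",
    "GPE": "where",
    "LOC": "where",
    "DATE": "when",
    "TIME": "when",
}


def entities_to_slots(entities):
    """Group (text, label) pairs into claim slots, skipping unmapped labels."""
    slots = defaultdict(list)
    for text, label in entities:
        slot = LABEL_TO_SLOT.get(label)
        if slot and text not in slots[slot]:
            slots[slot].append(text)
    return dict(slots)


# Entity list copied from the NER output shown above
entities = [("Elon Musk", "PERSON"), ("Tesla", "ORG"), ("Nigeria", "GPE"),
            ("thousands", "CARDINAL"), ("Nigerian", "NORP")]
print(entities_to_slots(entities))
```

The "what" slot is left to the keyword and summarization steps, since NER labels alone do not capture the action being claimed.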


In [5]:
from transformers import pipeline

# Load a summarization model to condense the article into its key claims.
# Named `claim_summarizer` so it does not shadow the `summarizer` module imported from summa earlier.
claim_summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

news_article = """Elon Musk announced that Tesla will build a new factory in Nigeria, 
                  creating thousands of jobs. The Nigerian government confirmed this deal."""

summary = claim_summarizer(news_article, max_length=30, min_length=15, do_sample=False)
print(summary)


Device set to use cpu


[{'summary_text': 'Elon Musk announced that Tesla will build a new factory in Nigeria, creating thousands of jobs. The Nigerian government confirmed this deal.'}]


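On a longer article, this step condenses the text into its key claims. The sample here is only two sentences long, so the model returns it essentially unchanged.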

**🔹 Step 2: Searching the Web for Supporting Evidence (RAG)**

We use search APIs to find supporting or conflicting news reports.

📌 Search APIs you can use (free and paid):
- Brave Search API (Privacy-focused, free)
- DuckDuckGo API (Free, but limited)
- Google Search API (Paid but powerful)
- Bing Search API (Better for enterprise use)

In [4]:
!pip install langchain-community




In [None]:
from langchain_community.tools import BraveSearch

# Replace this with your actual Brave API key
api_key = "BRAVESEARCH_API_TOKEN"

# Initialize BraveSearch correctly
search = BraveSearch.from_api_key(api_key=api_key, search_kwargs={"count": 3})

query = "Tesla building new factory in Nigeria"

# Perform the search
results = search.run(query)

# Print the search results
print(results)


[{"title": "List of Tesla factories - Wikipedia", "link": "https://en.wikipedia.org/wiki/List_of_Tesla_factories", "snippet": "<strong>Tesla</strong>, Inc. operates plants worldwide for the manufacture of their products, including electric vehicles, lithium-ion batteries, solar shingles, chargers, automobile parts, manufacturing equipment and tools for its own <strong>factories</strong>, as well as a lithium ore refinery. Maxwell continued to operate as subsidiary until 2021. Due to the short holding time and no known products produced under Tesla, their production facilities are not listed above. ^ The eleventh character of the vehicle identification number (VIN) indicates the factory the car has been built in."}, {"title": "Tesla Gigafactory: Locations, Cost, Future, Electricity - Business Insider", "link": "https://www.businessinsider.com/tesla-gigafactory", "snippet": "Here&#x27;s a look at <strong>Tesla</strong>&#x27;s Gigafactory network and the expansion plans for the facilities
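
The raw results above contain HTML tags (`<strong>`) and escaped entities (`&#x27;`), which get in the way of later text matching. Below is a small sketch for cleaning them, assuming `search.run` returns a JSON string of objects with `title`, `link`, and `snippet` keys as the output above suggests; `clean_results` is a hypothetical helper, and the sample input is abbreviated from that output.

```python
import html
import json
import re

TAG_RE = re.compile(r"<[^>]+>")


def clean_results(raw_json):
    """Parse a JSON string of search results and return plain-text snippets."""
    cleaned = []
    for item in json.loads(raw_json):
        # Strip tags first, then decode entities such as &#x27;
        snippet = html.unescape(TAG_RE.sub("", item.get("snippet", "")))
        cleaned.append({
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": snippet,
        })
    return cleaned


# Abbreviated sample in the same shape as the search output
sample = json.dumps([
    {"title": "List of Tesla factories - Wikipedia",
     "link": "https://en.wikipedia.org/wiki/List_of_Tesla_factories",
     "snippet": "<strong>Tesla</strong>, Inc. operates plants worldwide"},
    {"title": "Tesla Gigafactory: Locations, Cost, Future, Electricity - Business Insider",
     "link": "https://www.businessinsider.com/tesla-gigafactory",
     "snippet": "Here&#x27;s a look at <strong>Tesla</strong>&#x27;s Gigafactory network"},
])
for row in clean_results(sample):
    print(row["snippet"])
```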

**🚨 Verdict: The News About Tesla in Nigeria Is Likely Fake**

The search results contain no credible source reporting that Tesla is building a factory in Nigeria. Instead, the returned sources (Wikipedia, Business Insider, and Tesla’s official site) describe Tesla's existing and planned factory locations, which include:

✅ Existing Gigafactories:

- Fremont, California
- Sparks, Nevada
- Berlin, Germany
- Shanghai, China
- Austin, Texas
- Buffalo, New York

✅ Planned Factory:

- Mexico (Monterrey)

❌ No mention of Nigeria or any African country.
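
The "no mention of Nigeria" check can be automated with a simple co-occurrence test over titles and snippets. This is a rough sketch with hypothetical helper names (`mentions_all`, `corroborating_results`) and abbreviated sample results; a missing mention is only a signal for further checking, not proof that a claim is false.

```python
def mentions_all(text, terms):
    """True if every term appears in the text (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def corroborating_results(results, terms):
    """Keep only results whose title or snippet mentions all key terms."""
    return [
        r for r in results
        if mentions_all(r.get("title", "") + " " + r.get("snippet", ""), terms)
    ]


# Abbreviated titles and snippets from the search output
results = [
    {"title": "List of Tesla factories - Wikipedia",
     "snippet": "Tesla, Inc. operates plants worldwide for the manufacture of their products"},
    {"title": "Tesla Gigafactory: Locations, Cost, Future, Electricity - Business Insider",
     "snippet": "Here's a look at Tesla's Gigafactory network"},
]
hits = corroborating_results(results, ["Tesla", "Nigeria"])
print(len(hits))
```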

**🔍 Next Step: Fake News Detection**

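Now, let's try running a classifier on the text. You already have NER, keyword extraction, and summarization. Note that the model loaded below is a sentiment classifier fine-tuned on SST-2, so its `POSITIVE`/`NEGATIVE` label describes the tone of the text, not whether the news is fake: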

In [8]:
from transformers import pipeline

# Load a sentiment analysis model
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

news_text = "Elon Musk announced that Tesla will build a new factory in Nigeria, creating thousands of jobs. The Nigerian government confirmed this deal."

result = classifier(news_text)
print(result)


Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9983853101730347}]


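**Using a Hugging Face Token: Load the Model with Your API Token**

Now, modify your code to authenticate with Hugging Face and load an NLI model for zero-shot classification. The cell below uses `facebook/bart-large-mnli` as an alternative to DeBERTa-v3-MNLI: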

In [None]:
import os
from transformers import pipeline
from huggingface_hub import login
from dotenv import load_dotenv

load_dotenv()  # loads from .env file
login(token=os.getenv("HUGGINGFACEHUB_API_TOKEN"))  # token comes from the environment, not from the source code


# Use an alternative model
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

news_text = "Elon Musk announced that Tesla will build a new factory in Nigeria, creating thousands of jobs. The Nigerian government confirmed this deal."

# Define candidate labels
labels = ["real news", "fake news"]

# Perform classification
result = classifier(news_text, candidate_labels=labels)
print(result)



Device set to use cpu


{'sequence': 'Elon Musk announced that Tesla will build a new factory in Nigeria, creating thousands of jobs. The Nigerian government confirmed this deal.', 'labels': ['real news', 'fake news'], 'scores': [0.990280032157898, 0.009720000438392162]}
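
The zero-shot model scores the text alone and rates the Tesla claim as "real news" with a score of about 0.99, while the earlier search step found no corroborating source. One way to reconcile the two signals is sketched below; the `min_score` threshold and the rule that missing retrieval evidence overrides the model are hypothetical choices for illustration.

```python
def top_label(result, min_score=0.8):
    """Return the highest-scoring label, or 'uncertain' below min_score."""
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return label if score >= min_score else "uncertain"


def combine(model_label, corroborating_hits):
    """Hypothetical rule: without corroborating sources, do not trust the text-only label."""
    if corroborating_hits == 0:
        return "unverified"
    return model_label


# Labels and scores copied from the zero-shot output
result = {
    "labels": ["real news", "fake news"],
    "scores": [0.990280032157898, 0.009720000438392162],
}
label = top_label(result)
print(label, "->", combine(label, corroborating_hits=0))
```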


In [12]:
!pip install openai tiktoken




In [None]:
import requests
import spacy

# Load NLP model for named entity recognition (NER)
nlp = spacy.load("en_core_web_sm")

# News article to verify
news_text = """ Nigerian governors’ forum breaks silence on Fubara’s suspension

Tolulope Popoola

March 22, 2025
Nigerian Governors’ Forum
Share

The Nigeria Governors’ Forum (NGF) has defended its decision to remain silent on recent political events, emphasizing its role as a neutral policy body.

In a statement issued on Saturday, titled “NGF Clarifies Silence on Political Matters,” the forum explained that taking positions on partisan issues could divide its members, who belong to different political platforms.

The clarification follows President Bola Tinubu’s recent declaration of a state of emergency in Rivers State, which led to the six-month suspension of Governor Siminalayi Fubara, his deputy, and elected members of the State House of Assembly.

NGF’s Position on Political Development

Although the NGF did not directly reference the situation in Rivers, the statement signed by Dr Abdulateef Shittu, its Director General, stated that the forum is focused on governance and policy matters rather than partisan conflicts.

“The Nigeria Governors’ Forum (NGF) has received media inquiries requesting it to comment on some recent political developments in the country,” the statement read.

“The Forum wishes to clarify that it is an umbrella body for subnational governments, aimed at promoting unified policy positions and collaborating with relevant stakeholders in pursuit of sustainable socioeconomic growth and the well-being of the people.

“As a technical and policy hub comprising governors elected on different platforms, the body elects to steer clear of taking positions that may alienate members with varying political interests.”

Shittu noted that past political divisions within the forum have threatened its unity, making it essential to avoid controversial political stances.

However, he assured Nigerians that the NGF remains committed to governance issues affecting economic growth and public welfare.

“Regardless, the Forum is known for its bold positions on governance and general policy matters of profound consequence, such as wages, taxes, education, and universal healthcare, among others,” Shittu stated.

The NGF called for media and public understanding, expressing confidence that existing political institutions and crisis resolution mechanisms would address partisan disputes. 

"""

# Extract named entities
doc = nlp(news_text)
entities = " ".join([ent.text for ent in doc.ents if ent.label_ in ["ORG", "GPE", "PERSON"]])

# Reliable news sources
reliable_sources = [
    "punchng.com", "vanguardngr.com", "guardian.ng", "premiumtimesng.com",
    "dailypost.ng", "thenationonlineng.net", "saharareporters.com", "channelstv.com",
    "naijanews.com", "tribuneonlineng.com", "pmnewsnigeria.com", "sunnewsonline.com",
    "leadership.ng", "dailytrust.com", "thisdaylive.com", "businessday.ng",
    "independent.ng", "blueprint.ng", "thecable.ng", "nigerianbulletin.com",
    "today.ng", "newtelegraphng.com", "tvcnews.tv", "nigerianeye.com",
    "thestreetjournal.org", "pulseng.com", "newsdirect.ng", "nigerianmonitor.com",
    "insidebusiness.ng", "businesspost.ng", "economicconfidential.com", "financialwatchngr.com",
    "legit.ng", "newsdiaryonline.com", "nationalaccordnewspaper.com", "brandspurng.com",
    "techcabal.com", "nigerianstoday.com", "plustvafrica.com", "thewhistler.ng",
    "metrobusinessnews.com", "concisenews.global", "leadershipnigeria.com",
    "realnewsmagazine.net", "thebossnewspapers.com", "orientdailynews.com",
    "theeagleonline.com.ng", "nigeriatoday.ng", "aljazirahnews.com", "orderpaper.ng",
    "brandpowerng.com", "newsherald.com.ng", "theabujatimes.com", "verbatimnews.com.ng",
    "nigerianews.net", "politicsnigeria.com", "thestatesman.com.ng", "theyesng.com",
    "notablenigeria.com", "thepointng.com", "forefrontng.com", "elanzanews.ng",
    "nationaldailyng.com", "ecomarketafrica.com", "technext.ng", "newsverge.com",
    "sciencenigeria.com", "nigeriahealthwatch.com", "nigerianinsidernews.com",
    "alreporter.com", "thestandard.ng", "thespellng.com", "newsarena.com.ng",
    "sahelstandard.com", "nigeriannewspapers.today", "sunrisenews.com.ng",
    "businesselitesafrica.com", "thedefenderngr.com", "ikengaonline.com",
    "energytimesng.com", "thealvinreport.com", "newnigeriannewspaper.com",
    "technopreneur.com.ng", "thetrumpet.ng", "theanalyst.com.ng",
    "realnewsnigeria.com", "thenewsnigeria.com.ng", "nigerianpilot.com",
    "newsbeam.com.ng", "edutorial.ng", "mediacouncilnigeria.com",
    "viewpointnigeria.com", "technotren.com", "huhuonline.com",
    "nigeriatopnews.com", "thedaily-ng.com", "theimpactnewspaper.com",
    "mynewswatchtimesng.com", "nairametrics.com"
]


# Search API credentials
search_api_key = "SEARCH_API_KEY"
search_api_url = "https://www.searchapi.io/api/v1/search"
# Pass the query via `params` so requests URL-encodes spaces and special characters
search_params = {"q": entities, "engine": "google_news"}
search_headers = {"Authorization": f"Bearer {search_api_key}"}

# Perform search request
response_search = requests.get(search_api_url, params=search_params, headers=search_headers)
search_results = response_search.json()

# Check for reliability
fake_news = True
if "organic_results" in search_results:
    for result in search_results.get("organic_results", []):
        if any(source in result.get("link", "") for source in reliable_sources):
            fake_news = False
            break

# Tavily API credentials
tavily_api_key = "TAVILY_API_KEY"
tavily_url = "https://api.tavily.com/search"
tavily_payload = {"api_key": tavily_api_key, "query": entities, "search_depth": "basic", "max_results": 5}

# Perform Tavily search request
response_tavily = requests.post(tavily_url, json=tavily_payload)
tavily_results = response_tavily.json()

# Validate Tavily results
if "results" in tavily_results:
    for result in tavily_results.get("results", []):
        if any(source in result.get("url", "") for source in reliable_sources):
            fake_news = False
            break

# Final verdict
print("✅ Verified News: Found in reliable sources" if not fake_news else "⚠️ Likely Fake News: No reliable sources found")


✅ Verified News: Found in reliable sources
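
Joining every extracted entity produces a long query with many repeats (for example, "NGF" appears several times in the article). A shorter, de-duplicated query may retrieve more relevant results. The sketch below assumes a hypothetical `build_query` helper and an arbitrary cap on the number of terms.

```python
def build_query(entity_texts, max_terms=8):
    """Join unique entity strings (case-insensitive) into a capped search query."""
    seen = set()
    terms = []
    for text in entity_texts:
        if len(terms) >= max_terms:
            break
        normalized = " ".join(text.split())  # collapse newlines and extra spaces
        key = normalized.lower()
        if normalized and key not in seen:
            seen.add(key)
            terms.append(normalized)
    return " ".join(terms)


print(build_query(["NGF", "Rivers State", "NGF", "Bola Tinubu", "rivers state"], max_terms=3))
```

In the cell above, this could replace the plain `" ".join(...)` by passing it the filtered entity texts.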


In [1]:
import os
import requests
import spacy
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Load NLP model for named entity recognition (NER)
nlp = spacy.load("en_core_web_sm")

# News article to verify
news_text = """ Nigerian governors’ forum breaks silence on Fubara’s suspension

Tolulope Popoola

March 22, 2025
Nigerian Governors’ Forum
Share

The Nigeria Governors’ Forum (NGF) has defended its decision to remain silent on recent political events, emphasizing its role as a neutral policy body.

In a statement issued on Saturday, titled “NGF Clarifies Silence on Political Matters,” the forum explained that taking positions on partisan issues could divide its members, who belong to different political platforms.

The clarification follows President Bola Tinubu’s recent declaration of a state of emergency in Rivers State, which led to the six-month suspension of Governor Siminalayi Fubara, his deputy, and elected members of the State House of Assembly.

NGF’s Position on Political Development

Although the NGF did not directly reference the situation in Rivers, the statement signed by Dr Abdulateef Shittu, its Director General, stated that the forum is focused on governance and policy matters rather than partisan conflicts.

“The Nigeria Governors’ Forum (NGF) has received media inquiries requesting it to comment on some recent political developments in the country,” the statement read.

“The Forum wishes to clarify that it is an umbrella body for subnational governments, aimed at promoting unified policy positions and collaborating with relevant stakeholders in pursuit of sustainable socioeconomic growth and the well-being of the people.

“As a technical and policy hub comprising governors elected on different platforms, the body elects to steer clear of taking positions that may alienate members with varying political interests.”

Shittu noted that past political divisions within the forum have threatened its unity, making it essential to avoid controversial political stances.

However, he assured Nigerians that the NGF remains committed to governance issues affecting economic growth and public welfare.

“Regardless, the Forum is known for its bold positions on governance and general policy matters of profound consequence, such as wages, taxes, education, and universal healthcare, among others,” Shittu stated.

The NGF called for media and public understanding, expressing confidence that existing political institutions and crisis resolution mechanisms would address partisan disputes. 
"""

# Extract named entities
doc = nlp(news_text)
entities = " ".join([ent.text for ent in doc.ents if ent.label_ in ["ORG", "GPE", "PERSON"]])

# Reliable news sources
reliable_sources = [
    "punchng.com", "vanguardngr.com", "guardian.ng", "premiumtimesng.com",
    "dailypost.ng", "thenationonlineng.net", "saharareporters.com", "channelstv.com",
    "naijanews.com", "tribuneonlineng.com", "pmnewsnigeria.com", "sunnewsonline.com",
    "leadership.ng", "dailytrust.com", "thisdaylive.com", "businessday.ng",
    "independent.ng", "thecable.ng", "nigerianbulletin.com", "today.ng",
    "newtelegraphng.com", "tvcnews.tv", "nigerianeye.com", "thestreetjournal.org",
    "pulseng.com", "newsdirect.ng", "nigerianmonitor.com", "insidebusiness.ng",
    "businesspost.ng", "economicconfidential.com", "financialwatchngr.com",
    "legit.ng", "newsdiaryonline.com", "nationalaccordnewspaper.com", "brandspurng.com",
    "techcabal.com", "nigerianstoday.com", "plustvafrica.com", "thewhistler.ng"
]

# Load API keys from environment variables
search_api_key = os.getenv("SEARCHAPI_KEY")
tavily_api_key = os.getenv("TAVILY_API_KEY")

if not search_api_key or not tavily_api_key:
    raise ValueError("Missing API keys. Please set SEARCHAPI_KEY and TAVILY_API_KEY in your .env file.")

# SearchAPI.io request
search_api_url = "https://www.searchapi.io/api/v1/search"
search_params = {
    "q": entities,
    "engine": "google_news",
    "api_key": search_api_key
}

try:
    response_search = requests.get(search_api_url, params=search_params, timeout=10)
    response_search.raise_for_status()
    search_results = response_search.json()
except requests.RequestException as e:
    print(f"⚠️ Error fetching SearchAPI.io results: {e}")
    search_results = {}

# Tavily API request
tavily_url = "https://api.tavily.com/search"
tavily_payload = {
    "api_key": tavily_api_key,
    "query": entities,
    "search_depth": "basic",
    "max_results": 5
}

try:
    response_tavily = requests.post(tavily_url, json=tavily_payload, timeout=10)
    response_tavily.raise_for_status()
    tavily_results = response_tavily.json()
except requests.RequestException as e:
    print(f"⚠️ Error fetching Tavily results: {e}")
    tavily_results = {}

# Function to check reliability
def is_reliable(results, key):
    if key in results:
        for result in results.get(key, []):
            url = result.get("link") or result.get("url", "")
            if any(source in url for source in reliable_sources):
                return True
    return False

# Determine if the news is fake
is_fake_news = not (is_reliable(search_results, "organic_results") or is_reliable(tavily_results, "results"))

# Final verdict
print("✅ Verified News: Found in reliable sources" if not is_fake_news else "⚠️ Likely Fake News: No reliable sources found")


✅ Verified News: Found in reliable sources
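
The reliability check above uses substring matching (`source in url`), which can produce false positives: `"today.ng"` is a substring of `nigeriatoday.ng`, and a domain name can also appear in a query string. A stricter variant compares the URL's hostname instead. This is a sketch with a hypothetical `is_from_source` helper, not a drop-in from any library.

```python
from urllib.parse import urlparse


def is_from_source(url, sources):
    """True if the URL's hostname equals a listed domain or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == source or host.endswith("." + source) for source in sources)


sources = ["today.ng", "punchng.com"]
print(is_from_source("https://www.punchng.com/story", sources))            # subdomain of a listed domain
print(is_from_source("https://nigeriatoday.ng/story", sources))            # different domain that merely contains "today.ng"
print(is_from_source("https://example.com/?ref=punchng.com", sources))     # listed domain only in the query string
```

Inside `is_reliable`, the `any(source in url ...)` test could be swapped for `is_from_source(url, reliable_sources)`.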
