# Automated Fact-Checking System - Demo

This notebook demonstrates the full fact-checking pipeline:
1. **Triplet Extraction** - Extract (subject, predicate, object) from claims
2. **Entity Linking** - Map entities to DBpedia URIs
3. **Knowledge Base Query** - Verify claims against DBpedia
4. **Neural Classification** - BERT-based verdict prediction
5. **Final Verdict** - SUPPORTED / REFUTED / NOT ENOUGH INFO

In [None]:
import sys
# Add the parent directory to the import path so the `src` package can be found
sys.path.insert(0, '..')

import logging
logging.basicConfig(level=logging.WARNING)

from src.triplet_extractor import TripletExtractor
from src.entity_linker import EntityLinker
from src.knowledge_query import KnowledgeQuery
from src.fact_checker import FactChecker, format_result

print('All modules loaded successfully!')

## 1. Triplet Extraction

We use spaCy dependency parsing to extract (subject, predicate, object) triplets from English sentences.
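
To make the idea concrete before running the real extractor, here is a minimal sketch of dependency-based extraction. It does not call spaCy and is not the `TripletExtractor` implementation: the `Token` class, the `extract_triplet` helper, and the hand-annotated parse are all hypothetical, with labels written in the style of spaCy's dependency tags (a real parser's output may differ).

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label, spaCy-style (hand-annotated here)
    head: int  # index of the head token; the ROOT points to itself


def extract_triplet(tokens):
    """Return (subject, predicate, object) from a toy parse, or None."""
    root = next(i for i, t in enumerate(tokens) if t.dep == "ROOT")

    def children(i, deps):
        # Indices of tokens attached to token i with one of the given labels
        return [j for j, t in enumerate(tokens)
                if t.head == i and j != i and t.dep in deps]

    def phrase(i):
        # The token plus its compound modifiers, in sentence order
        idxs = sorted(children(i, {"compound"}) + [i])
        return " ".join(tokens[j].text for j in idxs)

    subjects = children(root, {"nsubj", "nsubjpass"})
    if not subjects:
        return None
    subject = phrase(subjects[0])
    predicate = tokens[root].text

    # Case 1: the verb has a direct object or attribute
    objects = children(root, {"dobj", "attr"})
    if objects:
        return (subject, predicate, phrase(objects[0]))

    # Case 2: the object hangs off a preposition ("born in Hawaii")
    for prep in children(root, {"prep"}):
        pobjs = children(prep, {"pobj"})
        if pobjs:
            return (subject, f"{predicate} {tokens[prep].text}", phrase(pobjs[0]))
    return None


# Hand-annotated parse of "Barack Obama was born in Hawaii" (an assumption)
example = [
    Token("Barack", "compound", 1),
    Token("Obama", "nsubjpass", 3),
    Token("was", "auxpass", 3),
    Token("born", "ROOT", 3),
    Token("in", "prep", 3),
    Token("Hawaii", "pobj", 4),
]
print(extract_triplet(example))
```

The sketch handles only two object patterns; the extractor used below may cover more constructions and return several triplets per sentence.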

In [None]:
extractor = TripletExtractor()

sentences = [
    "Paris is the capital of France",
    "Barack Obama was born in Hawaii",
    "Albert Einstein developed the theory of relativity",
    "The Eiffel Tower is located in Paris",
    "Tokyo is the capital of Japan",
]

for sent in sentences:
    triplets = extractor.extract(sent)
    print(f'\n"{sent}"')
    for s, p, o in triplets:
        print(f'  Subject: {s}')
        print(f'  Predicate: {p}')
        print(f'  Object: {o}')

## 2. Entity Linking

Map extracted entities to their DBpedia URIs using the DBpedia Lookup API.
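
As a rough illustration of what a lookup request might look like, the sketch below only builds the request URL and makes no network call. The endpoint and the `query`, `format`, and `maxResults` parameter names are assumptions based on the public DBpedia Lookup service; `build_lookup_url` is a hypothetical helper, and `EntityLinker` may work differently.

```python
from urllib.parse import urlencode

# Assumed public DBpedia Lookup endpoint (not taken from this project)
LOOKUP_ENDPOINT = "https://lookup.dbpedia.org/api/search"


def build_lookup_url(label, max_results=1):
    """Build a Lookup search URL for an entity label (hypothetical helper)."""
    params = {"query": label, "format": "JSON", "maxResults": max_results}
    # urlencode escapes spaces and other special characters in the label
    return f"{LOOKUP_ENDPOINT}?{urlencode(params)}"


lookup_url = build_lookup_url("Barack Obama")
print(lookup_url)
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect and to test offline.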

In [None]:
linker = EntityLinker()

entities = ["Paris", "France", "Barack Obama", "Hawaii", "Eiffel Tower", "Albert Einstein", "Tokyo", "Japan"]

for entity in entities:
    uri = linker.link(entity)
    print(f'{entity:20s} -> {uri}')

## 3. Knowledge Base Query

Verify relations between entities using DBpedia SPARQL and JSON endpoints.
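
One way such a check could be phrased in SPARQL is to ask for any predicate that links the two resources in either direction. The sketch below only builds the query string and request URL; `build_relation_query` is a hypothetical helper, the `format` parameter is an assumption about the public endpoint, and the query `KnowledgeQuery` actually sends may differ.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_relation_query(subj_uri, obj_uri, limit=10):
    """Return a SPARQL query for predicates linking two URIs in either direction."""
    return (
        "SELECT DISTINCT ?p WHERE { "
        f"{{ <{subj_uri}> ?p <{obj_uri}> }} UNION "
        f"{{ <{obj_uri}> ?p <{subj_uri}> }} "
        f"}} LIMIT {limit}"
    )


query = build_relation_query(
    "http://dbpedia.org/resource/Paris",
    "http://dbpedia.org/resource/France",
)
# Assumed request shape: the query is passed as a URL parameter
request_url = f"{SPARQL_ENDPOINT}?" + urlencode(
    {"query": query, "format": "application/sparql-results+json"}
)
print(query)
```

Checking both directions matters because a relation such as "capital" is typically stored on the country rather than on the city.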

In [None]:
kq = KnowledgeQuery()

pairs = [
    ("http://dbpedia.org/resource/Paris", "http://dbpedia.org/resource/France"),
    ("http://dbpedia.org/resource/Barack_Obama", "http://dbpedia.org/resource/Hawaii"),
    ("http://dbpedia.org/resource/Eiffel_Tower", "http://dbpedia.org/resource/Paris"),
    ("http://dbpedia.org/resource/Tokyo", "http://dbpedia.org/resource/Japan"),
]

for subj, obj in pairs:
    result = kq.verify_triplet(subj, obj)
    subj_name = subj.split('/')[-1].replace('_', ' ')
    obj_name = obj.split('/')[-1].replace('_', ' ')
    print(f'\n{subj_name} <-> {obj_name}')
    print(f'  Found: {result["found"]} (via {result["method"]})')
    for p in result['predicates'][:3]:
        print(f'  Predicate: {p.split("/")[-1]}')

## 4. Full Pipeline - Fact Checking

Run the complete pipeline on 10 example claims: five true, three false, and two ambiguous.

In [None]:
# Load the full pipeline (with neural model if available, otherwise KB-only)
import os
model_path = '../models/fact_checker'
use_neural = os.path.exists(model_path)
checker = FactChecker(model_path=model_path if use_neural else None, use_neural=use_neural)
print(f'Pipeline loaded (neural model: {"enabled" if use_neural else "disabled - KB only"})')

In [None]:
claims = [
    # True claims
    "Paris is the capital of France",
    "Barack Obama was born in Hawaii",
    "The Eiffel Tower is located in Paris",
    "Albert Einstein developed the theory of relativity",
    "Tokyo is the capital of Japan",
    # False claims
    "The Earth is flat",
    "Napoleon was born in England",
    "Mars is the largest planet in the solar system",
    # Ambiguous claims
    "Chocolate causes acne",
    "Dogs can sense earthquakes before they happen",
]

expected = [
    "SUPPORTED", "SUPPORTED", "SUPPORTED", "SUPPORTED", "SUPPORTED",
    "REFUTED", "REFUTED", "REFUTED",
    "NOT ENOUGH INFO", "NOT ENOUGH INFO",
]

results = []
for claim in claims:
    result = checker.check(claim)
    results.append(result)
    print('=' * 60)
    print(format_result(result))
    print()

## 5. Metrics

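Compare the pipeline's predicted verdicts with the expected ones, claim by claim and in aggregate. With only 10 hand-picked claims, read these numbers as a smoke test rather than a benchmark.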
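
For reference, accuracy and the per-class precision, recall, and F1 that scikit-learn reports can be computed by hand as follows. The labels in this sketch are made-up toy data, not output from the pipeline, and `per_class_scores` is a hypothetical helper.

```python
def per_class_scores(expected, predicted, label):
    """Return (precision, recall, f1) for one label, using 0.0 when undefined."""
    pairs = list(zip(expected, predicted))
    tp = sum(e == label and p == label for e, p in pairs)  # true positives
    fp = sum(e != label and p == label for e, p in pairs)  # false positives
    fn = sum(e == label and p != label for e, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels for illustration only
toy_expected = ["SUPPORTED", "SUPPORTED", "REFUTED", "NOT ENOUGH INFO"]
toy_predicted = ["SUPPORTED", "REFUTED", "REFUTED", "NOT ENOUGH INFO"]

toy_accuracy = sum(e == p for e, p in zip(toy_expected, toy_predicted)) / len(toy_expected)
print(f"accuracy={toy_accuracy:.2f}")
for label in ["SUPPORTED", "REFUTED", "NOT ENOUGH INFO"]:
    p, r, f = per_class_scores(toy_expected, toy_predicted, label)
    print(f"{label:18s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Returning 0.0 when a denominator is zero mirrors the `zero_division=0` setting used in the cell below.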

In [None]:
from sklearn.metrics import accuracy_score, classification_report

predicted = [r['verdict'] for r in results]

print('Claim-by-claim results:')
print(f'{"Claim":50s} {"Expected":18s} {"Predicted":18s} {"Match"}')
print('-' * 100)
for claim, exp, pred in zip(claims, expected, predicted):
    match = 'OK' if exp == pred else 'MISS'
    print(f'{claim:50s} {exp:18s} {pred:18s} {match}')

# Overall metrics
labels = ['SUPPORTED', 'REFUTED', 'NOT ENOUGH INFO']
acc = accuracy_score(expected, predicted)
print(f'\n{"=" * 50}')
print(f'Accuracy: {acc:.2%}')
print('\nClassification Report:')
print(classification_report(expected, predicted, labels=labels, zero_division=0))

In [None]:
# Confidence distribution
print('\nConfidence distribution by verdict:')
for verdict in labels:
    confs = [r['confidence'] for r in results if r['verdict'] == verdict]
    if confs:
        avg_conf = sum(confs) / len(confs)
        print(f'  {verdict:18s}: avg={avg_conf:.3f}, min={min(confs):.3f}, max={max(confs):.3f} (n={len(confs)})')

## 6. Try Your Own Claim

Replace the string assigned to `custom_claim` with any English claim, then rerun the cell to fact-check it.

In [None]:
custom_claim = "London is the capital of the United Kingdom"
result = checker.check(custom_claim)
print(format_result(result))