## Overall principle of the project

The primary objective of the project is to automate the process of fact-checking by implementing a structured approach. This involves first extracting fact statements from natural language text in the form of triples consisting of a Subject, Relation, and Object. The next step is to match the extracted relation to predefined properties, such as associating "is the capital of" with a property like dbpedia-owl:capital. Once the relation is identified, the system looks up the subject and object in DBpedia by converting their names into URIs—for example, resolving "Paris" to http://dbpedia.org/resource/Paris. Using these URIs, a SPARQL query is executed against the DBpedia knowledge base to verify whether the Subject–Relation–Object triple exists in the data. If the triple is found, the statement is deemed verified. In cases where a direct match cannot be established, the system may employ indirect methods or corrections, such as determining if the subject's "actual birthplace" aligns with the stated location.

Glossary of the key terms used in this project:

- URI (Uniform Resource Identifier)
A URI is a string used to uniquely identify a resource on the internet. In DBpedia, every entity (for example, “Barack Obama” or “Paris”) is assigned a unique URI, such as http://dbpedia.org/resource/Barack_Obama. This URI allows different datasets and applications to talk about the same concept without ambiguity.

- DBpedia
DBpedia is a large, open knowledge base built from structured information extracted from Wikipedia. It provides data in the form of RDF triples and makes it possible to query this information using SPARQL. In practice, it is an online database describing millions of entities such as people, places, and organizations.

- Triplet 
In our project, a triplet is a statement with three parts: Subject, Predicate (or Relation), Object. For example:
Subject: “Barack Obama”
Predicate/Relation: “was born in”
Object: “Hawaii”

- SPARQL
SPARQL is a query language designed for querying and manipulating RDF (Resource Description Framework) data. It is used to retrieve or update information stored in a knowledge base like DBpedia. 

- S, R, and O

S stands for Subject. This is the entity about which we are making a statement (e.g., “Barack Obama”).
R stands for Relation or Predicate. This is the property or relationship that links subject and object (e.g., “was born in”).
O stands for Object. This is the target entity or concept connected to the subject (e.g., “Hawaii”).
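
To make this vocabulary concrete, here is a minimal sketch of how a triple and a DBpedia-style URI could be represented in Python. The Triple type and the naive_resource_uri helper are illustrative names that do not appear in the project code, and the underscore substitution only approximates DBpedia's naming scheme; the project itself resolves names through the DBpedia Lookup service.

```python
from typing import NamedTuple
from urllib.parse import quote

DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


class Triple(NamedTuple):
    """A (Subject, Relation, Object) statement in plain text."""
    subject: str
    relation: str
    obj: str


def naive_resource_uri(name: str) -> str:
    # Assumption: spaces become underscores and any other unsafe
    # character is percent-encoded. Real DBpedia URIs can differ.
    return DBPEDIA_RESOURCE_PREFIX + quote(name.strip().replace(" ", "_"))


fact = Triple("Barack Obama", "was born in", "Hawaii")
print(fact)
print(naive_resource_uri(fact.subject))
```

A guess of this kind fails for ambiguous names, which is why the pipeline asks the Lookup service for candidate URIs instead.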

In [None]:
#%pip install spacy requests fuzzywuzzy python-Levenshtein flask flask-cors SPARQLWrapper

First, we import the necessary libraries. We need requests for HTTP requests to the DBpedia API, fuzzywuzzy for approximate string matching, spaCy for NLP tasks such as tokenization and part-of-speech analysis, and SPARQLWrapper to query DBpedia through SPARQL.

We also use ET from xml.etree.ElementTree because the DBpedia Lookup service returns data in XML format instead of JSON.

In [2]:
import os
import json
import re
import random
import requests

# For string matching
from fuzzywuzzy import fuzz

# spaCy for NLP
import spacy
from spacy import displacy


import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('omw-1.4')

# SPARQLWrapper for querying DBpedia
from SPARQLWrapper import SPARQLWrapper, JSON

# For XML parsing to handle DBpedia XML response
import xml.etree.ElementTree as ET

print("All imports done.")

All imports done.


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\calar\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\calar\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\calar\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\calar\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [3]:
#!python -m spacy download en_core_web_sm

In [4]:
nlp = spacy.load("en_core_web_sm")

print("spaCy model loaded.")

spaCy model loaded.


In this cell, we define a dictionary that maps certain text-based relations (like “was born in”) to the corresponding DBpedia ontology properties (like dbpedia.org/ontology/birthPlace). This helps us to match the queries to actual DBpedia properties. We also have a fuzzy match threshold to decide how close the relation text must be to a known key. We store constants for SPARQL endpoints and the DBpedia Lookup URL.

In [5]:
RELATION_MAP = {
    "was born in": [
        "http://dbpedia.org/ontology/birthPlace",
        "http://dbpedia.org/property/birthPlace"
    ],
    "is the capital of": [
        "http://dbpedia.org/ontology/capital",
        "http://dbpedia.org/property/capital"
    ],
    "died in": [
        "http://dbpedia.org/ontology/deathPlace",
        "http://dbpedia.org/property/deathPlace"
    ],
    "is located in": [
        "http://dbpedia.org/ontology/location",
        "http://dbpedia.org/property/location"
    ],
    "founded": [
        "http://dbpedia.org/ontology/founder",
        "http://dbpedia.org/property/founder"
    ],
    "was founded by": [
        "http://dbpedia.org/ontology/founder",
        "http://dbpedia.org/property/founder"
    ]
}

FUZZY_MATCH_THRESHOLD = 70

DBPEDIA_SPARQL_ENDPOINT = "https://dbpedia.org/sparql"
DBPEDIA_LOOKUP_URL = "https://lookup.dbpedia.org/api/search/KeywordSearch"

print("Relation map and endpoints configured.")


Relation map and endpoints configured.


RELATION_MAP => A manually curated dictionary that links a plain-English phrase to the corresponding DBpedia ontology and property URIs.
FUZZY_MATCH_THRESHOLD = 70 => The extracted relation must score at least 70 (out of 100) in similarity against a known key to count as a match.
DBPEDIA_SPARQL_ENDPOINT => The main SPARQL endpoint of DBpedia.
DBPEDIA_LOOKUP_URL => The DBpedia Lookup service API, used to find the resource URIs for an entity name.

Next, we define a helper function that uses spaCy to show how a sentence is tokenized and what role each token plays (subject, verb, object, and so on). This helps us understand how spaCy sees the sentence. We run a few example sentences to observe the dependency labels that spaCy outputs, which guide our approach to extracting subject-relation-object triplets.

In [6]:
def debug_spacy_parse(sentence):
    doc = nlp(sentence)
    print(f"DEBUG: spaCy tokens for: '{sentence}'")
    for token in doc:
        print(f"  {token.text:<12} {token.dep_:<10} {token.pos_:<6} HEAD={token.head.text}")

test_sentences = [
    "Paris is the capital of France.",
    "Elon Musk founded SpaceX.",
    "Apple was founded by Steve Jobs."
]
for s in test_sentences:
    debug_spacy_parse(s)
    print("-"*50)

DEBUG: spaCy tokens for: 'Paris is the capital of France.'
  Paris        nsubj      PROPN  HEAD=is
  is           ROOT       AUX    HEAD=is
  the          det        DET    HEAD=capital
  capital      attr       NOUN   HEAD=is
  of           prep       ADP    HEAD=capital
  France       pobj       PROPN  HEAD=of
  .            punct      PUNCT  HEAD=is
--------------------------------------------------
DEBUG: spaCy tokens for: 'Elon Musk founded SpaceX.'
  Elon         compound   PROPN  HEAD=Musk
  Musk         nsubj      PROPN  HEAD=founded
  founded      ROOT       VERB   HEAD=founded
  SpaceX.      dobj       NOUN   HEAD=founded
--------------------------------------------------
DEBUG: spaCy tokens for: 'Apple was founded by Steve Jobs.'
  Apple        nsubjpass  PROPN  HEAD=founded
  was          auxpass    AUX    HEAD=founded
  founded      ROOT       VERB   HEAD=founded
  by           agent      ADP    HEAD=founded
  Steve        compound   PROPN  HEAD=Jobs
  Jobs         pobj  

spaCy processes the text, and the loop over doc prints each token's dependency label (nsubj, ROOT, etc.), its part of speech, and its head.
We test with simple sentences to confirm how spaCy labels them, for instance verifying that Paris is recognized as nsubj.

The next function takes a simple approach to triple extraction: the subject is the first noun chunk, the object is the last noun chunk, and the relation-like words between them (verbs, auxiliaries, prepositions, particles) are joined to form the relation text.

In [7]:
def extract_triplet_spacy(sentence):
    """
     - subject = first noun chunk
     - object = last noun chunk
     - relation = verbs in between
    """
    doc = nlp(sentence)
    
    noun_chunks = list(doc.noun_chunks)
    if len(noun_chunks) < 2:
        print("DEBUG: Less than 2 noun chunks, can't extract triple properly.")
        return None
    
    subject_chunk = noun_chunks[0]
    object_chunk = noun_chunks[-1]
    subject_text = subject_chunk.text.strip()
    object_text = object_chunk.text.strip()
    
    start_idx = subject_chunk.end
    end_idx = object_chunk.start
    
    if start_idx >= end_idx:
        print("DEBUG: Overlapping subject/object chunk, can't parse relation.")
        return None
    
    relation_tokens = []
    for token in doc[start_idx:end_idx]:
        if token.pos_ in ["VERB", "AUX", "ADP", "PART"]:
            relation_tokens.append(token.text)

    relation_text = " ".join(relation_tokens).strip()
    
    if not relation_text:
        print(f"DEBUG: Extracted an empty relation between {subject_text} and {object_text}.")
    
    def normalize_entity_name(x):
        return " ".join(t.capitalize() for t in x.split())
    
    subject_text = normalize_entity_name(subject_text)
    object_text = normalize_entity_name(object_text)
    
    print(f"DEBUG EXTRACT: S='{subject_text}'  R='{relation_text}'  O='{object_text}'")
    return (subject_text, relation_text, object_text)

demo = [
    "Barack Obama was born in Hawaii.",
    "Paris is the capital of France.",
    "Elon Musk founded SpaceX.",
    "Apple was founded by Steve Jobs."
]
for d in demo:
    res = extract_triplet_spacy(d)
    print("Result =>", res)
    print("-"*50)

DEBUG EXTRACT: S='Barack Obama'  R='was born in'  O='Hawaii'
Result => ('Barack Obama', 'was born in', 'Hawaii')
--------------------------------------------------
DEBUG EXTRACT: S='Paris'  R='is of'  O='France'
Result => ('Paris', 'is of', 'France')
--------------------------------------------------
DEBUG EXTRACT: S='Elon Musk'  R='founded'  O='Spacex.'
Result => ('Elon Musk', 'founded', 'Spacex.')
--------------------------------------------------
DEBUG EXTRACT: S='Apple'  R='was founded by'  O='Steve Jobs'
Result => ('Apple', 'was founded by', 'Steve Jobs')
--------------------------------------------------


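For the relation, we only keep the tokens in between that are typical “relation indicators”: verbs, auxiliary verbs, prepositions, or particles. As the second example shows, this filter drops the noun phrase “the capital”, so “is the capital of” is reduced to “is of”. The capitalization step also turns “SpaceX.” into “Spacex.” in the third example.
This method illustrates the concept of triple extraction for fact checking.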
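
The effect of the part-of-speech filter can be reproduced without loading spaCy. The sketch below hard-codes the (text, POS) pairs printed earlier by debug_spacy_parse for the Paris sentence; relation_between and the relaxed tag set are hypothetical names introduced only for illustration, not part of the project.

```python
# (text, POS) pairs copied from the debug_spacy_parse output for
# "Paris is the capital of France." (final punctuation omitted).
TAGGED = [
    ("Paris", "PROPN"), ("is", "AUX"), ("the", "DET"),
    ("capital", "NOUN"), ("of", "ADP"), ("France", "PROPN"),
]

# Same tag set as the filter in extract_triplet_spacy.
RELATION_POS = {"VERB", "AUX", "ADP", "PART"}


def relation_between(tagged, start, end, keep_pos=RELATION_POS):
    """Join the tokens in tagged[start:end] whose POS tag is in keep_pos."""
    return " ".join(text for text, pos in tagged[start:end] if pos in keep_pos)


# The subject "Paris" ends at index 1; the object "France" starts at index 5.
strict = relation_between(TAGGED, 1, 5)
# Hypothetical variant: also keep determiners and nouns inside the span.
relaxed = relation_between(TAGGED, 1, 5, RELATION_POS | {"DET", "NOUN"})

print(strict)   # is of
print(relaxed)  # is the capital of
```

Keeping determiners and nouns recovers the full phrase here, but whether that is safe for other sentence shapes would need to be checked against more examples.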


We define a function that takes the extracted relation text (for example “was born in”) and tries to match it to one of the known keys in RELATION_MAP. We use fuzz.ratio from fuzzywuzzy so that slight variations of phrases such as “was founded by” or “is the capital of” still match. If the best score reaches the threshold, we accept the match.

In [8]:
def find_best_relation_match(relation_text, relation_map=RELATION_MAP):
    best_match = None
    best_score = 0
    for candidate in relation_map.keys():
        score = fuzz.ratio(relation_text.lower(), candidate.lower())
        if score > best_score and score >= FUZZY_MATCH_THRESHOLD:
            best_score = score
            best_match = candidate
    return best_match

raw_relation = "founded"
print("Testing 'founded' =>", find_best_relation_match(raw_relation))
raw_relation = "was founded by"
print("Testing 'was founded by' =>", find_best_relation_match(raw_relation))
raw_relation = "is the capital of"
print("Testing 'is the capital of' =>", find_best_relation_match(raw_relation))
raw_relation = "was born in"
print("Testing 'was born in' =>", find_best_relation_match(raw_relation))


Testing 'founded' => founded
Testing 'was founded by' => was founded by
Testing 'is the capital of' => is the capital of
Testing 'was born in' => was born in
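
The four tests above all use exact keys, so they do not show what the threshold does with imperfect input. The sketch below explores that using difflib.SequenceMatcher from the standard library as a stand-in for fuzz.ratio; the scores may differ slightly from fuzzywuzzy's, and the names similarity and best_relation are assumptions made for this example.

```python
from difflib import SequenceMatcher

KNOWN_RELATIONS = [
    "was born in", "is the capital of", "died in",
    "is located in", "founded", "was founded by",
]
THRESHOLD = 70  # same value as FUZZY_MATCH_THRESHOLD


def similarity(a: str, b: str) -> int:
    """Similarity on a 0-100 scale (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_relation(text, candidates=KNOWN_RELATIONS, threshold=THRESHOLD):
    """Return the highest-scoring candidate, or None if it is below the threshold."""
    scored = [(similarity(text, c), c) for c in candidates]
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None


print(best_relation("was borne in"))  # small typo: still matches "was born in"
print(best_relation("is of"))         # too far from every key: None
```

The second call is the relation that extract_triplet_spacy produced for the Paris sentence, which suggests that statement cannot reach the “is the capital of” key through fuzzy matching alone.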


The following cell shows how we query DBpedia to turn a text like “Barack Obama” into a valid DBpedia URI (http://dbpedia.org/resource/Barack_Obama). Because the DBpedia Lookup API returns XML, we parse the response with xml.etree.ElementTree to read the <URI> tags. We also run a quick test with examples like “Barack Obama” and “Paris”.

In [9]:
#DBPEDIA LOOKUP + XML PARSING
from urllib.parse import quote

def lookup_dbpedia_entities(entity_text, max_results=3):
    #Query the DBpedia Lookup API for a text
    #Returns a list of candidate URIs (parsed from XML)
    url = f"{DBPEDIA_LOOKUP_URL}?QueryString={quote(entity_text)}&MaxHits={max_results}"
    headers = {
        "Accept": "application/json",  # DBpedia Lookup often ignores this
        "User-Agent": "Mozilla/5.0 (compatible; MyFactChecker/1.0)"
    }
    try:
        resp = requests.get(url, headers=headers, timeout=10)
        print(f"DEBUG: Lookup URL = {resp.url}, status_code={resp.status_code}")
        
        # Print first 200 chars for debugging
        text_sample = resp.text[:200]
        print("DEBUG: Response content (first 200 chars):", text_sample.replace("\n","\\n"))
        
        resp.raise_for_status()  # Raise if not 200
        # The returned data is actually XML, so we parse it as such
        root = ET.fromstring(resp.text)
        uris = []
        for i, result_tag in enumerate(root.findall('Result')):
            if i >= max_results:
                break
            uri_tag = result_tag.find('URI')
            if uri_tag is not None:
                uris.append(uri_tag.text)
        return uris
    
    except requests.exceptions.RequestException as e:
        print(f"DBpedia Lookup error: {e}")
        return []
    except ET.ParseError as pe:
        # Error parsing XML
        print(f"DBpedia Lookup XML parse error: {pe}")
        return []

test_entities = ["Barack Obama", "Paris", "Elvis Presley"]
for ent in test_entities:
    print(f"Entity: {ent}")
    res = lookup_dbpedia_entities(ent)
    print("URIs =>", res)
    print("-"*50)

Entity: Barack Obama
DEBUG: Lookup URL = https://lookup.dbpedia.org/api/search/KeywordSearch?QueryString=Barack%20Obama&MaxHits=3, status_code=200
DEBUG: Response content (first 200 chars): <?xml version="1.0" encoding="UTF-8"?><ArrayOfResults><Result><Label>Barack Obama</Label><URI>http://dbpedia.org/resource/Barack_Obama</URI><Description>Barack Hussein Obama II ( (); born August 4, 19
URIs => ['http://dbpedia.org/resource/Barack_Obama', 'http://dbpedia.org/resource/List_of_federal_judges_appointed_by_Barack_Obama', 'http://dbpedia.org/resource/Family_of_Barack_Obama']
--------------------------------------------------
Entity: Paris
DEBUG: Lookup URL = https://lookup.dbpedia.org/api/search/KeywordSearch?QueryString=Paris&MaxHits=3, status_code=200
DEBUG: Response content (first 200 chars): <?xml version="1.0" encoding="UTF-8"?><ArrayOfResults><Result><Label>Paris</Label><URI>http://dbpedia.org/resource/Paris</URI><Description>Paris (French pronunciation: ​[paʁi] ()) is the capital an

We use quote(entity_text) to ensure the user's text is URL-encoded.
We keep at most max_results URIs so that we do not get too many candidates.
We parse XML because the DBpedia Lookup service returns XML by default, not JSON. In many cases it ignores the Accept: application/json header, so we must handle the XML format.
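
Both steps, URL-encoding and XML parsing, can be tried offline. The sketch below runs them on a small hand-written sample that mimics the shape of the responses shown above (ArrayOfResults, Result, Label, URI); the sample is trimmed for illustration and parse_lookup_uris is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hand-written sample shaped like the Lookup responses shown above.
SAMPLE_XML = (
    "<ArrayOfResults>"
    "<Result><Label>Barack Obama</Label>"
    "<URI>http://dbpedia.org/resource/Barack_Obama</URI></Result>"
    "<Result><Label>Family of Barack Obama</Label>"
    "<URI>http://dbpedia.org/resource/Family_of_Barack_Obama</URI></Result>"
    "</ArrayOfResults>"
)


def parse_lookup_uris(xml_text, max_results=3):
    """Return up to max_results URIs found under <Result><URI>."""
    root = ET.fromstring(xml_text)
    uris = [result.findtext("URI") for result in root.findall("Result")]
    return [uri for uri in uris if uri][:max_results]


encoded = quote("Barack Obama")
print(encoded)  # Barack%20Obama
print(parse_lookup_uris(SAMPLE_XML, max_results=1))
```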

Next, we perform a SPARQL ASK query to see whether a specific triple exists in DBpedia. If subject_uri, property_uri, and object_uri appear together as a triple in DBpedia, we get True; otherwise, False.

In [10]:
#  SPARQL ASK
def check_triple_in_dbpedia(subject_uri, property_uri, object_uri):
    """
    Query DBpedia via SPARQL to see if subject_uri property_uri object_uri exists.
    Return True/False.
    """
    query = f"""
    ASK WHERE {{
       <{subject_uri}> <{property_uri}> <{object_uri}> .
    }}
    """
    sparql = SPARQLWrapper(DBPEDIA_SPARQL_ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    try:
        results = sparql.query().convert()
        return results["boolean"]  
    except Exception as e:
        print(f"SPARQL error: {e}")
        return False

test_subj = "http://dbpedia.org/resource/Paris"
test_prop = "http://dbpedia.org/ontology/capital"
test_obj = "http://dbpedia.org/resource/France"

res = check_triple_in_dbpedia(test_subj, test_prop, test_obj)
print("DEBUG SPARQL ASK:", test_subj, test_prop, test_obj, " =>", res)

DEBUG SPARQL ASK: http://dbpedia.org/resource/Paris http://dbpedia.org/ontology/capital http://dbpedia.org/resource/France  => False


DBpedia appears to encode this fact in the opposite direction, as France dbo:capital Paris rather than Paris dbo:capital France. With the subject and object in the order used above, ASK therefore returns False.
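
One possible way to cope with this orientation problem is to ask for the triple in both directions at once. The sketch below only builds the query string, so it runs without network access; build_ask_query is a hypothetical helper and not part of the notebook.

```python
def build_ask_query(subject_uri, property_uri, object_uri, both_directions=True):
    """Build a SPARQL ASK query, optionally accepting the reversed triple too."""
    forward = f"<{subject_uri}> <{property_uri}> <{object_uri}> ."
    if not both_directions:
        return f"ASK WHERE {{ {forward} }}"
    reverse = f"<{object_uri}> <{property_uri}> <{subject_uri}> ."
    # UNION succeeds if either orientation is present in the data.
    return f"ASK WHERE {{ {{ {forward} }} UNION {{ {reverse} }} }}"


query = build_ask_query(
    "http://dbpedia.org/resource/Paris",
    "http://dbpedia.org/ontology/capital",
    "http://dbpedia.org/resource/France",
)
print(query)
```

The trade-off is that a reversed statement such as “France is the capital of Paris” would also be accepted, so this is probably best restricted to relations where the direction stored in DBpedia is known to differ from the phrasing.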

Here, check_triple_in_dbpedia_flexible is another approach: we retrieve the objects for (subject_uri, property_uri, ?val) and compare their labels to the label the user supplied. It uses fuzzy matching to see whether any of these ?val labels is close to user_object_label.

In [11]:
def check_triple_in_dbpedia_flexible(subject_uri, property_uri, user_object_label):
    query = f"""
    SELECT ?val WHERE {{
      <{subject_uri}> <{property_uri}> ?val .
    }}
    """
    # 1) Run the SELECT query against DBpedia
    sparql = SPARQLWrapper(DBPEDIA_SPARQL_ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    try:
        results = sparql.query().convert()["results"]["bindings"]
    except Exception as e:
        print(f"SPARQL error: {e}")
        return False

    if not results:
        return False

    # 2) For each ?val value, try to derive a label
    for binding in results:
        val_uri = binding["val"]["value"]

        # Rebuild a label by stripping the DBpedia prefix and replacing underscores
        if val_uri.startswith("http://dbpedia.org/resource/"):
            short_name = val_uri.replace("http://dbpedia.org/resource/", "")
            short_name_clean = short_name.replace("_", " ").replace(",", "")
            # => "Los Angeles California"

            # Fuzzy match against the label supplied by the user
            score = fuzz.ratio(short_name_clean.lower(), user_object_label.lower())
            if score > 80:
                return True

    return False
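
The label clean-up and the comparison step can be examined offline. The sketch below uses difflib.SequenceMatcher as a stand-in for fuzz.ratio (scores may differ slightly), and the example URI is illustrative, chosen to reproduce the "Los Angeles California" label mentioned in the code comment. The helper names are assumptions.

```python
from difflib import SequenceMatcher

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def label_from_uri(uri):
    """Strip the DBpedia prefix, then replace underscores and drop commas."""
    name = uri[len(RESOURCE_PREFIX):] if uri.startswith(RESOURCE_PREFIX) else uri
    return name.replace("_", " ").replace(",", "")


def similarity(a, b):
    """Similarity on a 0-100 scale (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


label = label_from_uri("http://dbpedia.org/resource/Los_Angeles,_California")
full_score = similarity(label, "Los Angeles")
# Hypothetical alternative check: does the label start with the user's text?
prefix_ok = label.lower().startswith("los angeles")

print(label)
print(full_score, prefix_ok)
```

With this stand-in, the whole-string score for “Los Angeles” against the longer label stays below the 80 cutoff because of the length difference, while the prefix check succeeds. A partial-match scorer may therefore be worth evaluating for labels that carry a qualifier.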

**This cell is the central part of the system**

It contains a list of transitive location properties that help us see whether child_uri is indirectly part of a bigger region (for example Pretoria → South Africa).
It also contains helper functions to fetch the object URIs for a given property, to run a breadth-first search (BFS) that checks whether something is included in a parent region, and to look up the “correct” object if the direct triple is not found.
The check_fact_statement function orchestrates the entire pipeline: extracting a triple from the user statement, fuzzy matching the relation, looking up DBpedia URIs, checking them with SPARQL, and, if that fails, attempting a “correction” (e.g. we discover that Elon Musk was actually born in “Pretoria”, then check whether Pretoria is located in “South Africa”).

In [12]:
TRANSITIVE_PROPS_LOCATION = [
    "http://dbpedia.org/ontology/location",
    "http://dbpedia.org/property/location",
    "http://dbpedia.org/ontology/isPartOf",
    "http://dbpedia.org/ontology/country",
    "http://dbpedia.org/ontology/region",
    "http://dbpedia.org/ontology/city",
]

def get_objects_for_property(subject_uri, property_uri):
    # Runs SELECT ?o WHERE { <subject_uri> <property_uri> ?o } and returns the list of matching object URIs.
    sparql = SPARQLWrapper(DBPEDIA_SPARQL_ENDPOINT)
    query = f"""
    SELECT ?o WHERE {{
      <{subject_uri}> <{property_uri}> ?o .
    }}
    """
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = []
    try:
        res = sparql.query().convert()
        for b in res["results"]["bindings"]:
            results.append(b["o"]["value"])
    except Exception as e:
        print("SPARQL error in get_objects_for_property:", e)
    return results

def is_included_in(child_uri, parent_label, max_depth=2):
    # Breadth-first search (BFS) to determine whether child_uri (e.g. Louvre)
    # is included, directly or indirectly, in parent_label (e.g. "Paris").
    # The depth is capped (max_depth) to avoid issuing too many queries.
    # Returns True as soon as a label close to parent_label is found.
    from collections import deque
    visited = set()
    queue = deque([(child_uri, 0)])
    
    while queue:
        current_uri, depth = queue.popleft()
        if current_uri in visited:
            continue
        visited.add(current_uri)

        # Derive a local label (e.g. "Louvre" from http://dbpedia.org/resource/Louvre)
        label_current = uri_to_label(current_uri)
        
        # Fuzzy match against parent_label (e.g. "Paris")
        score = fuzz.ratio(label_current.lower(), parent_label.lower())
        if score >= 80:
            return True  # found a close enough match

        # If we can still go deeper, explore the transitive properties
        if depth < max_depth:
            for p in TRANSITIVE_PROPS_LOCATION:
                voisins = get_objects_for_property(current_uri, p)
                for v in voisins:
                    if v not in visited:
                        queue.append((v, depth + 1))
    
    return False

def try_correction_lookup(subject_uris, matched_relation):
    # For some relations (e.g. was born in, died in, is located in),
    # fetch the actual value(s) stored in DBpedia.
    # Returns a dict with:
    #   - "text": a textual message (e.g. "Mona Lisa is actually located at the Louvre.")
    #   - "object_uris": the list of candidate URIs (e.g. [http://dbpedia.org/resource/Louvre])
    
    correction_map = {
        "was born in": "http://dbpedia.org/ontology/birthPlace",
        "died in": "http://dbpedia.org/ontology/deathPlace",
        "is located in": "http://dbpedia.org/ontology/location", 
    }
    if matched_relation not in correction_map:
        return None
    
    prop = correction_map[matched_relation]
    s_uri = subject_uris[0]  # try the first subject URI
    values = fetch_property_values(s_uri, prop, limit=5)
    if not values:
        return None
    
    short_labels = [uri_to_label(v) for v in values]
    # Build a short explanatory message
    txt = (f"{uri_to_label(s_uri)}’s actual {matched_relation} is: "
           + ", ".join(short_labels) + ".")
    
    return {
        "text": txt,
        "object_uris": values
    }

def check_fact_statement(statement):
    print("\n==== FACT CHECK STATEMENT ====")
    print(f"User statement: '{statement}'")
    
    # 1) Extraction
    extracted = extract_triplet_spacy(statement)
    if not extracted:
        return {
            "success": False,
            "error": "Could not extract (subject, relation, object) from statement.",
            "statement": statement
        }
    
    subject_text, raw_relation, object_text = extracted
    
    # 2) Relation fuzzy match
    matched_relation_key = find_best_relation_match(raw_relation, RELATION_MAP)
    print(f"DEBUG: raw_relation='{raw_relation}', matched_relation_key='{matched_relation_key}'")
    
    if not matched_relation_key:
        return {
            "success": True,
            "found": False,
            "message": f"Relation '{raw_relation}' not recognized in RELATION_MAP. I cannot confirm or deny this statement.",
            "extracted_triplet": extracted
        }
    
    possible_props = RELATION_MAP[matched_relation_key]
    
    # 3) Look up DBpedia URIs (subject / object)
    max_to_check = 5
    subj_candidates = lookup_dbpedia_entities(subject_text, max_results=15)[:max_to_check]
    obj_candidates  = lookup_dbpedia_entities(object_text, max_results=15)[:max_to_check]
    
    print("DEBUG: Subject candidates =>", subj_candidates)
    print("DEBUG: Object candidates  =>", obj_candidates)
    
    if not subj_candidates:
        return {
            "success": False,
            "found": False,
            "message": f"No DBpedia resource found for subject '{subject_text}'",
            "extracted_triplet": extracted
        }
    if not obj_candidates:
        return {
            "success": False,
            "found": False,
            "message": f"No DBpedia resource found for object '{object_text}'",
            "extracted_triplet": extracted
        }
    
    # 4) Direct SPARQL check
    found_any = False
    matched_triples = []
    
    for s_uri in subj_candidates:
        for prop_uri in possible_props:
            for o_uri in obj_candidates:
                if check_triple_in_dbpedia(s_uri, prop_uri, o_uri):
                    found_any = True
                    matched_triples.append((s_uri, prop_uri, o_uri))
                    break
            if found_any:
                break
        if found_any:
            break
    
    if found_any:
        # An exact triple was found => the statement is directly true
        return {
            "success": True,
            "found": True,
            "message": "Statement verified. One or more matching triples found in DBpedia.",
            "extracted_triplet": extracted,
            "matched_triples": matched_triples
        }
    else:
        # Not found => try the "correction" lookup plus indirect inclusion
        correction_info = try_correction_lookup(subj_candidates, matched_relation_key)
        
        if correction_info:
            # Check whether the returned value(s) are included in object_text
            real_uris = correction_info.get("object_uris", [])
            # Run a BFS for each value
            for real_uri in real_uris:
                if is_included_in(real_uri, object_text, max_depth=2):
                    # => Indirectly true
                    return {
                        "success": True,
                        "found": True,
                        "message": (
                            f"Indirectly true: {uri_to_label(subj_candidates[0])} "
                            f"{matched_relation_key} {uri_to_label(real_uri)}, "
                            f"which is itself included in '{object_text}'."
                        ),
                        "extracted_triplet": extracted
                    }
            
            return {
                "success": True,
                "found": False,
                "message": (
                    "No matching triple found in DBpedia for the recognized relation. "
                    f"However, I found a different fact: {correction_info['text']}"
                ),
                "extracted_triplet": extracted
            }
        else:
            return {
                "success": True,
                "found": False,
                "message": "No matching triple found in DBpedia for the recognized relation.",
                "extracted_triplet": extracted
            }


In [13]:

def fetch_property_values(subject_uri, property_uri, limit=5):
    #SPARQL query: SELECT ?value WHERE { <subject_uri> <property_uri> ?value } LIMIT n
    #Return list of object URIs
    from SPARQLWrapper import SPARQLWrapper, JSON
    query = f"""
    SELECT ?val WHERE {{
      <{subject_uri}> <{property_uri}> ?val .
    }} LIMIT {limit}
    """
    endpoint = SPARQLWrapper(DBPEDIA_SPARQL_ENDPOINT)
    endpoint.setQuery(query)
    endpoint.setReturnFormat(JSON)
    values = []
    try:
        results = endpoint.query().convert()
        for b in results["results"]["bindings"]:
            values.append(b["val"]["value"])
    except Exception as e:
        print("SPARQL error in fetch_property_values:", e)
    return values

def uri_to_label(uri):
    if uri.startswith("http://dbpedia.org/resource/"):
        name_part = uri.replace("http://dbpedia.org/resource/", "")
        return name_part.replace("_", " ")
    else:
        return uri
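
The depth-limited breadth-first search used by is_included_in can be tried offline on a small in-memory graph. The sketch below replaces the SPARQL lookups with a toy dictionary and the fuzzy comparison with a case-insensitive equality test; TOY_GRAPH and is_included_in_toy are illustrative names, and the edges are hand-written sample data rather than DBpedia query results.

```python
from collections import deque

# Toy "is located in" edges, standing in for get_objects_for_property.
TOY_GRAPH = {
    "Louvre": ["Paris"],
    "Paris": ["France"],
    "France": ["Europe"],
    "Pretoria": ["South Africa"],
}


def is_included_in_toy(child, parent, graph=TOY_GRAPH, max_depth=2):
    """Return True if parent is reachable from child within max_depth hops."""
    visited = set()
    queue = deque([(child, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        if node.lower() == parent.lower():
            return True
        # Only expand neighbours while the depth budget allows it.
        if depth < max_depth:
            for neighbour in graph.get(node, []):
                if neighbour not in visited:
                    queue.append((neighbour, depth + 1))
    return False


print(is_included_in_toy("Pretoria", "South Africa"))  # True
print(is_included_in_toy("Louvre", "France"))          # True
print(is_included_in_toy("Louvre", "Europe"))          # False
```

The last call shows the effect of the depth cap: Europe is three hops from the Louvre in this toy graph, so it is only found when max_depth is raised to 3. In the real function each hop costs one SPARQL query per transitive property, which is why the cap is kept small.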

Here we create a list of sample statements and call check_fact_statement on each. The debug output shows how the system extracts a triple, matches the relation, searches DBpedia, and either confirms or corrects the statement.

Each statement triggers the entire pipeline => triple extraction, DBpedia lookup, SPARQL ASK, correction, and possible indirect checks.
The results show a “true”, “false”, or “indirectly true” outcome, plus any fallback messages about discovered facts (for example, that Elvis Presley actually died in Memphis, not Paris).
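
The dictionaries returned by check_fact_statement can be reduced to a short verdict. The sketch below is one possible reading of the success, found, and message fields; summarize_result and its verdict strings are assumptions made for illustration, and the sample dictionaries are hand-written to mirror the shapes returned by the function.

```python
def summarize_result(result):
    """Map a check_fact_statement-style dict to a short verdict string."""
    if not result.get("success"):
        return "error"
    if result.get("found"):
        # The indirect path is only recognizable through its message prefix.
        if result.get("message", "").startswith("Indirectly true"):
            return "indirectly true"
        return "true"
    return "not confirmed"


samples = [
    {"success": True, "found": True,
     "message": "Statement verified. One or more matching triples found in DBpedia."},
    {"success": True, "found": True,
     "message": "Indirectly true: Elon Musk was born in Pretoria, "
                "which is itself included in 'South Africa'."},
    {"success": True, "found": False,
     "message": "No matching triple found in DBpedia for the recognized relation."},
    {"success": False,
     "error": "Could not extract (subject, relation, object) from statement."},
]
verdicts = [summarize_result(s) for s in samples]
print(verdicts)
```

Relying on the message prefix is fragile; a dedicated field in the returned dictionary would be a sturdier signal if the function were extended.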

In [14]:
test_statements = [
    "Barack Obama was born in Hawaii",
    "Paris is the capital of France",
    "Elon Musk was born in South Africa",
    "Elvis Presley died in Paris",
    "Albert Einstein was born in Ulm",
    "Albert Einstein was born in Berlin",
    "Donald Trump was born in Paris",
    "Michael Jackson died in Los Angeles",
    "Michael Jackson died in 2009 in the city of Los Angeles",
    "Apple was founded by Steve Jobs",
    "Elon Musk founded SpaceX"
]

for st in test_statements:
    result = check_fact_statement(st)
    print("RESULT =>", json.dumps(result, indent=2))
    print("="*70)



==== FACT CHECK STATEMENT ====
User statement: 'Barack Obama was born in Hawaii'
DEBUG EXTRACT: S='Barack Obama'  R='was born in'  O='Hawaii'
DEBUG: raw_relation='was born in', matched_relation_key='was born in'
DEBUG: Lookup URL = https://lookup.dbpedia.org/api/search/KeywordSearch?QueryString=Barack%20Obama&MaxHits=15, status_code=200
DEBUG: Response content (first 200 chars): <?xml version="1.0" encoding="UTF-8"?><ArrayOfResults><Result><Label>Barack Obama</Label><URI>http://dbpedia.org/resource/Barack_Obama</URI><Description>Barack Hussein Obama II ( (); born August 4, 19
DEBUG: Lookup URL = https://lookup.dbpedia.org/api/search/KeywordSearch?QueryString=Hawaii&MaxHits=15, status_code=200
DEBUG: Response content (first 200 chars): <?xml version="1.0" encoding="UTF-8"?><ArrayOfResults><Result><Label>Hawaii</Label><URI>http://dbpedia.org/resource/Hawaii</URI><Description>Hawaiʻi ( () hə-WY-ee; Hawaiian: Hawaiʻi [həˈvɐjʔi]) is a s
DEBUG: Subject candidates => ['http://dbpedia.org/res

In [15]:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
query = """
SELECT ?s WHERE {
    ?s rdfs:label ?label .
    FILTER (lang(?label) = 'en').
    FILTER (CONTAINS(lcase(?label), lcase("Barack Obama"))).
} LIMIT 10
"""

sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result["s"]["value"])

http://dbpedia.org/resource/Cabinet_of_Barack_Obama
http://dbpedia.org/resource/A_Singular_Woman:_The_Untold_Story_of_Barack_Obama's_Mother
http://dbpedia.org/resource/BARACK_OBAMA
http://dbpedia.org/resource/Presidency_of_Barack_Obama
http://dbpedia.org/resource/President_Barack_Obama
http://dbpedia.org/resource/President_Elect_Barack_Obama
http://dbpedia.org/resource/Electoral_history_of_Barack_Obama
http://dbpedia.org/resource/Energy_policy_of_the_Barack_Obama_administration
http://dbpedia.org/resource/List_of_awards_and_honors_received_by_Barack_Obama
http://dbpedia.org/resource/List_of_bills_sponsored_by_Barack_Obama_in_the_United_States_Senate


In [16]:

#!pip install flask flask-cors requests 

In [17]:
import random
import io
from flask import Flask, request, jsonify
from flask_cors import CORS
from contextlib import redirect_stdout
from threading import Thread
import json

app = Flask(__name__)
CORS(app)

SAMPLE_QUESTIONS = [
    "Barack Obama was born in Hawaii",
    "Elvis Presley died in Memphis",
    "Albert Einstein was born in Ulm",
    "Donald Trump was born in Queens",
    "Paris is the capital of France",
    "Michael Jackson died in Los Angeles",
    "Celine Dion was born in Canada",
    "Elvis Presley died in Paris",          
    "Donald Trump was born in Paris",       
    "Albert Einstein was born in Berlin",   
]

@app.route("/get_suggestions", methods=["GET"])
def get_suggestions():
    suggestions = random.sample(SAMPLE_QUESTIONS, 2)
    return jsonify({
        "suggestions": suggestions
    })

@app.route("/ask", methods=["POST"])
def ask(): 
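    # Run check_fact_statement and return the result along with the debug logs.
    # silent=True plus "or {}" avoids an AttributeError when the body is empty or null.
    data = request.get_json(force=True, silent=True) or {}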
    user_question = data.get("question", "").strip()
    
    if not user_question:
        return jsonify({"error": "No question provided."}), 400
    
    print("=== DEBUG FLASK: Question reçue:", user_question)  # Debug log
    
    f_debug = io.StringIO()
    with redirect_stdout(f_debug):
        print("=== DEBUG FLASK: Début de l'exécution de check_fact_statement") 
        result = check_fact_statement(user_question)
        print("=== DEBUG FLASK: Résultat brut de check_fact_statement:", result)  
        print("RESULT =>", json.dumps(result, indent=2))
        print("=== DEBUG FLASK: Fin de l'exécution") 
    
    debug_output = f_debug.getvalue()  # everything printed inside the redirect_stdout block
    print("=== DEBUG FLASK: Contenu capturé dans debug_output:", debug_output)  
    
    response = {
        "debug": debug_output,
        "result": result
    }
    print("=== DEBUG FLASK: Réponse finale envoyée:", json.dumps(response, indent=2)) 
    
    return jsonify(response)
    
def run_flask():
    app.run(debug=True, use_reloader=False)  

# Run Flask in a background thread so it does not block the notebook
flask_thread = Thread(target=run_flask)
flask_thread.daemon = True  # The thread stops when the notebook kernel stops
flask_thread.start()

print("Le serveur Flask est démarré sur http://127.0.0.1:5000")

Le serveur Flask est démarré sur http://127.0.0.1:5000


 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit


In [18]:
'''import requests

BASE_URL = "http://127.0.0.1:5000"

# 1) Fetch 2 suggestions
resp = requests.get(f"{BASE_URL}/get_suggestions")
if resp.status_code == 200:
    print("Suggestions =>", resp.json())
else:
    print("Error:", resp.text)

# 2) Send a question
question_to_ask = "Barack Obama was born in Hawaii"
resp2 = requests.post(f"{BASE_URL}/ask", json={"question": question_to_ask})
if resp2.status_code == 200:
    data = resp2.json()
    print("\n=== DEBUG LOGS ===")
    print(data["debug"])
    print("=== RESULT ===")
    print(data["result"])
else:
    print("Error:", resp2.text)
'''

127.0.0.1 - - [15/Feb/2025 15:25:10] "GET /get_suggestions HTTP/1.1" 200 -
127.0.0.1 - - [15/Feb/2025 15:25:12] "OPTIONS /ask HTTP/1.1" 200 -


=== DEBUG FLASK: Question reçue: Barack Obama was born in Hawaii


127.0.0.1 - - [15/Feb/2025 15:25:15] "POST /ask HTTP/1.1" 200 -


=== DEBUG FLASK: Contenu capturé dans debug_output: === DEBUG FLASK: Début de l'exécution de check_fact_statement

==== FACT CHECK STATEMENT ====
User statement: 'Barack Obama was born in Hawaii'
DEBUG EXTRACT: S='Barack Obama'  R='was born in'  O='Hawaii'
DEBUG: raw_relation='was born in', matched_relation_key='was born in'
DEBUG: Lookup URL = https://lookup.dbpedia.org/api/search/KeywordSearch?QueryString=Barack%20Obama&MaxHits=15, status_code=200
DEBUG: Response content (first 200 chars): <?xml version="1.0" encoding="UTF-8"?><ArrayOfResults><Result><Label>Barack Obama</Label><URI>http://dbpedia.org/resource/Barack_Obama</URI><Description>Barack Hussein Obama II ( (); born August 4, 19
DEBUG: Lookup URL = https://lookup.dbpedia.org/api/search/KeywordSearch?QueryString=Hawaii&MaxHits=15, status_code=200
DEBUG: Response content (first 200 chars): <?xml version="1.0" encoding="UTF-8"?><ArrayOfResults><Result><Label>Hawaii</Label><URI>http://dbpedia.org/resource/Hawaii</URI><Description

127.0.0.1 - - [15/Feb/2025 15:25:21] "GET /get_suggestions HTTP/1.1" 200 -


## Conclusion
With all these cells, we build a pipeline that:

- Extracts a candidate triple from a user statement.
- Matches the relation with a known DBpedia property.
- Finds the best URI for subject/object using DBpedia Lookup.
- Uses SPARQL queries to check if the triple is in DBpedia.
- If not found, attempts an indirect or “correction” approach to handle partial truths.
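
As an illustration of the verification step in the list above, here is a minimal sketch of how an existence check could be phrased as a SPARQL `ASK` query. The mapping `RELATION_TO_PROPERTY` and the helper `build_ask_query` are hypothetical names introduced for this example, and the choice of `birthPlace` and `deathPlace` as target properties is an assumption, not necessarily the mapping used in the cells above.

```python
# Hypothetical mapping from relation phrases to DBpedia ontology properties
# (an assumption for this sketch, not the notebook's actual table).
RELATION_TO_PROPERTY = {
    "was born in": "http://dbpedia.org/ontology/birthPlace",
    "died in": "http://dbpedia.org/ontology/deathPlace",
}


def build_ask_query(subject_uri: str, relation: str, object_uri: str) -> str:
    """Return a SPARQL ASK query that tests whether the triple exists."""
    property_uri = RELATION_TO_PROPERTY[relation]
    return f"ASK {{ <{subject_uri}> <{property_uri}> <{object_uri}> }}"


query = build_ask_query(
    "http://dbpedia.org/resource/Barack_Obama",
    "was born in",
    "http://dbpedia.org/resource/Hawaii",
)
print(query)
```

The resulting string can be passed to `sparql.setQuery(query)` with the same `SPARQLWrapper` setup as in the earlier cell; with the JSON return format, the answer of an `ASK` query is found under the `"boolean"` key of the converted result.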

Finally, our fact-checking system verifies statements by extracting triplets, matching them to a known set of relations, and querying DBpedia to see whether those relations appear in the knowledge base. The current project is still quite limited, because we lacked the time to develop it further. It handles only a few predefined types of relations (e.g. “was born in,” “died in,” “founded”) and relies on fairly simple methods of extracting triples from text. DBpedia's Lookup API can also fail or return uncertain results for ambiguous queries, and we cover only a small part of DBpedia’s ontology (not all relations or classes).

- What is missing and what could be improved:

The system currently depends on a “naive chunking” approach in which the subject is assumed to be the first noun phrase and the object the last noun phrase. A more sophisticated NLP pipeline could better distinguish multiple subjects, objects, and verb phrases.
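
To make this weakness concrete, here is a simplified stand-in for a naive extractor. It is not the notebook's noun-phrase chunker: it simply splits the statement around the first known relation phrase, and both `KNOWN_RELATIONS` and `naive_extract` are hypothetical names for this sketch. It shares the same blind spot, since everything before the relation is taken as the subject.

```python
import re
from typing import Optional, Tuple

# Assumed list of supported relation phrases (those named in this notebook).
KNOWN_RELATIONS = ["was born in", "died in", "is the capital of", "founded"]


def naive_extract(statement: str) -> Optional[Tuple[str, str, str]]:
    """Split a statement around the first known relation phrase it contains."""
    for relation in KNOWN_RELATIONS:
        # \b keeps the match on word boundaries; re.escape guards special characters.
        match = re.search(rf"\b{re.escape(relation)}\b", statement, flags=re.IGNORECASE)
        if match:
            subject = statement[:match.start()].strip()
            obj = statement[match.end():].strip(" .")
            if subject and obj:
                return subject, relation, obj
    return None


simple = naive_extract("Barack Obama was born in Hawaii")
nested = naive_extract("Marie Curie, who died in Passy, was born in Warsaw")
print(simple)
print(nested)
```

The first statement is split cleanly, but in the second one the relative clause ends up inside the subject, which would make the later URI lookup fail. Handling such cases calls for proper syntactic analysis.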

We still handle only a limited set of known relations such as “was born in” and “died in.” We would need to integrate a richer set of relations to handle more complex statements.

Running SPARQL queries at scale may become slow if we process large volumes of statements or rely on more advanced reasoning steps. Local caching or a more optimized infrastructure would be worth considering.
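
One inexpensive form of local caching is to memoize query results in memory. The sketch below uses `functools.lru_cache` from the standard library. The function `run_ask_query` is a hypothetical placeholder that simulates the endpoint so the example runs offline; in practice it would wrap the real SPARQL call.

```python
from functools import lru_cache

# Counts how often the (simulated) endpoint is actually hit.
CALLS = {"count": 0}


def run_ask_query(query: str) -> bool:
    """Hypothetical stand-in for the network call to the SPARQL endpoint."""
    CALLS["count"] += 1
    return "birthPlace" in query  # dummy answer so the sketch runs offline


@lru_cache(maxsize=1024)
def cached_ask(query: str) -> bool:
    """Memoize answers so a repeated query is served from memory."""
    return run_ask_query(query)


query = (
    "ASK { <http://dbpedia.org/resource/Barack_Obama> "
    "<http://dbpedia.org/ontology/birthPlace> "
    "<http://dbpedia.org/resource/Hawaii> }"
)
first = cached_ask(query)   # goes to the (simulated) endpoint
second = cached_ask(query)  # served from the cache
print(CALLS["count"], cached_ask.cache_info())
```

An in-memory cache is lost when the kernel restarts and never refreshes stale answers, so a persistent store with an expiry policy would be a better fit for a long-running service.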

- How to extend toward a fully automated system:

Integrate advanced NLP: Use a more comprehensive relation-extraction framework that identifies both standard and custom relations and handles complex sentence structures (passive voice, coordination, or nested clauses).

Expand the ontology: Link to a broader range of DBpedia or Wikidata properties to cover more statements. We could also maintain our own local knowledge graph that is frequently updated and refined.

Better reasoning: Develop a reasoning engine that can chain multiple facts, handle potential contradictions, and rank the most likely matches. This could involve both symbolic logic (for deducing transitive or hierarchical facts) and machine learning (for computing similarity or confidence scores).
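
As a small illustration of the symbolic side, the sketch below deduces a transitive "located in" fact from a toy, hand-written containment graph. The `PART_OF` dictionary and the function `is_located_in` are hypothetical and only stand in for facts that a real system would fetch from the knowledge base.

```python
from typing import Dict, Set

# Toy containment facts, hand-written for illustration (not fetched from DBpedia).
PART_OF: Dict[str, Set[str]] = {
    "Honolulu": {"Hawaii"},
    "Hawaii": {"United States"},
}


def is_located_in(place: str, region: str) -> bool:
    """Return True if `region` is reachable from `place` by following PART_OF links."""
    seen: Set[str] = set()
    stack = [place]
    while stack:
        current = stack.pop()
        if current == region:
            return True
        if current in seen:
            continue  # guard against cycles in the graph
        seen.add(current)
        stack.extend(PART_OF.get(current, ()))
    return False


print(is_located_in("Honolulu", "United States"))
print(is_located_in("Hawaii", "Honolulu"))
```

With such a check, a statement like “was born in Hawaii” could be confirmed indirectly when the knowledge base records a birthplace of Honolulu, which is the kind of indirect match described at the start of this notebook.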


Overall, the existing code demonstrates the core logic: extracting triplets, matching relations, querying a knowledge base, and returning whether the statement is confirmed, partially confirmed, or not confirmed at all. Much work remains to handle more diverse, ambiguous, or indirect statements, and to scale up robustly for real-world, automated fact-checking scenarios.