# Semantic Web Tools Introduction

This notebook demonstrates basic usage of the semantic web tools in the Persuasion-Aware MUSE project:

1. **RDFLib** - Python library for working with RDF graphs
2. **SPARQLWrapper** - SPARQL endpoint client for querying knowledge bases
3. **Wikidata Integration** - Entity linking via Wikidata SPARQL endpoint

---

## Setup

Install required packages (if not already installed):

In [1]:
# !pip install rdflib SPARQLWrapper
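
Before running the cells below, it can help to confirm that both packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of the project.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names of the two packages used in this notebook.
missing = missing_packages(["rdflib", "SPARQLWrapper"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available")
```

If anything is reported missing, uncomment and run the `pip install` line above.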

In [2]:
from rdflib import Graph, Namespace, Literal, URIRef, RDF, RDFS, XSD
from rdflib.namespace import FOAF, OWL
from SPARQLWrapper import SPARQLWrapper, JSON
import json

---

## 1. RDFLib Basics

RDFLib is a Python library for working with RDF (Resource Description Framework) data. RDF represents information as **triples**: (Subject, Predicate, Object).
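
To make the triple model concrete, here is a plain-Python sketch that does not use RDFLib at all. It treats a graph as a set of 3-tuples and matches patterns with `None` as a wildcard, the same convention that RDFLib's `Graph.triples()` accepts. The shortened identifiers such as `post:001` and the helper `match` are illustrative assumptions.

```python
# A triple is a (subject, predicate, object) tuple; a graph is a set of them.
triples = {
    ("post:001", "rdf:type", "persuasion:Post"),
    ("post:001", "persuasion:platform", "Twitter"),
    ("post:001", "persuasion:containsClaim", "claim:001_1"),
    ("claim:001_1", "rdf:type", "persuasion:Claim"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching the pattern, where None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything known about the post
for triple in match(triples, s="post:001"):
    print(triple)

# All resources typed as a Claim
claims = [s for s, _, _ in match(triples, p="rdf:type", o="persuasion:Claim")]
print(claims)
```

Because the graph is a set, adding the same triple twice leaves it unchanged. RDF graphs behave the same way.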

### 1.1 Creating an RDF Graph

In [3]:
# Create a new RDF graph
g = Graph()

# Define custom namespace for our ontology
PERSUASION = Namespace("http://example.org/persuasion#")
WD = Namespace("http://www.wikidata.org/entity/")

# Bind prefixes for cleaner serialization
g.bind("persuasion", PERSUASION)
g.bind("wd", WD)
g.bind("foaf", FOAF)

print("Empty graph created with", len(g), "triples")

Empty graph created with 0 triples


### 1.2 Adding Triples to the Graph

In [4]:
# Create a sample social media post
post_uri = URIRef("http://example.org/post#001")

# Add triples about the post
g.add((post_uri, RDF.type, PERSUASION.Post))
g.add((post_uri, PERSUASION.postId, Literal("001")))
g.add((post_uri, PERSUASION.textContent, Literal("The EU is forcing countries to accept unlimited migrants!")))
g.add((post_uri, PERSUASION.platform, Literal("Twitter")))
g.add((post_uri, PERSUASION.timestamp, Literal("2025-01-15T10:30:00Z", datatype=XSD.dateTime)))

print(f"Graph now contains {len(g)} triples")

Graph now contains 5 triples


### 1.3 Adding Claims and Persuasion Techniques

In [5]:
# Create a claim extracted from the post
claim_uri = URIRef("http://example.org/claim#001_1")

g.add((claim_uri, RDF.type, PERSUASION.Claim))
g.add((claim_uri, PERSUASION.claimText, Literal("EU is forcing countries to accept unlimited migrants")))
g.add((claim_uri, PERSUASION.confidenceScore, Literal(0.92, datatype=XSD.float)))

# Link claim to post
g.add((post_uri, PERSUASION.containsClaim, claim_uri))

# Add persuasion techniques
g.add((claim_uri, PERSUASION.usesTechnique, PERSUASION.FearAppeal))
g.add((claim_uri, PERSUASION.usesTechnique, PERSUASION.LoadedLanguage))

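# Add verification status
# Note: PERSUASION.False_ yields the IRI ...#False_ (with the underscore),
# not the ontology's False class; PERSUASION["False"] would produce ...#False.
g.add((claim_uri, PERSUASION.hasVerificationStatus, PERSUASION.False_))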

print(f"Graph now contains {len(g)} triples")

Graph now contains 12 triples


### 1.4 Linking to Wikidata Entities

In [6]:
# Create entity and link to Wikidata
entity_uri = URIRef("http://example.org/entity#European_Union")

g.add((entity_uri, RDF.type, PERSUASION.Entity))
g.add((entity_uri, PERSUASION.entityName, Literal("European Union")))
g.add((entity_uri, PERSUASION.entityType, Literal("Organization")))

# Link to Wikidata (Q458 = European Union)
g.add((entity_uri, PERSUASION.linkedToWikidata, WD.Q458))

# Connect entity to claim
g.add((claim_uri, PERSUASION.targetsEntity, entity_uri))

print(f"Graph now contains {len(g)} triples")

Graph now contains 17 triples


### 1.5 Serializing the Graph

In [7]:
# Serialize to Turtle format (human-readable)
turtle_output = g.serialize(format="turtle")
print("=== Turtle Format ===")
print(turtle_output)

=== Turtle Format ===
@prefix persuasion: <http://example.org/persuasion#> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/post#001> a persuasion:Post ;
    persuasion:containsClaim <http://example.org/claim#001_1> ;
    persuasion:platform "Twitter" ;
    persuasion:postId "001" ;
    persuasion:textContent "The EU is forcing countries to accept unlimited migrants!" ;
    persuasion:timestamp "2025-01-15T10:30:00+00:00"^^xsd:dateTime .

<http://example.org/claim#001_1> a persuasion:Claim ;
    persuasion:claimText "EU is forcing countries to accept unlimited migrants" ;
    persuasion:confidenceScore "0.92"^^xsd:float ;
    persuasion:hasVerificationStatus persuasion:False_ ;
    persuasion:targetsEntity <http://example.org/entity#European_Union> ;
    persuasion:usesTechnique persuasion:FearAppeal,
        persuasion:LoadedLanguage .

<http://example.org/entity#European_Union> a persuasion:Entity ;
    persuasi
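
In the Turtle output above, the timestamp that was added as `2025-01-15T10:30:00Z` appears as `2025-01-15T10:30:00+00:00`. Both lexical forms denote the same instant in UTC. The sketch below shows the equivalence with the standard library only; it is an illustration, not RDFLib's implementation, and `parse_xsd_datetime` is a hypothetical helper.

```python
from datetime import datetime


def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse an xsd:dateTime string, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat() only accepts 'Z' from Python 3.11 onward,
    # so rewrite it as an explicit offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


original = parse_xsd_datetime("2025-01-15T10:30:00Z")
serialized = parse_xsd_datetime("2025-01-15T10:30:00+00:00")

print(original == serialized)  # True: both strings name the same instant
print(original.isoformat())    # 2025-01-15T10:30:00+00:00
```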

In [8]:
# Serialize to JSON-LD format (machine-readable)
jsonld_output = g.serialize(format="json-ld")
print("=== JSON-LD Format ===")
print(json.dumps(json.loads(jsonld_output), indent=2)[:1500], "...")

=== JSON-LD Format ===
[
  {
    "@id": "http://example.org/claim#001_1",
    "@type": [
      "http://example.org/persuasion#Claim"
    ],
    "http://example.org/persuasion#claimText": [
      {
        "@value": "EU is forcing countries to accept unlimited migrants"
      }
    ],
    "http://example.org/persuasion#confidenceScore": [
      {
        "@type": "http://www.w3.org/2001/XMLSchema#float",
        "@value": "0.92"
      }
    ],
    "http://example.org/persuasion#hasVerificationStatus": [
      {
        "@id": "http://example.org/persuasion#False_"
      }
    ],
    "http://example.org/persuasion#targetsEntity": [
      {
        "@id": "http://example.org/entity#European_Union"
      }
    ],
    "http://example.org/persuasion#usesTechnique": [
      {
        "@id": "http://example.org/persuasion#FearAppeal"
      },
      {
        "@id": "http://example.org/persuasion#LoadedLanguage"
      }
    ]
  },
  {
    "@id": "http://example.org/entity#European_Union",
    "

### 1.6 Querying the Graph with SPARQL

In [9]:
# Query all posts with their claims and the techniques each claim uses
query = """
PREFIX persuasion: <http://example.org/persuasion#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?post ?postId ?claimText ?technique
WHERE {
    ?post rdf:type persuasion:Post ;
          persuasion:postId ?postId ;
          persuasion:containsClaim ?claim .
    ?claim persuasion:claimText ?claimText ;
           persuasion:usesTechnique ?technique .
}
"""

results = g.query(query)
print("Query Results:")
print("-" * 80)
for row in results:
    print(f"Post: {row.postId}")
    print(f"Claim: {row.claimText}")
    print(f"Technique: {row.technique.split('#')[-1]}")
    print()

Query Results:
--------------------------------------------------------------------------------
Post: 001
Claim: EU is forcing countries to accept unlimited migrants
Technique: FearAppeal

Post: 001
Claim: EU is forcing countries to accept unlimited migrants
Technique: LoadedLanguage
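
The loop above shortens each technique IRI with `split('#')`, which works for hash namespaces such as `http://example.org/persuasion#`. Wikidata entity IRIs end in a slash-separated ID instead. A small helper that handles both cases might look like the sketch below; `local_name` is an illustrative name, not an RDFLib function.

```python
def local_name(iri: str) -> str:
    """Return the part after the last '#', or after the last '/' if there is no '#'."""
    if "#" in iri:
        return iri.rsplit("#", 1)[-1]
    return iri.rstrip("/").rsplit("/", 1)[-1]


print(local_name("http://example.org/persuasion#FearAppeal"))  # FearAppeal
print(local_name("http://www.wikidata.org/entity/Q458"))       # Q458
```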



---

## 2. SPARQLWrapper - Querying External Endpoints

SPARQLWrapper sends queries to remote SPARQL endpoints such as Wikidata and DBpedia and converts the responses into Python objects.

### 2.1 Querying Wikidata

In [10]:
# Setup Wikidata SPARQL endpoint
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
sparql = SPARQLWrapper(WIKIDATA_ENDPOINT)
sparql.setReturnFormat(JSON)

# Query to find the European Union entity
query = """
SELECT ?item ?itemLabel ?itemDescription WHERE {
    ?item rdfs:label "European Union"@en .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

sparql.setQuery(query)
results = sparql.query().convert()

print("Wikidata results for 'European Union':")
print("-" * 60)
for result in results["results"]["bindings"]:
    item_id = result["item"]["value"].split("/")[-1]
    label = result.get("itemLabel", {}).get("value", "N/A")
    description = result.get("itemDescription", {}).get("value", "N/A")
    print(f"ID: {item_id}")
    print(f"Label: {label}")
    print(f"Description: {description}")
    print()

Wikidata results for 'European Union':
------------------------------------------------------------
ID: Q458
Label: European Union
Description: political and economic union of 27 European states

ID: Q319328
Label: European Union
Description: antifascist resistance group
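
With `JSON` as the return format, `convert()` yields a dictionary in the SPARQL JSON results layout: variable names under `head.vars` and one dictionary per row under `results.bindings`. The sketch below flattens that layout into plain dictionaries without contacting the endpoint. The sample payload is hand-written from the output above; its second row deliberately omits the description to show how an unbound variable is handled, and `flatten_bindings` is a hypothetical helper.

```python
import json

# Hand-written sample shaped like a SPARQL JSON result (not a live response).
payload = json.loads("""
{
  "head": {"vars": ["item", "itemLabel", "itemDescription"]},
  "results": {"bindings": [
    {
      "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q458"},
      "itemLabel": {"type": "literal", "xml:lang": "en", "value": "European Union"},
      "itemDescription": {"type": "literal", "xml:lang": "en",
                          "value": "political and economic union of 27 European states"}
    },
    {
      "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q319328"},
      "itemLabel": {"type": "literal", "xml:lang": "en", "value": "European Union"}
    }
  ]}
}
""")


def flatten_bindings(result: dict) -> list:
    """Turn SPARQL JSON bindings into plain dicts; unbound variables become None."""
    variables = result["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in result["results"]["bindings"]
    ]


rows = flatten_bindings(payload)
for row in rows:
    print(row["item"].rsplit("/", 1)[-1], "|", row["itemLabel"], "|", row["itemDescription"])
```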



### 2.2 Entity Linking Function

In [11]:
from typing import Optional

def find_wikidata_entity(entity_name: str, entity_type: Optional[str] = None) -> Optional[dict]:
    """
    Search Wikidata for an entity by name.
    
    Args:
        entity_name: Name of the entity to search
        entity_type: Optional type filter (Person, Organization, Location)
        
    Returns:
        Dictionary with wikidata_id, label, and description, or None if nothing matched or the query failed
    """
    sparql = SPARQLWrapper(WIKIDATA_ENDPOINT)
    sparql.setReturnFormat(JSON)
    
    # Build query with optional type filter
    type_filter = ""
    if entity_type == "Person":
        type_filter = "?item wdt:P31 wd:Q5 ."  # instance of human
    elif entity_type == "Organization":
        type_filter = "?item wdt:P31/wdt:P279* wd:Q43229 ."  # instance of organization or any of its subclasses
    elif entity_type == "Location":
        type_filter = "?item wdt:P31/wdt:P279* wd:Q618123 ."  # instance of geographical feature or any of its subclasses
    
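    # Escape backslashes and double quotes so the name stays a valid SPARQL string literal
    safe_name = entity_name.replace("\\", "\\\\").replace('"', '\\"')

    query = f"""
    SELECT ?item ?itemLabel ?itemDescription WHERE {{
        ?item rdfs:label "{safe_name}"@en .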
        {type_filter}
        SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT 1
    """
    
    try:
        sparql.setQuery(query)
        results = sparql.query().convert()
        
        if results["results"]["bindings"]:
            result = results["results"]["bindings"][0]
            return {
                "wikidata_id": result["item"]["value"].split("/")[-1],
                "label": result.get("itemLabel", {}).get("value", entity_name),
                "description": result.get("itemDescription", {}).get("value", "")
            }
    except Exception as e:
        print(f"Error querying Wikidata: {e}")
    
    return None

# Test the function
entities_to_test = [
    ("European Union", "Organization"),
    ("Angela Merkel", "Person"),
    ("COVID-19", None),
    ("Twitter", "Organization")
]

print("Entity Linking Results:")
print("=" * 60)
for name, etype in entities_to_test:
    result = find_wikidata_entity(name, etype)
    if result:
        print(f"\n{name}:")
        print(f"  Wikidata ID: {result['wikidata_id']}")
        print(f"  Description: {result['description']}")
    else:
        print(f"\n{name}: Not found")

Entity Linking Results:

European Union:
  Wikidata ID: Q458
  Description: political and economic union of 27 European states

Angela Merkel:
  Wikidata ID: Q567
  Description: chancellor of Germany from 2005 to 2021

COVID-19:
  Wikidata ID: Q84263196
  Description: contagious disease caused by SARS-CoV-2

Twitter:
  Wikidata ID: Q1390577
  Description: defunct American social media company
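
The lookup function mixes query construction with the network call, which makes the query text hard to inspect or test. One possible refactoring, sketched below under the assumption that the same three type filters are wanted, moves query construction into a pure function. The names `TYPE_FILTERS`, `escape_literal`, and `build_entity_query` are illustrative and do not appear in the project code.

```python
# Type filters from the function above, keyed by entity type.
TYPE_FILTERS = {
    "Person": "?item wdt:P31 wd:Q5 .",
    "Organization": "?item wdt:P31/wdt:P279* wd:Q43229 .",
    "Location": "?item wdt:P31/wdt:P279* wd:Q618123 .",
}


def escape_literal(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_entity_query(entity_name: str, entity_type=None, limit: int = 1) -> str:
    """Build the label-lookup query as a string, without contacting the endpoint."""
    type_filter = TYPE_FILTERS.get(entity_type, "")  # unknown or missing type: no filter
    return f"""
    SELECT ?item ?itemLabel ?itemDescription WHERE {{
        ?item rdfs:label "{escape_literal(entity_name)}"@en .
        {type_filter}
        SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """


print(build_entity_query('The "Big" Union', "Organization"))
```

The printed query can be checked by eye or pasted into the Wikidata query service before it is passed to `sparql.setQuery()`.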


---

## 3. Loading and Validating Our Ontology

Let's load and explore our project's persuasion ontology.

In [12]:
# Load the persuasion ontology
ontology = Graph()
ontology.parse("../persuasion_ontology.ttl", format="turtle")

print(f"Ontology loaded with {len(ontology)} triples")

Ontology loaded with 183 triples


In [13]:
# Query all classes defined in the ontology
classes_query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?cls ?comment
WHERE {
    ?cls a owl:Class .
    OPTIONAL { ?cls rdfs:comment ?comment }
}
ORDER BY ?cls
"""

print("Classes in Persuasion Ontology:")
print("=" * 60)
for row in ontology.query(classes_query):
    class_name = str(row.cls).split("#")[-1]
    comment = str(row.comment) if row.comment else "No description"
    print(f"- {class_name}: {comment[:80]}")

Classes in Persuasion Ontology:
- AppealToAuthority: Citing authority figures to lend credibility without proper evidence
- Claim: A factual assertion made in a post that can be verified
- Correction: A structured correction of misinformation in a post
- Entity: A real-world entity (person, organization, location, etc.) mentioned in a post
- Evidence: Source or proof supporting or refuting a claim
- Exaggeration: Overstating or understating facts to manipulate perception
- False: No description
- FearAppeal: Technique that uses fear to influence behavior or beliefs
- LLMAgent: An AI agent (e.g., GPT-4, MUSE) that performs annotation
- LoadedLanguage: Use of emotionally charged words to influence opinion
- Misleading: No description
- MostlyFalse: No description
- MostlyTrue: No description
- PersuasionTechnique: A strategy used to influence opinion through manipulation
- Post: A social media post that may contain claims and persuasion techniques
- Scapegoating: Blaming a person or grou

In [14]:
# Query persuasion technique subclasses
techniques_query = """
PREFIX : <http://example.org/persuasion#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?technique ?comment
WHERE {
    ?technique rdfs:subClassOf :PersuasionTechnique .
    OPTIONAL { ?technique rdfs:comment ?comment }
}
"""

print("\nPersuasion Techniques:")
print("=" * 60)
for row in ontology.query(techniques_query):
    technique_name = str(row.technique).split("#")[-1]
    comment = str(row.comment) if row.comment else "No description"
    print(f"- {technique_name}: {comment}")


Persuasion Techniques:
- FearAppeal: Technique that uses fear to influence behavior or beliefs
- LoadedLanguage: Use of emotionally charged words to influence opinion
- AppealToAuthority: Citing authority figures to lend credibility without proper evidence
- Scapegoating: Blaming a person or group for problems they didn't cause
- Exaggeration: Overstating or understating facts to manipulate perception
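
A list like this can serve as a simple validation step: technique names in an annotation that do not appear in the ontology are probably typos or undefined classes. The sketch below hard-codes the five names from the output above for illustration; in practice the set would be built from the query results. The technique `Strawman`, the second claim, and the helper `unknown_techniques` are hypothetical.

```python
# Technique names as listed in the query output above.
KNOWN_TECHNIQUES = {
    "FearAppeal",
    "LoadedLanguage",
    "AppealToAuthority",
    "Scapegoating",
    "Exaggeration",
}


def unknown_techniques(claims: list) -> dict:
    """Map each claim text to the technique names that the ontology does not define."""
    problems = {}
    for claim in claims:
        unknown = sorted(set(claim.get("techniques", [])) - KNOWN_TECHNIQUES)
        if unknown:
            problems[claim["text"]] = unknown
    return problems


claims = [
    {
        "text": "EU is forcing countries to accept unlimited migrants",
        "techniques": ["FearAppeal", "LoadedLanguage"],
    },
    {
        "text": "Example claim with an undefined technique",
        "techniques": ["Scapegoating", "Strawman"],
    },
]
print(unknown_techniques(claims))
```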


---

## 4. Complete Example: Creating an Annotated Post

Let's create a complete example following our ontology structure.

In [15]:
def create_annotated_post_graph(
    post_id: str,
    text: str,
    claims: list,
    platform: str = "Twitter"
) -> Graph:
    """
    Create an RDF graph for an annotated social media post.
    
    Args:
        post_id: Unique identifier for the post
        text: Full text content of the post
        claims: List of claim dictionaries with keys:
                - text: claim text
                - techniques: list of persuasion technique names
                - entities: list of (name, wikidata_id) tuples
                - verification_status: name of an ontology status class, e.g. "False" or "Misleading"
        platform: Social media platform name
        
    Returns:
        RDFLib Graph with the annotated post
    """
    g = Graph()
    
    # Namespaces
    PERSUASION = Namespace("http://example.org/persuasion#")
    WD = Namespace("http://www.wikidata.org/entity/")
    PROV = Namespace("http://www.w3.org/ns/prov#")
    
    g.bind("persuasion", PERSUASION)
    g.bind("wd", WD)
    g.bind("prov", PROV)
    
    # Create post node
    post_uri = URIRef(f"http://example.org/post#{post_id}")
    g.add((post_uri, RDF.type, PERSUASION.Post))
    g.add((post_uri, PERSUASION.postId, Literal(post_id)))
    g.add((post_uri, PERSUASION.textContent, Literal(text)))
    g.add((post_uri, PERSUASION.platform, Literal(platform)))
    
    # Add claims
    for i, claim_data in enumerate(claims, 1):
        claim_uri = URIRef(f"http://example.org/claim#{post_id}_{i}")
        g.add((claim_uri, RDF.type, PERSUASION.Claim))
        g.add((claim_uri, PERSUASION.claimText, Literal(claim_data["text"])))
        g.add((post_uri, PERSUASION.containsClaim, claim_uri))
        
        # Add techniques
        for technique in claim_data.get("techniques", []):
            g.add((claim_uri, PERSUASION.usesTechnique, PERSUASION[technique]))
        
        # Add entities
        for entity_name, wikidata_id in claim_data.get("entities", []):
            entity_uri = URIRef(f"http://example.org/entity#{entity_name.replace(' ', '_')}")
            g.add((entity_uri, RDF.type, PERSUASION.Entity))
            g.add((entity_uri, PERSUASION.entityName, Literal(entity_name)))
            if wikidata_id:
                g.add((entity_uri, PERSUASION.linkedToWikidata, WD[wikidata_id]))
            g.add((claim_uri, PERSUASION.targetsEntity, entity_uri))
        
        # Add verification status
        if "verification_status" in claim_data:
            status_uri = PERSUASION[claim_data["verification_status"]]
            g.add((claim_uri, PERSUASION.hasVerificationStatus, status_uri))
    
    return g

# Create a sample annotated post
sample_post = create_annotated_post_graph(
    post_id="demo_001",
    text="BREAKING: Vaccines cause autism! Big Pharma doesn't want you to know this!",
    claims=[
        {
            "text": "Vaccines cause autism",
            "techniques": ["FearAppeal", "AppealToAuthority"],
            "entities": [("Vaccines", "Q134808"), ("Autism", "Q38404")],
            "verification_status": "False"
        },
        {
            "text": "Big Pharma is hiding information",
            "techniques": ["Scapegoating", "LoadedLanguage"],
            "entities": [("Pharmaceutical industry", "Q507443")],
            "verification_status": "Misleading"
        }
    ]
)

print("Generated RDF Graph:")
print("=" * 60)
print(sample_post.serialize(format="turtle"))

Generated RDF Graph:
@prefix persuasion: <http://example.org/persuasion#> .
@prefix wd: <http://www.wikidata.org/entity/> .

<http://example.org/post#demo_001> a persuasion:Post ;
    persuasion:containsClaim <http://example.org/claim#demo_001_1>,
        <http://example.org/claim#demo_001_2> ;
    persuasion:platform "Twitter" ;
    persuasion:postId "demo_001" ;
    persuasion:textContent "BREAKING: Vaccines cause autism! Big Pharma doesn't want you to know this!" .

<http://example.org/claim#demo_001_1> a persuasion:Claim ;
    persuasion:claimText "Vaccines cause autism" ;
    persuasion:hasVerificationStatus persuasion:False ;
    persuasion:targetsEntity <http://example.org/entity#Autism>,
        <http://example.org/entity#Vaccines> ;
    persuasion:usesTechnique persuasion:AppealToAuthority,
        persuasion:FearAppeal .

<http://example.org/claim#demo_001_2> a persuasion:Claim ;
    persuasion:claimText "Big Pharma is hiding information" ;
    persuasion:hasVerificationStatu
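
The function builds entity IRIs by replacing spaces with underscores, which is enough for names like "European Union" but leaves characters such as `&`, `<`, or non-ASCII letters untouched, and these may not form valid IRIs. A more defensive variant could percent-encode the local part, as in the sketch below. The helper `entity_iri` and the sample names other than "European Union" are illustrative assumptions.

```python
from urllib.parse import quote


def entity_iri(entity_name: str, base: str = "http://example.org/entity#") -> str:
    """Build an entity IRI whose local part is percent-encoded where needed."""
    # Collapse runs of whitespace into single underscores.
    slug = "_".join(entity_name.split())
    # Letters, digits, and "_-.~" pass through; everything else is percent-encoded.
    return base + quote(slug, safe="_-.~")


print(entity_iri("European Union"))  # http://example.org/entity#European_Union
print(entity_iri("AT&T"))            # http://example.org/entity#AT%26T
print(entity_iri("Müller <GmbH>"))   # http://example.org/entity#M%C3%BCller_%3CGmbH%3E
```

Names that need no encoding produce the same IRIs as the original `replace(' ', '_')` approach, so existing entity nodes would keep their identifiers.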

---

## Summary

This notebook demonstrated:

1. **RDFLib basics**: Creating graphs, adding triples, serializing to Turtle/JSON-LD, and querying with SPARQL
2. **SPARQLWrapper**: Querying external endpoints like Wikidata
3. **Entity Linking**: Finding Wikidata IDs for named entities
4. **Ontology Usage**: Loading and querying our persuasion ontology

These tools form the foundation of our semantic web pipeline for persuasion detection and knowledge graph construction.