# Working with RDF and SPARQL for Knowledge Graph Queries

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand RDF (Resource Description Framework) structure and syntax
- Learn how to create and represent knowledge graphs using RDF
- Master the SPARQL query language for querying knowledge graphs
- Apply RDF/SPARQL to real-world knowledge representation problems

## 🔗 Prerequisites

- ✅ Python 3.8+ installed
- ✅ Basic understanding of knowledge representation
- ✅ Understanding of graph structures

---

## Official Structure Reference

This notebook covers the practical activity from **Course 01, Unit 2**:
- **Activity:** Working with RDF and SPARQL for knowledge graph queries
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 2 Practical Content

---

## Introduction to RDF and Knowledge Graphs

RDF (Resource Description Framework) is a W3C standard model for data interchange on the Web. It represents information as a directed, labeled graph in which:
- **Resources** are nodes (entities)
- **Properties** are edges (relationships)
- **Statements** are triples (Subject-Predicate-Object)



In [None]:
# Install required libraries if not already installed
# !pip install rdflib sparqlwrapper

import rdflib
from rdflib import Graph, Namespace, URIRef, Literal, BNode
from rdflib.namespace import RDF, RDFS, FOAF, XSD
from SPARQLWrapper import SPARQLWrapper, JSON
import json

print("✅ Libraries imported successfully!")
print("RDFLib version:", rdflib.__version__)


## Part 1: Creating an RDF Knowledge Graph

Let's create a simple knowledge graph about university courses and instructors.


In [None]:
# Create a new RDF graph
g = Graph()

# Define custom namespaces
UNI = Namespace("http://example.org/university/")
PERSON = Namespace("http://example.org/person/")

# Bind namespaces for cleaner output
g.bind("uni", UNI)
g.bind("person", PERSON)
g.bind("foaf", FOAF)

print("✅ RDF Graph created with namespaces defined")


In [None]:
# Add triples to the graph
# Format: (Subject, Predicate, Object)

# Course: AIAT 111
g.add((UNI.AIAT111, RDF.type, UNI.Course))
g.add((UNI.AIAT111, RDFS.label, Literal("Introduction to AI & Applications")))
g.add((UNI.AIAT111, UNI.courseCode, Literal("AIAT 111", datatype=XSD.string)))
g.add((UNI.AIAT111, UNI.creditHours, Literal(3, datatype=XSD.integer)))

# Course: AIAT 112
g.add((UNI.AIAT112, RDF.type, UNI.Course))
g.add((UNI.AIAT112, RDFS.label, Literal("Python for AI")))
g.add((UNI.AIAT112, UNI.courseCode, Literal("AIAT 112", datatype=XSD.string)))
g.add((UNI.AIAT112, UNI.creditHours, Literal(3, datatype=XSD.integer)))

# Instructor: Dr. Smith
g.add((PERSON.DrSmith, RDF.type, FOAF.Person))
g.add((PERSON.DrSmith, FOAF.name, Literal("Dr. John Smith")))
g.add((PERSON.DrSmith, UNI.teaches, UNI.AIAT111))

# Instructor: Dr. Ali
g.add((PERSON.DrAli, RDF.type, FOAF.Person))
g.add((PERSON.DrAli, FOAF.name, Literal("Dr. Ahmed Ali")))
g.add((PERSON.DrAli, UNI.teaches, UNI.AIAT112))

# Relationships: Course prerequisites
g.add((UNI.AIAT112, UNI.hasPrerequisite, UNI.AIAT111))

print(f"✅ Added {len(g)} triples to the knowledge graph")
print(f"Graph contains {len(set(g.subjects()))} unique subjects")


## Part 2: Visualizing and Inspecting the RDF Graph

Let's examine what we've created:


In [None]:
# Display all triples in the graph
print("=" * 60)
print("All Triples in the Knowledge Graph:")
print("=" * 60)
for s, p, o in g:
    print(f"Subject: {s}")
    print(f" Predicate: {p}")
    print(f" Object: {o}")
    print()


In [None]:
# Serialize the graph in different formats
print("=" * 60)
print("RDF/XML Format:")
print("=" * 60)
xml_output = g.serialize(format='xml')
if isinstance(xml_output, bytes):
    xml_output = xml_output.decode('utf-8')
print(xml_output)

In [None]:
# Serialize in Turtle format (more readable)
print("=" * 60)
print("Turtle Format (TTL):")
print("=" * 60)
turtle_output = g.serialize(format='turtle')
if isinstance(turtle_output, bytes):
    turtle_output = turtle_output.decode('utf-8')
print(turtle_output)


## Part 3: Querying with SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is the standard query language for RDF graphs.


In [None]:
# SPARQL Query 1: Find all courses
query1 = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uni: <http://example.org/university/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?course ?name
WHERE {
    ?course rdf:type uni:Course .
    ?course rdfs:label ?name .
}
"""

print("Query 1: Find all courses")
print("=" * 60)
results = g.query(query1)
for row in results:
    print(f"Course: {row.course}")
    print(f" Name: {row.name}")
    print()


In [None]:
# SPARQL Query 2: Find all instructors and the courses they teach
query2 = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uni: <http://example.org/university/>
PREFIX person: <http://example.org/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?instructor ?name ?course ?courseName
WHERE {
    ?instructor rdf:type foaf:Person .
    ?instructor foaf:name ?name .
    ?instructor uni:teaches ?course .
    ?course rdfs:label ?courseName .
}
"""

print("Query 2: Find instructors and their courses")
print("=" * 60)
results = g.query(query2)
for row in results:
    print(f"Instructor: {row.name}")
    print(f" Teaches: {row.courseName}")
    print()


In [None]:
# SPARQL Query 3: Find courses with prerequisites
query3 = """
PREFIX uni: <http://example.org/university/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?course ?courseName ?prerequisite ?prereqName
WHERE {
    ?course uni:hasPrerequisite ?prerequisite .
    ?course rdfs:label ?courseName .
    ?prerequisite rdfs:label ?prereqName .
}
"""

print("Query 3: Find course prerequisites")
print("=" * 60)
results = g.query(query3)
for row in results:
    print(f"Course: {row.courseName}")
    print(f" Requires: {row.prereqName}")
    print()


## Part 4: Real-World Example - Knowledge Graph for AI Research

Let's create a more complex knowledge graph about AI research areas and publications.


In [None]:
# Create a new graph for AI research knowledge
research_g = Graph()

# Define namespaces
AI = Namespace("http://example.org/ai/")
RESEARCH = Namespace("http://example.org/research/")

research_g.bind("ai", AI)
research_g.bind("research", RESEARCH)

# Add research areas
research_g.add((AI.MachineLearning, RDF.type, AI.ResearchArea))
research_g.add((AI.MachineLearning, RDFS.label, Literal("Machine Learning")))
research_g.add((AI.DeepLearning, RDF.type, AI.ResearchArea))
research_g.add((AI.DeepLearning, RDFS.label, Literal("Deep Learning")))
research_g.add((AI.NLP, RDF.type, AI.ResearchArea))
research_g.add((AI.NLP, RDFS.label, Literal("Natural Language Processing")))

# Add relationships (Deep Learning is a sub-area of Machine Learning)
research_g.add((AI.DeepLearning, RDFS.subClassOf, AI.MachineLearning))
research_g.add((AI.NLP, AI.usesTechnique, AI.DeepLearning))

# Add publications
# Note: RESEARCH["title"] uses index access because Namespace subclasses str,
# so RESEARCH.title would resolve to the built-in str.title method.
research_g.add((RESEARCH.Paper1, RDF.type, RESEARCH.Publication))
research_g.add((RESEARCH.Paper1, RESEARCH["title"], Literal("Transformer Models in NLP")))
research_g.add((RESEARCH.Paper1, RESEARCH.inArea, AI.NLP))
research_g.add((RESEARCH.Paper2, RDF.type, RESEARCH.Publication))
research_g.add((RESEARCH.Paper2, RESEARCH["title"], Literal("Introduction to Neural Networks")))
research_g.add((RESEARCH.Paper2, RESEARCH.inArea, AI.DeepLearning))

print(f"✅ Created AI research knowledge graph with {len(research_g)} triples")


In [None]:
# Complex SPARQL Query: Find all research areas and their publications
complex_query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ai: <http://example.org/ai/>
PREFIX research: <http://example.org/research/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?area ?areaName ?publication ?pubTitle
WHERE {
    ?area rdf:type ai:ResearchArea .
    ?area rdfs:label ?areaName .
    ?publication research:inArea ?area .
    ?publication research:title ?pubTitle .
}
ORDER BY ?areaName
"""

print("Complex Query: Research areas and their publications")
print("=" * 60)
results = research_g.query(complex_query)
for row in results:
    print(f"Research Area: {row.areaName}")
    print(f" Publication: {row.pubTitle}")
    print()


## Part 5: Saving and Loading RDF Graphs

RDF graphs can be saved to files and loaded later for reuse.


In [None]:
# Save the graph to a file
output_file = "university_knowledge_graph.ttl"
g.serialize(destination=output_file, format='turtle')
print(f"✅ Saved knowledge graph to {output_file}")

# Load the graph back
loaded_g = Graph()
loaded_g.parse(output_file, format='turtle')
print(f"✅ Loaded knowledge graph from {output_file}")
print(f" Loaded {len(loaded_g)} triples")


## Summary

### Key Concepts Learned:

1. **RDF (Resource Description Framework)**
   - Represents knowledge as triples (Subject-Predicate-Object)
   - Uses namespaces to avoid naming conflicts
   - Can be serialized in multiple formats (XML, Turtle, JSON-LD)

2. **SPARQL Query Language**
   - Standard query language for RDF graphs
   - SQL-like syntax, but it matches graph patterns rather than table rows
   - Supports SELECT, ASK, CONSTRUCT, and DESCRIBE queries

3. **Knowledge Graphs**
   - Graph-based representation of knowledge
   - Useful for representing complex relationships
   - Enables powerful querying and reasoning

### Real-World Applications:

- **Semantic Web**: RDF is the foundation of the Semantic Web
- **Knowledge Bases**: DBpedia (structured data extracted from Wikipedia) and Wikidata publish their data as RDF
- **Enterprise Knowledge Management**: Structuring an organization's internal knowledge
- **AI Systems**: Representing domain knowledge for expert systems

### Next Steps:

1. Practice writing SPARQL queries for different scenarios
2. Explore public RDF datasets (DBpedia, Wikidata)
3. Learn about RDF Schema (RDFS) and OWL ontologies
4. Integrate RDF/SPARQL into AI applications

---

**Reference:** This notebook covers the practical requirement from Course 01, Unit 2: "Working with RDF and SPARQL for knowledge graph queries"
