# Introduction to RDFLib: Working with Knowledge Graphs in Python
This notebook will guide you step-by-step through RDFLib, a Python library for working with RDF data. We will cover RDF basics, knowledge graphs, and practical examples using RDFLib.

## Agenda
- What is RDF?
- What is RDFLib?
- Installing RDFLib
- RDF Graph Basics (Creating, Parsing, Adding Triples)
- Serializing RDF Data
- Querying with SPARQL
- Example: Building a Knowledge Graph
- Real-world Use Case

## What is RDF?
### RDF Basics
RDF stands for Resource Description Framework. It is a standard model for data interchange on the web.

- **Triples**: RDF data is represented as triples, consisting of:
  - **Subject**: The resource we are describing (like a person, object, or concept).
  - **Predicate**: The relationship or property of the subject (e.g., 'name' or 'knows').
  - **Object**: The value or another resource the subject is related to.

### Example of an RDF Triple:
`<http://example.org/person/Alice> <http://xmlns.com/foaf/0.1/name> "Alice" .`

This triple states that the resource `<http://example.org/person/Alice>` has the property `foaf:name` with the literal value `Alice`. A set of triples forms a graph in which the nodes are resources or literal values and the edges are the predicates that relate them.

## What is RDFLib?
RDFLib is a Python library that simplifies working with RDF data. It allows you to:
- Create and manipulate RDF graphs.
- Serialize RDF data to different formats (like Turtle, RDF/XML, JSON-LD).
- Query RDF graphs using the SPARQL query language.

### Why use RDFLib?
RDFLib is widely used for working with knowledge graphs, which are central to Linked Data, the Semantic Web, and ontology engineering. It is also used to integrate data from different sources and to query the relationships between them.

In [None]:
# Install RDFLib with pip (the leading ! runs the command from a notebook cell; omit it in a terminal)
!pip install rdflib

## RDF Graph Basics
In RDFLib, an RDF Graph is used to store triples. Each triple consists of a subject, predicate, and object, which together represent data in a structured format.

### Creating an RDF Graph
Let's start by creating an empty RDF graph using RDFLib.

In [None]:
from rdflib import Graph
# Create an empty graph
g = Graph()
print(f'Created an empty graph with {len(g)} triples.')

The graph is currently empty. Next, let's learn how to parse RDF data into this graph.

## Parsing RDF Data
We can load RDF data from an external file or URL into the graph. RDF data can be represented in various formats, such as RDF/XML, Turtle, and JSON-LD.

### Parsing Example:
Here we parse an RDF file into the graph. RDFLib tries to guess the format from the file extension; pass the `format` argument (for example `format='turtle'`) to set it explicitly.

In [None]:
# Parsing an RDF file (assuming we have an RDF file available)
# You can replace 'example.rdf' with a path to your own RDF file.
g.parse('example.rdf')
print(f'Graph has {len(g)} triples after parsing.')

## Serializing RDF Data
Once we have RDF data in a graph, we can serialize (export) it into various formats such as Turtle, RDF/XML, and JSON-LD. Serialization is useful for sharing the data or saving it for later use.

In [None]:
# Serializing RDF data to XML format
g.serialize(destination='output.rdf', format='xml')
print('Serialized the RDF graph to XML format and saved it as output.rdf')

## Adding Triples to the Graph
We can programmatically add triples to the graph using RDFLib. Triples are added by defining a subject, predicate, and object, which represent the data.

In [None]:
from rdflib import URIRef, Literal, Namespace
# Define a namespace for our RDF data
EX = Namespace('http://example.org/')

# Add a few triples to the graph
# Attribute access on a Namespace already returns a URIRef, so no extra wrapping is needed
g.add((EX.Alice, EX.name, Literal('Alice')))
g.add((EX.Bob, EX.knows, EX.Alice))

print(f'Graph now contains {len(g)} triples after adding.')

## Querying RDF with SPARQL
SPARQL is a query language for RDF. It allows us to query the graph for specific triples based on patterns.

### Example SPARQL Query:
The query below matches every triple in the graph and returns its subject, predicate, and object.

In [None]:
qres = g.query(
    '''
    SELECT ?subject ?predicate ?object
    WHERE {
      ?subject ?predicate ?object.
    }
    '''
)
# Print the results of the query
for row in qres:
    print(f'{row.subject} {row.predicate} {row.object}')

## Example: Building a Knowledge Graph
Now let's expand our graph by adding more data. We will create a small knowledge graph of people and relationships, and query the graph to retrieve specific information.

In [None]:
g.add((EX.Carol, EX.knows, EX.Bob))
# Query to find who knows Alice
qres = g.query(
    '''
    SELECT ?s WHERE { ?s <http://example.org/knows> <http://example.org/Alice> }
    '''
)
# Print out the results
for row in qres:
    print(f'{row.s} knows Alice.')

# Real-world Use Case: Querying a Local TTL File (RecipeKG) with RDFLib


In this section, we will use the RDFLib library to load a local TTL file (`recipekg_100.ttl`) and run SPARQL queries on it.
We’ll go through three example queries, starting with a basic query and gradually increasing the complexity.
Each example will be explained step-by-step.
    

## Step 1: Loading the TTL File

In [2]:
from rdflib import Graph
# Load the recipekg_100.ttl file into an RDFLib Graph
g = Graph()
g.parse("recipekg_100.ttl", format="ttl")
print(f"Graph has {len(g)} triples after loading 'recipekg_100.ttl'.")

Graph has 9753 triples after loading 'recipekg_100.ttl'.


## Example 1: Basic Query - Retrieving Triples

In [23]:
# Define a simple query that retrieves up to five distinct triples (subject, predicate, object) from the graph
query1 = '''
SELECT DISTINCT ?subject ?predicate ?object
WHERE {
    ?subject ?predicate ?object .
}
LIMIT 5
'''

# Run the query
results1 = g.query(query1)

# Print each subject, predicate, and object from the query results
for row in results1:
    print(f"Subject: {row.subject}, Predicate: {row.predicate}, Object: {row.object}")

Subject: http://purl.org/recipekg/recipe/skillet-pepper-and-garlic-pork-chops, Predicate: https://schema.org/name, Object: Skillet Pepper and Garlic Pork Chops
Subject: http://purl.org/recipekg/recipe/broccoli-casserole, Predicate: https://schema.org/datePublished, Object: 2000-04-24T03:18:48.000Z
Subject: nce80cf32b190476e8a182152a37af8e0b147, Predicate: http://purl.org/recipekg/hasFSAColor, Object: http://purl.org/recipekg/FSAGreen
Subject: nce80cf32b190476e8a182152a37af8e0b1662, Predicate: http://purl.org/recipekg/hasQuantity, Object: 1/3
Subject: nce80cf32b190476e8a182152a37af8e0b21, Predicate: http://purl.org/recipekg/hasUnit, Object: teaspoon


## Example 2: Intermediate Query - Retrieving Recipes

In [24]:
# Define a query to retrieve up to five resources whose type is schema:Recipe ("a" is shorthand for rdf:type)
query2 = '''
PREFIX schema: <https://schema.org/>
PREFIX recipeKG: <http://purl.org/recipekg/>
  SELECT ?recipe
  WHERE { ?recipe a schema:Recipe . }
  LIMIT 5
'''

# Run the query
results2 = g.query(query2)

# Print each recipe from the query results
for row in results2:
    print(f"Recipe: {row.recipe}")

Recipe: http://purl.org/recipekg/recipe/peanut-butter-tandy-bars
Recipe: http://purl.org/recipekg/recipe/the-best-oatmeal-cookies
Recipe: http://purl.org/recipekg/recipe/peach-cobbler-ii
Recipe: http://purl.org/recipekg/recipe/pie-crust-v
Recipe: http://purl.org/recipekg/recipe/dads-beef-and-chive-dip


## Example 3: Advanced Query - Recipes and their information

In [25]:
# Define a query to retrieve recipes with specific details (e.g., calorie, category, USDAScore)
query3 = """
PREFIX schema: <https://schema.org/>
PREFIX recipeKG: <http://purl.org/recipekg/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?recipe ?calorie ?category ?USDAScore
WHERE {
    ?recipe a schema:Recipe .

    ?recipe recipeKG:hasNutritionalInformation ?a .
    ?a recipeKG:hasCalorificData ?b .
    ?b recipeKG:hasAmount ?calorie .

    ?recipe recipeKG:belongsTo ?subcategory .
    ?subcategory rdfs:subClassOf* ?category .
    ?category a recipeKG:RecipeCategory .

    ?recipe recipeKG:hasUSDAScore ?USDAScore .
}
LIMIT 5
"""

# Run the query
results3 = g.query(query3)

# Print each recipe with its details from the query results
for row in results3:
    print(f"Recipe: {row.recipe}, Calorie: {row.calorie}, Category: {row.category}, USDA Score: {row.USDAScore}")

Recipe: http://purl.org/recipekg/recipe/peanut-butter-tandy-bars, Calorie: 230.0, Category: http://purl.org/recipekg/categories/desserts/, USDA Score: 3
Recipe: http://purl.org/recipekg/recipe/the-best-oatmeal-cookies, Calorie: 172.8, Category: http://purl.org/recipekg/categories/desserts/, USDA Score: 4
Recipe: http://purl.org/recipekg/recipe/peach-cobbler-ii, Calorie: 672.4, Category: http://purl.org/recipekg/categories/desserts/, USDA Score: 1
Recipe: http://purl.org/recipekg/recipe/pie-crust-v, Calorie: 210.4, Category: http://purl.org/recipekg/categories/desserts/, USDA Score: 3
Recipe: http://purl.org/recipekg/recipe/dads-beef-and-chive-dip, Calorie: 77.6, Category: http://purl.org/recipekg/categories/appetizers-and-snacks/, USDA Score: 4


## Conclusion
In this detailed tutorial, we covered:
- The basics of RDF and its structure (triples).
- How to use RDFLib to create, parse, serialize, and query RDF data.
- Practical examples of building and querying RDF graphs.

RDFLib provides a powerful toolkit for working with RDF data in Python. Explore more through its official documentation and experiment with real-world data to master RDF and knowledge graphs.


A natural follow-up query finds recipes that include a specific ingredient, such as "sugar".
It assumes that each ingredient has an associated label property (e.g., `rdfs:label`)
and filters for ingredients labeled "sugar". The same query can be reused for other ingredients by changing the filter value.
    