# Knowledge Representation on the Web -- RDF tutorial

In this tutorial we'll learn the basics of interacting with RDF graphs in Python. We'll use rdflib, a widely used Python library for RDF (the full documentation can be found [here](https://rdflib.readthedocs.io/en/stable/index.html)).

## Imports
These are the main classes and types we'll be using from rdflib.

In [1]:
import sys
# !{sys.executable} -m pip install rdflib

from rdflib import Graph, ConjunctiveGraph, Literal, BNode, Namespace, RDF, URIRef
from rdflib.namespace import DC, FOAF, RDFS

import pprint


## Loading data remotely and from files

rdflib can import RDF data from a variety of sources: locally from a file (many serializations are supported) or remotely via a URI. Loading from a URI is a practical way to check whether it returns RDF, as the 3rd Linked Data principle requires.

A Graph object is always required to load triples.
**Note**: to load quads, and hence support named graphs, you'll need to use an instance of ConjunctiveGraph instead.

**Exercise 1** 

For each step, use a different cell: 
1. create two graphs using rdflib:
    - load one with triples from the site https://csarven.ca/ and/or http://www.w3.org/People/Berners-Lee/card
    - load the other with triples from ./data/ingredients.rdf

In [2]:
#TIP: look at the documentation of the rdflib library for how to LOAD and PARSE a graph - https://rdflib.readthedocs.io/en/stable/gettingstarted.html

# Create a Graph, parse the RDF found at `loc` (file path or URL) and print it as Turtle
def print_graph(loc):
    g = Graph()
    g.parse(loc)
    print(g.serialize(format="turtle"))

# Parse an RDF file hosted on the Internet
print_graph("http://www.w3.org/People/Berners-Lee/card")

@prefix : <http://xmlns.com/foaf/0.1/> .
@prefix Be: <https://www.w3.org/People/Berners-Lee/> .
@prefix Pub: <https://timbl.com/timbl/Public/> .
@prefix blog: <http://dig.csail.mit.edu/breadcrumbs/blog/> .
@prefix card: <https://www.w3.org/People/Berners-Lee/card#> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix con: <http://www.w3.org/2000/10/swap/pim/contact#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix doap: <http://usefulinc.com/ns/doap#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix s: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix solid: <http://www.w3.org/ns/solid/terms#> .
@prefix space: <http://www.w3.org/ns/pim/space#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix w3c: <http://www.w3.org/data#> .
@

In [3]:
print_graph("https://csarven.ca/")

@prefix cs: <https://csarven.ca/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dok: <https://csarven.ca/dokie.li.inbox/> .
@prefix html: <http://www.w3.org/ns/iana/media-types/text/html#> .
@prefix inbox: <https://csarven.ca/inbox/> .
@prefix key: <https://csarven.ca/key/> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix medi: <https://csarven.ca/media/> .
@prefix n0: <https://csarven.ca/archives/> .
@prefix n1: <http://www.w3.org/ns/iana/media-types/application/atom+xml#> .
@prefix pl: <http://www.w3.org/ns/iana/media-types/text/plain#> .
@prefix pre: <https://csarven.ca/presentations/> .
@prefix scr: <https://csarven.ca/scripts/> .
@prefix space: <http://www.w3.org/ns/pim/space#> .
@prefix stat: <http://www.w3.org/ns/posix/stat#> .
@prefix tmp: <https://csarven.ca/tmp/> .
@prefix tur: <http://www.w3.org/ns/iana/media-types/text/turtle#> .
@prefix vnd: <http://www.w3.org/ns/iana/media-types/image/vnd.microsoft.icon#> .
@prefix xm: <http://www.w3.org/ns/iana/media-types/app

In [4]:
print_graph("./data/ingredients.rdf")

@prefix dct: <http://purl.org/dc/terms/> .
@prefix ind: <http://purl.org/heals/ingredient/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sm: <https://www.omg.org/techprocess/ab/SpecificationMetadata/> .
@prefix wtm: <http://purl.org/heals/food/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ind:AlmondMeal a obo:FOODON_03400662,
        obo:FOODON_03400685,
        wtm:Ingredient,
        owl:NamedIndividual ;
    rdfs:label "almond meal" ;
    dct:source "Wikipedia, \"Almond meal.\" [Online]. Available:https://en.wikipedia.org/wiki/Almond_meal. [Accessed: Nov. 10, 2018]" ;
    wtm:hasGluten false ;
    wtm:hasGlycemicIndex "25"^^xsd:nonNegativeInteger ;
    skos:definition "an ingredient made from ground up almonds" ;
    skos:scopeNote "used as an ingredient" .

ind:AppleCiderVinegar a obo:FOODON_03400760,


## Serialising and saving RDF graphs

There are different formats for storing RDF triples. Semantically they mean the same; they differ only in their syntax.


Use the method Graph.serialize(format=...) to choose the output syntax.

**Exercise 2**

1. serialise one of the graphs to the .ttl, .xml and .nt formats, and print the first n lines of each to compare the syntax
1. save your graph in the turtle format to the ./data/ folder

In [3]:
#serialize the chosen graph
def serialize_graph(loc, f):

    g = Graph()

    g.parse(loc)

    print(g.serialize(format=f))


serialize_graph("./data/ingredients.rdf", 'ttl')



@prefix dct: <http://purl.org/dc/terms/> .
@prefix ind: <http://purl.org/heals/ingredient/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sm: <https://www.omg.org/techprocess/ab/SpecificationMetadata/> .
@prefix wtm: <http://purl.org/heals/food/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ind:AlmondMeal a obo:FOODON_03400662,
        obo:FOODON_03400685,
        wtm:Ingredient,
        owl:NamedIndividual ;
    rdfs:label "almond meal" ;
    dct:source "Wikipedia, \"Almond meal.\" [Online]. Available:https://en.wikipedia.org/wiki/Almond_meal. [Accessed: Nov. 10, 2018]" ;
    wtm:hasGluten false ;
    wtm:hasGlycemicIndex "25"^^xsd:nonNegativeInteger ;
    skos:definition "an ingredient made from ground up almonds" ;
    skos:scopeNote "used as an ingredient" .

ind:AppleCiderVinegar a obo:FOODON_03400760,


In [6]:
serialize_graph("./data/ingredients.rdf", 'nt')

<http://purl.org/heals/ingredient/CanolaOil> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/FOODON_03400764> .
<http://purl.org/heals/ingredient/VanillaExtract> <http://purl.org/heals/food/hasGlycemicIndex> "5"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
<http://purl.org/heals/ingredient/Smoky> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/heals/food/Flavor> .
<http://purl.org/heals/ingredient/Paprika> <http://www.w3.org/2004/02/skos/core#scopeNote> "used as an ingredient" .
<http://purl.org/heals/ingredient/Tart> <http://www.w3.org/2000/01/rdf-schema#label> "tart" .
<http://purl.org/heals/ingredient/Tomato> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/heals/food/Ingredient> .
<http://purl.org/heals/ingredient/AlmondMeal> <http://purl.org/dc/terms/source> "Wikipedia, \"Almond meal.\" [Online]. Available:https://en.wikipedia.org/wiki/Almond_meal. [Accessed: Nov. 10, 2018]" .
<http://purl.org/heals

In [4]:
#save the graph in ttl format
def save_graph(g, file_name, f):
    # mode "x" creates the file and raises FileExistsError if it already exists;
    # the with-block closes the file even if writing fails
    with open("./data/" + file_name + "." + f, "x") as file:
        file.write(g.serialize(format=f))

g = Graph()
g.parse("./data/ingredients.rdf")

# save_graph(g, "demo_save", "ttl")


<Graph identifier=Nb44908df607444af987552cc9bbab743 (<class 'rdflib.graph.Graph'>)>

##  Merging graphs

Merging graphs can be done by parsing several sources into the same Graph, or with the overloaded operator +.

**Note:** Set-theoretic graph semantics apply

The Food knowledge graph FoodKG contains a graph of statements about ingredients, as well as a graph with statements about recipes. 

**Exercise 3**: 

1. load ./data/ingredients.rdf and ./data/ghostbusters.ttl into a single graph, either by sequential parsing or using the operator +.

2. count the number of statements in each graph, and the intersection of the two graphs. 

3. check whether the combined graph is connected (using graph.connected()) 

4. load ./data/ingredients.rdf and ./data/recipes.rdf into a single graph, either by sequential parsing or using the operator +. 

5. count the number of statements in each graph, and the intersection of the two graphs. 

6. check whether the combined graph is connected (using graph.connected()). Explain the result with respect to point 3! 
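
For some intuition on steps 3 and 6: rdflib's documentation describes `Graph.connected()` as treating the graph as undirected. The plain-Python sketch below mimics that idea on tuples, with subjects and objects as nodes. It is an illustration with made-up identifiers, not rdflib's actual implementation.

```python
from collections import defaultdict, deque

def is_connected(triples):
    """Return True if every subject/object can be reached from every other one,
    treating each triple as an undirected edge between subject and object."""
    neighbours = defaultdict(set)
    for s, _p, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)
    if not neighbours:
        return False  # this sketch reports an empty graph as not connected
    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(neighbours)

# Two hypothetical datasets that share no node ...
films = {("ex:Ghostbusters", "ex:director", "ex:IvanReitman")}
food = {("ex:Cake", "ex:hasIngredient", "ex:Sugar")}
print(is_connected(films | food))

# ... become connected once at least one triple links them.
link = {("ex:IvanReitman", "ex:likes", "ex:Cake")}
print(is_connected(films | food | link))
```

Merging two graphs only yields a connected graph if they share nodes and each part is itself connected, which is worth keeping in mind when comparing the two results.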

In [5]:

#look at rdflib documentation - Navigating Graphs
g = Graph()

g1 = g.parse("./data/ghostbusters.ttl")

g = Graph()

g2 = g.parse("./data/ingredients.rdf")

# print(g2.serialize(format='ttl'))

print(len(g1))

print(len(g2))

g3 = g1 + g2

print(len(g3))

print(len(g1 & g2))


g3.connected()

52337
837
53174
0


False

In [10]:
g = Graph()

g1 = g.parse("./data/recipes.rdf")

g = Graph()

g2 = g.parse("./data/ingredients.rdf")

# print(g2.serialize(format='ttl'))

print(len(g1))

print(len(g2))

g3 = g1 + g2

print(len(g3))

print(len(g1 & g2))


g1.connected()

480
837
1299
18


False

In [22]:
g = Graph()

g1 = g.parse("./data/recipes.rdf")

# g = Graph()

g2 = g.parse("./data/ingredients.rdf")

# print(g2.serialize(format='ttl'))

# print(len(g1))

# print(len(g2))

# g3 = g1 + g2

# print(len(g3))

# print(len(g1 & g2))


g2.connected()

False

## Namespaces 

Remind yourself what namespaces are. 

In RDFLib, the namespace module defines many common namespaces such as RDF, RDFS, OWL, FOAF, SKOS, etc., but you can also easily add URIs within a different namespace:


In [2]:
TEACH = Namespace("http://linkedscience.org/teach/ns#")
TEACH.Teacher

rdflib.term.URIRef('http://linkedscience.org/teach/ns#Teacher')

Check out the specification (http://linkedscience.org/teach/ns/#sec-specification) to see which other terms are defined in the TEACH namespace.
You can use a NamespaceManager to bind a prefix to a namespace:

In [3]:
g = Graph()
g.namespace_manager.bind('TEACH', URIRef('http://linkedscience.org/teach/ns#'))
TEACH.Teacher.n3(g.namespace_manager)

'TEACH:Teacher'

In [5]:
KRW = Namespace("http://krw.vu.nl/data#")

#creating individuals within your namespace
print(KRW.Teacher)
print(KRW.Student)

http://krw.vu.nl/data#Teacher
http://krw.vu.nl/data#Student


In [8]:
g = Graph()
g.namespace_manager.bind('KRW', URIRef('http://krw.vu.nl/data#'))

print(g.serialize(format='ttl'))





**Exercise 4:**
1. create your own namespace (can be made up) 


## Creating RDF triples

Triples are added to the graph with the method Graph.add().

Its single parameter is the triple, given as a Python **tuple** (subject, predicate, object).

Notice the namespace convenience syntax!

**Exercise 5:** 

1. create a new graph and add triples (~10) within your made-up namespace using Graph.add(). These triples can be about anything, for instance ingredients or recipes. Make sure they include the predicates RDF.type, RDFS.label and RDFS.subClassOf

2. open yourRDF.ttl, and write your triples out by hand in a syntax of your choice (turtle is recommended, notice the file extension!). Load the triples here with rdflib. 

In [47]:
#create graph

g = Graph()

#example namespace

TEACH = Namespace("http://krw.vu.nl/data#")

# bind prefixes so that the Turtle output uses teach:... and foaf:...
g.bind("teach", TEACH)
g.bind("foaf", FOAF)

# Add triples using store's add method.

bob = URIRef("http://example.org/people/Bob")
name_b = Literal("Bob")
age_b = Literal(24)


linda = URIRef("http://example.org/people/Linda")
name_l = Literal("Linda")
age_l = Literal(40)

# s_group = BNode()



KRW_course = BNode()
g.add((KRW_course, RDF.type, TEACH.Course))
g.add((KRW_course, TEACH.hasTitle, Literal("Knowledge Representation on the Web")))
g.add((KRW_course, TEACH.hasDescription, Literal("Weekly fun stuff")))

g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, name_b))
g.add((bob, FOAF.age, age_b))
g.add((bob, FOAF.knows, linda))
g.add((linda, RDF.type, FOAF.Person))
g.add((linda, FOAF.name, name_l))
g.add((linda, FOAF.age, age_l))

g.add((linda, RDF.type, TEACH.Teacher))
g.add((bob, RDF.type, TEACH.Student))


g.add((linda, TEACH.teacherOf, KRW_course))

print(g.serialize(format='ttl'))


@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix teach: <http://krw.vu.nl/data#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/people/Bob> a teach:Student,
        foaf:Person ;
    foaf:age 24 ;
    foaf:knows <http://example.org/people/Linda> ;
    foaf:name "Bob" .

<http://example.org/people/Linda> a teach:Teacher,
        foaf:Person ;
    teach:teacherOf [ a teach:Course ;
            teach:hasDescription "Weekly fun stuff" ;
            teach:hasTitle "Knowledge Representation on the Web" ] ;
    foaf:age 40 ;
    foaf:name "Linda" .




In [48]:
#save the graph to destination in ttl format - myRDF.ttl (look at RDFLib documentation - Loading and saving RDF)
# save_graph(g, "myRDF", "ttl")
g.serialize(destination="./data/myRDF.ttl")

<Graph identifier=Nb3e512bf4c9b46609ba4062f90de7629 (<class 'rdflib.graph.Graph'>)>

In [6]:
g_new = Graph()

g_new.parse("./data/myRDF.ttl")

print(g_new.serialize(format='ttl'))
#load the saved graph and print it in ttl format

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix teach: <http://krw.vu.nl/data#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/people/Bob> a teach:Student,
        foaf:Person ;
    foaf:age 24 ;
    foaf:knows <http://example.org/people/Linda> ;
    foaf:name "Bob" .

<http://example.org/people/Linda> a teach:Teacher,
        foaf:Person ;
    teach:teacherOf [ a teach:Course ;
            teach:hasDescription "Weekly fun stuff" ;
            teach:hasTitle "Knowledge Representation on the Web" ] ;
    foaf:age 40 ;
    foaf:name "Linda" .




## Navigating graphs

rdflib uses iterators to navigate Graphs. The methods for navigating subjects, predicates and objects are Graph.subjects, Graph.predicates and Graph.objects.

**Exercise 6:**

1. print all the triples in yourRDF.ttl
2. print all subjects in yourRDF.ttl
3. print all predicates in yourRDF.ttl
4. print all objects in yourRDF.ttl


In [50]:
g = Graph()

g.parse("./data/myRDF.ttl")


#TIP you have to loop in the graph 
for (s, p, o) in g:
    print(f'triplet: ({s}, {p}, {o})')
    
s, p, o = [[e[i] for e in g] for i in range(3)]

print(s)
print(p)
print(o)

triplet: (http://example.org/people/Bob, http://xmlns.com/foaf/0.1/knows, http://example.org/people/Linda)
triplet: (n144468f03ad84f4c8015d420c50c2d1ab1, http://krw.vu.nl/data#hasDescription, Weekly fun stuff)
triplet: (http://example.org/people/Linda, http://xmlns.com/foaf/0.1/name, Linda)
triplet: (n144468f03ad84f4c8015d420c50c2d1ab1, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://krw.vu.nl/data#Course)
triplet: (http://example.org/people/Bob, http://xmlns.com/foaf/0.1/age, 24)
triplet: (http://example.org/people/Linda, http://krw.vu.nl/data#teacherOf, n144468f03ad84f4c8015d420c50c2d1ab1)
triplet: (http://example.org/people/Bob, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://xmlns.com/foaf/0.1/Person)
triplet: (http://example.org/people/Linda, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://krw.vu.nl/data#Teacher)
triplet: (http://example.org/people/Bob, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://krw.vu.nl/data#Student)
triplet: (n144468f03ad84f

We can also filter the subjects, predicates and objects we want to retrieve, and match their values across triples, much like a "join" operation in a database.


**Exercise 7:**

1. print all subject types in yourRDF.ttl
2. print all subject labels in yourRDF.ttl

In [51]:
g = Graph()

g.parse("./data/myRDF.ttl")

print('\nPrinting RDF types:')
for s, p, o in g.triples((None,  RDF.type, None)):
    print(o)

print('\nPrinting RDFS labels:')
for s, p, o in g.triples((None, RDFS.label, None)):
    print(o)


Printing RDF types:
http://krw.vu.nl/data#Student
http://xmlns.com/foaf/0.1/Person
http://xmlns.com/foaf/0.1/Person
http://krw.vu.nl/data#Teacher
http://krw.vu.nl/data#Course

Printing RDFS labels:


### Basic triple matching (almost querying!)

We use the method Graph.triples with a Python tuple that acts as a mask: each position holds either a concrete term that must match or None, which matches anything.

**Exercise 8:**

1. check whether a triple is in your graph -> print true or false
2. print all triples related to a certain subject in your graph
3. print all triples related to a certain object in your graph

In [60]:
def in_graph(s, p, o):
    # membership test on the graph itself; returns True or False
    return (s, p, o) in g

def get_triples_per_s(s):
    return g.triples((s, None, None))

def get_triples_per_o(o):
    return g.triples((None, None, o))

for e in get_triples_per_s(URIRef("http://example.org/people/Linda")):
    print(e)

(rdflib.term.URIRef('http://example.org/people/Linda'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://krw.vu.nl/data#Teacher'))
(rdflib.term.URIRef('http://example.org/people/Linda'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))
(rdflib.term.URIRef('http://example.org/people/Linda'), rdflib.term.URIRef('http://krw.vu.nl/data#teacherOf'), rdflib.term.BNode('n313fa83690e5483c9455925f6e9dfc63b1'))
(rdflib.term.URIRef('http://example.org/people/Linda'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'), rdflib.term.Literal('40', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.URIRef('http://example.org/people/Linda'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Linda'))


## Assignment part 1: your own web application

You are a chef in a restaurant, and you need to serve a guest who is gluten intolerant.

1. load the ./data/recipes.rdf and ./data/ingredients.rdf datasets in one graph
2. query your graph (as we did in previous exercises) to retrieve all recipes without gluten
3. query your graph for all recipes that you can make for your gluten intolerant guest. 
4. the guest asks you whether there are more options. Can you find the recipes for which an ingredient with gluten can be replaced, solely using pattern matching? (Hint: you need to write several of these pattern matching queries, and check the predicate __substitutesFor__)
5. another guest is allergic to pecan nuts. Which recipes could you serve them (including those for which pecan nuts can be replaced)?

**Note that this is a bit tedious: later on, we will be querying more complicated patterns with SPARQL!**

In [8]:
g = Graph()

g.parse("./data/recipes.rdf")
g.parse("./data/ingredients.rdf")

WTM = Namespace('http://purl.org/heals/food/')
IND = Namespace('http://purl.org/heals/ingredient/')

for ns_prefix, namespace in g.namespaces():
    print(f'{ns_prefix}: {namespace}')
print(len(g))

# list every distinct predicate used in the merged graph
for p in g.predicates(unique=True):
    print(p)

owl: http://www.w3.org/2002/07/owl#
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
xsd: http://www.w3.org/2001/XMLSchema#
xml: http://www.w3.org/XML/1998/namespace
wtm: http://purl.org/heals/food/
dct: http://purl.org/dc/terms/
skos: http://www.w3.org/2004/02/skos/core#
obo: http://purl.obolibrary.org/obo/
prov: http://www.w3.org/ns/prov#
ind: http://purl.org/heals/ingredient/
sm: https://www.omg.org/techprocess/ab/SpecificationMetadata/
1299
http://purl.org/dc/terms/source
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2004/02/skos/core#scopeNote
http://www.w3.org/2000/01/rdf-schema#label
http://purl.org/heals/food/hasIngredient
http://purl.org/heals/food/hasCookTime
http://purl.org/heals/food/hasGluten
http://www.w3.org/2004/02/skos/core#definition
http://purl.org/heals/food/isRecommendedForMeal
http://purl.org/heals/food/hasCookingTemperature
http://purl.org/heals/food/substitutesFor
http://purl.org/heals/food/isRecomm

In [28]:
def check_gluten(recipe):
    # True if any ingredient of the recipe is marked as containing gluten
    for _, _, ingredient in g.triples((recipe, WTM.hasIngredient, None)):
        if (ingredient, WTM.hasGluten, Literal(True)) in g:
            print(f'Recipe {recipe} has gluten! Ingredient: {ingredient}')
            return True
    return False

glutLessRecipes = [recipe for recipe, _, _ in g.triples((None, RDF.type, WTM.Recipe)) if not check_gluten(recipe)]

for r in glutLessRecipes:
    print(f'Recipe: {r}')
    print('ingredients:')
    for _, _, ingredient in g.triples((r, WTM.hasIngredient, None)):
        print(ingredient)
    print('\n')
    

Recipe http://purl.org/heals/ingredient/AlmondBiscotti has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Recipe http://purl.org/heals/ingredient/BananaBread has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Recipe http://purl.org/heals/ingredient/Brownies has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Recipe http://purl.org/heals/ingredient/ChickenSalad has gluten! Ingredient: http://purl.org/heals/ingredient/Mayonnaise
Recipe http://purl.org/heals/ingredient/GoldenKamutBread has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Recipe http://purl.org/heals/ingredient/KamutMuffin has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Recipe http://purl.org/heals/ingredient/KamutPancake has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Recipe http://purl.org/heals/ingredient/ThaiChicken has gluten! Ingredient: http://purl.org/heals/ingredient/ChickenBroth
R

In [31]:
g2 = Graph()

g2.parse('./data/ingredients.rdf')

for r in glutLessRecipes:
    for _, _, i in g.triples((r, WTM.hasIngredient, None)):
        if ((i, None, None)) in g2:
            print(f'{i} is in the ingedients graph!')
        else: 
            print(f'{i} is NOT in the ingedients graph!')
#     print(e[0])

http://purl.org/heals/ingredient/BlackPepper is in the ingedients graph!
http://purl.org/heals/ingredient/Chicken is in the ingedients graph!
http://purl.org/heals/ingredient/Garlic is in the ingedients graph!
http://purl.org/heals/ingredient/LemonJuice is in the ingedients graph!
http://purl.org/heals/ingredient/Salt is in the ingedients graph!
http://purl.org/heals/ingredient/Tarragon is in the ingedients graph!
http://purl.org/heals/ingredient/WholeGrainMustard is in the ingedients graph!
http://purl.org/heals/ingredient/AlmondMeal is in the ingedients graph!
http://purl.org/heals/ingredient/AppleCiderVinegar is in the ingedients graph!
http://purl.org/heals/ingredient/BakingSoda is in the ingedients graph!
http://purl.org/heals/ingredient/Banana is in the ingedients graph!
http://purl.org/heals/ingredient/Blueberry is in the ingedients graph!
http://purl.org/heals/ingredient/ChickenEgg is in the ingedients graph!
http://purl.org/heals/ingredient/Honey is in the ingedients graph!
ht

In [45]:
def findSubstitute(i):
    # first gluten-free ingredient that substitutes for i, or None if there is none
    for sub, _, _ in g.triples((None, WTM.substitutesFor, i)):
        if (sub, WTM.hasGluten, Literal(False)) in g:
            return sub
    return None

def replaceable_gluten(recipe):
    # True only if the recipe contains gluten and every gluten ingredient has a substitute
    has_gluten = False
    for _, _, ingredient in g.triples((recipe, WTM.hasIngredient, None)):
        if (ingredient, WTM.hasGluten, Literal(True)) in g:
            has_gluten = True
            sub = findSubstitute(ingredient)
            print(f'Recipe {recipe} has gluten! Ingredient: {ingredient}')
            print(f'Gluten free subsitute exists: {sub}')
            if sub is None:
                return False
    return has_gluten

adjustableRecipes = [s for s, _, _ in g.triples((None, RDF.type, WTM.Recipe)) if replaceable_gluten(s)]
print(adjustableRecipes)
 

Recipe http://purl.org/heals/ingredient/AlmondBiscotti has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Gluten free subsitute exists: http://purl.org/heals/ingredient/GlutenFreeFlour
Recipe http://purl.org/heals/ingredient/BananaBread has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Gluten free subsitute exists: http://purl.org/heals/ingredient/GlutenFreeFlour
Recipe http://purl.org/heals/ingredient/Brownies has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Gluten free subsitute exists: http://purl.org/heals/ingredient/GlutenFreeFlour
Recipe http://purl.org/heals/ingredient/ChickenSalad has gluten! Ingredient: http://purl.org/heals/ingredient/Mayonnaise
Gluten free subsitute exists: None
Recipe http://purl.org/heals/ingredient/GoldenKamutBread has gluten! Ingredient: http://purl.org/heals/ingredient/AllPurposeFlour
Gluten free subsitute exists: http://purl.org/heals/ingredient/GlutenFreeFlour
Recipe http://purl.org/

In [62]:
def findSubstitute(i):
    # gluten-free substitutes for i, excluding pecan so that both diet restrictions hold
    return [sub for sub, _, _ in g.triples((None, WTM.substitutesFor, i))
            if (sub, WTM.hasGluten, Literal(False)) in g and sub != IND.Pecan]

def check_gluten(recipe):
    # returns (recipe contains gluten?, {gluten ingredient: [substitutes]})
    found_gluten = False
    substitutes = {}
    for _, _, ingredient in g.triples((recipe, WTM.hasIngredient, None)):
        if (ingredient, WTM.hasGluten, Literal(True)) in g:
            found_gluten = True
            ing_substitutes = findSubstitute(ingredient)
            if len(ing_substitutes) == 0:
                print(f'Did not find alternative for {ingredient}')
                return found_gluten, {}
            substitutes[ingredient] = ing_substitutes
            print(f'Gluten free subsitute(s) for {ingredient} exist(s): {substitutes[ingredient]}')
    return found_gluten, substitutes

def check_pecan(recipe):
    if (recipe, WTM.hasIngredient, IND.Pecan) in g:
        print(f'Recipe {recipe} has pecan!')
        # collect every gluten-free substitute in a list; a dict comprehension with a
        # constant key keeps only the last one and raises KeyError when none exist
        subs = [sub for sub, _, _ in g.triples((None, WTM.substitutesFor, IND.Pecan)) if (sub, WTM.hasGluten, Literal(False)) in g]
        print(f'Pecan free substitute(s) exist(s): {subs}')
        return True, ({IND.Pecan: subs} if subs else {})
    else:
        return False, {}
     
def check_recipe(recipe):
    has_gluten, gluten_subs = check_gluten(recipe)
    has_pecan, pecan_subs = check_pecan(recipe)
    
    if (has_gluten and len(gluten_subs) == 0) or (has_pecan and len(pecan_subs) == 0):
        return False, {}, {}
    else:
        return True, gluten_subs, pecan_subs


for s, _, _ in g.triples((None, RDF.type, WTM.Recipe)):
    print(f'checking recipe {s}')
    fit, glut_alt, pec_alt = check_recipe(s)
    print(f'Fits diet restrictions: {fit}')
    if fit:
        if len(glut_alt)!=0:
            print(f'Switch these ingredients: {glut_alt.items()}')
            
        if len(pec_alt)!=0:
            print(f'Switch these ingredients: {pec_alt.items()}')
    print('\n')
    
            
# #         if (sub, WTM.hasGluten, Literal(True)):
#     for _, _, ingredient in g.triples((recipe, WTM.hasIngredient, None)):
# #         print(ingredient)
#         if (ingredient, WTM.hasGluten, Literal(True)) in g:
#             print(f'Recipe {recipe} has gluten! Ingredient: {ingredient}')
#             print(f'Gluten free subsitute exists: {findSubsitute(ingredient)}')
#             return True

checking recipe http://purl.org/heals/ingredient/AlmondBiscotti
Gluten free subsitute(s) for http://purl.org/heals/ingredient/AllPurposeFlour exist(s): [rdflib.term.URIRef('http://purl.org/heals/ingredient/GlutenFreeFlour')]
Fits diet restrictions: True
Switch these ingredients: dict_items([(rdflib.term.URIRef('http://purl.org/heals/ingredient/AllPurposeFlour'), [rdflib.term.URIRef('http://purl.org/heals/ingredient/GlutenFreeFlour')])])


checking recipe http://purl.org/heals/ingredient/BakedChickenTender
Fits diet restrictions: True


checking recipe http://purl.org/heals/ingredient/BananaBlueberryAlmondFlourMuffin
Fits diet restrictions: True


checking recipe http://purl.org/heals/ingredient/BananaBread
Gluten free subsitute(s) for http://purl.org/heals/ingredient/AllPurposeFlour exist(s): [rdflib.term.URIRef('http://purl.org/heals/ingredient/GlutenFreeFlour')]
Fits diet restrictions: True
Switch these ingredients: dict_items([(rdflib.term.URIRef('http://purl.org/heals/ingredient/Al