# Quality Assessment of Food Knowledge Graph

The goal is to craft rules that identify at least 10 different data quality issues across multiple dimensions, including **semantic accuracy**, **consistency**, **completeness**, and **conciseness**.

## Create the SHACL shapes

SHACL shapes define constraints on the structure and content of RDF data. The shapes below are generated from the ontology in `food_kg_ontology.ttl` and are then used to validate the knowledge graph in `food_kg.ttl`, so that violations can be mapped to the quality dimensions listed above.

In [52]:
# Import required libraries
import rdflib
from rdflib import Graph, Literal, Namespace, BNode
from rdflib.collection import Collection
from pyshacl import validate

# Define namespaces
OWL = Namespace('http://www.w3.org/2002/07/owl#')
RDF = Namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#')
RDFS = Namespace('http://www.w3.org/2000/01/rdf-schema#')
XSD = Namespace('http://www.w3.org/2001/XMLSchema#')
SCHEMA = Namespace('https://schema.org/')
DBO = Namespace('http://dbpedia.org/ontology/')
DBR = Namespace('http://dbpedia.org/resource/')
SH = Namespace('http://www.w3.org/ns/shacl#')
KGS = Namespace('http://kg-course.io/food-nutrition/')

# Parse the ontology graph
ontology_g = Graph().parse('food_kg_ontology.ttl', format='ttl' )

# Create a new graph for SHACL shapes
shapes_g = Graph()
shapes_g.bind('owl', OWL)
shapes_g.bind('rdf', RDF)
shapes_g.bind('rdfs', RDFS)
shapes_g.bind('xsd', XSD)
shapes_g.bind('schema', SCHEMA)
shapes_g.bind('dbo', DBO)
shapes_g.bind('dbr', DBR)
shapes_g.bind('sh', SH)
shapes_g.bind('kgs', KGS)

def get_name_from_uri(uri) -> str:
    return uri.split('#')[-1] if '#' in uri else uri.split('/')[-1]

# Procedure to create SHACL shapes for classes from an ontology graph
def create_class_shape(ontology_graph: Graph, shapes_graph: Graph):
    for cls in ontology_graph.subjects(RDF.type, OWL.Class):
        shape_name = get_name_from_uri(cls) + "Shape"
        shapes_graph.add((KGS[shape_name], RDF.type, SH.NodeShape))
        shapes_graph.add((KGS[shape_name], SH.targetClass, cls))

# Procedure to create SHACL properties from an ontology graph
def create_property_shape(ontology_graph: Graph, shapes_graph: Graph):
    for prop in ontology_graph.subjects(RDF.type, [OWL.ObjectProperty, OWL.DatatypeProperty]):
        # Get the domain and range of the property
        domain = ontology_graph.value(prop, RDFS.domain)
        # Trailing underscore avoids shadowing the built-in range()
        range_ = ontology_graph.value(prop, RDFS.range)
        # Skip properties without a declared domain or range: no shape can be derived for them
        if domain is None or range_ is None:
            continue
        # Get the names of the domain and property to create the shape name
        node_name = get_name_from_uri(domain) + "Shape"
        shape_name = node_name + f"_{get_name_from_uri(prop)}"
        # Add the property shape to the graph
        shapes_graph.add((KGS[node_name], SH.property, KGS[shape_name]))
        shapes_graph.add((KGS[shape_name], RDF.type, SH.PropertyShape))
        shapes_graph.add((KGS[shape_name], SH.path, prop))
        if get_name_from_uri(range_) == 'string':
            # String-valued properties are expected to be English language-tagged literals
            shapes_graph.add((KGS[shape_name], SH.datatype, RDF.langString))
            list_node = BNode()
            Collection(shapes_graph, list_node, [Literal('en')])
            shapes_graph.add((KGS[shape_name], SH.languageIn, list_node))
        else:
            shapes_graph.add((KGS[shape_name], SH.datatype, range_))
        # Every property is treated as mandatory
        shapes_graph.add((KGS[shape_name], SH.minCount, Literal(1)))
    # print(shapes_graph.serialize(format='turtle'))

create_class_shape(ontology_g, shapes_g)
create_property_shape(ontology_g, shapes_g)
shapes_g.serialize(destination='food_kg_shapes.ttl', format='ttl')
print(shapes_g.serialize(format='turtle'))

@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix kgs: <http://kg-course.io/food-nutrition/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

kgs:NutritionInformationShape a sh:NodeShape ;
    sh:property kgs:NutritionInformationShape_calories,
        kgs:NutritionInformationShape_carbohydrateContent,
        kgs:NutritionInformationShape_cholesterolContent,
        kgs:NutritionInformationShape_fatContent,
        kgs:NutritionInformationShape_fiberContent,
        kgs:NutritionInformationShape_proteinContent,
        kgs:NutritionInformationShape_saturatedFatContent,
        kgs:NutritionInformationShape_sodiumContent,
        kgs:NutritionInformationShape_sugarContent ;
    sh:targetClass schema:NutritionInformation .

kgs:NutritionShape a sh:NodeShape ;
    sh:targetClass dbo:Nutrition .

kgs:RecipeShape a sh:NodeShape ;
    sh:p

## Validate the knowledge graph using the created SHACL shapes
Now that we have created the SHACL shapes based on the ontology, we can use these shapes to validate the `food_kg.ttl` knowledge graph. The validation will check for any violations of the constraints defined in the SHACL shapes, such as missing properties, incorrect data types, or structural issues.

In [56]:
# serialize_report_graph=True returns the report graph as serialized bytes instead of a Graph
conforms, v_graph, v_text = validate(
    data_graph='food_kg.ttl',
    shacl_graph='food_kg_shapes.ttl',
    inference='rdfs',
    debug=False,
    serialize_report_graph=True,
)

Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#duration, Converter=<function parse_xsd_duration at 0x10c7639c0>
Traceback (most recent call last):
  File "/Users/alexanderleonidas/Maastricht University/AI Masters/KEN4256 Building and Mining Knowledge Graphs/fluffy_octo_spoon/venv/lib/python3.13/site-packages/rdflib/term.py", line 2262, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
  File "/Users/alexanderleonidas/Maastricht University/AI Masters/KEN4256 Building and Mining Knowledge Graphs/fluffy_octo_spoon/venv/lib/python3.13/site-packages/rdflib/xsd_datetime.py", line 427, in parse_xsd_duration
    raise ValueError("Unable to parse duration string " + dur_string)
ValueError: Unable to parse duration string nan
Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#duration, Converter=<function parse_xsd_duration at 0x10c7639c0>
Traceback (most recent call last):
  File "/
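The warnings above show rdflib failing to parse the lexical form `nan` as an `xsd:duration`, which is itself a data quality issue worth reporting. A minimal sketch for spotting such values before validation follows; it assumes a simplified regular expression for the `xsd:duration` lexical form, and the function name `invalid_durations` and the sample values are made up for illustration.

```python
import re

# Simplified pattern for xsd:duration, e.g. PT30M, P1DT2H, -P1Y2M, PT1.5S.
# At least one component must follow "P", and at least one must follow "T".
DURATION_RE = re.compile(
    r"^-?P(?=\d|T\d)"
    r"(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?$"
)

def invalid_durations(values):
    """Return the lexical forms that do not look like an xsd:duration."""
    return [v for v in values if not DURATION_RE.match(v)]

# Hypothetical sample of lexical forms, including the "nan" seen in the warnings
sample = ["PT30M", "P1DT2H", "nan", "PT", ""]
bad = invalid_durations(sample)
print(bad)
```

In the notebook the input would be the lexical forms of the duration-typed literals in `food_kg.ttl`; how those are collected depends on which properties carry durations in the graph.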

In [57]:
print(conforms)

False
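Since the graph does not conform, it helps to see which constraints and shapes fail most often before reading individual results. The sketch below counts results per constraint component and per source shape from the text report. It assumes the report layout that `pyshacl` prints (`Constraint Violation in ...` and `Source Shape: ...` lines); `summarise_report` is a hypothetical helper, and the sample string stands in for `v_text`.

```python
import re
from collections import Counter

def summarise_report(report_text):
    """Count validation results per constraint component and per source shape."""
    components = Counter(
        re.findall(r"^Constraint Violation in (\w+)", report_text, flags=re.M)
    )
    shapes = Counter(
        re.findall(r"^\s*Source Shape: (\S+)", report_text, flags=re.M)
    )
    return components, shapes

# Shortened stand-in for v_text, following the layout of the pyshacl text report
sample_report = (
    "Validation Report\n"
    "Conforms: False\n"
    "Results (2):\n"
    "Constraint Violation in DatatypeConstraintComponent "
    "(http://www.w3.org/ns/shacl#DatatypeConstraintComponent):\n"
    "\tSeverity: sh:Violation\n"
    "\tSource Shape: kgs:NutritionInformationShape_calories\n"
    "\tMessage: Value is not Literal with datatype xsd:double\n"
    "Constraint Violation in DatatypeConstraintComponent "
    "(http://www.w3.org/ns/shacl#DatatypeConstraintComponent):\n"
    "\tSeverity: sh:Violation\n"
    "\tSource Shape: kgs:NutritionInformationShape_fiberContent\n"
    "\tMessage: Value is not Literal with datatype xsd:double\n"
)

components, shapes = summarise_report(sample_report)
for name, count in components.most_common():
    print(name, count)
for name, count in shapes.most_common():
    print(name, count)
```

Calling `summarise_report(v_text)` on the real report gives a quick overview of which quality dimensions are affected, for example datatype violations pointing at consistency problems.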


In [41]:
# v_graph is a bytes object (serialized Turtle), so decode it for readable output
print(v_graph.decode('utf-8'))

@prefix kgs: <http://kg-course.io/food-nutrition/> .
@prefix schema: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a sh:ValidationReport ;
    sh:conforms false ;
    sh:result [ a sh:ValidationResult ;
            sh:focusNode <http://kg-course/food-nutrition/recipe/57879> ;
            sh:resultMessage "Value is not Literal with datatype xsd:string" ;
            sh:resultPath schema:recipeInstructions ;
            sh:resultSeverity sh:Violation ;
            sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
            sh:sourceShape kgs:RecipeShape_recipeInstructions ;
            sh:value "Cream butter and slowly beat in sugar."@en ],
        [ a sh:ValidationResult ;
            sh:focusNode <http://kg-course/food-nutrition/recipe/57856> ;
            sh:resultMessage "Value is not Literal with datatype xsd:string" ;
            sh:resultPath schema:recipeInstructions ;

In [58]:
print(v_text)

Validation Report
Conforms: False
Results (104):
Constraint Violation in DatatypeConstraintComponent (http://www.w3.org/ns/shacl#DatatypeConstraintComponent):
	Severity: sh:Violation
	Source Shape: kgs:NutritionInformationShape_calories
	Focus Node: <http://kg-course/food-nutrition/recipe/49/nutrition>
	Value Node: Literal("6.277e+02")
	Result Path: schema:calories
	Message: Value is not Literal with datatype xsd:double
Constraint Violation in DatatypeConstraintComponent (http://www.w3.org/ns/shacl#DatatypeConstraintComponent):
	Severity: sh:Violation
	Source Shape: kgs:NutritionInformationShape_fiberContent
	Focus Node: <http://kg-course/food-nutrition/recipe/41/nutrition>
	Value Node: Literal("-0.2", datatype=xsd:decimal)
	Result Path: schema:fiberContent
	Message: Value is not Literal with datatype xsd:double
Constraint Violation in DatatypeConstraintComponent (http://www.w3.org/ns/shacl#DatatypeConstraintComponent):
	Severity: sh:Violation
	Source Shape: kgs:NutritionInformationSha