# Tensile Test Data Annotation Pipeline

This notebook shows how the script reads a dataset in JSON format, annotates the data as RDF triples using the TTO vocabulary and the RDFLib library, and finally serializes the annotated data to the Turtle and JSON-LD formats.

First, import the necessary packages.
From rdflib we import Graph, which stores the RDF triples; Namespace, which lets us define prefixes; and Literal, which wraps a value together with its datatype. We also import the RDF namespace for standard terms such as rdf:type, and the XSD namespace for the datatypes we attach to literals.

In [1]:
import json
import os
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

Here, we set the input and output directory paths. The input directory holds the data to annotate, which must be in JSON format. The script produces two sets of annotated output, one serialized as Turtle and the other as JSON-LD, and writes each set to its own directory.

In [2]:
input_dir = "../data/FAIRtrain_data_json"
output_dir_jsonld = "../output/annotatedBy_TTO/jsonld_output"
output_dir_ttl = "../output/annotatedBy_TTO/ttl_output"
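
Writing a file generally fails if its parent directory does not exist, so it can help to create the output directories up front. The sketch below assumes a temporary directory as a stand-in for the project root; the `demo_` variable names are hypothetical.

```python
import os
import tempfile

# A temporary directory stands in for the project root in this sketch.
root = tempfile.mkdtemp()
demo_output_dir_jsonld = os.path.join(root, "output", "annotatedBy_TTO", "jsonld_output")
demo_output_dir_ttl = os.path.join(root, "output", "annotatedBy_TTO", "ttl_output")

for directory in (demo_output_dir_jsonld, demo_output_dir_ttl):
    # exist_ok=True makes the call safe to repeat on later runs.
    os.makedirs(directory, exist_ok=True)
```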

Now, we loop through each JSON file in the input directory. For each file, we create a new graph to store its RDF triples, then bind prefixes to our namespaces so that terms are easier to write throughout the script.

Next, we load the JSON content of the input file we are currently iterating over. We first create the URIs that identify this specific test: one for the tensile test itself, built from the sample id; one for the test piece; and one for the machine group it belongs to. Using these URIs, we declare the test, test piece, and testing machine by adding an RDF type triple for each to the graph we previously created. Additionally, we relate the tensile test to both the test piece and the machine.

It's important to note that since this script uses vocabulary from the TTO ontology, we are limited to the "relatesTo" property for linking concepts together. This pattern recurs throughout the rest of the script.

Following this, we annotate the measured properties. For each property, we create a node (an instance) unique to this test, assign its type from the TTO vocabulary, attach the numeric value read from the input using the "numericValue" property from the QUDT ontology, and finally relate it to the test piece to record the link.

For the force and elongation pairs, since our relationship vocabulary is limited, the script creates a node for each value and then links each force node to its elongation node, in both directions, to indicate that the two values correspond to each other.

Finally, we serialize the annotated data as Turtle and JSON-LD and write the results to the specified directories.

In [3]:
for filename in os.listdir(input_dir):
    if filename.endswith(".json"):
        filepath = os.path.join(input_dir, filename)

        g = Graph()  #Empty graph to store triples

        TTO = Namespace("https://materialdigital.github.io/application-ontologies/tto/#/")
        g.bind("tto", TTO)
        QUDT = Namespace("http://qudt.org/schema/qudt/")
        g.bind("qudt", QUDT)

        #TODO: update the namespace below to our own domain
        EX = Namespace("http://example.org/tensile/")  #Example base URI; it is used to build the URIs of the subjects we annotate
        g.bind("ex", EX)

        with open(filepath) as f:
            data = json.load(f)

            sample_id = data["sample_id"]
            test_uri = EX[sample_id]
            test_piece_id = sample_id.split("_")[2]
            test_piece_uri = EX["testPiece_" + test_piece_id]
            machine_uri = EX[sample_id[:6]]

            g.add((test_uri, RDF.type, TTO.TensileTest))
            g.add((test_piece_uri, RDF.type, TTO.TestPiece))
            g.add((machine_uri, RDF.type, TTO.TensileTestingMachine))

            g.add((test_uri, TTO.relatesTo, test_piece_uri))
            g.add((test_uri, TTO.relatesTo, machine_uri))

            #Original width
            #Creating a node for the width, with the sample id in the URI so it is unique to this test
            width_node = EX[f"{sample_id}_width"]
            #Saying this node (subject) is a (predicate) object of this type (object)
            g.add((width_node, RDF.type, TTO.OriginalWidth))
            #Saying this node (subject) has numeric value (predicate) of this float (object)
            g.add((width_node, QUDT.numericValue, Literal(data["width"], datatype=XSD.float)))
            #Saying this node (subject) is related to (predicate) the test piece (object)
            g.add((width_node, TTO.relatesTo, test_piece_uri))

            #Original thickness
            thickness_node = EX[f"{sample_id}_thickness"]
            g.add((thickness_node, RDF.type, TTO.OriginalThickness))
            g.add((thickness_node, QUDT.numericValue, Literal(data["thickness"], datatype=XSD.float)))
            g.add((thickness_node, TTO.relatesTo, test_piece_uri))

            #Gauge length
            length_node = EX[f"{sample_id}_length"]
            g.add((length_node, RDF.type, TTO.OriginalGaugeLength))
            g.add((length_node, QUDT.numericValue, Literal(data["length"], datatype=XSD.float)))
            g.add((length_node, TTO.relatesTo, test_piece_uri))

            #Youngs modulus / slope of the elastic part
            youngs_mod_node = EX[f"{sample_id}_youngs_modulus"]
            g.add((youngs_mod_node, RDF.type, TTO.SlopeOfTheElasticPart))
            g.add((youngs_mod_node, QUDT.numericValue, Literal(data["extracted_properties"]["youngs_modulus"], datatype=XSD.float)))
            g.add((youngs_mod_node, TTO.relatesTo, test_piece_uri))

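            #Ultimate tensile strength, annotated here as upper yield strength
            #Note: ISO 6892-1 treats tensile strength (Rm) and upper yield strength (ReH) as distinct quantities, so this mapping should be checked against TTO
            ult_tensile_strength = EX[f"{sample_id}_ultimate_tensile_strength"]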
            g.add((ult_tensile_strength, RDF.type, TTO.UpperYieldStrength))
            g.add((ult_tensile_strength, QUDT.numericValue, Literal(data["extracted_properties"]["ultimate_tensile_strength"], datatype=XSD.float)))
            g.add((ult_tensile_strength, TTO.relatesTo, test_piece_uri))

            for i, point in enumerate(data["data"]):
                #Force
                force_node = EX[f"{sample_id}_pair_{i}_force"]
                g.add((force_node, RDF.type, TTO.Force))
                g.add((force_node, QUDT.numericValue, Literal(point["N"], datatype=XSD.float)))
                g.add((force_node, TTO.relatesTo, test_piece_uri))

                #Elongation
                elong_node = EX[f"{sample_id}_pair_{i}_elongation"]
                g.add((elong_node, RDF.type, TTO.Elongation))
                g.add((elong_node, QUDT.numericValue, Literal(point["mm"], datatype=XSD.float)))
                g.add((elong_node, TTO.relatesTo, test_piece_uri))

                #Pairing the elongation and force
                g.add((elong_node, TTO.relatesTo, force_node))
                g.add((force_node, TTO.relatesTo, elong_node))

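            #Create the output directories if they do not exist yet
            os.makedirs(output_dir_jsonld, exist_ok=True)
            os.makedirs(output_dir_ttl, exist_ok=True)

            output_base_jsonld = os.path.join(output_dir_jsonld, f"annotated_{sample_id}")
            output_base_ttl = os.path.join(output_dir_ttl, f"annotated_{sample_id}")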

            #Serializing in jsonld and ttl
            g.serialize(output_base_jsonld + ".jsonld", format="json-ld")
            g.serialize(output_base_ttl + ".ttl", format="turtle")