# Blazegraph Database

[Blazegraph DB](https://blazegraph.com/) is an open-source triplestore database supporting the Blueprints and RDF/SPARQL APIs. It supports up to 50 billion edges on a single machine and has since been commercialized as [AWS Neptune](https://aws.amazon.com/neptune/).

Docker images preloaded with IEEE test feeders are available from https://hub.docker.com/r/gridappsd/blazegraph/tags.


## About Triplestore Databases

A triple-store database offers a semantic approach to data management. A relational database requires a schema written in a data definition language (DDL). A triple-store database instead consists of Resource Description Framework (RDF) statements. Each RDF statement takes the form subject (node), predicate (relation), object (node). Statements can be generated dynamically to form complex, interrelated class structures. RDF Schema (RDFS) supports polymorphism, provides a means to create graph constructs, and supports subgraphs so that the graph space can be partitioned. RDF can be serialized in a variety of formats, including RDF/XML, Turtle (TTL), and JSON-LD. Other languages, such as the Web Ontology Language (OWL), can express more sophisticated constraints and additional constructs for reasoning and inference.

The RDF statement structure corresponds intuitively to the object-attribute specifications used in the CIM. For example, an `ACLineSegment` (subject) has an attribute `length` (predicate) with the value `105 meters` (object). CIM data structures can be translated directly into RDF, which allows data correlation to be automated. The CIMTool software translates a CIM profile into RDFS and OWL. After ingesting the power system network model data and the RDFS/OWL structures from the data profile, a triple-store database can automatically correlate well-defined CIM structures with newly generated data.
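To make the subject-predicate-object structure concrete, the sketch below stores a few CIM statements as plain Python tuples. It then matches them against a pattern in which `None` acts as a wildcard, loosely mimicking a single SPARQL triple pattern. The subject `urn:uuid:line-1` and the `match_triples` helper are hypothetical. A real triple-store indexes statements rather than scanning a list. In the CIM, `length` is inherited from `Conductor`, so its full property name is `Conductor.length`.

```python
# Illustrative only: RDF statements as (subject, predicate, object) tuples.
# The subject IRI below is a hypothetical placeholder, not a real mRID.
Triple = tuple[str, str, str]

triples: list[Triple] = [
    ("urn:uuid:line-1", "rdf:type", "cim:ACLineSegment"),
    ("urn:uuid:line-1", "cim:IdentifiedObject.name", "line_1"),
    ("urn:uuid:line-1", "cim:Conductor.length", "105"),  # length in metres
]


def match_triples(pattern, store):
    """Return the triples that match `pattern`; None matches any value."""
    return [
        triple
        for triple in store
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]


# "What is the length of line-1?"
print(match_triples(("urn:uuid:line-1", "cim:Conductor.length", None), triples))
```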

The main advantage of a triple-store database is that it directly supports CIM development. It is highly agile and supports dynamic data structures generated by multiple developers. It is also backed by a standardized, mature language for specifying directed graphs and hierarchical class structures, as well as a standardized way to manage RDF data. RDF supports both type checking and structure validation with the Shapes Constraint Language (SHACL). The main disadvantage of a triple-store is the risk of garbage-in-garbage-out (GIGO): without a well-thought-out data management strategy, maintenance can be difficult. Likewise, without a certain rigor from data contributors, dangling references or conflicts can appear in the database.

## Blazegraph Environment Variables

If you are using CIM-Graph with GridAPPS-D, or with one of the Docker images from the GridAPPS-D Docker Hub, the following default values are recommended.

For GridAPPS-D / Blazegraph tags between `v2021.01.0` and `v2024.09.0`, the following environment variables are recommended:

In [1]:
import os
os.environ['CIMG_CIM_PROFILE'] = 'rc4_2021'
os.environ['CIMG_URL'] = 'http://localhost:8889/bigdata/namespace/kb/sparql'
os.environ['CIMG_IEC61970_301'] = '7'

For GridAPPS-D / Blazegraph tags `v2025.01.0` and later, the following environment variables are recommended:

In [2]:
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
os.environ['CIMG_URL'] = 'http://localhost:8889/bigdata/namespace/kb/sparql'
os.environ['CIMG_IEC61970_301'] = '8'
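Because the recommended values depend only on the image tag, the choice can be written as a small lookup. The sketch below is a hypothetical helper (`recommended_cimg_env` is not part of CIM-Graph) that encodes the two tag ranges listed above. It raises an error for tags that this page does not cover, such as releases between `v2024.09.0` and `v2025.01.0`.

```python
import os

SPARQL_URL = "http://localhost:8889/bigdata/namespace/kb/sparql"


def _parse_tag(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'v2024.09.0' into a comparable tuple (2024, 9, 0)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def recommended_cimg_env(tag: str) -> dict[str, str]:
    """Hypothetical helper: return the recommended CIMG_* values for a GridAPPS-D tag."""
    version = _parse_tag(tag)
    if (2021, 1, 0) <= version <= (2024, 9, 0):
        return {"CIMG_CIM_PROFILE": "rc4_2021", "CIMG_URL": SPARQL_URL,
                "CIMG_IEC61970_301": "7"}
    if version >= (2025, 1, 0):
        return {"CIMG_CIM_PROFILE": "cimhub_2023", "CIMG_URL": SPARQL_URL,
                "CIMG_IEC61970_301": "8"}
    raise ValueError(f"No recommended settings documented for tag {tag!r}")


# Equivalent to the cell above for a v2025.01.0 image
os.environ.update(recommended_cimg_env("v2025.01.0"))
```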

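The `BlazegraphConnection` class then retrieves these environment variables using the `get_[var_name]` functions from the `cimgraph.databases` module. For example, `get_cim_profile()` returns the name of the CIM profile and the corresponding `cim` module of data classes used in the examples below: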

In [3]:
from cimgraph.databases import get_cim_profile
cim_profile, cim = get_cim_profile()

----

## Blazegraph Database Connection

The `BlazegraphConnection` class provides the interface to the database and all core methods needed to query and update it. It is a specialization of the abstract `ConnectionInterface` class in CIM-Graph.

The class can be imported from the `cimgraph.databases` module as shown below:

In [4]:
# Import class from cimgraph databases module
from cimgraph.databases import BlazegraphConnection
# Create a new connection to the database
database = BlazegraphConnection()

----

## API Reference

The `BlazegraphConnection` class offers the following methods:

### Public Methods for Users

The following methods are intended for users who wish to access and retrieve objects prior to creating a GraphModel representation of the full power system.

* `.get_object()` -- Retrieves a CIM object based on its mRID

* `.get_from_triple()` -- Retrieves a list of objects/strings based on an RDF triple

### Internal Methods

The following methods are used internally for database access and queries:

* `.connect()` -- Connects to the database endpoint

* `.execute()` -- Executes a SPARQL query

* `.update()` -- Runs a SPARQL update statement, such as `drop all`

* `.disconnect()` -- Disconnects from the database endpoint

* `.create_new_graph()` -- Creates a network graph from an EquipmentContainer object

* `.create_distributed_graph()` -- Creates a graph of a switch-delimited topological area

* `.get_all_edges()` -- Executes the get_all_edges queries in parallel

* `.get_all_attributes()` -- Runs get_all_edges without creating new objects

* `.get_edges_query()` -- Returns the SPARQL query text used by `.get_all_edges()`, for debugging

* `.parse_node_query()` -- Processes the query response for `.create_new_graph()`

* `.edge_query_parser()` -- Processes the query response for `.get_all_edges()`

---- 

### get_object()

The `get_object()` method retrieves an object from the Blazegraph database using its mRID.

The arguments are:

* `mRID` (str): The mRID of the object to be retrieved.

* `graph` (dict[type, dict[UUID, object]]): Optional -- An existing graph to which the object should be added.

The return values are:

* `object`: A CIM object of the correct type with the requested mRID.

* `graph`: A graph dictionary with the object added.
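The graph dictionary referred to throughout this page groups CIM objects by class and then by UUID. The sketch below illustrates that two-level layout using a hypothetical stand-in dataclass (`StubFeeder`) in place of `cim.Feeder`. It assumes the inner keys are `uuid.UUID` values. The `add_to_graph` helper is also hypothetical and only shows how an object would be slotted in.

```python
from dataclasses import dataclass
from uuid import UUID


# Hypothetical stand-in for a CIM dataclass such as cim.Feeder
@dataclass
class StubFeeder:
    mRID: str


# Assumed layout: {CIM class: {UUID: object instance}}
GraphDict = dict[type, dict[UUID, object]]


def add_to_graph(graph: GraphDict, obj) -> GraphDict:
    """Insert obj under its class and UUID, creating the class bucket if needed."""
    graph.setdefault(type(obj), {})[UUID(obj.mRID)] = obj
    return graph


example_graph: GraphDict = {}
add_to_graph(example_graph, StubFeeder(mRID="49AD8E07-3BF9-A4E2-CB8F-C3722F837B62"))
print(example_graph)
```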

##### Example 1

The example below shows how to retrieve a Feeder object using its mRID:

In [5]:
feeder = database.get_object(mRID="49AD8E07-3BF9-A4E2-CB8F-C3722F837B62")  # IEEE 13-bus feeder
feeder.pprint()

{
    "@id": "49ad8e07-3bf9-a4e2-cb8f-c3722f837b62",
    "@type": "Feeder"
}


----

### get_from_triple()

The `.get_from_triple()` method retrieves the object(s) completing a subject-predicate-object triple, given a source object and the full CIM property string.

The arguments are:

* `subject` (object): A CIM object instance created using CIM-Graph

* `predicate` (str): A CIM RDF property string, such as `IdentifiedObject.name`

The return value is a list of the retrieved object(s) or property value(s).
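A `Class.property` string such as `IdentifiedObject.name` maps onto a predicate in the `http://iec.ch/TC57/CIM100#` namespace used by the queries on this page. The sketch below shows that mapping. It is not the query CIM-Graph generates. The `triple_query` helper is hypothetical, and it matches only the forward direction from subject to object.

```python
CIM_NS = "http://iec.ch/TC57/CIM100#"


def triple_query(subject_mrid: str, predicate: str) -> str:
    """Build a SPARQL query for the objects of (subject, cim:predicate, ?value)."""
    cim_class, sep, attribute = predicate.partition(".")
    if not (cim_class and sep and attribute):
        raise ValueError(f"expected 'Class.property', got {predicate!r}")
    # Only the forward direction subject -> object is matched here.
    return (
        f"PREFIX cim: <{CIM_NS}>\n"
        "SELECT DISTINCT ?value WHERE {\n"
        f"  <urn:uuid:{subject_mrid}> cim:{predicate} ?value .\n"
        "}"
    )


print(triple_query("49AD8E07-3BF9-A4E2-CB8F-C3722F837B62", "IdentifiedObject.name"))
```

Associations may be stored in either direction. A complete query would therefore also need to check the inverse direction, as the `UNION` in the `get_edges_query()` output further down this page does.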


##### Example 1

The first example below shows how to retrieve the name of the feeder retrieved in the previous example. As `name` is a string attribute, the return value is a list of strings.

In [6]:
name = database.get_from_triple(subject=feeder, predicate='IdentifiedObject.name')
print(name)

['ieee13nodeckt']


##### Example 2

The second example below shows how to retrieve the substation associated with the same feeder. In this case, the return value is a list containing a dataclass instance of type `cim.Substation`.

In [7]:
substation = database.get_from_triple(subject=feeder, predicate='Feeder.NormalEnergizingSubstation')
print(substation)

[{"@id": "6c62c905-6fc7-653d-9f1e-1340f974a587", "@type": "Substation"}]


----

### connect()

When a `BlazegraphConnection` object is instantiated, it automatically attempts to connect to the database. The `.connect()` method creates a new SPARQL wrapper for executing queries against the specified database URL.

It does not take any input arguments or return any values:

In [8]:
database.connect()

----

### execute()

The `.execute()` method executes all SPARQL queries, whether written by the user or auto-generated by CIM-Graph. Special characters in the query text (e.g. quotation marks) typically need to be escaped with a single `\` or double `\\` backslash.

The input argument is:

* `query_message` (str): A string with the SPARQL query text

The return value is a Python dictionary. The column headers are contained in `query_output['head']['vars']` and the row values in `query_output['results']['bindings']`.
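When a query is assembled from Python values, escaping can be done programmatically rather than by hand. The sketch below uses two hypothetical helpers (`sparql_literal` and `values_clause`, not CIM-Graph functions). They quote strings as SPARQL string literals and build a `VALUES` block like the one in the example query that follows. Note that inside a regular, non-raw Python string, each backslash intended for SPARQL must itself be written as `\\`. That is likely where the double escaping mentioned above comes from.

```python
def sparql_literal(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")  # must run first so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def values_clause(variable: str, items) -> str:
    """Build a VALUES block such as: VALUES ?identifier {"A" "B"}"""
    return f"VALUES ?{variable} {{{' '.join(sparql_literal(i) for i in items)}}}"


print(values_clause("identifier", ["49AD8E07-3BF9-A4E2-CB8F-C3722F837B62"]))
print(sparql_literal('Bus "650"'))
```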


##### Example 1

The example below shows a simple SPARQL query and the query response:

In [9]:
query_text = '''
PREFIX r:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cim:  <http://iec.ch/TC57/CIM100#>
SELECT DISTINCT ?identifier ?obj_class
WHERE {
VALUES ?identifier {"49AD8E07-3BF9-A4E2-CB8F-C3722F837B62"}
bind(iri(concat("urn:uuid:", ?identifier)) as ?eq)
?eq a ?classraw.
bind(strafter(str(?classraw),"http://iec.ch/TC57/CIM100#") as ?obj_class)
}
ORDER by  ?identifier
'''

query_response = database.execute(query_text)
query_response

{'head': {'vars': ['identifier', 'obj_class']},
 'results': {'bindings': [{'identifier': {'type': 'literal',
     'value': '49AD8E07-3BF9-A4E2-CB8F-C3722F837B62'},
    'obj_class': {'type': 'literal', 'value': 'Feeder'}}]}}

A simple example of how to parse the query results is shown below:

In [10]:
headers = query_response['head']['vars']
print(headers)
for row in query_response['results']['bindings']:
    value0 = row[headers[0]]['value']
    value1 = row[headers[1]]['value']
    print(value0, value1)

['identifier', 'obj_class']
49AD8E07-3BF9-A4E2-CB8F-C3722F837B62 Feeder
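The loop above indexes each row by position and assumes every variable is bound. The hypothetical helper below (`bindings_to_rows`, not part of CIM-Graph) flattens a response of the same shape into one dictionary per row. Variables left unbound in a row, for example by an `OPTIONAL` clause, map to `None` instead of raising a `KeyError`.

```python
from typing import Optional


def bindings_to_rows(query_output: dict) -> list[dict[str, Optional[str]]]:
    """Flatten SPARQL JSON results into one dict per row, keyed by variable name.

    Variables that are unbound in a row (e.g. from an OPTIONAL clause) map to None.
    """
    headers = query_output["head"]["vars"]
    return [
        {name: row[name]["value"] if name in row else None for name in headers}
        for row in query_output["results"]["bindings"]
    ]


sample_response = {  # same shape as the query_response output above
    "head": {"vars": ["identifier", "obj_class"]},
    "results": {"bindings": [
        {"identifier": {"type": "literal", "value": "49AD8E07-3BF9-A4E2-CB8F-C3722F837B62"},
         "obj_class": {"type": "literal", "value": "Feeder"}},
    ]},
}
print(bindings_to_rows(sample_response))
```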


----

### update()

The `.update()` method executes a Blazegraph update message, such as `drop all`. It is also invoked by the `.upload()` method to add new CIM objects to the database.

The arguments of the method are:

* `update_message` (str): The SPARQL update message or database routine

The method returns a string with the status message from the database.
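As an illustration of the kind of message `.update()` accepts, the sketch below builds a standard SPARQL 1.1 `INSERT DATA` statement that adds one literal-valued triple. The `insert_literal_update` helper and the mRID `example-mrid` are hypothetical. The `urn:uuid:` subject form is assumed from the queries on this page. The call to `database.update()` is left commented out because it would modify a live database.

```python
CIM_NS = "http://iec.ch/TC57/CIM100#"


def insert_literal_update(mrid: str, predicate: str, value: str) -> str:
    """Build a SPARQL INSERT DATA message adding one literal-valued CIM triple."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX cim: <{CIM_NS}>\n"
        "INSERT DATA {\n"
        f'  <urn:uuid:{mrid}> cim:{predicate} "{escaped}" .\n'
        "}"
    )


message = insert_literal_update("example-mrid", "IdentifiedObject.description", 'test "feeder"')
print(message)
# With a live database connection, the message could then be sent with:
# status = database.update(message)
```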



----

### create_new_graph()

The `.create_new_graph()` method creates a new graph structure for a CIM EquipmentContainer object. The method uses a SPARQL query to obtain all terminals in the graph, along with all nodes and conducting equipment associated with each terminal. This forms the baseline knowledge graph for the GraphModel. If a graph is specified, the new objects will be added to the existing graph. Otherwise, a new graph will be created from scratch.

The arguments of the method are:

* `container` (object): The container object for which the graph is created.

* `graph` (dict, optional): Graph of CIM objects, grouped by class and UUID.

The method returns a graph dictionary that maps each CIM class to a dictionary of UUIDs and object instances.
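The terminal-based query described above effectively yields (terminal, connectivity node, equipment) rows. These form a bipartite graph between nodes and conducting equipment. The sketch below indexes hypothetical rows of that shape in both directions. The row format and mRIDs are assumptions for illustration, not the actual query output of `.create_new_graph()`.

```python
from collections import defaultdict

# Hypothetical query rows: (terminal mRID, connectivity node mRID, equipment mRID)
terminal_rows = [
    ("T1", "N1", "LINE-1"),
    ("T2", "N2", "LINE-1"),
    ("T3", "N2", "SWITCH-1"),
    ("T4", "N3", "SWITCH-1"),
]


def build_topology(rows):
    """Index each terminal row from both ends: node -> equipment and equipment -> nodes."""
    node_to_equipment = defaultdict(set)
    equipment_to_nodes = defaultdict(set)
    for _terminal, node, equipment in rows:
        node_to_equipment[node].add(equipment)
        equipment_to_nodes[equipment].add(node)
    return dict(node_to_equipment), dict(equipment_to_nodes)


node_to_equipment, equipment_to_nodes = build_topology(terminal_rows)
print(node_to_equipment["N2"])  # equipment sharing connectivity node N2
```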


----

### create_distributed_graph()

The `.create_distributed_graph()` method creates a new graph structure for a switch-delimited topological area. Like `.create_new_graph()`, it uses a SPARQL query to obtain all terminals in the area, along with all nodes and conducting equipment associated with each terminal, to form the baseline knowledge graph for the GraphModel. If a graph is specified, the new objects are added to the existing graph. Otherwise, a new graph is created from scratch.

The arguments of the method are:

* `container` (object): The container object for which the graph is created.

* `graph` (dict, optional): Graph of CIM objects, grouped by class and UUID.

The method returns a graph dictionary that maps each CIM class to a dictionary of UUIDs and object instances.
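A switch-delimited area can be understood as a connected group of nodes once every switch is treated as a boundary. The sketch below computes such areas with a breadth-first search over hypothetical (terminal, node, equipment) rows. It is an illustration of the concept under that assumption, not the algorithm CIM-Graph uses internally.

```python
from collections import defaultdict, deque

# Hypothetical (terminal, connectivity node, equipment) rows for a small feeder
terminal_rows = [
    ("T1", "N1", "LINE-1"),
    ("T2", "N2", "LINE-1"),
    ("T3", "N2", "SWITCH-1"),
    ("T4", "N3", "SWITCH-1"),
    ("T5", "N3", "LINE-2"),
    ("T6", "N4", "LINE-2"),
]


def switch_delimited_areas(rows, switches):
    """Group connectivity nodes into areas, treating equipment in `switches` as boundaries."""
    equipment_nodes = defaultdict(set)
    all_nodes = set()
    for _terminal, node, equipment in rows:
        all_nodes.add(node)
        if equipment not in switches:  # switches do not join their nodes
            equipment_nodes[equipment].add(node)

    # Non-switch equipment connects every pair of its nodes
    adjacency = defaultdict(set)
    for nodes in equipment_nodes.values():
        for node in nodes:
            adjacency[node] |= nodes - {node}

    # Breadth-first search collects each connected area
    areas, seen = [], set()
    for start in sorted(all_nodes):
        if start in seen:
            continue
        area, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            area.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        areas.append(area)
    return areas


print(switch_delimited_areas(terminal_rows, switches={"SWITCH-1"}))
```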




----

### get_all_edges()

The `.get_all_edges()` method is the core library method that enables the flexibility of CIM-Graph to query for CIM objects of any class and build the knowledge graph without custom queries.

The arguments of the method are:

* `graph` (dict): Graph of CIM objects, grouped by class and UUID.

* `cim_class` (type): The CIM class for which to retrieve edges (e.g. `cim.ACLineSegment`)

The method does not return any values; it updates `graph` in place.


In [11]:
# `graph` must be an existing graph dictionary that already contains the Feeder object(s)
database.get_all_edges(graph, cim.Feeder)

----

### get_edges_query()

This is a debugging method that can be used to obtain the query text passed to the database endpoint. The query text can then be copied and pasted into the database GUI for error-checking.

The arguments of the method are:

* `graph` (dict): Graph of CIM objects, grouped by class and UUID.

* `cim_class` (type): The CIM class for which to retrieve edges (e.g. `cim.ACLineSegment`)

The method returns a string with the SPARQL query text.

##### Example 1

The example below shows how the SPARQL string can be retrieved for debugging. The text can then be copied and pasted into the Blazegraph GUI query window.

In [12]:
query_message = database.get_edges_query(graph, cim.Feeder)
print(query_message)


        PREFIX r:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX cim:  <http://iec.ch/TC57/CIM100#>
        SELECT DISTINCT ?identifier ?attribute ?value ?edge
        WHERE {
          
    VALUES ?identifier { "49AD8E07-3BF9-A4E2-CB8F-C3722F837B62" 
               }
        bind(iri(concat("urn:uuid:", ?identifier)) as ?eq)

        ?eq r:type cim:Feeder.

        {?eq (cim:|!cim:) ?val.
         ?eq ?attr ?val.}
        UNION
        {?val (cim:|!cim:) ?eq.
         ?val ?attr ?eq.}

        {bind(strafter(str(?attr),"#") as ?attribute)}
        {bind(strafter(str(?val),"urn:uuid:") as ?uri)}
        {bind(if(?uri = "", ?val, ?uri) as ?value)}

        OPTIONAL {?val a ?classraw.
                  bind(strafter(str(?classraw),"http://iec.ch/TC57/CIM100#") as ?edge_class)
                  {bind(strafter(str(?val),"urn:uuid:") as ?uri)}

                  bind(concat("{\"@id\":\"", ?uri,"\",\"@type\":\"", ?edge_class, "\"}") as ?edge)}
        }

        ORDER by  ?id

----

### edge_query_parser()

The `.edge_query_parser()` method converts JSON-LD formatted query responses into a set of Equipment, ConnectivityNode, Terminal, and Measurement objects.

The arguments of the method are:

* `graph` (dict, optional): Graph of CIM objects, grouped by class and UUID.

* `query_output` (dict): JSON-LD formatted query output to be parsed.

* `cim_class` (type): The CIM class for which edges are to be created (e.g. `cim.ACLineSegment`).

* `expand_graph` (bool): Whether newly referenced objects are added to the graph. `.get_all_attributes()` uses this flag so that the graph is not expanded with more nodes. Defaults to True.

The method returns an updated graph dictionary.
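Each row of the `get_edges_query()` output shown above binds `?identifier`, `?attribute`, and `?value`. When the value is another object, it also binds `?edge` as a small JSON string with `@id` and `@type`. The hypothetical `parse_edge_rows` helper below groups such rows per object and attribute. It records edges as (class name, id) pairs rather than creating CIM objects, which is what the real method does. The sample response is constructed by hand for illustration.

```python
import json
from collections import defaultdict


def parse_edge_rows(query_output: dict) -> dict:
    """Group edge-query rows as {identifier: {attribute: [values]}}.

    Rows carrying an `edge` binding (a JSON string with "@id" and "@type")
    are recorded as (class name, id) pairs; other rows keep their literal value.
    """
    parsed = defaultdict(lambda: defaultdict(list))
    for row in query_output["results"]["bindings"]:
        identifier = row["identifier"]["value"]
        attribute = row["attribute"]["value"]
        if "edge" in row:
            edge = json.loads(row["edge"]["value"])
            parsed[identifier][attribute].append((edge["@type"], edge["@id"]))
        else:
            parsed[identifier][attribute].append(row["value"]["value"])
    return {identifier: dict(attrs) for identifier, attrs in parsed.items()}


# Hypothetical response rows shaped like the get_edges_query() output above
feeder_id = "49AD8E07-3BF9-A4E2-CB8F-C3722F837B62"
substation_id = "6c62c905-6fc7-653d-9f1e-1340f974a587"
sample_output = {
    "head": {"vars": ["identifier", "attribute", "value", "edge"]},
    "results": {"bindings": [
        {"identifier": {"type": "literal", "value": feeder_id},
         "attribute": {"type": "literal", "value": "IdentifiedObject.name"},
         "value": {"type": "literal", "value": "ieee13nodeckt"}},
        {"identifier": {"type": "literal", "value": feeder_id},
         "attribute": {"type": "literal", "value": "Feeder.NormalEnergizingSubstation"},
         "value": {"type": "literal", "value": substation_id},
         "edge": {"type": "literal",
                  "value": json.dumps({"@id": substation_id, "@type": "Substation"})}},
    ]},
}
print(parse_edge_rows(sample_output))
```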


----

## UML Sequence Diagrams

This section contains UML sequence diagrams explaining how the CIMantic Graphs library executes database queries and API calls. The UML diagrams on this page are rendered from flat text using mermaid.js, which can be imported using:

In [13]:
from mermaid import Mermaid

### get_object()

A UML sequence diagram depicting how the `BlazegraphConnection.get_object()` method retrieves an object using its mRID is shown below:

In [14]:
with open('./images/3_3_get_object.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### get_from_triple()

A UML sequence diagram depicting how the `BlazegraphConnection.get_from_triple()` method retrieves an object using an RDF triple is shown below:

In [15]:
with open('./images/3_3_get_from_triple.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### connect()

The UML sequence diagram below shows how the `BlazegraphConnection` class is initialized and connects to the database:

In [16]:
with open('./images/3_3_connect.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### execute()

The diagram below illustrates how a block of SPARQL query text is executed:

In [17]:
from mermaid import Mermaid
with open('./images/3_3_execute.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### update()

The diagram below illustrates how a SPARQL update statement is executed:

In [18]:
with open('./images/3_3_update.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### create_new_graph()

The `.create_new_graph()` method is invoked when the user creates a new graph model, as shown in the UML sequence diagram below:

In [19]:
with open('./images/3_3_create_new_graph.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### create_distributed_graph()

The `.create_distributed_graph()` method is invoked when the user creates a new graph model with the `distributed` flag set to `True`, as shown in the UML sequence diagram below:

In [20]:
with open('./images/3_3_create_distributed_graph.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### get_all_edges()

The UML sequence diagram below shows how the `.get_all_edges()` method is invoked when a user calls the higher-level GraphModel method of the same name.

For improved performance, CIM-Graph uses parallel processing, querying objects in batches of 100. The execution workflow is shown below:

In [21]:
with open('./images/3_3_get_all_edges.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
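The batching pattern in the diagram can be sketched with the standard library. The example below splits a list of mRIDs into batches of 100 and runs a hypothetical `run_query` callable on each batch concurrently. It then concatenates the results in batch order. A thread pool is used here for simplicity. CIM-Graph's own implementation may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, size=100):
    """Split items into consecutive lists of at most `size` elements."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def query_in_batches(mrids, run_query, size=100, max_workers=4):
    """Run `run_query` on each batch concurrently and concatenate the results.

    `run_query` is a hypothetical callable that takes a list of mRIDs and
    returns a list of result rows (e.g. by building and executing one query).
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in batch order, regardless of completion order
        for rows in pool.map(run_query, batched(mrids, size)):
            results.extend(rows)
    return results


# Stand-in query: report how many mRIDs each batch contained
print(query_in_batches([f"id-{n}" for n in range(250)], lambda batch: [len(batch)]))
# [100, 100, 50]
```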

----

### get_edges_query()

The diagram below illustrates how a SPARQL query is retrieved and returned to the user for debugging:

In [22]:
with open('./images/3_3_get_edges_query.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### parse_node_query()

The `.parse_node_query()` method is invoked as part of the initialization of a new GraphModel, as shown below:

In [23]:
with open('./images/3_3_parse_node_query.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

### edge_query_parser()

The `.edge_query_parser()` method converts JSON-LD formatted query responses into a set of Equipment, ConnectivityNode, Terminal, and Measurement objects. It is invoked as part of the `.get_all_edges()` and `.get_all_attributes()` methods, as shown below:

In [24]:
with open('./images/3_3_parse_edges_query.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)

----

## Blazegraph Connection Parameters (Deprecated)

Older versions of CIM-Graph used the `ConnectionParameters` class to authenticate with the database, as shown below. This approach is now deprecated and raises an error.

In [25]:
from cimgraph.databases import ConnectionParameters
# Create connection parameters
params = ConnectionParameters(url = "http://localhost:8889/bigdata/namespace/kb/sparql",
                              cim_profile='rc4_2021', iec61970_301=7)
# Create database connection object
blazegraph = BlazegraphConnection(params)


ConnectionParameters class is deprecated and will be deleted in a future release
Set environment variables for required authentication


TypeError: BlazegraphConnection.__init__() takes 1 positional argument but 2 were given
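To migrate code that used `ConnectionParameters`, the same values can be exported as the `CIMG_*` environment variables described at the top of this page. The connection is then created without arguments. The `apply_legacy_params` helper below is hypothetical, and the sketch assumes the variables are set before `BlazegraphConnection()` is instantiated. The `rc4_2021` / `7` pairing corresponds to the `v2021.01.0` through `v2024.09.0` tags listed above.

```python
import os


def apply_legacy_params(url: str, cim_profile: str, iec61970_301: int) -> None:
    """Hypothetical helper: export former ConnectionParameters values as CIMG_* variables."""
    os.environ["CIMG_URL"] = url
    os.environ["CIMG_CIM_PROFILE"] = cim_profile
    os.environ["CIMG_IEC61970_301"] = str(iec61970_301)  # environment values must be strings


apply_legacy_params("http://localhost:8889/bigdata/namespace/kb/sparql", "rc4_2021", 7)
# The connection is then created without arguments:
# blazegraph = BlazegraphConnection()
```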

----