# Generate Mock Graph


* author: Nikola Vasiljevic
* date: 2023-04-03


In this notebook we will demonstrate how you can use NEAT to generate a mock graph based on your data model.

First we need to import all the necessary libraries:

In [19]:
from pathlib import Path
from cognite.neat.core import loader, parser, extractors
from cognite.neat.core.utils import remove_namespace
from cognite.neat.core.mocks.graph import generate_triples, add_triples

%reload_ext autoreload
%autoreload 2


Since we already have an example data model, we will use it to generate the mock graph.

Here we set the path to the transformation rules, which contain the data model definition, and parse them into a data model object:

In [2]:
ROOT = Path().resolve().parent.parent.parent
TRANSFORMATION_RULES = ROOT / "cognite" / "neat" / "examples" / "rules" / "Rules-Nordic44-to-TNT.xlsx"

In [3]:
raw_sheets = loader.rules.excel_file_to_table_by_name(TRANSFORMATION_RULES)
data_model = parser.parse_transformation_rules(raw_sheets)

Let's now take a look at the classes the data model defines:

In [11]:
data_model.get_defined_classes()

{'GeographicalRegion',
 'Orphanage',
 'RootCIMNode',
 'SubGeographicalRegion',
 'Substation',
 'Terminal'}

Let's now inspect the properties of one of the classes:

In [12]:
data_model.to_dataframe()['Substation']

| | property_type | value_type | min_count | max_count |
|---|---|---|---|---|
| IdentifiedObject.mRID | DatatypeProperty | string | 1 | 1.0 |
| IdentifiedObject.name | DatatypeProperty | string | 1 | 1.0 |
| Substation.Region | ObjectProperty | SubGeographicalRegion | 1 | 1.0 |
| Substation.Terminal | ObjectProperty | Terminal | 1 | |
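
The `min_count` and `max_count` columns describe how many values each property may have. The sketch below shows one way to read them. `PropertyRule` and `allows` are hypothetical names used only for illustration and are not part of NEAT, and treating the empty `max_count` of `Substation.Terminal` as "no upper bound" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyRule:
    """One row of the Substation table (hypothetical container, not a NEAT class)."""

    name: str
    value_type: str
    min_count: int
    max_count: Optional[int]  # None: assumed to mean "no upper bound"

    def allows(self, n: int) -> bool:
        """Return True if n values satisfy this property's cardinality."""
        if n < self.min_count:
            return False
        return self.max_count is None or n <= self.max_count


region = PropertyRule("Substation.Region", "SubGeographicalRegion", 1, 1)
terminal = PropertyRule("Substation.Terminal", "Terminal", 1, None)

print(region.allows(1), region.allows(2))      # True False
print(terminal.allows(3), terminal.allows(0))  # True False
```

Under this reading, a substation belongs to exactly one region but may have any number of terminals, as long as it has at least one.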


Let's now configure the desired number of instances for each of the above classes except `Orphanage`. `Orphanage` is a special class that NEAT expects in a specific form; if it is missing, NEAT creates it automatically. We will store the desired number of instances in a dictionary called `class_count`:

In [13]:
class_count = {"RootCIMNode":1, 
               "GeographicalRegion":5, 
               "SubGeographicalRegion":10, 
               "Substation": 20, 
               "Terminal": 60}

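Before generating anything, it can help to check the requested counts against the classes the data model defines. The following is a minimal sketch in plain Python; the class names are copied by hand from the `get_defined_classes()` output shown earlier, so it runs without NEAT, and the variable names `unknown` and `not_requested` are illustrative.

```python
# Class names copied from the get_defined_classes() output in this notebook.
defined_classes = {
    "GeographicalRegion", "Orphanage", "RootCIMNode",
    "SubGeographicalRegion", "Substation", "Terminal",
}

class_count = {
    "RootCIMNode": 1,
    "GeographicalRegion": 5,
    "SubGeographicalRegion": 10,
    "Substation": 20,
    "Terminal": 60,
}

unknown = set(class_count) - defined_classes        # typos or undefined classes
not_requested = defined_classes - set(class_count)  # classes without a count
total_instances = sum(class_count.values())

print(unknown)          # set()
print(not_requested)    # {'Orphanage'}
print(total_instances)  # 96
```

In a live session, `defined_classes` could instead be taken directly from `data_model.get_defined_classes()`.
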
To generate the mock graph, we will first create an empty graph, to which we will add the triples that represent our mock graph:

In [15]:
graph_store = loader.NeatGraphStore(prefixes=data_model.prefixes, 
                                    namespace=data_model.metadata.namespace)
graph_store.init_graph(base_prefix=data_model.metadata.prefix)

First we will create the triples and then add them to the graph. We will start with creating triples for the classes:

In [16]:
mock_triples = generate_triples(data_model, class_count)
add_triples(graph_store, mock_triples)
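
To make the idea of "triples for the classes" concrete, here is a conceptual sketch that builds one `rdf:type` triple per requested instance using plain tuples. It is not NEAT's implementation of `generate_triples`, which presumably does more (for example, property values). The function name `mock_type_triples` and the `Class-i` identifier scheme are hypothetical; the namespace is the one that appears in NEAT's log output in this notebook.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NAMESPACE = "http://purl.org/cognite/tnt#"  # namespace seen in NEAT's log output


def mock_type_triples(class_count, namespace):
    """Build one (subject, rdf:type, class) triple per requested instance."""
    triples = []
    for class_name, n in class_count.items():
        class_uri = f"{namespace}{class_name}"
        for i in range(1, n + 1):
            # Hypothetical identifier scheme: <ClassName>-<running number>
            triples.append((f"{namespace}{class_name}-{i}", RDF_TYPE, class_uri))
    return triples


class_count = {"RootCIMNode": 1, "GeographicalRegion": 5,
               "SubGeographicalRegion": 10, "Substation": 20, "Terminal": 60}
triples = mock_type_triples(class_count, NAMESPACE)

print(len(triples))   # 96
print(triples[0][0])  # http://purl.org/cognite/tnt#RootCIMNode-1
```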

After creating and adding the mock triples, let's take a look at the graph and check that it contains the expected number of class instances.

Here we execute a SPARQL query that counts all the class instances:
```
SELECT ?class (count(?s) as ?instances ) WHERE { ?s a ?class . } group by ?class order by DESC(?instances)
```

When processing the results, we deliberately remove the namespaces from the class names:

In [17]:
query = "SELECT ?class (count(?s) as ?instances ) WHERE { ?s a ?class . } group by ?class order by DESC(?instances)"
for res in graph_store.graph.query(query):
    print(f"{remove_namespace(res[0]):25} {res[1]}")

Terminal                  60
Substation                20
SubGeographicalRegion     10
GeographicalRegion        5
RootCIMNode               1
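
The namespace stripping and the column alignment in the output can be reproduced in isolation. The helper below, `strip_namespace`, is a hypothetical stand-in and not NEAT's `remove_namespace`; it assumes the class name is whatever follows the last `#` (or, failing that, the last `/`) in the URI.

```python
def strip_namespace(uri: str) -> str:
    """Return the part of a URI after the last '#', or the last '/' if no '#'."""
    for separator in ("#", "/"):
        if separator in uri:
            return uri.rsplit(separator, 1)[1]
    return uri


# Toy rows shaped like the query results: (class URI, instance count).
rows = [
    ("http://purl.org/cognite/tnt#Terminal", 60),
    ("http://purl.org/cognite/tnt#Substation", 20),
]

for class_uri, instances in rows:
    # ":25" left-aligns the class name in a 25-character column
    print(f"{strip_namespace(class_uri):25} {instances}")
```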


Let's now parse this graph into assets and relationships and check that we get the expected number of instances per class.
Do not be alarmed by the warnings and errors that NEAT raises because of the missing Orphanage; NEAT creates it automatically:

In [21]:
assets = extractors.rdf_to_assets.rdf2assets(graph_store, data_model)

ERROR:root:Error while loading instances of class <http://purl.org/cognite/tnt#Orphanage> into cache. Reason: 'instance'


Orphanage with external id orphanage not found in asset hierarchy!


In [22]:
count_assets = {}
for asset in assets.values():
    if asset["metadata"]["type"] not in count_assets:
        count_assets[asset["metadata"]["type"]] = 1
    else:
        count_assets[asset["metadata"]["type"]] += 1
        
count_assets

{'RootCIMNode': 1,
 'GeographicalRegion': 5,
 'SubGeographicalRegion': 10,
 'Substation': 20,
 'Terminal': 60,
 'Orphanage': 1}
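
The same per-type count can be written more compactly with `collections.Counter` from the standard library. The sketch below uses a small hand-made `assets` dictionary as a stand-in; it only assumes the `asset["metadata"]["type"]` layout used in the cell above, and the external ids are made up.

```python
from collections import Counter

# Toy stand-in for `assets`: external id -> asset dictionary.
assets = {
    "Substation-1": {"metadata": {"type": "Substation"}},
    "Terminal-1": {"metadata": {"type": "Terminal"}},
    "Terminal-2": {"metadata": {"type": "Terminal"}},
    "orphanage": {"metadata": {"type": "Orphanage"}},
}

count_assets = Counter(asset["metadata"]["type"] for asset in assets.values())

print(dict(count_assets))  # {'Substation': 1, 'Terminal': 2, 'Orphanage': 1}
```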

Let's now create the relationships and check that we get the expected number of them, namely:
- 60 relationships where source is Terminal and target is Substation
- 60 relationships where source is Substation and target is Terminal

In [23]:
relationships = extractors.rdf_to_relationships.rdf2relationships(graph_store, data_model)

In [24]:
no_sub_ter_rela = len(relationships[(relationships.source_external_id.str.match('Substation.*')) & 
                                    (relationships.target_external_id.str.match('Terminal.*'))])
no_ter_sub_rela = len(relationships[(relationships.source_external_id.str.match('Terminal.*')) & 
                                    (relationships.target_external_id.str.match('Substation.*'))])

print(f"Substation-Terminal relations: {no_sub_ter_rela}")
print(f"Terminal-Substation relations: {no_ter_sub_rela}")

Substation-Terminal relations: 60
Terminal-Substation relations: 60
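
The filter above relies on pandas `str.match`, which tests whether each string starts with a match of the pattern (like `re.match`), so the trailing `.*` is not strictly required. The sketch below reproduces the same counting logic with the standard library only. The rows and external ids are made up for illustration, and `count_links` is a hypothetical helper.

```python
import re

# Toy stand-in for the relationships table; external ids are hypothetical.
relationships = [
    {"source_external_id": "Substation-1", "target_external_id": "Terminal-1"},
    {"source_external_id": "Terminal-1", "target_external_id": "Substation-1"},
    {"source_external_id": "Substation-1", "target_external_id": "SubGeographicalRegion-1"},
]


def count_links(rows, source_pattern, target_pattern):
    """Count rows whose source and target ids both start with the given patterns."""
    return sum(
        1
        for row in rows
        if re.match(source_pattern, row["source_external_id"])
        and re.match(target_pattern, row["target_external_id"])
    )


print(count_links(relationships, "Substation", "Terminal"))  # 1
print(count_links(relationships, "Terminal", "Substation"))  # 1
```

Note that the pattern `Substation` does not match `SubGeographicalRegion`, so the third row is excluded from both counts.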
