# Part 1: Export Data Model to CDF

[![Notebook](https://shields.io/badge/notebook-access-green?logo=jupyter&style=for-the-badge)](https://github.com/cognitedata/neat/blob/main/docs/tutorial/notebooks/part-1-data-model-generation.ipynb)

* author: Nikola Vasiljevic
* date: 2023-05-31


**Prerequisite**: Python with `neat` installed, including the `excel` extra: `pip install cognite-neat[excel]`

**Content**: This notebook is Part 1 of the NEAT onboarding tutorial. In it we demonstrate how to export a data model using NEAT.

This is part 1 of a series of tutorials focused on learning the core concepts of `neat` by using it as a package.

## Transformation Rules

The *Transformation Rules* are a core concept of `neat`. They are how `neat` internally represents a data model together with the transformations from a source model. We will go into more detail on the *Transformation Rules* in a later tutorial. For now, it is sufficient to note that they are exposed to the user of `neat` as four tables in a spreadsheet. For more information about *Transformation Rules*, check [this detailed overview](../../transformation-rules.html).

## Parsing Transformation Rules

To get started, we use the built-in example `power_grid_model`.

```python
from cognite.neat import rules
from cognite.neat.rules.examples import power_grid_model
```

```python
type(power_grid_model), power_grid_model.suffix
```

```
(pathlib.WindowsPath, '.xlsx')
```

```python
power = rules.parser.parse_rules_from_excel_file(power_grid_model)
power
```

| field | value |
|---|---|
| prefix | power-grid |
| cdf_space_name | playground |
| namespace | http://purl.org/cognite/power-grid# |
| data_model_name | power_grid |
| version | 0_1_0 |
| is_current_version | True |
| created | 2022-09-29 00:00:00 |
| updated | 2023-10-07 05:24:44.364695 |
| title | Power Grid Example Data Model |
| description | This is simplified power grid data model used ... |


As we see above, the example is simply an Excel file that we can parse to obtain the *Transformation Rules*.
We can inspect the different sheets of the rules using the properties below:

```python
power.metadata
```

```
Metadata(prefix='power-grid', cdf_space_name='playground', namespace=Namespace('http://purl.org/cognite/power-grid#'), data_model_name='power_grid', version='0_1_0', is_current_version=True, created=datetime.datetime(2022, 9, 29, 0, 0), updated=datetime.datetime(2023, 10, 7, 5, 24, 44, 364695), title='Power Grid Example Data Model', description='This is simplified power grid data model used in NEAT tutorial.', creator=['Nikola Vasiljevic', 'Anders Albert'], contributor=None, rights='Free for non-commerical use', externalIdPrefix='', data_set_id=2626756768281823, source=WindowsPath('C:/Users/AndersAlbert/Projects/internal/neat/cognite/neat/rules/examples/power-grid-example.xlsx'), dms_compliant=True)
```

```python
power.classes
```

```
{'GeographicalRegion': Class(description=None, cdf_resource_type='Asset', deprecated=False, deprecation_date=None, replaced_by=None, source=None, source_entity_name=None, match_type=None, comment=None, class_id='GeographicalRegion', class_name='GeographicalRegion', parent_class=None, parent_asset=None, Dataset Id=nan, similarTo=nan, similarityScore=nan, equalTo=nan),
 'SubGeographicalRegion': Class(description='A subset of a geographical region of a power system network model.', cdf_resource_type='Asset', deprecated=False, deprecation_date=None, replaced_by=None, source=None, source_entity_name=None, match_type=None, comment=None, class_id='SubGeographicalRegion', class_name='SubGeographicalRegion', parent_class=None, parent_asset='GeographicalRegion', Dataset Id=nan, similarTo=nan, similarityScore=nan, equalTo=nan),
 'Substation': Class(description='A substation is a part of an electrical generation, transmission, and distribution system.', cdf_resource_type='Asset', deprecated=False,
```

```python
power.properties
```

```
{'row 3': Property(description='The name that identifies Greographical', cdf_resource_type=['Asset'], deprecated=False, deprecation_date=None, replaced_by=None, source=None, source_entity_name=None, match_type=None, comment=None, class_id='GeographicalRegion', property_id='name', property_name='name', expected_value_type='string', min_count=1, max_count=1, default=None, property_type='DatatypeProperty', resource_type_property=['name'], source_type='Asset', target_type='Asset', label='name', relationship_external_id_rule=None, rule_type=<RuleType.rdfpath: 'rdfpath'>, rule='cim:GeographicalRegion(cim:IdentifiedObject.name)', skip_rule=False, similarTo=nan, similarityScore=nan, equalTo=nan),
 'row 4': Property(description='The name that identifies SubGreographical', cdf_resource_type=['Asset'], deprecated=False, deprecation_date=None, replaced_by=None, source=None, source_entity_name=None, match_type=None, comment=None, class_id='SubGeographicalRegion', property_id='name', property_name='
```

### (Optional - Advanced) Create Your Own Transformation Rules

Before proceeding, download the `Transformation Rules` template using [this link](https://drive.google.com/uc?export=download&id=1yJxK35IaKVpZJas60ojReCjh-Ppj9fKX). Unzip the file and open the template:


<video src="../../videos/tutorial-1-download-rules-template.mp4" controls>
</video>

Let's now fill in the template, going sheet by sheet in the following order:
- `Metadata`: where we provide metadata about the data model itself
- `Classes`: where we define classes
- `Properties`: where we define properties for each of the defined classes
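To make the link between the `Classes` and `Properties` sheets concrete, here is a minimal plain-Python sketch (no `neat` involved) that models a few rows of each sheet and checks that every property refers to a class that is actually defined. The `ClassRow`/`PropertyRow` types, the `undefined_references` helper and the `DATATYPES` set are hypothetical simplifications; only the column names (`class_id`, `property_id`, `expected_value_type`, `min_count`, `max_count`) mirror the parsed rules shown earlier.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified rows of the Classes and Properties sheets.
# These are NOT neat's own Class/Property models, only an illustration.
@dataclass(frozen=True)
class ClassRow:
    class_id: str
    description: str = ""


@dataclass(frozen=True)
class PropertyRow:
    class_id: str             # class the property belongs to
    property_id: str
    expected_value_type: str  # a datatype such as "string", or another class_id
    min_count: int = 0
    max_count: int = 1


# Assumed set of datatype names for this sketch
DATATYPES = {"string", "integer", "float", "boolean", "dateTime"}


def undefined_references(classes: List[ClassRow], properties: List[PropertyRow]) -> List[str]:
    """List properties whose class, or whose class-typed value, is missing from the Classes sheet."""
    known = {c.class_id for c in classes}
    problems = []
    for p in properties:
        if p.class_id not in known:
            problems.append(f"{p.class_id}.{p.property_id}: unknown class {p.class_id!r}")
        if p.expected_value_type not in DATATYPES and p.expected_value_type not in known:
            problems.append(f"{p.class_id}.{p.property_id}: unknown value type {p.expected_value_type!r}")
    return problems


classes = [ClassRow("GeographicalRegion"), ClassRow("SubGeographicalRegion")]
properties = [
    PropertyRow("GeographicalRegion", "name", "string", min_count=1),
    PropertyRow("SubGeographicalRegion", "name", "string", min_count=1),
    PropertyRow("SubGeographicalRegion", "region", "GeographicalRegion"),
    PropertyRow("Substation", "name", "string", min_count=1),  # Substation was never added to Classes
]
print(undefined_references(classes, properties))
```

Running it flags only the `Substation` row, which is the kind of mismatch worth catching before parsing a hand-filled template.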


<video src="../../videos/tutorial-1-defining-data-model.mp4" controls>
</video>


Once we are done filling in the template, let's parse it as we did with `power_grid_model` above.


## Export 

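Let's import all the necessary libraries. For simplicity, we again use the prefilled rules sheet provided in the examples as `power_grid_model`:

```python
import warnings
from pathlib import Path

from cognite.neat.rules.examples import power_grid_model
from cognite.neat.rules.parser import parse_rules_from_excel_file
# Exporter module paths as of the neat version this tutorial targets (they may move between releases)
from cognite.neat.rules.exporter.rules2graphql import GraphQLSchema
from cognite.neat.rules.exporter.rules2ontology import Ontology
```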
If you have filled in your own Transformation Rules Excel file, replace `power_grid_model` with the path to your file:

```python
# To use your own file instead:
# transformation_rules = parse_rules_from_excel_file(Path("your_path_to_excel_file"))
transformation_rules = parse_rules_from_excel_file(power_grid_model)

# Derive a GraphQL schema from the Transformation Rules
data_model_gql = GraphQLSchema.from_rules(transformation_rules=transformation_rules, verbose=True)
```
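Since `power_grid_model` is simply a `pathlib` path to an `.xlsx` file, a path to your own file takes its place. A small check such as the hypothetical `resolve_rules_path` below (not part of `neat`) fails early with a clear message when the path is wrong. Restricting it to `.xlsx` is an assumption based on the suffix of the example file.

```python
from pathlib import Path
from typing import Union


def resolve_rules_path(path_like: Union[str, Path]) -> Path:
    """Hypothetical helper (not part of neat): check a rules file before parsing it."""
    path = Path(path_like).expanduser().resolve()
    if path.suffix.lower() != ".xlsx":
        # Assumption: rules are provided as an Excel workbook like the example
        raise ValueError(f"Expected an .xlsx file, got {path.name!r}")
    if not path.is_file():
        raise FileNotFoundError(f"No rules file found at {path}")
    return path
```

It would then be used as `parse_rules_from_excel_file(resolve_rules_path("your_path_to_excel_file.xlsx"))`.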

If we now print the derived GraphQL schema, we can see how each of the objects (i.e. classes) is defined and represented in GraphQL:

```python
print(data_model_gql.schema)
```

```graphql
type GeographicalRegion {
  """
  The name that identifies Greographical
  @name name
  """
  name: String!
}

"""
A subset of a geographical region of a power system network model.
@name SubGeographicalRegion
"""
type SubGeographicalRegion {
  """
  The name that identifies SubGreographical
  @name name
  """
  name: String!
  """
  Region to which subgeographical region belongs to
  @name region
  """
  region: GeographicalRegion
}

"""
A substation is a part of an electrical generation, transmission, and distribution system.
@name Substation
"""
type Substation {
  """
  The name that identifies Substation
  @name name
  """
  name: String!
  """
  The subgeographical region containing the substation
  @name subGeographicalRegion
  """
  subGeographicalRegion: SubGeographicalRegion
}

type Terminal {
  """
  The name that identifies Terminal
  @name name
  """
  name: String!
  """
  The alternative name that identifies Substation
  @name aliasName
  """
  aliasName: String!
  """
```
 

The derived GraphQL schema can now be uploaded to CDF and resolved as a Flexible Data Model:

<video src="../../videos/tutorial-1-upload-gql-schema-to-cdf.mp4" controls>
</video>


Let's now convert the Transformation Rules to an OWL-based semantic ontology and SHACL object constraints:

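```python
# Collect warnings raised during the conversion in `validation_warnings` instead of printing them
with warnings.catch_warnings(record=True) as validation_warnings:
    ontology = Ontology.from_rules(transformation_rules=transformation_rules)
```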
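With `record=True`, `warnings.catch_warnings` collects each warning as a `warnings.WarningMessage` in the `validation_warnings` list, which stays available after the `with` block. The standard-library pattern is shown below; `convert_with_warning` is a hypothetical stand-in for `Ontology.from_rules`, and its warning text is made up for illustration.

```python
import warnings


def convert_with_warning() -> str:
    """Hypothetical stand-in for a conversion step such as Ontology.from_rules."""
    warnings.warn("Property 'aliasName' has no description", UserWarning)  # made-up message
    return "converted"


with warnings.catch_warnings(record=True) as validation_warnings:
    # Record every warning, even ones the default filters would show only once
    warnings.simplefilter("always")
    result = convert_with_warning()

# The recorded warnings remain available after the with-block ends
for w in validation_warnings:
    print(f"{w.category.__name__}: {w.message}")
```

The `simplefilter("always")` call is the one difference from the cell above: under Python's default filters, a warning repeated from the same code location may be recorded only once.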

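The ontology is stored as an RDF graph accessible through `.ontology`. To actually see its content, we serialize it and print it out. The cell below prints `semantic_data_model`, which contains the ontology together with the shape constraints: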

```python
print(ontology.semantic_data_model)
```

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix power-grid: <http://purl.org/cognite/power-grid#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

power-grid: a owl:Ontology ;
    rdfs:label "Power Grid Example Data Model" ;
    dct:created "2022-09-29T00:00:00"^^xsd:dateTime ;
    dct:creator "Anders Albert",
        "Nikola Vasiljevic" ;
    dct:description "This is simplified power grid data model used in NEAT tutorial." ;
    dct:hasVersion "0_1_0" ;
    dct:modified "2023-08-15T08:25:08.712363"^^xsd:dateTime ;
    dct:rights "Free for non-commerical use" ;
    dct:title "Power Grid Example Data Model" ;
    owl:versionInfo "0_1_0" .

power-grid:TerminalShape a sh:NodeShape ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount
```
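Comparing the ontology header with `power.metadata`, `title` appears as both `rdfs:label` and `dct:title`, `version` as both `dct:hasVersion` and `owl:versionInfo`, `created` as `dct:created`, and `creator`, `description` and `rights` as the matching `dct:` terms; `dct:modified` appears to come from `updated` (its value differs above only because the cells were run at different times). The plain-Python sketch below reproduces that apparent mapping. It is an illustration, not neat's exporter, and `turtle_literal` escapes only backslashes, quotes and newlines.

```python
from datetime import datetime


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal (escapes only backslashes, quotes and newlines)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def turtle_datetime(value: datetime) -> str:
    return f'"{value.isoformat()}"^^xsd:dateTime'


def ontology_header(prefix: str, meta: dict) -> str:
    """Render an owl:Ontology header from Metadata-sheet values (illustrative mapping)."""
    # Creators are sorted only to match the order in the printed output above
    creators = ",\n        ".join(turtle_literal(c) for c in sorted(meta["creator"]))
    statements = [
        f"rdfs:label {turtle_literal(meta['title'])}",
        f"dct:created {turtle_datetime(meta['created'])}",
        f"dct:creator {creators}",
        f"dct:description {turtle_literal(meta['description'])}",
        f"dct:hasVersion {turtle_literal(meta['version'])}",
        f"dct:modified {turtle_datetime(meta['updated'])}",
        f"dct:rights {turtle_literal(meta['rights'])}",
        f"dct:title {turtle_literal(meta['title'])}",
        f"owl:versionInfo {turtle_literal(meta['version'])}",
    ]
    return f"{prefix}: a owl:Ontology ;\n    " + " ;\n    ".join(statements) + " ."


# Values taken from power.metadata shown earlier
meta = {
    "title": "Power Grid Example Data Model",
    "created": datetime(2022, 9, 29),
    "updated": datetime(2023, 10, 7, 5, 24, 44, 364695),
    "creator": ["Nikola Vasiljevic", "Anders Albert"],
    "description": "This is simplified power grid data model used in NEAT tutorial.",
    "version": "0_1_0",
    "rights": "Free for non-commerical use",
}
print(ontology_header("power-grid", meta))
```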

In the same way, we access the shape constraints:

```python
print(ontology.constraints)
```

```turtle
@prefix power-grid: <http://purl.org/cognite/power-grid#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

power-grid:TerminalShape a sh:NodeShape ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:nodeKind sh:Literal ;
            sh:path power-grid:aliasName ],
        [ sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:node power-grid:SubstationShape ;
            sh:nodeKind sh:IRI ;
            sh:path power-grid:substation ],
        [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:nodeKind sh:Literal ;
            sh:path power-grid:name ] ;
    sh:targetClass power-grid:Terminal .

power-grid:GeographicalRegionShape a sh:NodeShape ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:nodeKind sh:Literal ;
            sh:path power-grid:name ] ;
```
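In the constraints above, each class gets a `sh:NodeShape` whose `sh:targetClass` is the class, and each of its properties becomes one `sh:property` entry: string properties get `sh:datatype` and `sh:nodeKind sh:Literal`, a property pointing at another class (such as `substation`) gets `sh:node <Class>Shape` and `sh:nodeKind sh:IRI`, and the cardinalities appear as `sh:minCount`/`sh:maxCount`. The sketch below reproduces that pattern as plain strings. The helper names and the `XSD_TYPES` table are hypothetical, omitting a zero `sh:minCount` is a choice of this sketch, and the layout differs slightly from the serializer's; a real exporter would build an RDF graph rather than concatenate text.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from rule datatypes to XSD datatypes (sketch only)
XSD_TYPES = {"string": "xsd:string", "integer": "xsd:integer", "float": "xsd:float", "boolean": "xsd:boolean"}


def property_shape(prefix: str, property_id: str, value_type: str,
                   min_count: int, max_count: Optional[int]) -> str:
    """Render one sh:property entry following the pattern in the printed constraints."""
    is_literal = value_type in XSD_TYPES
    parts = []
    if is_literal:
        parts.append(f"sh:datatype {XSD_TYPES[value_type]}")
    if max_count is not None:
        parts.append(f"sh:maxCount {max_count}")
    if min_count > 0:  # a zero minimum adds no constraint, so it is omitted
        parts.append(f"sh:minCount {min_count}")
    if is_literal:
        parts.append("sh:nodeKind sh:Literal")
    else:
        # The value must be an IRI of a node that conforms to the referenced class's shape
        parts.append(f"sh:node {prefix}:{value_type}Shape")
        parts.append("sh:nodeKind sh:IRI")
    parts.append(f"sh:path {prefix}:{property_id}")
    return "[ " + " ;\n        ".join(parts) + " ]"


def node_shape(prefix: str, class_id: str,
               properties: List[Tuple[str, str, int, Optional[int]]]) -> str:
    """Render a sh:NodeShape that targets instances of class_id."""
    entries = ",\n    ".join(property_shape(prefix, *p) for p in properties)
    return (
        f"{prefix}:{class_id}Shape a sh:NodeShape ;\n"
        f"    sh:property {entries} ;\n"
        f"    sh:targetClass {prefix}:{class_id} ."
    )


print(node_shape("power-grid", "Terminal", [
    ("name", "string", 1, 1),
    ("substation", "Substation", 1, 1),
]))
```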
   

The entire Semantic Data Model (ontology + constraints) can be accessed through the `semantic_data_model` property: