# Part 1: Data Model Generation

[![Notebook](https://shields.io/badge/notebook-access-green?logo=jupyter&style=for-the-badge)](https://github.com/cognitedata/neat/blob/main/docs/tutorial/notebooks/part-1-data-model-generation.ipynb)

* author: Nikola Vasiljevic
* date: 2023-05-31

This notebook is Part 1 of the NEAT Onboarding tutorial. In it we demonstrate how to generate a data model using NEAT.
All you need is basic knowledge of Excel and Python, and very good knowledge of the domain you are trying to model.

To keep things simple, we will define a small power grid data model.

Before proceeding, download the `Transformation Rules` template using [this link](https://drive.google.com/uc?export=download&id=1yJxK35IaKVpZJas60ojReCjh-Ppj9fKX). Unzip the file and open the template:


<video src="../../videos/tutorial-1-download-rules-template.mp4" controls>
</video>


Let's import all the necessary libraries. For convenience, we also provide a prefilled rules sheet from the examples, named `power_grid_model`:

In [1]:
from pathlib import Path
import warnings

from cognite.neat.core.rules import load_rules_from_excel_file
from cognite.neat.core.rules.exporter.rules2graphql import GraphQLSchema
from cognite.neat.core.rules.exporter.rules2ontology import Ontology

from cognite.neat.examples import power_grid_model

Let's now fill in the template, going sheet by sheet in the following order:
- `Metadata`: where we provide metadata about the data model itself
- `Classes`: where we define the classes
- `Properties`: where we define the properties of each defined class


<video src="../../videos/tutorial-1-defining-data-model.mp4" controls>
</video>


For more information about `Transformation Rules`, see [this detailed overview](../../transformation-rules.md).


Once we are done filling in the template, let's load it and transform it into a GraphQL schema that represents our data model.
If you have filled in your own transformation rules Excel file, replace `power_grid_model` with the path to your file:

In [2]:
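# Path to your own rules file; pass it below instead of `power_grid_model` to use it
your_path_to_excel_file = Path("insert_path_here")
# Load the prefilled example rules
transformation_rules = load_rules_from_excel_file(power_grid_model)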
data_model_gql = GraphQLSchema.from_rules(transformation_rules=transformation_rules)
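
The cell above defines `your_path_to_excel_file` but still loads the bundled example. Below is a minimal sketch of one way to choose between the two, assuming `power_grid_model` behaves like a file path. The helper `resolve_rules_path` and the file name `power-grid-example.xlsx` are hypothetical and not part of NEAT:

```python
from pathlib import Path


def resolve_rules_path(user_path, example_path):
    """Return the user's Excel file if it exists, otherwise the example file."""
    candidate = Path(user_path)
    if candidate.is_file() and candidate.suffix.lower() == ".xlsx":
        return candidate
    return Path(example_path)


# Hypothetical stand-in for `power_grid_model`; in the notebook you would
# pass the real object imported from `cognite.neat.examples` instead.
example_rules = Path("power-grid-example.xlsx")

# The placeholder path is not an existing .xlsx file, so the example is used.
rules_path = resolve_rules_path("insert_path_here", example_rules)
print(rules_path)
```

The resulting path could then be passed to `load_rules_from_excel_file` in place of `power_grid_model`.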

If we now print the derived GraphQL schema, we can see how each of the objects (i.e. classes) is defined and represented in GraphQL:

In [3]:
print(data_model_gql.schema)

type GeographicalRegion {
  name: String!
}

type SubGeographicalRegion {
  name: String!
  region: GeographicalRegion
}

type Substation {
  name: String!
  subGeographicalRegion: SubGeographicalRegion
}

type Terminal {
  name: String!
  aliasName: String!
  substation: Substation
}
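
To make the mapping from rules to schema concrete, here is a simplified sketch of how classes and properties could be rendered as GraphQL types. It is not NEAT's implementation: the `MODEL` dictionary is a hypothetical in-memory stand-in for the `Classes` and `Properties` sheets, and the helper names are made up for illustration. The class and property names are taken from the output above.

```python
# Hypothetical in-memory form of the Classes and Properties sheets:
# class name -> list of (property name, value type, mandatory?)
MODEL = {
    "GeographicalRegion": [("name", "String", True)],
    "SubGeographicalRegion": [
        ("name", "String", True),
        ("region", "GeographicalRegion", False),
    ],
    "Substation": [
        ("name", "String", True),
        ("subGeographicalRegion", "SubGeographicalRegion", False),
    ],
    "Terminal": [
        ("name", "String", True),
        ("aliasName", "String", True),
        ("substation", "Substation", False),
    ],
}


def render_type(class_name, properties):
    """Render one class as a GraphQL type definition."""
    lines = [f"type {class_name} {{"]
    for prop, value_type, mandatory in properties:
        # A trailing "!" marks a non-nullable (mandatory) field in GraphQL
        suffix = "!" if mandatory else ""
        lines.append(f"  {prop}: {value_type}{suffix}")
    lines.append("}")
    return "\n".join(lines)


def render_schema(model):
    """Render all classes, separated by blank lines."""
    return "\n\n".join(render_type(name, props) for name, props in model.items())


schema = render_schema(MODEL)
print(schema)
```

Each class becomes a `type`, each property becomes a field, and a reference to another class (such as `substation: Substation`) is simply a field whose type is that class.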


The derived GraphQL schema can now be uploaded to CDF and resolved as a Flexible Data Model:

<video src="../../videos/tutorial-1-upload-gql-schema-to-cdf.mp4" controls>
</video>


Let's now convert the Transformation Rules to an OWL-based semantic ontology and SHACL object constraints:

In [4]:
with warnings.catch_warnings(record=True) as validation_warnings:
    ontology = Ontology.from_rules(transformation_rules=transformation_rules)
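
The cell above records warnings into `validation_warnings` but does not look at them. The sketch below shows how recorded warnings can be inspected using only the standard library. The function `noisy_conversion` and its message are hypothetical stand-ins for a conversion step that emits warnings; which warnings NEAT actually raises is not covered here.

```python
import warnings


def noisy_conversion():
    """Hypothetical stand-in for a conversion step that emits a warning."""
    warnings.warn("property 'aliasName' has no description", UserWarning)
    return "converted"


with warnings.catch_warnings(record=True) as captured:
    # Make sure default filters do not suppress any warning
    warnings.simplefilter("always")
    result = noisy_conversion()

# Each recorded entry exposes the warning category and message
messages = [f"{w.category.__name__}: {w.message}" for w in captured]
for message in messages:
    print(message)
```

With `record=True`, warnings are collected in a list instead of being printed, so they can be reviewed after the conversion finishes.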

The ontology is stored as an RDF graph, accessible through `.ontology`. To see its content, we serialize it and print it out:

In [5]:
print(ontology.ontology)

@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix power-grid: <http://purl.org/cognite/power-grid#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

power-grid: a owl:Ontology ;
    rdfs:label "Power Grid Example Data Model" ;
    dct:created "2022-09-29T00:00:00"^^xsd:dateTime ;
    dct:creator "Anders Albert",
        "Nikola Vasiljevic" ;
    dct:description "This is simplified power grid data model used in NEAT tutorial." ;
    dct:hasVersion "0_1_0" ;
    dct:modified "2023-07-25T09:03:02.759234"^^xsd:dateTime ;
    dct:rights "Free for non-commerical use" ;
    dct:title "Power Grid Example Data Model" ;
    owl:versionInfo "0_1_0" .

power-grid:aliasName a owl:DatatypeProperty ;
    rdfs:label "aliasName" ;
    rdfs:comment "The alternative name that identifies Substation" ;
    rdfs:domain power-grid:Terminal 

In the same way, we access the shape constraints:

In [6]:
print(ontology.constraints)

@prefix power-grid: <http://purl.org/cognite/power-grid#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

power-grid:TerminalShape a sh:NodeShape ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:nodeKind sh:Literal ;
            sh:path power-grid:name ],
        [ sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:node power-grid:SubstationShape ;
            sh:nodeKind sh:IRI ;
            sh:path power-grid:substation ],
        [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:nodeKind sh:Literal ;
            sh:path power-grid:aliasName ] ;
    sh:targetClass power-grid:Terminal .

power-grid:GeographicalRegionShape a sh:NodeShape ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:nodeKind sh:Literal ;
            sh:path power-grid:name ] ;
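
To illustrate what the `sh:minCount` and `sh:maxCount` constraints above express, here is a simplified plain-Python sketch of a cardinality check. It is not a SHACL engine and not how NEAT validates data: the dictionary `TERMINAL_SHAPE`, the function `check_cardinality`, and the sample instance values are hypothetical, with the property names and counts copied from `power-grid:TerminalShape`.

```python
# Cardinality rules copied from power-grid:TerminalShape in the output above
TERMINAL_SHAPE = {
    "name": {"min_count": 1, "max_count": 1},
    "substation": {"min_count": 1, "max_count": 1},
    "aliasName": {"min_count": 1, "max_count": 1},
}


def check_cardinality(instance, shape):
    """Return a list of violations; an empty list means the instance conforms."""
    violations = []
    for path, rule in shape.items():
        count = len(instance.get(path, []))
        if count < rule["min_count"]:
            violations.append(f"{path}: expected at least {rule['min_count']}, found {count}")
        if count > rule["max_count"]:
            violations.append(f"{path}: expected at most {rule['max_count']}, found {count}")
    return violations


# Hypothetical instances: each property maps to the list of its values
complete_terminal = {"name": ["T1"], "substation": ["S1"], "aliasName": ["Terminal 1"]}
incomplete_terminal = {"name": ["T2"], "substation": ["S1"]}

print(check_cardinality(complete_terminal, TERMINAL_SHAPE))
print(check_cardinality(incomplete_terminal, TERMINAL_SHAPE))
```

The first instance conforms, while the second is reported because `aliasName` is missing and the shape requires exactly one value.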
   

The entire semantic data model (ontology + constraints) can be accessed through the `semantic_data_model` property:

In [7]:
print(ontology.semantic_data_model)

@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix power-grid: <http://purl.org/cognite/power-grid#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

power-grid: a owl:Ontology ;
    rdfs:label "Power Grid Example Data Model" ;
    dct:created "2022-09-29T00:00:00"^^xsd:dateTime ;
    dct:creator "Anders Albert",
        "Nikola Vasiljevic" ;
    dct:description "This is simplified power grid data model used in NEAT tutorial." ;
    dct:hasVersion "0_1_0" ;
    dct:modified "2023-07-25T09:03:02.759234"^^xsd:dateTime ;
    dct:rights "Free for non-commerical use" ;
    dct:title "Power Grid Example Data Model" ;
    owl:versionInfo "0_1_0" .

power-grid:TerminalShape a sh:NodeShape ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount