# Data Population
**Prerequisites**
- Access to a CDF Project.
- Know how to use a terminal, so you can run `pygen` from the command line to 
  generate the SDK.
- Knowledge of your data and data model.

In [None]:
import warnings

warnings.filterwarnings("ignore")
# This is just to enable importing the generated SDK from the examples folder in the pygen repository
import sys  # noqa: E402

from tests.constants import REPO_ROOT  # noqa: E402

sys.path.append(str(REPO_ROOT / "examples"))

## Introduction to Problem

`pygen` can be used to ingest data into an existing data model. It is well suited when the source data is nested and comes in a format such as `JSON`.

Before you can ingest data you need the following:

1. A data model deployed to CDF.
2. An SDK generated for that data model.

In this guide, we will use some windmill data as an example. We have already deployed a model and generated an SDK for it.

The SDK was generated with the following configuration in `pyproject.toml`:

```toml
[tool.pygen]
data_models = [
    ["sp_pygen_power", "WindTurbine", "1"],
]

```


The model is illustrated in Cognite Data Fusion's interface below:

<img src="images/windturbine_model" width="800">

First, we will inspect some of the data we have available.

In [3]:
from tests.constants import WindMillFiles

In [4]:
print(WindMillFiles.Data.wind_mill_json.read_text()[:500])

[
    {

        "name": "hornsea_1_mill_3",
        "windfarm": "Hornsea 1",
        "capacity": 7.0,
        "rotor": {

            "rotor_speed_controller": "V52-WindTurbine.ROT",
            "rpm_low_speed_shaft": "V52-WindTurbine.cnt0"
        },
        "nacelle": {

            "gearbox": {

                "displacement_x": "V52-WindTurbine.Gear_D_X",
                "displacement_y": "V52-WindTurbine.Gear_D_Y",
                "displacement_z": "V52-WindTurbine.Gear_D_Z"
            },


As the snippet above shows, the data is nested, which makes it well suited for ingestion with `pygen`.
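
To make the nesting explicit, the minimal sketch below lists the paths of all nested objects in one record. The record is an abridged copy containing only the fields visible in the snippet above, and `nested_object_paths` is a hypothetical helper written for this illustration; it is not part of `pygen`.

```python
import json

# Abridged copy of the first record shown above (only the fields visible in the snippet).
record = json.loads("""
{
    "name": "hornsea_1_mill_3",
    "windfarm": "Hornsea 1",
    "capacity": 7.0,
    "rotor": {
        "rotor_speed_controller": "V52-WindTurbine.ROT",
        "rpm_low_speed_shaft": "V52-WindTurbine.cnt0"
    },
    "nacelle": {
        "gearbox": {
            "displacement_x": "V52-WindTurbine.Gear_D_X",
            "displacement_y": "V52-WindTurbine.Gear_D_Y",
            "displacement_z": "V52-WindTurbine.Gear_D_Z"
        }
    }
}
""")


def nested_object_paths(obj: dict, prefix: str = "") -> list[str]:
    """Return dotted paths to every nested dict, descending into lists as well."""
    paths: list[str] = []
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        # Treat a single value and a list of values the same way.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                if path not in paths:
                    paths.append(path)
                paths.extend(p for p in nested_object_paths(child, path) if p not in paths)
    return paths


print(nested_object_paths(record))
# ['rotor', 'nacelle', 'nacelle.gearbox']
```

Each path points at an object nested inside the windmill record rather than at a plain value.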

## External ID Hook

All data in CDF data models must have an `external_id` set. Source data often comes without one, so `pygen` provides a built-in hook that lets you set the `external_id` while ingesting the data. The hook is called `external_id_factory`, and you set it on the `DomainModelWrite` class, which you import from your generated data classes.

In [5]:
from windmill.data_classes import DomainModelWrite

from cognite.pygen.utils.external_id_factories import create_external_id_factory, incremental_factory, uuid_factory

In [7]:
DomainModelWrite.external_id_factory = uuid_factory

The `external_id_factory` is a function that takes two arguments: first a `type`, which is the data class of the object, and then a `dict` with the data for that particular object. `pygen` comes with a few generic external ID factories you can use, see [External ID factory](../api/utils_external_id_factory.html). These can be good for testing and exploration, but we recommend that you write your own factory function for (at least) the most important classes.

In the example below, we write a factory function that sets the ID for all windmills. Looking at the data snippet above, we note that each windmill has a `name` from the source system, so we would like to use this as the `external_id`.

In [8]:
from windmill.data_classes import WindmillWrite

fallback_factory = create_external_id_factory(suffix_ext_id_factory=incremental_factory)


def windmill_factory(domain_cls: type, data: dict) -> str:
    if domain_cls is WindmillWrite:
        return data["name"]
    else:
        # Fallback to incremental
        return fallback_factory(domain_cls, data)


# Finally, we set the new factory
DomainModelWrite.external_id_factory = windmill_factory
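
Because a factory is just a function from a class and a `dict` to a string, other ID strategies fit the same signature. The sketch below is one hypothetical alternative to the incremental fallback: it derives the ID from a hash of the object's data, so the same content always yields the same ID. The names `hashed_factory` and the stand-in class `RotorWrite` are assumptions made for this illustration, and the sketch assumes the `dict` passed to the factory can be serialized with `json.dumps` (non-JSON values are converted with `str`).

```python
import hashlib
import json


def hashed_factory(domain_cls: type, data: dict) -> str:
    """Hypothetical factory: build the ID from the class name and a hash of the data."""
    # Sort keys so that the same content always yields the same digest;
    # default=str covers values that json cannot serialize natively.
    payload = json.dumps(data, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return f"{domain_cls.__name__.lower()}:{digest}"


class RotorWrite:  # stand-in for a generated data class, for illustration only
    pass


rotor = {
    "rotor_speed_controller": "V52-WindTurbine.ROT",
    "rpm_low_speed_shaft": "V52-WindTurbine.cnt0",
}
first = hashed_factory(RotorWrite, rotor)
second = hashed_factory(RotorWrite, dict(reversed(list(rotor.items()))))
print(first == second)  # True: key order does not affect the ID
```

The trade-off of this design is that two objects of the same class with identical content receive the same ID, which is only acceptable if such objects really are the same thing.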

## Ingesting the Data

After we have set the `external_id_factory`, we are all good to go. `pygen` generates `pydantic` data classes, which means we can use the built-in support for JSON validation in `pydantic`.

We note that the source data is a list of windmills. In `pydantic`, we use a `TypeAdapter` to parse a list of objects.

In [9]:
from pydantic import TypeAdapter

In [10]:
windmills = TypeAdapter(list[WindmillWrite]).validate_json(WindMillFiles.Data.wind_mill_json.read_text())

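`pygen` also supports `pydantic` v1. The equivalent of the line above for v1 is:

```python
from pydantic import parse_raw_as

# parse_raw_as validates raw JSON text; parse_obj_as expects already-parsed Python objects
windmills = parse_raw_as(list[WindmillWrite], WindMillFiles.Data.wind_mill_json.read_text())
```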

In [11]:
from windmill.data_classes import WindmillWriteList

In [12]:
# The WindmillWriteList has a few helper methods and nicer display than a regular list
windmills = WindmillWriteList(windmills)
windmills

|   | space | external_id | blades | capacity | metmast | nacelle | name | rotor | windfarm | node_type | data_record |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | windmill-instances | hornsea_1_mill_3 | [{'space': 'windmill-instances', 'external_id'... | 7.0 | [] | {'space': 'windmill-instances', 'external_id':... | hornsea_1_mill_3 | {'space': 'windmill-instances', 'external_id':... | Hornsea 1 |  | {'existing_version': None} |
| 1 | windmill-instances | hornsea_1_mill_2 | [{'space': 'windmill-instances', 'external_id'... | 7.0 | [] | {'space': 'windmill-instances', 'external_id':... | hornsea_1_mill_2 | {'space': 'windmill-instances', 'external_id':... | Hornsea 1 |  | {'existing_version': None} |
| 2 | windmill-instances | hornsea_1_mill_1 | [{'space': 'windmill-instances', 'external_id'... | 7.0 | [] | {'space': 'windmill-instances', 'external_id':... | hornsea_1_mill_1 | {'space': 'windmill-instances', 'external_id':... | Hornsea 1 |  | {'existing_version': None} |
| 3 | windmill-instances | hornsea_1_mill_4 | [{'space': 'windmill-instances', 'external_id'... | 7.0 | [] | {'space': 'windmill-instances', 'external_id':... | hornsea_1_mill_4 | {'space': 'windmill-instances', 'external_id':... | Hornsea 1 |  | {'existing_version': None} |
| 4 | windmill-instances | hornsea_1_mill_5 | [{'space': 'windmill-instances', 'external_id'... | 7.0 | [] | {'space': 'windmill-instances', 'external_id':... | hornsea_1_mill_5 | {'space': 'windmill-instances', 'external_id':... | Hornsea 1 |  | {'existing_version': None} |


We note that the `external_id` field is set to the `name` of the windmill. If we check the other objects, we see that these get an `external_id` of the form `class_name.lower():counter`.

In [13]:
windmills[0].nacelle

|   | value |
|---|---|
| space | windmill-instances |
| external_id | nacellewrite:1 |
| data_record | {'existing_version': None} |
| node_type |  |
| acc_from_back_side_x | V52-WindTurbine.Acc1N |
| acc_from_back_side_y | V52-WindTurbine.Acc2N |
| acc_from_back_side_z | V52-WindTurbine.Acc3N |
| gearbox | {'space': 'windmill-instances', 'external_id':... |
| generator | {'space': 'windmill-instances', 'external_id':... |
| high_speed_shaft | {'space': 'windmill-instances', 'external_id':... |
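
The `class_name.lower():counter` pattern seen in `nacellewrite:1` can be imitated with one counter per class. The following is a minimal sketch of that idea, not `pygen`'s actual implementation; `incremental_sketch` and the two stand-in classes are hypothetical names used only here.

```python
from collections import defaultdict
from itertools import count

# One independent counter per class name, each starting at 1.
_counters: defaultdict = defaultdict(lambda: count(1))


def incremental_sketch(domain_cls: type, data: dict) -> str:
    """Hypothetical re-creation of the `class_name.lower():counter` pattern."""
    name = domain_cls.__name__.lower()
    return f"{name}:{next(_counters[name])}"


class NacelleWrite:  # stand-in for a generated data class
    pass


class BladeWrite:  # stand-in for a generated data class
    pass


ids = [incremental_sketch(cls, {}) for cls in (NacelleWrite, BladeWrite, BladeWrite, NacelleWrite)]
print(ids)
# ['nacellewrite:1', 'bladewrite:1', 'bladewrite:2', 'nacellewrite:2']
```

In this sketch the IDs depend only on the order in which objects are seen, not on their content, which is one reason to write your own factory for the classes that matter most.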


We can now upload this data by creating a domain client and calling its `upsert` method.

In [14]:
from windmill import WindmillClient

In [17]:
wind = WindmillClient.from_toml("config.toml")

In [18]:
result = wind.upsert(windmills)
print(f"{len(result.nodes)} nodes and {len(result.edges)} edges uploaded")

145 nodes and 105 edges uploaded


Note that `pygen` has the method `.to_instances_write()`, which you can use to check which `nodes` and `edges` were created.

We note that `pygen` created a total of 145 nodes and 105 edges between these nodes.

The edges were of 2 different types, and the nodes were ingested into 10 different views.

In [19]:
instances = windmills.to_instances_write()

In [20]:
len(instances.nodes), len(instances.edges)

(145, 105)

In [21]:
unique = {edge.type.external_id for edge in instances.edges}
len(unique), unique

(2, {'Blade.sensor_positions', 'Windmill.blades'})
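
If you also want to know how many edges there are of each type, a `collections.Counter` over the same attribute is enough. The sketch below uses hand-made stand-in objects that only mimic the `edge.type.external_id` attribute used above, so the counts it prints are illustrative and not taken from the windmill data.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-in edges that only mimic the `edge.type.external_id` attribute.
edges = [
    SimpleNamespace(type=SimpleNamespace(external_id="Windmill.blades")),
    SimpleNamespace(type=SimpleNamespace(external_id="Blade.sensor_positions")),
    SimpleNamespace(type=SimpleNamespace(external_id="Blade.sensor_positions")),
]

# Count how many edges share each type.
per_type = Counter(edge.type.external_id for edge in edges)
for edge_type, n in per_type.most_common():
    print(f"{edge_type}: {n}")
```

With the real data, you would iterate over `instances.edges` instead of the stand-in list.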

In [22]:
unique = {source.source for node in instances.nodes for source in node.sources}
len(unique), unique

(10,
 {ViewId(space='power-models', external_id='Blade', version='1'),
  ViewId(space='power-models', external_id='Gearbox', version='1'),
  ViewId(space='power-models', external_id='Generator', version='1'),
  ViewId(space='power-models', external_id='HighSpeedShaft', version='1'),
  ViewId(space='power-models', external_id='MainShaft', version='1'),
  ViewId(space='power-models', external_id='Nacelle', version='1'),
  ViewId(space='power-models', external_id='PowerInverter', version='1'),
  ViewId(space='power-models', external_id='Rotor', version='1'),
  ViewId(space='power-models', external_id='SensorPosition', version='1'),
  ViewId(space='power-models', external_id='Windmill', version='1')})

In [23]:
instances.nodes

|   | space | instance_type | external_id | sources |
|---|---|---|---|---|
| 0 | windmill-instances | node | hornsea_1_mill_3 | [{'properties': {'capacity': 7.0, 'nacelle': {... |
| 1 | windmill-instances | node | bladewrite:1 | [{'properties': {'is_damaged': False, 'name': ... |
| 2 | windmill-instances | node | sensorpositionwrite:1 | [{'properties': {'flapwise_bend_mom': 'V52-Win... |
| 3 | windmill-instances | node | sensorpositionwrite:2 | [{'properties': {'edgewise_bend_mom_offset': '... |
| 4 | windmill-instances | node | sensorpositionwrite:3 | [{'properties': {'edgewise_bend_mom_crosstalk_... |
| ... | ... | ... | ... | ... |
| 140 | windmill-instances | node | generatorwrite:5 | [{'properties': {'generator_speed_controller':... |
| 141 | windmill-instances | node | highspeedshaftwrite:5 | [{'properties': {'bending_moment_y': 'V52-Wind... |
| 142 | windmill-instances | node | mainshaftwrite:5 | [{'properties': {'bending_x': 'V52-WindTurbine... |
| 143 | windmill-instances | node | powerinverterwrite:5 | [{'properties': {'active_power_total': 'V52-Wi... |


In [24]:
instances.edges

|   | space | instance_type | external_id | type | start_node | end_node |
|---|---|---|---|---|---|---|
| 0 | windmill-instances | edge | hornsea_1_mill_3:bladewrite:1 | {'space': 'power-models', 'external_id': 'Wind... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| 1 | windmill-instances | edge | bladewrite:1:sensorpositionwrite:1 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| 2 | windmill-instances | edge | bladewrite:1:sensorpositionwrite:2 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| 3 | windmill-instances | edge | bladewrite:1:sensorpositionwrite:3 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| 4 | windmill-instances | edge | bladewrite:1:sensorpositionwrite:4 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| ... | ... | ... | ... | ... | ... | ... |
| 100 | windmill-instances | edge | bladewrite:15:sensorpositionwrite:86 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| 101 | windmill-instances | edge | bladewrite:15:sensorpositionwrite:87 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| 102 | windmill-instances | edge | bladewrite:15:sensorpositionwrite:88 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
| 103 | windmill-instances | edge | bladewrite:15:sensorpositionwrite:89 | {'space': 'power-models', 'external_id': 'Blad... | {'space': 'windmill-instances', 'external_id':... | {'space': 'windmill-instances', 'external_id':... |
