# Data Migration
**Prerequisites**
- Access to a CDF Project.
- Know how to use a terminal, so you can run `pygen` from the command line to 
  generate the SDK.
- Knowledge of your data and data models.

In [1]:
import warnings
warnings.filterwarnings('ignore')
# This is just to enable importing the generated SDK from the examples folder in the pygen repository
import sys
from tests.constants import REPO_ROOT
sys.path.append(str(REPO_ROOT / "examples" ))

## Introduction to Problem

In the development of a solution, it is common to move from one data model to another. Typically, the physical storage of data, the containers, stays the same or is extended,
and only the views of the data model change. In this case, there is no need for a data migration; all you do is update the solution code for the new model.

If you, however, need to change how the data is stored in the containers, you need to move the data from one model to the other. A common way to do this is to use CDF Transformations. An alternative
is to use `pygen` for data migration.

#### Advantages of CDF Transformations
* Can handle large volumes of data, millions of instances.
* You can write the transformations in the UI using SQL.


#### Advantages of using `pygen` 
* Edges are automatically created.
* Can handle medium volumes of data, hundreds of thousands of instances.
* Supports highly custom migrations (whatever you can do in Python code).

**Why** would you **choose `pygen`** over **CDF Transformations**? It is a good fit if you are more comfortable with `Python` than with writing `SQL`, your data models have a lot of one-to-many edges, and the number of nodes and edges (instances) is on the order of hundreds of thousands.




## Use Case

The use case we use in this guide is moving from one Asset Performance Model to another. 

We will refer to the model we are moving away from as the **source** model and
the model we are moving to as the **destination** model.

**Source Model**:


<img src="images/source_model.png" width="400">

**Destination Model**:


<img src="images/destination_model.png" width="400">

As we see in the illustrations above (click to enlarge), we have many more edges in the destination model than in the source model,
which means this is a good use case for using `pygen` for the data migration.

In this guide, we will not do the entire migration, but instead focus on the `APM_Template` to `Template` migration.
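
To make the idea of a single-type migration concrete, here is a minimal sketch of what moving one template from the source to the destination model can look like as a plain function. The classes `SourceTemplate` and `DestinationTemplate`, their fields, and the space name `destinationSpace` are hypothetical stand-ins for illustration; they are not the classes `pygen` generates for these models.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTemplate:
    """Hypothetical stand-in for a template in the source model."""

    external_id: str
    title: str
    item_ids: list[str] = field(default_factory=list)


@dataclass
class DestinationTemplate:
    """Hypothetical stand-in for a template in the destination model."""

    external_id: str
    space: str
    title: str
    item_ids: list[str] = field(default_factory=list)


def migrate_template(src: SourceTemplate, destination_space: str) -> DestinationTemplate:
    """Map one source template to its destination counterpart.

    The external id is kept unchanged so the migrated template can be
    traced back to its origin; the item ids are copied so the source
    object is not mutated by later changes.
    """
    return DestinationTemplate(
        external_id=src.external_id,
        space=destination_space,
        title=src.title,
        item_ids=list(src.item_ids),
    )


source_template = SourceTemplate("template_1", "Pump inspection", ["item_a", "item_b"])
migrated = migrate_template(source_template, "destinationSpace")
```

Keeping the mapping in a small pure function like this makes it easy to test on a handful of instances before running it on the full data set.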

## Generating SDKs

For demo purposes, we will generate the SDKs in this notebook. Depending on your use case, however, it might be more useful to use the `pygen` CLI to generate the SDKs.

Migration scripts are often a one-off, meaning you are not expected to regenerate the SDK as the model changes. Thus, if you generate it locally, you can make changes
to the generated code, as you do not expect it to change later.

Note that we have set up a `config.toml` with credentials to connect to CDF.

In [1]:
from cognite.pygen import load_cognite_client_from_toml, generate_sdk_notebook

In [2]:
client = load_cognite_client_from_toml()

In [3]:
source = generate_sdk_notebook(("APM_AppData_4", "APM_AppData_4", "7"), client)

Successfully retrieved data model(s) ('APM_AppData_4', 'APM_AppData_4', '7')
Writing SDK to C:\Users\ANDERS~1\AppData\Local\Temp\pygen
Done!
Added C:\Users\ANDERS~1\AppData\Local\Temp\pygen to sys.path to enable import
Imported apm_app_data_4.client


In [4]:
destination = generate_sdk_notebook(("IntegrationTestsImmutable", "ApmAppData", "v3"), client)

Successfully retrieved data model(s) ('IntegrationTestsImmutable', 'ApmAppData', 'v3')
Writing SDK to C:\Users\ANDERS~1\AppData\Local\Temp\pygen
Done!
C:\Users\ANDERS~1\AppData\Local\Temp\pygen already in sys.path
Imported apm_app_data.client


## Retrieving Data

When retrieving the data, we typically only want nodes from one space, so we filter on space. In addition, it is important that we retrieve all edges so we can
connect the `templates` with the `templates_items`.

In [13]:
templates = source.apm_template.list(space="sourceSpace", retrieve_edges=True, limit=-1)
items = source.apm_checklist_item.list(space="sourceSpace", retrieve_edges=True, limit=-1)

In [14]:
len(templates), len(items)

(50, 0)

## Transformation


In [16]:
from collections import defaultdict

In [17]:
from cognite.client import data_modeling as dm

In [19]:
edges = client.data_modeling.instances.list("edge", filter=dm.filters.Equals(["edge", "space"], "APM_AppData_4"), limit=-1)

In [20]:
len(edges)

92
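
With the edges retrieved, one way to connect templates to their items is to group the edges by start node. Below is a minimal sketch of that grouping using `defaultdict`. The `NodeRef` and `EdgeStub` classes and the sample ids are hypothetical stand-ins so the example runs without a CDF connection; we assume the real edge objects expose `start_node` and `end_node` references with an `external_id` in a similar way.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRef:
    """Hypothetical stand-in for a node reference (space + external id)."""

    space: str
    external_id: str


@dataclass(frozen=True)
class EdgeStub:
    """Hypothetical stand-in for an edge with a start and an end node."""

    start_node: NodeRef
    end_node: NodeRef


def group_end_nodes_by_start(edges):
    """Return a dict mapping each start node id to the list of its end node ids."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[edge.start_node.external_id].append(edge.end_node.external_id)
    # Convert to a plain dict so missing keys raise KeyError instead of
    # silently creating empty lists.
    return dict(grouped)


# Made-up sample data, for illustration only.
sample_edges = [
    EdgeStub(NodeRef("sourceSpace", "template_1"), NodeRef("sourceSpace", "item_a")),
    EdgeStub(NodeRef("sourceSpace", "template_1"), NodeRef("sourceSpace", "item_b")),
    EdgeStub(NodeRef("sourceSpace", "template_2"), NodeRef("sourceSpace", "item_c")),
]

items_by_template = group_end_nodes_by_start(sample_edges)
```

The resulting mapping can then be used to look up the items belonging to each template when building the destination objects.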

## Writing Data