# Creating Mock Data

If you have a data model but no data, it can be useful to generate some data for it.
There are several use cases for this:

* In the design phase, you want to quickly try out your current data model iteration with some data.
* You need data for testing the data model.
* Load testing of a data model.



`pygen` comes with a `MockGenerator`. You can find its reference [here](../api/utils_mock_generator.html); this page is a practical guide to using it.

## Generate Default Mock Data

In [1]:
from cognite.pygen.utils import MockGenerator, load_cognite_client_from_toml

In [2]:
client = load_cognite_client_from_toml()

In this example, we will use the `Windmill` data model.

Let's instantiate the `MockGenerator` by passing the data model along with the instance space we want to use for the
generated data. In addition, we need an instantiated `CogniteClient` to fetch the data model.

In [3]:
generator = MockGenerator.from_data_model(("power-models", "Windmill", "1"), instance_space="sp_sandbox", client=client, seed=42)

In [4]:
generator

The `MockGenerator` has a single method, `generate_mock_data`, which generates the mock data using the default settings.

In [7]:
mock_data = generator.generate_mock_data()

## Inspect Generated Mock Data

In [8]:
mock_data

|   | resource | count |
| --- | --- | --- |
| 0 | node | 55 |
| 1 | edge | 28 |
| 2 | timeseries | 136 |
| 3 | sequence | 0 |
| 4 | file | 0 |


In [9]:
mock_data.nodes.to_pandas().head()

|   | instance_type | space | external_id | sources |
| --- | --- | --- | --- | --- |
| 0 | node | sp_sandbox | blade_92349 | `[{'properties': {'name': 'KriXref', 'is_damage...` |
| 1 | node | sp_sandbox | blade_9116 | `[{'properties': {'name': 'evblAbk', 'is_damage...` |
| 2 | node | sp_sandbox | blade_6006 | `[{'properties': {'name': 'HbolMJU', 'is_damage...` |
| 3 | node | sp_sandbox | blade_86673 | `[{'properties': {'name': None, 'is_damaged': T...` |
| 4 | node | sp_sandbox | blade_29871 | `[{'properties': {'name': 'HClEQaP', 'is_damage...` |


The mock data object has a few convenience methods that make it easy to work with. We can deploy it, clean it up, and dump it as YAML.

In [10]:
mock_data.deploy(client)

Created 55 nodes and 28 edges
Created/Updated 136 timeseries


In [11]:
mock_data.clean(client, delete_space=True)

Deleted 55 nodes and 28 edges 
Deleted 136 timeseries
Deleted space sp_sandbox


## Control Amount of Mock Data

You can also control how the data is generated by providing one or more configs.

The easiest option is to provide a default config that is used for all views, but you can also provide one config per view.

In [5]:
from cognite.pygen.utils.mock_generator import ViewMockConfig, GeneratorFunction, IDGeneratorFunction, DataType

If we want to generate more nodes and edges we can set the default config.

In [17]:
views = client.data_modeling.data_models.retrieve(("power-models", "Windmill", "1"), inline_views=True).latest_version().views

In [9]:
new_generator = MockGenerator(
    views,
    instance_space="sp_sandbox",
    default_config=ViewMockConfig(node_count=100, max_edge_per_type=3, null_values=0.1),
)

In [10]:
more_data = new_generator.generate_mock_data()

In [11]:
more_data

|   | resource | count |
| --- | --- | --- |
| 0 | node | 1100 |
| 1 | edge | 423 |
| 2 | timeseries | 3060 |
| 3 | sequence | 0 |
| 4 | file | 0 |
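The node counts in the two summaries above are consistent with `node_count` being applied once per view. The short check below is only a sketch of that arithmetic: the figure of 11 views and the default of 5 nodes per view are assumptions inferred from the totals, not values stated in this guide.

```python
# Assumption: the Windmill model has 11 views and the default config
# generates 5 nodes per view. Both numbers are inferred, not documented here.
view_count = 11


def expected_nodes(node_count: int, views: int = view_count) -> int:
    """Total number of nodes if every view gets `node_count` nodes."""
    return node_count * views


default_total = expected_nodes(5)    # compare with the 55 nodes in the first summary
custom_total = expected_nodes(100)   # compare with the 1100 nodes in the second summary
print(default_total, custom_total)
```

Edges and timeseries do not scale as simply, since they also depend on `max_edge_per_type` and `null_values`.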


## Customized Random Generation

We can also control how the random data is generated. 

We have two interfaces:

* Generation of mock data
* Generation of node IDs

In [12]:
DataType

typing.Union[int, float, bool, str, dict, NoneType]

In [13]:
GeneratorFunction

In [14]:
IDGeneratorFunction
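To make the two interfaces more concrete, here is a minimal sketch of what functions of each kind could look like. The aliases below are local stand-ins and are not imported from `pygen`; in particular, the ID generator signature (a view name and a count) is an assumption, and the real alias may take a view ID object instead. The function names are hypothetical.

```python
from typing import Callable, Union

# Local stand-ins for illustration only. The real pygen aliases may differ.
DataType = Union[int, float, bool, str, dict, None]
# Assumption: a value generator takes the number of values and returns a list.
ValueGenerator = Callable[[int], list]
# Assumption: an ID generator takes a view name and a count and returns
# that many external IDs.
IDGenerator = Callable[[str, int], list]


def sequential_ids(view_name: str, count: int) -> list:
    """Hypothetical ID generator producing 'blade_0', 'blade_1', ..."""
    return [f"{view_name.lower()}_{i}" for i in range(count)]


def constant_flags(count: int) -> list:
    """Hypothetical value generator where every value is False."""
    return [False] * count


ids = sequential_ids("Blade", 3)
flags = constant_flags(3)
print(ids, flags)
```

In both cases the function receives a count and must return exactly that many items.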

In the data model, there is a `Blade` view.

In [19]:
blade = next(v for v in views if v.external_id == "Blade")

In [53]:
blade.dump()["properties"].keys()

dict_keys(['name', 'is_damaged', 'sensor_positions'])

We see that this view has a property `is_damaged`. We want to replace the default random value generation for this property.

We set it so that the blade is damaged in 10% of the cases.

In [54]:
import random

In [55]:
blade_config = ViewMockConfig(
    properties={"is_damaged": lambda count: random.choices([True, False], weights=[0.1, 0.9], k=count)}
)
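To see what such a generator function returns, we can call an equivalent function directly, outside of `pygen`. This is a standalone sketch using only the standard library; the name `damaged_flags` and the seeded `random.Random` instance are choices made for this illustration.

```python
import random

# A dedicated, seeded generator keeps this example reproducible without
# touching the global `random` state used elsewhere.
rng = random.Random(42)


def damaged_flags(count: int) -> list:
    """Return `count` booleans where True is drawn with probability 0.1."""
    return rng.choices([True, False], weights=[0.1, 0.9], k=count)


sample_flags = damaged_flags(10_000)
damaged_share = sum(sample_flags) / len(sample_flags)
print(f"{len(sample_flags)} values, {damaged_share:.1%} damaged")
```

The share of damaged blades is close to 10% for large counts, but it is not exactly 10%, because each value is drawn independently.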

In addition, we want all properties of type `Text` to be random names.

In [56]:
from faker import Faker

In [57]:
from cognite.client import data_modeling as dm

In [58]:
# Note that since we are using an external source of randomness, we have to set the seed ourselves
Faker.seed(42)
faker = Faker()

In [59]:
default_config = ViewMockConfig(
    property_types={dm.Text: lambda count: [faker.unique.name() for _ in range(count)]}
)
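The same point about seeding applies to any generator function that brings its own source of randomness. The sketch below illustrates it with the standard library instead of `faker`: a hypothetical factory, `make_name_generator`, builds a text generator around a private, seeded `random.Random`, so two generators built with the same seed produce the same values.

```python
import random
import string


def make_name_generator(seed: int):
    """Build a text generator whose output depends only on `seed`."""
    rng = random.Random(seed)  # private state, independent of the global RNG

    def generate(count: int) -> list:
        # Seven random ASCII letters per value
        return ["".join(rng.choices(string.ascii_letters, k=7)) for _ in range(count)]

    return generate


first_run = make_name_generator(42)(5)
second_run = make_name_generator(42)(5)
print(first_run == second_run)
```

A function built this way has the same shape as the `lambda count: ...` generators used in the configs above: it takes a count and returns a list of that length.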

In [60]:
custom_generator = MockGenerator(views, "sp_sandbox", view_configs={blade.as_id(): blade_config}, default_config=default_config, seed=7)

In [45]:
customized_mock_data = custom_generator.generate_mock_data()

ValueError: Could not generate mock data for property position of type <class 'cognite.client.data_classes.data_modeling.data_types.Float64'>
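One possible reading of this error is that the custom `default_config` only knows how to generate `Text` values, so there is no generator for the `Float64` property `position`. Assuming that is the cause, a float generator could be written in the same style as the text generator. The sketch below shows only the function itself; the name `random_floats` and the value range are hypothetical, and registering it under `dm.Float64` in `property_types` is an assumption that has not been verified against `pygen`.

```python
import random

float_rng = random.Random(42)  # seeded separately, as with the text generator


def random_floats(count: int) -> list:
    """Hypothetical generator returning `count` floats between 0 and 100."""
    return [float_rng.uniform(0.0, 100.0) for _ in range(count)]


# Assumption, not verified against pygen: the function could be registered
# next to the text generator, for example
#   property_types={dm.Text: ..., dm.Float64: random_floats}
float_sample = random_floats(4)
print(float_sample)
```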