# Creating Mock Data

If you have a data model but no data, it can be useful to generate some data for it.
There are several use cases for this:

* In the design phase, you want to quickly try out your current data model iteration with some data.
* You need data for testing the data model.
* Load testing of a data model.



`pygen` comes with a `MockGenerator`. You can find its API reference [here](../api/utils_mock_generator.html); this page is a practical guide to using the module.

## Generate Default Mock Data

```python
from cognite.pygen.utils import MockGenerator, load_cognite_client_from_toml
```

```python
client = load_cognite_client_from_toml()
```

In this guide, we use the `Windmill` data model as an example.

Let's instantiate the `MockGenerator`. We do this by passing the data model along with the instance space we want to use for the
generated data. In addition, we need an instantiated `CogniteClient` to fetch the data model.

```python
generator = MockGenerator.from_data_model(
    ("power-models", "Windmill", "1"), instance_space="sp_sandbox", client=client, seed=42
)
```

```python
generator
```
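Passing `seed=42` is meant to make the generation reproducible, so rerunning the notebook should yield the same mock data. The underlying idea can be sketched with the standard library's `random.Random`; `make_values` below is a hypothetical helper for illustration only, not part of `pygen`.

```python
import random


def make_values(seed: int, count: int) -> list[float]:
    """Hypothetical helper: draw `count` floats from a private, seeded RNG."""
    rng = random.Random(seed)  # independent RNG, unaffected by the global random state
    return [rng.uniform(0.0, 100.0) for _ in range(count)]


# The same seed reproduces the same sequence; a different seed gives a new one.
first = make_values(42, 5)
second = make_values(42, 5)
other = make_values(7, 5)
print(first == second)  # True
print(first == other)
```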

The `generate_mock_data` method of the `MockGenerator` generates the mock data using the default settings.

```python
mock_data = generator.generate_mock_data()
```

## Inspect Generated Mock Data

```python
mock_data
```

```python
mock_data.nodes.to_pandas().head()
```

The mock data has a few convenience methods that make it easy to use: we can deploy it, clean it up, and dump it as YAML. Below, we deploy the data and then clean it up again, also deleting the instance space.

```python
mock_data.deploy(client)
```

```python
mock_data.clean(client, delete_space=True)
```

## Control Amount of Mock Data

You can also control how the data is generated by providing one or more configs.

The easiest option is to provide a default config that is used for all views, but you can also provide one config per view.
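Conceptually, the generator picks one config per view: the view-specific config if one is given, otherwise the default. As shown later in this guide, a view-specific config replaces the default rather than being merged with it. A minimal sketch of that lookup follows; `resolve_config`, the string view IDs, and the plain-dict configs are hypothetical stand-ins for `pygen`'s internals and `ViewMockConfig`.

```python
def resolve_config(view_id: str, view_configs: dict[str, dict], default_config: dict) -> dict:
    """Hypothetical: return the config used for `view_id`.

    A per-view config wins outright; it is not merged with the default.
    """
    return view_configs.get(view_id, default_config)


default_config = {"property_types": {"Text": "random name"}}
view_configs = {"Blade": {"properties": {"is_damaged": "always False"}}}

blade_cfg = resolve_config("Blade", view_configs, default_config)
windmill_cfg = resolve_config("Windmill", view_configs, default_config)
print("property_types" in blade_cfg)  # False: the default Text override is not inherited
print(windmill_cfg is default_config)  # True
```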

```python
from cognite.pygen.utils.mock_generator import ViewMockConfig, GeneratorFunction, IDGeneratorFunction, DataType
```

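If we want to generate more nodes and edges, we can change the default settings. Here, we first retrieve the views of the data model, create a new generator from them, and then pass `node_count`, `max_edge_per_type`, and `null_values` to `generate_mock_data`.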

```python
views = (
    client.data_modeling.data_models.retrieve(("power-models", "Windmill", "1"), inline_views=True)
    .latest_version()
    .views
)
```

```python
new_generator = MockGenerator(views, instance_space="sp_sandbox")
```

```python
more_data = new_generator.generate_mock_data(node_count=100, max_edge_per_type=3, null_values=0.1)
```
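The exact semantics of these parameters are documented in the API reference. Assuming that `null_values=0.1` means each value of a nullable property is left empty with probability 0.1, the effect can be sketched as follows. `apply_nulls` is a hypothetical helper, not the `pygen` implementation, and the rule that non-nullable properties are never emptied is likewise an assumption.

```python
import random
from typing import Optional


def apply_nulls(values: list, nullable: bool, null_fraction: float, rng: random.Random) -> list[Optional[object]]:
    """Hypothetical: replace each value with None with probability `null_fraction`.

    Non-nullable properties are left untouched (assumption).
    """
    if not nullable:
        return list(values)
    return [None if rng.random() < null_fraction else value for value in values]


rng = random.Random(42)
values = list(range(1000))
with_nulls = apply_nulls(values, nullable=True, null_fraction=0.1, rng=rng)
required = apply_nulls(values, nullable=False, null_fraction=0.1, rng=rng)
print(sum(v is None for v in with_nulls))  # roughly 100 of the 1000 values
print(any(v is None for v in required))  # False
```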

```python
more_data
```

## Customized Random Generation

We can also control how the random data is generated.

There are two interfaces for this:

* Generation of mock data
* Generation of node IDs

```python
DataType
```

```python
GeneratorFunction
```

```python
IDGeneratorFunction
```

In the data model, there is a `Blade` view:

```python
blade = next(v for v in views if v.external_id == "Blade")
```

```python
blade.dump()["properties"].keys()
```

We see that this view has a property `is_damaged`. We want to replace the default random generation for this property
and set it such that the blade is never damaged:

```python
# Always return False, so no blade is ever damaged
blade_config = ViewMockConfig(
    properties={"is_damaged": lambda count: [False] * count}
)
```
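The lambda above suggests the contract of a property generator function: it receives the number of values to generate and returns a list of exactly that length. Assuming any callable with that shape is accepted, a variation where roughly 5% of blades are damaged could look like the sketch below. `rarely_damaged` is a hypothetical function; it uses its own seeded RNG so that its output is reproducible.

```python
import random

# Private, seeded RNG so the output is reproducible independently of other randomness.
_damage_rng = random.Random(42)


def rarely_damaged(count: int) -> list[bool]:
    """Hypothetical generator function: True (damaged) with about 5% probability."""
    return [_damage_rng.random() < 0.05 for _ in range(count)]


values = rarely_damaged(200)
print(len(values))  # 200
print(values.count(True))
```

It could then be used in place of the lambda, e.g. `ViewMockConfig(properties={"is_damaged": rarely_damaged})`.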

In addition, we want all properties of type `Text` to be random names:

```python
from faker import Faker
```

```python
from cognite.client import data_modeling as dm
```

```python
# Since we are using an external source of randomness, we have to set the seed ourselves
Faker.seed(42)
faker = Faker()
```

```python
default_config = ViewMockConfig(
    # Note that this setting will not apply to the Blade view, because we are passing a custom config to it
    property_types={dm.Text: lambda count: [faker.unique.name() for _ in range(count)]}
)
```
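`faker.unique.name()` ensures that no name is repeated within a run. If you prefer to avoid the extra dependency, a comparable generator function can be sketched with the standard library alone, assuming a fixed pool of first and last names is acceptable. `unique_names` and the name pools are hypothetical; as with Faker, the function owns its randomness source, so it has to be seeded explicitly.

```python
import itertools
import random

FIRST_NAMES = ["Ada", "Bjarne", "Grace", "Linus", "Margaret", "Dennis"]
LAST_NAMES = ["Hopper", "Lovelace", "Ritchie", "Torvalds", "Hamilton", "Stroustrup"]

# External randomness source: we seed it ourselves.
_name_rng = random.Random(42)
# Every first/last combination, shuffled once and consumed without replacement for uniqueness.
_pool = [f"{first} {last}" for first, last in itertools.product(FIRST_NAMES, LAST_NAMES)]
_name_rng.shuffle(_pool)
_names = iter(_pool)


def unique_names(count: int) -> list[str]:
    """Hypothetical generator function: `count` names, never repeated across calls."""
    names = list(itertools.islice(_names, count))
    if len(names) < count:
        raise ValueError("name pool exhausted")
    return names


batch = unique_names(10)
print(len(batch), len(set(batch)))  # 10 10
```

It could be used in place of the Faker lambda, e.g. `property_types={dm.Text: unique_names}`.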

```python
custom_generator = MockGenerator(
    views,
    "sp_sandbox",
    view_configs={blade.as_id(): blade_config},
    default_config=default_config,
    seed=7,
)
```

```python
customized_mock_data = custom_generator.generate_mock_data()
```

```python
customized_mock_data
```

```python
blade_data = next(view_data for view_data in customized_mock_data if view_data.view_id == blade.as_id())
```

```python
blade_data.node.dump()
```

We see that `is_damaged` is `False` wherever it is not null. Note also that the new default generator for text is not applied here, because the `Blade` view has its own config, where we did not override the `Text` generator.

```python
windmill = next(v for v in views if v.external_id == "Windmill")
```

```python
windmill_data = next(view_data for view_data in customized_mock_data if view_data.view_id == windmill.as_id())
```

```python
windmill_data.node.dump()
```

For the `Windmill` view, we see that the text property `windfarm` has been set by our random `Text` generator.