# Validation Examples

Because of optimizations and the low-level nature of the Power Grid Model's mathematical core, the exceptions raised by the core may not always be clear to the user. An optional validation mechanism is therefore supplied, which validates data structures and values offline. It is recommended to always validate your data before constructing a `PowerGridModel` instance. An alternative approach is to validate only when an exception is raised, but be aware that not all data errors raise exceptions: most of them just yield invalid results without warning.

The basic methods and class definitions are available in the `power_grid_model.validation` module:

```python
# Manual validation
#   validate_input_data() assumes that you won't be using update data in your calculation.
#   validate_batch_data() validates input_data in combination with batch/update data.
validate_input_data(input_data, calculation_type, symmetric) -> list[ValidationError]
validate_batch_data(input_data, update_data, calculation_type, symmetric) -> dict[int, list[ValidationError]]

# Assertions
#   assert_valid_input_data() and assert_valid_batch_data() raise a ValidationException,
#   containing the list/dict of errors, when the data is invalid.
assert_valid_input_data(input_data, calculation_type, symmetric) raises ValidationException
assert_valid_batch_data(input_data, update_data, calculation_type, symmetric) raises ValidationException

# Utilities
#   errors_to_string() converts a set of errors to a human-readable (multi-line) string representation
errors_to_string(errors, name, details)
```

Each validation error is an object which can be converted to a compact human-readable message using `str(error)`. It
contains three member variables `component`, `field` and `ids`, which can be used to gather more specific information about the validation error, e.g. which object IDs are involved.

```python
class ValidationError:
    
    # Component(s): e.g. ComponentType.node or [ComponentType.node, ComponentType.line]
    component: ComponentType | list[ComponentType]
    
    # Field(s): e.g. "id" or ["line_from", "line_to"] or [(ComponentType.node, "id"), (ComponentType.line, "id")]
    field: str | list[str] | list[tuple[ComponentType, str]]

    # IDs: e.g. [1, 2, 3] or [(ComponentType.node, 1), (ComponentType.line, 1)]
    ids: list[int] | list[tuple[ComponentType, int]] = []    
    
```
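
Because `component` and `ids` each come in a single-component and a multi-component form, code that consumes the errors has to handle both shapes. The following is a minimal sketch of one way to normalize them into a mapping from component to offending IDs. To keep it runnable without the library, it uses a hypothetical stand-in class `FakeError` and plain strings in place of `ComponentType`; the helper name `ids_per_component` is likewise an assumption and not part of `power_grid_model`.

```python
from collections import defaultdict
from typing import NamedTuple


class FakeError(NamedTuple):
    """Hypothetical stand-in that mirrors the three documented members."""

    component: object  # one component name, or a list of them
    field: object  # one field name, or a list of them
    ids: list  # plain ids, or (component, id) tuples


def ids_per_component(errors):
    """Collect the offending IDs of all errors, grouped by component."""
    grouped = defaultdict(set)
    for error in errors:
        for item in error.ids:
            if isinstance(item, tuple):
                # Multi-component form: each entry is (component, id).
                component, object_id = item
            else:
                # Single-component form: the component is stored on the error.
                component, object_id = error.component, item
            grouped[component].add(int(object_id))
    return {component: sorted(ids) for component, ids in grouped.items()}


# Illustrative errors only; strings replace ComponentType members.
errors = [
    FakeError(
        ["line", "sym_power_sensor"],
        [("line", "id"), ("sym_power_sensor", "id")],
        [("line", 6), ("sym_power_sensor", 6)],
    ),
    FakeError("line", "to_node", [6]),
    FakeError("sym_power_sensor", "power_sigma", [6, 7]),
]
print(ids_per_component(errors))
```

With real `ValidationError` objects the same loop should apply, since it relies only on the `component` and `ids` members described above.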

Note: `input_data` and `update_data` use the same data types as expected by the power grid model.

In [1]:
from power_grid_model import ComponentType, DatasetType, PowerGridModel, initialize_array

# A power grid containing several errors

# node
node_error = initialize_array(DatasetType.input, ComponentType.node, 3)
node_error["id"] = [1, 2, 3]
node_error["u_rated"] = [10.5e3]

# line
line_error = initialize_array(DatasetType.input, ComponentType.line, 3)
line_error["id"] = [4, 5, 6]
line_error["from_node"] = [1, 2, 3]
line_error["to_node"] = [2, 3, 4]
line_error["from_status"] = [True]
line_error["to_status"] = [True]
line_error["r1"] = [0.25]
line_error["x1"] = [0.2]
line_error["c1"] = [10e-6]
line_error["tan1"] = [0.0]

# Power Sensor
sensor_error = initialize_array(DatasetType.input, ComponentType.sym_power_sensor, 2)
sensor_error["id"] = [6, 7]
sensor_error["measured_object"] = [3, 4]
sensor_error["measured_terminal_type"] = [0, 2]
sensor_error["p_measured"] = [0]
sensor_error["q_measured"] = [0]
sensor_error["power_sigma"] = [0]

error_data = {
    ComponentType.node: node_error,
    ComponentType.line: line_error,
    ComponentType.sym_power_sensor: sensor_error,
}

In [2]:
# Without validation
model = PowerGridModel(error_data)
output_data = model.calculate_state_estimation(symmetric=True)

IDWrongType: Wrong type for object with id 4

Try validate_input_data() or validate_batch_data() to validate your data.


In [3]:
from power_grid_model.validation import assert_valid_input_data

# Assert valid data
assert_valid_input_data(error_data, symmetric=True)
model = PowerGridModel(error_data)
output_data = model.calculate_state_estimation(symmetric=True)

ValidationException: There are 5 validation errors in input_data:
   1. Fields line.id and sym_power_sensor.id are not unique for 2 lines/sym_power_sensors.
   2. Field 'to_node' does not contain a valid node id for 1 line.
   3. Field 'power_sigma' is not greater than zero for 2 sym_power_sensors.
   4. Field 'measured_object' does not contain a valid line/asym_line/generic_branch/transformer id for 1 sym_power_sensor. (measured_terminal_type=branch_from)
   5. Field 'measured_object' does not contain a valid source id for 1 sym_power_sensor. (measured_terminal_type=source)

In [4]:
from power_grid_model.validation import ValidationException

# Assert valid data and display component ids
try:
    assert_valid_input_data(error_data, symmetric=True)
    model = PowerGridModel(error_data)
    output_data = model.calculate_state_estimation(symmetric=True)
except ValidationException as ex:
    for error in ex.errors:
        print(type(error).__name__, error.component, ":", error.ids)

MultiComponentNotUniqueError [<ComponentType.line: 'line'>, <ComponentType.sym_power_sensor: 'sym_power_sensor'>] : [(<ComponentType.line: 'line'>, np.int32(6)), (<ComponentType.sym_power_sensor: 'sym_power_sensor'>, np.int32(6))]
InvalidIdError ComponentType.line : [6]
NotGreaterThanError ComponentType.sym_power_sensor : [6, 7]
InvalidIdError ComponentType.sym_power_sensor : [6]
InvalidIdError ComponentType.sym_power_sensor : [7]


In [5]:
from power_grid_model.validation import errors_to_string, validate_input_data

# Validation only as exception handling
try:
    model = PowerGridModel(error_data)
    output_data = model.calculate_state_estimation(symmetric=True)
except RuntimeError as _ex:
    errors = validate_input_data(error_data, symmetric=True)
    print(errors_to_string(errors))

There are 5 validation errors in the data:
   1. Fields line.id and sym_power_sensor.id are not unique for 2 lines/sym_power_sensors.
   2. Field 'to_node' does not contain a valid node id for 1 line.
   3. Field 'power_sigma' is not greater than zero for 2 sym_power_sensors.
   4. Field 'measured_object' does not contain a valid line/asym_line/generic_branch/transformer id for 1 sym_power_sensor. (measured_terminal_type=branch_from)
   5. Field 'measured_object' does not contain a valid source id for 1 sym_power_sensor. (measured_terminal_type=source)


In [6]:
# Manual checking and display detailed information about the invalid data
errors = validate_input_data(error_data, symmetric=True)
print(errors_to_string(errors, details=True))

There are 5 validation errors in the data:

	Fields line.id and sym_power_sensor.id are not unique for 2 lines/sym_power_sensors.
		component: line/sym_power_sensor
		field: line.id and sym_power_sensor.id
		ids: [(<ComponentType.line: 'line'>, np.int32(6)), (<ComponentType.sym_power_sensor: 'sym_power_sensor'>, np.int32(6))]

	Field 'to_node' does not contain a valid node id for 1 line.
		component: line
		field: 'to_node'
		ids: [6]
		ref_components: node
		filters: 

	Field 'power_sigma' is not greater than zero for 2 sym_power_sensors.
		component: sym_power_sensor
		field: 'power_sigma'
		ids: [6, 7]
		ref_value: zero

	Field 'measured_object' does not contain a valid line/asym_line/generic_branch/transformer id for 1 sym_power_sensor. (measured_terminal_type=branch_from)
		component: sym_power_sensor
		field: 'measured_object'
		ids: [6]
		ref_components: line/asym_line/generic_branch/transformer
		filters: (measured_terminal_type=branch_from)

	Field 'measured_object' does not c

## Batch datasets


In [7]:
from power_grid_model.validation import validate_batch_data

node = initialize_array(DatasetType.input, ComponentType.node, 1)
node[:] = (1, 10e3)
source = initialize_array(DatasetType.input, ComponentType.source, 1)
source[:] = (2, 1, 1, 1.0, 0.0, 1e10, 0.1, 1.0)
load = initialize_array(DatasetType.input, ComponentType.sym_load, 1)
load[:] = (3, 1, 1, 0, 1000, 1000)

input_data = {
    ComponentType.node: node,
    ComponentType.source: source,
    ComponentType.sym_load: load,
}

# update data
load_update = initialize_array(DatasetType.update, ComponentType.sym_load, (100, 1))
load_update["id"] = 3
load_update["status"] = -5
batch_data = {ComponentType.sym_load: load_update}

errors = validate_batch_data(input_data=input_data, update_data=batch_data, symmetric=True)
errors_to_string(errors, details=True)

"There is a validation error in the data, batch #0:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #1:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #2:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #3:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #4:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #5:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tc

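When every scenario fails in the same way, the per-batch report becomes long and repetitive. One way to condense it is to group the batch numbers by the error messages they share, as in the minimal sketch below. It assumes only that the result is a `dict[int, list[...]]` and that `str(error)` yields the compact message, as described earlier. The helper name `group_batches_by_message` is hypothetical, and plain strings stand in for `ValidationError` objects so that the snippet runs without the library.

```python
from collections import defaultdict


def group_batches_by_message(batch_errors):
    """Map each distinct tuple of error messages to the batches that produced it."""
    grouped = defaultdict(list)
    for batch_number, errors in sorted(batch_errors.items()):
        messages = tuple(str(error) for error in errors)
        grouped[messages].append(batch_number)
    return dict(grouped)


# Stand-in for the dict returned by validate_batch_data();
# strings replace the ValidationError objects.
status_message = "Field 'status' is not a boolean (0 or 1) for 1 sym_load."
fake_batch_errors = {batch: [status_message] for batch in range(100)}

for messages, batches in group_batches_by_message(fake_batch_errors).items():
    print(f"{len(batches)} batches (#{batches[0]}..#{batches[-1]}):")
    for message in messages:
        print(f"    {message}")
```

Grouping on the message text is a deliberate simplification: two errors with the same wording but different IDs would need the `ids` member included in the grouping key.
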
### Tip: Validating large datasets

The data validator is not designed for performance and is quite slow on large datasets.
In practice, most errors can be identified in the first few scenarios.
A handy tip is therefore to validate only a small section of the data, selected by slicing.

In [8]:
sliced_batch_data = {component: array[:100] for component, array in batch_data.items()}
errors = validate_batch_data(input_data=input_data, update_data=sliced_batch_data, symmetric=True)
errors_to_string(errors, details=True)

"There is a validation error in the data, batch #0:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #1:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #2:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #3:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #4:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tcomponent: sym_load\n\t\tfield: 'status'\n\t\tids: [3]\n\nThere is a validation error in the data, batch #5:\n\n\tField 'status' is not a boolean (0 or 1) for 1 sym_load.\n\t\tc

### Validating the Cartesian product of batch datasets

The [Cartesian product of batch datasets](../user_manual/calculations.md#cartesian-product-of-batch-datasets) is not supported by `validate_batch_data`.
The user should flatten the dataset manually before using the validator.
For example, this can be done as follows:

In [9]:
import numpy as np

source_update = initialize_array(DatasetType.update, ComponentType.source, (3, 1))
source_update["id"] = [[2]]
source_update["u_ref"] = [[0.9], [1.0], [-1.1]]

source_2d, load_2d = np.meshgrid(source_update, load_update, indexing="xy")

flattened_update = {
    ComponentType.source: source_2d.reshape(-1, source_update.shape[1]),
    ComponentType.sym_load: load_2d.reshape(-1, load_update.shape[1]),
}
errors = validate_batch_data(input_data=input_data, update_data=flattened_update, symmetric=True)
print(errors_to_string(errors, details=False))

There is a validation error in the data, batch #0:
	Field 'status' is not a boolean (0 or 1) for 1 sym_load.
There is a validation error in the data, batch #1:
	Field 'status' is not a boolean (0 or 1) for 1 sym_load.
There are 2 validation errors in the data, batch #2:
   1. Field 'u_ref' is not greater than zero for 1 source.
   2. Field 'status' is not a boolean (0 or 1) for 1 sym_load.
There is a validation error in the data, batch #3:
	Field 'status' is not a boolean (0 or 1) for 1 sym_load.
There is a validation error in the data, batch #4:
	Field 'status' is not a boolean (0 or 1) for 1 sym_load.
There are 2 validation errors in the data, batch #5:
   1. Field 'u_ref' is not greater than zero for 1 source.
   2. Field 'status' is not a boolean (0 or 1) for 1 sym_load.
There is a validation error in the data, batch #6:
	Field 'status' is not a boolean (0 or 1) for 1 sym_load.
There is a validation error in the data, batch #7:
	Field 'status' is not a boolean (0 or 1) for 1 sym_lo
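
After flattening, the batch numbers in the report no longer identify the original scenarios directly. The sketch below shows how a flattened batch number could be mapped back, assuming the `indexing="xy"` and row-major `reshape` layout used in the cell above, in which the source scenario varies fastest. The helper name `unflatten_index` is hypothetical.

```python
def unflatten_index(flat_index, n_source_scenarios):
    """Return (load_scenario, source_scenario) for a flattened batch number.

    Assumes the layout produced by np.meshgrid(source, load, indexing="xy")
    followed by a row-major reshape: the source scenario varies fastest.
    """
    return divmod(flat_index, n_source_scenarios)


# Three source scenarios were combined with the load scenarios.
for flat_index in (0, 2, 5, 7):
    load_scenario, source_scenario = unflatten_index(flat_index, 3)
    print(f"batch #{flat_index}: load scenario {load_scenario}, source scenario {source_scenario}")
```

Batches #2 and #5 both map to source scenario 2, the one with `u_ref = -1.1`, which is consistent with the `u_ref` errors reported for those batches.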