151 changes: 130 additions & 21 deletions docs/05-dataclasses.md
Original file line number Diff line number Diff line change
Expand Up @@ -639,31 +639,34 @@ class SubClass(BaseB, BaseA):

## Post-validation

Post-validation is everything that happens **after** all fields have been validated individually, but **before** the `DataclassValidator`
returns the dataclass object.
Post-validation is everything that happens **after** all fields have been validated individually, but **before** the
`DataclassValidator` returns the dataclass object.

For example, you could implement additional validation criteria that depend on the values of multiple fields of the dataclass. One common
use case for this is fields that are optional by default, but can be **required** under certain conditions, e.g. fields A and B are not
required, but if one of them exists, the other must exist too (or conversely, only one of the fields is allowed to be set at the same
time).
For example, you could implement additional validation criteria that depend on the values of multiple fields of the
dataclass. One common use case for this is fields that are optional by default, but can be **required** under certain
conditions, e.g. fields A and B are not required, but if one of them exists, the other must exist too (or conversely,
only one of the fields is allowed to be set at the same time).

Another use case is "integrity constraints". Imagine you have a dataclass with two datetime fields `begin_time` and `end_time` that
specify the start and end of a time interval. You might want to ensure that `begin_time <= end_time` is always true, because a time
interval cannot end before it starts. This cannot be done using the field validators alone, so you need to do this integrity check
**post validation**.
Another use case is "integrity constraints". Imagine you have a dataclass with two datetime fields `begin_time` and
`end_time` that specify the start and end of a time interval. You might want to ensure that `begin_time <= end_time` is
always true, because a time interval cannot end before it starts. This cannot be done using the field validators alone,
so you need to do this integrity check **post validation**.

Of course, you can do any sort of post-validation on a validated object after it was returned by the `DataclassValidator`. But a more
elegant way is to **integrate** the post-validation logic into the validator and/or dataclass itself.
Of course, you can do any sort of post-validation on a validated object after it was returned by the `DataclassValidator`.
But a more elegant way is to **integrate** the post-validation logic into the `DataclassValidator` and dataclass itself.

There are two ways to do this: One way is to **subclass** the `DataclassValidator` for your dataclass and override the `post_validate()`
method (which by default doesn't do anything with the object). Another way is to use the `__post_init__()` special method of dataclasses.
We will only demonstrate the latter for now, because it's a bit easier (doesn't require subclassing the validator).
There are two ways to implement post-validation in a validataclass: First, there is the `__post_init__()` special method
which is automatically called as part of the `__init__()` method of a dataclass. It is a feature of regular dataclasses,
so this post-validation is also applied when instantiating the dataclass without using validataclass.

Dataclasses can have the special method `__post_init__()` which will be automatically called after the `__init__()` special method.
It does not receive any arguments (unless so-called `InitVar` fields are used, which are currently unsupported by this library), but it
can access the objects fields as usual with `self.field_name`.
The other way is to implement the `__post_validate__()` method. This method is called by the `DataclassValidator` right
after creating the object. It is a feature of validataclass, so it is **not** called when instantiating the dataclass
manually (although you can of course just call `obj.__post_validate__()` manually as well).
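
The difference between the two hooks can be seen without validataclass at all. The following is a minimal sketch using only the standard library; the `Interval` class and its `post_validated` flag are hypothetical and exist only to make the calls visible. It assumes nothing about the library beyond what is described above: `__post_init__()` runs on every instantiation, while `__post_validate__()` is an ordinary method unless something (such as the `DataclassValidator`) calls it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Interval:
    begin_time: datetime
    end_time: datetime
    # Hypothetical flag, only used here to show whether __post_validate__() ran
    post_validated: bool = False

    def __post_init__(self):
        # Runs as part of __init__(), with or without validataclass
        if self.begin_time > self.end_time:
            raise ValueError('begin_time must not be after end_time')

    def __post_validate__(self):
        # From the dataclass's point of view this is a normal method:
        # plain instantiation does not call it
        self.post_validated = True


# Manual instantiation: only __post_init__() has run
interval = Interval(datetime(2022, 1, 1), datetime(2022, 1, 2))
assert interval.post_validated is False

# Calling the hook manually is always possible
interval.__post_validate__()
assert interval.post_validated is True
```

Checks that must hold for every instance, however it was created, fit better in `__post_init__()`; checks that only make sense for validated input fit better in `__post_validate__()`.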

Let's see how this can be done for the datetime example from above:
The `__post_validate__()` method additionally supports another special feature: **context-sensitive validation**, which
will be discussed shortly.

This example implements the datetime post-validation example from above:

```python
from datetime import datetime
Expand All @@ -678,7 +681,8 @@ class ExampleClass:
begin_time: datetime = DateTimeValidator(DateTimeFormat.REQUIRE_UTC)
end_time: datetime = DateTimeValidator(DateTimeFormat.REQUIRE_UTC)

def __post_init__(self):
# Note: In this case, __post_init__() would look exactly the same.
def __post_validate__(self):
# Ensure that begin_time is always before end_time
if self.begin_time > self.end_time:
raise ValidationError(
Expand Down Expand Up @@ -740,7 +744,7 @@ class ExampleClass:
# This field is required only if enable_something is True. Otherwise it will be ignored.
some_value: Optional[int] = IntegerValidator(), Default(None)

def __post_init__(self):
def __post_validate__(self):
# If enable_something is True, ensure that some_value is set!
if self.enable_something is True and self.some_value is None:
raise DataclassPostValidationError(field_errors={
Expand Down Expand Up @@ -773,6 +777,111 @@ The `DataclassPostValidationError` from this example will look like this after c
```


### Context-sensitive post-validation

As mentioned earlier, the `__post_validate__()` method supports a nice feature called **context-sensitive validation**.

In general, this means that the validation can depend on the **context** it is used in. Usually, the output of a
validator is always determined by a) the options set at the time the validator was created and b) the input value.
Context-sensitive validation means that you pass additional parameters to the validator at runtime, i.e. at the time
the `validate()` method is called to validate a piece of input.

These so-called **context arguments** are passed to the `validate()` call as arbitrary keyword arguments. Whether and
how the validator actually uses these arguments depends on the implementation of the validator. Most validators don't
do anything with them except for passing them to sub-validators (e.g. the `ListValidator` passes the context arguments to
the specified item validator).
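
To illustrate how context arguments travel through nested validators, here is a small sketch with stand-in classes. `IntegerCheck`, `ListCheck` and the `max_value` context argument are hypothetical and not part of validataclass; the sketch only mirrors the forwarding pattern described above.

```python
from typing import Any, List


class IntegerCheck:
    """Hypothetical leaf validator that looks at one context argument."""

    def validate(self, input_data: Any, **kwargs) -> int:
        if type(input_data) is not int:
            raise ValueError('not an integer')
        # 'max_value' is a made-up context argument for this sketch
        max_value = kwargs.get('max_value')
        if max_value is not None and input_data > max_value:
            raise ValueError('value too large in this context')
        return input_data


class ListCheck:
    """Hypothetical container validator: ignores the context itself, only forwards it."""

    def __init__(self, item_validator):
        self.item_validator = item_validator

    def validate(self, input_data: Any, **kwargs) -> List[Any]:
        if not isinstance(input_data, list):
            raise ValueError('not a list')
        # Forward the context arguments unchanged to the item validator
        return [self.item_validator.validate(item, **kwargs) for item in input_data]


validator = ListCheck(IntegerCheck())
validator.validate([1, 2, 3])               # -> [1, 2, 3]
validator.validate([1, 2, 3], max_value=5)  # -> [1, 2, 3]
```

The same validator instance gives different results depending on the context passed at call time, which is the point of the feature.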

The `DataclassValidator` supports these context arguments and uses them in two ways: First, it passes them as they are
to any field validator (which might pass them to other validators as well). Second, it also passes them to the
`__post_validate__()` method of the dataclass.

However, for this to work, the method MUST accept arbitrary keyword arguments, i.e. it needs to be declared with a
`**kwargs` parameter (the parameter name doesn't matter). You can of course declare specific keyword arguments that you
want to use for post-validation (make sure to define them as optional!), but you still need to accept any other keyword
argument as well, otherwise the context arguments will not be passed to the method at all.

Example:

```python
from typing import Optional

from validataclass.dataclasses import validataclass, Default
from validataclass.exceptions import RequiredValueError, DataclassPostValidationError
from validataclass.validators import DataclassValidator, BooleanValidator, IntegerValidator

@validataclass
class ContextSensitiveExampleClass:
    # This field is optional, unless the context says otherwise.
    some_value: Optional[int] = IntegerValidator(), Default(None)

    # Note: Prefix the kwargs parameter with an underscore to avoid "unused parameter" notices.
    def __post_validate__(self, *, require_some_value: bool = False, **_kwargs):
        # If require_some_value was set at validation time, ensure that some_value is set!
        if require_some_value and self.some_value is None:
            raise DataclassPostValidationError(field_errors={
                'some_value': RequiredValueError(reason='Must be set in this context.'),
            })

# Create a validator for this dataclass
validator = DataclassValidator(ContextSensitiveExampleClass)

# Without context arguments: The field is optional.
validator.validate({}) # -> ContextSensitiveExampleClass(some_value=None)
validator.validate({"some_value": 42}) # -> ContextSensitiveExampleClass(some_value=42)

# With the context argument "require_some_value" set: The field is now required!
validator.validate({}, require_some_value=True) # will raise a DataclassPostValidationError
validator.validate({"some_value": 42}, require_some_value=True) # -> ContextSensitiveExampleClass(some_value=42)
```
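
The `**kwargs` requirement follows from how the hook is called: the `DataclassValidator` changes in this pull request inspect the method with `inspect.getfullargspec()` and only pass context arguments if it has a variadic keyword parameter. The following standalone sketch reproduces that check; the helper `call_post_validate` and both example classes are hypothetical names chosen for illustration.

```python
import inspect


def call_post_validate(obj, **context):
    """Hypothetical helper: call __post_validate__(), passing context only if it takes **kwargs."""
    method = getattr(obj, '__post_validate__', None)
    if method is None:
        return None
    if inspect.getfullargspec(method).varkw is not None:
        return method(**context)
    # No **kwargs parameter: the context is dropped entirely
    return method()


class WithKwargs:
    def __post_validate__(self, *, strict: bool = False, **_kwargs):
        return strict


class WithoutKwargs:
    # Declares the keyword argument, but does not accept arbitrary ones
    def __post_validate__(self, *, strict: bool = False):
        return strict


assert call_post_validate(WithKwargs(), strict=True) is True
# The named parameter alone is not enough: 'strict' never arrives
assert call_post_validate(WithoutKwargs(), strict=True) is False
```

Dropping the context for methods without `**kwargs` avoids a `TypeError` when unrelated context arguments are passed, at the cost of silently ignoring the ones the method does declare.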

**One important note about the `validate()` method:**

For backwards compatibility, `Validator` classes currently are **not** required to accept arbitrary keyword arguments.
Custom validators that were created before this feature was implemented (version 0.7.0) will not support this, so
calling their `validate()` method with keyword arguments will raise an error.

To avoid this, there is a helper method that wraps the `validate()` call: `Validator.validate_with_context()` will check
whether the validator class supports context arguments, then call the `validate()` method either with or without them.

In cases where you don't know whether your validator class already supports context arguments (especially when writing
generic code that can use arbitrary validators), you should therefore use the `validate_with_context()` method.

Example:

```python
from validataclass.validators import Validator

validator: Validator = ... # This can be any validator class
input_data = ...

validated_data = validator.validate_with_context(input_data, my_context_var=42)
```
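
The problem this helper solves can be reproduced with plain classes. The sketch below is not the library's implementation; `validate_with_optional_context`, `LegacyValidator` and `ContextAwareValidator` are hypothetical stand-ins, and the signature check is an assumption about one reasonable way to write such a wrapper.

```python
import inspect
from typing import Any


class LegacyValidator:
    """Stand-in for a custom validator written before context arguments existed."""

    def validate(self, input_data: Any) -> Any:
        return input_data


class ContextAwareValidator:
    """Stand-in for an upgraded validator."""

    def validate(self, input_data: Any, **kwargs) -> Any:
        return input_data, kwargs


def validate_with_optional_context(validator, input_data: Any, **context) -> Any:
    # Pass context arguments only if validate() accepts arbitrary keyword arguments
    if inspect.getfullargspec(validator.validate).varkw is not None:
        return validator.validate(input_data, **context)
    return validator.validate(input_data)


# Calling a legacy validator directly with context arguments fails ...
try:
    LegacyValidator().validate('x', my_context_var=42)
except TypeError as error:
    print(f'TypeError: {error}')

# ... while the wrapper works with both kinds of validators
assert validate_with_optional_context(LegacyValidator(), 'x', my_context_var=42) == 'x'
assert validate_with_optional_context(ContextAwareValidator(), 'x', my_context_var=42) == ('x', {'my_context_var': 42})
```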

This method will become obsolete and will eventually be removed (possibly in version 1.0.0), when every validator
class will be required to support context arguments.

**Therefore, you should upgrade your custom validator classes to support context arguments, and also to pass them to
any underlying base validator.**

To do this, add a `**kwargs` parameter to your `validate()` method and pass it on to the `validate()` call of the base class. For example:

```python
from typing import Any
from validataclass.validators import StringValidator

class UppercaseStringValidator(StringValidator):
    # BEFORE:
    # def validate(self, input_data: Any) -> str:
    #     validated_str = super().validate(input_data)
    #     return validated_str.upper()

    # AFTER:
    def validate(self, input_data: Any, **kwargs) -> str:
        validated_str = super().validate(input_data, **kwargs)
        return validated_str.upper()
```


### Post-initialization variables

Another thing you can do at post-validation time is setting "post-initialization fields". These are fields in a dataclass that are
Expand Down
2 changes: 1 addition & 1 deletion src/validataclass/validators/any_of_validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ def __init__(self, allowed_values: List[Any], *, allowed_types: Union[type, List
self.allowed_values = allowed_values
self.allowed_types = allowed_types

def validate(self, input_data: Any) -> Any:
def validate(self, input_data: Any, **kwargs) -> Any:
"""
Validate that input is in the list of allowed values. Returns the value unmodified.
"""
Expand Down
2 changes: 1 addition & 1 deletion src/validataclass/validators/anything_validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,7 @@ def _normalize_allowed_types(

return list(allowed_types)

def validate(self, input_data: Any) -> Any:
def validate(self, input_data: Any, **kwargs) -> Any:
"""
Validate input data. Accepts anything (or only specific types) and returns data unmodified.
"""
Expand Down
2 changes: 1 addition & 1 deletion src/validataclass/validators/boolean_validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ def __init__(self, *, allow_strings: bool = False):
"""
self.allow_strings = allow_strings

def validate(self, input_data: Any) -> bool:
def validate(self, input_data: Any, **kwargs) -> bool:
"""
Validate type of input data. Returns a boolean.
"""
Expand Down
76 changes: 61 additions & 15 deletions src/validataclass/validators/dataclass_validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
"""

import dataclasses
import inspect
from typing import Any, Optional, TypeVar, Generic, Dict

from validataclass.dataclasses import Default, NoDefault
Expand Down Expand Up @@ -62,10 +63,41 @@ class ExampleDataclass:

All fields that do NOT specify a default value (or explicitly use the special value `NoDefault`) are required.

Post-validation checks can be implemented either as a `__post_init__()` method in the dataclass or by subclassing
DataclassValidator and overriding the `post_validate()` method. In both cases, you can either raise
`DataclassPostValidationError` exceptions directly or raise normal `ValidationError` exceptions, which will be
wrapped inside a `DataclassPostValidationError` automatically.
Post-validation checks can be implemented in the dataclass either using the `__post_init__()` special method (which
is part of regular dataclasses and thus also works without validataclass) or using a `__post_validate__()` method
(which is called by the DataclassValidator after creating the object). The latter also supports *context-sensitive*
validation, which means you can pass extra arguments to the `validate()` call that will be passed both to all field
validators and to the `__post_validate__()` method (as long as it is defined with a `**kwargs` argument).

In post-validation you can either raise regular `ValidationError` exceptions, which will be automatically wrapped
inside a `DataclassPostValidationError` exception, or raise such an exception directly (in which case you can
also specify errors for individual fields, which provides more precise errors to the user).

Here is an example for such a `__post_validate__()` method that also happens to be context-sensitive:

```
@validataclass
class ExampleDataclass:
optional_field: str = StringValidator(), Default('')

# Note: The method MUST accept arbitrary keyword arguments (**kwargs), not just the parameter you defined,
# otherwise no context arguments will be passed to it at all. To avoid "unused parameter" notices, you can
# prepend the variable name with an underscore.
def __post_validate__(self, *, require_optional_field: bool = False, **_kwargs):
if require_optional_field and not self.optional_field:
raise DataclassPostValidationError(field_errors={
'optional_field': RequiredValueError(reason='The optional field is required for some reason.'),
})
```

In this example, the field "optional_field" is usually optional, but there are cases where you need the field to be
set, which is only determined at runtime, i.e. when calling the validate() method of DataclassValidator. For this
you can now set the context argument `require_optional_field` (as defined in the `__post_validate__` method):

```
validator = DataclassValidator(ExampleDataclass)
obj = validator.validate(input_data, require_optional_field=True)
```
"""

# Dataclass type that the validated dictionary will be converted to
Expand Down Expand Up @@ -144,22 +176,31 @@ def _get_field_default(field: dataclasses.Field) -> Default:
raise DataclassValidatorFieldException(f'Default specified for dataclass field "{field.name}" is not of type "Default".')
return default

def validate(self, input_data: Any) -> T_Dataclass:
def _pre_validate(self, input_data: Any, **kwargs) -> dict:
"""
Validate an input dictionary according to the specified dataclass. Returns an instance of the dataclass.
Pre-validation steps: Validates the input as a dictionary and fills in the default values.
"""
# Validate raw dictionary using underlying DictValidator
validated_dict = super().validate(input_data)
validated_dict = super().validate(input_data, **kwargs)

# Fill optional fields with default values
for field_name, field_default in self.field_defaults.items():
if field_name not in validated_dict:
validated_dict[field_name] = field_default.get_value()

# Try to create dataclass object from validated dictionary and catch exceptions that may be raised by a __post_init__() method
return validated_dict

def validate(self, input_data: Any, **kwargs) -> T_Dataclass:
"""
Validate an input dictionary according to the specified dataclass. Returns an instance of the dataclass.
"""
# Pre-validate the raw dictionary and fill in default values
validated_dict = self._pre_validate(input_data, **kwargs)

# Try to create dataclass object from validated dictionary and catch exceptions that may be raised in post-validation
try:
validated_object = self.dataclass_cls(**validated_dict)
return self.post_validate(validated_object)
return self._post_validate(validated_object, **kwargs)
except DataclassPostValidationError as error:
# Error already has correct exception type, just reraise
raise error
Expand All @@ -168,12 +209,17 @@ def validate(self, input_data: Any) -> T_Dataclass:
raise DataclassPostValidationError(error=error)
# Ignore all non-ValidationError exceptions (these are either errors in the code or should be handled properly by the user)

# noinspection PyMethodMayBeStatic
def post_validate(self, validated_object: T_Dataclass) -> T_Dataclass:
@staticmethod
def _post_validate(validated_object: T_Dataclass, **kwargs) -> T_Dataclass:
"""
Run post-validation checks on the validated dataclass instance. Returns the dataclass instance.

This method does nothing, but can be overridden by subclasses to implement user-defined checks (and optionally modify the
instance). Exceptions raised in this method will be caught in `validate()` and handled as DataclassPostValidationErrors.
Post-validation steps: Calls the `__post_validate__()` method on the dataclass object (if it is defined).
"""
# Post validation using the custom __post_validate__() method in the dataclass (if defined)
if hasattr(validated_object, '__post_validate__'):
# Only pass context arguments if __post_validate__() accepts them
if inspect.getfullargspec(validated_object.__post_validate__).varkw is not None:
validated_object.__post_validate__(**kwargs)
else:
validated_object.__post_validate__()

return validated_object
4 changes: 2 additions & 2 deletions src/validataclass/validators/date_validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,12 +40,12 @@ def __init__(self):
# Initialize StringValidator without any parameters
super().__init__()

def validate(self, input_data: Any) -> date:
def validate(self, input_data: Any, **kwargs) -> date:
"""
Validate input as a valid date string and convert it to a `datetime.date` object.
"""
# First, validate input data as string
date_string = super().validate(input_data)
date_string = super().validate(input_data, **kwargs)

# Try to create date object from string (only accepts "YYYY-MM-DD")
try:
Expand Down
4 changes: 2 additions & 2 deletions src/validataclass/validators/datetime_validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -225,12 +225,12 @@ def __init__(
# Precompile regular expression for datetime format
self.datetime_format_regex = re.compile(self.datetime_format.regex_str)

def validate(self, input_data: Any) -> datetime:
def validate(self, input_data: Any, **kwargs) -> datetime:
"""
Validate input as a valid datetime string and convert it to a `datetime.datetime` object.
"""
# First, validate input data as string
datetime_string = super().validate(input_data)
datetime_string = super().validate(input_data, **kwargs)

# Validate string format with a regular expression
if not self.datetime_format_regex.fullmatch(datetime_string):
Expand Down