# Using Schemas in Kosh

This notebook shows how to use schemas in Kosh to validate your metadata.


In [1]:
import kosh
import os

kosh_example_sql_file = "kosh_schemas_example.sql"

# Create and open a new store (erase if exists)
store = kosh.connect(kosh_example_sql_file, delete_all_contents=True)
# create a dataset
dataset = store.create()

Let's create a schema to validate our metadata.
A schema object takes two dictionaries as input:
one for the required attributes and one for the optional attributes.

For each attribute we need to provide validation functions or valid values:
 - If the "validation" is a callable, it is applied to the value of the attribute; the call must succeed and return True
 - If the "validation" is an instance of 'type', the attribute must be an instance of that type
 - Otherwise the value must match "validation"
 
An attribute can also have several possible validations: define them in the dictionary as a list, and the attribute is considered valid if any one of them passes.
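The rules above can be sketched in plain Python. This is only an illustration of the described behavior, not Kosh's implementation: the helper name `passes` is hypothetical, and treating `None` as "any value" is an assumption based on how `None` is used for the "must" attribute in this notebook.

```python
def passes(value, validation):
    """Return True if `value` satisfies one validation, or any validation in a list."""
    # A list of validations passes as soon as any one of them passes.
    if isinstance(validation, (list, tuple)):
        return any(passes(value, v) for v in validation)
    # Assumption: None means "any value is accepted".
    if validation is None:
        return True
    # Classes are themselves callable, so this sketch tests the type rule first.
    if isinstance(validation, type):
        return isinstance(value, validation)
    # A callable must run without raising and return True.
    if callable(validation):
        try:
            return validation(value) is True
        except Exception:
            return False
    # Otherwise the value must match the validation.
    return value == validation


for value, validation in [(5, int), ("I have must", int), ("yes", [1, "yes"]), ("b", [1, "yes"])]:
    print(repr(value), validation, "->", passes(value, validation))
```

Note that the sketch checks for a type before checking for a callable, because `int`, `str` and other classes would otherwise be treated as functions and called on the value.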
 

Let's create a validation schema that requires our datasets to have the attribute "must" with any value, and allows an optional attribute "maybe" whose value must be either 1 or "yes".

In [2]:
required = {"must": None}
optional = {"maybe": [1, "yes"]}
schema = kosh.KoshSchema(required, optional)

Our current (blank) dataset will not validate. We can check this as follows:

In [3]:
try:
    schema.validate(dataset)
except ValueError as err:
    print("As expected, we failed to validate with error:", err)

As expected, we failed to validate with error: Could not validate 9e0152167054428692bb87486de0f891
1 required attribute errors: {'must': AttributeError('Object 9e0152167054428692bb87486de0f891 does not have must attribute')}
0 optional attributes errors: {}
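The error above reports required and optional problems separately. A minimal sketch of that bookkeeping follows, assuming for brevity that every validation is either `None` (any value) or a callable. The function `validate` and the error wording are hypothetical stand-ins, not Kosh's code.

```python
from types import SimpleNamespace


def validate(obj, required, optional):
    """Return True if `obj` satisfies both dictionaries, else raise ValueError."""
    required_errors, optional_errors = {}, {}
    for name, check in required.items():
        # A required attribute must exist and must pass its check.
        if not hasattr(obj, name):
            required_errors[name] = AttributeError(f"missing attribute {name!r}")
        elif check is not None and not check(getattr(obj, name)):
            required_errors[name] = ValueError(f"value {getattr(obj, name)!r} failed validation")
    for name, check in optional.items():
        # An optional attribute is only checked when it is present.
        if hasattr(obj, name) and check is not None and not check(getattr(obj, name)):
            optional_errors[name] = ValueError(f"value {getattr(obj, name)!r} failed validation")
    if required_errors or optional_errors:
        raise ValueError(
            f"{len(required_errors)} required attribute errors: {required_errors}\n"
            f"{len(optional_errors)} optional attribute errors: {optional_errors}"
        )
    return True


required = {"must": None}
optional = {"maybe": lambda v: v in (1, "yes")}

try:
    validate(SimpleNamespace(), required, optional)
except ValueError as err:
    print("blank object rejected:", err)

# Attributes that appear in neither dictionary are simply ignored.
print(validate(SimpleNamespace(must="I have must", extra="hi"), required, optional))
```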


In [4]:
# Let's add the attribute 
dataset.must = "I have must"
# Validation now passes
schema.validate(dataset)

True

Now let's require "must" to be an integer.

In [5]:
required = {"must": int}
optional = {"maybe": [1, "yes"]}
schema = kosh.KoshSchema(required, optional)
# it does not validate anymore
try:
    schema.validate(dataset)
except ValueError as err:
    print("As expected, it now fails to validate with error:", err)

As expected, it now fails to validate with error: Could not validate 9e0152167054428692bb87486de0f891
1 required attribute errors: {'must': ValueError('value I have must failed validation')}
0 optional attributes errors: {}


In [6]:
dataset

KOSH DATASET
	id: 9e0152167054428692bb87486de0f891
	name: Unnamed Dataset
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	must: I have must
	name: Unnamed Dataset
--- Associated Data (0)---
--- Ensembles (0)---
	[]
--- Ensemble Attributes ---


In [7]:
# Let's fix this
dataset.must = 5
# It now validates
schema.validate(dataset)

True

In [8]:
# Note that any extra attribute is ok but will not be checked for validation
dataset.any = "hi"
schema.validate(dataset)

True

In [9]:
# We can now attach the schema to the dataset so that it is enforced on every later change
dataset.schema = schema

In [10]:
# Now we cannot set `must` to a bad value
try:
    dataset.must = 7.6
except ValueError as err:
    print("Failed to set attribute as it did not validate (must be int). Error:", err)

Failed to set attribute as it did not validate (must be int). Error: value 7.6 failed validation


In [11]:
# Still at 5
dataset.must

5

Note that when the schema attribute is set, all attributes of the dataset are checked.

In [12]:
dataset2 = store.create()
dataset2.must = 7.6
# "must" is not an int here, so attaching the schema is rejected
try:
    dataset2.schema = schema
except ValueError:
    pass
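To make the two enforcement points concrete (checking every existing attribute when a schema is attached, and checking each later assignment), here is a toy sketch built on Python's `__setattr__`. The class `Guarded` is hypothetical, and its schema is simplified to a plain dictionary mapping attribute names to required types. It is not how Kosh datasets are implemented.

```python
class Guarded:
    """Toy object whose attributes are type-checked once a schema is attached."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the schema slot.
        object.__setattr__(self, "schema", None)

    def __setattr__(self, name, value):
        if name == "schema":
            # Attaching a schema re-checks every attribute that is already set.
            if value is not None:
                for attr, expected in value.items():
                    if hasattr(self, attr) and not isinstance(getattr(self, attr), expected):
                        raise ValueError(f"value {getattr(self, attr)!r} failed validation")
        elif self.schema is not None and name in self.schema:
            # With a schema attached, each new value is checked before it is stored.
            if not isinstance(value, self.schema[name]):
                raise ValueError(f"value {value!r} failed validation")
        object.__setattr__(self, name, value)


good = Guarded()
good.must = 5
good.schema = {"must": int}
try:
    good.must = 7.6
except ValueError as err:
    print("assignment rejected:", err)
print("must is still", good.must)

bad = Guarded()
bad.must = 7.6
try:
    bad.schema = {"must": int}
except ValueError as err:
    print("schema rejected:", err)
```

Because the check happens before the value is stored, a rejected assignment leaves the previous value in place.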

In [13]:
# Similarly, optional attributes must validate
try:
    dataset.maybe = "b"
except ValueError as err:
    print("Optional attributes must validate as well. Error:", err)

Optional attributes must validate as well. Error: Could not validate value 'b'


In [14]:
dataset.maybe = "yes"
dataset.maybe = 1

Sometimes we need more complex validation. Let's create a simple validation function.

In [15]:
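def isYes(value):
    # Strings pass if they start with "y" or "Y"; startswith() is safe on empty strings
    if isinstance(value, str):
        return value.lower().startswith("y")
    # Integers pass only if they equal 1
    elif isinstance(value, int):
        return value == 1
    # Any other type fails explicitly instead of returning None
    return False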
    
required = {"must": int}
optional = {"maybe": isYes}
schema = kosh.KoshSchema(required, optional)

dataset.schema = schema
dataset.maybe = "y"
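Since a validation function is an ordinary Python function, it can be exercised on its own before it is placed in a schema. The sketch below uses `is_yes`, a hypothetical standalone variant of the function above that returns False for empty strings and unsupported types, and runs it over a few edge cases.

```python
def is_yes(value):
    """Return True for strings starting with "y"/"Y" and for the integer 1."""
    if isinstance(value, str):
        # startswith() returns False on an empty string instead of raising.
        return value.lower().startswith("y")
    if isinstance(value, int):
        return value == 1
    # Any other type is rejected explicitly.
    return False


for candidate in ["y", "Yes", "no", "", 1, 0, 2.5, None]:
    print(repr(candidate), "->", is_yes(candidate))
```

Keep in mind that `bool` is a subclass of `int` in Python, so `True` is accepted by the integer branch as well.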

We can also pass a list of possible validations.


In [16]:
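def isNo(value):
    # Strings pass if they start with "n" or "N"; startswith() is safe on empty strings
    if isinstance(value, str):
        return value.lower().startswith("n")
    # Integers pass only if they equal 0
    elif isinstance(value, int):
        return value == 0
    # Any other type fails explicitly instead of returning None
    return False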
    
required = {"must": int}
optional = {"maybe": [isYes, isNo, "oui"]}
schema = kosh.KoshSchema(required, optional)

dataset.schema = schema
dataset.maybe = "N"
dataset.maybe = 'No'
dataset.maybe = 'oui'
dataset.maybe = 'Yes'