Laurence Rowe edited this page May 10, 2013 · 14 revisions

Welcome to the Seep wiki!

  1. bug Julian in a friendly manner
  2. think about other possible keywords or behavior and spec it out here
  3. become familiar with jsonschema, especially the validators
  4. start writing unit tests

purpose

  1. Allow for optional data coercion (i.e. transformation/conversion) of properties (and their names) during the validation process.
  2. Distinguish between validating data from trusted sources (where we would like the convenience of automatically filling in missing values with defaults) and from untrusted sources (where a missing value fails validation).
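
A minimal sketch of the second point, assuming a flat object schema and using plain Python rather than any real validator. The function name `fill_or_fail` and the `timezone` property are hypothetical, made up for illustration only.

```python
def fill_or_fail(schema, instance, trusted):
    """Return a copy of instance; defaults are filled in only for trusted sources."""
    result = dict(instance)
    if trusted:
        # Trusted source: quietly fill in any missing value that has a default.
        for name, subschema in schema.get("properties", {}).items():
            if name not in result and "default" in subschema:
                result[name] = subschema["default"]
    # Either way, whatever is still missing is a validation failure.
    missing = [name for name in schema.get("required", []) if name not in result]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return result


event = {
    "type": "object",
    "properties": {
        "start_time": {"type": "string"},
        "timezone": {"type": "string", "default": "UTC"},  # hypothetical property
    },
    "required": ["start_time", "timezone"],
}

filled = fill_or_fail(event, {"start_time": "yesterday"}, trusted=True)
```

With `trusted=False` the same call raises, because `timezone` is required and no default is applied.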

SS: I guess 'serialization' is the term colander uses to describe the movement of data from server to client, but I find this term a little misleading (and unintuitive) in this context

LR: Inbound / outbound schemas?

SS: I would take #2 a step further and suggest that you could generalize the 'serialization' process and have the schema not merely fill in defaults but transform the data further:

  1. fill in defaults where values are missing
  2. rename/coerce fields/values
  3. remove/omit fields (say you have a mongo row and want to expose most but not all fields)

To this end, there is really no distinction between the serialization and deserialization processes. They are both transformations with rules that are enabled or disabled depending on what you're trying to do.

LR: remove/omit could be a 'relaxed' validator mode which drops disallowed additional properties instead of recording the validation error.
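
To make the "one transformation, switchable rules" idea concrete, here is a rough sketch for flat objects. Everything in it is an assumption for illustration: the `transform` function, its flags, the `rename_to` keyword and the example field names are hypothetical, and `additionalProperties` is treated as a plain boolean.

```python
def transform(schema, instance, fill_defaults=False, rename=False, relaxed=False):
    """Apply whichever rules are enabled to a flat object and return a new dict."""
    properties = schema.get("properties", {})
    result = {}
    for name, value in instance.items():
        if name not in properties:
            if schema.get("additionalProperties", True) is not False:
                result[name] = value  # extra properties are allowed by the schema
            elif not relaxed:
                raise ValueError("additional property not allowed: %s" % name)
            # otherwise: relaxed mode drops the disallowed property silently
            continue
        # "rename_to" is a hypothetical custom keyword, not part of JSON Schema.
        target = properties[name].get("rename_to", name) if rename else name
        result[target] = value
    if fill_defaults:
        for name, subschema in properties.items():
            target = subschema.get("rename_to", name) if rename else name
            if target not in result and "default" in subschema:
                result[target] = subschema["default"]
    return result


row_schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "_id": {"type": "string", "rename_to": "id"},
        "status": {"type": "string", "default": "new"},
    },
}
row = {"_id": "abc123", "password_hash": "secret"}

# Outbound use: rename, fill defaults, and drop fields that are not exposed.
outbound = transform(row_schema, row, fill_defaults=True, rename=True, relaxed=True)
```

Calling `transform(row_schema, row)` with every rule switched off behaves like strict validation and rejects `password_hash`.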

brainstorming

Julian's thoughts

deserializer = SeepDeserializer(schema, coersions={"integer" : int}).deserialize(instance)

Shaun's thoughts

a possible test case: a property that could be passed in many forms

start_time = "2007-03-01T13:00:00Z"	# a string with an ISO date
start_time = "yesterday"		# a string with some natural language
start_time = 10				# a number representing 10 o'clock today
start_time = -2				# a number representing 2 hours ago
start_time = "-2m"			# a string using some magic syntax denoting 2 minutes ago
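
As a rough sketch, the forms listed above could all be funnelled through a single function that either returns a datetime or fails. The name `coerce_start_time`, the explicit `now` argument, and the reading of "yesterday" as "the same time one day earlier" are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

MINUTES_AGO = re.compile(r"-(\d+)m")  # the "-2m" magic syntax


def coerce_start_time(value, now):
    """Convert one of the accepted forms to a datetime, or raise ValueError."""
    if isinstance(value, bool):
        raise ValueError("booleans are not valid start times")
    if isinstance(value, (int, float)):
        if value < 0:
            return now + timedelta(hours=value)  # -2 means two hours ago
        if value < 24:
            # 10 means 10 o'clock today (fractions of an hour are truncated)
            return now.replace(hour=int(value), minute=0, second=0, microsecond=0)
        raise ValueError("hour out of range: %r" % (value,))
    if isinstance(value, str):
        if value == "yesterday":
            return now - timedelta(days=1)  # assumption: same time yesterday
        match = MINUTES_AGO.fullmatch(value)
        if match:
            return now - timedelta(minutes=int(match.group(1)))
        try:
            return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            pass  # fall through to the generic failure below
    raise ValueError("unrecognized start_time: %r" % (value,))


now = datetime(2013, 5, 10, 15, 30)
two_minutes_ago = coerce_start_time("-2m", now)
```

Passing `now` in explicitly keeps the relative forms testable.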

some schema

event =  {
	"type" : "object",
	"additionalProperties" : False,
	"properties" : {
		"start_time" : { "type" : "string" },
	},
	"required" : [ 'start_time' ]
}

We will want to synthesize the property into some internal form, presumably a Python datetime. The code that validates the input probably does nearly as much work as the code required to convert it to an internal representation.

The question is: if we know we're going to coerce the data into the new form, should we bother running an initial 'validation' step first, or is it better to do both in one pass (e.g. a coercion function attempts to convert the data to the new representation and, if the conversion fails, validation fails too)?

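from datetime import datetime
from jsonschema import ValidationError  # assumption: jsonschema's exception is the one meant

def mydatetime_coersion_function(value):  # sketch: handles only the ISO form
	try:
		return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
	except (TypeError, ValueError):
		raise ValidationError("cannot coerce %r to a datetime" % (value,))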

# some possibilities:
"scheduled_start" : {"type" : mydatetime_coersion_function }
"scheduled_start" : {"type" : "mydatetime" }  # "mydatetime" and mydatetime_coersion_function are pre-registered with the validator/deserializer (see Julian's SeepDeserializer above)

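Both possibilities can be supported by one lookup step. The sketch below is only an illustration in plain Python: the `COERCIONS` registry, `register`, `resolve`, `coerce_properties` and `parse_iso` are hypothetical names, not Seep or jsonschema API.

```python
from datetime import datetime

COERCIONS = {}  # hypothetical registry: type name -> coercion function


def register(name, function):
    COERCIONS[name] = function


def resolve(type_spec):
    """Return a coercion for a callable or a registered name, else None."""
    if callable(type_spec):
        return type_spec
    if isinstance(type_spec, str):
        return COERCIONS.get(type_spec)
    return None  # e.g. a list of types: left to ordinary validation


def coerce_properties(schema, instance):
    """Return a copy of instance with every coercible property converted."""
    result = dict(instance)
    for name, subschema in schema.get("properties", {}).items():
        coercion = resolve(subschema.get("type"))
        if name in result and coercion is not None:
            result[name] = coercion(result[name])
    return result


def parse_iso(value):
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


register("mydatetime", parse_iso)

by_name = {"properties": {"scheduled_start": {"type": "mydatetime"}}}
by_callable = {"properties": {"scheduled_start": {"type": parse_iso}}}
data = {"scheduled_start": "2007-03-01T13:00:00Z"}

coerced = coerce_properties(by_name, data)
```

One trade-off to keep in mind: a schema that embeds a callable can no longer be serialized as JSON, whereas a schema that refers to a registered name can.
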
Laurence's thoughts

The distinction between serialization and deserialization is really about trying to cram two similar schemas into one document. In my experience schemas also vary with the user's role: admin users have more options than standard users, who in turn have more options than anonymous users.

The spec states: "A JSON Schema MAY contain properties which are not schema keywords." It seems reasonable to add custom keywords that could be used to tweak or filter down a schema for the particular purpose to which it is applied.
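
One way this might look, purely as a sketch: a hypothetical custom keyword `roles` lists who may see a property, and a helper derives a narrower schema per role. Neither `roles` nor `schema_for_role` comes from the spec or from any library; the example schema is invented as well.

```python
import copy


def schema_for_role(schema, role):
    """Return a copy of schema without the properties hidden from role."""
    filtered = copy.deepcopy(schema)
    properties = filtered.get("properties", {})
    removed = set()
    for name in list(properties):
        roles = properties[name].get("roles")  # hypothetical custom keyword
        if roles is not None and role not in roles:
            del properties[name]
            removed.add(name)
    if "required" in filtered:
        kept = [name for name in filtered["required"] if name not in removed]
        if kept:
            filtered["required"] = kept
        else:
            del filtered["required"]  # avoid leaving an empty "required" list
    return filtered


account = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "roles": ["admin", "user"]},
        "is_banned": {"type": "boolean", "roles": ["admin"]},
    },
    "required": ["name", "email"],
}

anonymous_view = schema_for_role(account, "anonymous")
```

Properties without a `roles` keyword stay visible to everyone, so an existing schema keeps working unchanged.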

other randomness

a simple wrapper I found useful

task = {
	"type" : "object",
	"additionalProperties" : False,
	"properties" : {
		"name" : { "type" : "string" },
		"id" : { "type" : "number" }
	},
	"required" : [ 'id' ]
}

assigned_task =  Schema(task).add({
	"scheduled_start" : {"type" : "string" },
})

completed_task = Schema(assigned_task).add({
	"actual_start" : {"type" : "string"},
	"actual_duration" : {"type" : "number"},
})


assigned_task(data)   # use __call__
assigned_task.validate(data)
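
The wrapper itself is not shown above, so the following is only a guess at a minimal shape for it. The `validate` method is a simplified stand-in that checks just `required` and a boolean `additionalProperties`; a real version would presumably delegate to an actual validator. Unlike the snippet above, `add` is called on an existing `Schema` here.

```python
import copy


class Schema(dict):
    """A dict-based schema that can be extended without mutating its parent."""

    def add(self, properties, required=()):
        extended = Schema(copy.deepcopy(dict(self)))
        extended.setdefault("properties", {}).update(properties)
        names = list(extended.get("required", [])) + list(required)
        if names:
            extended["required"] = names
        return extended

    def validate(self, data):
        # Simplified stand-in for a real validator.
        missing = [name for name in self.get("required", []) if name not in data]
        if missing:
            raise ValueError("missing required properties: " + ", ".join(missing))
        if self.get("additionalProperties", True) is False:
            allowed = self.get("properties", {})
            extra = sorted(name for name in data if name not in allowed)
            if extra:
                raise ValueError("unexpected properties: " + ", ".join(extra))
        return data

    def __call__(self, data):
        return self.validate(data)


task = Schema({
    "type": "object",
    "additionalProperties": False,
    "properties": {"name": {"type": "string"}, "id": {"type": "number"}},
    "required": ["id"],
})
assigned_task = task.add({"scheduled_start": {"type": "string"}})

checked = assigned_task({"id": 1, "scheduled_start": "2007-03-01T13:00:00Z"})
```

Copying on `add` means `task` keeps rejecting `scheduled_start` even after `assigned_task` has been derived from it.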