
Defining New Schema Types

Jason Wolfe edited this page Jan 13, 2016 · 12 revisions

Note: as of schema 1.0, this guide is deprecated. Please check the updated version.

In this section, we will see the steps required to define new schemas.

Schemas are composable: you can write schemas that build on other schemas. Eventually all of these composite schemas bottom out in atomic schemas, schemas that do not depend on other schemas. Schemas can be recursive too, so you can write schemas that are defined in terms of themselves. Let's take a look at how to define atomic and composite schemas.

Atomic Schemas

An atomic schema is one that is defined without depending on any other schema.

As an example, let's take a look at EqSchema implemented in core.cljx. This schema is used to check whether some input data value x is equal to a given value v.

Let's see how this schema can be created and used for checking. For example, (s/eq "Schemas are cool!") is a schema that checks whether a value is exactly equal to the string "Schemas are cool!". Let's first try a positive example:

(require '[schema.core :as s])

(s/check (s/eq "Schemas are cool!") "Schemas are cool!") 
> nil 

The schema check succeeds (it returns nil) because the two strings are equal.

Now let's try a negative example:

(s/check (s/eq "Schemas are cool!") "Schemas are NOT cool!") 
> (not (= "Schemas are cool!" a-java.lang.String)) 

Here the schema check fails because the data value "Schemas are NOT cool!" does not equal the value the EqSchema was created with (namely "Schemas are cool!"). Instead of nil, check returns a validation error describing why the value failed to validate.

Now that we have seen how EqSchema is used, let's see how it is implemented.

(defrecord EqSchema [v]
  Schema
  (walker [this]
    (fn [x]
      (if (= v x)
        x
        (macros/validation-error this x (list '= v (utils/value-name x))))))
  (explain [this] (list 'eq v)))

We see that EqSchema implements the two methods of the Schema protocol, walker and explain. The walker method returns a function that check uses to validate data. This returned function takes a piece of data x and returns either a value, if the check should succeed, or a validation error, if it should fail. (For plain validation, the returned value is typically just the input x. When we do coercion or other fancier uses of the walker, we will see cases where a different, transformed value is returned.)

In the case of EqSchema, the decision about what the walker returns comes down to the equality test (= v x). When the test fails, the validation error is constructed with the validation-error macro from schema.macros.

The explain method of the Schema protocol specifies how the schema is rendered, for example when it is printed. For EqSchema, it returns the list (eq v).
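To make the split between walker and explain concrete without depending on the library's internals, here is a hypothetical toy model in plain Clojure. The names ToySchema, toy-walker, toy-explain, ToyError, ToyEq and toy-check are invented for illustration; the real library uses its own Schema protocol, the validation-error macro, and s/check.

```clojure
;; Hypothetical toy model of the walker/explain protocol (not schema's real code).
(defprotocol ToySchema
  (toy-walker [this] "Return a fn of one arg: the (possibly transformed) value, or a ToyError.")
  (toy-explain [this] "Return a data representation of the schema for printing."))

;; Stand-in for schema's validation error: records what failed and why.
(defrecord ToyError [schema value expectation])

(defn toy-error? [x] (instance? ToyError x))

;; Atomic schema: passes x through unchanged when it equals v.
(defrecord ToyEq [v]
  ToySchema
  (toy-walker [this]
    (fn [x]
      (if (= v x)
        x
        (->ToyError this x (list '= v x)))))
  (toy-explain [this] (list 'eq v)))

;; Toy analogue of s/check: nil on success, the error's expectation on failure.
(defn toy-check [schema x]
  (let [result ((toy-walker schema) x)]
    (when (toy-error? result)
      (:expectation result))))
```

Unlike the real error shown earlier, this toy puts the raw value into the expectation rather than a rendered name such as a-java.lang.String.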

Atomic schemas are small and simple. Their power really comes from composing them into bigger, more complex schemas.

Composite Schemas

A composite schema is one that is defined in terms of other schemas.

As an example, let's look at Both implemented in core.cljx. This schema is defined in terms of zero or more subschemas. Validation of the Both schema succeeds if the validation for each of the subschemas succeeds. Note that these subschemas can be arbitrary schemas such as the EqSchema above or even other instances of Both.

As an example usage, let's look at a schema that checks whether a number is both positive and even:

(def EvenPos (s/both (s/pred even? 'even?) (s/pred pos? 'pos?)))

This is a composite schema that is defined in terms of two instances of the Predicate schema.

As an aside, a Predicate schema is built from a predicate function that returns true when we want the schema check to succeed, and false if the check should fail. Here we're using Clojure's built-in even? and pos? predicate functions to define two Predicate schemas. The second argument to s/pred is an optional name for the predicate, which makes validation errors and explain output easier to read.
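A predicate-based walker can be modeled in a few lines. This is a hypothetical sketch, not the library's Predicate implementation: pred-walker and ToyError are invented names, and the sketch treats any truthy result as success and does not handle exceptions thrown by the predicate.

```clojure
;; Hypothetical toy model of a predicate schema's walker (not schema's real code).
(defrecord ToyError [value expectation])

(defn toy-error? [x] (instance? ToyError x))

(defn pred-walker
  "Return a walker fn that passes x through when (p x) is truthy.
  pred-name is used only to render the failure, e.g. (not (even? 3))."
  [p pred-name]
  (fn [x]
    (if (p x)
      x
      (->ToyError x (list 'not (list pred-name x))))))
```

The name argument plays the same role as the optional second argument to s/pred: it only affects how the failure is rendered.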

We can now use the EvenPos schema to validate numbers.

The number 4 is both even and positive, so the following check succeeds:

(s/check EvenPos 4)
> nil

On the other hand, the number 3 is not even, so it fails the first predicate and so the Both schema check fails:

(s/check EvenPos 3)
> (not (even? 3)) ;; aside: here we see the predicate's name 'even?' rendered in the error

To validate a composite schema (such as Both), we need to perform some processing on its constituent subschemas (in our example, these are the two Predicates) and then combine the results of that processing in a meaningful way. In the case of Both, combining the results of processing the constituent schemas essentially amounts to an "and" that all the subschemas validate. In general, each constituent subschema might itself be a composite schema, and therefore validation might require recursively traversing these schemas. This recursive traversal is handled by the walker and subschema-walker methods.

As an example, let's see how the walker method is implemented in terms of the subschema-walker method to define the behavior of Both.

(defrecord Both [schemas]
  Schema
  (walker [this]
    (let [sub-walkers (mapv subschema-walker schemas)]
      (fn [x]
        (reduce
         (fn [x sub-walker]
           (if (utils/error? x)
             x
             (sub-walker x)))
         x
         sub-walkers))))
  (explain [this] (cons 'both (map explain schemas))))

Here we see the implementation of Both, which is defined in terms of schemas, a seq of the constituent subschemas. In our example, schemas is a seq of length 2 containing the even and positive pred schemas.

Recall that to validate data, walker returns a function that takes the data x and returns x if it matches the schema, or an error otherwise. In the case of Both, x must match all of the constituent subschemas, which is checked by applying the function produced by each subschema's walker method to x. If any of these functions returns an error, then validating x against the Both schema must also return an error.

The implementation of Both first gets the functions returned by the walker methods of the subschemas by calling subschema-walker on each element of the schemas seq. These functions are bound to sub-walkers outside of the returned function. The returned function then reduces over the sub-walkers, applying each in turn to the output of the previous one. It returns the first error encountered, or, if there is none, the final value (which, for plain validation, is just x).
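The reduce in Both can be exercised in isolation. The following hypothetical toy model (both-walker, pred-walker and ToyError are invented names, not the library's API) threads x through each sub-walker and passes the first error through untouched.

```clojure
;; Hypothetical toy model of Both's walker (not schema's real code).
(defrecord ToyError [value expectation])

(defn toy-error? [x] (instance? ToyError x))

(defn pred-walker [p pred-name]
  (fn [x]
    (if (p x) x (->ToyError x (list 'not (list pred-name x))))))

(defn both-walker
  "Thread x through each sub-walker in order; once an error appears,
  skip the remaining sub-walkers and return that first error."
  [sub-walkers]
  (fn [x]
    (reduce (fn [acc sub-walker]
              (if (toy-error? acc)
                acc
                (sub-walker acc)))
            x
            sub-walkers)))

(def even-pos
  (both-walker [(pred-walker even? 'even?) (pred-walker pos? 'pos?)]))
```

Because a composite walker is itself just a one-argument function, even-pos can be passed as a sub-walker to another both-walker, which is the recursive traversal described above. Like the Both implementation shown earlier, the reduce keeps iterating after an error and merely skips the remaining sub-walkers; wrapping the error in clojure.core/reduced would stop it immediately.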

Caveats of composite schemas

There are a couple of caveats to keep in mind when defining a composite schema.

Define walker in terms of subschema-walker

First, rather than calling walker directly on the subschemas, we use subschema-walker, which eventually calls through to their walker methods. This extra layer of indirection gives us a powerful hook for introducing transformations to apply to the data. When schemas are used only to validate data, no transformation is applied, so calling subschema-walker on the subschemas is equivalent to calling walker. However, to support fancier uses of schemas, such as coercion or recursive schemas, it is necessary to call subschema-walker instead of walker. For now, the important takeaway is that whenever you want to walk a subschema, you need to do it using the subschema-walker method.
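To see why the indirection matters, here is a hypothetical toy model in which the hook is a dynamic var that a caller can rebind to wrap every subschema walker, here with a string-to-long coercion. The dynamic var *subschema-walker*, coercing-walker and the Toy* names are assumptions made for this sketch, not a description of the library's internals.

```clojure
;; Hypothetical toy model of the subschema-walker hook (not schema's real code).
(defprotocol ToySchema
  (toy-walker [this]))

(defrecord ToyError [value expectation])
(defn toy-error? [x] (instance? ToyError x))

;; The hook: by default, walking a subschema just means calling its walker.
(def ^:dynamic *subschema-walker* toy-walker)

(defn subschema-walker [s] (*subschema-walker* s))

(defrecord ToyPred [p pred-name]
  ToySchema
  (toy-walker [_]
    (fn [x] (if (p x) x (->ToyError x (list 'not (list pred-name x)))))))

(defrecord ToyBoth [schemas]
  ToySchema
  (toy-walker [_]
    ;; Go through the hook; never call toy-walker on subschemas directly.
    (let [sub-walkers (mapv subschema-walker schemas)]
      (fn [x]
        (reduce (fn [acc w] (if (toy-error? acc) acc (w acc))) x sub-walkers)))))

(defn coercing-walker
  "Build a walker in which every subschema walker first parses strings as longs.
  The parsing is a stand-in for real coercion; only the hook is illustrated."
  [schema]
  (binding [*subschema-walker*
            (fn [s]
              (let [w (toy-walker s)]
                (fn [x] (w (if (string? x) (Long/parseLong x) x)))))]
    (toy-walker schema)))

(def even-pos (->ToyBoth [(->ToyPred even? 'even?) (->ToyPred pos? 'pos?)]))
```

ToyBoth never mentions coercion, yet (coercing-walker even-pos) accepts "4" because each subschema is walked through the hook. Had ToyBoth called toy-walker directly, the rebinding would have no effect.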

Eagerly bind the subschema-walker

The second caveat is that the subschema walkers are bound outside of the returned function. That is to say, the (let [sub-walkers (mapv subschema-walker schemas)] ...) happens outside of the returned (fn [x] ...). This might seem like a trivial detail; on the surface the two orderings look equivalent, but they are not. Binding the subschema walkers eagerly, outside of the returned function, is necessary for schema implementations to be both performant and correct, so we've tried to ensure that incorrect implementations produce informative error messages.
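The performance side of this can be seen by counting how often sub-walkers are built. In this hypothetical sketch (make-sub-walker, eager-walker, lazy-walker and builds-for are invented names), an atom counts constructions: the eager version builds them once per walker, while the lazy version rebuilds them on every validated value.

```clojure
;; Hypothetical toy comparison of eager vs lazy sub-walker binding
;; (not schema's real code). An atom counts how often a sub-walker is built.
(def builds (atom 0))

(defn make-sub-walker
  "Stand-in for subschema-walker: building a walker may be costly, so count it."
  [p]
  (swap! builds inc)
  (fn [x] (if (p x) x ::error)))

(defn eager-walker [preds]
  (let [sub-walkers (mapv make-sub-walker preds)] ; built once, up front
    (fn [x] (reduce (fn [acc w] (if (= acc ::error) acc (w acc))) x sub-walkers))))

(defn lazy-walker [preds]
  (fn [x]
    (let [sub-walkers (mapv make-sub-walker preds)] ; rebuilt on every call
      (reduce (fn [acc w] (if (= acc ::error) acc (w acc))) x sub-walkers))))

(defn builds-for
  "Number of sub-walker builds needed to validate each value in xs once."
  [walker-fn preds xs]
  (reset! builds 0)
  (let [w (walker-fn preds)]
    (doseq [x xs] (w x)))
  @builds)
```

Validating 100 values with two predicates costs 2 builds in the eager version and 200 in the lazy one; with nested composites, the lazy version would also rebuild every nested walker on each call.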

Concretely, let's see what happens if we incorrectly implement Both by binding the subschema walkers inside of the returned function. We'll call our broken implementation BrokenBoth:

(defrecord BrokenBoth [schemas]
  Schema
  (walker [this]
    (fn [x]
      (let [sub-walkers (mapv subschema-walker schemas)]
        (reduce
         (fn [x sub-walker]
           (if (utils/error? x)
             x
             (sub-walker x)))
         x
         sub-walkers)))))

(defn broken-both [& schemas] (BrokenBoth. schemas))

Here we have essentially copied the implementation of Both, but have swapped the order of (let [sub-walkers (mapv subschema-walker schemas)] ...) and (fn [x] ...). Now, if we try using our broken implementation to validate a value, we get an error:

(s/check (broken-both (s/pred even? 'even) (s/pred pos? 'positive)) 4)
> RuntimeException Walking is unsupported outside of start-walker; all composite schemas must eagerly bind subschema-walkers outside the returned walker.
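The correctness side can be reproduced with a toy model in which the walking hook is bound only while the top-level walker is being constructed. This is a hypothetical sketch: the dynamic var, start-walker, eager-both and lazy-both here are invented stand-ins, raw predicate functions play the role of schemas, and the exception message only mimics the real one.

```clojure
;; Hypothetical toy model of why lazy binding fails (not schema's real code):
;; the walking hook is only available while the top-level walker is being built.
(def ^:dynamic *subschema-walker*
  (fn [_] (throw (RuntimeException. "Walking is unsupported outside of start-walker"))))

(defn subschema-walker [s] (*subschema-walker* s))

(defn start-walker
  "Bind the hook (here: treat each schema as a plain predicate) only while
  the top-level walker is constructed, then return that walker."
  [make-walker schema]
  (binding [*subschema-walker* (fn [p] (fn [x] (if (p x) x ::error)))]
    (make-walker schema)))

(defn eager-both [preds]
  (let [ws (mapv subschema-walker preds)]   ; runs inside start-walker's binding
    (fn [x] (reduce (fn [acc w] (if (= acc ::error) acc (w acc))) x ws))))

(defn lazy-both [preds]
  (fn [x]
    (let [ws (mapv subschema-walker preds)] ; runs later, after the binding is gone
      (reduce (fn [acc w] (if (= acc ::error) acc (w acc))) x ws))))
```

With eager-both, the sub-walkers are built while the binding is in effect, so the returned walker works afterwards. With lazy-both, subschema-walker is only called when a value is validated, by which time the binding has been popped and the root value throws.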

Therefore, the second important takeaway is to always bind the subschema walkers outside of the returned function.