Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Try ajv instead of is-my-json-valid for JSON Schema validation #20145

Closed
wants to merge 4 commits into from

Conversation

deBhal
Copy link
Contributor

@deBhal deBhal commented Nov 23, 2017

What's wrong with is-my-json-valid?

is-my-json-valid is excellent in terms of pass/fail, but it's pretty terse with it's results. In particular, it would be super nice if we could reliably get back to the rule that failed, because that way we could add any meta-data we might need.

I ran into this adding a rule to the .fr domain contact validation schema that says "if the registrantType is organization, then the organization field cannot be empty". That's no problem to validate with JSON schema, but we need to associate a failed result with the right field on the form.

Up until now, all the validation I've done has been simple enough to fit into a properties that matches the data itself so it's been easy enough to put messages in the right place (I did have to rearrange rules a couple of times to disambiguate them, but nothing worth ripping out the library yet).

However, once a rule involves more than one property it's intrinsically ambiguous which field is the wrong one (maybe both). In the .fr organization case, I had to add logic in the validation result handling that says roughly "if it doesn't have a field, associate it with the organization field". That works, but only once, and it's an ugly hack.

What I want to be able to do is specify in the rule itself which field it belongs to. If we can get back to the rule from the failure, this is as simple as adding another property to the rule itself, like field: organization, and taking that to mean "If this fails, show the error next to the organization field". ( We're already using this approach on the back end to facilitate i18n r165250-wpcom ).

The awesome thing is that we can use that same trick for any extra data we turn out to need. We could add the user visible error message to the rule, or we could add a severity. We could even decide that we'd like a sort of inheritance for some properties, and search up the schema's tree for them.

But we can't do that with is-my-json-valid, because it doesn't tell us clearly where the problem occurred. There are several related issues (mafintosh/is-my-json-valid#38, mafintosh/is-my-json-valid#39, mafintosh/is-my-json-valid#22).

I've put together a runnable example of the problem:

import Ajv from 'ajv';
import draft04 from 'ajv/lib/refs/json-schema-draft-04.json';
const ajv = new Ajv();
ajv.addMetaSchema( draft04 );

import IMJV from 'is-my-json-valid';

const ravenSchema =
	{ "not": {
			"properties": {
				"animal": { "enum": [ "raven" ] },
				"color": {
					"not": {
						"enum": [ "black" ] } } } } };
const doveSchema =
	{	"whatever we want": "here it is",
		"not": {
			"properties": {
				"animal": { "enum": [ "dove" ] },
				"color": {
					"not": {
						"enum": [ "white" ] } } } } };

const birdsSchema = {
	"id": "birds",
	"allOf": [
		ravenSchema,
		doveSchema,
	],
};

const badRaven = { animal: 'raven', color: 'rainbow' };
const badDove = { animal: 'dove', color: 'pink' };

const imjvValidate = IMJV( birdsSchema, { greedy: true, verbose: true } );
const ajvValidate = ajv.compile( birdsSchema );

console.log( ( imjvValidate( badRaven ), imjvValidate.errors ) );
// [ { field: 'data', message: 'negative schema matches' } ]
console.log( ( imjvValidate( badDove ), imjvValidate.errors ) );
// [ { field: 'data', message: 'negative schema matches' } ]

console.log( ( ajvValidate( badRaven ), ajvValidate.errors ) );
// [ { keyword: 'not',
//     dataPath: '',
//     schemaPath: '#/allOf/0/not',
//     params: {},
//     message: 'should NOT be valid' } ]

console.log( ( ajvValidate( badDove ), ajvValidate.errors ) );
// [ { keyword: 'not',
//     dataPath: '',
//     schemaPath: '#/allOf/1/not',
//     params: {},
//     message: 'should NOT be valid' } ]

import { dropRight, get, last } from 'lodash';
const failedRule = get(
	birdsSchema,
	dropRight( last( ajvValidate.errors ).schemaPath.slice( 2 ).split( '/' ) )
);
console.log( JSON.stringify( failedRule ) );
//  {"whatever we want":"here it is","not":{"properties":{"animal":{"enum":["dove"]},"color":{"not":{"enum":["white"]}}}}}

While this example is a little contrived, you can see that is-my-json-valid returns exactly the same result imjvValidate( badRaven ) and imjvValidate( badDove ), meaning we can't tell what failed.

ajv adds the criticial information we need in the schemaPath to tell the two failures apart as well as find that rule in the schema.

Why ajv?

  • Most correct
  • super fast
  • great docs
  • easy to get up and running
  • MIT Licence
  • Not too big ( 116k minimized )
  • 29M downloads last month (!)
  • https://github.com/epoberezkin/ajv-i18n (we won't even use it, but I'm stoked it exists ;) )

There are a lot of other options, but ajv seems to be a pretty clear standout.

djv has great stats too, but I found it a bit challenging to use. By default it returns a fairly uninformative string, and changing that involves passing in an error handling callback that seems to return a string to get evaluated in a context that contains the information you need, but the documentation on that aspect is not very helpful. I get the impression it would do what we need it to with some extra effort, but I don't see a compelling reason to follow the rougher path here.

https://github.com/bugventure/jsen has good stats and good looking docs, but it performs worse than is-my-json-valid on the benchmarks, so not as attractive, but if we're going to do some rigorous testing we should probably include it.

@matticbot
Copy link
Contributor

@deBhal deBhal force-pushed the try/ajv-json-schema-library branch 2 times, most recently from f88c8e1 to 15d17ee Compare November 24, 2017 02:37
@deBhal deBhal changed the title [WIP] Try ajv instead of is-my-json-valid for fr validation [WIP] Try ajv instead of is-my-json-valid for JSON Schema validation Nov 24, 2017
@ramonjd
Copy link
Member

ramonjd commented Nov 27, 2017

I think this a worthy idea to pursue, especially if it's adopted in a way that allows us to consistently validate our forms. Furthermore, longer term, if it helps to share strict validation rules across front and backends all the better. Async validation looks tasty.

JSON-Schema has some nice approaches to dependencies as well, and seems pretty flexible with regards to conditional validation.

Do you see a 'per object/field' or 'per collection/form' schema as making more sense as a broad strategy?

What I mean is, when building a form, do you think it makes more sense to run tests against a collection of individual schemata (such as organization, name, email, favoritePokemonCharacter and so on), or something like a domainContactDetailsSchema object?

Thanks for this!

Copy link
Contributor

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable change to investigate. I see maybe five uses of is-my-json-valid outside of test code. Could you spare the cycles to update those and see how the overall total build sizes change?

In Slack we discussed that this ended up as around 30 KB gzip'd. If we can eliminate is-my-json-valid in the progress and cut down 15 KB or more or so of that size then I would imagine there would be very little opposition here.

I don't mind helping out with the data layer stuff because I don't think any code is yet relying on the failure messages from the validation. Pretty soon we'll have more momentum though as more people do.

Nice update.

cc: @blowery @samouri @aduth thoughts on this?

@deBhal
Copy link
Contributor Author

deBhal commented Nov 29, 2017

I'll come back and write a bit more in the morning, but in the meantime I'll just dump what I've got:

I swapped ajv in for is-my-json-valid, fixed the obvious problems, and only broke 21 tests out of 10k! :D
calypso_ -bash _115x91

I ran the bundle analyzer (just npm run update-deps; npm run preanalyze-bundles; npm run analyze-bundles), and it looks like ajv is going to add ~20.7k zipped to the overall bundle. I'm not sure why my IMJV build isn't showing the gzipped values (maybe because I generated it in a freshly cloned copy?), but the other numbers match up.

The analyzer does a pretty good job of showing how big that actually is - the second biggest thing in node_modules in the client:
ajv

For comparison, here's is-my-json-valid (it's so tiny!):
imvj

I think that's big enough that I'll have another look at adding what we need to IMJV. My first impressions were that it's pretty complicated code, though, and I don't have a big chunk of time I can allocate to it.

@dmsnell
Copy link
Contributor

dmsnell commented Nov 29, 2017

I'm curious what that dotjs is doing there; it's a sizable chunk of ajs and I wonder if it's even needed. You think we could eliminate it somehow?

@akirk
Copy link
Member

akirk commented Nov 30, 2017

I had a quick look and it turns out thatdot is just a devDependecy but the dotjs folder actually contains compiled versions of the files in the dot folder and they seem to make up the core of the rule processing: https://github.com/epoberezkin/ajv/blob/master/lib/compile/_rules.js

@deBhal
Copy link
Contributor Author

deBhal commented Nov 30, 2017

Do you see a 'per object/field' or 'per collection/form' schema as making more sense as a broad strategy?

Good question! I don't know :)

Some things are simple - email is actually already a "format", which is how you do anything that's too hard for a regex (understanding-json-schema#format), so we'd need to add a custom format if we wanted to do something like luhn-validate a credit card number.

Otherwise, we've got a lot of flexibility. With the way JSON Schema and Redux work, I suspect we'd get a better result if we focused on the state/data rather than form fields. The component already has to map the state to the fields, which seems like the same task/responsibility as mapping errors to fields.

There's room to do some really clever things, but in the short term, we're probably going to start with validation that cover the same data as some component and be fast enough run in the render() function, so we can start there and then identify/dedupe worthwhile abstractions based on actual use.

I don't mind helping out with the data layer stuff because I don't think any code is yet relying on the failure messages from the validation. Pretty soon we'll have more momentum though as more people do.

Thanks! I totally hacked through there, and made guesses about performance and side effects that definitely need checking if we end up going down this path.

As an aside, ajv complained that a number of the schemas themselves are invalid (e.g. because title is a keyword and we're not using it properly). Probably not a problem, unless it turns out to be :)

@deBhal
Copy link
Contributor Author

deBhal commented Dec 1, 2017

I dug into is-my-json-validtoday, and after the initial hump, it's way easier than it looks and I got the schemaPath I want working for my current use cases: mafintosh/is-my-json-valid#148

It needs tests before it's ready to land, but it addresses several open issues and should have almost no impact on runtime validation performance, so with tests in place I think it will be an attractive PR for them.

If they're slow to merge, we should move my fork (https://github.com/deBhal/is-my-json-valid/tree/add/schemaPath) to github.com/Automattic - I believe we're already forking a couple of repos like that already.

The IMJV code is actually nice and easy to modify. I think I can see how to add the keyword, or get the errors out of references if we want that. I actually added the schema node itself to the error with a one-line change in the code. It ended up dumping potentially a lot of stringified schemas into the generated code, though, so I pulled it out, but now that I'm thinking about it a bit more, I can see a way to do it much more cleanly by adding the schema to the validation function and using these paths I've just added.

As an aside, I noticed that the validate() function IMJV generates is self contained and available with a simple toString(), so it would be a very simple webpack plugin that takes the schema, compiles it and adds that pre-compiled validator to the build if we want to do that. The function for my example bird schema is about 4 times the size as the schema, though, so the time saved precompiling would need to make a real difference.

I'm going to go ahead and close this PR in favour of that one, but I'm happy to come back here if there's demand.

@deBhal deBhal closed this Dec 1, 2017
@alisterscott alisterscott deleted the try/ajv-json-schema-library branch March 6, 2018 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants