Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(jsonld): add structured data validation #6750

Merged
merged 33 commits into from Apr 8, 2019
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
5f02a63
core(jsonld): add structured data validation
patrickhulce Dec 8, 2018
2883d90
yarn fixes
patrickhulce Dec 8, 2018
4d27cd0
condense generated files to one line
patrickhulce Dec 8, 2018
e9daf3d
stash error
patrickhulce Dec 8, 2018
2371450
eslint ignores
patrickhulce Dec 10, 2018
179796a
update jsonld
patrickhulce Dec 11, 2018
a21b921
cleanup
patrickhulce Dec 12, 2018
6ac3092
Merge branch 'master' into structued_data_pkg
patrickhulce Jan 23, 2019
e8714ce
update jsonld to non-native
patrickhulce Jan 24, 2019
12ff45f
add extensions to require statements
patrickhulce Jan 24, 2019
6a981d9
keep ts happy
patrickhulce Jan 24, 2019
21ec416
eslint
patrickhulce Jan 24, 2019
75b7a51
update jsonlint-mod dep
patrickhulce Feb 11, 2019
eca612a
feedback pt 1
patrickhulce Feb 21, 2019
e345a5e
feedback pt 2
patrickhulce Feb 21, 2019
ca2443b
feedback pt3
patrickhulce Feb 21, 2019
5ae8632
feedback pt 4
patrickhulce Feb 21, 2019
a6d2aaa
prettify
patrickhulce Feb 21, 2019
965541d
add script for updating jsonldcontext
patrickhulce Feb 21, 2019
dddba0f
Update sd-validation/jsonld.js
davidlehn Feb 21, 2019
1fc3d13
more feedback
patrickhulce Mar 6, 2019
e4ba275
more types
patrickhulce Mar 6, 2019
d29678b
move tests
patrickhulce Mar 6, 2019
9b7abd3
comment
patrickhulce Mar 6, 2019
1bd1f9e
Merge branch 'master' into structued_data_pkg
patrickhulce Mar 12, 2019
a4200aa
Merge branch 'master' into structued_data_pkg
patrickhulce Mar 18, 2019
fedd5cb
revert viewerg
patrickhulce Mar 21, 2019
064c104
Merge branch 'master' into structued_data_pkg
patrickhulce Mar 26, 2019
4ebdd17
move folder to lib
patrickhulce Mar 28, 2019
df921da
feedback
patrickhulce Mar 28, 2019
ce535cf
forEach -> map
patrickhulce Mar 28, 2019
002a663
more feedback
patrickhulce Apr 2, 2019
70028c9
last feedback!
patrickhulce Apr 8, 2019
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
8 changes: 6 additions & 2 deletions package.json
Expand Up @@ -32,8 +32,9 @@
"unit-cli": "jest --runInBand \"lighthouse-cli/\"",
"unit-cli:ci": "jest --runInBand --coverage --ci \"lighthouse-cli/\"",
"unit-viewer": "mocha --reporter dot \"lighthouse-viewer/test/**/*-test.js\"",
"unit": "yarn unit-core && yarn unit-cli && yarn unit-viewer",
"unit:ci": "yarn unit-core:ci && yarn unit-cli:ci && yarn unit-viewer",
"unit-sd": "mocha --reporter dot \"sd-validation/test/**/*-test.js\"",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use jest instead of another use of mocha (which will need to be removed later) if it's not too hard to switch over

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"unit": "yarn unit-core && yarn unit-cli && yarn unit-viewer && yarn unit-sd",
"unit:ci": "yarn unit-core:ci && yarn unit-cli:ci && yarn unit-viewer && yarn unit-sd",
"core-unit": "yarn unit-core",
"cli-unit": "yarn unit-cli",
"viewer-unit": "yarn unit-viewer",
Expand Down Expand Up @@ -129,6 +130,7 @@
"prettier": "^1.14.3",
"pretty-json-stringify": "^0.0.2",
"puppeteer": "^1.10.0",
"request-promise-native": "^1.0.5",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docs for request-promise-native point to the v4 docs of request-promise, which mentions needing request as a peer dependency? Are we including through something else and we should make it explicit?

"sinon": "^2.3.5",
"typescript": "3.2.2",
"uglify-es": "3.0.15",
Expand All @@ -149,6 +151,8 @@
"intl-messageformat-parser": "^1.4.0",
"jpeg-js": "0.1.2",
"js-library-detector": "^5.1.0",
"jsonld": "^1.5.0",
"jsonlint-mod": "^1.7.2",
"lighthouse-logger": "^1.2.0",
"lodash.isequal": "^4.5.0",
"lookup-closest-locale": "6.0.4",
Expand Down
1 change: 1 addition & 0 deletions sd-validation/assets/jsonldcontext.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions sd-validation/assets/schema-tree.json

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions sd-validation/expand.js
@@ -0,0 +1,56 @@
/**
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
* @license Copyright 2018 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const {URL} = require('url');
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
const jsonld = require('jsonld');
// @ts-ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be ignored anymore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const schemaOrgContext = require('./assets/jsonldcontext.json');
const SCHEMA_ORG_HOST = 'schema.org';

/**
* Custom loader that prevents network calls and allows us to return local version of the
* schema.org document
* @param {string} schemaUrl
* @param {function(null|Error, Object|undefined):void} callback
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
*/
function documentLoader(schemaUrl, callback) {
let urlObj = null;

try {
urlObj = new URL(schemaUrl, 'http://example.com');
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
} catch (e) {
return callback(Error('Error parsing URL: ' + schemaUrl), undefined);
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
}

if (urlObj && urlObj.host === SCHEMA_ORG_HOST && urlObj.pathname === '/') {
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
callback(null, {
document: schemaOrgContext,
});
} else {
// We only process schema.org, for other schemas we return an empty object
callback(null, {
document: {},
});
}
}

/**
* Takes JSON-LD object and normalizes it by following the expansion algorithm
* (https://json-ld.org/spec/latest/json-ld-api/#expansion).
*
* @param {Object} inputObject
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
* @returns {Promise<Object>}
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
*/
module.exports = async function expand(inputObject) {
try {
return await jsonld.expand(inputObject, {documentLoader});
} catch (err) {
// jsonld wraps real errors in a bunch of junk, so see we have an underlying error first
if (err.details && err.details.cause) throw err.details.cause;
throw err;
}
};
31 changes: 31 additions & 0 deletions sd-validation/helpers/walkObject.js
@@ -0,0 +1,31 @@
/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

walk-object.js

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @license Copyright 2018 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

/**
* Recursively (DFS) traverses an object and calls provided function for each field.
*
* @param {Object} obj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @param {function(String, any, Array<String>, Object): void} callback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param {function(String, any, Array<String>, Object): void} callback
* @param {function(string, any, Array<string>, any): void} callback

* @param {Array<String>} path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param {Array<String>} path
* @param {Array<string>} path

*/
module.exports = function walkObject(obj, callback, path = []) {
if (obj === null) {
return;
}

Object.keys(obj).forEach(fieldName => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about
Object.entries(obj).forEach(([fieldName, fieldValue]) => {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const fieldValue = obj[fieldName];
const newPath = Array.from(path);
newPath.push(fieldName);

callback(fieldName, fieldValue, newPath, obj);

if (typeof fieldValue === 'object') {
walkObject(fieldValue, callback, newPath);
}
});
};
83 changes: 83 additions & 0 deletions sd-validation/index.js
@@ -0,0 +1,83 @@
/**
* @license Copyright 2018 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const parseJSON = require('./json.js');
const validateJsonLD = require('./jsonld.js');
const promiseExpand = require('./expand.js');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop the promise in the name (or some combo of json-ld and "expand", but I can't make the capitalization look good :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const validateSchemaOrg = require('./schema.js');

/**
* Validates JSON-LD input. Returns array of error objects.
*
* @param {string} textInput
* @returns {Promise<Array<{path: ?string, validator: string, message: string}>>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more specific validator type? 'json' | 'json-ld' | 'json-ld-expand' | 'schema-org'?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*/
module.exports = async function validate(textInput) {
/** @type {Array<{path: ?string, validator: string, message: string}>} */
const errors = [];
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved

// STEP 1: VALIDATE JSON
const parseOutput = parseJSON(textInput);

if (parseOutput.error) {
errors.push({
validator: 'json',
path: parseOutput.error.line,
message: parseOutput.error.message,
});

return errors;
}

const inputObject = parseOutput.result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could move the JSON.parse call out of json.js and put it here (since it's known that it's valid by this point). Then json.js doesn't have to bother with a result and only be in charge of indicating if there was an error

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// STEP 2: VALIDATE JSONLD
const jsonLdErrors = validateJsonLD(inputObject);

if (jsonLdErrors && jsonLdErrors.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (jsonLdErrors && jsonLdErrors.length) {
if (jsonLdErrors.length) {

(since validateJsonLD always returns an array)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

jsonLdErrors.forEach(error => {
errors.push({
validator: 'json-ld',
path: error.path,
message: error.message.toString(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message: error.message.toString(),
message: error.message,

(error.message is always a string)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

});
});

return errors;
}

// STEP 3: EXPAND
let expandedObj = null;
try {
expandedObj = await promiseExpand(inputObject);
} catch (error) {
errors.push({
validator: 'json-ld-expand',
path: null,
message: error && error.toString(),
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
});

return errors;
}

// STEP 4: VALIDATE SCHEMA
const schemaOrgErrors = validateSchemaOrg(expandedObj);
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved

if (schemaOrgErrors && schemaOrgErrors.length) {
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
schemaOrgErrors.forEach(error => {
errors.push({
validator: 'schema-org',
path: error.path,
message: error.message,
});
});

return errors;
}

return errors;
};
59 changes: 59 additions & 0 deletions sd-validation/json.js
@@ -0,0 +1,59 @@
/**
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
* @license Copyright 2018 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const jsonlint = require('jsonlint-mod');

/**
* @param {string} input
* @returns {{error: {message: string, line: string|null}|null, result: Object|null}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object is any on the js side when it's not parameterized (unlike in ts where it really means Object), so might as well make it any like JSON.parse returns (or unknown if we're feeling brave about out deps :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*/
module.exports = function parseJSON(input) {
let result;
const logError = console.error; // eslint-disable-line no-console

try {
// jsonlint-mod always calls console.error when there's an error
// We don't want this behavior, so we stash console.error while it's executing
console.error = () => undefined; // eslint-disable-line no-console
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should no longer be needed 🙂 circlecell/jsonlint-mod#2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hurray that's awesome! :D

jsonlint.parse(input);
result = JSON.parse(input);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we use the return value from jsonlint.parse? Do we just not care about the extra features it supports, just the validation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we use the return value from jsonlint.parse

oh, I see, this is just a json linter, not a json-ld linter.

result = JSON.parse(input); should likely be below the try/catch block. jsonlint.parse should throw on anything that JSON.parse will, and result will always be undefined in the catch block either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} catch (error) {
let line = error.at;
let message = error.message;
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved

// extract line number from message
if (!line) {
const regexLineResult = error.message.match(/Parse error on line (\d+)/);

if (regexLineResult) {
line = regexLineResult[1];
}
}

// adjust jsonlint error output to our needs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for this regex

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const regexMessageResult = error.message.match(/-+\^\n(.+)$/);

if (regexMessageResult) {
message = regexMessageResult[1];
}

return {
error: {
message,
line,
},
result,
};
} finally {
console.error = logError; // eslint-disable-line no-console
}

return {
error: null,
result,
};
};
71 changes: 71 additions & 0 deletions sd-validation/jsonld.js
@@ -0,0 +1,71 @@
/**
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
* @license Copyright 2018 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const walkObject = require('./helpers/walkObject.js');

// This list comes from the JSON-LD 1.1 spec: https://json-ld.org/spec/latest/json-ld/#syntax-tokens-and-keywords
patrickhulce marked this conversation as resolved.
Show resolved Hide resolved
const KEYWORDS = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename VALID_KEYWORDS

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

'@base',
'@container',
'@context',
'@graph',
'@id',
'@index',
'@language',
'@list',
'@nest',
'@none',
'@prefix',
'@reverse',
'@set',
'@type',
'@value',
'@version',
'@vocab',
];

/**
* @param {string} fieldName
* @returns boolean
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@returns {boolean} (the @returns/@return is a whole other thing we should probably standardize on :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*/
function validKeyword(fieldName) {
return KEYWORDS.includes(fieldName);
}

/**
* @param {string} keyName
* @returns {string | null} error
*/
function validateKey(keyName) {
if (keyName[0] === '@' && !validKeyword(keyName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startsWith('@')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and just inline validKeyword?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done++

return 'Unknown keyword';
}

return null;
}

/**
* @param {Object} json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @returns {Array<{path: string, message: string}>}
*/
module.exports = function validateJsonLD(json) {
/** @type {Array<{path: string, message: string}>} */
const errors = [];

walkObject(json, (name, value, path) => {
const error = validateKey.call(null, name);

if (error) {
errors.push({
path: path.join('/'),
message: error,
});
}
});

return errors;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe since there's just the one keyword check, this file could be something like

const KEYWORDS = [
  // ...
];

/**
 * @param {string} keyName
 * @returns {boolean}
 */
function isInvalidKeyword(keyName) {
  return keyName.startsWith('@') && !VALID_KEYWORDS.includes(keyName);
}

/**
 * @param {any} json
 * @returns {Array<{path: string, message: string}>}
 */
module.exports = function validateJsonLD(json) {
  /** @type {Array<{path: string, message: string}>} */
  const errors = [];

  walkObject(json, (keyName, value, path) => {
    if (isInvalidKeyword(keyName)) {
      errors.push({
        path: path.join('/'),
        message: 'Unknown keyword',
      });
    }
  });

  return errors;
};

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yeah this is what done++ was ;)