Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(jsonld): Structured data validation updates #8137

Merged
merged 18 commits into from
Apr 16, 2019

Conversation

mattzeunert
Copy link
Contributor

Summary

Updates to the SD validation logic:

  • add line numbers to error messages
  • store invalid entity type for "unexpected property" errors

We use the entity type to include them in the error messages and link to their schema.org URL so the user can see what properties are valid:

Screenshot 2019-04-10 at 13 32 28

From the audit's perspective the validator would ideally generate the full error message including the markdown link, but that logic is in the audit to make the sd validator more re-usable.

Related Issues/PRs

Part of #4359

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hurray for having some real usage-driven changes here now! :D 🎉

@@ -41,7 +41,7 @@ module.exports = function validateJsonLD(json) {
if (name.startsWith('@') && !VALID_KEYWORDS.has(name)) {
errors.push({
path: path.join('/'),
message: 'Unknown keyword',
message: 'Unknown keyword ' + name,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put this in quotes or after a colon or something? this also makes we wonder what our i18n story is here 😬

@@ -17,7 +17,7 @@ describe('JSON validation', () => {
`);

assert.equal(errors.length, 1);
assert.equal(errors[0].path, 2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably use strict equal here then, these would have been failing before I guess

lighthouse-core/lib/sd-validation/sd-validation.js Outdated Show resolved Hide resolved
* @returns null | number - line number of the path value in the prettified JSON
*/
function getLineNumberFromJsonLDPath(obj, path) {
// To avoid having an extra dependency on a JSON parser we set a unique key in the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has that shipped sailed with jsonlint-mod is there any way to leverage that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also deja vu, I feel like I've reviewed this before so sorry if you've answered this in the previous PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has that shipped sailed with jsonlint-mod is there any way to leverage that?

I don't see a direct way to leverage jsonlint-mod here. It just uses Douglas Crockford's JSON parser.

I guess we could parse it ourselves or find a small parser that we can include?

also deja vu, I feel like I've reviewed this before so sorry if you've answered this in the previous PR

Yup, I sent you a pre-PR to get some initial feedback.

lighthouse-core/lib/sd-validation/sd-validation.js Outdated Show resolved Hide resolved
// but key provided by validator is "author"
const keyParts = key.split('/');
const relativeKey = keyParts[keyParts.length - 1];
if (relativeKey === pathPart && currentObj[key] !== undefined) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like if we have currentObj[key] as undefined we could just return before this whole loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I don't see why we'd need it at all actually...

} else {
currentObj = currentObj[key];
}
keyFound = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like keyFound is unnecessary, we could just return here and throw if we ever make it through the loop without returning

@@ -169,7 +171,9 @@ describe('schema.org validation', () => {
}`);

assert.equal(errors.length, 1);
assert.equal(errors[0].invalidTypes[0], 'http://schema.org/Article');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this name is a bit confusing to me for how it's used. the type isn't invalid, right? it's just the type of the entity that has some other invalid stuff going on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entity is not a valid type instance because it has a property that doesn't exist on type.

I originally just called that array types, but then it wasn't clear that having that array means invalid. (And we use that array to generate the Invalid Event: unexpected property asdf message.)

Maybe we can call the array invalidForTypes?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh gotcha, I actually like the direction you were headed with .types then. I think it's clear something is invalid by virtue of the fact that we are giving an error. Maybe typeOfInvalidEntity or something similar if we're going the more explicit route?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm at validTypes right now, since we only include the types in the validation error if they are valid. (And we don't want to show invalid types in the UI, because in that case they'll be part of the error message.)

errors.push({
invalidTypes: error.invalidTypes,
message: error.message,
// get rid of the first chunk (/@type) as it's the same for all errors
path:
'/' +
path
.slice(0, -1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait a second, the path is backwards? I think we're missing some tests that have paths of length >2 then 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into this more tomorrow, but what makes you think the path is backwards? Here's an example of the path:

[ 'http://schema.org/author', '0', 'http://schema.org/colleague', '0', '@type' ]

For this:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "author": {
    "@type": "Person",
    "displayNaame": "Sally",
    "colleague": {
      "@type": "Person",
      "displayNaame": "Sally"
    }
  }
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the comment confused me. It says it's getting rid of the "first chunk" but it's removing the last chunk. That made me think it is backwards and all paths started with /@type, but I see now :)

maybe update the comment? a test for longer paths still couldn't hurt though sometime :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, yeah good catch, that's a very confusing comment! Added a test case for deeper nesting too.

@connorjclark connorjclark changed the title core(jsonld): Structured data validation udpates core(jsonld): Structured data validation updates Apr 10, 2019
@mattzeunert
Copy link
Contributor Author

I don't think this can happen if the input is a JSON-LD object and the path is from the expanded version of that object. Should we just take it out?

Screenshot 2019-04-11 at 11 36 17

@mattzeunert mattzeunert reopened this Apr 11, 2019
Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is looking good so far to me, just some surface level comments

| "json"
| "json-ld"
| "json-ld-expand"
| "schema-org";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, where is this leading | coming from :P Not even prettier does this, I don't think

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my prettier definitely does this in other projects, not sure about here :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my prettier definitely does this in other projects, not sure about here :)

maybe it's a typescript prettier thing? I couldn't get it to repro in their playground

@@ -49,7 +49,7 @@ function findType(type) {
*
* @param {string|Array<string>} typeOrTypes
* @param {Array<string>} keys
* @returns {Array<string>}
* @returns {Array<{message: string, key?: string, typesOfInvalidEntity?: Array<string>, path?: string}>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pull into a typedef so these properties can have docs?

* @param {string} path
* @returns {null | number} - line number of the path value in the prettified JSON
*/
function getLineNumberFromJsonLDPath(obj, path) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two are utility-y enough, maybe they're worth exposing and having explicit tests for them?

They're getting test coverage right now but I doubt it's much, and it would have the benefit of documenting them a bit more. e.g. these aren't called with literals anywhere, so it's not possible to know what the path string generally looks like without also looking at the code that generates paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, adding some tests is a good idea 👍

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mattzeunert
Copy link
Contributor Author

@patrickhulce LGTY?

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, yes! LGTM :)

@patrickhulce patrickhulce merged commit c9668d7 into master Apr 16, 2019
@patrickhulce patrickhulce deleted the sd-validation-update branch April 16, 2019 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants