Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape almost everything in ld+json content. #5121

Merged
merged 1 commit into from Sep 21, 2021

Conversation

isoos
Copy link
Collaborator

@isoos isoos commented Sep 21, 2021

No description provided.

// As we want to store rawJson inside a HTML Element, it is better
// to escape all non-trusteded characters inside it. Non-trusted
// characters must include `</!>` characters.
RegExp(r'[^0-9a-zA-Z ,@\:\{\}\[\]\.\-"]'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need that much escaping inside [] in a regular expression... is it really necessary to do \{ ? and not just { ?

Also, why not allow semi colon? and * and # and $, just curious, is there a reason not to? if so I'm happy to see them escaped..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started out with the control structures in JSON ({}[]:",), but then it got hard to verify the result by reading it, so added a bit more like @, - and (space). While we could enable even more, I think we shouldn't even do these later exceptions...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often is this called within a process? Worth hoisting out the regex as something top-level/final to avoid allocations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll do that. We may also cache the parsed Nodes too, but this should be done anyway.

@@ -257,7 +257,7 @@ <h3 class="title">More</h3>
</p>
</aside>
</div>
<script type="application/ld+json">{"@context":"http:\u002f\u002fschema.org","@type":"SoftwareSourceCode","name":"flutter_titanium","version":"1.10.0","description":"flutter_titanium - flutter_titanium is awesome","url":"https:\u002f\u002fpub.dev\u002fpackages\u002fflutter_titanium","dateCreated":"%%published-timestamp%%","dateModified":"%%updated-timestamp%%","programmingLanguage":"Dart","image":"https:\u002f\u002fpub.dev\u002fstatic\u002fimg\u002fpub-dev-icon-cover-image.png"}</script>
<script type="application/ld+json">{"@context":"http:\u002f\u002fschema.org","@type":"SoftwareSourceCode","name":"flutter\u005ftitanium","version":"1.10.0","description":"flutter\u005ftitanium - flutter\u005ftitanium is awesome","url":"https:\u002f\u002fpub.dev\u002fpackages\u002fflutter\u005ftitanium","dateCreated":"%%published-timestamp%%","dateModified":"%%updated-timestamp%%","programmingLanguage":"Dart","image":"https:\u002f\u002fpub.dev\u002fstatic\u002fimg\u002fpub-dev-icon-cover-image.png"}</script>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, should we consider a special case to avoid escaping https:// and _ ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ was always escaped because it can be part of a closing tag. https:// could be a special case, but if the search engine do proper parsing, it shouldn't really matter.

@jonasfj
Copy link
Member

jonasfj commented Sep 21, 2021

Can we test how good search engines are at handling these escape sequences?

@isoos
Copy link
Collaborator Author

isoos commented Sep 21, 2021

Can we test how good search engines are at handling these escape sequences?

I'll do a staging release and try to test it with online testers.

@isoos
Copy link
Collaborator Author

isoos commented Sep 21, 2021

schema.org validates the content successfully:
image

@jonasfj
Copy link
Member

jonasfj commented Sep 21, 2021

Sweet, let's ship it!

@isoos isoos merged commit 22363c2 into dart-lang:master Sep 21, 2021
@isoos isoos deleted the escape-ldjson branch September 21, 2021 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants