New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escape almost everything in ld+json content. #5121
Conversation
// As we want to store rawJson inside a HTML Element, it is better | ||
// to escape all non-trusteded characters inside it. Non-trusted | ||
// characters must include `</!>` characters. | ||
RegExp(r'[^0-9a-zA-Z ,@\:\{\}\[\]\.\-"]'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need that much escaping inside []
in a regular expression... is it really necessary to do \{
? and not just {
?
Also, why not allow semi colon? and *
and #
and $
, just curious, is there a reason not to? if so I'm happy to see them escaped..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started out with the control structures in JSON ({}[]:",
), but then it got hard to verify the result by reading it, so added a bit more like @
, -
and
(space). While we could enable even more, I think we shouldn't even do these later exceptions...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often is this called within a process? Worth hoisting out the regex as something top-level/final to avoid allocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll do that. We may also cache the parsed Nodes too, but this should be done anyway.
@@ -257,7 +257,7 @@ <h3 class="title">More</h3> | |||
</p> | |||
</aside> | |||
</div> | |||
<script type="application/ld+json">{"@context":"http:\u002f\u002fschema.org","@type":"SoftwareSourceCode","name":"flutter_titanium","version":"1.10.0","description":"flutter_titanium - flutter_titanium is awesome","url":"https:\u002f\u002fpub.dev\u002fpackages\u002fflutter_titanium","dateCreated":"%%published-timestamp%%","dateModified":"%%updated-timestamp%%","programmingLanguage":"Dart","image":"https:\u002f\u002fpub.dev\u002fstatic\u002fimg\u002fpub-dev-icon-cover-image.png"}</script> | |||
<script type="application/ld+json">{"@context":"http:\u002f\u002fschema.org","@type":"SoftwareSourceCode","name":"flutter\u005ftitanium","version":"1.10.0","description":"flutter\u005ftitanium - flutter\u005ftitanium is awesome","url":"https:\u002f\u002fpub.dev\u002fpackages\u002fflutter\u005ftitanium","dateCreated":"%%published-timestamp%%","dateModified":"%%updated-timestamp%%","programmingLanguage":"Dart","image":"https:\u002f\u002fpub.dev\u002fstatic\u002fimg\u002fpub-dev-icon-cover-image.png"}</script> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, should we consider a special case to avoid escaping https://
and _
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/
was always escaped because it can be part of a closing tag. https://
could be a special case, but if the search engine do proper parsing, it shouldn't really matter.
Can we test how good search engines are at handling these escape sequences? |
I'll do a staging release and try to test it with online testers. |
Sweet, let's ship it! |
No description provided.