New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip whitespace from data: URLs in stylesheets to prevent AMP validation error #1089

Closed
JonHendershot opened this Issue Apr 20, 2018 · 33 comments

Comments

@JonHendershot
Copy link

JonHendershot commented Apr 20, 2018

Validator reports CSS syntax error in tag 'style amp-custom' - bad url.

This is still throwing errors on Google Search Console. See ticket #604.

Worth noting that we're on v0.7 of the plugin to fix sanitization issues with the carousel, as per your recommendation in ticket #1076

screen shot 2018-04-20 at 1 40 12 pm

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 20, 2018

@JonHendershot I can confirm this is an error v0.6. However, it is fixed when I check out the 0.7 branch due to the fix merged in #1004. Note that this was not part of 0.7-beta1. Are you trying with current build of the 0.7 branch (with instructions in #1076 (comment))? Note that we're planning to package a 0.7-rc1 release on Monday.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 20, 2018

I can see that it was not fixed in 0.7-beta1, but it is fixed in the current state of the 0.7 branch.

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented Apr 20, 2018

Thanks for such a quick reply @westonruter - I do believe we're packaged with the current build via the instructions given in #1076 . We would've built from it earlier this week.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 20, 2018

@JonHendershot please share the URL for the page in question. Also, try running the output through https://validator.ampproject.org/ since it could be that the Google Search Console is reporting based on a cached version.

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented Apr 20, 2018

@westonruter We've got quite a few instances of this error on the site currently. Just checked it again on https://validator.ampproject.org/ and confirmed seeing this issue.

https://www.wellandgood.com/good-looks/millie-bobby-brown-red-carpet-sneaker-look/amp/

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 20, 2018

@JonHendershot I'm seeing:

<meta name="generator" content="AMP Plugin v0.7-beta1" />

Would you please do another build off of the 0.7 branch to make sure it is on the latest? Note that once you do this you should see the generator as:

<meta name="generator" content="AMP Plugin v0.7-beta1-d65578d-20180420T185722Z" />
@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 21, 2018

@JonHendershot it would be great to get confirmation on this ASAP as we're planning to release v0.7-RC1 on Monday.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 23, 2018

@JonHendershot we're going to proceed with the 0.7-RC1 release shortly.

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented Apr 23, 2018

hi @westonruter -- Sorry for the delay. We've updated the plugin and are still seeing the same issues. We've staged the updated plugin in the following environment:
https://amp-slides-well-good.pantheonsite.io/good-looks/millie-bobby-brown-red-carpet-sneaker-look/amp/

Confirmed that we're seeing the following meta tag in our head:
<meta name="generator" content="AMP Plugin v0.7-beta1-d65578d-20180420T213729Z" />

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 23, 2018

@JonHendershot thanks, I was able to replicate the issue this time. The issue is that there seems to be a space in the original data: URL in your content:

background:url(data:image/png; base64,ivborw0kggoaa...

If I copy your original markup and remove this extra space then the result properly validates. Can you try that?

Either way, this should be fixed, but it's not as bad an issue as #604 was, since it completely corrupted the data: URL.

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented Apr 23, 2018

@westonruter gotcha - thanks! I'm not totally clear, though -- it seems that that string is something generated by the plugin. I can't seem to find it in our codebase anywhere so I'm assuming it's being built somewhere. Should I be digging deeper to find a solve, or is this something on your side?

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 23, 2018

@JonHendershot The style rule is coming from an inline style. Specifically, it is coming from this block of HTML:

<blockquote class="instagram-media" style="background: #FFF; border: 0; border-radius: 3px; box-shadow: 0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width: 658px; padding: 0; width: calc(100% - 2px);" data-instgrm-permalink="https://www.instagram.com/p/BePtv5Ug43B/" data-instgrm-version="8">
<div style="padding: 8px;">
<div style="background: #F8F8F8; line-height: 0; margin-top: 40px; padding: 62.5% 0; text-align: center; width: 100%;">
<div style="background: url(data:image/png; base64,ivborw0kggoaaaansuheugaaacwaaaascamaaaapwqozaaaabgdbtueaalgpc/xhbqaaaafzukdcak7ohokaaaamuexurczmzpf399fx1+bm5mzy9amaaadisurbvdjlvzxbesmgces5/p8/t9furvcrmu73jwlzosgsiizurcjo/ad+eqjjb4hv8bft+idpqocx1wjosbfhh2xssxeiyn3uli/6mnree07uiwjev8ueowds88ly97kqytlijkktuybbruayvh5wohixmpi5we58ek028czwyuqdlkpg1bkb4nnm+veanfhqn1k4+gpt6ugqcvu2h2ovuif/gwufyy8owepdyzsa3avcqpvovvzzz2vtnn2wu8qzvjddeto90gsy9mvlqtgysy231mxry6i2ggqjrty0l8fxcxfcbbhwrsyyaaaaaelftksuqmcc); display: block; height: 44px; margin: 0 auto -44px; position: relative; top: -22px; width: 44px;"></div>
</div>
<p style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; line-height: 17px; margin-bottom: 0; margin-top: 8px; overflow: hidden; padding: 8px 0 7px; text-align: center; text-overflow: ellipsis; white-space: nowrap;"><a style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; font-style: normal; font-weight: normal; line-height: 17px; text-decoration: none;" href="https://www.instagram.com/p/BePtv5Ug43B/" target="_blank" rel="noopener">A post shared by Millie Bobby Brown (@milliebobbybrown)</a> on <time style="font-family: Arial,sans-serif; font-size: 14px; line-height: 17px;" datetime="2018-01-22T08:46:26+00:00">Jan 22, 2018 at 12:46am PST</time>
</div>
</blockquote>

Is this coming from an Instagram embed? Please share the underlying post_content for this embed so we can reproduce.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 24, 2018

I can see actually that this is the Instagram embed code they supply. Here's another embed code snippet just copied from a post:

<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-permalink="https://www.instagram.com/p/Bh7jRUkBIzB/" data-instgrm-version="8" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="padding:8px;"> <div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;"> <div style=" background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div></div> <p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/Bh7jRUkBIzB/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">Introducing the new “Psycho Kaler”! Kale, Calabrian Chilies, Cream, Mozzarella, Garlic, and Ricotta. #pizzaweekiseveryweek</a></p> <p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by <a href="https://www.instagram.com/fillmorepdx/" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px;" target="_blank"> Info@filmorepdx.com</a> (@fillmorepdx) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2018-04-23T22:55:39+00:00">Apr 23, 2018 at 3:55pm PDT</time></p></div></blockquote> <script async defer src="//www.instagram.com/embed.js"></script>

Notice here that there is no space in the background image data: URL. It appears that your code was modified to add the space.

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented Apr 24, 2018

@westonruter yah good call -- thanks for digging through that for me!! Looks like that code is being added via the TinyMCE content editor, so it'll be on content management and/or a content filter to protect against it. Thank you again for such quick replies this week!

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 24, 2018

I'll re-open this because it's something we can fix in CSS sanitizer to prevent extra spaces from being added in the first place.

@westonruter westonruter reopened this Apr 24, 2018

@westonruter westonruter added this to the v1.0 milestone Apr 24, 2018

@westonruter westonruter changed the title CSS syntax error in tag 'style amp-custom' - bad url. Strip whitespace from data: URLs in stylesheets to prevent CSS syntax error in tag 'style amp-custom' - bad url. Apr 24, 2018

@westonruter westonruter changed the title Strip whitespace from data: URLs in stylesheets to prevent CSS syntax error in tag 'style amp-custom' - bad url. Strip whitespace from data: URLs in stylesheets to prevent AMP validation error Apr 24, 2018

@westonruter westonruter added the css label Apr 24, 2018

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Apr 24, 2018

To fix the issue, this is where I would work:

https://github.com/Automattic/amp-wp/blob/1e2cd22f1bfe8ab9d833c901e02111120f52fe72/includes/sanitizers/class-amp-style-sanitizer.php#L749-L758

The real_path_urls method should be renamed to normalize_urls, and then instead of just skipping data: URLs, it should remove all whitespace before continuing.

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented Apr 24, 2018

ah, wonderful! Thanks @westonruter -- will keep an eye on this

@amedina amedina self-assigned this Apr 24, 2018

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented May 8, 2018

@westonruter just wanted to follow up on this and see if it has been addressed yet. Thanks!

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented May 9, 2018

@JonHendershot no it has not.

@JonHendershot

This comment has been minimized.

Copy link

JonHendershot commented May 11, 2018

@westonruter ok cool - any estimate on when we can expect to see this patched?

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented May 13, 2018

It will be part of the 1.0 release which would be coming this summer. I suggest implementing an alternative with a the_content filter in the mean time.

@Mte90

This comment has been minimized.

Copy link

Mte90 commented May 18, 2018

I fixed with this code:

add_filter( 'the_content', 'fix_amp_space' );

function fix_amp_space( $content ) {
    if ( is_amp_endpoint() ) {
        return str_replace('data:image/png; base64,', 'data:image/png;base64,', $content);
    }
    return $content;
}

But I don't understand why in 0.8 this fix cannot be implemented because there are a lot of people reporting this problem.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented May 18, 2018

The fix here using the_content is more of a workaround than a proper fix (as the fix here won't apply to CSS in external stylesheets). Only with 1.0 do we have an actual CSS Parser which can be used to extract the data: URLs from stylesheets. In 0.7 the only option is to do some string replacement like this. But it's a pretty brute-force fix.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented May 18, 2018

@amedina Instead of removing the spaces from the data: URL it may be better to wrap the content in quotes. This would also fix the issue of using SVG in data: URLs per: https://wordpress.org/support/topic/background-gradients-invalid/

@mehigh

This comment has been minimized.

Copy link
Collaborator

mehigh commented May 23, 2018

Wrapping the dataUri with quotes is indeed the better approach here.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented May 23, 2018

@mehigh why? URLs don't allow whitespace so just removing whitespace seems to have the same effect, and it also has the added benefit of saving at least 3 bytes.

@mehigh

This comment has been minimized.

Copy link
Collaborator

mehigh commented May 23, 2018

I googled. What I knew was not up to date.. this was actually something I picked up from the CSS validator years back, but it seems this was reported as a bug which got fixed .. 3 years ago, since quotes are optional for data uris as well. I've always used quotes myself though.
w3c/css-validator#42 (comment)

TL;DR: dropping the quotes for data embeds is OK.

@davisshaver

This comment has been minimized.

Copy link
Contributor

davisshaver commented May 25, 2018

Chiming in after getting AMP regressions.

Since removing the whitespace works but is a little brute force, could we wrap in the single quote to support legacy embeds (like Instagram)? I wouldn't call it better but at least it would fix the errors until 1.0 version is ready.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented May 25, 2018

@davisshaver Regressions as in this wasn't a problem in 0.6?

If the issue is specifically Instagram embeds, then the proper solution for that would be in #1103. It seems WordPress has a tendency to add spaces inside the data: URL and so this might be actually an issue with TinyMCE. In the general case, the fix is to target the data: URLs as in #1164 to remove the spaces. This targeting of URLs is most reliable when we have an actual CSS parser, which we do as of 1.0-alpha. In the mean time, I suggest in the mean time using a the_content filter on your own site to strip out the invalid spaces as a temporary workaround, like in #1089 (comment)

@davisshaver

This comment has been minimized.

Copy link
Contributor

davisshaver commented May 28, 2018

Nice! Thanks @westonruter. By regression I just meant that formerly-validating AMP pages were no longer validating. It looks like @amedina's patch will fix this though. 🚀

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented May 28, 2018

formerly-validating AMP pages were no longer validating

@davisshaver right, but are they no longer validating because of a change in the plugin or a change on your site?

@davisshaver

This comment has been minimized.

Copy link
Contributor

davisshaver commented May 28, 2018

@westonruter Sorry I really don't know... I think the change in plugin caused it, as I don't believe anything related to inline images changed on the site; the content affected was all archival (Feb 2018 or earlier); and I don't recall seeing this error at any point previously. BUT I am unable to verify that the pages were at any time indexed in AMP, as analytics data shows no hits to the pages, and I didn't keep clean enough records of plugin versions to put this into context of v0.6/v0.7. Sorry I can't be more help.

@kienstra

This comment has been minimized.

Copy link
Collaborator

kienstra commented Jun 5, 2018

Steps To Test

Hi @csossi,
Could you please test this?

  1. Create a new post on the Native AMP staging site
  2. Copy this Instagram URL into the post editor: https://www.instagram.com/p/BjnK5eGlil4
  3. This should render as an Instagram post
  4. Visit the front-end of the post
  5. Expected: There are no AMP validation errors, when using the Chrome AMP extension, or the AMP Validator

@kienstra kienstra moved this from Ready for QA to Ready for Merging in v1.0 Jun 6, 2018

@kienstra kienstra moved this from Ready for Merging to In Production in v1.0 Dec 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment