New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloudflare (and maybe other) CDN breaking AMP pages #2380

Closed
weeblr opened this Issue Mar 2, 2016 · 84 comments

Comments

Projects
None yet
@weeblr

weeblr commented Mar 2, 2016

Hi,
After this discussion, https://productforums.google.com/forum/#!msg/webmasters/_EFTDi1koJo/EAoRom0fIQAJ, Tomo T suggested we open an issue here.

It's a known issue (#1662) that Cloudflare (and maybe other) are breaking AMP pages by inserting on the fly some javascript that do not comply with AMP specs.

Tomo T mentioned you guys are in touch with cloudflare, so maybe resolution is on its way, but in our wbAMP plugin for Joomla, we did the following:
1 - added no-cache headers
2 - Added a X-amphtml: wbAMP header, so that users can setup rules to bypass those pages in their CDN control panel for instance.

Is there any merit to this?

Rgds

@Gregable

This comment has been minimized.

Show comment
Hide comment
@Gregable

Gregable Mar 2, 2016

Member

I'm not an expert on cloudflare, but I went in and played around with the settings on a free account to see what cloudflare is doing. I think the feature which breaks AMP validation is called Rocket Loader. It seems, among other changes, to insert into every HTML document head the following snippet:

<script type="text/javascript">
//<![CDATA[
[Cloudflare javascript loader code here]
//]]>
</script>

This loads javascript from cloudflare's site, which isn't part of the AMP runtime and this is invalid for AMP. From my understanding of Rocket Loader's description, valid AMP pages won't be improved by Rocket Loader as they already load javascript asynchronously. As a workaround you can use today, you can disable Rocket Loader for amp pages.

The Rocket Loader feature appears to be under the "Speed" heading in cloudflare's web console, but it would probably be better to configure it just for AMP pages by setting up a specific "Page Rule" that disables Rocket Loader only for the section of your site that is AMP.

Member

Gregable commented Mar 2, 2016

I'm not an expert on cloudflare, but I went in and played around with the settings on a free account to see what cloudflare is doing. I think the feature which breaks AMP validation is called Rocket Loader. It seems, among other changes, to insert into every HTML document head the following snippet:

<script type="text/javascript">
//<![CDATA[
[Cloudflare javascript loader code here]
//]]>
</script>

This loads javascript from cloudflare's site, which isn't part of the AMP runtime and this is invalid for AMP. From my understanding of Rocket Loader's description, valid AMP pages won't be improved by Rocket Loader as they already load javascript asynchronously. As a workaround you can use today, you can disable Rocket Loader for amp pages.

The Rocket Loader feature appears to be under the "Speed" heading in cloudflare's web console, but it would probably be better to configure it just for AMP pages by setting up a specific "Page Rule" that disables Rocket Loader only for the section of your site that is AMP.

@weeblr-dev

This comment has been minimized.

Show comment
Hide comment
@weeblr-dev

weeblr-dev Mar 2, 2016

Thanks for your reply. We had seen the JS being inserted, but it's good to know this particular feature is involved. I will pass the information.
Nevertheless, the question stands whether it could be useful to add a specific header to amp page so that his process could be integrated by CDN or proxies?

Rgds

Thanks for your reply. We had seen the JS being inserted, but it's good to know this particular feature is involved. I will pass the information.
Nevertheless, the question stands whether it could be useful to add a specific header to amp page so that his process could be integrated by CDN or proxies?

Rgds

@jmadler

This comment has been minimized.

Show comment
Hide comment
@jmadler

jmadler Mar 2, 2016

Contributor

I've let cloudflare know as well. Generally CDNs could detect the doc as AMP HTML and avoid the JS insert (and, perhaps, do AMP specific things like image creation and srcset population).

Contributor

jmadler commented Mar 2, 2016

I've let cloudflare know as well. Generally CDNs could detect the doc as AMP HTML and avoid the JS insert (and, perhaps, do AMP specific things like image creation and srcset population).

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Mar 3, 2016

Hi there,

@Gregable Our user disabled Rocker Loader on his site, and that solved the validation issue. However, it appears the Rocket Loader feature is something that can only be enabled/disabled for an entire subdomain, not using Page Rules, ie you can't exclude only some URLs.

So only solutions are a/ drop it entirely, or b/ move AMP pages to a subdomain, which is a bit involevd.

I would assume Cloudflare would want to be able to identify AMP pages and disable their plugin then. As @jmadler said, they can just look at the amp attribute in the html tag?

I still think a header or something that doesn't require looking at the actual page content to recognize an AMP page would be helpful in the real world.

Rgds

weeblr commented Mar 3, 2016

Hi there,

@Gregable Our user disabled Rocker Loader on his site, and that solved the validation issue. However, it appears the Rocket Loader feature is something that can only be enabled/disabled for an entire subdomain, not using Page Rules, ie you can't exclude only some URLs.

So only solutions are a/ drop it entirely, or b/ move AMP pages to a subdomain, which is a bit involevd.

I would assume Cloudflare would want to be able to identify AMP pages and disable their plugin then. As @jmadler said, they can just look at the amp attribute in the html tag?

I still think a header or something that doesn't require looking at the actual page content to recognize an AMP page would be helpful in the real world.

Rgds

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 10, 2016

Hi, I work for CloudFlare but am not working in this specific area, it's been raised internally and this issue is being tracked and we'll look to get a fix out as soon as possible. If I have news on this, I'll update this bug. Feel free to ping me for updates on this.

buro9 commented Mar 10, 2016

Hi, I work for CloudFlare but am not working in this specific area, it's been raised internally and this issue is being tracked and we'll look to get a fix out as soon as possible. If I have news on this, I'll update this bug. Feel free to ping me for updates on this.

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 10, 2016

For now, a CloudFlare customer will want to disable:

  • Speed > Mirage
  • Speed > Rocket Loader
  • Scrape Shield > Email Obfuscation

Those can be disabled using page rules, such that you can leave the features enabled for the rest of the content served.

We'd recommend leaving the following enabled for AMP:

  • Speed > Auto Minify
  • Speed > Polish

buro9 commented Mar 10, 2016

For now, a CloudFlare customer will want to disable:

  • Speed > Mirage
  • Speed > Rocket Loader
  • Scrape Shield > Email Obfuscation

Those can be disabled using page rules, such that you can leave the features enabled for the rest of the content served.

We'd recommend leaving the following enabled for AMP:

  • Speed > Auto Minify
  • Speed > Polish
@eperezf

This comment has been minimized.

Show comment
Hide comment
@eperezf

eperezf Mar 14, 2016

@buro9 I did everything via a page rule. The problem is still there. Using Wordpress and CloudFlare. "The attribute 'type' in tag 'script type=application/ld+json' is set to the invalid value 'text/javascript'." is the only error left. Is there any waiting time for the settings to apply?

eperezf commented Mar 14, 2016

@buro9 I did everything via a page rule. The problem is still there. Using Wordpress and CloudFlare. "The attribute 'type' in tag 'script type=application/ld+json' is set to the invalid value 'text/javascript'." is the only error left. Is there any waiting time for the settings to apply?

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 15, 2016

@eperezf I've raised that as a bug ticket internally. I am curious, does disabling HTML minification resolve that?

Settings are usually applied globally within seconds.

buro9 commented Mar 15, 2016

@eperezf I've raised that as a bug ticket internally. I am curious, does disabling HTML minification resolve that?

Settings are usually applied globally within seconds.

@dknecht

This comment has been minimized.

Show comment
Hide comment
@dknecht

dknecht Mar 15, 2016

Contributor

@jmadler I work for CloudFlare and would be interested to chat about ways to collaborate and potentially add AMP specific logic to our edge.

Contributor

dknecht commented Mar 15, 2016

@jmadler I work for CloudFlare and would be interested to chat about ways to collaborate and potentially add AMP specific logic to our edge.

@eperezf

This comment has been minimized.

Show comment
Hide comment
@eperezf

eperezf Mar 15, 2016

@buro9 I think HTML Minification is not the issue here as the problem is the script type "text/javascript" inserted in the section. Just like Google Analytics, it should be formatted as "ld+/json" maybe.

I just checked again the validator and the error changed to "The tag 'script' is disallowed except in specific forms". Maybe if the script can be loaded form an external file instead of being inserted on the head section it may work.

eperezf commented Mar 15, 2016

@buro9 I think HTML Minification is not the issue here as the problem is the script type "text/javascript" inserted in the section. Just like Google Analytics, it should be formatted as "ld+/json" maybe.

I just checked again the validator and the error changed to "The tag 'script' is disallowed except in specific forms". Maybe if the script can be loaded form an external file instead of being inserted on the head section it may work.

@dvoytenko

This comment has been minimized.

Show comment
Hide comment
@dvoytenko

dvoytenko Mar 15, 2016

Collaborator

No, external file will not be allowed to be loaded either.

Collaborator

dvoytenko commented Mar 15, 2016

No, external file will not be allowed to be loaded either.

@eperezf

This comment has been minimized.

Show comment
Hide comment
@eperezf

eperezf Mar 15, 2016

It can using <script async src="">. At least it does on my website. The problem is inline javascript from Cloudflare using <script type="text/javascript">.

eperezf commented Mar 15, 2016

It can using <script async src="">. At least it does on my website. The problem is inline javascript from Cloudflare using <script type="text/javascript">.

@dvoytenko

This comment has been minimized.

Show comment
Hide comment
@dvoytenko

dvoytenko Mar 16, 2016

Collaborator

On the main document, <script async src> is only allowed to the AMP runtime or extensions from cdn.ampproject.org.

Collaborator

dvoytenko commented Mar 16, 2016

On the main document, <script async src> is only allowed to the AMP runtime or extensions from cdn.ampproject.org.

@dknecht

This comment has been minimized.

Show comment
Hide comment
@dknecht

dknecht Mar 16, 2016

Contributor

Can CloudFlare assume that an AMP Crawler will always have '(compatible; Google-AMPHTML)' in the user agent or at least AMPHTML?

Contributor

dknecht commented Mar 16, 2016

Can CloudFlare assume that an AMP Crawler will always have '(compatible; Google-AMPHTML)' in the user agent or at least AMPHTML?

@jmadler

This comment has been minimized.

Show comment
Hide comment
@jmadler

jmadler Mar 16, 2016

Contributor

No, you'll need to check for the presence of "AMP" or "⚡" in the tag, case insensitive.

Contributor

jmadler commented Mar 16, 2016

No, you'll need to check for the presence of "AMP" or "⚡" in the tag, case insensitive.

@jmarantz

This comment has been minimized.

Show comment
Hide comment
@jmarantz

jmarantz Mar 16, 2016

FWIW, mod_pagespeed has a parallel set of issues. We are working on solving those now, and you can follow our progress here: apache/incubator-pagespeed-mod#1263

The fact that we have to sniff content to tell something is an AMP document complicates the solution somewhat, particularly when streaming HTML. But it is all do-able (and in fact has already been done, just not all code-reviewed yet).

FWIW, mod_pagespeed has a parallel set of issues. We are working on solving those now, and you can follow our progress here: apache/incubator-pagespeed-mod#1263

The fact that we have to sniff content to tell something is an AMP document complicates the solution somewhat, particularly when streaming HTML. But it is all do-able (and in fact has already been done, just not all code-reviewed yet).

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 16, 2016

That is the root of the issue we have too. For the most part we do not parse or otherwise look at the content of HTML that passes through CloudFlare. Some features do in fact require a parser, but many do not and up to now could safely be added to a HTML document without parsing or otherwise inspecting the HTML document content.

For performance you may imagine that we only parse that which we absolutely must parse, which is a subset of all text/html documents that pass through CloudFlare.

It would be great if there were a header that declared an AMP document, the content-type would be best for this and currently most AMP pages return text/html which does not identify them as AMP. Something like text/html+amp would be great but probably has it's own issues (many legacy things that only expect text/html and nothing else).

We will probably also move to sniffing the html tag, but until then the advice for AMP users will be to disable features that are not AMP aware.

buro9 commented Mar 16, 2016

That is the root of the issue we have too. For the most part we do not parse or otherwise look at the content of HTML that passes through CloudFlare. Some features do in fact require a parser, but many do not and up to now could safely be added to a HTML document without parsing or otherwise inspecting the HTML document content.

For performance you may imagine that we only parse that which we absolutely must parse, which is a subset of all text/html documents that pass through CloudFlare.

It would be great if there were a header that declared an AMP document, the content-type would be best for this and currently most AMP pages return text/html which does not identify them as AMP. Something like text/html+amp would be great but probably has it's own issues (many legacy things that only expect text/html and nothing else).

We will probably also move to sniffing the html tag, but until then the advice for AMP users will be to disable features that are not AMP aware.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Mar 16, 2016

@buro9 This is what I suggested in the OP, but indeed I didn't think a separate content type value would be practical. Hence might be a good idea to have a separate header?
Right now we add:

X-amphtml: wbAMP

as wbAMP is our extension to produce AMP-compliant page from Joomla pages, but obviously something more generic would be a better fit.

I also believe that having to actually parse (though in a limited way) the actual content to decide whether to leave it alone or process it is a major overhead, and will prevent adoption.

weeblr commented Mar 16, 2016

@buro9 This is what I suggested in the OP, but indeed I didn't think a separate content type value would be practical. Hence might be a good idea to have a separate header?
Right now we add:

X-amphtml: wbAMP

as wbAMP is our extension to produce AMP-compliant page from Joomla pages, but obviously something more generic would be a better fit.

I also believe that having to actually parse (though in a limited way) the actual content to decide whether to leave it alone or process it is a major overhead, and will prevent adoption.

@jmarantz

This comment has been minimized.

Show comment
Hide comment
@jmarantz

jmarantz Mar 16, 2016

I would think that the features that CloudFlare offers that might break AMP are a proper subset of the ones that have to parse HTML anyway, but I have to admit I don't understand fully how rocket-loader works.

In any case I agree it would've been a lot more convenient for all involved if amp documents were required by the spec to self-identify with an http response header.

I would think that the features that CloudFlare offers that might break AMP are a proper subset of the ones that have to parse HTML anyway, but I have to admit I don't understand fully how rocket-loader works.

In any case I agree it would've been a lot more convenient for all involved if amp documents were required by the spec to self-identify with an http response header.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Mar 16, 2016

I would think that the features that CloudFlare offers that might break AMP are a proper subset of the ones that have to parse HTML anyway,
Pretty inefficient to have to do the parsing to discover that you won't be able to the use the parsed content because this is an AMP page. Better look at a header, to know that a specific set of features won't apply, and you don't need to parse (or apply whatever processing you want to apply)

weeblr commented Mar 16, 2016

I would think that the features that CloudFlare offers that might break AMP are a proper subset of the ones that have to parse HTML anyway,
Pretty inefficient to have to do the parsing to discover that you won't be able to the use the parsed content because this is an AMP page. Better look at a header, to know that a specific set of features won't apply, and you don't need to parse (or apply whatever processing you want to apply)

@dvoytenko

This comment has been minimized.

Show comment
Hide comment
@dvoytenko

dvoytenko Mar 16, 2016

Collaborator

I think "parse" is a bit of a strong word here. It's just a very top header of the page that needs to be inspected. I don't think we can require all publishers to include a header for all amp pages - that would seem to be error-prone and duplicative.

Collaborator

dvoytenko commented Mar 16, 2016

I think "parse" is a bit of a strong word here. It's just a very top header of the page that needs to be inspected. I don't think we can require all publishers to include a header for all amp pages - that would seem to be error-prone and duplicative.

@Gregable

This comment has been minimized.

Show comment
Hide comment
@Gregable

Gregable Mar 16, 2016

Member

Would it help if we added validation checks that the tag is within the first maybe 100 bytes of the document? Other than comments, it's already required to be the 2nd tag on the page, but comments and whitespace make it possible to be an arbitrary length from the top.

Member

Gregable commented Mar 16, 2016

Would it help if we added validation checks that the tag is within the first maybe 100 bytes of the document? Other than comments, it's already required to be the 2nd tag on the page, but comments and whitespace make it possible to be an arbitrary length from the top.

@jridgewell

This comment has been minimized.

Show comment
Hide comment
@jridgewell

jridgewell Mar 16, 2016

Member

Why not just ban comments until after the opening html tag? Then it's guaranteed to be before the second > character.

Member

jridgewell commented Mar 16, 2016

Why not just ban comments until after the opening html tag? Then it's guaranteed to be before the second > character.

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 16, 2016

Within the first 100 bytes is definitely going to be of great help. That's
enough to allow us to decide what to enable/disable, whilst allowing us to
send headers as early as possible.

This at least allows this to be a sniffer and not a parser and the
constraint allows it to be tightly defined which is good for security
(mitigates resource exhaustion during a DDoS).

Content-type is still preferable, but a 100 byte limit, whereby the AMP
string or lightning rune must fully complete within the first 100 bytes
would be good.

On Wed, 16 Mar 2016, 21:42 Justin Ridgewell, notifications@github.com
wrote:

Why not just ban comments until after the opening html tag? Then it's
guaranteed to be before the second > character.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#2380 (comment)

buro9 commented Mar 16, 2016

Within the first 100 bytes is definitely going to be of great help. That's
enough to allow us to decide what to enable/disable, whilst allowing us to
send headers as early as possible.

This at least allows this to be a sniffer and not a parser and the
constraint allows it to be tightly defined which is good for security
(mitigates resource exhaustion during a DDoS).

Content-type is still preferable, but a 100 byte limit, whereby the AMP
string or lightning rune must fully complete within the first 100 bytes
would be good.

On Wed, 16 Mar 2016, 21:42 Justin Ridgewell, notifications@github.com
wrote:

Why not just ban comments until after the opening html tag? Then it's
guaranteed to be before the second > character.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#2380 (comment)

@jmarantz

This comment has been minimized.

Show comment
Hide comment
@jmarantz

jmarantz Mar 16, 2016

The first 100 bytes might be tight. E.g. from
https://en.wikipedia.org/wiki/Document_type_declaration

lang="ar" dir="ltr" xmlns="http://www.w3.org/1999/xhtml">

On Wed, Mar 16, 2016 at 5:54 PM, David Kitchen notifications@github.com
wrote:

Within the first 100 bytes is definitely going to be of great help. That's
enough to allow us to decide what to enable/disable, whilst allowing us to
send headers as early as possible.

This at least allows this to be a sniffer and not a parser and the
constraint allows it to be tightly defined which is good for security
(mitigates resource exhaustion during a DDoS).

Content-type is still preferable, but a 100 byte limit, whereby the AMP
string or lightning rune must fully complete within the first 100 bytes
would be good.

On Wed, 16 Mar 2016, 21:42 Justin Ridgewell, notifications@github.com
wrote:

Why not just ban comments until after the opening html tag? Then it's
guaranteed to be before the second > character.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
<
https://github.com/ampproject/amphtml/issues/2380#issuecomment-197560035>


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#2380 (comment)

The first 100 bytes might be tight. E.g. from
https://en.wikipedia.org/wiki/Document_type_declaration

lang="ar" dir="ltr" xmlns="http://www.w3.org/1999/xhtml">

On Wed, Mar 16, 2016 at 5:54 PM, David Kitchen notifications@github.com
wrote:

Within the first 100 bytes is definitely going to be of great help. That's
enough to allow us to decide what to enable/disable, whilst allowing us to
send headers as early as possible.

This at least allows this to be a sniffer and not a parser and the
constraint allows it to be tightly defined which is good for security
(mitigates resource exhaustion during a DDoS).

Content-type is still preferable, but a 100 byte limit, whereby the AMP
string or lightning rune must fully complete within the first 100 bytes
would be good.

On Wed, 16 Mar 2016, 21:42 Justin Ridgewell, notifications@github.com
wrote:

Why not just ban comments until after the opening html tag? Then it's
guaranteed to be before the second > character.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
<
https://github.com/ampproject/amphtml/issues/2380#issuecomment-197560035>


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#2380 (comment)

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 16, 2016

I'm new to AMP, but I was under the impression that the DOCTYPE was dictated as HTML5: https://www.ampproject.org/docs/reference/spec.html

If not, and any valid DOCTYPE is acceptable then something more like 256 bytes would be better, though we've just doubled the proposed sniffer requirement and I'm thinking the content-type is still preferable.

buro9 commented Mar 16, 2016

I'm new to AMP, but I was under the impression that the DOCTYPE was dictated as HTML5: https://www.ampproject.org/docs/reference/spec.html

If not, and any valid DOCTYPE is acceptable then something more like 256 bytes would be better, though we've just doubled the proposed sniffer requirement and I'm thinking the content-type is still preferable.

@jridgewell

This comment has been minimized.

Show comment
Hide comment
@jridgewell

jridgewell Mar 16, 2016

Member

The first 100 bytes might be tight.

Not an issue, we mandate <!doctype html> (HTML5) doctype.

Member

jridgewell commented Mar 16, 2016

The first 100 bytes might be tight.

Not an issue, we mandate <!doctype html> (HTML5) doctype.

@Gregable

This comment has been minimized.

Show comment
Hide comment
@Gregable

Gregable Mar 16, 2016

Member

Content-type would be nice, and could be recommended, but plenty of content creators don't have the ability or the know-how to modify headers, therefore I think we will want to support tags.

We do mandate <!doctype html> for a valid doc currently. Nobody has proposed supporting DTDs yet, though if we want them we should resolve that asap.

The value of 100 was a strawman. I like 256 even more. I could try to get an estimate of what we're seeing in the wild, but given that DTDs are not allowed now, it might not be a good metric to look at.

Not allowing comments would be possible, but is tricky for some validation implementations if they run on top of an HTML parser which doesn't have callbacks for comments. We'd also definitely need to look at this in the wild, there may be some CMS systems that need a comment header for reasons. tl;dr: needs more research.

How about 256? And to be specific, I mean the trailing > in the <html amp> tag must be the 256'th byte or lower.

Member

Gregable commented Mar 16, 2016

Content-type would be nice, and could be recommended, but plenty of content creators don't have the ability or the know-how to modify headers, therefore I think we will want to support tags.

We do mandate <!doctype html> for a valid doc currently. Nobody has proposed supporting DTDs yet, though if we want them we should resolve that asap.

The value of 100 was a strawman. I like 256 even more. I could try to get an estimate of what we're seeing in the wild, but given that DTDs are not allowed now, it might not be a good metric to look at.

Not allowing comments would be possible, but is tricky for some validation implementations if they run on top of an HTML parser which doesn't have callbacks for comments. We'd also definitely need to look at this in the wild, there may be some CMS systems that need a comment header for reasons. tl;dr: needs more research.

How about 256? And to be specific, I mean the trailing > in the <html amp> tag must be the 256'th byte or lower.

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 16, 2016

256 bytes is good, I'll run the proposal internally tomorrow. We'd not stop asking for the content-type (especially as there is incompatibility with a user agents expectation of what text/html means and what it's about to receive), but lacking that... this constraint is a good one.

I mean the trailing > in the tag must be the 256'th byte or lower.

Agreed.

We would expect to implement a sniffer that would exit once it encounters either the closing > on the HTML tag, or 256 bytes. Whichever is sooner.

buro9 commented Mar 16, 2016

256 bytes is good, I'll run the proposal internally tomorrow. We'd not stop asking for the content-type (especially as there is incompatibility with a user agents expectation of what text/html means and what it's about to receive), but lacking that... this constraint is a good one.

I mean the trailing > in the tag must be the 256'th byte or lower.

Agreed.

We would expect to implement a sniffer that would exit once it encounters either the closing > on the HTML tag, or 256 bytes. Whichever is sooner.

@dvoytenko

This comment has been minimized.

Show comment
Hide comment
@dvoytenko

dvoytenko Mar 16, 2016

Collaborator

I think if we scan 256 bytes - might as well do 1000 bytes - that doesn't change anything but would remove most of edge cases.

Collaborator

dvoytenko commented Mar 16, 2016

I think if we scan 256 bytes - might as well do 1000 bytes - that doesn't change anything but would remove most of edge cases.

@dvoytenko

This comment has been minimized.

Show comment
Hide comment
@dvoytenko

dvoytenko Mar 16, 2016

Collaborator

Also, i think it'd be just as easy to wait until the first > to do the check - it's common for server-side filters to look in the <!doctype> and <html> declarations.

Collaborator

dvoytenko commented Mar 16, 2016

Also, i think it'd be just as easy to wait until the first > to do the check - it's common for server-side filters to look in the <!doctype> and <html> declarations.

@buro9

This comment has been minimized.

Show comment
Hide comment
@buro9

buro9 Mar 17, 2016

Please no.

1000 bytes is excessive. We serve a lot of pages and need to think of both performance and security. 256 is sufficient for performance whilst also mitigating some of the risk of resource exhaustion.

We definitely wouldn't wait for the first > , what if someone serves a 900MB video file as text/html and we're sitting there buffering it all during a sniff operation in anticipation of a character that won't arrive. People do this stuff every day, best not to ask why they do and accept that they do.

If we are going to sniff all text/html (something we're not doing today) and cannot just use a content-type (the right solution), we really want well defined rules for this. First 256 bytes to include the closing HTML tag, and contains either AMP or lightning emoji - this is a good rule and not incompatible with anything in the AMP spec. It just clarifies and defines something currently ambiguous.

There is no need to look further into the document, we just want to know very quickly whether this is an AMP doc or not.

The issue that mod_pagespeed also describe on their linked issue above with http-equiv meta refresh is similar... they would want to detect AMP very early so that they could adjust the HTTP headers if needed. Once they have gone beyond the sniff operation they (like us) want to be streaming that content and not holding it in a buffer.

All of us who provide reverse proxies, transparent proxies, transformation proxies, need to be able to be sure that we can identify AMP extremely early, within the scope of a single tiny buffer.

buro9 commented Mar 17, 2016

Please no.

1000 bytes is excessive. We serve a lot of pages and need to think of both performance and security. 256 is sufficient for performance whilst also mitigating some of the risk of resource exhaustion.

We definitely wouldn't wait for the first > , what if someone serves a 900MB video file as text/html and we're sitting there buffering it all during a sniff operation in anticipation of a character that won't arrive. People do this stuff every day, best not to ask why they do and accept that they do.

If we are going to sniff all text/html (something we're not doing today) and cannot just use a content-type (the right solution), we really want well defined rules for this. First 256 bytes to include the closing HTML tag, and contains either AMP or lightning emoji - this is a good rule and not incompatible with anything in the AMP spec. It just clarifies and defines something currently ambiguous.

There is no need to look further into the document, we just want to know very quickly whether this is an AMP doc or not.

The issue that mod_pagespeed also describe on their linked issue above with http-equiv meta refresh is similar... they would want to detect AMP very early so that they could adjust the HTTP headers if needed. Once they have gone beyond the sniff operation they (like us) want to be streaming that content and not holding it in a buffer.

All of us who provide reverse proxies, transparent proxies, transformation proxies, need to be able to be sure that we can identify AMP extremely early, within the scope of a single tiny buffer.

@jmadler

This comment has been minimized.

Show comment
Hide comment
@jmadler

jmadler Mar 17, 2016

Contributor

Let's first agree on an approach and then discuss content-type and scan maximum size.

My concern with the approach of "closing caret of html tag must be within 256 bytes and html tag must contain 'AMP' or '⚡', case insensitive" is that it may not be as performant as typical approaches for identifying content.

What about something like "'<html AMP' or '<html ⚡' must appear in the first N bytes, case insensitive, whitespace compressed". This falls in line with typical rulesets for magic words. However, it does mean that is invalid.

Does that approach make sense?

Contributor

jmadler commented Mar 17, 2016

Let's first agree on an approach and then discuss content-type and scan maximum size.

My concern with the approach of "closing caret of html tag must be within 256 bytes and html tag must contain 'AMP' or '⚡', case insensitive" is that it may not be as performant as typical approaches for identifying content.

What about something like "'<html AMP' or '<html ⚡' must appear in the first N bytes, case insensitive, whitespace compressed". This falls in line with typical rulesets for magic words. However, it does mean that is invalid.

Does that approach make sense?

@jmarantz

This comment has been minimized.

Show comment
Hide comment
@jmarantz

jmarantz Mar 18, 2016

I am aligned with Cloudflare's concerns about the tag being in the first
256 bytes. The in-flight mod_pagespeed implementation doesn't have that
limit but probably it should, to reduce the risk of unbounded buffer growth.

I don't think it makes a lot of sense to compress whitespace when computing
that window. Either the proxy has to first the sniff N bytes or it has to
do something more complicated.

I am not a fan of requiring that exact byte sequence "<html AMP". In all
other cases, HTML attributes are not order sensitive so I don't think the
standard should dictate that this one is order sensitive either.

I also think that if someone puts:

in the first 256 bytes, that should not be recognized as an AMP
document. So just scanning for that byte sequence in the first 256 bytes
seems wrong to me.

FWIW here is our open-sourced amp-recognition filter,
currently insensitive to byte-count. Outside this context, MPS sniffs all
content-type:text/html input, expecting the first non-whitespace character
to be a '<'. This implementation ignores all whitespace and comments, but
will give up at the first tag other than <doctype ...> or simply

, as well as at the first non-whitespace character.

https://github.com/pagespeed/mod_pagespeed/blob/master/pagespeed/kernel/html/amp_document_filter.cc

We can easily enforce the 256-character limit outside the context of
this filter by only letting this filter run on the first 256 bytes of
text we see, Our full HTML lexer runs in front of this, but I don't
think that'll be a speed issue.

I am aligned with Cloudflare's concerns about the tag being in the first
256 bytes. The in-flight mod_pagespeed implementation doesn't have that
limit but probably it should, to reduce the risk of unbounded buffer growth.

I don't think it makes a lot of sense to compress whitespace when computing
that window. Either the proxy has to first the sniff N bytes or it has to
do something more complicated.

I am not a fan of requiring that exact byte sequence "<html AMP". In all
other cases, HTML attributes are not order sensitive so I don't think the
standard should dictate that this one is order sensitive either.

I also think that if someone puts:

in the first 256 bytes, that should not be recognized as an AMP
document. So just scanning for that byte sequence in the first 256 bytes
seems wrong to me.

FWIW here is our open-sourced amp-recognition filter,
currently insensitive to byte-count. Outside this context, MPS sniffs all
content-type:text/html input, expecting the first non-whitespace character
to be a '<'. This implementation ignores all whitespace and comments, but
will give up at the first tag other than <doctype ...> or simply

, as well as at the first non-whitespace character.

https://github.com/pagespeed/mod_pagespeed/blob/master/pagespeed/kernel/html/amp_document_filter.cc

We can easily enforce the 256-character limit outside the context of
this filter by only letting this filter run on the first 256 bytes of
text we see, Our full HTML lexer runs in front of this, but I don't
think that'll be a speed issue.

jmarantz added a commit to apache/incubator-pagespeed-mod that referenced this issue Mar 19, 2016

@rudygalfi rudygalfi added this to the Sprint 2016-03-31 [current] milestone Mar 24, 2016

@rudygalfi

This comment has been minimized.

Show comment
Hide comment
@rudygalfi

rudygalfi Mar 24, 2016

Contributor

Just wanted to see if folks had any updates to share here?

Contributor

rudygalfi commented Mar 24, 2016

Just wanted to see if folks had any updates to share here?

@dvoytenko dvoytenko removed this from the Sprint: 2016-05-12 milestone May 23, 2016

@rudygalfi rudygalfi modified the milestone: Sprint: 2016-05-26 May 23, 2016

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 6, 2016

@venkatgig Just add:
// Disable newrelic if (extension_loaded('newrelic')) { newrelic_disable_autorum(); }

Somewhere in your template PHP

@venkatgig Just add:
// Disable newrelic if (extension_loaded('newrelic')) { newrelic_disable_autorum(); }

Somewhere in your template PHP

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 8, 2016

@RReverser We are still having a user with this problem, though the code inserted is not exactly the same as initially, I think:

`<script type="text/javascript">
//
</script>

<script data-pagespeed-no-defer>(function(){function d(b){var a=window;if(a.addEventListener)a.addEventListener("load",b,!1);else if(a.attachEvent)a.attachEvent("onload",b);else{var c=a.onload;a.onload=function(){b.call(this);c&&c.call(this)}}}var p=Date.now||function(){return+new Date};window.pagespeed=window.pagespeed||{};var q=window.pagespeed;function r(){this.a=!0}r.prototype.c=function(b){b=parseInt(b.substring(0,b.indexOf(" ")),10);return!isNaN(b)&&b<=p()};r.prototype.hasExpired=r.prototype.c;r.prototype.b=function(b){return b.substring(b.indexOf(" ",b.indexOf(" ")+1)+1)};r.prototype.getData=r.prototype.b;r.prototype.f=function(b){var a=document.getElementsByTagName("script"),a=a[a.length-1];a.parentNode.replaceChild(b,a)};r.prototype.replaceLastScript=r.prototype.f; r.prototype.g=function(b){var a=window.localStorage.getItem("pagespeed_lsc_url:"+b),c=document.createElement(a?"style":"link");a&&!this.c(a)?(c.type="text/css",c.appendChild(document.createTextNode(this.b(a)))):(c.rel="stylesheet",c.href=b,this.a=!0);this.f(c)};r.prototype.inlineCss=r.prototype.g;` Can you advise any workaround? Thanks

weeblr commented Jun 8, 2016

@RReverser We are still having a user with this problem, though the code inserted is not exactly the same as initially, I think:

`<script type="text/javascript">
//
</script>

<script data-pagespeed-no-defer>(function(){function d(b){var a=window;if(a.addEventListener)a.addEventListener("load",b,!1);else if(a.attachEvent)a.attachEvent("onload",b);else{var c=a.onload;a.onload=function(){b.call(this);c&&c.call(this)}}}var p=Date.now||function(){return+new Date};window.pagespeed=window.pagespeed||{};var q=window.pagespeed;function r(){this.a=!0}r.prototype.c=function(b){b=parseInt(b.substring(0,b.indexOf(" ")),10);return!isNaN(b)&&b<=p()};r.prototype.hasExpired=r.prototype.c;r.prototype.b=function(b){return b.substring(b.indexOf(" ",b.indexOf(" ")+1)+1)};r.prototype.getData=r.prototype.b;r.prototype.f=function(b){var a=document.getElementsByTagName("script"),a=a[a.length-1];a.parentNode.replaceChild(b,a)};r.prototype.replaceLastScript=r.prototype.f; r.prototype.g=function(b){var a=window.localStorage.getItem("pagespeed_lsc_url:"+b),c=document.createElement(a?"style":"link");a&&!this.c(a)?(c.type="text/css",c.appendChild(document.createTextNode(this.b(a)))):(c.rel="stylesheet",c.href=b,this.a=!0);this.f(c)};r.prototype.inlineCss=r.prototype.g;` Can you advise any workaround? Thanks
@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 9, 2016

First part: Cloudflare insertin GA. Disable the app on Cloudflare & add GA in your template instead.
Second part: Google pagespeed module. You must completely disable that for all AMP pages. (by creating a location in your NGINX for example) To do a test you could of course disable pagespeed alltogether.

Joostvanderlaan commented Jun 9, 2016

First part: Cloudflare insertin GA. Disable the app on Cloudflare & add GA in your template instead.
Second part: Google pagespeed module. You must completely disable that for all AMP pages. (by creating a location in your NGINX for example) To do a test you could of course disable pagespeed alltogether.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 9, 2016

Hi @Joostvanderlaan

There must be a confusion in your reading of my past message. All of this JS is inserted by Cloudflare on the fly. From past discussion on this thread, Cloudflare is supposed now to leave AMP pages alone, unmodified, it seems they don't yet.

Rgds

weeblr commented Jun 9, 2016

Hi @Joostvanderlaan

There must be a confusion in your reading of my past message. All of this JS is inserted by Cloudflare on the fly. From past discussion on this thread, Cloudflare is supposed now to leave AMP pages alone, unmodified, it seems they don't yet.

Rgds

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 9, 2016

Then still, you could stop Cloudflare from doing that by disabling the App in Cloudflare that injects javascript OR by creating a PageRule that disables the problemetic Cloudflare options. (Most of the Cloudflare options don't even inject JS)

(I'm using AMP + Cloudflare without problems)

Joostvanderlaan commented Jun 9, 2016

Then still, you could stop Cloudflare from doing that by disabling the App in Cloudflare that injects javascript OR by creating a PageRule that disables the problemetic Cloudflare options. (Most of the Cloudflare options don't even inject JS)

(I'm using AMP + Cloudflare without problems)

@RReverser

This comment has been minimized.

Show comment
Hide comment
@RReverser

RReverser Jun 9, 2016

@weeblr Actually this second part from your example makes me suspicious too:

<script data-pagespeed-no-defer>(function()...

and

window.pagespeed=window.pagespeed||{};var q=window.pagespeed;function r(){this.a=!0}r.prototype.c=function(b){b=parseInt(b.substring(0,b.indexOf(" ")),10);return!isNaN(b)&&b<=p()};r.prototype.hasExpired=r.prototype.c;r.prototype.b=function(b){return b.substring(b.indexOf(" ",b.indexOf(" ")+1)+1)};r.prototype.getData=r.prototype.b;

This looks like PageSpeed module and not CloudFlare.

If you're certain that it's not, can you please show URL where this is happening so that we could look into it?

RReverser commented Jun 9, 2016

@weeblr Actually this second part from your example makes me suspicious too:

<script data-pagespeed-no-defer>(function()...

and

window.pagespeed=window.pagespeed||{};var q=window.pagespeed;function r(){this.a=!0}r.prototype.c=function(b){b=parseInt(b.substring(0,b.indexOf(" ")),10);return!isNaN(b)&&b<=p()};r.prototype.hasExpired=r.prototype.c;r.prototype.b=function(b){return b.substring(b.indexOf(" ",b.indexOf(" ")+1)+1)};r.prototype.getData=r.prototype.b;

This looks like PageSpeed module and not CloudFlare.

If you're certain that it's not, can you please show URL where this is happening so that we could look into it?

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 9, 2016

@Joostvanderlaan

Then still, you could stop Cloudflare from doing that by disabling the App in Cloudflare that injects javascript"

The question here is not to "solve the problem". It's to understand why this is happening, and whether it's expected (by Cloudflare) behavior.

Again, it was said that Cloudflare would not "touch" any AMP page from now on. Whether you are affected or not may depends on the Cloudflare features you use, right?

@RReverser You are right. I have sent a message to customer, as they seem to have solved the issue now (this is not added to their pages any longer), asking what they did exactly. I cannot share the URL w/o their consent, but again, the JS is not there any more.

I'll post back (if if they reply!) with any info I can get.

Thanks for your replies.

weeblr commented Jun 9, 2016

@Joostvanderlaan

Then still, you could stop Cloudflare from doing that by disabling the App in Cloudflare that injects javascript"

The question here is not to "solve the problem". It's to understand why this is happening, and whether it's expected (by Cloudflare) behavior.

Again, it was said that Cloudflare would not "touch" any AMP page from now on. Whether you are affected or not may depends on the Cloudflare features you use, right?

@RReverser You are right. I have sent a message to customer, as they seem to have solved the issue now (this is not added to their pages any longer), asking what they did exactly. I cannot share the URL w/o their consent, but again, the JS is not there any more.

I'll post back (if if they reply!) with any info I can get.

Thanks for your replies.

@RReverser

This comment has been minimized.

Show comment
Hide comment
@RReverser

RReverser Jun 9, 2016

@weeblr Cool, glad it's resolved.

@weeblr Cool, glad it's resolved.

@jmarantz

This comment has been minimized.

Show comment
Hide comment
@jmarantz

jmarantz Jun 9, 2016

For the record, the next release of PageSpeed will automatically disable
any injection of scripts or other non-amp-compliant changes, whenever
serving an AMP document.

In the meantime, a good workaround is to use
ModPagespeedRewriteLevel OptimizeForBandwidth (apache)
pagespeed RewriteLevel OptimizeForBandwidth; (nginx)
in the /amp subdirectory.

-Josh

On Thu, Jun 9, 2016 at 6:47 AM, Ingvar Stepanyan notifications@github.com
wrote:

@weeblr https://github.com/weeblr Cool, glad it's resolved.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#2380 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AB2kPXhAQKNPtrETd8O8NuHVY--9Zc7iks5qJ-8sgaJpZM4Hnltf
.

jmarantz commented Jun 9, 2016

For the record, the next release of PageSpeed will automatically disable
any injection of scripts or other non-amp-compliant changes, whenever
serving an AMP document.

In the meantime, a good workaround is to use
ModPagespeedRewriteLevel OptimizeForBandwidth (apache)
pagespeed RewriteLevel OptimizeForBandwidth; (nginx)
in the /amp subdirectory.

-Josh

On Thu, Jun 9, 2016 at 6:47 AM, Ingvar Stepanyan notifications@github.com
wrote:

@weeblr https://github.com/weeblr Cool, glad it's resolved.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#2380 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AB2kPXhAQKNPtrETd8O8NuHVY--9Zc7iks5qJ-8sgaJpZM4Hnltf
.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 9, 2016

Thanks for this information. Few users have actually control over their servers. Was wondering if adding a header would also be a valid alternative solution? as in:
Cache-Control: no-transform

Rgds

weeblr commented Jun 9, 2016

Thanks for this information. Few users have actually control over their servers. Was wondering if adding a header would also be a valid alternative solution? as in:
Cache-Control: no-transform

Rgds

@jmarantz

This comment has been minimized.

Show comment
Hide comment
@jmarantz

jmarantz Jun 9, 2016

RE disabling PageSpeed via server controls: usually a user who has
installed mod_pagespeed/ngx_pagespeed will be in control of how to adjust
the settings. I guess you are talking about situations where a hosting
provider has installed mod_pagespeed for a site without giving them any
control? Do you see that a lot?

Anyway, in that situation, adding
response-header cache-control:no-transform will work, as will PageSpeed:off.

On Thu, Jun 9, 2016 at 8:58 AM, Weeblr notifications@github.com wrote:

Thanks for this information. Few users have actually control over their
servers. Was wondering if adding a header would also be a valid alternative
solution? as in:
Cache-Control: no-transform

Rgds


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#2380 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AB2kPWCP5c0oMpyDyz4lWrwf90qD409gks5qKA4TgaJpZM4Hnltf
.

jmarantz commented Jun 9, 2016

RE disabling PageSpeed via server controls: usually a user who has
installed mod_pagespeed/ngx_pagespeed will be in control of how to adjust
the settings. I guess you are talking about situations where a hosting
provider has installed mod_pagespeed for a site without giving them any
control? Do you see that a lot?

Anyway, in that situation, adding
response-header cache-control:no-transform will work, as will PageSpeed:off.

On Thu, Jun 9, 2016 at 8:58 AM, Weeblr notifications@github.com wrote:

Thanks for this information. Few users have actually control over their
servers. Was wondering if adding a header would also be a valid alternative
solution? as in:
Cache-Control: no-transform

Rgds


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#2380 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AB2kPWCP5c0oMpyDyz4lWrwf90qD409gks5qKA4TgaJpZM4Hnltf
.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 9, 2016

Re: hosting companies, I assume so, and I don't have a large sample enough for a valid opinion.
I will add the PageSpeed:Off header anyway, doesn't cost much and can save some head scratching for some.
Thnks

weeblr commented Jun 9, 2016

Re: hosting companies, I assume so, and I don't have a large sample enough for a valid opinion.
I will add the PageSpeed:Off header anyway, doesn't cost much and can save some head scratching for some.
Thnks

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 9, 2016

@weeblr could you make that an option? Since I would still like to use pagespeed for my images :)

@weeblr could you make that an option? Since I would still like to use pagespeed for my images :)

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 9, 2016

Well I am doing the Joomla AMP extension, so I was only talking about the HTML document response. Assets are not going through the CMS, and are simply served by the server, so there's no issue there.

weeblr commented Jun 9, 2016

Well I am doing the Joomla AMP extension, so I was only talking about the HTML document response. Assets are not going through the CMS, and are simply served by the server, so there's no issue there.

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 14, 2016

Example;
I want to optimize images only (to save bandwith). PageSpeed can do that without changing the HTML and without injecting any JS. So this will work fine with AMP. Bonus, my images will be smaller and served faster. That's what AMP is about, right? Being fast and small.

Now when you disable PageSpeed on the HTML, the above wouldn't work. So at least make this an option in the extension if you really must add this. At least people who do have a serious webhost (including control of their server) can decide for themselves if they want PageSpeed or not.

Joostvanderlaan commented Jun 14, 2016

Example;
I want to optimize images only (to save bandwith). PageSpeed can do that without changing the HTML and without injecting any JS. So this will work fine with AMP. Bonus, my images will be smaller and served faster. That's what AMP is about, right? Being fast and small.

Now when you disable PageSpeed on the HTML, the above wouldn't work. So at least make this an option in the extension if you really must add this. At least people who do have a serious webhost (including control of their server) can decide for themselves if they want PageSpeed or not.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 14, 2016

Still a bit confused. InPlace optimization means the images URLs are not altered. Optimization happens independently of the HTML, so adding PageSpeed off on the HTML should not prevent Pagespeed to optimize your images

weeblr commented Jun 14, 2016

Still a bit confused. InPlace optimization means the images URLs are not altered. Optimization happens independently of the HTML, so adding PageSpeed off on the HTML should not prevent Pagespeed to optimize your images

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 14, 2016

Pagespeed still needs to read the image urls from the HTML. That's why it needs to be enabled for that html (or better: amphtml) page.

Pagespeed still needs to read the image urls from the HTML. That's why it needs to be enabled for that html (or better: amphtml) page.

@weeblr

This comment has been minimized.

Show comment
Hide comment

weeblr commented Jun 14, 2016

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 14, 2016

ok, you're right in this case. Still, there's other filters like trim url's that could be useful with AMP.
https://developers.google.com/speed/pagespeed/module/filter-trim-urls#operation

ok, you're right in this case. Still, there's other filters like trim url's that could be useful with AMP.
https://developers.google.com/speed/pagespeed/module/filter-trim-urls#operation

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 14, 2016

Actually, that's probably another example that would not work, as using the "base' tag is not allowed in AMP html spec (not sure why actually, maybe to make link resolution easier for AMP js?).
It looks like each feature in PS would have to be reviewed and checked for validity.

weeblr commented Jun 14, 2016

Actually, that's probably another example that would not work, as using the "base' tag is not allowed in AMP html spec (not sure why actually, maybe to make link resolution easier for AMP js?).
It looks like each feature in PS would have to be reviewed and checked for validity.

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 14, 2016

There is a base tag in the example, though I am not sure that needs to be explicitly there. couldn't it trim the urls without the base tag? (it knows the domain from the config and request uri)

There is a base tag in the example, though I am not sure that needs to be explicitly there. couldn't it trim the urls without the base tag? (it knows the domain from the config and request uri)

@RReverser

This comment has been minimized.

Show comment
Hide comment
@RReverser

RReverser Jun 14, 2016

@Joostvanderlaan in any case such transformation is unlikely to give any benefits if you're using gzip.

@Joostvanderlaan in any case such transformation is unlikely to give any benefits if you're using gzip.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 14, 2016

Though I'm not familiar with PS, trimming seems to be the act of putting the base URL in the base tag, and removing all unneeded parts of URLs on the page, to make them relative to the base tag content. Don't see what could be trimmed otherwise.

weeblr commented Jun 14, 2016

Though I'm not familiar with PS, trimming seems to be the act of putting the base URL in the base tag, and removing all unneeded parts of URLs on the page, to make them relative to the base tag content. Don't see what could be trimmed otherwise.

@Joostvanderlaan

This comment has been minimized.

Show comment
Hide comment
@Joostvanderlaan

Joostvanderlaan Jun 14, 2016

A relative url does not need a base tag. Trimming means stripping the domain and making things relative. Not also putting that domain in the base tag. As seen in the PS example, the base url is in both the original + the output, nothing changed there.
You could use it for other tricks like relatively linking all assets in example.com/assets/ as seen here http://webdesign.tutsplus.com/articles/quick-tip-set-relative-urls-with-the-base-tag--cms-21399

A relative URL exists for a long time, even without base tags they work.

Anyway, we're drifting. The point is, why would you add headers that disable PS altogether in an extension, while other people might have the need to use both PS and your extension? It just doesn't make any sense to disable pagespeed by default for all AMP pages.

After all, every removed byte, or even better, millisecond of decreased latency, is a performance gain. And that's what AMP is about.

A relative url does not need a base tag. Trimming means stripping the domain and making things relative. Not also putting that domain in the base tag. As seen in the PS example, the base url is in both the original + the output, nothing changed there.
You could use it for other tricks like relatively linking all assets in example.com/assets/ as seen here http://webdesign.tutsplus.com/articles/quick-tip-set-relative-urls-with-the-base-tag--cms-21399

A relative URL exists for a long time, even without base tags they work.

Anyway, we're drifting. The point is, why would you add headers that disable PS altogether in an extension, while other people might have the need to use both PS and your extension? It just doesn't make any sense to disable pagespeed by default for all AMP pages.

After all, every removed byte, or even better, millisecond of decreased latency, is a performance gain. And that's what AMP is about.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Jun 14, 2016

You're right with that, the base tag is not needed.
This discussion however totally convinced me that for a distributed product such as mine, from a support standpoint, making this a user-available setting is a bad idea.
Remember, we are not talking about disabling PS on AMP pages, only on the HTML. Meaning that resources (images mostly in our case) will still be optimized.
And indeed, this is not the right place to have this discussion!

weeblr commented Jun 14, 2016

You're right with that, the base tag is not needed.
This discussion however totally convinced me that for a distributed product such as mine, from a support standpoint, making this a user-available setting is a bad idea.
Remember, we are not talking about disabling PS on AMP pages, only on the HTML. Meaning that resources (images mostly in our case) will still be optimized.
And indeed, this is not the right place to have this discussion!

@dknecht

This comment has been minimized.

Show comment
Hide comment
@dknecht

dknecht Aug 31, 2016

Contributor

@cramforce Can we re-name or close this ticket? CloudFlare no longer has this issue.

Contributor

dknecht commented Aug 31, 2016

@cramforce Can we re-name or close this ticket? CloudFlare no longer has this issue.

@cramforce

This comment has been minimized.

Show comment
Hide comment
@cramforce

cramforce Aug 31, 2016

Member

Definitely. Lets move other topics to its own issue.

Member

cramforce commented Aug 31, 2016

Definitely. Lets move other topics to its own issue.

@cramforce cramforce closed this Aug 31, 2016

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Sep 15, 2016

Hi again

Just a quick note to mention that Incapsula, another significant CDN provider, is doing about the same and inserts some (analytics) javascript in the fly, not skipping AMP pages.

Still some communication work to do it seems...

Cheers

weeblr commented Sep 15, 2016

Hi again

Just a quick note to mention that Incapsula, another significant CDN provider, is doing about the same and inserts some (analytics) javascript in the fly, not skipping AMP pages.

Still some communication work to do it seems...

Cheers

@cramforce

This comment has been minimized.

Show comment
Hide comment
@cramforce

cramforce Sep 15, 2016

Member

Mind opening up a new bug? Do you have contacts there you could ping?

On Thu, Sep 15, 2016 at 2:22 AM, Weeblr notifications@github.com wrote:

Hi again

Just a quick note to mention that Incapsula, another significant CDN
provider, is doing about the same and inserts some (analytics) javascript
in the fly, not skipping AMP pages.

Still some communication work to do it seems...

Cheers


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#2380 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAFeT_LKG01-XtLnoiw0B4zZJnQe5l_lks5qqQ5vgaJpZM4Hnltf
.

Member

cramforce commented Sep 15, 2016

Mind opening up a new bug? Do you have contacts there you could ping?

On Thu, Sep 15, 2016 at 2:22 AM, Weeblr notifications@github.com wrote:

Hi again

Just a quick note to mention that Incapsula, another significant CDN
provider, is doing about the same and inserts some (analytics) javascript
in the fly, not skipping AMP pages.

Still some communication work to do it seems...

Cheers


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#2380 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAFeT_LKG01-XtLnoiw0B4zZJnQe5l_lks5qqQ5vgaJpZM4Hnltf
.

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Sep 16, 2016

Did that here #5051

I have no contact at all :)

Rgds

weeblr commented Sep 16, 2016

Did that here #5051

I have no contact at all :)

Rgds

@Incaprules

This comment has been minimized.

Show comment
Hide comment
@Incaprules

Incaprules Sep 21, 2016

Hello all,
I'm your contact from Incapsula.
We've reviewed this thread and working on a solution

Hello all,
I'm your contact from Incapsula.
We've reviewed this thread and working on a solution

@mattpramschufer

This comment has been minimized.

Show comment
Hide comment
@mattpramschufer

mattpramschufer Nov 2, 2016

Has there been any update on this yet?

Has there been any update on this yet?

@Incaprules

This comment has been minimized.

Show comment
Hide comment
@Incaprules

Incaprules Nov 2, 2016

Yes, Incapsula now avoids injections in AMP pages automatically for all sites on all accounts

Yes, Incapsula now avoids injections in AMP pages automatically for all sites on all accounts

@weeblr

This comment has been minimized.

Show comment
Hide comment
@weeblr

weeblr Nov 2, 2016

This was tracked on #5051.

weeblr commented Nov 2, 2016

This was tracked on #5051.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment