New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPRO turns `application/json` into `application/javascript` #1321

Closed
laborima opened this Issue Jun 21, 2016 · 20 comments

Comments

Projects
None yet
4 participants
@laborima
Copy link

laborima commented Jun 21, 2016

mod-pagespeed-stable 1.11.33.2-r0 under apache

Since the last update of pagespeed using default options Ajax Json call gives

Without pagespeed :

Cache-Control: max-age=3600
Connection: Keep-Alive
Content-Encoding: gzip
Content-Language: fr
Content-Length: 786
Content-Type: application/json;charset=UTF-8
Date: Tue, 21 Jun 2016 07:33:24 GMT
Keep-Alive: timeout=15, max=100
Vary: Accept-Encoding
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
x-content-type-options: nosniff

With pagespeed:

Cache-Control: max-age=1421
Connection: Keep-Alive
Content-Encoding: gzip
Content-Language: fr
Content-Length: 707
Content-Type: application/javascript; charset=UTF-8
Date: Tue, 21 Jun 2016 07:41:14 GMT
Etag: W/"PSA-aj-1IRiir6u8x"
Expires: Tue, 21 Jun 2016 08:04:56 GMT
Keep-Alive: timeout=15, max=100
Vary: Accept-Encoding
X-Frame-Options: SAMEORIGIN
X-Original-Content-Length: 3953
X-XSS-Protection: 1; mode=block
x-content-type-options: nosniff, nosniff

Notice the Content-Type: application/javascript; charset=UTF-8 which break our Ajax call. Haven't find any way to disabled it. The server full json response.

columns.txt

Regards

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented Jun 21, 2016

You could add no-transform to the Cache-Control response-headers for your json responses to exclude them from being processed by mod_pagespeed:
https://developers.google.com/speed/pagespeed/module/configuration#notransform

Alternatively you could also exclude any urls that result in json responses by disallowing them:
https://developers.google.com/speed/pagespeed/module/restricting_urls

@laborima

This comment has been minimized.

Copy link

laborima commented Jun 21, 2016

Hi,

Thank you for this fast answer. But it doesn't explain why for some (majority) json responses I get
Content-Type: application/json;charset=UTF-8 and for this one I get Content-Type: application/javascript; charset=UTF-8. Url is the same only parameters and returned content change. Content is always valid json and server set Content-type headers to application/json. PageSpeed Off headers and JSON are fine.

Ps: I have no way to add no-transform to the server REST API, and it can affects a lot of uris (All our REST API)

Regards

@crowell

This comment has been minimized.

Copy link
Contributor

crowell commented Jun 21, 2016

What page(s) are you getting the application/javascript header instead of
json? Does it happen 100% of the time on those specific pages, and not
happen 100% of the time on others?

On Tue, Jun 21, 2016 at 4:14 AM, laborima notifications@github.com wrote:

Hi,

Thank you for this fast answer. But it doesn't explain why for some
(majority) json responses I get
Content-Type: application/json;charset=UTF-8 and for this one I get
Content-Type: application/javascript; charset=UTF-8. Url is the same only
parameters and returned content change. Content is always valid json and
server set Content-type headers to application/json. PageSpeed Off headers
and JSON are fine.

Regards


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1321 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAlnYGYchI3vpxOLdxwi-IMCIL1uM7otks5qN51QgaJpZM4I6ckK
.

@laborima

This comment has been minimized.

Copy link

laborima commented Jun 21, 2016

Hi,

It happen 100% of the time for this url:
https://blabla/service/module/entity-datagrid/config/columns?itemType=bcpg%3AnutList&list=nutList
returning:
ret1.txt

And never for this url:
https://blabla/service/module/entity-datagrid/config/columns?itemType=bcpg%3AcostList&list=costList
returning:
ret2.txt

Both url return same header from server without pagespeed. With pagespeed content-type is different for the first one.

Regards

@crowell

This comment has been minimized.

Copy link
Contributor

crowell commented Jun 21, 2016

rewritten json resources get the application/javascript content-type from pagespeed. (see here https://github.com/pagespeed/mod_pagespeed/blob/master/pagespeed/kernel/http/content_type.cc#L51 )

can you share the headers from the one that doesn't get the application/javascript header?

@laborima

This comment has been minimized.

Copy link

laborima commented Jun 21, 2016

If you look more deeper in the file https://github.com/pagespeed/mod_pagespeed/blob/master/pagespeed/kernel/http/content_type.cc#L71 it returns also application/json

We are using YUI2 as javascript library which doesn't recognize application/javascript as json but as text.

Both headers are the same:

Without pagespeed :

Cache-Control: max-age=3600
Connection: Keep-Alive
Content-Encoding: gzip
Content-Language: fr
Content-Length: 786
Content-Type: application/json;charset=UTF-8
Date: Tue, 21 Jun 2016 07:33:24 GMT
Keep-Alive: timeout=15, max=100
Vary: Accept-Encoding
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
x-content-type-options: nosniff

Pagespeed should'nt transform original content-type. It wasn't the case in the previous stable release of pagespeed.

Regards

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jun 27, 2016

Testing locally with ret1 and ret2 above, what I see is that once PageSpeed finishes optimizing the files with IPRO they're turned from application/json to application/javascript, which seems like it's causing your problem. I don't know why you weren't seeing this with ret2, but my guess is that something about the content or the headers was keeping PageSpeed from optimizing it so it never got far enough to turn into application/javascript.

Regardless, this is a bug, and we should minify application/json resources without changing their content-type.

@jeffkaufman jeffkaufman changed the title PageSpeed rewrite content-type header IPRO turns `application/json` into `application/javascript` Jun 27, 2016

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 14, 2016

We have InPlaceRewriteContextTest::JsonWithJsonContentTypeSucceeds which looks like it is testing that when IPROing json we leave it as json, and that test passes even though in an end to end test we would fail.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 14, 2016

Talking to someone in security here, serving json as application/json is dangerous because it might get sniffed to other content types. Keeping existing things as application/json doesn't make sites any less secure however, so it's fine for us to preserve this content type in IPRO.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 14, 2016

Looking at our http cache, we are recording the input resource as Content-Type: application/json and the output resource as Content-Type: application/javascript.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 14, 2016

We're recording the output resource as application/javascript because we're using ContentType::kJavascript; this isn't about ContentType::kJson.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 14, 2016

It looks like all we need to do to fix this is to fix WriteExternalScriptTo to use IsJs() instead of directly comparing to kJavascript when it tries to preserve the original content type.

Writing a test for this, though, I'm running into a problem where at least in our current setup we don't optimize json in nginx.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 14, 2016

pagespeed_test.conf.template:types didn't have an entry for .json, so it makes sense we wouldn't be optimizing it. Weirdly, if I add an entry as application/json, we still don't optimize json, but if I add it as application/javascript we do.

  • application/json: no optimization, http cache for example.json has headers Date, Expires, Cache-Control, Etag.
  • application/javascript: optimization, http cache for example.json has headers ETag, Accept-Ranges, Content-Type, Date, Expires, Vary, X-Original-Content-Length, Content-Encoding, Content-Length
@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 15, 2016

  • .json -> application/json, InPlaceResourceRecorder::ConsiderResponseHeaders sees:
HTTP/1.1 200 (null)^M
ETag: "5787b083-45"^M
X-Extra-Header: 1^M
Content-Type: application/json^M
Date: Fri, 15 Jul 2016 11:52:19 GMT^M

and then again when called from the body filter instead of the check header filter:

HTTP/1.1 200 (null)^M
ETag: "5787b083-45"^M
X-Extra-Header: 1^M
Cache-Control: s-maxage=10^M
Accept-Ranges: bytes^M
Content-Type: application/json^M
Date: Fri, 15 Jul 2016 11:52:19 GMT^M
  • .json -> application/javascript, ConsiderResponseHeaders sees:
HTTP/1.1 200 (null)^M
ETag: "5787b083-45"^M
X-Extra-Header: 1^M
Content-Type: application/javascript^M
Date: Fri, 15 Jul 2016 11:55:01 GMT^M
Expires: Fri, 15 Jul 2016 12:00:01 GMT^M
Cache-Control: max-age=300^M

and when called from the body filter:

HTTP/1.1 200 (null)^M
ETag: "5787b083-45"^M
X-Extra-Header: 1^M
Accept-Ranges: bytes^M
Content-Type: application/javascript^M
Date: Fri, 15 Jul 2016 11:55:01 GMT^M
Expires: Fri, 15 Jul 2016 12:00:01 GMT^M
Cache-Control: max-age=300, s-maxage=10^M
@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 15, 2016

Instead of something keyed to content type, I wonder if PSOL is responding to the missing cache-control header? But it would be really weird for a missing cache-control header to make us not write the content-type to the the http cache.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 15, 2016

I don't see anything in ngx_pagespeed's pagespeed_test.conf.template that should be making us set different cache lifetimes for js and json

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 15, 2016

I wonder if this is the same ComputeCaching issue I posted about on the mailing list?

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 15, 2016

It's not that issue: logging response headers at the beginning and end of ComputeCaching I don't see it removing Content-Type: application/json or changing the Cache-Control header.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 15, 2016

The difference in caching comes from ContentType::IsLikelyStaticResource calling js static by json not, and this affecting our heuristic caching rules.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented Jul 15, 2016

If I mark kJson as static (which is incorrect), that makes us optimize it in nginx ipro.

This all makes sense now: in Apache both the js file and the json file have explicitly configured cache control headers, for 600s, while in Nginx they don't. Because js is heuristically cacheable this isn't a problem, and we can still ipro it, but because json isn't then ipro can't happen. I should fix this by setting explicit caching headers in our Nginx test config to mirror Apache's.

(Niggling question: why are the headers we write to the input resource cache for the json file missing the content type?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment