OutputSanitizingAsyncFetch: runs right before PSOL responds #1695

oschaaf · 2017-12-13T22:23:12Z

Headers starting with '@' will be removed from any output emitted
by PSOL. This allows a concept of 'internal headers' which can be
used to pass information around internally.

In a follow-up PR this will be used to track (and remember) which
redirects have been followed while fetching a resource, and with that
we will be able to verify if using the resource as a basis for optimized
output would violate effective Content-Security-Policies, if any.

Note that a header name is not allowed to contain '@' per the HTTP spec,
and if one does 'escape', httpd will respond with a 5xx.
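
As a rough illustration of the sanitizing step described above, here is a minimal sketch that strips every header whose name starts with '@' from a plain list of name/value pairs. `HeaderList`, `kInternalPrefix` and `StripInternalHeaders` are hypothetical names for this example only; the PR itself works on PSOL's ResponseHeaders and is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a response header collection: an ordered list
// of name/value pairs. This is NOT PSOL's ResponseHeaders class.
using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Marker for internal headers, as described in the PR text.
constexpr char kInternalPrefix = '@';

// Removes every header whose name starts with '@'.
// Returns true if at least one header was removed.
bool StripInternalHeaders(HeaderList* headers) {
  assert(headers != nullptr);
  const auto is_internal = [](const std::pair<std::string, std::string>& h) {
    return !h.first.empty() && h.first.front() == kInternalPrefix;
  };
  const auto new_end =
      std::remove_if(headers->begin(), headers->end(), is_internal);
  const bool removed = (new_end != headers->end());
  headers->erase(new_end, headers->end());
  return removed;
}
```

The relative order of the remaining headers is preserved, which matters for headers that may legitimately repeat.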

Using PHP and/or "Header always" in the configuration to set up CSP headers in httpd is probably pretty common. Both end up in err_headers_out, an httpd-only concept meaning that the headers should always be emitted, even on error statuses. We want to consider the CSP headers living in this collection when rewriting content, but leave them alone otherwise. Instead of introducing an err_headers_out-like concept in RewriteDriver to model this, we will rely on the internal headers concept. ScanFilter will also look at kInternalContentSecurityPolicy. This header will be sanitized from our own output. Relies on #1695
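
To make the idea concrete, the sketch below copies any Content-Security-Policy header found in an err_headers_out-style collection into the regular response headers under an internal '@' name, so a rewriter can read it while the original collection stays untouched. The thread names kInternalContentSecurityPolicy and mentions the "@Content-Security-Policy" naming; `HeaderList`, `EqualsIgnoreCase` and `CopyErrCspAsInternal` are assumptions made for this example, not the follow-up PR's code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a header collection (not an httpd or PSOL type).
using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Internal name following the '@' scheme; the exact value is an assumption.
const char kInternalContentSecurityPolicy[] = "@Content-Security-Policy";

// Header names are case-insensitive, so compare them that way.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i]))) {
      return false;
    }
  }
  return true;
}

// Copies each CSP header from err_headers_out into response_headers under
// the internal name. Returns the number of headers copied.
int CopyErrCspAsInternal(const HeaderList& err_headers_out,
                         HeaderList* response_headers) {
  assert(response_headers != nullptr);
  int copied = 0;
  for (const auto& header : err_headers_out) {
    if (EqualsIgnoreCase(header.first, "Content-Security-Policy")) {
      response_headers->emplace_back(kInternalContentSecurityPolicy,
                                     header.second);
      ++copied;
    }
  }
  return copied;
}
```

Because the copy carries the '@' prefix, it would be dropped again by whatever step sanitizes internal headers before output.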

morlovich · 2017-12-18T21:20:48Z

Hmm, I guess Apache code can just filter those when converting back?

oschaaf · 2017-12-18T21:31:04Z

@morlovich Yes, fixed that in another PR.
https://github.com/pagespeed/mod_pagespeed/pull/1696/files#diff-09994d3bc4ac0373770c34bfe92b93f2R118

However, I recently ran into a hard-to-reproduce CHECK failure, probably caused by one of these changes:

[1214/123730:FATAL:headers.cc(81)] Check failed: static_cast<size_t>(headers->size()) == needed.size() (7 vs. 6)
[ RUN      ] ProxyInterfaceTest.FlushHugeHtml

Perhaps the check failure is caused by OutputSanitizingAsyncFetch::HandleDone also stripping the headers; maybe it shouldn't do that when wrapping ProxyFetch. I'm still working to understand how that could cause the above check failure.

morlovich · 2017-12-18T21:34:24Z

net/instaweb/http/async_fetch.cc

+}
+
+void OutputSanitizingAsyncFetch::HandleDone(bool success) {
+  SanitizeResponseHeaders();


This one might be dubious threading-wise? Though I think our story on this hasn't been very clean.

Well... there is some documentation here:
https://github.com/pagespeed/mod_pagespeed/blob/master/net/instaweb/http/public/async_fetch.h#L114
... which is not a very good place to have it.

At any rate, threading issues seem very likely for this check if you look at what the code in RemoveAllWithPrefix looks like.

oschaaf · 2017-12-19T09:42:20Z

Updated: this PR now includes the code to also strip internal headers in Apache, and OutputSanitizingAsyncFetch::HandleDone no longer touches the response headers.

morlovich · 2017-12-21T16:09:34Z

Had a flash of a "dumb question": instead of adding headers with magic names and then having to care about removing them, why not just have a separate spot, like a second ResponseHeaders?

And I think I need to re-echo my earlier thought that we may have more bugs with err_headers_out elsewhere...

oschaaf · 2017-12-21T21:52:45Z

@morlovich I think that is a great question..

Let me put down some thoughts: The initial reason for storing internal headers in the current ResponseHeaders set is that these get persisted automatically. That is a useful attribute for the redirect-following feature, because we want to remember how we got them so we will be able to evaluate CSP when we pull them from cache.

Then came the story about CSP often ending up in err_response_headers. The concept of err_headers_out seemed a bit httpd-specific.
I was a bit afraid of ending up with response_headers, extra_response_headers, err_response_headers, and internal_headers. In the follow-up to this PR I opted for storing the headers using the same magic naming scheme (@Content-Security-Policy) -- because for this we only need to read it and leave it alone after that.

Having said that: I can see that it would be useful to be able to remember which header came from where with respect to err_headers_out (which would help with the possible other bugs with err_headers_out you mention, I think?). Modelling err_response_headers may be useful to accomplish that. Though, thinking about it, maybe I would discuss an alternative: adding a 'source' attribute (and maybe some more metadata) to the header/value pairs contained in the current ResponseHeaders structure, to make it easier for server implementations to correctly map back any rewritten headers we give them, and perhaps to state our intent (maybe "write back"/"don't write back", or even set/append/delete/skip). If we can state intent, we don't need magic names.

Considering the above, I'd lean towards thinking that this is stuff for more discussion maybe, and perhaps a separate PR. What do you think?
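
The 'source' and intent idea from the comment above could look roughly like the following. This is only a sketch of the proposal under discussion: `HeaderSource`, `WriteIntent`, `AnnotatedHeader` and `HeadersToWriteBack` are hypothetical names, and nothing like this exists in the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Where a header originally came from (hypothetical enumeration).
enum class HeaderSource { kHeadersOut, kErrHeadersOut, kInternal };

// What the server integration should do with it (hypothetical enumeration).
enum class WriteIntent { kWriteBack, kDontWriteBack };

// A header/value pair annotated with its origin and our stated intent.
struct AnnotatedHeader {
  std::string name;
  std::string value;
  HeaderSource source;
  WriteIntent intent;
};

// Returns the headers an integration should emit into the collection
// identified by `target`. Headers with other sources, or marked
// kDontWriteBack, are skipped, so no magic names are needed.
std::vector<AnnotatedHeader> HeadersToWriteBack(
    const std::vector<AnnotatedHeader>& all, HeaderSource target) {
  std::vector<AnnotatedHeader> result;
  for (const AnnotatedHeader& header : all) {
    assert(!header.name.empty());  // Header names are never empty.
    if (header.source == target && header.intent == WriteIntent::kWriteBack) {
      result.push_back(header);
    }
  }
  return result;
}
```

With this shape, a CSP header read from err_headers_out can be tagged kDontWriteBack and left alone, while still being visible to filters under its real name.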

oschaaf · 2018-01-06T11:11:22Z

@morlovich any thoughts on #1695 (comment) ?

jmarantz · 2018-01-08T13:47:23Z

net/instaweb/http/async_fetch.cc

+
+bool OutputSanitizingAsyncFetch::SanitizeResponseHeaders() {
+  if (response_headers() != nullptr &&
+      response_headers()->RemoveAllWithPrefix("@")) {


Can you put the "@" in a static class constant in Headers?

There you can document that this is illegal from a protocol perspective (https://tools.ietf.org/html/rfc2616#page-31): section 2.2, "Basic Rules", defines 'token' to exclude separators, including "@". That's why we can use it as an in-memory sentinel as long as it is never serialized.
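
A minimal sketch of what such a documented constant and the token rule might look like, assuming the RFC 2616 section 2.2 definitions of CTL and separators. `kInternalHeaderPrefix` and the helper functions are hypothetical names for illustration, not members of the real Headers class.

```cpp
#include <cassert>
#include <string>

// '@' is an RFC 2616 separator, so it can never appear in a legal header
// name (a 'token'). That makes it safe as an in-memory marker for internal
// headers, as long as such headers are never serialized.
constexpr char kInternalHeaderPrefix = '@';

// True if c may appear in an RFC 2616 'token': any US-ASCII character
// except control characters (0-31 and 127) and separators.
bool IsTokenChar(char c) {
  const unsigned char u = static_cast<unsigned char>(c);
  if (u <= 31 || u >= 127) return false;  // CTLs and non-ASCII bytes.
  static const std::string kSeparators = "()<>@,;:\\\"/[]?={} \t";
  return kSeparators.find(c) == std::string::npos;
}

// True if name is a non-empty token, i.e. a legal header name on the wire.
bool IsValidHeaderName(const std::string& name) {
  if (name.empty()) return false;
  for (char c : name) {
    if (!IsTokenChar(c)) return false;
  }
  return true;
}

// True if name carries the internal marker.
bool IsInternalHeaderName(const std::string& name) {
  return !name.empty() && name.front() == kInternalHeaderPrefix;
}

// Builds the internal variant of a legal header name.
std::string MakeInternalName(const std::string& wire_name) {
  assert(IsValidHeaderName(wire_name));
  return std::string(1, kInternalHeaderPrefix) + wire_name;
}
```

Any name produced by `MakeInternalName` fails `IsValidHeaderName` by construction, which is the property the sentinel relies on.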

jmarantz · 2018-01-08T13:48:16Z

net/instaweb/http/public/async_fetch.h

@@ -316,6 +316,26 @@ class SharedAsyncFetch : public AsyncFetch {
  DISALLOW_COPY_AND_ASSIGN(SharedAsyncFetch);
 };

+// Can be used to sanitize headers and data before forwarding them on to the


s/Can be used/Used/

jmarantz · 2018-01-08T13:48:58Z

net/instaweb/http/public/async_fetch.h

+class OutputSanitizingAsyncFetch : public SharedAsyncFetch {
+public:
+  explicit OutputSanitizingAsyncFetch(AsyncFetch* base_fetch);
+  virtual ~OutputSanitizingAsyncFetch();


s/virtual/override/ (modulo moving it to before the ';')
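
For reference, the suggested spelling puts `override` after the parameter list and before the ';' (or the body). The sketch below uses hypothetical, heavily simplified classes `FetchBase` and `SanitizingFetch` rather than the real AsyncFetch hierarchy.

```cpp
#include <cassert>

// Hypothetical simplified base class with a virtual destructor.
class FetchBase {
 public:
  virtual ~FetchBase() {}
};

// Hypothetical derived class. 'override' makes the compiler verify that
// the base class destructor is in fact virtual.
class SanitizingFetch : public FetchBase {
 public:
  explicit SanitizingFetch(bool* destroyed) : destroyed_(destroyed) {
    assert(destroyed_ != nullptr);
  }
  ~SanitizingFetch() override { *destroyed_ = true; }

 private:
  bool* destroyed_;  // Set to true when the destructor runs.
};
```

If the base destructor were not virtual, the `override` specifier would turn that mistake into a compile error instead of silent undefined behavior on delete through a base pointer.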

jmarantz · 2018-01-08T13:51:08Z

pagespeed/apache/header_util.cc

@@ -115,6 +115,9 @@ void AddResponseHeadersToRequestHelper(const ResponseHeaders& response_headers,
  for (int i = 0, n = response_headers.NumAttributes(); i < n; ++i) {
    const GoogleString& name = response_headers.Name(i);
    const GoogleString& value = response_headers.Value(i);
+    if (strings::StartsWith(name, "@")) {
+      continue;


curious: does this get hit in tests? the concerning thing here is whether there's something linking through PSOL into the integrations that we have to catch on each integration.

oschaaf · 2018-01-08

@jmarantz
I can see that this line should be covered by a test (and perhaps we should give known implementors with ports similar to mod_pagespeed a heads-up, those that do not use ProxyFetch to wire up HTML rewriting).

Currently this is not hit in tests, but in the next planned follow-up to this PR it will be easy to add an end-to-end test, because that is where the rubber will start hitting the road:
https://github.com/apache/incubator-pagespeed-mod/pull/1696/files#diff-923e5a0c8d9a28d13112afd3d77b35a0R239

oschaaf · 2018-01-09T13:07:09Z

@jmarantz processed your comments in eed1ff1
(barring the one about test-coverage, which I proposed to land in the follow up to this)

oschaaf requested review from jmarantz and morlovich December 13, 2017 22:23

oschaaf mentioned this pull request Dec 14, 2017

CSP vs httpd's err_headers_out #1696

Closed

morlovich reviewed Dec 18, 2017

- Don't sanitize in HandleDone

c587e35

- Filter headers starting with '@' in AddResponseHeadersToRequestHelper

jmarantz reviewed Jan 8, 2018

Process review comments from Josh

eed1ff1

oschaaf closed this Aug 30, 2022
