New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid cache entries on aborted requests #1081

Closed
jeffkaufman opened this Issue May 14, 2015 · 13 comments

Comments

Projects
None yet
2 participants
@jeffkaufman
Copy link
Contributor

jeffkaufman commented May 14, 2015

In debugging #1048 we found a case where we weren't handling IPRO recording properly. To reproduce:

  • Set up a heavily rate limited origin server
  • Set up a local apache proxying that origin
  • Set up PageSpeed on that apache with IPRO
  • Request a file for the first time, and cancel the download partway through
  • Observe that the partially downloaded file is stored in the cache
@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

We need to check r->connection->aborted before saving contents to the cache. I'll do this.

This may also affect nginx; I'll check.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

I checked: this doesn't affect nginx. In nginx we check last_buf which isn't set if the user aborts the load. I checked both the code and manual loading.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

The fix is very simple:

--- a/pagespeed/apache/mod_instaweb.cc
+++ b/pagespeed/apache/mod_instaweb.cc
@@ -823,6 +823,13 @@ apr_status_t instaweb_in_place_check_headers_filter(ap_filter_t* filter,
       response_headers.SetDate(timer.NowMs());
       response_headers.ComputeCaching();

+      if (request->connection->aborted) {
+        // Browser cancelled download, request may be incomplete, don't record
+        // to the cache.
+        // See https://github.com/pagespeed/mod_pagespeed/issues/1081
+        recorder->Fail();
+      }
+
       // We now have the final headers.  If they don't let us cache then we'll
       // abort even though we've already buffered up the whole resource.
       InstawebHandler::AboutToBeDoneWithRecorder(request, recorder);
@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

Testing this is really annoying. I have a manual test set up, which works fine, but to test this properly we need a rate-limiting webserver and to do this on apache 2.2 we need to install other modules. I tried writing a shell script:

function send_slow_test_response() {
   local first_half="document.write('";
   local second_half="hello world');";
   local length=$(echo "$first_half $second_half" | wc -c);

   echo -e "HTTP/1.1 200 OK\\r";
   echo -e "Content-Type: text/javascript\\r";
   echo -e "Content-Length: $length\\r";
   echo -e "Cache-Control: public,max-age=700\\r";
   echo -e "Connection: close\\r";
   echo -e "Date: $(date)\\r";
   echo -e "\\r";
   echo "$first_half";
   sleep 1;
   echo "$second_half";
}
send_slow_test_response  | stdbuf -i0 -o0 -e0 nc -l -p 8765 &
curl -D- localhost:8080/simple-proxy/ ... Ctrl+C

but couldn't get it to work reliably. Not sure why.

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented May 14, 2015

@jeffkaufman

re: simple fix:
Would it be possible to somehow DCHECK when a port gets the specifics of this wrong?

re: testing rate limiting
with apache/incubator-pagespeed-ngx#936 PSOL gets the capability to forward flushes from php, which might be useful here? (I needed that for testing purposes as well).

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented May 14, 2015

Sorry - I meant #1066 instead of the ngx_pagespeed part of that

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

@oschaaf yes, #1066 would make this easier to test. I tried doing it with PHP and couldn't pass flushes through; that seems to be why. @morlovich are you going to get a chance to look at #1066? You'd said you were interested in looking at it, but if not I can.

Would it be possible to somehow DCHECK when a port gets the specifics of this wrong?

I don't think we can DCHECK, because we don't know how to tell that the download was aborted except in a server-specific way (apache: r->connection->aborted, nginx: don't receive last_buf).

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented May 14, 2015

@jeffkaufman: "I don't think we can DCHECK, because we don't know how to tell that the download was aborted except in a server-specific way (apache: r->connection->aborted, nginx: don't receive last_buf)."

perhaps somehow assuming failure in the recorder until success is explicitly indicated via an api call would help here? Though that might require an api change which might be too much.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

perhaps somehow assuming failure in the recorder until success is explicitly indicated via an api call
would help here?

Maybe adding:

InPlaceResourceRecorder::RequestAborted(bool aborted)

?

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

So anyone working with InPlaceResourceRecorder has to think about how to know if the request was aborted?

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented May 14, 2015

@jeffkaufman yes, that would make it explicit, and would be easy maintenance-wise, which sounds great to me.

@jeffkaufman

This comment has been minimized.

Copy link
Contributor

jeffkaufman commented May 14, 2015

Actually, what if I just do this as another (required) argument to DoneAndSetHeaders:

   // Because of Apache's quirky filter order, we cannot get both the
   // uncompressed final contents and the complete headers at the same time.
   //
+  // Set entire_response_received to true if you know that the response data fed
+  // into Write() is complete.  For example, if the browser cancelled the
+  // download and so this is a partial response, set entire_response_received to
+  // false so we know not to cache it.
+  //
   // Does not take ownership of response_headers.
   //
   // Deletes itself. Do not use object after calling DoneAndSetHeaders().
-  void DoneAndSetHeaders(ResponseHeaders* response_headers);
+  void DoneAndSetHeaders(ResponseHeaders* response_headers,
+                         bool entire_response_received);
@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented May 14, 2015

@jeffkaufman I think that would be even better. That would follow the Done(success) interface of the other classes more closely, which seems more natural.

jeffkaufman added a commit that referenced this issue May 15, 2015

If we're receiving a response slowly and the user aborts, we need to …
…note this and not record the truncated resource to cache.

Fixes #1081

Nginx side of the change: apache/incubator-pagespeed-ngx#966

jeffkaufman added a commit that referenced this issue May 27, 2015

If we're receiving a response slowly and the user aborts, we need to …
…note this and not record the truncated resource to cache.

Fixes #1081

Nginx side of the change: apache/incubator-pagespeed-ngx#966
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment