Skip to content

Fix static file cached 304 response #1692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 14, 2018
Merged

Fix static file cached 304 response #1692

merged 7 commits into from
Aug 14, 2018

Conversation

mrvanes
Copy link
Contributor

@mrvanes mrvanes commented Feb 2, 2018

@codecov
Copy link

codecov bot commented Feb 2, 2018

Codecov Report

Merging #1692 into master will decrease coverage by 0.53%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1692      +/-   ##
==========================================
- Coverage   80.83%   80.29%   -0.54%     
==========================================
  Files         105      104       -1     
  Lines       13591    13552      -39     
==========================================
- Hits        10986    10882     -104     
- Misses       2605     2670      +65

@webknjaz webknjaz requested review from jaraco, webknjaz and a team February 2, 2018 09:07
@webknjaz
Copy link
Member

webknjaz commented Feb 2, 2018

Thanks! I'll try to review this ASAP. But if not, it's okay to ping me.

# self.body is a generator object that must be yielded
# for _caching.tee_output() to finish it's job and
# remove orphaned threading._Event object from cache
for i in self.body:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be replaced by self.collapse_body(), but I'm still trying to figure out a fix on a different level.

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems to have elicited a new failure in the Windows AppVeyor tests. I've restarted the tests for this PR twice and they fail consistently. We'll have to devise a fix for that failed test before we can accept this.

@@ -929,6 +939,7 @@ def finalize(self):
# and 304 (not modified) responses MUST NOT
# include a message-body."
dict.pop(headers, 'Content-Length', None)
self._flush_body()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaraco do you fully agree with such approach? I'd prefer not putting the body there over consuming it here.

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still uneasy with this PR. I don't fully understand its implications and as much as I've tried to document the changes, I don't know what to write to explain the behavior.

Furthermore, this sort of change really needs a test to capture the desired behavior. The associated ticket describes a scenario which should trigger the failure, so what needs to be done is codify that scenario in a test. I'm out of time to work on this today, but I welcome others to implement the test.

@cherrypy cherrypy deleted a comment Feb 4, 2018
@cherrypy cherrypy deleted a comment Feb 4, 2018
@cherrypy cherrypy deleted a comment Feb 4, 2018
@webknjaz
Copy link
Member

webknjaz commented Feb 4, 2018

@jaraco it feels like you got into my head and wrote down every single thought I had regarding this PR :)
I've been trying to go through the code and trace everything which is happening during such flow, but only succeed with a few parts of it so far. I need more time to understand it in full. It seems to me that it patches the side-effect, but should fix the root cause of the issue.
And of course regression test is 100% mandatory for such a complicated case.

@mrvanes
Copy link
Contributor Author

mrvanes commented Feb 4, 2018

It took me more than a month to understand the time-outs we experienced (and that included working my way into the dungeons of cherrypy). I'm glad it takes you guys a while as well ;)

The issue I linked fully describes the bug and condition, and it's hard to explain it better. It does help to exactly follow the code as I explain, but it also helps to fully grok the CachingTool.

The problem arises around a race condition that is caused by expiring a previous normally cached static file. As the MemoryCache.expire_cache daemon thread only deletes the variant and leaves the uri in the store dictionary, MemoryCache.get() will now follow variant = uricache.wait() codepath on a next request, and AntiStampedeCache will guard the cache by inserting a threading._Event object in the cache (because it didn't find a result). Now, if this next request happens to be an 'if-modified-since' request, the static.serve_fileobj() path will except (caused by cptools.validate_since()) before the body is filled with the fileObject and set headers to 304 not-modified. Response.finalize() on it's turn used to skip the tee_output generator on this and leave the threading._Event behind, holding off the next requests untill the threading._Event timed out. Causing our observed time-outs.

However, only consuming tee_output on this 304 case would fill the cache with an empty body and repsonse headers 304. This is nonsense for someone that would request the same uri/variant without an if-modified-since header, because that client would not have the file in cache, but would be served with an empty body, containing a header (not-modified) it never asked for!

I do doubt the sanity of the coarse empty body case cache deletion in tee_output. It certainly helps to solve the observed case and doesn't harm others. The only problem is, it interferes with the caching of potentially expensive empty body results, but I have no clue how to correctly address that and well, we had to move on with our project ;)

By the way: This bug also raises my question why first requests to non-existing uri's in caching.store won't receive the AntiStampedeCache guard treatment. Of all circumstances, a freshly restarted server that has empty (memory)cache should be protected against a request stampede? Of course, I see why it happens, but I think it could use some attention as well?

@mrvanes
Copy link
Contributor Author

mrvanes commented Feb 12, 2018

So, did my second attempt at explaining the race help?

@webknjaz
Copy link
Member

I didn't have time to check this attentively just yet.

@mrvanes
Copy link
Contributor Author

mrvanes commented Feb 14, 2018

I suddenly realised why I had so much trouble understanding the CachingTool in the first place and suggest another fix for that: the _caching.get() function isn't a getter, so it's name is confusing. It loads the server.response if cache is available and returns true or false on success/failure to do that. I think it should be renamed load().

@mrvanes
Copy link
Contributor Author

mrvanes commented Mar 28, 2018

Still no time to review the PR?

@c00kiemon5ter
Copy link

Hi everyone,

I think this is fine as it is now. It is a proper fix, even if it is not the best possible solution. As this does not affect any public API or method signature, I think this should be merged. If you think there is room for improvement, lets open an issue and link it to this PR with a reference to the merge commit and work on it when time allows - but until that is done we can have the fix.

What do you think?

@webknjaz
Copy link
Member

webknjaz commented Jun 6, 2018

@c00kiemon5ter Thanks for your opinion!

The main blocker for me as a maintainer is being able to understand the implementation and what are the consequences for other parts of code.

I'm really sorry that I didn't find time to dig into it, but to me it feels irresponsible to blindly merge changes. I hope you understand this.

It's still in my long backlog. As a community member, you can try helping @jaraco and me to verify things and comment on this more, so that it will make it easier for us to do reviews and would take much less time.
Normally we have some limited amount of FOSS time available and we cannot always commit to spending a lot of it, but if it starts looking easy to go through the chances that we'll look closer and maybe reach "merge stage" are much higher. Thanks you in advance for understanding!

P.S. @mrvanes could you please rebase this PR on top of current master?

mrvanes and others added 5 commits June 6, 2018 19:34
…r. Use 'iter' to make iterator from iterable.
…I still don't understand why deleting the cache is necessary. The previous comment said 'an empty response will be cached', but that seems inconsistent with the 'if body' block.
…I still don't understand why deleting the cache is necessary. The previous comment said 'an empty response will be cached', but that seems inconsistent with the 'if body' block.
@webknjaz
Copy link
Member

webknjaz commented Jun 7, 2018

// Side note:

def get(invalid_methods=('POST', 'PUT', 'DELETE'), debug=False, **kwargs):

it looks like the list should also include PATCH

@webknjaz
Copy link
Member

webknjaz commented Jun 7, 2018

// Also side note:

https://tools.ietf.org/html/rfc7231#section-4.3.7

Responses to the OPTIONS method are not cacheable.

@webknjaz webknjaz added the reproducer: missing This PR or issue leaks code, which reproduce the problem described or clearly understandable STR label Jun 7, 2018
@jaraco jaraco merged commit 157a4e4 into cherrypy:master Aug 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reproducer: missing This PR or issue leaks code, which reproduce the problem described or clearly understandable STR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants