Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

shutdown-and-flushing: Improve shutdown, support/test FlushHtml #936

Closed
wants to merge 1 commit into from

Conversation

oschaaf
Copy link
Member

@oschaaf oschaaf commented Mar 16, 2015

shutdown-and-flushing: Improve shutdown, support/test FlushHtml

- Improve shutting down both gracefully and quickly
- Set the flush flags on buffers when FlushHtml is on for optimized
  html responses
- Add tests for these features
- Don't call proxy_fetch->Flush() each time our body filter is called,
  because that would only make sense if we'd also set buf->flush.

Needs changes in PSOL, which are pending review.

@oschaaf oschaaf force-pushed the oschaaf-trunk-tracking-flushing branch 2 times, most recently from 14460b2 to 768ccc5 Compare March 16, 2015 16:36
@oschaaf
Copy link
Member Author

oschaaf commented Mar 16, 2015

Tests that ran today brought a problem to light. I'll look into that, and make a comment here when I've pushed the fix.

@morlovich
Copy link
Contributor

This looks rather dangerous to me --- the code in ProxyFetch doesn't expect Done to be called externally like this, and it looks like a way towards crashes accessing freed memory, as fetch code may end up still writing towards the fetch object.

@oschaaf
Copy link
Member Author

oschaaf commented Mar 17, 2015

@morlovich thanks for having a look. We call proxy_fetch->Done(false) from ngx_pagespeed too [1] in case requests time out, are aborted, or error otherwise, so I thought this would be OK. Should we have a look at that?

[1] https://github.com/pagespeed/ngx_pagespeed/blob/master/src/ngx_pagespeed.cc#L1696

@morlovich
Copy link
Contributor

Does something ensure that you don't call any further Write() and Done()
things?

On Tue, Mar 17, 2015 at 10:19 AM, Otto van der Schaaf <
notifications@github.com> wrote:

@morlovich https://github.com/morlovich thanks for having a look. We
call proxy_fetch->Done(false) from ngx_pagespeed too [1] in case requests
time out, are aborted, or error otherwise, so I thought this would be OK.
Should we have a look at that?

[1]
https://github.com/pagespeed/ngx_pagespeed/blob/master/src/ngx_pagespeed.cc#L1696


Reply to this email directly or view it on GitHub
#936 (comment)
.

@oschaaf
Copy link
Member Author

oschaaf commented Mar 17, 2015

@morlovich Yes, nps won't touch any proxy fetches after calling Done(true|false) on them.
That applies to the proposed CancelOutstanding() as well (the goal here is to be able to shutdown real quick)

@morlovich
Copy link
Contributor

Might you not still have an html parse ongoing, though?

On Tue, Mar 17, 2015 at 11:06 AM, Otto van der Schaaf <
notifications@github.com> wrote:

@morlovich https://github.com/morlovich Yes, nps won't touch any proxy
fetches after calling Done(true|false) on them.
That applies to the proposed CancelOutstanding() as well (the goal here is
to be able to shutdown real quick)


Reply to this email directly or view it on GitHub
#936 (comment)
.

@oschaaf
Copy link
Member Author

oschaaf commented Mar 17, 2015

@morlovich Yes, it is possible there are html parses going on indeed. Does calling proxy_fetch->Done(false) not stop/cancel the associated html parser? The code that uses this call is at:
See https://github.com/pagespeed/ngx_pagespeed/pull/936/files#diff-8fb776418fb480e5f084e8500b8f0b0cR3022 (ps_exit_child_process())

This works if we are lucky, but no guarantees. With some synthetic ongoing load this still crashes.
Cancelling out the proxy fetches seemed like a good first step, as we're not going to wait for upstreams at this point and have no further need for them. Any further suggestions? Should there perhaps something be added to the rewrite driver factory too to support a quick shutdown?

@morlovich
Copy link
Contributor

Hmm... You're right, it should behave correctly as long as there are no
other Write or Done calls in a sense that the parser's worldview is
consistent.
I was thinking about this completely wrong: you're literally changing the
input events, which is perfectly fine.
(What might matter is that parsing may still go on for a while....)

Sorry for misleading you.

On Tue, Mar 17, 2015 at 3:28 PM, Otto van der Schaaf <
notifications@github.com> wrote:

@morlovich https://github.com/morlovich Yes, it is possible there are
html parses going on indeed. Does calling proxy_fetch->Done(false) not
stop/cancel the associated html parser? The code that uses this call is at:
See
https://github.com/pagespeed/ngx_pagespeed/pull/936/files#diff-8fb776418fb480e5f084e8500b8f0b0cR3022
(ps_exit_child_process())

This works if we are lucky, but no guarantees. With some synthetic ongoing
load this still crashes.
Cancelling out the proxy fetches seemed like a good first step, as we're
not going to wait for upstreams at this point and have no further need for
them. Any further suggestions? Should there perhaps something be added to
the rewrite driver factory too to support a quick shutdown?


Reply to this email directly or view it on GitHub
#936 (comment)
.

@oschaaf
Copy link
Member Author

oschaaf commented Mar 17, 2015

@morlovich I'm not sure, but would it be a reasonable idea to add a separate call to the driver factory to stop the schedulers as a first step in shutting down quickly?

@oschaaf oschaaf force-pushed the oschaaf-trunk-tracking-flushing branch 4 times, most recently from fe2ddf7 to e1c9b3d Compare March 19, 2015 11:30
@oschaaf
Copy link
Member Author

oschaaf commented Mar 23, 2015

@morlovich Running without valgrind, quick shutdowns seems stable after some tweaking. However, I sometimes see sequences like this leading up to a DCHECK when running under valgrind:

     check [ -z 2015/03/23 00:44:11 [warn] 17691#0: [ngx_pagespeed 1.10.0.0-4597] RateController: drop deferred fetch of http://127.0.0.1:8050/mod_pagespeed_example/images/pagespeed_logo.png on shutdown
2015/03/23 00:44:11 [warn] 17691#0: [ngx_pagespeed 1.10.0.0-4597] RateController: drop fetch of http://127.0.0.1:8050/mod_pagespeed_example/styles/index_style.css on shutdown
2015/03/23 00:44:11 [warn] 17691#0: [ngx_pagespeed 1.10.0.0-4597] RateController: drop fetch of http://127.0.0.1:8050/mod_pagespeed_example/images/pagespeed_logo.png on shutdown
2015/03/23 00:44:11 [warn] 17691#0: [ngx_pagespeed 1.10.0.0-4597] RateController: drop fetch of http://127.0.0.1:8050/mod_pagespeed_example/styles/index_style.css on shutdown
2015/03/23 00:44:11 [warn] 17691#0: [ngx_pagespeed 1.10.0.0-4597] [0323/004412:WARNING:queued_worker_pool.cc(379)] Adding function to sequence 0xf197610 after shutdown
2015/03/23 00:44:11 [warn] 17691#0: [ngx_pagespeed 1.10.0.0-4597] RateController: drop fetch of http://127.0.0.1:8050/mod_pagespeed_example/styles/index_style.css on shutdown
2015/03/23 00:44:11 [alert] 17691#0: [ngx_pagespeed 1.10.0.0-4597] [0323/004412:FATAL:rewrite_driver.cc(2438)] Check failed: !release_driver_. 

Is this DCHECK happening because of something that I overlooked?

With this patch, the quick shutdown looks like this:

  1. The nginx side calls proxy_fetch->Done(false) on all outstanding proxy fetches
  2. The nginx side drains any queued events, and releases all outstanding request context.
  3. The nginx side calls rewrite_driver_factory->ShutDown(), which should now be the only one holding references to pending drivers/parsers (right?)

@jeffkaufman
To be able to test this under valgrind, I had to patch nginx because of the box it will force kill child processes one second after forwarding SIGTERM, which is too short for us to wrap up under high load. How should this be handled? Would it be OK to patch nginx for testing under valgrind?

@jeffkaufman
Copy link
Contributor

Patching nginx to let us finish cleaning up under high load with valgrind sounds like the right way to handle it for me. Ideally the patch would make shutdown timeout a configuration option, so we could turn it off without rebuiding nginx, but if that's too annoying I can make our scripts just patch the nginx code before running valgrind and handle the rebuild. If you do make it a configuration option I wonder whether they would accept that upstream?

@oschaaf
Copy link
Member Author

oschaaf commented Mar 23, 2015

@oschaaf oschaaf force-pushed the oschaaf-trunk-tracking-flushing branch 2 times, most recently from d0af3fe to 2d112a4 Compare March 23, 2015 22:47
@oschaaf
Copy link
Member Author

oschaaf commented Mar 24, 2015

@morlovich never mind the question about DCHECK(!release_driver).
ProxyFetch::CancelOutstanding should mind done_outstanding_ to make sure it doesn't make a repeated call to Done. Doing so stabilizes the test runs for this.

@oschaaf oschaaf force-pushed the oschaaf-trunk-tracking-flushing branch 2 times, most recently from 8f597a8 to 2dcaf7b Compare March 25, 2015 15:48
@oschaaf oschaaf force-pushed the oschaaf-trunk-tracking-flushing branch 5 times, most recently from c9ed1a0 to 9eda614 Compare April 7, 2015 08:17
@jeffkaufman
Copy link
Contributor

Is this ready for final review or are you still working on it?

ngx_pagespeed developers would need to patch in your nginx change, right? When this does go in, can you add that to https://github.com/pagespeed/ngx_pagespeed/wiki/Testing ?

@oschaaf
Copy link
Member Author

oschaaf commented Apr 15, 2015

@jeffkaufman This is ready for review, and I'll make the wiki change once this goes in!

@@ -193,7 +234,8 @@ ngx_int_t NgxBaseFetch::CopyBufferToNginx(ngx_chain_t** link_ptr) {
}

int rc = string_piece_to_buffer_chain(
request_->pool, buffer_, link_ptr, done_called_ /* send_last_buf */);
request_->pool, buffer_, link_ptr, done_called_ /* send_last_buf */,
true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment what this "true" means

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (true indicates buf->flush will be set)

@jeffkaufman
Copy link
Contributor

LGTM

@jeffkaufman
Copy link
Contributor

@jmarantz @morlovich do either of you want to look at this more?

- Improve shutting down both gracefully and quickly
- Set the flush flags on buffers when FlushHtml is on for optimized
  html responses
- Add tests for these features
- Don't call proxy_fetch->Flush() each time our body filter is called,
  because that would only make sense if we'd also set buf->flush.
@oschaaf oschaaf force-pushed the oschaaf-trunk-tracking-flushing branch from 9eda614 to 7f7fd78 Compare April 16, 2015 08:11
@oschaaf
Copy link
Member Author

oschaaf commented Apr 16, 2015

@jeffkaufman processed comments & force pushed

@oschaaf
Copy link
Member Author

oschaaf commented Apr 16, 2015

Should I make a pull with the PSOL side needed for this to https://github.com/pagespeed/mod_pagespeed ?

@jeffkaufman
Copy link
Contributor

Should I make a pull with the PSOL side needed for this to https://github.com/pagespeed/mod_pagespeed

@oschaaf Yes! I'm glad to say we can accept pull requests to all of PageSpeed now!

@jeffkaufman
Copy link
Contributor

LGTM

oschaaf added a commit to apache/incubator-pagespeed-mod that referenced this pull request Apr 16, 2015
- Adds ProxyFetch::CancelOutstanding() which helps ngx_pagespeed
  with shutting down quickly by calling Done(false) on all
  live ProxyFetch instances.
- Adds follow_flushes option, which makes ProxyFetch forward
  Flush calls to the RewriteDriver when set.

These changes are needed for: apache/incubator-pagespeed-ngx#936
@oschaaf
Copy link
Member Author

oschaaf commented Apr 16, 2015

oschaaf added a commit to apache/incubator-pagespeed-mod that referenced this pull request Apr 16, 2015
- Adds ProxyFetch::CancelOutstanding() which helps ngx_pagespeed
  with shutting down quickly by calling Done(false) on all
  live ProxyFetch instances.
- Adds follow_flushes option, which makes ProxyFetch forward
  Flush calls to the RewriteDriver when set.

These changes are needed for: apache/incubator-pagespeed-ngx#936
oschaaf added a commit to apache/incubator-pagespeed-mod that referenced this pull request Apr 16, 2015
- Adds ProxyFetch::CancelOutstanding() which helps ngx_pagespeed
  with shutting down quickly by calling Done(false) on all
  live ProxyFetch instances.
- Adds follow_flushes option, which makes ProxyFetch forward
  Flush calls to the RewriteDriver when set.

These changes are needed for: apache/incubator-pagespeed-ngx#936
@jeffkaufman jeffkaufman closed this Dec 6, 2016
@jeffkaufman jeffkaufman deleted the oschaaf-trunk-tracking-flushing branch December 6, 2016 16:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants