Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don’t shutdown() shared sockets (fixes #213) #463

Merged
merged 1 commit into from Feb 27, 2014

Conversation

eam
Copy link
Contributor

@eam eam commented Dec 18, 2013

Don’t shutdown() shared sockets (fixes #213)

When the Ruby interpreter forks after creating a mysql2 object, each mysql2
object will share the socket and the first instance to be garbage collected
will destroy the socket state for other objects in other processes by calling
shutdown(). This appears to be the underlying bug in issue #213.

Because the GC is non-deterministic sometimes child processes will exit
without calling the GC on copies in children before the children exit. In our
tests we force the GC to run, exposing the issue in a reliable manner.

I work around this issue by preventing the GC from collecting instances in
child processes entirely. This patch causes the GC to only collect mysql2
objects within the same process which spawned them (within the parent). #close
continues to function as it has, in case it is desired to force cleanup of the
socket state.

It would be ideal to have a more granular or configurable interface for cleanup
than mysql_close(), allowing us to perform everything but the call to
shutdown(), and to only close() descriptors in the child without side effects.

@tamird
Copy link
Contributor

tamird commented Dec 18, 2013

We (square) have observed substantial performance problems with Resque in production due to this issue, so we expect that this will benefit the majority of (MRI users) ∩ (Resque users).

Caveat: rbx doesn't like rb_pid_t, so a workaround may be necessary.

@sodabrew
Copy link
Collaborator

The sockets really shouldn't be shared at all, if we can avoid it. In Ruby 2.0, sockets default to FD_CLOEXEC, which forces the socket closed in the child process. I wonder if we can do this across the board.

As a thought experiment, does your approach work in this scenario:

  1. Parent open mysql socket.
  2. Parent forks.
  3. Parent exits.
  4. Child uses the mysql socket.
  5. Child exits.

Currently we may GC and close the socket at 3 and 5. In your patch, we would only close the socket at 3, but not 5?

@eam
Copy link
Contributor Author

eam commented Dec 18, 2013

I think there's a use case for a connection-opening/pooling parent which does work in short-lived forked children, which is how Resque operates. I agree it's a delicate endeavor and close by default is probably appropriate. My specific goal in writing this patch is to support the Square use case in Resque. Perhaps this should be configurable behavior? I do think it should be possible to pass a handle to a child safely, even if the default behavior is to close on exec.

In your scenario above, the behavior both currently and after my patch would be the same:

  1. Parent open mysql socket
  2. Parent forks
  3. Parent exits and MAY GC the object and shutdown() the connection. Or it may not. Non-deterministic.
  4. Child uses the mysql socket (or maybe fails depending on step 3's outcome)
  5. Child exits and does not touch socket state, regardless of step 3's outcome.

The trick here is that upon shutdown of a ruby process, a given individual object may not GC at all. But this behavior can be forced, to illustrate the nature of the problem. Try this, on an unpatched (or patched) mysql2:

#!/usr/bin/env ruby
# Give this script an arg of "gc" or "no-gc" to force GC behavior

require 'rubygems'
require 'mysql2'

client = Mysql2::Client.new(
  :username => 'eam',
  :database => 'eam',
  :password => 'xxx',
)

# parent is connected and can query
client.query("select 1;")

arg = ARGV.shift # pick behavior

# now fork
if fork
  # parent
  if arg == "no-gc"
    # this generally ensures the gc will NOT collect the client object
    Kernel.exit!
  elsif arg == "gc"
    # This generally ensures the gc DOES collect the client object
    client = nil
    GC.start
  else
    # if we didn't force the GC's hand it may GC the mysql2 object, or it may
    # not. The GC will tend to have the same behavior per-run for simple scripts
  end
else
  # child
  sleep 1
  # may or may not fail, depending on GC behavior in parent
  client.query("select 1;")
end

@sodabrew
Copy link
Collaborator

Should we actively disable FD_CLOEXEC then, so that behavior is consistent going forward into Ruby 2.0? How is Resque handling Ruby 2.0?

@sodabrew
Copy link
Collaborator

I reopened #213, and tentatively scheduling this for version 0.3.16. Mostly because 0.3.15 is almost ready to roll now.

@eam
Copy link
Contributor Author

eam commented Dec 18, 2013

I imagine it should be acceptable to let apps such as Resque specifically ask to disable setting FD_CLOEXEC. On by default seems fine to me.

I don't know much about Resque other than this specific bug, sorry.

* common socket state. #close still functions in a child process if one
* wishes to force the issue.
*/
if (wrapper->refcount == 0 && getpid() == wrapper->origin_pid) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The child process will decrement the refcount but not free the socket, leaving the parent with a refcount < 0. This will cause an open socket leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this essentially disables GC collection for objects which were allocated in a parent. I believe this is the only use of refcount, so whether we decrement it or not may not matter -- the GC will never collect it if the pid differs. #close will still operate and does not check refcount.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sodabrew the child and parent each have a copy of the refcount, so the child will not decrement the parent's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, refcount is still properly tracked in the parent. When the parent's instance reaches zero the socket will be destroyed as it is now. This only disables collection in the child.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, thank you for reminding me that these are processes :)

@brianmario
Copy link
Owner

Well let's back up a minute. I might not be understanding the issue here fully but:

After you fork, you have to create a new connection anyway right? If the children re-establish a connection after fork this problem goes away, no?

@eam
Copy link
Contributor Author

eam commented Dec 18, 2013

After a fork Resque does not create a new connection. The processing model is similar to say, pre-fork Apache where the parent maintains a connection pool to databases to avoid the latency overhead of establishing a new database connection per-job or per-request.

The latency change of using a connection-pool vs not can be considerable for short-lived jobs/requests.

@brianmario
Copy link
Owner

@eam right, it's up to you to establish the connection in an after_fork callback like unicorn. As far as I know Resque does no coordination with other children about who's using what sockets from a connection pool.

@brianmario
Copy link
Owner

For context: https://devcenter.heroku.com/articles/forked-pg-connections#resque-ruby-queuing. This documentation is referencing a postgres connection, but the semantics are the same here.

@sodabrew
Copy link
Collaborator

The child will inherit the connection. I recently fixed this in an app after I upgraded to Ruby 2.0, where FD_CLOEXEC clued me in to the fact that I was regularly sharing a mysql connection eight ways.

I'm leaning on the side of calling this a bug that needs to be fixed in the user code, not in mysql2. The nondeterministic part of the issue makes me want to do something, though ;)

@eam
Copy link
Contributor Author

eam commented Dec 18, 2013

My area of expertise is actually not Ruby so I can't offer meaningful opinions about specific Ruby apps, but passing DB connections to child workers is a pretty common pattern for app design.

Pointing fingers, I would say this is a bug in libmysqlclient as it should be close()ing the socket for the immediate process but not using shutdown() to destroy the shared state. It should be possible to call a mysql_close() and close a particular process' usage of a connection without impacting other processes which also hold the socket. I'm following up with mysql as well, though this is a much slower path to resolution and won't fix the immediate issue as it likely would entail an api change in their client library.

The problem we're faced with is that we can't avoid having this object registered for collection by the Ruby GC. There is no user code we can add to enable connection reuse across forks in Ruby as things are today.

While establishing a connection per-job or per-request will work is very slow as it can add several serialized round-trips over the network. At scale, connection reuse becomes an essential optimization. I would like connection reuse to be possible.

@eam
Copy link
Contributor Author

eam commented Dec 18, 2013

I believe most other db layers provide an explicit #close method distinct from object de-allocation for this reason. All languages which support connection pooling in Apache mod_* implementations would have to.

@brianmario
Copy link
Owner

At scale, connection reuse becomes an essential optimization. I would like connection reuse to be possible.

I think the issue is that there is no facility for which to hand off connections from a pool to children using Resque. If you're using ActiveRecord - each child process will have it's own connection pool and assume it's the only process using each connection in it. This is why every pre-fork modeled system in Ruby suggests that you close connections in the before_fork callback and requires that you reconnect (not just database connections, redis, memcached, anything) in the after_fork callback.

That said, designing a system for Resque which would let the master process hand off connections from a pool would definitely be interesting. It would require redesigning Resque though. So unless you guys have already built that, I'm afraid your only option here is to use the before/after fork hooks. Supporting this in mysql2 itself will involve making a series of trade-offs that may benefit your use case but hurt others. The way things work today should be safe for use assuming you're closing and reopening connections before and after the fork. Unicorn and Rainbows have this same requirement.

For reference we've been using mysql2 with ActiveRecord in Resque at GitHub for almost 4 years now with no issues related to connections. At peak we're processing upwards of 100,000 jobs per minute.

@tamird
Copy link
Contributor

tamird commented Dec 18, 2013

Let's talk numbers:

On my local machine:

>> Benchmark.bmbm { |x| x.report('reconnect') { 1_000.times { ActiveRecord::Base.connection.disconnect!; ActiveRecord::Base.establish_connection } } }; nil
Rehearsal ---------------------------------------------
reconnect 135.430000   3.380000 138.810000 (140.439120)
---------------------------------- total: 138.810000sec

                user     system      total        real
reconnect 138.780000   3.450000 142.230000 (145.258256)
=> nil
>>

In my datacenter:

>> Benchmark.bmbm { |x| x.report('reconnect') { 1_000.times { ActiveRecord::Base.connection.disconnect!; ActiveRecord::Base.establish_connection } } }; nil
Rehearsal ---------------------------------------------
reconnect 145.580000   4.000000 149.580000 (150.080117)
---------------------------------- total: 149.580000sec

                user     system      total        real
reconnect 137.900000   3.510000 141.410000 (141.895566)
=> nil
>>

100ms overhead per resque job 😿

@brianmario
Copy link
Owner

http://bugs.ruby-lang.org/issues/5446 is relevent here as well

@brianmario
Copy link
Owner

I forgot that we recently started using https://github.com/spraints/resqued - it's worth a look if you haven't already.

@eam
Copy link
Contributor Author

eam commented Dec 18, 2013

What are the negative effects of this patch that raise concern? There were some more heavy-handed ways to make this happen but I'm fairly sure I avoided introducing anything problematic.

@tamird
Copy link
Contributor

tamird commented Dec 18, 2013

Seems that we all agree that mysql2 should either FD_CLOEXEC the socket or try really hard to prevent these problems (a la this PR), yes?

@brianmario
Copy link
Owner

Sorry, I'll spend some time making sure I understand the consequences of this patch and try to come to a decision.

@eam
Copy link
Contributor Author

eam commented Jan 7, 2014

I did some thinking over the break: I'd like to float a different approach to this, mimicking the way Perl's DBI handles things by adding an option:

       "InactiveDestroy" (boolean)

       The default value, false, means a handle will be fully destroyed as
       normal when the last reference to it is removed, just as you’d expect.

       If set true then the handle will be treated by the DESTROY as if it was
       no longer Active, and so the database engine related effects of
       DESTROYing a handle will be skipped.

       Think of the name as meaning ’treat the handle as not-Active in the
       DESTROY method’.

       For a database handle, this attribute does not disable an explicit call
       to the disconnect method, only the implicit call from DESTROY that
       happens if the handle is still marked as "Active".

       This attribute is specifically designed for use in Unix applications
       that "fork" child processes. Either the parent or the child process,
       but not both, should set "InactiveDestroy" true on all their shared
       handles.  (Note that some databases, including Oracle, don’t support
       passing a database connection across a fork.)

       To help tracing applications using fork the process id is shown in the
       trace log whenever a DBI or handle trace() method is called.  The
       process id also shown for every method call if the DBI trace level (not
       handle trace level) is set high enough to show the trace from the DBI’s
       method dispatcher, e.g. >= 9.

Implementation should be straightforward: turn the getpid() check into a check for whether this default-off value is enabled, and allow users to twiddle it from Ruby.

The documentation for the new option would be a great place to point people to when they encounter this.

* common socket state. #close still functions in a child process if one
* wishes to force the issue.
*/
if (wrapper->refcount == 0 && getpid() == wrapper->origin_pid) {
nogvl_close(wrapper);
xfree(wrapper->client);
xfree(wrapper);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this cause a socket and memory leak in forked children which don't actually use the shared mysql connection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of. The GC will not collect the object, but #close can still be used explicitly.

The reason for this is a bit of an oddity due to how mysql_close() uses shutdown(). Normally in cleanup you'd want to free the related client memory and only call close(), which would release the fd reference but not disrupt its use in other processes.

But we don't own the cleanup routines: mysql_close() also calls shutdown(), disrupts all use of the client object and the related socket. Ideally libmysqlclient would supply a way to free the client struct and close the handle without calling shutdown(). But as best I can tell this isn't possible.

There are some mitigating factors here: This can't leak resources in a loop, it becomes a one-time cost of forking. Because mysql_close() calls shutdown(s, 2), the TCP connection will be fully terminated and cleaned up from the kernel state. And #close still does the job explicitly -- anyone writing a proper program is going to want to explicitly manage objects with non-memory resources anyway; relying on the GC to collect things like stale connections is problematic in and of itself.

The limitations in libmysqlclient make this process a bit dirty no matter what so long as inheriting descriptors occurs. FD_CLOEXEC is the only completely clean answer, and probably correct by default, though it's also functionally restrictive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, close(s) causes a process to drop its reference to a socket. shutdown(s,2) will terminate the socket regardless of what other processes reference it. So as long as one process (in this case the parent) still calls mysql_close(), the entire connection state will be cleaned up at that time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyone writing a proper program is going to want to explicitly manage objects with non-memory resources anyway; relying on the GC to collect things like stale connections is problematic in and of itself. -- We support GC as a first class use case. It is not acceptable to break it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, let me clarify: The GC is necessarily unaware (or at least has limited awareness) of non-memory resource limits. These typically must be managed explicitly, regardless of GC behavior. I think a common example from Ruby is the use of constructs such as:

loop { IO.popen("echo hi") { |f| f.read } }

Instead of

loop { IO.popen("echo hi").read }

The former can clean up state after the yield, whereas the GC is unequipped to handle the external resources in the latter case. I'm only suggesting we face a similar issue here, where the mysql C library bits are the externalized resources. Compounding this, the externalized resources are shared. It's an otherwise correct cleanup in one GC instance which breaks another.

To be clear, the gc interaction is broken right now, any time a fork occurs. This is a possible fix (which I've amended a bit, I think this suggestion may be more palatable: #463 (comment) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Adding an option sounds reasonable to me. Update your PR and let's see how it comes out?

@sodabrew
Copy link
Collaborator

Poke for an update?

@dylanahsmith
Copy link
Contributor

This pull request is intentionally creating a memory leak in the child process to avoid disconnecting the socket for the parent process. Instead, it would make more sense to close the socket's file descriptor in the child process, then call mysql_close without worrying about it affecting the connection in the parent processes.

This can already be done manually as follows without exceptions:

client = Mysql2::Client.new
client.query("select 1")
fork do
  IO.new(client.socket).close
  client.close
end
Process.wait
client.query("select 1")

Setting the FD_CLOEXEC flag won't fix the issue being addressed by this pull request, since there is no exec, only a fork. We should still set the FD_CLOEXEC flag, since it is unlikely that the the file descriptor will be re-used after the exec, and the flag can always be cleared with IO.new(client.socket).close_on_exec = false in rare cases where the file descriptor is re-used.

@eam
Copy link
Contributor Author

eam commented Feb 10, 2014

I agree. I don't know why I thought the socket wasn't accessible -- closing it before entering mysql_close() should prevent all of this bad behavior.

@eam
Copy link
Contributor Author

eam commented Feb 10, 2014

I've updated this PR per dylanahsmith's insight. Much simpler, no more downside. Thanks!

@@ -186,6 +186,7 @@ static void *nogvl_close(void *ptr) {
fcntl(wrapper->client->net.fd, F_SETFL, flags | O_NONBLOCK);
#endif

close(wrapper->client->net.fd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this portable with windows. We could we have this in the #ifndef _WIN32 block, since we don't have to worry about fork being used on windows.

@eam
Copy link
Contributor Author

eam commented Feb 10, 2014

Fixed non-unix portability issues, thanks again.

@dylanahsmith
Copy link
Contributor

👍

@sodabrew
Copy link
Collaborator

This is awesome, thanks for staying with the issue to find a really clean fix. Ping when you're ready for a merge review.

@tamird
Copy link
Contributor

tamird commented Feb 10, 2014

@sodabrew I think this is ready :)

@eam
Copy link
Contributor Author

eam commented Feb 10, 2014

👍

@@ -184,6 +184,7 @@ static void *nogvl_close(void *ptr) {
flags = fcntl(wrapper->client->net.fd, F_GETFL);
if (flags > 0 && !(flags & O_NONBLOCK))
fcntl(wrapper->client->net.fd, F_SETFL, flags | O_NONBLOCK);
close(wrapper->client->net.fd);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a comment above close() - the git text is pretty good, here's an alternate condensed suggestion:

    /* Call close() on the socket before calling mysql_close(). This
     * prevents mysql_close() from calling shutdown() on the socket.
     * The difference is that close() acts on the current process only,
     * while shutdown() acts for all parent and child processes, which
     * has impacts on applications using a forked worker strategy.
     */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think mysql_close() is calling shutdown() on the socket, I think the socket is being closed by the server because mysql_close() sends a QUIT message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does call shutdown -- we initially found this via strace. The actual call is in include/mysql/psi/mysql_socket.h which is invoked through vio_cancel()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial stacktrace from some of our earlier debugging -- how we found it:

#0 0x0000003a678e9190 in shutdown () from /lib64/libc.so.6
#1 0x0000003451a56e06 in vio_close () from /usr/lib64/mysql/libmysqlclient_r.so.16
#2 0x0000003451a5683e in vio_delete () from /usr/lib64/mysql/libmysqlclient_r.so.16
#3 0x0000003451a52962 in end_server () from /usr/lib64/mysql/libmysqlclient_r.so.16
#4 0x0000003451a54bee in cli_advanced_command () from /usr/lib64/mysql/libmysqlclient_r.so.16
#5 0x0000003451a52a7a in mysql_close () from /usr/lib64/mysql/libmysqlclient_r.so.16
#6 0x00007fffefde8059 in nogvl_close (wrapper=0x13103c0) at ../../../../ext/mysql2/client.c:190
#7 decr_mysql2_client (wrapper=0x13103c0) at ../../../../ext/mysql2/client.c:205
#8 0x00007fffefde924c in rb_mysql_result_free (ptr=0x1419c50) at ../../../../ext/mysql2/result.c:88
#9 0x00007ffff7c01961 in run_final () at gc.c:1415
#10 finalize_list () at gc.c:1429
#11 finalize_deferred () at gc.c:1450
#12 rb_gc () at gc.c:3112

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I stand corrected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging into this a bit more - it does both. The shutdown() and then close() occurs immediately so we'd never see the reply to the mysql-level QUIT. I'll edit the comment a bit to reflect this.

@sodabrew
Copy link
Collaborator

I have one last concern, from the server's perspective, are we now leaving dangling sockets open, since nobody is calling shutdown() anymore? My recollection is that the OS will handle this for us once all fd's for a socket are closed, but I want to jog my memory of this behavior.

@@ -184,6 +184,7 @@ static void *nogvl_close(void *ptr) {
flags = fcntl(wrapper->client->net.fd, F_GETFL);
if (flags > 0 && !(flags & O_NONBLOCK))
fcntl(wrapper->client->net.fd, F_SETFL, flags | O_NONBLOCK);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this? Since we aren't sending a QUIT message anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can go away, close() should never block.

@dylanahsmith
Copy link
Contributor

@sodabrew man 2 close says

The close() call deletes a descriptor from the per-process object reference table. If this is the last reference to the underlying object, the object will be deactivated. For example, ...; on the last close of a socket(2) associated naming information and queued data are discarded; ...

This is why setting FD_CLOEXEC is a good idea, since it will avoid spawned processes from keeping the file descriptor open. The onus would still be on forked processes to close the Mysql2::Client when they aren't using it.

@eam
Copy link
Contributor Author

eam commented Feb 10, 2014

Updated to remove the unnecessary nonblocking code and to comment per above discussion.

@dylanahsmith
Copy link
Contributor

Ping when you're ready for a merge review.

@sodabrew ping. Looks like this is ready to merge.

@sodabrew
Copy link
Collaborator

Thanks! I have one test I'd like to run locally, which is creating and closing a large number of connections and checking to make sure that the MySQL server correctly handles the closed socket. show status like '%onn%';

#ifndef _WIN32
flags = fcntl(wrapper->client->net.fd, F_GETFL);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flags is no longer used, please also remove a few lines up:

 #ifndef _WIN32
    int flags;
  #endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@sodabrew
Copy link
Collaborator

Verified that MySQL 5.1 and 5.5 do clean up the connection when the socket is closed this way.

For history buffs, the fcntl flags were originally added in ec00873.

Reminder to self: move #include <fcntl.h> from mysql2-ext.h to infile.c.

Close the mysql socket before calling mysql_close(). mysql_close() calls
shutdown() on the attached socket. In a situation involving fork() and copies
of an object spanning multiple interpreter instances this will result in the
first instance to GC breaking all other instances sharing the socket.

dylanahsmith notes that we can avoid this by simply closing the socket before
entering mysql_close(), preventing the shutdown() from having any impact on the
shared socket state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants