Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mysql2 gem improperly closing MySQL connections #606

Closed
dtest opened this issue Apr 2, 2015 · 26 comments
Closed

mysql2 gem improperly closing MySQL connections #606

dtest opened this issue Apr 2, 2015 · 26 comments

Comments

@dtest
Copy link

dtest commented Apr 2, 2015

The mysql2 gem doesn't properly close the connection, even when explicitly calling the 'client.close' method, resulting in increase of 'Aborted_clients' counter indicating an improper closure of the connection. My setup is MySQL 5.6.23 installed from MySQL's ubuntu repo on an Ubuntu Trusty EC2 instance, mysql2 gem v0.3.18, ruby 2.0.0p353 (embedded sensu version), libmysqlclient18

dtest@ip-10-1-1-73:~# /opt/sensu/embedded/bin/ruby --version
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]
dtest@ip-10-1-1-73:~# /opt/sensu/embedded/bin/gem list | grep mysql
mysql (2.9.1)
mysql2 (0.3.18)
dtest@ip-10-1-1-73:~# ldd /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/mysql2-0.3.18/lib/mysql2/mysql2.so
  linux-vdso.so.1 =>  (0x00007fffa07fe000)
  libruby.so.2.0 => /opt/sensu/embedded/lib/libruby.so.2.0 (0x00007f8bde4ec000)
  libmysqlclient.so.18 => /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18 (0x00007f8bddf35000)
  libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8bddd10000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8bdd94b000)
  librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f8bdd743000)
  libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f8bdd53e000)
  libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f8bdd305000)
  libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8bdcfff000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f8bdeba9000)
  libz.so.1 => /opt/sensu/embedded/lib/libz.so.1 (0x00007f8bdcde1000)
  libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f8bdcadd000)
  libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8bdc8c7000)
dtest@ip-10-1-1-73:~# mysql --version
mysql  Ver 14.14 Distrib 5.6.23, for Linux (x86_64) using  EditLine wrapper

I have the following example script:

#!/usr/bin/ruby

require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'mysql2'

puts Time.now.strftime("%d/%m/%Y %H:%M:%S")+"-Before connect";
# socket
client = Mysql2::Client.new(:host => 'localhost', :username => 'ruby', :password => 'test', :socket => "/var/lib/mysql/mysql.sock");
# tcp
#client = Mysql2::Client.new(:host => '127.0.0.1', :username => 'ruby', :password => 'test');
puts Time.now.strftime("%d/%m/%Y %H:%M:%S")+"- Query 1";
client.query("SELECT 1;");
sleep(1);
puts Time.now.strftime("%d/%m/%Y %H:%M:%S")+"- Query 2";
client.query("SELECT 1;");
sleep(1);
puts Time.now.strftime("%d/%m/%Y %H:%M:%S")+"- Query 3";
client.query("SELECT 1;");
sleep(1);
puts Time.now.strftime("%d/%m/%Y %H:%M:%S")+"- Close";
client.close;
sleep(1);
puts Time.now.strftime("%d/%m/%Y %H:%M:%S")+"- End";

Which produces output like this, for both socket and TCP connections:

dtest@ip-10-1-1-73:~# mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";/opt/sensu/embedded/bin/ruby mysql_test.mysql2.tcp.rb ;mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 12106 |
+-----------------+-------+
02/04/2015 00:00:32-Before connect
02/04/2015 00:00:32- Query 1
02/04/2015 00:00:33- Query 2
02/04/2015 00:00:34- Query 3
02/04/2015 00:00:35- Close
02/04/2015 00:00:36- End
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 12107 |
+-----------------+-------+
dtest@ip-10-1-1-73:~# mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";/opt/sensu/embedded/bin/ruby mysql_test.mysql2.socket.rb ;mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 12109 |
+-----------------+-------+
02/04/2015 00:01:26-Before connect
02/04/2015 00:01:26- Query 1
02/04/2015 00:01:27- Query 2
02/04/2015 00:01:28- Query 3
02/04/2015 00:01:29- Close
02/04/2015 00:01:30- End
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 12110 |
+-----------------+-------+
dtest@ip-10-1-1-73:~# grep ruby /var/log/mysql/error.log
2015-04-02 00:00:35 11797 [Warning] Aborted connection 12561 to db: 'unconnected' user: 'ruby' host: '127.0.0.1' (Got an error reading communication packets)
2015-04-02 00:01:29 11797 [Warning] Aborted connection 12566 to db: 'unconnected' user: 'ruby' host: 'localhost' (Got an error reading communication packets)

We can see the entry in the error log at the same time the script calls the 'client.close' method.

I have verified that the mysql gem does not increase the 'Aborted_clients' counter with the same script and linked libraries.

@jeremy
Copy link
Contributor

jeremy commented Apr 2, 2015

Noticed this here also, and on older MySQL as well. Nice repro.

@sodabrew
Copy link
Collaborator

sodabrew commented Apr 2, 2015

Could you run the same tests against multiple gem versions - e.g. from 0.3.12 - 0.3.18, and see if it started at a particular release?

@dtest
Copy link
Author

dtest commented Apr 3, 2015

Looks like it came in 0.3.16:

0.3.15

dtest@ip-10-1-1-73:~# /opt/sensu/embedded/bin/gem uninstall mysql2
Successfully uninstalled mysql2-0.3.12
dtest@ip-10-1-1-73:~# /opt/sensu/embedded/bin/gem install mysql2 -v 0.3.15
Fetching: mysql2-0.3.15.gem (100%)
Building native extensions.  This could take a while...
Successfully installed mysql2-0.3.15
Parsing documentation for mysql2-0.3.15
unable to convert "\xC0" from ASCII-8BIT to UTF-8 for lib/mysql2/mysql2.so, skipping
Installing ri documentation for mysql2-0.3.15
1 gem installed
dtest@ip-10-1-1-73:~# mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";/opt/sensu/embedded/bin/ruby mysql_test.mysql2.socket.rb ;mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 11039 |
+-----------------+-------+
03/04/2015 02:49:33-Before connect
03/04/2015 02:49:33- Query 1
03/04/2015 02:49:34- Query 2
03/04/2015 02:49:35- Query 3
03/04/2015 02:49:36- Close
03/04/2015 02:49:37- End
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 11039 |
+-----------------+-------+

0.3.16

dtest@ip-10-1-1-73:~# /opt/sensu/embedded/bin/gem uninstall mysql2
Successfully uninstalled mysql2-0.3.17
dtest@ip-10-1-1-73:~# /opt/sensu/embedded/bin/gem install mysql2 -v 0.3.16
Fetching: mysql2-0.3.16.gem (100%)
Building native extensions.  This could take a while...
Successfully installed mysql2-0.3.16
Parsing documentation for mysql2-0.3.16
unable to convert "\xE0" from ASCII-8BIT to UTF-8 for lib/mysql2/mysql2.so, skipping
Installing ri documentation for mysql2-0.3.16
1 gem installed
dtest@ip-10-1-1-73:~# mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";/opt/sensu/embedded/bin/ruby mysql_test.mysql2.socket.rb ;mysql --defaults-file=~/.my.cnf -e "SHOW GLOBAL STATUS LIKE 'aborted_clients'";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 11093 |
+-----------------+-------+
03/04/2015 02:56:50-Before connect
03/04/2015 02:56:50- Query 1
03/04/2015 02:56:51- Query 2
03/04/2015 02:56:52- Query 3
03/04/2015 02:56:53- Close
03/04/2015 02:56:54- End
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 11094 |
+-----------------+-------+

@anazar
Copy link

anazar commented Jun 19, 2015

👍 -- same issue encountered when upgraded from 0.3.14 -> 0.3.18 ... resulted in too many connection errors on ruby 2.0.0

@telescope
Copy link

Hi,

Is anyone else having an issue with this? I'm getting a lot of dead connection errors using Rails and Puma when load increases.

Is the workaround to move down to 0.3.15 for the time being?

I would help out but I have no experience of writing a C gem (not written any C in 20 years) - I will take a look though.

@sodabrew
Copy link
Collaborator

sodabrew commented Jul 8, 2015

This was an intentional change; previously, closing the connection (or even having a connection object get garbage-collected) in one process would close the connection in all other process which had also inherited the same open connection handle. This is because mysql_close() calls shutdown() syscall on the socket. By switching to a plain close() on the socket, if another process was using the connection it would be unaffected.

Is there actually a negative effect on your MySQL server?

@limitusus
Copy link

@sodabrew
In very low traffic service and huge amount of disks, there should be few negative effects.
mysqld.err grew up to over 20GB only due to this warnings in my service in several days, wasting the disk space and making it difficult to read error logs. It's an enough negative effect, isn't it?

@sodabrew
Copy link
Collaborator

Ah, I didn't realize this would end up as a log line, I feel your pain on that. I wonder if there's a way to ask MySQL not to be noisy about this, and we can recommend that in the docs.

@limitusus
Copy link

Yes I'm disabling warning log competely for now, by set global log_warnings = 0 on such servers. But I hope this issue fixes, since obviously suppressing all warning logs is a bad idea.

@cbandy
Copy link
Contributor

cbandy commented Aug 6, 2015

Queries aren't necessary to reproduce. Opening and closing a connection with valid credentials will do the same.

ruby -r mysql2 -e 'Mysql2::Client.new(host: "...", username: "...").close'

With this I was able to see that version 0.3.15 sends one more TCP packet than later versions. Could it be that we are no longer sending COM_QUIT?

@sodabrew
Copy link
Collaborator

sodabrew commented Aug 6, 2015

That's correct, we're no longer sending it. This is to avoid the situation where a connection is inherited into a child process and then gets garbage collected. The child process would be quitting the connection on behalf of the parent process. The motivation was actually to avoid the shutdown() syscall, and to use close() instead, with the effect of swallowing the COM_QUIT into /dev/null.

@rbroemeling
Copy link

@sodabrew I consider the original change mis-informed, to say the least. When running in a server/client architecture like this, it is pretty safe to assume that when one side (the client) closes the connection, the server is no longer supposed to treat that connection as valid. The real answer to "how to stop closing a connection from affecting other processes" is to ensure that you don't share connections across processes, which is -- AFAIK -- an informed best-practice in any case. Changing the mysql2 gem in a questionable way in order to support people being able to share connections across processes seems like an inadvisable path.

The log error/warning spam caused by this is extreme. I'd ask that the change be reverted, so that the mysql2 gem properly closes down a connection when it is asked to do so.

@sodabrew
Copy link
Collaborator

I totally hear that argument, and I'm comfortable at least adding an option to control this behavior. Are you handy with this codebase to submit a PR to kick off the coding for it?

@rbroemeling
Copy link

@sodabrew Sure, I'm willing to kick things off. I'll see if I can spin something up over the next day or so.

@sodabrew
Copy link
Collaborator

Awesome, thanks. Currently the master branch is about to become 0.4.0, with prepared statements support, and there's already one new flag (:sslverify). We can add this early in the 0.4.x series.

@rbroemeling
Copy link

@sodabrew I've been thinking about this a fair amount, though I haven't started writing any code yet. It is my understanding that what the implemented fix tries to solve is unexpected inheritance and side-effects of MySQL connections -- it is trying to ensure that principle of least surprise attaches to the behavior of the connections, such that if you open a connection in the parent, the child won't disconnect it.

I'm wondering if the more correct implementation isn't to enforce "ownership" for the connection. Meaning, whenever MySQL2 has a connection, that connection is tagged with an "owner" (a PID). Then, when disconnecting/closing/being garbage collected, it either:

  • does a "clean" shutdown if the current PID == the "owner" PID.
  • does an "unclean" shutdown (i.e. does not issue COM_QUIT) if the current PID != the "owner" PID.

It's my understanding that this is how a number of other projects have pursued the same problem, so perhaps this is a better way forward than hard-enforcing either way (clean vs. unclean) or providing a hard-enforcement toggle to switch back and forth between them.

Any thoughts?

@cbandy
Copy link
Contributor

cbandy commented Aug 23, 2015

It has already been considered. The parent process created the connection, handed it to a child process, then garbage collected the connection. Garbage collection by the parent/owner was the original problem.

@rbroemeling
Copy link

@cbandy Ah, OK -- I'm just trying to make this work in a way that doesn't involve a straight configuration. What about detecting whether we are being garbage-collected or not; and doing a full shutdown if we are not being garbage-collected; and doing the unclean shutdown (without COM_QUIT) if we are being garbage-collected? There is still an edge case where the MySQL would throw errors/complain about aborted connections if connection objects are being garbage-collected without being properly closed first, but in that instance it would always be possible for the user of the library to "fix" the problem by explicitly closing the connection before the object is thrown away to be GC'ed.

@rbroemeling
Copy link

@cbandy / @sodabrew Any thoughts on the above suggestion?

@sodabrew
Copy link
Collaborator

sodabrew commented Sep 1, 2015

It's a good idea – having the explicit Client#close do a shutdown makes sense. It's still kind of surprising that explicit close would be cleaner than allowing the GC to work, but it's not terrible.

Unfortunately this is a reasonably common pattern: A parent process opens a connection, and spawns a child process which uses that connection. If the parent processes exits and reaps the connection, now the child has a broken connection. If the child exits and reaps the connection, now the parent has a broken connection that it cannot hand to a different child process. One solution has been to recommend automatic reconnect, while solves the immediate problem of the broken connection but loses the performance benefits of keeping connections alive in a pool.

A tuning flag to choose "loose" vs "correct" connection close behavior is probably still very useful.

@cbandy
Copy link
Contributor

cbandy commented Sep 1, 2015

detecting whether we are being garbage-collected or not; and doing a full shutdown if we are not being garbage-collected; and doing the unclean shutdown (without COM_QUIT) if we are being garbage-collected?

Sounds good.

A tuning flag to choose "loose" vs "correct" connection close behavior is probably still very useful.

Sounds good. Should the default be to send COM_QUIT or not?

@cbandy
Copy link
Contributor

cbandy commented Sep 2, 2015

detecting whether we are being garbage-collected or not; and doing a full shutdown if we are not being garbage-collected; and doing the unclean shutdown (without COM_QUIT) if we are being garbage-collected?

I think I've done this in cbandy@terminate-on-close.

After familiarizing myself with this code and the MySQL C API, I'm curious why we go to such great lengths to try and share copies of MYSQL across forks. As I understand it, doing so is entirely unsupported:

  • MYSQL

    This structure represents a handle to one database connection. It is used for almost all MySQL functions. Do not try to make a copy of a MYSQL structure. There is no guarantee that such a copy will be usable.

@sodabrew
Copy link
Collaborator

sodabrew commented Sep 2, 2015

The branch looks good, thanks for digging into this. I'll provide some connects on commit and you can open a PR when you're ready.

We're not copying or hacking the MYSQL connection structure. A fork is a complete copy of the program that retains the same resources as the parent process. The problem that #463 went after is that if your program has some MySQL handles open, and then forks, the child may find it doesn't need those MySQL handles. So it garbage collects them. At that point the GC calls Client#close, which sends the COM_QUIT message and calls shutdown() on the socket -- but the effect of this is to break the connection for all users, including the parent process.

@cbandy
Copy link
Contributor

cbandy commented Sep 2, 2015

Thank you for phrasing it yet another way for my thick head. Yes, we want to not mysql_close at GC for very legitimate use cases!

Can you offer some guidance on the configuration interface? I'm thinking it should look very much like connect_timeout=. The parent or child would set the option right after fork. The accesor/method name depends on the default.

My strong opinion is that we should revert the current behavior so that any GC will send the QUIT message then reap the socket. This could break existing applications, but seems the least surprising in the end.

@rbroemeling
Copy link

@cbandy The terminate-on-close fork that you've done up looks exactly correct, thanks!

Effectively what is being discussed now is a configuration interface that controls how connections that are being garbage-collected are disposed of, correct?

I'm curious what the significant benefit of that configuration is, @sodabrew (i.e. when would it be used)? I've been thinking about it, and as much as I agree that it is best practice to not share FDs between parents and children; it is also a fairly common paradigm. At some level, having the GC only GC the object (and not tear down the MySQL connection) is correct as well, and at that point the error on the MySQL server is completely due to the connection not being properly closed with close() -- which the user should be doing anyway, correct?

I feel like the configuration parameter is not entirely necessary, as the same effect can be achieved by people by just properly calling .close() on their mysql connections before allowing the object to be GC'ed.

@sodabrew
Copy link
Collaborator

sodabrew commented Sep 2, 2015

The posted change looks like it can drop right in as the only behavior. I need to think about this a little more, but I think it's clean. The fact is that if multiple processes share a single connection, they also cannot interleave their read/write on that connection, because the connection itself is stateful (i.e. can be in query results mode, can be in the middle of a transaction, etc.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants