
Mysql2::Error: This connection is still waiting for a result, try again once you have the result #772

Closed
mmozuras opened this issue Jul 12, 2016 · 12 comments

@mmozuras

mmozuras commented Jul 12, 2016

We've been experiencing this issue in production to varying degrees for the last two years:

Mysql2::Error: This connection is still waiting for a result, try again once you have the result

I've looked at previous issues (like #566, #532, #99), but didn't find anything that helps. We've also tried multiple things ourselves, but did not manage to solve it.

We're running:

mysql2 (0.4.4)
rails (4.2.5.2)
unicorn (4.9.0)

Ruby 2.2

Percona XtraDB Cluster 5.6 (based on Percona Server 5.6, which is in turn based on MySQL 5.6)

Stack traces don't show anything interesting; here's one example:

app/domain/user_msgs/for_thread.rb:35:in `user_msg_thread'
app/domain/user_msgs/for_thread.rb:27:in `among_receivers?'
app/domain/user_msgs/for_thread.rb:14:in `fetch'
app/domain/user_msgs/for_thread.rb:3:in `fetch'
app/controllers/api/v2/msg_threads_controller.rb:145:in `user_msg_thread'
app/controllers/api/v2/msg_threads_controller.rb:35:in `show'
config/initializers/1.patches/action_controller.rb:7:in `process_action'
app/controllers/api/v2/base_controller.rb:112:in `api'

We didn't notice any particular pattern to when it happens either:

[screenshot from 2016-07-12: occurrences of the error over time, with no obvious pattern]

Any help would be appreciated 😄

@dmitry-g

After some investigation, it appeared that the issue was caused by incorrect signal handling:

  1. A signal (like SIGINT or SIGQUIT) interrupted a Mysql2::Client query that was in progress.
  2. mysql_client_wrapper->active_thread was not reset to Qnil and still pointed to the current Unicorn thread.
  3. Custom signal handling didn't allow the Unicorn worker to shut down/restart, so the worker continued to handle requests.
  4. Every request that subsequently came to the worker failed, because mysql_client_wrapper->active_thread still pointed to rb_thread_current (Mysql2::Error: This connection is still waiting for a result, try again once you have the result).
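
To make this concrete, here is a minimal, hypothetical sketch of that failure mode (host, credentials, and the choice of signal are my assumptions; whether the interrupt actually lands mid-query depends on your Ruby and mysql2 versions):

require 'mysql2'

# A custom handler in the style described above: it raises out of whatever
# the worker is doing instead of letting the process shut down.
trap('QUIT') { raise 'custom shutdown handler fired' }

conn = Mysql2::Client.new(host: 'localhost', username: 'root') # assumed credentials

begin
  # Long-running query; send SIGQUIT to this process while it is waiting.
  conn.query('SELECT SLEEP(60)')
rescue RuntimeError => e
  puts "query interrupted: #{e.message}"
end

# If the interrupt landed while the client was waiting on the server,
# active_thread still points at this thread, and the next query fails with
# "This connection is still waiting for a result".
conn.query('SELECT 1')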

@richardlarocque

I ran into another, closely related variant of this issue. It can be triggered when someone uses Thread#kill to interrupt a long-running query. Here's a small repro:

#!/usr/bin/env ruby

require 'mysql2'

def long_query(connection)
  begin
    connection.query('SELECT SLEEP(1000);')
  rescue StandardError => e
    puts "Caught #{e}"
  ensure
    puts "Ensured"
  end
  puts 'done'
end

conn = Mysql2::Client.new(:host => "localhost",
                          :username => "root",
                          :password => 'p4ssw0rd')

t = Thread.new { long_query(conn) }
sleep 1
puts "Killing #{t}, #{t.status}"
t.kill
sleep 1
puts "Killed #{t}, #{t.status}"

conn.query('SELECT 1;')

On my machine, the output is:

Killing #<Thread:0x000000022f7ef0>, sleep
Ensured
Killed #<Thread:0x000000022f7ef0>, false
/var/lib/gems/2.3.0/gems/mysql2-0.4.4/lib/mysql2/client.rb:107:in `_query': This connection is in use by: #<Thread:0x000000022f7ef0@./main.rb:20 dead> (Mysql2::Error)
        from /var/lib/gems/2.3.0/gems/mysql2-0.4.4/lib/mysql2/client.rb:107:in `block in query'
        from /var/lib/gems/2.3.0/gems/mysql2-0.4.4/lib/mysql2/client.rb:106:in `handle_interrupt'
        from /var/lib/gems/2.3.0/gems/mysql2-0.4.4/lib/mysql2/client.rb:106:in `query'
        from ./main.rb:27:in `<main>'

That is, the killed thread is still marked as actively using the connection, even after it has died.


This variant is a bit worse than the ones caused by signals or remotely triggered exceptions, because there is no reasonable way to mask Thread#kill. The handle_interrupt trick from https://github.com/brianmario/mysql2/pull/731/files won't work here.
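
For reference, here is a rough sketch of the kind of masking that trick uses, based on the handle_interrupt frame visible in the backtrace above (the exact error class the gem masks is an assumption on my part):

require 'mysql2'
require 'timeout'

# Sketch only: defer an asynchronous Timeout::Error (delivered via
# Thread#raise) until the underlying query has returned, so the client is
# never left mid-query with its "active thread" bookkeeping dangling.
def masked_query(client, sql)
  Thread.handle_interrupt(Timeout::Error => :never) do
    client.query(sql)
  end
end

# As noted above, this approach does not protect against Thread#kill.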


I imagine this has been discussed before, but I haven't found any reference to it. Would it be reasonable to try to solve this with an exception handler in the C extension code? Exception handlers should still get run in the Thread#kill case. Would it be possible to register a handler that resets the connection state when an exception is raised while waiting for a response to a query?
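
To make the question concrete, here is a Ruby-level approximation of that idea (the real fix would have to live in the C extension; this sketch only makes sure a handle that may have been interrupted mid-query is discarded rather than reused):

require 'mysql2'

# Run a query on a thread that might be killed. The ensure block does run on
# Thread#kill (the "Ensured" line in the repro output above shows this), so
# the possibly-wedged handle is closed instead of being handed back to a pool.
def run_query_on_disposable_thread(connect_opts, sql)
  Thread.new do
    client = Mysql2::Client.new(connect_opts)
    begin
      client.query(sql)
    ensure
      client.close
    end
  end
end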

@sodabrew
Collaborator

sodabrew commented Oct 5, 2016

@richardlarocque I cannot think of any reasonable workaround for a Thread#kill. At best, the next time you try to use the connection it can say, "The handle is marked in-use by a dead thread, so you need to reconnect."
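
A hedged sketch of that recovery path, to illustrate (this is my own sketch, not current gem behaviour; the error-message match and the reconnect options are assumptions):

require 'mysql2'

# If the handle is marked in-use by a dead thread, drop it, reconnect, and
# retry once. Returns [client, result] so the caller keeps the fresh handle.
def query_with_dead_thread_recovery(client, sql, connect_opts)
  [client, client.query(sql)]
rescue Mysql2::Error => e
  raise unless e.message =~ /connection is in use by: .* dead/
  client.close rescue nil                   # discard the wedged handle
  fresh = Mysql2::Client.new(connect_opts)  # connect options are assumed
  [fresh, fresh.query(sql)]
end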

@curtp

curtp commented Nov 18, 2016

For those who serve their app with Passenger and use a script to kill Passenger processes: check your script. We run a script via cron to kill processes that consume too much RAM. Thanks to the research by @dmitry-g, I looked into our script and was able to improve it to remove the use of kill. Passenger 5.0 ships with the passenger-config command, which can gracefully shut down Passenger processes. Since we switched to it, this exception went away. Here is the script we're using now, for those who are interested (thanks to whoever came up with it, it has been very helpful):

for i in $(/usr/local/rvm/bin/rvm-shell -c "passenger-memory-stats" | grep RubyApp | awk '{if ($4 > 299 && $7 == "RubyApp:") print $1}'); do passenger-config detach-process $i; done

@sodabrew sodabrew modified the milestones: 0.4.5, 0.4.6 Nov 18, 2016
@sodabrew
Collaborator

sodabrew commented Jan 3, 2017

It looks like #811 would provide a fix for this problem.

@mmozuras
Author

mmozuras commented May 2, 2017

Thanks for the help, everyone! We no longer encounter this issue. 👍

@mmozuras mmozuras closed this as completed May 2, 2017
@eredi93

eredi93 commented May 16, 2017

This happened again; we are running the latest gem. Here is the log:

» 15 May 2017 17:10:35.192 i-07a3c182330826af5 xxxxxxx-rails:ActiveRecord::StatementInvalid (Mysql2::Error: This connection is still waiting for a result, try again once you have the result: SELECT `xxxxxx`.* FROM `xxxxxxxx` WHERE `xxxxxx`.`type` IN ('xxxxxxx', 'xxxxxxxx') AND `xxxxxxx`.`id` = 111111111 ORDER BY `xxxxxxxx`.`id` ASC LIMIT 1 /*application:IntercomApp,line:/app/controllers/intercom_base_controller.rb:28:in `append_info_to_payload'*/):
» 15 May 2017 17:10:35.192  i-07a3c182330826af5 xxxxxxx: config/initializers/stacktrace.rb:10:in `block in instrument'
» 15 May 2017 17:10:35.192 i-07a3c182330826af5 xxxxxxx: config/initializers/stacktrace.rb:8:in `instrument'
» 15 May 2017 17:10:35.192  i-07a3c182330826af5 xxxxxxx: config/initializers/mysql2_exception_bucket.rb:4:in `execute'
» 15 May 2017 17:10:35.193 i-07a3c182330826af5 xxxxxxx: config/initializers/mysql2_readonly_reconnect.rb:4:in `execute'
» 15 May 2017 17:10:35.193  i-07a3c182330826af5 xxxxxxx: app/controllers/xxxxxxx_base_controller.rb:28:in `append_info_to_payload'
» 15 May 2017 17:10:35.193  i-07a3c182330826af5 xxxxxxx: app/lib/xxxxxxx/rack/version_header.rb:10:in `call'
» 15 May 2017 17:10:35.193  i-07a3c182330826af5 xxxxxxx: app/lib/xxxxxxx/rack/version_metrics.rb:9:in `call'
» 15 May 2017 17:10:35.193  i-07a3c182330826af5 xxxxxxx: app/lib/xxxxxxx/rack/bad_request_for_broken_json.rb:11:in `call'
» 15 May 2017 17:10:35.193 i-07a3c182330826af5 xxxxxxx: app/lib/xxxxxxxx/rack/character_encoding_coercion.rb:20:in `call'

Can you help me debug this and see why it is still happening? @sodabrew @darnaut

@sodabrew
Collaborator

What are your initializers, mysql2_exception_bucket.rb and mysql2_readonly_reconnect.rb?

@austinhalpin

mysql2_exception_bucket.rb is basically just a catch-all for mysql2 exceptions: we're recording metrics whenever we see certain MySQL exceptions, for alerting purposes.

Example:

module ActiveRecord
  module Mysql2ExceptionBucket
    def execute(*)
      super
    rescue Mysql2::Error, ActiveRecord::StatementInvalid => e
      if e.message =~ /This connection is still waiting for a result/
        record_metric
      end
      raise
    end
  end
end

ActiveRecord::ConnectionAdapters::Mysql2Adapter.prepend(ActiveRecord::Mysql2ExceptionBucket)

mysql2_readonly_reconnect.rb is something we wrote to force the connection to reconnect!. After we moved to AWS Aurora, we found that failovers were no longer handled gracefully and needed a redeploy to bounce connections. To avoid that, we wrote this initializer:

module ActiveRecord
  module Mysql2ReadonlyReconnect
    def execute(*)
      super
    rescue Mysql2::Error, ActiveRecord::StatementInvalid => e
      if e.message =~ /read-only/
        Rails.logger.info("MySQL running in read-only mode: reconnecting before raising error")
        reconnect!
      end

      raise
    end
  end
end

ActiveRecord::ConnectionAdapters::Mysql2Adapter.prepend(ActiveRecord::Mysql2ReadonlyReconnect)

@sodabrew
Collaborator

Interesting re: Aurora failovers! Do you have metrics for Aurora failover events? I can imagine a sequence that connects these two errors:

  1. The Rails app sends a query.
  2. Aurora fails over, and the connection errors out silently.
  3. The Rails app tries to send the next query on the connection that erred silently: 'still waiting for a result'.
  4. The Rails app tries to send the next query on a connection that was sitting in the pool: 'in read-only mode'.

@austinhalpin

The event that @eredi93 reported was isolated to a single host and wasn't triggered during any database failover, which is why we're at a bit of a loss trying to figure out what the cause could be here. :(

@stefansundin
Contributor

@austinhalpin Thanks for sharing your initializer; it helped me! I'm also curious whether you've encountered any other gotchas in the year since your post. :)

I have been using Aurora a bit recently and encountered the same issue with failovers. At first I thought that connections were not reset when the failover happened. After some more testing, it appears that all connections are reset, but Rails often manages to reconnect before the DNS update has fully propagated. The Aurora cluster endpoint is a CNAME record with a 5-second TTL.

So my conclusion is that the mysql process on the old master restarts quickly enough to accept connections again (now as a slave) before the DNS has fully propagated. Clients that manage to connect are oblivious to what happened and try to write to the slave.

I changed your initializer slightly, using pool.disconnect! instead of reconnect! in order to force the whole connection pool to disconnect at once (this should hopefully reduce the number of errors that end users see). I also switched from a regex match to a plain string match.

require 'active_record'

module ActiveRecord
  module Mysql2ReadonlyReconnect
    def execute(*)
      super
    rescue Mysql2::Error, ActiveRecord::StatementInvalid => e
      if e.message.include?('The MySQL server is running with the --read-only option')
        pool.disconnect!
      end
      raise
    end
  end
end

ActiveRecord::ConnectionAdapters::Mysql2Adapter.prepend(ActiveRecord::Mysql2ReadonlyReconnect)

Finally, this document has been helpful for adopting Aurora best practices: https://d1.awsstatic.com/whitepapers/RDS/amazon-aurora-mysql-database-administrator-handbook.pdf

The document mentions "smart drivers" for Aurora. Basically, the table INFORMATION_SCHEMA.REPLICA_HOST_STATUS lets you discover the other replicas directly in SQL. You can also run SELECT @@innodb_read_only; to check whether you're actually on the master when that is what you expect.
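
Here is a rough sketch of what those checks look like from Ruby (the endpoint and credentials are placeholders, and the column names are as given in the handbook, so verify them against your Aurora version):

require 'mysql2'

# Hypothetical cluster endpoint and credentials.
conn = Mysql2::Client.new(
  host:     'my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com',
  username: 'app',
  password: ENV['DB_PASSWORD']
)

# @@innodb_read_only is 0 on the writer and 1 on a reader.
read_only = conn.query('SELECT @@innodb_read_only AS ro').first['ro'].to_i
warn 'connected to a reader, expected the writer' unless read_only.zero?

# "Smart driver" style discovery of the cluster's instances over SQL.
rows = conn.query('SELECT server_id, session_id FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS')
rows.each do |row|
  role = row['session_id'] == 'MASTER_SESSION_ID' ? 'writer' : 'reader'
  puts "#{row['server_id']} (#{role})"
end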

There are some other Aurora-specific variables available:

> show variables like 'aurora%';
+------------------------------------+------------------+
| Variable_name                      | Value            |
+------------------------------------+------------------+
| aurora_lab_mode                    | OFF              |
| aurora_load_from_s3_role           |                  |
| aurora_max_alter_table_log_entries | 1000             |
| aurora_max_connections_limit       | 16000            |
| aurora_select_into_s3_role         |                  |
| aurora_server_id                   | staging-aurora-1 |
| aurora_use_key_prefetch            | ON               |
| aurora_version                     | 2.02             |
+------------------------------------+------------------+
