Add support for ActiveRecord's connection pool #29

Closed
wants to merge 2 commits into
from

3 participants
Contributor

BDQ commented Mar 26, 2012

This adds support for using the ActiveRecord connection pool when connecting to multiple databases for both mysql2 and postgres. This has no impact when using postgres schemas.

This is a sizable change, so I want to outline a few points about it.

  1. Justification for using the connection pool:

1.a) Currently apartment creates a new connection for each request, completely bypassing Rails' built-in connection pool. To highlight this, the output below is taken from a freshly restarted MySQL server, before and after 1,000 requests to a test Rails app using apartment:

Before (no connection pool)

mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host      | db   | Command | Time | State | Info             |
+----+------+-----------+------+---------+------+-------+------------------+
|  1 | root | localhost | NULL | Query   |    0 | NULL  | show processlist |
+----+------+-----------+------+---------+------+-------+------------------+
1 row in set (0.00 sec)

mysql> show processlist;
+------+------+-----------+--------+---------+------+-------+------------------+
| Id   | User | Host      | db     | Command | Time | State | Info             |
+------+------+-----------+--------+---------+------+-------+------------------+
|    1 | root | localhost | NULL   | Query   |    0 | NULL  | show processlist |
| 1001 | root | localhost | app1   | Sleep   |   62 |       | NULL             |
| 1002 | root | localhost | app2   | Sleep   |   62 |       | NULL             |
| 1004 | root | localhost | app2   | Sleep   |   62 |       | NULL             |
+------+------+-----------+--------+---------+------+-------+------------------+

As you can see, the "Id" column shows that about 1,000 database connections were created: MySQL gives every new connection the next Id, and the Ids are now past 1,000.

After (with connection pool)

mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host      | db   | Command | Time | State | Info             |
+----+------+-----------+------+---------+------+-------+------------------+
|  1 | root | localhost | NULL | Query   |    0 | NULL  | show processlist |
+----+------+-----------+------+---------+------+-------+------------------+
1 row in set (0.00 sec)

mysql> show processlist;
+----+------+-----------+-----------------------+---------+------+-------+------------------+
| Id | User | Host      | db                    | Command | Time | State | Info             |
+----+------+-----------+-----------------------+---------+------+-------+------------------+
|  1 | root | localhost | NULL                  | Query   |    0 | NULL  | show processlist |
|  2 | root | localhost | multi_db_master       | Sleep   |  291 |       | NULL             |
|  3 | root | localhost | multi_db_master       | Sleep   |  290 |       | NULL             |
|  4 | root | localhost | multi_db_master       | Sleep   |  290 |       | NULL             |
|  5 | root | localhost | app1                  | Sleep   |  291 |       | NULL             |
|  6 | root | localhost | app1                  | Sleep   |  291 |       | NULL             |
|  7 | root | localhost | app1                  | Sleep   |  291 |       | NULL             |
|  8 | root | localhost | app2                  | Sleep   |  291 |       | NULL             |
|  9 | root | localhost | app2                  | Sleep   |  291 |       | NULL             |
| 10 | root | localhost | app2                  | Sleep   |  291 |       | NULL             |
+----+------+-----------+-----------------------+---------+------+-------+------------------+

After this change, it maintains a fixed set of connections for each database (three each in this run), with no churn in connection Ids.

1.b) It's also significantly faster. Here are the before and after numbers:

Before (no connection pool)

Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              66.39 secs
Data transferred:          25.02 MB
Response time:              0.66 secs
Transaction rate:          15.06 trans/sec
Throughput:             0.38 MB/sec
Concurrency:                9.91
Successful transactions:        1000
Failed transactions:               0
Longest transaction:            1.13
Shortest transaction:           0.14

After (with connection pool)

Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              25.04 secs
Data transferred:          19.96 MB
Response time:              0.25 secs
Transaction rate:          39.94 trans/sec
Throughput:             0.80 MB/sec
Concurrency:                9.95
Successful transactions:        1000
Failed transactions:               0
Longest transaction:            0.38
Shortest transaction:           0.06

As you can see, the time required for 1,000 requests from 10 concurrent clients (against 3 Unicorn workers) dropped from 66 seconds to 25 seconds.

Here's the siege command used to generate these tests:

 siege -r 100 -c 10 -b -f localurls.txt 

The urls file contains just two separate domains.
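As a quick check on the siege figures, the transaction rate is simply transactions divided by elapsed time, and the ratio of elapsed times gives the overall speedup. The small Ruby snippet below just recomputes those numbers from the two runs; the constant and method names are illustrative only.

```ruby
# Figures copied from the two siege runs above.
BEFORE = { transactions: 1000, elapsed_secs: 66.39 }.freeze
AFTER  = { transactions: 1000, elapsed_secs: 25.04 }.freeze

# siege's "Transaction rate" is transactions divided by elapsed time.
def transaction_rate(run)
  run[:transactions] / run[:elapsed_secs]
end

# How many times faster the pooled run completed the same 1,000 requests.
speedup = BEFORE[:elapsed_secs] / AFTER[:elapsed_secs]

puts format("before:  %.2f trans/sec", transaction_rate(BEFORE))
puts format("after:   %.2f trans/sec", transaction_rate(AFTER))
puts format("speedup: %.2fx", speedup)
```

This prints 15.06 and 39.94 trans/sec, matching siege's own output, and a speedup of about 2.65x.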

  2. The implementation defines a dummy ActiveRecord model for each database and uses it as the "key" for that database's connection pool within Rails.

This means that each database now maintains its own pool of connections. Note: this might require some custom database server configuration for a large number of databases and connections, but we have a version of this running in production with 2,500 MySQL databases.
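The "one dummy model per database" idea can be sketched as a registry that hands out exactly one owner class per database name and reuses it afterwards, so the pool keyed on that class is only ever created once. This is a hypothetical sketch, not the PR's code: `PoolOwnerRegistry` and `owner_for` are made-up names, and plain anonymous classes stand in for `ActiveRecord::Base` subclasses so it runs without Rails.

```ruby
# Hypothetical sketch: each database name maps to exactly one owner class,
# created on first use and reused after. In Rails the owner would subclass
# ActiveRecord::Base and establish its connection once; plain classes stand
# in here so the sketch runs without Rails.
class PoolOwnerRegistry
  def initialize
    @owners = {}
    @lock = Mutex.new # guards the hash when several threads switch at once
  end

  def owner_for(database)
    @lock.synchronize do
      @owners[database] ||= build_owner(database)
    end
  end

  def size
    @lock.synchronize { @owners.size }
  end

  private

  # An anonymous class that remembers which database it owns a pool for.
  def build_owner(database)
    Class.new do
      define_singleton_method(:database) { database }
    end
  end
end

registry = PoolOwnerRegistry.new
app1 = registry.owner_for("app1")
app2 = registry.owner_for("app2")

puts app1.equal?(registry.owner_for("app1")) # same database, same owner
puts app1.equal?(app2)                       # different database, different owner
puts registry.size
```

Because the owner is stable per database, connecting to the same database again never replaces an existing pool; the cost is one pool per database per process, which is why the server configuration note above matters.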

  3. I also needed to change the behavior of Apartment::Database.current_database (for abstract_adapter), as it was interrogating ActiveRecord::Base.connection for the database name. We now store this as an app configuration value, which makes it independent of what ActiveRecord is doing.
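The change to `current_database` amounts to the adapter remembering the name itself when it switches, rather than asking the live connection which database it is on. The sketch below illustrates that idea only; `SketchAdapter`, `switch` and `reset` are hypothetical names, not Apartment's actual API.

```ruby
# Hypothetical sketch: the adapter records the database name when it
# switches, instead of querying the connection for it.
class SketchAdapter
  attr_reader :current_database

  def initialize(default_database)
    @default_database = default_database
    @current_database = default_database
  end

  # A real adapter would also point the connection at `database` here.
  def switch(database)
    @current_database = database
  end

  # Return to the default (master) database.
  def reset
    @current_database = @default_database
  end
end

adapter = SketchAdapter.new("multi_db_master")
adapter.switch("app1")
puts adapter.current_database
adapter.reset
puts adapter.current_database
```

The stored value stays correct even when the model that owns the active connection is a per-database dummy class rather than ActiveRecord::Base.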

  4. Some existing specs had to be refactored so that they run successfully together with my new specs.

Let me know if you have any questions / concerns.

Owner

bradrobertson commented Mar 27, 2012

Thanks for all the info along with the pull request. I'll take a look at all of this sometime this week. I'm also hoping to solve the test-order issue, so that going forward our test setup will be a little more resilient.

Contributor

BDQ commented Mar 27, 2012

I've got some thoughts on the test-order issue; I ran into a lot of problems getting the specs running for this change.

Here are my suggestions:

  1. Do all the creating and dropping of databases in the before/after blocks using the underlying mysql2 and postgres libraries directly, not ActiveRecord itself. I think that will cut down on a lot of the weirdness with the before/after filters.

  2. I think it might be better to stub the Apartment::Database.adapter method with the actual adapter class (and configuration) for each spec, rather than relying on apartment to figure it out. We don't expect the adapter to change in production, so I don't think this type of stub should be a concern.

  3. Improve the spec_helper after block to reset everything to its default configuration.
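Suggestion 1 could look roughly like the helper below: it issues plain `CREATE DATABASE` / `DROP DATABASE` statements through whatever raw driver client it is given, bypassing ActiveRecord entirely. This is a hypothetical sketch: `SpecDatabases` and `RecordingClient` are made-up names, it assumes MySQL identifier quoting, and it accepts any object with a `query(sql)` method (a Mysql2::Client has one); the recording client only exists so the sketch runs without a server.

```ruby
# Hypothetical spec helper: create and drop test databases with a raw
# driver client instead of going through ActiveRecord.
module SpecDatabases
  # MySQL identifier quoting: wrap in backticks, double any embedded backtick.
  def self.quote(name)
    "`#{name.to_s.gsub('`', '``')}`"
  end

  def self.create(client, name)
    client.query("CREATE DATABASE IF NOT EXISTS #{quote(name)}")
  end

  def self.drop(client, name)
    client.query("DROP DATABASE IF EXISTS #{quote(name)}")
  end
end

# Stand-in client that records SQL, so the sketch runs without a server.
class RecordingClient
  attr_reader :statements

  def initialize
    @statements = []
  end

  def query(sql)
    @statements << sql
  end
end

client = RecordingClient.new
SpecDatabases.create(client, "apartment_spec_1") # e.g. in a before block
SpecDatabases.drop(client, "apartment_spec_1")   # e.g. in an after block
client.statements.each { |sql| puts sql }
```

Keeping database setup outside ActiveRecord means the before/after hooks never touch the connection state that the specs themselves are exercising.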

@BDQ, "This has no impact when using postgres schemas." So are you saying schemas already work with pooled connections, or that your proposed fix doesn't address schemas? Thanks for clarifying.

Contributor

BDQ commented Apr 13, 2012

@jfoo It's my understanding that, when using schemas, the current code uses a connection pool correctly.

Looking at the current code:

https://github.com/bradrobertson/apartment/blob/development/lib/apartment/adapters/abstract_adapter.rb#L126

The connection is always established on ActiveRecord::Base, which is fine for schemas because it is always the same database. But for MySQL or separate databases, if you establish a connection on ActiveRecord::Base to a different database, the existing pool is discarded, so in effect no pool is used.

My proposed change:

https://github.com/BDQ/apartment/blob/connection_pool/lib/apartment/adapters/abstract_adapter.rb#L142

creates a dummy AR class to own the connection pool for each database (other than the master, which still uses ActiveRecord::Base). Hence, we never re-establish a connection on the same AR class to a different database, so the pool is actually used.
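The difference between the two approaches can be illustrated with a toy model of a handler that keeps one pool per owner class. This is not ActiveRecord's real ConnectionHandler; `ToyServer`, `ToyPool` and `ToyHandler` are made-up stand-ins, and it assumes a single-threaded worker that opens at most one connection per pool. Re-establishing on a single owner for every request opens a new connection each time, while one owner per database opens one connection per database.

```ruby
# Toy stand-in for the database server: hands out increasing connection
# ids, like the Id column in MySQL's processlist.
class ToyServer
  def initialize
    @next_id = 0
  end

  def connect(database)
    @next_id += 1
    { id: @next_id, database: database }
  end

  def connections_opened
    @next_id
  end
end

# Single-threaded worker: one connection per pool, opened lazily.
class ToyPool
  def initialize(server, database)
    @server = server
    @database = database
  end

  def connection
    @connection ||= @server.connect(@database)
  end
end

# One pool per owner; re-establishing on an owner discards its old pool.
class ToyHandler
  def initialize(server)
    @server = server
    @pools = {}
  end

  def establish_connection(owner, database)
    @pools[owner] = ToyPool.new(@server, database)
  end

  def pool_for(owner)
    @pools[owner]
  end
end

requests = %w[app1 app2] * 500 # 1,000 requests alternating between two tenants

# Before: every request re-establishes on the same owner (ActiveRecord::Base).
server = ToyServer.new
handler = ToyHandler.new(server)
requests.each do |db|
  handler.establish_connection(:base, db)
  handler.pool_for(:base).connection
end
before = server.connections_opened

# After: one owner per database, established only the first time.
server = ToyServer.new
handler = ToyHandler.new(server)
requests.each do |db|
  owner = "owner_#{db}".to_sym
  handler.establish_connection(owner, db) unless handler.pool_for(owner)
  handler.pool_for(owner).connection
end
after = server.connections_opened

puts "before: #{before} connections opened"
puts "after:  #{after} connections opened"
```

The toy opens 1,000 connections in the first case and 2 in the second, which mirrors the churn versus stable Ids seen in the processlist output earlier in the thread.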

Long story short, schemas were always using a connection pool so my change has no effect on that.

Owner

bradrobertson commented Apr 15, 2012

I'd like to do a bit of research into MySQL before accepting this pull request. It was brought to my attention that MySQL databases behave similarly to PostgreSQL schemas, in that you can query from multiple databases and join across them just as you can with schemas. At first glance in the console, this seems to be true: I was able to query a table from a different database just by prepending the db name. This is exactly how postgres allows us to query different schemas without having to mess with connections at all.

I think this would be a nicer approach than having to worry about connections and pooling. I don't think it will be quite as easy as the postgres implementation, since postgres gives you a way to change the schema_search_path in one call for all models, but maybe there's an equivalent in ActiveRecord that would allow us to prepend db names for MySQL?
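At the SQL level, the "prepend the db name" idea just means qualifying every table as `database`.`table` on a single connection. The helpers below are a hypothetical sketch that only builds SQL strings; `quote_identifier`, `qualified_table`, and the `users`/`orders` tables are assumptions for illustration. Getting ActiveRecord to emit such names for every model (for example via `table_name=`) is a separate problem.

```ruby
# Hypothetical helper: backtick-quote a MySQL identifier, doubling any
# embedded backtick.
def quote_identifier(name)
  "`#{name.to_s.gsub('`', '``')}`"
end

# Qualify a table with its database name so one connection can reach
# another database's tables without switching.
def qualified_table(database, table)
  "#{quote_identifier(database)}.#{quote_identifier(table)}"
end

count_sql = "SELECT COUNT(*) FROM #{qualified_table('app1', 'users')}"

# In raw SQL, joins work as long as every table is qualified.
join_sql = "SELECT o.id FROM #{qualified_table('app1', 'orders')} o " \
           "JOIN #{qualified_table('app1', 'users')} u ON u.id = o.user_id"

puts count_sql
puts join_sql
```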

Contributor

BDQ commented Apr 17, 2012

That's an interesting possibility too. I took a quick scan of the ActiveRecord mysql2 adapter and didn't see any support for this kind of thing in there.

We might be able to use / replicate something similar to what the set_table_name (or table_name) method does:

https://github.com/rails/rails/blob/36d7af34d6e878a4557ba8a2c282609da2f646ba/activerecord/lib/active_record/model_schema.rb#L120

It could prove to be more efficient than my approach of maintaining lots of separate connection pools.

Contributor

BDQ commented Jun 20, 2012

@bradrobertson - I've spent quite a bit of time testing your suggestion above (a more "postgres" style of connecting to other dbs). I was using the table_name= method(s), and while it works really well with simple/single models, it falls down with joins or any complex Arel logic at all. While it's still technically possible, it would require a lot of monkey patching of ActiveRecord to fix the joins and other areas.

I still think my suggested connection pool approach is solid, and I have recently improved it to handle some Rails 3.2.6 changes. We've had this code in production with 2,500+ DBs and it's been working perfectly.

Are you likely to merge this? I'll rebase off master if so; otherwise I don't wanna waste my time.

Owner

bradrobertson commented Jun 20, 2012

Hey Brian,

That's too bad; I was hoping we could work with something more postgres-like. Are you saying joins don't work when you prefix with the db name? They certainly don't work across connections, but I would have thought they might work across databases, which would be the advantage of prepending the db name.

Anyway, since you seem to be the most active of the MySQL users and I don't have much time to develop this myself, I'll take you as an authoritative source. If you can rebase against master and make sure the tests pass, I'll be able to merge it in.


Owner

bradrobertson commented Jul 15, 2012

err... rebase against development i mean. thx

Owner

bradrobertson commented Sep 26, 2012

just a heads up, apartment has now moved to influitive

Owner

bradrobertson commented Jan 4, 2013

btw @BDQ not sure if you're still using Apartment, but I've just pushed up changes that allow Apartment to switch databases without using different connections. It's the original plan I had for the gem to avoid having to muck with connection pools etc...

I know it's been a while but if you're up for it, would love to hear your thoughts on the latest dev branch

Contributor

BDQ commented Jan 7, 2013

Hey @bradrobertson - We're still using it in a big way, but got sick of fighting with MySQL and switched to PostgreSQL. Haven't hit any interesting numbers with the postgres install yet, but it appears a lot faster. (FYI: we hit 6,700+ MySQL DBs on one box without any issues.)

Took a quick look at the dev branch, "use" makes a lot of sense - don't know why I didn't think of that! Recent stuff looks awesome, will try and bump one of my apps to include the latest soon.
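The "use" approach switches tenants on one connection by issuing MySQL's `USE db_name` statement, instead of holding a separate connection (or pool) per database. The sketch below only illustrates that idea: `SwitchingConnection` is a hypothetical stand-in that records SQL rather than Apartment's actual implementation, and with a real driver the same strings would go through its query method.

```ruby
# Toy sketch: one connection, tenant switching via MySQL's USE statement.
class SwitchingConnection
  attr_reader :statements, :current_database

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end

  # Point this same connection at another database; quoting doubles any
  # embedded backtick.
  def use(database)
    execute("USE `#{database.to_s.gsub('`', '``')}`")
    @current_database = database
  end
end

conn = SwitchingConnection.new
%w[app1 app2 app1].each do |db|
  conn.use(db)
  conn.execute("SELECT COUNT(*) FROM `users`") # unqualified: runs in current db
end
puts conn.statements
```

Since the connection never changes, the usual single pool is reused across tenants, which avoids the many-pools trade-off discussed earlier in the thread.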

BDQ closed this Jan 7, 2013
