Part 4: Multi db improvements, Basic API for connection switching #34052

eileencodes · 2018-10-02T17:58:06Z

This PR implements the basic API requirements laid out in #33877 by DHH. The PR aims to focus only on implementing the connects_to and connected_to API. For now it does not tackle any configuration changes (we can hash that out in future PRs). If this API is acceptable I will add tests.

cc/ @dhh @matthewd @rafaelfranca @tenderlove

This PR adds the ability to 1) connect to multiple databases in a model,
and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading, use
the following API. This API supersedes establish_connection. The
writing and reading keys represent handler / mode names, and
animals and animals_replica represent the database keys used to look up
the configuration hash.

class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end

Inside the application - outside the model declaration - we can switch
connections with a block call to connected_to.

Outside the model we may want to connect to a new database (one that is
not in the default writing/reading set) - for example, a slow read-only
replica for making slow queries. To do this we have the connected_to
method, which takes a database hash that matches the signature of
connects_to. The connected_to method also takes a block.

ModelInPrimary.connected_to(database: { slow_readonly: :primary_replica_slow }) do
  ModelInPrimary.do_something_thats_slow
end

For models that are already loaded and connections that are already
established, connected_to doesn't need to be passed a database, because
you may want to run queries against multiple databases using a specific
mode/handler.

In this case connected_to can take a handler and use that to swap on
the connection passed. This simplifies queries - and matches how we do it
at GitHub. Once you're connected to the database you don't need to
re-connect; we assume the connection is in the pool and simply pass the
handler we'd like to swap on.

ActiveRecord::Base.connected_to(hander: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end

tenderlove · 2018-10-02T18:13:59Z

activerecord/lib/active_record/connection_handling.rb

+    def connected_to(database: nil, handler: nil, &blk)
+      if database && handler
+        raise ArgumentError, "connected_to can only accept handler or database, but not both arguments."
+      end


Should we raise an exception if both are nil?
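
A minimal plain-Ruby sketch of the argument check under discussion, with no Rails dependency. `ConnectionSwitchSketch` is a hypothetical stand-in, and raising when both arguments are nil is only one possible answer to the question above, with an assumed error message.

```ruby
# Hypothetical stand-in for the real method; it only models argument validation.
module ConnectionSwitchSketch
  def self.connected_to(database: nil, handler: nil)
    if database && handler
      raise ArgumentError, "connected_to can only accept handler or database, but not both arguments."
    elsif database.nil? && handler.nil?
      # Assumption: reject the call when neither argument is given.
      raise ArgumentError, "must provide a handler or a database to connected_to."
    end

    yield
  end
end

ConnectionSwitchSketch.connected_to(handler: :reading) { :ok } # => :ok
```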

tenderlove · 2018-10-02T18:20:52Z

I like that we can use ActiveRecord::Base.connected_to(handler: :writing) { ... } to switch handlers, but it seems really long to write. I'm guessing that most folks will just have read/write replicas and probably name them "read" and "write" (that's what we do). Could we also introduce something like:

ActiveRecord::Base.for_writing { ... } that's just a synonym for ActiveRecord::Base.connected_to(handler: :writing) { ... }? If we follow a naming convention, it seems like we could shorten the code we have to write.
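
A rough sketch of what such a shorthand could look like, written against a toy module rather than Active Record. `HandlerShortcutsSketch` and its `current` reader are hypothetical, and `for_reading` is assumed by analogy with the `for_writing` name suggested above.

```ruby
# Hypothetical stand-in that records the active handler; not the Rails implementation.
module HandlerShortcutsSketch
  @current = :writing

  class << self
    attr_reader :current

    # Simplified model of connected_to(handler:): swap, yield, restore.
    def connected_to(handler:)
      previous, @current = @current, handler
      yield
    ensure
      @current = previous
    end

    # The proposed shorthands simply delegate to connected_to.
    def for_writing(&block)
      connected_to(handler: :writing, &block)
    end

    def for_reading(&block)
      connected_to(handler: :reading, &block)
    end
  end
end

HandlerShortcutsSketch.for_reading { HandlerShortcutsSketch.current } # => :reading
```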

rafaelfranca · 2018-10-02T21:25:40Z

What would happen if two models have the same handler name for two different databases? Should we support that? Say:

class Dog
  connects_to database: { writing: :animals, reading: :animals_replica }
end

class Book
  connects_to database: { writing: :things, reading: :things_replica }
end

Also say we have that scenario, how would the following code work?

Dog.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

If I understand the implementation correctly, Book would use the same connection handler as Dog because all models share the same connection handler in ActiveRecord::Base.

eileencodes · 2018-10-02T21:34:39Z

> What would happen if two models have the same handler name for two different databases? Should we support that? Say:

That's exactly what I want to support because that's how we do it at GitHub. We have 10 connections that belong to a writing handler (we call it default) and 10 connections that belong to a reading handler (we call it readonly). I'm a little confused because I thought you said at Shopify you use the multiple handler approach as well? The underlying connection swapping behavior here isn't different from the original PR, just the public API is.

At GitHub we don't write this:

Dog.connected_to(handler: :reading) do
  Dog.create! # explode from Dog bc doing a write on a read
  Book.create! # isn't called but not because it's Dog's handler, you told Rails what handler to use - `reading`. 
end

Instead we write this (but with GitHub instead of AR::Base, because Rails doesn't support this yet):

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

If we want to write to multiple dbs we can do that by using the writing handler:

ActiveRecord::Base.connected_to(handler: :writing) do
  Dog.create! # success
  Book.create! # success
end

Dog and Book know which database they belong to because the model tells them. The connected_to method looks up the handler, and then Rails looks up the connection from that handler with the connection specification name.
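
The two-step lookup described above (handler first, then connection specification name) can be pictured with a toy model. This is a sketch under stated assumptions: `HANDLERS` and `pool_for` are hypothetical names, strings stand in for connection pools, and the model class name is used as the specification name.

```ruby
# Toy model: each handler maps a connection specification name to a pool.
# Strings stand in for real connection pools.
HANDLERS = {
  writing: { "Dog" => "pool(animals)",         "Book" => "pool(things)" },
  reading: { "Dog" => "pool(animals_replica)", "Book" => "pool(things_replica)" }
}.freeze

# First pick the handler, then let the model's spec name pick the pool.
def pool_for(handler, spec_name)
  pools = HANDLERS.fetch(handler) { raise ArgumentError, "unknown handler: #{handler}" }
  pools.fetch(spec_name) { raise KeyError, "no pool for #{spec_name} in #{handler}" }
end

pool_for(:reading, "Dog")  # => "pool(animals_replica)"
pool_for(:reading, "Book") # => "pool(things_replica)"
```

Because both models resolve their own pool inside the shared handler, Dog and Book can use the same handler name while pointing at different databases.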

tenderlove · 2018-10-02T21:43:06Z

@rafaelfranca I think we should support your first scenario, but if you want to switch both models you need to do it at AR::Base as @eileencodes mentions. Maybe we should raise an exception if you don't call it on AR::Base?

dhh · 2018-10-02T22:08:15Z

I like this. A few notes:

If you're connecting directly to a specific database, you shouldn't have to declare the role:

ModelInPrimary.connected_to(database: :primary_replica_slow) do
  ModelInPrimary.do_something_thats_slow
end

Re: handler, I don't really like that word much. I'd prefer to use "role". That would connect with the future 3-tier database.yml configuration setup as well. So it would be:

ActiveRecord::Base.connected_to(role: :reading) do
  Dog.create!
  Book.create! # Will raise if a :reading role isn't found on Book
end

@tenderlove I'd be curious to see how many instances of connection switching you have in the code? I was initially partial to having some syntactic sugar, but I don't think switching roles mid-flight is going to be a super common action. And if it isn't, then I'd rather be as clear as possible about what's going on.

On the larger topic of r/w splitting, @eileencodes, you're working towards a place where AR automatically will pick the :writing role when AR is doing INSERTs and :reading role when AR is doing SELECTs, right? I thought there was some confusion about whether that's within this initial scope of work when discussing with @matthewd in the earlier thread.

eileencodes · 2018-10-02T22:20:41Z

> If you're connecting directly to a specific database, you shouldn't have to declare the role

👍 I will work on changing this requirement. Currently I have it so it creates a new handler (or role) to organize the connections. But that's not actually necessary now that I think about it so I'll adjust this PR accordingly.

> Re: handler, I don't really like that word much

I can change this. For background handler makes sense to me since it is switching on the connection_handler - but perhaps that's too much for the user to need to know.

> On the larger topic of r/w splitting, @eileencodes, you're working towards a place where AR automatically will pick the :writing role when AR is doing INSERTs and :reading role when AR is doing SELECTs, right

Yes but this is further down the line (ie not for this PR). Rails needs to be able to switch connections before it can know what to switch to.

> I'd be curious to see how many instances of connection switching you have in the code?

We actually do this quite a bit since we default to the replicas, except in certain circumstances where we need to explicitly call readonly.

296 block calls to readonly db in app and lib
3 block calls to write db in lib
6 block calls to a dynamic switcher that chooses the db in app and lib.

tenderlove · 2018-10-02T22:22:29Z

> I'd be curious to see how many instances of connection switching you have in the code?

More than I thought. I was counting and then @eileencodes finished before me. 😊

dhh · 2018-10-02T22:27:01Z

So your default replicas are not readonlys? If we get AR to do the automatic r/w splitting, would you still need as many explicit calls? Or would you only need it when using slow-read dbs?

deepj · 2018-10-03T10:52:45Z

A question: Can this switching be used as a failover? Let's say, a primary connection failed, switch to secondary (backup) one.

eileencodes · 2018-10-03T11:45:25Z

@dhh we default to read but switch on the request type (GET == read, POST == write) rather than the sql query. So in some cases we need to switch back to the read or to the write in order to handle that. I assume we will need less of those if we have Rails auto switch based on SQL rather than request type. I think that if we really do need the helper methods we can add those later.
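
For illustration only, switching on the request type rather than on the SQL might be sketched like this. `role_for_request` is a hypothetical helper, and treating every verb other than GET as a write is an assumption that goes beyond the GET/POST pairing mentioned above.

```ruby
# Hypothetical helper: choose a role from the HTTP verb, not from the SQL.
def role_for_request(http_method)
  # Assumption: only GET is treated as a read; everything else is a write.
  http_method.to_s.upcase == "GET" ? :reading : :writing
end

role_for_request("GET")  # => :reading
role_for_request("POST") # => :writing
```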

@deepj No. We're quite a ways away from something like that.

rafaelfranca · 2018-10-05T22:52:16Z

> That's exactly what I want to support because that's how we do it at GitHub. We have 10 connections that belong to a writing handler (we call it default) and 10 connections that belong to a reading handler (we call it readonly). I'm a little confused because I thought you said at Shopify you use the multiple handler approach as well? The underlying connection swapping behavior here isn't different from the original PR, just the public API is.

Yes, we use multiple handlers, but it seems the implementation stores the names of the handlers in ActiveRecord::Base, which means two models can't have the same handler name because the latter will override the former definition, even if you define them in two different models. Am I getting the implementation wrong?

Would not:

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

fail because Dog will be connected to the same database as Book (things, not animals, given Book was defined after Dog)?

rafaelfranca · 2018-10-05T23:17:34Z

Ok I think I get it. The handler is the same for all models but it holds a connection pool for each model with a different connection_specification_name.

I agree with DHH's suggestions for the API.

👍 from me.

eileencodes · 2018-10-05T23:39:54Z

> Ok I think I get it. The handler is the same for all models but it holds a connection pool for each model with a different connection_specification_name.

Yup! That's exactly how it works.

I'm writing up some tests and will be pushing up later this weekend or early next week. I think we're almost ready to merge this (with DHH's changes). That will unblock a lot of the future work. 😄

Also @matthewd originally had some concerns about threads but we paired today and found it's not a problem. The connection handler is thread local so we're good there 👍
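
To make the thread-safety point concrete, here is a small plain-Ruby sketch in which the active role is kept in thread-local storage, so a switch made in one thread is invisible to another. `ThreadLocalRoleSketch` is hypothetical and only illustrates the idea; it says nothing about how Active Record stores its handler internally.

```ruby
# Hypothetical sketch: the active role lives in thread-local storage.
module ThreadLocalRoleSketch
  KEY = :sketch_current_role

  def self.current_role
    Thread.current.thread_variable_get(KEY) || :writing
  end

  # Swap the role for the duration of the block, then restore it.
  def self.connected_to(role:)
    previous = Thread.current.thread_variable_get(KEY)
    Thread.current.thread_variable_set(KEY, role)
    yield
  ensure
    Thread.current.thread_variable_set(KEY, previous)
  end
end

seen_in_other_thread = nil
ThreadLocalRoleSketch.connected_to(role: :reading) do
  # A different thread does not observe this thread's switch.
  seen_in_other_thread = Thread.new { ThreadLocalRoleSketch.current_role }.value
end
seen_in_other_thread # => :writing
```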

eileencodes · 2018-10-10T13:36:33Z

Changed handler to mode
Changed connected_to to take a single database instead of a hash since that's for connecting to a specific database / role.
Raise if both database and mode are nil in connected_to
Added tests
Added docs (note I don't think we're ready to update the guides yet so I left those. I'd rather be able to tell a cohesive story at the end)

dhh · 2018-10-10T15:23:02Z

I think mode is an improvement over handler, but I'm not sure it's quite enough. If you think about modes when opening files, it's the mode with which you open the same file that's designated. Not a different file. So the extension here is that mode would refer to different ways of connecting to the same database, not to different databases.

That's why I like role. That implies that there are multiple actors (databases) that play a specific role in the application. And role is also flexible enough to work with things like :statistics or :analytics where mode would be awkward.

eileencodes · 2018-10-10T15:24:53Z

Wow. I swear your post said mode and now I looked back and it says role. I will change it. Not sure how my brain confused those two... 😳

eileencodes · 2018-10-10T15:35:33Z

mode switched to role 👍

This PR adds the ability to 1) connect to multiple databases in a model, and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading, use the following API. This API supersedes `establish_connection`. The `writing` and `reading` keys represent handler / role names, and `animals` and `animals_replica` represent the database keys used to look up the configuration hash.

```
class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end
```

Inside the application - outside the model declaration - we can switch connections with a block call to `connected_to`.

Outside the model we may want to connect to a new database (one that is not in the default writing/reading set) - for example, a slow read-only replica for making slow queries. To do this we have the `connected_to` method, which takes a `database` hash that matches the signature of `connects_to`. The `connected_to` method also takes a block.

```
ActiveRecord::Base.connected_to(database: { slow_readonly: :primary_replica_slow }) do
  ModelInPrimary.do_something_thats_slow
end
```

For models that are already loaded and connections that are already established, `connected_to` doesn't need to be passed a `database`, because you may want to run queries against multiple databases using a specific role/handler.

In this case `connected_to` can take a `role` and use that to swap on the connection passed. This simplifies queries - and matches how we do it at GitHub. Once you're connected to the database you don't need to re-connect; we assume the connection is in the pool and simply pass the handler we'd like to swap on.

```
ActiveRecord::Base.connected_to(role: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end
```


Since both methods are public API I think it makes sense to add these tests in order to prevent any regression in the behavior of those methods after the 6.0 release.

Exercise `connected_to`
- Ensure that the method raises with both `database` and `role` arguments
- Ensure that the method raises without `database` and `role`

Exercise `connects_to`
- Ensure that the method returns an array of established connections (as mentioned in the docs of the method)

Related to rails#34052

salimane · 2019-05-07T15:14:18Z

It seems that joins are not being correctly handled across databases

development:
  primary:
    <<: *default
    <<: *primary_account
    database: primary_db
  another_db:
    <<: *default
    <<: *another_db_account
    database: another_db_with_another_schema
    migrations_paths: "db/another_db_migrate"


class Person < ApplicationRecord
  has_one :animal,
          class_name: 'Animal',
          foreign_key: 'person_id',
          dependent: :nullify, inverse_of: :person
end

class Animal < ApplicationRecord
  self.abstract_class = true
  connects_to database: { reading: :another_db, writing: :another_db }
  belongs_to :person, foreign_key: 'person_id', inverse_of: :animal
end

Person.joins(:animal).to_sql
=> "SELECT `persons`.* FROM `persons` INNER JOIN `animals` ON `animals`.`person_id` = `persons`.`person_id`"

animals should be using the another_db database not the primary database

or am I missing something @eileencodes ?

eileencodes · 2019-05-07T15:17:13Z

No you're not missing something, Rails does not yet handle joining across separate databases. We're working on supporting the ability for Rails to recognize the connections are different and to split up the queries into 2 selects but the join syntax isn't going to be possible across 2 machines.
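
As a rough picture of what "split up the queries into 2 selects" could mean, the sketch below stitches two independent result sets together in application code. Everything here is hypothetical: the arrays stand in for tables living in two databases, and the helper names are invented for illustration; this is not how Rails implements it.

```ruby
# In-memory stand-ins for two tables that live in different databases.
PERSONS = [{ person_id: 1, name: "Ada" }, { person_id: 2, name: "Lin" }].freeze
ANIMALS = [{ id: 10, person_id: 1, name: "Rex" }].freeze

# Query 1 runs against the primary database.
def select_persons
  PERSONS
end

# Query 2 runs against the other database, filtered by the ids from query 1.
def select_animals_for(person_ids)
  ANIMALS.select { |animal| person_ids.include?(animal[:person_id]) }
end

# Emulate an INNER JOIN in application code by stitching the two results.
def persons_with_animals
  persons = select_persons
  animals = select_animals_for(persons.map { |p| p[:person_id] })
            .group_by { |a| a[:person_id] }
  persons.select { |person| animals.key?(person[:person_id]) }
         .map { |person| person.merge(animals: animals[person[:person_id]]) }
end

persons_with_animals.map { |p| p[:name] } # => ["Ada"]
```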

salimane · 2019-05-07T20:14:27Z

Can I suggest something?

During joins and related, check if the 2 classes share the same host:

if yes, add the database if DB names are different.
if no, raise an error suggesting that the joins is impossible. We could even raise exceptions earlier every time we see has_one and family with classes that have different connections with different hosts. So that by the time, we are in joins we definitely know the 2 connections share the same host, we just need to check if we should prefix database names or not.
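
The check suggested above might be sketched as follows. `join_table_name` and `CrossHostJoinError` are hypothetical names, not an existing Rails API, and the config hashes are assumed to expose `:host` and `:database` keys. Qualifying a table as `database.table` is valid on MySQL when both databases are on the same server; other adapters may behave differently.

```ruby
# Hypothetical helper for the suggestion above; not an existing Rails API.
class CrossHostJoinError < StandardError; end

# Each config is a plain hash with :host and :database keys (assumed shape).
def join_table_name(table, from_config, to_config)
  unless from_config[:host] == to_config[:host]
    raise CrossHostJoinError, "cannot join #{table}: databases are on different hosts"
  end

  if from_config[:database] == to_config[:database]
    table
  else
    # Same host, different database: qualify the table with the database name.
    "#{to_config[:database]}.#{table}"
  end
end

primary    = { host: "db1", database: "primary_db" }
another_db = { host: "db1", database: "another_db_with_another_schema" }
join_table_name("animals", primary, another_db)
# => "another_db_with_another_schema.animals"
```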

what do you think @eileencodes ?

voordev · 2019-05-21T10:16:00Z

Perhaps a stupid question but does this enable:

Multitenant setup where organisation entity loads its own database
Organisation.organisation_customers.all would return only records from current organisation database?
This would solve the " select users from organisation where organisation_id = x " queries and enable " select users from organisation "?

Is there a way to add and/or create databases at runtime? From what I understand now, you have to append db config to database.yml

I could not find a conclusive post on this.
<3 from Amsterdam to rails core team and all contributors to make this happen.

voordev · 2019-05-31T23:55:28Z

Nobody?

eileencodes · 2019-06-03T20:22:09Z

No questions are stupid @voordev, however I'm not 100% sure I understand what you're asking. I've started working on documentation here #36389 that will hopefully answer some of your questions about how Rails works.

voordev · 2019-06-07T16:33:46Z

@eileencodes

I mean: does Rails 6 support multi-tenancy? For example, an entity "universities" with 1 database per entity. Or would we be better off using other, not-to-be-named solutions?

eileencodes · 2019-06-07T16:36:19Z

As the docs note, no, not yet, Rails doesn't support sharding.

vitobotta · 2019-06-19T18:21:56Z

Hi! May I ask what is the current status? I would like to be able to safely split reads/writes between master and slaves in a MySQL replication, automatically. Is this possible yet? Thanks in advance.


mujz · 2019-08-09T11:11:20Z

Awesome work!
There's a typo in the PR's description. hander should be handler.

concept47 · 2019-08-11T18:21:13Z

I had one question. Is it possible to specify a series of replicas in database.yml for reading so that when you do something like this

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end

it can go to one of the different dbs you have specified in that group ... vs just one specific db?

shekhar098 · 2019-11-11T06:48:29Z

ActiveRecord::Base.connected_to(database: :key_logs) do ... I want manual db switching. The connected_to method is looking for adapter details in database.yml.

I have multiple databases with the same details. I don't want multiple schemas. Is Rails 6 working on schema switching?

deivinsontejeda · 2022-07-01T14:46:45Z

Hi @eileencodes,

I'm working on this feature. Let me quickly share my use case...

We are working on a multi-tenant app, but using physically different DBs (one DB per customer).
The DB is created when a new customer is registered. (We have a process that creates the DB.)
When the DB is created, it does not exist in connects_to. Here is my question: can we run ApplicationRecord.connects_to once the app is booted in order to register the connection for the new DB?

Here is some code to help you see my current scenario...

database.yml

test:
  shard:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_SHARD_TEST') %>
    migrations_paths: db/shard_migrate

  shard_replica:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_SHARD_TEST') %>
    replica: true

  global:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_GLOBAL_TEST') %>
    migrations_paths: db/global_migrate

  global_replica:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_GLOBAL_TEST') %>
    replica: true

Then, ApplicationRecord

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # We have some code to read current config and return the Hash with shards available in database.yml
  connects_to shards: { global: { writing: :global, reading: :global_replica }, shard: { writing: :shard, reading: :shard_replica } }
  scope :active, -> { where(active: true) }
end

Up to here no major issues: I can switch connections and everything looks ok...

But now, imagine I created a new DB connection which is not part of database.yml (I have some internal process to read them).

Again my question, I would like to reload ApplicationRecord.connects_to in order to register that new connection...

ApplicationRecord.connects_to shards: { global: { writing: :global, reading: :global_replica }, shard: { writing: :shard, reading: :shard_replica }, customer1: { writing: :customer1, reading: :customer1_replica } }

The customer1 shard is dynamically fetched/added. In my local tests, when I do that, I get ActiveRecord::ConnectionNotEstablished.

ActiveRecord::Base.connected_to(shard: :customer1) do
  Model.find(id)
end

Any thoughts?

I appreciate your feedback 👍

eileencodes · 2022-07-01T15:19:23Z

> Again my question, I would like to reload ApplicationRecord.connects_to in order to register that new connection...

No, this isn't allowed - doing this would clobber all existing connections to add the new one, potentially in the middle of a request. That's super dangerous, so even if there was a workaround I wouldn't recommend it. There are also other potential issues with a setup like this, and it's safest to reload the application when messing with database connections.

The recommended way of doing this (or at least the way we discussed at my prior job of working around this) is to pre-setup your shards. For example, say you currently have 4 customers. Instead of only setting up 4 shards for each existing customer you'd set up 100 shards and add the connections for those. Then in your global router table all you would need to do is insert a new record for the new tenant when they sign up. The connections will be active but inaccessible until the customer is added to the DB.
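
A minimal sketch of the pre-provisioned-shards idea, using an in-memory hash in place of the global router table. `ShardRouterSketch` and its methods are hypothetical, and three shards are used instead of 100 to keep the example short.

```ruby
# Hypothetical sketch of a global router table over pre-provisioned shards.
class ShardRouterSketch
  def initialize(shard_names)
    @free_shards = shard_names.dup # connections already exist for all of these
    @routes = {}                   # tenant id => shard name
  end

  # Signing up a tenant only inserts a routing record; no new connection is made.
  def register(tenant_id)
    @routes[tenant_id] ||= @free_shards.shift || raise("no pre-provisioned shards left")
  end

  # Raises KeyError for tenants that have not been registered.
  def shard_for(tenant_id)
    @routes.fetch(tenant_id)
  end
end

router = ShardRouterSketch.new(%i[shard_001 shard_002 shard_003])
router.register("customer1")  # => :shard_001
router.shard_for("customer1") # => :shard_001
```

The shard name returned by the router would then be what gets passed to a `connected_to(shard: ...)` call like the one in the question above, so no connection needs to be registered at runtime.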

The other way is to add a new config and new connection and then deploy the app each time you need to add a new customer.

In the future it's better to open a new issue or ask questions on the forum. I occasionally unsubscribe from old PRs, and it also hides these questions from future readers looking for the same answer.

eileencodes added the activerecord label Oct 2, 2018

eileencodes added this to the 6.0.0 milestone Oct 2, 2018

eileencodes mentioned this pull request Oct 2, 2018

WIP: Add the ability to swap connections in Active Record #33877

Closed

3 tasks

eileencodes changed the title ~~Basic API for connection switching~~ WIP: Basic API for connection switching Oct 2, 2018

tenderlove reviewed Oct 2, 2018


eileencodes force-pushed the connection-switching branch 3 times, most recently from 291b558 to 008a3e6 Compare October 10, 2018 12:51

eileencodes changed the title ~~WIP: Basic API for connection switching~~ Part 4: Multi db improvements, Basic API for connection switching Oct 10, 2018

eileencodes force-pushed the connection-switching branch from 008a3e6 to 7a609db Compare October 10, 2018 13:37

eileencodes force-pushed the connection-switching branch from 7a609db to 72f7bb9 Compare October 10, 2018 15:30

eileencodes force-pushed the connection-switching branch from 72f7bb9 to 31021a8 Compare October 10, 2018 16:13

benlangfeld mentioned this pull request Oct 25, 2018

Database config lookup doesn't work powerhome/rails-components-multi-database#1

Open

bf4 mentioned this pull request Oct 29, 2018

Proposed Feature: Add DatabaseTasks#connection_class to simplify multiple database apps #31719

Closed

bogdanvlviv added a commit to bogdanvlviv/rails that referenced this pull request Nov 12, 2018

Ensure that ActiveRecord::Base#connected_to with :database establ…

d6ba146

…ishes connection Related to rails#34052

bogdanvlviv mentioned this pull request Nov 12, 2018

Ensure that ActiveRecord::Base#connected_to with :database establishes connection #34429

Merged

bogdanvlviv mentioned this pull request Nov 14, 2018

Exercise connected_to and connects_to methods #34453

Merged

jhawthorn mentioned this pull request Dec 11, 2018

Add ActiveRecord::Base.connected_to? #34680

Merged

juice mentioned this pull request Feb 18, 2019

Rails test are failing on fixture loading rails-sqlserver/activerecord-sqlserver-adapter#672

Closed

sidot3291 mentioned this pull request Apr 4, 2019

6.0.0.beta2: .connected_to() block throws "No connection pool with 'primary' found." #35800

Closed

mainameiz mentioned this pull request Apr 5, 2019

Multiple databases teespring-labs/active_record_replica#16

Closed

hareku mentioned this pull request May 13, 2019

Feature Request: Connection switching for multiple databases go-gorm/gorm#2456

Closed

suketa added a commit to suketa/rails_sandbox that referenced this pull request Jun 29, 2019

try046

532b0c7

Part 4: Multi db improvements, Basic API for connection switching rails/rails#34052

thedrummeraki mentioned this pull request Sep 16, 2019

Implement multiple database support YourAnime-moe/youranime.moe#75

Merged

bquorning mentioned this pull request Dec 9, 2019

Rails 6.0 compatibility zendesk/active_record_shards#225

Merged
