Part 4: Multi db improvements, Basic API for connection switching #34052

Merged
merged 1 commit into from Oct 10, 2018


eileencodes
Member

This PR implements the basic API requirements laid out in #33877 by DHH. The PR aims to focus only on implementing the connects_to and connected_to API. For now it does not tackle any configuration changes (we can hash that out in future PRs). If this API is acceptable I will add tests.

cc/ @dhh @matthewd @rafaelfranca @tenderlove


This PR adds the ability to 1) connect to multiple databases in a model,
and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading use
the following API. This API supersedes establish_connection. The
writing and reading keys represent handler / mode names and
animals and animals_replica represents the database key to look up
the configuration hash from.

class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end

Inside the application - outside the model declaration - we can switch
connections with a block call to connected_to.

If we want to connect to a db that isn't default (ie readonly_slow) we
can connect like this:

Outside the model we may want to connect to a new database (one that is
not in the default writing/reading set) - for example a slow replica for
making slow queries. To do this we have the connected_to method that
takes a database hash that matches the signature of connects_to. The
connected_to method also takes a block.

ModelInPrimary.connected_to(database: { slow_readonly: :primary_replica_slow }) do
  ModelInPrimary.do_something_thats_slow
end

For models that are already loaded and connections that are already
connected, connected_to doesn't need to pass in a database because
you may want to run queries against multiple databases using a specific
mode/handler.

In this case connected_to can take a handler and use that to swap on
the connection passed. This simplifies queries - and matches how we do it
in GitHub. Once you're connected to the database you don't need to
re-connect, we assume the connection is in the pool and simply pass the
handler we'd like to swap on.

ActiveRecord::Base.connected_to(hander: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end

@eileencodes eileencodes added this to the 6.0.0 milestone Oct 2, 2018
@eileencodes eileencodes changed the title Basic API for connection switching WIP: Basic API for connection switching Oct 2, 2018
def connected_to(database: nil, handler: nil, &blk)
  if database && handler
    raise ArgumentError, "connected_to can only accept handler or database, but not both arguments."
  end
Member

Should we raise an exception if both are nil?
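
The argument rules under discussion can be sketched in plain Ruby. This is a minimal illustration, not the Rails source: `check_connected_to_args` is a hypothetical helper, and the error message for the both-nil case is an assumption.

```ruby
# Hypothetical helper that mirrors the argument rules discussed in this review.
# It is NOT the Rails implementation.
def check_connected_to_args(database: nil, handler: nil)
  if database && handler
    raise ArgumentError, "connected_to can only accept handler or database, but not both arguments."
  elsif database.nil? && handler.nil?
    # Assumed wording; the review only asks whether this case should raise.
    raise ArgumentError, "must provide a handler or a database"
  end

  # Report which kind of switch was requested and its target.
  database ? [:database, database] : [:handler, handler]
end
```

With exactly one argument the helper returns a pair such as `[:handler, :reading]`; with both or neither it raises `ArgumentError`.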

@tenderlove
Member

I like that we can use ActiveRecord::Base.connected_to(handler: :writing) { ... } to switch handlers, but it seems really long to write. I'm guessing that most folks will just have read/write replicas and probably name them "read" and "write" (that's what we do). Could we also introduce something like:

ActiveRecord::Base.for_writing { ... } that's just a synonym for ActiveRecord::Base.connected_to(handler: :writing) { ... }? If we follow a naming convention, it seems like we could shorten the code we have to write.
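
The shorthand proposed here could look roughly like the sketch below. `ToyBase` is a hypothetical stand-in for ActiveRecord::Base, its `connected_to` only records the handler name, and `for_reading` is an assumed counterpart that the comment does not spell out. None of this is an existing Rails API.

```ruby
# Toy stand-in for ActiveRecord::Base; hypothetical, for illustration only.
class ToyBase
  @handler_stack = []

  class << self
    # Simplified connected_to: remember the handler for the duration of the block.
    def connected_to(handler:)
      @handler_stack.push(handler)
      yield
    ensure
      @handler_stack.pop
    end

    # The handler currently in effect, or nil outside any block.
    def current_handler
      @handler_stack.last
    end

    # Proposed synonym for connected_to(handler: :writing).
    def for_writing(&block)
      connected_to(handler: :writing, &block)
    end

    # Assumed counterpart following the same naming convention.
    def for_reading(&block)
      connected_to(handler: :reading, &block)
    end
  end
end
```

`ToyBase.for_writing { ... }` then behaves exactly like `ToyBase.connected_to(handler: :writing) { ... }`, which is the whole point of the proposal: less typing, same semantics.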

@rafaelfranca
Member

What would happen if two models have the same handler name for two different databases? Should we support that? Say:

class Dog
  connects_to database: { writing: :animals, reading: :animals_replica }
end

class Book
  connects_to database: { writing: :things, reading: :things_replica }
end

Also say we have that scenario, how would the following code work?

Dog.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

If I understand the implementation correctly, Book would use the same connection handler as Dog because all models share the same connection handler in ActiveRecord::Base.

@eileencodes
Member Author

What would happen if two models have the same handler name for two different databases? Should we support that? Say:

That's exactly what I want to support because that's how we do it at GitHub. We have 10 connections that belong to a writing handler (we call it default) and 10 connections that belong to a reading handler (we call it readonly). I'm a little confused because I thought you said at Shopify you use the multiple handler approach as well? The underlying connection swapping behavior here isn't different from the original PR, just the public API is.

At GitHub we don't write this:

Dog.connected_to(handler: :reading) do
  Dog.create! # explode from Dog bc doing a write on a read
  Book.create! # isn't called but not because it's Dog's handler, you told Rails what handler to use - `reading`. 
end

Instead we write this (but with GitHub instead of AR::Base because Rails doesn't support this yet):

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

If we want to write to multiple dbs we can do that by using the writing handler:

ActiveRecord::Base.connected_to(handler: :writing) do
  Dog.create! # success
  Book.create! # success
end

Dog and Book know which database they belong to because the model tells them. The connected_to method looks up the handler, and then Rails looks up the connection from that handler with the connection specification name.
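
The two-step lookup described above can be modelled in plain Ruby. This is a toy sketch under stated assumptions: `ToyHandler`, `HANDLERS`, the spec names `"AnimalsBase"` and `"ThingsBase"`, and the use of strings in place of connection pools are all hypothetical and do not reflect Rails internals.

```ruby
# Toy model only: these names are hypothetical and are not Rails internals.
class ToyHandler
  def initialize
    @pools = {} # spec name => pool (a plain string stands in for a pool)
  end

  def register(spec_name, pool)
    @pools[spec_name] = pool
  end

  def pool_for(spec_name)
    @pools.fetch(spec_name) { raise KeyError, "no pool for #{spec_name}" }
  end
end

# One handler per role, shared by every model.
HANDLERS = {
  writing: ToyHandler.new,
  reading: ToyHandler.new
}

# Each model's base class registers one pool per role under its own spec name.
HANDLERS[:writing].register("AnimalsBase", "animals")
HANDLERS[:reading].register("AnimalsBase", "animals_replica")
HANDLERS[:writing].register("ThingsBase", "things")
HANDLERS[:reading].register("ThingsBase", "things_replica")

# Step 1: pick the handler for the role. Step 2: look up the pool by spec name.
def pool_for(role, spec_name)
  HANDLERS.fetch(role).pool_for(spec_name)
end
```

In this model, switching to `:reading` changes which handler is consulted, while each model's spec name still selects its own database, so Dog and Book never end up sharing a pool.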

@tenderlove
Member

@rafaelfranca I think we should support your first scenario, but if you want to switch both models you need to do it at AR::Base as @eileencodes mentions. Maybe we should raise an exception if you don't call it on AR::Base?

@dhh
Member

dhh commented Oct 2, 2018

I like this. A few notes:

If you're connecting directly to a specific database, you shouldn't have to declare the role:

ModelInPrimary.connected_to(database: :primary_replica_slow) do
  ModelInPrimary.do_something_thats_slow
end

Re: handler, I don't really like that word much. I'd prefer to use "role". That would connect with the future 3-tier database.yml configuration setup as well. So it would be:

ActiveRecord::Base.connected_to(role: :reading) do
  Dog.create!
  Book.create! # Will raise if a :reading role isn't found on Book
end

@tenderlove I'd be curious to see how many instances of connection switching you have in the code? I was initially partial to having some syntactic sugar, but I don't think switching roles mid-flight is going to be a super common action. And if it isn't, then I'd rather be as clear as possible about what's going on.

On the larger topic of r/w splitting, @eileencodes, you're working towards a place where AR automatically will pick the :writing role when AR is doing INSERTs and :reading role when AR is doing SELECTs, right? I thought there was some confusion about whether that's within this initial scope of work when discussing with @matthewd in the earlier thread.

@eileencodes
Member Author

If you're connecting directly to a specific database, you shouldn't have to declare the role

👍 I will work on changing this requirement. Currently I have it so it creates a new handler (or role) to organize the connections. But that's not actually necessary now that I think about it so I'll adjust this PR accordingly.

Re: handler, I don't really like that word much

I can change this. For background handler makes sense to me since it is switching on the connection_handler - but perhaps that's too much for the user to need to know.

On the larger topic of r/w splitting, @eileencodes, you're working towards a place where AR automatically will pick the :writing role when AR is doing INSERTs and :reading role when AR is doing SELECTs, right

Yes but this is further down the line (ie not for this PR). Rails needs to be able to switch connections before it can know what to switch to.

I'd be curious to see how many instances of connection switching you have in the code?

We actually do this quite a bit since we default to the replicas, except in certain circumstances where we need to explicitly call readonly.

  • 296 block calls to readonly db in app and lib
  • 3 block calls to write db in lib
  • 6 block calls to a dynamic switcher that chooses the db in app and lib.

@tenderlove
Member

I'd be curious to see how many instances of connection switching you have in the code?

More than I thought. I was counting and then @eileencodes finished before me. 😊

@dhh
Member

dhh commented Oct 2, 2018

So your default replicas are not readonlys? If we get AR to do the automatic r/w splitting, would you still need as many explicit calls? Or would you only need it when using slow-read dbs?

@deepj
Contributor

deepj commented Oct 3, 2018

A question: Can this switching be used as a failover? Let's say, a primary connection failed, switch to secondary (backup) one.

@eileencodes
Member Author

@dhh we default to read but switch on the request type (GET == read, POST == write) rather than the sql query. So in some cases we need to switch back to the read or to the write in order to handle that. I assume we will need less of those if we have Rails auto switch based on SQL rather than request type. I think that if we really do need the helper methods we can add those later.

@deepj No. We're quite a ways away from something like that.
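
The difference between switching on request type and switching on the SQL statement can be sketched as two small functions. Both helpers are hypothetical and neither is part of Rails. Treating every non-GET verb as a write is an assumption of this sketch, and a real SQL classifier would need to handle more cases than a leading SELECT.

```ruby
# Hypothetical helpers; neither is part of Rails.

# Request-based choice, as described for GitHub: GET reads, POST writes.
# Treating every non-GET verb as a write is an assumption of this sketch.
def role_for_request(http_method)
  http_method.to_s.upcase == "GET" ? :reading : :writing
end

# SQL-based choice: a statement starting with SELECT reads, anything else writes.
def role_for_sql(sql)
  sql.lstrip.match?(/\Aselect\b/i) ? :reading : :writing
end
```

With the request-based rule, a POST that only runs SELECTs still lands on the writing role, which is the kind of case that calls for an explicit switch today.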

@rafaelfranca
Member

That's exactly what I want to support because that's how we do it at GitHub. We have 10 connections that belong to a writing handler (we call it default) and 10 connections that belong to a reading handler (we call it readonly). I'm a little confused because I thought you said at Shopify you use the multiple handler approach as well? The underlying connection swapping behavior here isn't different from the original PR, just the public API is.

Yes, we use the multiple handlers, but it seems the implementation stores the name of the handlers in ActiveRecord::Base, so it means two models can't have the same handler name because the latter will override the former definition even if you define them in two different models. Am I getting the implementation wrong?

Would not:

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

fail because Dog will be connected to the same database as Book (things, not animals, given Book was defined after Dog)?

@rafaelfranca
Member

Ok I think I get it. The handler is the same for all models but it holds a connection pool for each model with a different connection_specification_name.

I agree with DHH's suggestions for the API.

👍 from me.

@eileencodes
Member Author

Ok I think I get it. The handler is the same for all models but it holds a connection pool for each model with a different connection_specification_name.

Yup! That's exactly how it works.

I'm writing up some tests and will be pushing up later this weekend or early next week. I think we're almost ready to merge this (with DHH's changes). That will unblock a lot of the future work. 😄

Also @matthewd originally had some concerns about threads but we paired today and found it's not a problem. The connection handler is thread local so we're good there 👍
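
Why a thread-local handler makes block-scoped switching safe can be shown with a small sketch. `with_handler` and `current_handler` are hypothetical helpers, the `:toy_handler` key and the `:writing` default are assumptions, and this is not how Rails stores its handler internally.

```ruby
# Hypothetical helpers illustrating a thread-local, block-scoped switch.

# The handler in effect for the current thread; :writing is an assumed default.
def current_handler
  Thread.current.thread_variable_get(:toy_handler) || :writing
end

# Swap the handler for the duration of the block, then restore the previous one.
def with_handler(name)
  previous = Thread.current.thread_variable_get(:toy_handler)
  Thread.current.thread_variable_set(:toy_handler, name)
  yield
ensure
  Thread.current.thread_variable_set(:toy_handler, previous)
end
```

Because the value lives on the thread, a second thread started while the block is running still sees its own default rather than the swapped handler, and the `ensure` clause restores the previous value even if the block raises.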

@eileencodes eileencodes force-pushed the connection-switching branch 3 times, most recently from 291b558 to 008a3e6 Compare October 10, 2018 12:51
@eileencodes eileencodes changed the title WIP: Basic API for connection switching Part 4: Multi db improvements, Basic API for connection switching Oct 10, 2018
@eileencodes
Member Author

  • Changed handler to mode
  • Changed connected_to to take a single database instead of a hash since that's for connecting to a specific database / role.
  • Raise if both database and mode are nil in connected_to
  • Added tests
  • Added docs (note I don't think we're ready to update the guides yet so I left those. I'd rather be able to tell a cohesive story at the end)

@dhh
Member

dhh commented Oct 10, 2018

I think mode is an improvement over handler, but I'm not sure it's quite enough. If you think about modes when opening files, it's the mode with which you open the same file that's designated. Not a different file. So the extension here is that mode would refer to different ways of connecting to the same database, not to different databases.

That's why I like role. That implies that there are multiple actors (databases) that play a specific role in the application. And role is also flexible enough to work with things like :statistics or :analytics where mode would be awkward.

@eileencodes
Member Author

eileencodes commented Oct 10, 2018

Wow. I swear your post said mode and now I looked back and it says role. I will change it. Not sure how my brain confused those two... 😳

@eileencodes
Member Author

mode switched to role 👍

This PR adds the ability to 1) connect to multiple databases in a model,
and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading use
the following API. This API supersedes `establish_connection`. The
`writing` and `reading` keys represent handler / role names and
`animals` and `animals_replica` represents the database key to look up
the configuration hash from.

```
class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end
```

Inside the application - outside the model declaration - we can switch
connections with a block call to `connected_to`.

If we want to connect to a db that isn't default (ie readonly_slow) we
can connect like this:

Outside the model we may want to connect to a new database (one that is
not in the default writing/reading set) - for example a slow replica for
making slow queries. To do this we have the `connected_to` method that
takes a `database` hash that matches the signature of `connects_to`. The
`connected_to` method also takes a block.

```
ActiveRecord::Base.connected_to(database: { slow_readonly: :primary_replica_slow }) do
  ModelInPrimary.do_something_thats_slow
end
```

For models that are already loaded and connections that are already
connected, `connected_to` doesn't need to pass in a `database` because
you may want to run queries against multiple databases using a specific
role/handler.

In this case `connected_to` can take a `role` and use that to swap on
the connection passed. This simplifies queries - and matches how we do it
in GitHub. Once you're connected to the database you don't need to
re-connect, we assume the connection is in the pool and simply pass the
handler we'd like to swap on.

```
ActiveRecord::Base.connected_to(role: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end
```
bogdanvlviv added a commit to bogdanvlviv/rails that referenced this pull request Nov 12, 2018
bogdanvlviv added a commit to bogdanvlviv/rails that referenced this pull request Nov 15, 2018
Since both methods are public API I think it makes sense to add these tests
in order to prevent any regression in the behavior of those methods after the 6.0 release.

Exercise `connected_to`
  - Ensure that the method raises with both `database` and `role` arguments
  - Ensure that the method raises without `database` and `role`

Exercise `connects_to`
  - Ensure that the method returns an array of established connections (as mentioned
    in the docs of the method)

Related to rails#34052
@salimane

salimane commented May 7, 2019

It seems that joins are not being correctly handled across databases

development:
  primary:
    <<: *default
    <<: *primary_account
    database: primary_db
  another_db:
    <<: *default
    <<: *another_db_account
    database: another_db_with_another_schema
    migrations_paths: "db/another_db_migrate"

class Person < ApplicationRecord
  has_one :animal,
          class_name: 'Animal',
          foreign_key: 'person_id',
          dependent: :nullify, inverse_of: :person
end

class Animal < ApplicationRecord
  self.abstract_class = true
  connects_to database: { reading: :another_db, writing: :another_db }
  belongs_to :person, foreign_key: 'person_id', inverse_of: :animal
end

Person.joins(:animal).to_sql
=> "SELECT `persons`.* FROM `persons` INNER JOIN `animals` ON `animals`.`person_id` = `persons`.`person_id`"

animals should be using the another_db database not the primary database

or am I missing something @eileencodes ?

@eileencodes
Member Author

No you're not missing something, Rails does not yet handle joining across separate databases. We're working on supporting the ability for Rails to recognize the connections are different and to split up the queries into 2 selects but the join syntax isn't going to be possible across 2 machines.
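
The idea of replacing one cross-database join with two selects can be illustrated with in-memory data. This is only a sketch: the arrays stand in for tables living in different databases, `persons_with_animals` is a hypothetical helper, and nothing here reflects how Rails will eventually implement the feature.

```ruby
# In-memory stand-ins for two tables that live in different databases.
PERSONS = [
  { person_id: 1, name: "Ann" },
  { person_id: 2, name: "Bo" }
].freeze

ANIMALS = [
  { id: 10, person_id: 1, name: "Rex" }
].freeze

# First "select": collect the foreign keys from the animals side.
# Second "select": fetch the persons whose key appears in that list.
def persons_with_animals(persons, animals)
  person_ids = animals.map { |animal| animal[:person_id] }.uniq
  persons.select { |person| person_ids.include?(person[:person_id]) }
end
```

Each step touches only one data set, so each could run against its own connection, which is what a single INNER JOIN cannot do across two machines.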

@salimane

salimane commented May 7, 2019

Can I suggest something?

During joins and related, check if the 2 classes share the same host:

  • if yes, add the database if DB names are different.
  • if no, raise an error suggesting that the join is impossible. We could even raise exceptions earlier, every time we see has_one and family with classes that have different connections with different hosts. That way, by the time we reach joins we definitely know the 2 connections share the same host, and we just need to check whether we should prefix database names or not.

what do you think @eileencodes ?
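
The check suggested in this comment could be sketched as follows. This models the commenter's proposal only; it is not Rails behavior. `join_strategy`, the `:host` and `:database` keys, and the returned symbols are hypothetical names chosen for the illustration.

```ruby
# Hypothetical sketch of the proposed check; not part of Rails.
# Each argument is a hash with assumed :host and :database keys.
def join_strategy(left, right)
  unless left[:host] == right[:host]
    raise ArgumentError,
          "cannot join #{left[:database]} and #{right[:database]}: different hosts"
  end

  # Same host: a join is possible, prefixing table names when the databases differ.
  left[:database] == right[:database] ? :plain_join : :prefix_database_names
end
```

For two configs on the same host with different database names the helper returns `:prefix_database_names`; for configs on different hosts it raises.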

@voordev

voordev commented May 21, 2019

Perhaps a stupid question but does this enable:

  1. Multitenant setup where organisation entity loads its own database
  2. Organisation.organisation_customers.all would return only records from current organisation database?
  3. This would solve the " select users from organisation where organisation_id = x " queries and enable " select users from organisation "?

Is there a way to add and/or create databases at runtime? From what I understand now you have to append db config to database.yml

I could not find a conclusive post on this.
<3 from Amsterdam to rails core team and all contributors to make this happen.

@voordev

voordev commented May 31, 2019

Nobody?

@eileencodes
Member Author

No questions are stupid @voordev, however I'm not 100% sure I understand what you're asking. I've started working on documentation here #36389 that will hopefully answer some of your questions about how Rails works.

@voordev

voordev commented Jun 7, 2019

@eileencodes

I mean, does Rails 6 support multi-tenancy, e.g. university entities with 1 database per entity? Or would we be better off using other, not-to-be-named solutions?

@eileencodes
Member Author

As the docs note, no, not yet, Rails doesn't support sharding.

@vitobotta

Hi! May I ask what is the current status? I would like to be able to safely split reads/writes between master and slaves in a MySQL replication, automatically. Is this possible yet? Thanks in advance.

suketa added a commit to suketa/rails_sandbox that referenced this pull request Jun 29, 2019
Part 4: Multi db improvements, Basic API for connection switching
rails/rails#34052
@mujz

mujz commented Aug 9, 2019

Awesome work!
There's a typo in the PR's description. hander should be handler.

@concept47

I had one question. Is it possible to specify a series of replicas in database.yml for reading so that when you do something like this

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end

it can go to one of the different dbs you have specified in that group ... vs just one specific db?

@shekhar098

With ActiveRecord::Base.connected_to(database: :key_logs) do, I want manual db switching. The connected_to method is looking for adapter details in database.yml.

I have multiple databases with the same details. I don't want multiple schemas. Is Rails 6 working on schema switching?

@deivinsontejeda

Hi @eileencodes,

I'm working on this feature. Let me quickly share my use case...

  • We are working on a multi-tenant app, but using physically different DBs (one DB per customer).
  • The DB is created when a new customer is registered. (We have a process that creates the DB.)
  • When the DB is created, it does not exist in connects_to. Here is my question: can we run ApplicationRecord.connects_to once the app is booted in order to register the connection for the new DB?

Here is some code to help show my current scenario...

database.yml

test:
  shard:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_SHARD_TEST') %>
    migrations_paths: db/shard_migrate

  shard_replica:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_SHARD_TEST') %>
    replica: true

  global:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_GLOBAL_TEST') %>
    migrations_paths: db/global_migrate

  global_replica:
    <<: *default
    url: <%= ENV.fetch('DATABASE_URL_GLOBAL_TEST') %>
    replica: true

Then, ApplicationRecord

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # We have some code to read current config and return the Hash with shards available in database.yml
  connects_to shards: { global: { writing: :global, reading: :global_replica }, shard: { writing: :shard, reading: :shard_replica } }
  scope :active, -> { where(active: true) }
end

Up to here there are no major issues: I can switch connections and everything looks ok...

But now, imagine I created a new DB connection which is not part of database.yml (I have some internal process to read them).

Again my question, I would like to reload ApplicationRecord.connects_to in order to register that new connection...

ApplicationRecord.connects_to shards: { global: { writing: :global, reading: :global_replica }, shard: { writing: :shard, reading: :shard_replica }, customer1: { writting: :customer1, reading: :customer1_replica} }

The customer1 is dynamically fetched/added. In my local tests when I do that, I got ActiveRecord::ConnectionNotEstablished.

ActiveRecord::Base.connected_to(shard: :company1) do
  Model.find(id)
end

Any thoughts?

I appreciate your feedback 👍

@eileencodes
Member Author

Again my question, I would like to reload ApplicationRecord.connects_to in order to register that new connection...

No, this isn't allowed - doing this would clobber all existing connections to add the new one, potentially in the middle of a request. That's super dangerous, so even if there were a workaround I wouldn't recommend it. There are also other potential issues with a setup like this, and it's safest to reload the application when messing with database connections.

The recommended way of doing this (or at least the way we discussed at my prior job of working around this) is to pre-setup your shards. For example, say you currently have 4 customers. Instead of setting up only 4 shards, one for each existing customer, you'd set up 100 shards and add the connections for those. Then in your global router table all you would need to do is insert a new record for the new tenant when they sign up. The connections will be active but inaccessible until the customer is added to the DB.

The other way is to add a new config and new connection and then deploy the app each time you need to add a new customer.
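
The pre-provisioned approach can be sketched as a small router that hands out already-connected shards. `ToyShardRouter`, its method names, and the `shard_N` naming are hypothetical. In a real app the assignments would live in a table in the global database rather than in memory.

```ruby
# Hypothetical in-memory router; a real one would persist assignments
# in a table in the global database.
class ToyShardRouter
  def initialize(shard_names)
    @free = shard_names.dup # shards whose connections already exist at boot
    @assignments = {}       # tenant => shard name
  end

  # Called when a tenant signs up: claim the next unused shard.
  def assign(tenant)
    return @assignments[tenant] if @assignments.key?(tenant)
    raise "no free shards left; provision more and redeploy" if @free.empty?

    @assignments[tenant] = @free.shift
  end

  # Called per request: which shard should this tenant's queries use?
  def shard_for(tenant)
    @assignments.fetch(tenant)
  end
end
```

A router built with `ToyShardRouter.new((1..100).map { |i| :"shard_#{i}" })` can register new tenants without touching `connects_to`; the value returned by `shard_for` would then be passed to `connected_to(shard: ...)`.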


In the future it's better to open a new issue or ask questions on the forum. I occasionally unsubscribe from old PRs, and asking here also hides these questions from future readers looking for the same answer.
