Connection Pool should release internal DB resources #14037
Comments
Can you provide more references than just this SO response? I can't seem to find anything in the PG docs that says we need to do this. It seems like switching search paths on each request is something you need to take responsibility for: if you're going to switch the schema search path, then you should be responsible for resetting it yourself.
Okay. I'm not too good at convincing, but I'll give it a shot. This is the explanation from one of the DBAs on why the connection should be reset when it is returned to the pool (as PgBouncer does).

If we look at the PG change log for v8.3, we'll see that the `DISCARD` command was introduced there. (Note that later releases extended what `DISCARD ALL` resets.)

I'm not an expert in PG, but that's what my quick research yields. It would be good to get an opinion from an expert here, but every connection pooling solution I can see uses the `DISCARD ALL` approach. It's not a big deal for me personally to manage the search path manually, but I feel obligated to bring this up because if the search path is "leaking" then there's a chance of other data leaking too.
@dnagir @tenderlove - the normal Postgres use case is that sessions can originate from different users or locations and possibly use different schemas. In such cases, the connection pooler definitely needs to execute DISCARD ALL to clear all the previous session state (prepared queries, locks, etc.) before connecting the new session.

The "normal" use case in Rails is that all authentication information, tables, timeout values, etc. will remain the same within the Rails app, and if any of those entities change then the Rails app needs to reboot (yes, there are ways to roll out your updated Rails app to avoid downtime, but that is another discussion). So at first blush, it appears the Postgres connection adapter does not need to execute DISCARD ALL when disconnecting or resetting a connection.

However, what happens when an advisory lock has been set and not cleared because of an exception within your Rails code? Further, what about the case where code in a Rails app changes the character set when establishing a Postgres connection via Active Record? In those cases, not executing DISCARD ALL when checking a connection back into the connection pool will leave the connection in an unwanted state. Looking at the code in Rails 4.0.4 Active Record, these scenarios are possible.

Active Record's connection adapter does reconnect to the database when a connection is no longer active, which will reset the Postgres session data with settings from whatever Rails instance performs the reconnect. But when a particular connection is corrupt and not reset, subsequent requests are assigned to a possibly corrupt active connection, which can produce unwanted results. Preempting these cases requires not just executing DISCARD ALL, but ensuring that parameters such as character set, timezone, etc. are reset each time Active Record grabs a connection from the pool.
Currently, when the connection pool "expires" a connection, it merely sets the connection's status to "false" and then gives it out to the next requestor. So the Rails Active Record connection pool behaves like pgbouncer, the difference being that pgbouncer both calls DISCARD ALL before marking an active connection as available and resets the session parameters each time a client connects or reconnects to the available connection. So I think the Active Record connection pool does need to add this behaviour to the Postgres adapter, to prevent problems and achieve reliability.

Resetting the pg connection session parameters each time an idle active pg connection is allocated is not much of a performance hit, and a small price to pay for increased reliability. In high-throughput OLTP scenarios, Postgres developers will often not call DISCARD ALL nor reset the session if they are certain the state variables won't change; in such situations the overhead does add up. A typical Rails app is much less active and the connections are not reset very often.

Thanks to @jberkus for explaining pgbouncer and Postgres connection pooling. #14392 adds DISCARD ALL to PostgreSQLAdapter#clear_cache! Still looking at code to add a pg session params reset when the connection pool assigns an unused active connection to a requestor.
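The checkin behaviour described above can be sketched as a pool hook that resets session state before a connection becomes available again. This is a minimal illustration, not Active Record's actual API; `SimplePool` and the connection interface are invented names.

```ruby
# Sketch of a pool checkin hook that resets Postgres session state,
# modeled on pgbouncer's behaviour. SimplePool and the connection
# interface here are hypothetical, not Active Record's real classes.
class SimplePool
  def initialize(connections)
    @available = connections
  end

  def checkout
    @available.pop
  end

  # Reset session state before the connection becomes available again,
  # so the next checkout cannot observe a leaked search_path, advisory
  # locks, temporary tables, or prepared statements.
  def checkin(conn)
    conn.exec("DISCARD ALL")
    @available.push(conn)
  end
end
```

In a real adapter the SET statements for encoding, timezone, etc. would also need to be re-run on the next checkout, since DISCARD ALL wipes them.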
DISCARD ALL is equivalent to a sequence of commands including: SET SESSION AUTHORIZATION DEFAULT; RESET ALL; DEALLOCATE ALL; and more. See http://www.postgresql.org/docs/8.3/static/sql-discard.html
Dnagir, Alamoz: FWIW, I personally wouldn't expect Rails to run DISCARD ALL when returning connections to the pool, although it would be a nice option to have in addition to other things. Alamoz is correct when he says that most Rails applications don't SET session variables or use temporary tables anyway, so for the average Rails user this isn't really a concern. And if you know that your application doesn't create database state, then you don't want to be running DISCARD ALL all the time; it's not a "free" command, since it needs to check multiple memory structures even if it doesn't do anything.

Now, Rails does use a lot of prepared queries these days. However, by convention these prepared queries are "anonymous", which means that we only really care about discarding them in order to save memory -- again, not a concern for most users. For advanced users with high performance requirements who are maybe using more advanced prepared query options, though, it would be good to have a discard option. However, for those users you don't need DISCARD ALL; you can just use DISCARD PLANS, which is cheaper.

If you want an example of this, check the pgBouncer configuration file. There you can specify what command the pooler uses on session switch. If we're going to add something to Rails, I'd prefer something like that: a configurable option which would default to no command -- most likely in database.yml.
@jberkus The Postgres connection adapter in Rails Active Record currently does include code to set Postgres character encoding, verbosity, schema search path, time zone, and any other parameters passed to it from database.yml. I thought about enabling DISCARD ALL as a configurable option, but I'd prefer to put db-specific settings outside of database.yml, say in RAILS_ROOT/config/postgres.rb; this is in line with Rails convention and is more logical than placing all that stuff in database.yml (which is better for generic db config parameters). /config/postgres.rb can contain a variable users can set to either 'DISCARD ALL' or 'DISCARD PLANS', along with a switch indicating whether a DISCARD should be executed upon connection checkin; likewise for whether session variables should be reset. I'll look into adding this to Active Record.
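The configurable-discard idea above could look something like the following sketch. All names here (`PgSessionReset`, the `setting` values) are invented for illustration; this is not an existing Rails API.

```ruby
# Hypothetical sketch of a configurable checkin-reset, as proposed
# above: nil means "run nothing", otherwise one of the two DISCARD
# variants discussed in this thread is executed on the connection.
module PgSessionReset
  ALLOWED = ["DISCARD ALL", "DISCARD PLANS"].freeze

  def self.on_checkin(conn, setting)
    return if setting.nil? # opt-out: do nothing on checkin
    unless ALLOWED.include?(setting)
      raise ArgumentError, "unknown discard setting: #{setting.inspect}"
    end
    conn.exec(setting)
  end
end
```

A user wanting the cheaper option discussed by @jberkus would configure `"DISCARD PLANS"`; the default of `nil` preserves today's behaviour.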
@jberkus One concern is whether any other internal state information is contained in a Postgres session that may cause problems if not cleared. I've looked at the documentation and didn't notice anything; you are very likely able to answer this question.
The Rails ideology of safe-by-default would contradict NOT doing `DISCARD ALL`. There should be an option to opt out of `DISCARD ALL`, not the other way around. An unsafe default was the reason why mass assignment got "replaced" by strong_parameters. Similar thinking should be applied to `DISCARD ALL` here. Also, any medium to large Rails app's performance won't be affected by `DISCARD ALL`.
Alamoz, I'm hampered here by not knowing how built-in connection "pooling" in Rails works; I usually use pgbouncer together with Rails. If you have a continuous PostgreSQL session, users can do things which change the state of that session, including: setting session variables with SET (search_path, timezone, etc.), creating temporary tables, preparing statements, and changing the security context with SET ROLE.
As I've pointed out, most Rails applications don't do the first two things because there's no way in the ORM to do them AFAIK. Dnagir points out correctly, though, that NOT doing DISCARD ALL by default is a security issue, although a weak one.

Failure to deallocate prepared statements causes two issues: (1) gradual memory leakage due to growth of the prepared statements table, and (2) the possibility that a later application session would call an earlier session's prepared statement by mistake and cause an error. Given how Rails uses prepared statements, though, the latter isn't particularly probable.

If an application is relying on SET, though, there are much stronger possibilities for unexpected application errors. For example, the practice of partitioning by schema has become popular thanks to Instagram and Wanelo. This relies on having search_path set correctly for each user session, so sharing of one database connection by two application sessions could cause unexpected errors. A missing SET TIMEZONE is, in some ways, worse, because it often won't be obvious that anything is wrong until the data is audited later. However, doing DISCARD ALL won't fix the applications which use SET with pooling; it will just make it obvious why they're broken.

Temporary tables hold a real possibility of mischief, though, that using DISCARD ALL fixes, or at least makes the application error in a sensible way instead of a way that may be hard to detect until your data is already wrong. If each application session creates a temporary table called "my_session" and then two applications swap database backends due to pooling rotation, they could get a query which successfully returns the wrong data. Also, if database connections live for a very long time, they accumulate a large memory allocation on the server for temporary tables.

Postgres sessions can also have a security context, mainly through SET ROLE. Again, this isn't something which Rails applications do by default.
Again, if sessions are swapped, the security context would be inherited. Since every connection in the pool has to be the same database user, though, this doesn't actually give the application access to any data it couldn't get otherwise; this is mostly just a source of errors.

My concern about doing DISCARD ALL by default is that I know many Rails apps which spin up a database connection every time a thread starts up, whether or not it needs the database. If you have an application which often has non-database sessions, the total number of DISCARD ALLs can be a significant source of load (in the case I looked at, we were talking about 1.6m per hour). However, my main reason to propose not doing DISCARD ALL by default was just backwards compatibility, rather than concerns about performance, which only affect specific usage patterns.

Does that answer your question?
Rails Active Record (the ORM) does let you set all those parameters; see the API doc at http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/PostgreSQLAdapter.html

The AR connection pool stores connections in a Ruby queue object, with the number of elements limited by the configurable pool size. A connection is acquired from the pool using the scheme:

1. use an available connection from the queue, or
2. create a new connection on the queue if the pool is not at capacity, or
3. wait on the queue for a connection to become available.

Regarding the concern about Rails threads acquiring connections: they would all grab a connection at boot time (if using something like Puma) and only have to attempt reconnection if the connection times out and an Active Record call occurs. If a thread is not doing any db operations, the connection could time out, at which point the pooler would execute DISCARD ALL. So that doesn't appear to be a problem. Rails apps which are scaled vertically on very large machines with many Rails threads could possibly be a concern; I would venture to guess the great majority of Rails apps scale horizontally on lots of small servers (mainly virtual servers).

The greatest Rails Postgres security concern is when users set up database.yml with credentials for user 'postgres' and then don't change them to some other user. They do this in order to run rake tasks to perform database operations such as initial schema creation, data population, and subsequent schema updates using 'rake db:migrate' after adding or modifying tables. But that is a usage problem, not a Rails problem.

Thanks again for taking the time to answer these questions.
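The three-step acquisition scheme above can be sketched as follows. `TinyPool` is an invented name and the real Active Record pool is considerably more involved (timeouts, fair queueing, per-thread leases), but the available / create-if-capacity / wait structure is the same.

```ruby
# Sketch of the available / create-if-capacity / wait checkout scheme
# described above. TinyPool is hypothetical; Active Record's pool adds
# timeouts, ownership tracking, and reaping on top of this shape.
require "thread"

class TinyPool
  def initialize(max_size, &factory)
    @max_size = max_size
    @factory  = factory   # block that creates a new connection
    @created  = 0
    @queue    = Queue.new # idle connections
    @lock     = Mutex.new
  end

  def checkout
    # 1. Use an available connection from the queue, if any.
    begin
      return @queue.pop(true) # non-blocking pop; raises when empty
    rescue ThreadError
      # queue empty, fall through
    end

    # 2. Create a new connection if the pool is below capacity.
    @lock.synchronize do
      if @created < @max_size
        @created += 1
        return @factory.call
      end
    end

    # 3. Otherwise block until another thread checks a connection in.
    @queue.pop
  end

  def checkin(conn)
    @queue.push(conn)
  end
end
```

A pool of size 1 hands the same connection back and forth, which is exactly why any session state left on it by one checkout is visible to the next.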
My concern about frequent DISCARD ALL is definitely performance. Firstly, any time we do a DISCARD ALL, we must necessarily follow it with a series of SETs, which means ~5 round trips to the database server. (Though this could be reduced by sufficient code rearrangement, so as to mangle all the SETs into a single multi-statement string.) Secondly, a well-behaved (or at least, optimally-behaved) application will avoid keeping a database connection checked out of the pool: ideally, it will perform a separate pool checkout for each query, immediately returning the connection thereafter (modulo transactions, of course).

FWIW, we do now do DISCARD ALL under some conditions: https://github.com/rails/rails/blob/9e457a8/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L614-620 -- but that's only called internally when a thread disappears unexpectedly and we want to take back its connection. If an application is doing something that involves a SET or similar, it is the application's responsibility to return the connection to the pool in the same state it found it... which, if it doesn't have a more efficient plan, it can achieve by resetting that state itself before checkin.

@dnagir can you expand on "any medium to large Rails app's performance won't be affected"? I've heard complaints from such apps that the 'SELECT 1' we do on checkout is a measurable performance issue.
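The "single multi-statement string" idea mentioned above is simple to sketch: instead of sending each SET in its own round trip, join them and send one string. The parameter list below is illustrative, not the exact set Active Record configures.

```ruby
# Sketch of collapsing the post-DISCARD SETs into one round trip, as
# suggested above. The params hash is an example, not the real list of
# settings Active Record applies.
def session_setup_sql(params)
  params.map { |name, value| "SET #{name} TO #{value}" }.join("; ")
end

# With the pg gem, a single exec of the joined string is one round
# trip:  conn.exec(session_setup_sql(params))
```

This reduces the DISCARD-plus-SETs cost from ~5 round trips to 2, at the price of slightly coarser error reporting (a failure aborts the whole multi-statement string).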
Alamoz, it's starting to sound like we need to go over this in person. Where are you located? Going to any conferences soon? What context does the Rails pooler operate in? Server? Process? |
Matthew, why would you be doing a series of SETs? What session variables are you SETting? I haven't, in general, seen Rails apps doing SET at all. Returning stuff to the pool after one query does raise a potential performance issue with DISCARD ALL in high-volume environments, though. If you're doing DISCARD ALL after every other query you execute, it is going to be costly for users who are already hitting the DB server hard. Mind you, most Rails apps barely touch the DB server.
Sorry Josh, my "we" was ambiguous. I agree that most sane Rails apps shouldn't be doing any SETs. I meant these: https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L935 -- some of which can be skipped when they're already set correctly. Or should we be doing more to get those into the initial connection string, so they're kept/restored during a DISCARD? The Rails connection pool is in-process, sharing connections among the threads of that process.
DISCARD ALL only needs to occur when a connection is checked back into the pool or reset, definitely not after every query; that isn't necessary. The pool maintains connections to Rails threads until they time out or are disconnected. The SET statements need to be run whenever a new connection is allocated. #14392 does DISCARD ALL only when a connection is cleared, but doesn't re-run the SET statements. My intent with #14392 is to show where the DISCARD ALL should occur; the SET statements are a bit trickier and I'd prefer to see whether a committer is actually interested in implementing the change before doing that work.
@matthewd Putting the settings into the initial connection string is definitely the best way to do it. The pool will have to save the connection settings for each Rails thread, in the event a connection is lost by the pool. Adding the DISCARD ALL, along with placing pg settings in the connection string, should have a minimal impact on performance, and making the DISCARD ALL a configurable parameter allows users to disable it if they wish.
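Building the connection-string settings discussed above might look like this. The helper name and the settings hash are invented for illustration; the useful property is that parameters passed via libpq's `options` at connection time act as session defaults, so RESET ALL (and hence DISCARD ALL) restores them rather than wiping them.

```ruby
# Sketch of passing session settings in the initial connection string,
# as proposed above. Settings supplied this way via libpq's "options"
# parameter are treated as session defaults, which RESET ALL restores.
# The helper name and settings hash are illustrative.
def pg_options_string(settings)
  settings.map { |name, value| "-c #{name}=#{value}" }.join(" ")
end

# e.g. with the pg gem (not executed here):
#   PG.connect(dbname: "app", options: pg_options_string(settings))
```

With this in place, a checkin-time DISCARD ALL would not require re-running the SETs on the next checkout, addressing the round-trip concern raised earlier in the thread.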
Alamoz, Matthew: This is really sounding like something we should hammer out in real time, possibly with 1 or 2 more folks who do more Rails than I do. Hangout? Skype? Conference?
I'm open to that, but I don't think it's necessary. I will be at the Postgres meeting tonight; perhaps we can have a few words at the pub afterwards. Think of the Rails connection pool as behaving just like pgbouncer, with not as many bells and whistles (yet).
Adrien, OK, let's talk that over some more. I didn't know what you were getting at last month. BTW, tonight's preso is about a high-volume Rails app.
Issue #12867 touches on important aspects of PG connection pooling.
This issue has been automatically marked as stale because it has not been commented on for an extended period. The resources of the Rails team are limited, and so we are asking for your help. If you can still reproduce this error, please reply with all of the information you have about it in order to keep the issue open. Thank you for all your contributions.
This issue has been automatically closed because of inactivity. If you can still reproduce this error, please reply with all of the information you have about it. Thank you for all your contributions.
Do you guys have anything new re: this issue?
@tenderlove sorry for resurrecting this old thread, however the connection pool is still leaking state on master. An initializer with the following contents works around it:

```ruby
if defined?(::PhusionPassenger)
  # Phusion Passenger until at least version 5.2.3 calls the ActiveRecord::Base.clear_all_connections! method
  # after instead of before forking the parent process. As PostgreSQL creates connections with the FD_CLOEXEC
  # (close-on-exec) flag, the parent process connections are closed when the child process is executed.
  #
  # TODO: Remove this conditional when the aforementioned Phusion Passenger bug is fixed
  _checkin_block = lambda do |_connection|
    begin
      _connection.raw_connection.exec('DISCARD ALL')
    rescue
      nil
    end
  end
  _checkout_block = lambda do |_connection|
    begin
      _connection.__send__(:configure_connection)
    rescue
      nil
    end
  end
else
  _checkin_block = lambda do |_connection|
    _connection.raw_connection.exec('DISCARD ALL')
  end
  _checkout_block = lambda do |_connection|
    _connection.__send__(:configure_connection)
  end
end

if Rails::VERSION::STRING >= "3"
  ::ActiveRecord::Base.connection.class.set_callback(:checkin, :before, &_checkin_block)
  ::ActiveRecord::Base.connection.class.set_callback(:checkout, :before, &_checkout_block)
else
  ::ActiveRecord::Base.connection.class.checkin(&_checkin_block)
  ::ActiveRecord::Base.connection.class.checkout(&_checkout_block)
end
```
When a connection is returned to the connection pool, the associated resources do not seem to be released (judging by the `search_path` in PostgreSQL being preserved). This could be a major problem, leaking state and resources that are held internally by the database (and potentially growing).

I've discovered this trying to figure out a few things in the other question. As it was pointed out by one of the PostgreSQL gurus, connections should be reset before being handed out again. This is especially true if we look at dedicated pooling solutions, like `pgpool`, that actually run `ABORT; DISCARD ALL` commands before returning the connection to the pool.