You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We should store connections on a singleton-per-thread basis, I assume using thread-local storage but it's not really important how.
Feature description
Currently, dbt's connection/thread management works like this:
we ensure that each model is run in a thread
those threads get the name of their connection as a parameter to most methods, etc - either as a model_name parameter, handled by DBWrapper, as the model.alias attribute, or in special cases as the connection_name parameter.
we store connection objects in a map where the keys are the model name that's being executed
individual runners (one per thread!) manage connection state based on the connection name in use
connections are potentially re-used in kind of complicated ways involving passing them around and renaming them (and potentially, re-homing them into new threads).
So threads are supposed to have one connection per thread, and that's managed by having the threads keep track of a managed name and carefully passing it along everywhere, using that name to open/close connections, etc.
My proposal is to instead formally bind connection objects to threads, and then have threads manage the open/close/transaction state of "their connection" in a more model-agnostic way. Each thread owns a connection stored in thread-local storage (or a dict mapping thread name -> connection, if thread-local storage is hard to get right - not important as long as it's a per-thread connection). No more names, no more "in use connections" storage, etc.
Who will this benefit?
This will help resolve our numerous connection-leaking related issues, and I think also make transaction management a lot simpler. Note that the latter will always be hard because we have to keep dbt's idea of transaction state in sync with the remote system's, and that's tricky.
The text was updated successfully, but these errors were encountered:
update/note after thinking about this a bit and poking around: thread-local storage isn't great for this problem, as we do need to access child thread connection objects from the main thread in order to do cancellation and cleanup. So it'll have to be a dict of thread IDs/names -> connections. I think the dict method also handles the test/hook running case more nicely and is less likely to have cross-platform weirdness, so it's probably for the best
Feature
We should store connections on a singleton-per-thread basis, I assume using thread-local storage but it's not really important how.
Feature description
Currently, dbt's connection/thread management works like this:
model_name
parameter, handled by DBWrapper, as themodel.alias
attribute, or in special cases as theconnection_name
parameter.So threads are supposed to have one connection per thread, and that's managed by having the threads keep track of a managed name and carefully passing it along everywhere, using that name to open/close connections, etc.
My proposal is to instead formally bind connection objects to threads, and then have threads manage the open/close/transaction state of "their connection" in a more model-agnostic way. Each thread owns a connection stored in thread-local storage (or a dict mapping thread name -> connection, if thread-local storage is hard to get right - not important as long as it's a per-thread connection). No more names, no more "in use connections" storage, etc.
Who will this benefit?
This will help resolve our numerous connection-leaking related issues, and I think also make transaction management a lot simpler. Note that the latter will always be hard because we have to keep dbt's idea of transaction state in sync with the remote system's, and that's tricky.
The text was updated successfully, but these errors were encountered: