Thread-local connections #1312

beckjake · 2019-02-20T18:48:51Z

Feature

We should store connections on a singleton-per-thread basis, I assume using thread-local storage but it's not really important how.

Feature description

Currently, dbt's connection/thread management works like this:

we ensure that each model is run in a thread
those threads get the name of their connection as a parameter to most methods, etc - either as a model_name parameter, handled by DBWrapper, as the model.alias attribute, or in special cases as the connection_name parameter.
we store connection objects in a map where the keys are the model name that's being executed
individual runners (one per thread!) manage connection state based on the connection name in use
connections are potentially re-used in kind of complicated ways involving passing them around and renaming them (and potentially, re-homing them into new threads).

So threads are supposed to have one connection per thread, and that's managed by having the threads keep track of a managed name and carefully passing it along everywhere, using that name to open/close connections, etc.

My proposal is to instead formally bind connection objects to threads, and then have threads manage the open/close/transaction state of "their connection" in a more model-agnostic way. Each thread owns a connection stored in thread-local storage (or a dict mapping thread name -> connection, if thread-local storage is hard to get right - not important as long as it's a per-thread connection). No more names, no more "in use connections" storage, etc.

Who will this benefit?

This will help resolve our numerous connection-leaking related issues, and I think also make transaction management a lot simpler. Note that the latter will always be hard because we have to keep dbt's idea of transaction state in sync with the remote system's, and that's tricky.

The text was updated successfully, but these errors were encountered:

beckjake · 2019-03-04T15:40:16Z

update/note after thinking about this a bit and poking around: thread-local storage isn't great for this problem, as we do need to access child thread connection objects from the main thread in order to do cancellation and cleanup. So it'll have to be a dict of thread IDs/names -> connections. I think the dict method also handles the test/hook running case more nicely and is less likely to have cross-platform weirdness, so it's probably for the best

beckjake · 2019-03-08T13:22:17Z

Fixed in #1336

beckjake mentioned this issue Mar 6, 2019

per-thread connections #1336

Merged

beckjake closed this as completed Mar 8, 2019

sacundim mentioned this issue Nov 4, 2021

Frequent nondeterministic errors in dbt run and dbt seed when using multiple threads Tomme/dbt-athena#41

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread-local connections #1312

Thread-local connections #1312

beckjake commented Feb 20, 2019

beckjake commented Mar 4, 2019

beckjake commented Mar 8, 2019

Thread-local connections #1312

Thread-local connections #1312

Comments

beckjake commented Feb 20, 2019

Feature

Feature description

Who will this benefit?

beckjake commented Mar 4, 2019

beckjake commented Mar 8, 2019