test(database): regression test for sqla engine creation (#27897) #40237

Open
rusackas wants to merge 4 commits into master from tdd/issue-27897-sqla-engine-cache

Conversation

@rusackas
Member

SUMMARY

This is a test-only PR opened as a TDD-style validation of issue #27897.

#27897 (filed 2024-04) reports that Database._get_sqla_engine calls create_engine on every invocation instead of caching the engine per URL. Per SQLAlchemy docs an engine is meant to be created once per process per URL so its connection pool can do its job; the current behavior defeats the pool every time DB_CONNECTION_MUTATOR configures one.

This PR adds one regression test on Database._get_sqla_engine:

  1. test_get_sqla_engine_caches_engine_per_url — patches create_engine, calls _get_sqla_engine(nullpool=False) twice for the same URL, and asserts create_engine was called exactly once. Will fail until the engine is cached.
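
The shape of that assertion can be sketched in isolation. This is a minimal illustration, not the test in this PR: `EngineCache` and `count_factory_calls` are hypothetical names, and the factory is injected as a `Mock` rather than patched at an import path, so the sketch runs without Superset or SQLAlchemy installed.

```python
from typing import Any, Callable
from unittest import mock


class EngineCache:
    """Hypothetical per-URL cache around an engine factory (illustration only)."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory
        self._engines: dict[str, Any] = {}

    def get(self, url: str) -> Any:
        # Build the engine on first use, then hand back the same object.
        if url not in self._engines:
            self._engines[url] = self._factory(url)
        return self._engines[url]


def count_factory_calls(url: str = "sqlite://") -> int:
    """Ask the cache twice for one URL and report how often the factory ran."""
    fake_create_engine = mock.Mock(return_value=object())
    cache = EngineCache(fake_create_engine)
    first = cache.get(url)
    second = cache.get(url)
    assert first is second
    return fake_create_engine.call_count
```

Without the `if url not in self._engines` check, `count_factory_calls()` would report 2, which is the failure mode the regression test is meant to catch.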

How to interpret CI

  • CI green → engine caching has been added since the report; merging closes #27897 (sqlalchemy engine should be created once per process) and locks in the regression guard.
  • CI red → bug is still live (most likely outcome). Likely fix: introduce a per-URL engine cache in Database._get_sqla_engine (or hoist into a module-level dict keyed by sqlalchemy_url + connect_args).

TESTING INSTRUCTIONS

pytest tests/unit_tests/models/core_test.py::test_get_sqla_engine_caches_engine_per_url -v

ADDITIONAL INFORMATION

🤖 Generated with Claude Code

@bito-code-review
Contributor

bito-code-review Bot commented May 19, 2026

Code Review Agent Run #958a56

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 74c0769..74c0769
    • tests/unit_tests/models/core_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.


@dosubot dosubot Bot added the data:databases Related to database configurations and connections label May 19, 2026
@codecov

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.01%. Comparing base (9c90a68) to head (66adb6e).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40237      +/-   ##
==========================================
- Coverage   64.20%   64.01%   -0.19%     
==========================================
  Files        2592     2592              
  Lines      139232   138949     -283     
  Branches    32327    32214     -113     
==========================================
- Hits        89389    88952     -437     
- Misses      48308    48459     +151     
- Partials     1535     1538       +3     
Flag Coverage Δ
hive 39.25% <85.71%> (+<0.01%) ⬆️
mysql 58.77% <100.00%> (+<0.01%) ⬆️
postgres ?
presto 40.93% <85.71%> (+<0.01%) ⬆️
python 60.26% <100.00%> (-0.14%) ⬇️
sqlite 58.49% <100.00%> (+<0.01%) ⬆️
unit 100.00% <ø> (ø)


Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

… once per URL

Closes #27897

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rusackas rusackas force-pushed the tdd/issue-27897-sqla-engine-cache branch from 74c0769 to 125624e Compare May 20, 2026 20:51
Wires up _ENGINE_CACHE — a module-level dict keyed by
(database_id, str(sqlalchemy_url), repr(sorted(engine_kwargs.items())))
— so that successive _get_sqla_engine(nullpool=False) calls reuse the
same Engine instance instead of building a fresh one each invocation.
Per SQLAlchemy docs the engine is meant to live for the process lifetime;
recreating defeats every pool an operator configures via
DB_CONNECTION_MUTATOR (the original bug report's duckdb queue-size-1
seeing multiple simultaneous connections).

nullpool=True engines are skipped — those are intentionally poolless and
there's nothing to reuse.

The regression test added in the prior commit clears _ENGINE_CACHE in
its setup so test ordering can't smuggle a stale entry past the
assertion.

Closes #27897
@rusackas rusackas requested a review from aminghadersohi May 21, 2026 20:56
Contributor

@aminghadersohi aminghadersohi left a comment

Thanks for this fix — the regression test is excellent and the cache comment block is unusually thorough. A few observations:

HIGH — thread safety on first access

_ENGINE_CACHE is a module-level dict with no locking (core.py:101). Under multi-threaded gunicorn workers the check-then-set at lines 587–589 is not atomic: two concurrent first-access calls for the same URL can both see a cache miss, both call create_engine, and both write — leaving a briefly-duplicated connection pool. Python's GIL keeps the dict from corrupting, but the "one engine per process per URL" guarantee becomes probabilistic rather than strict.

Suggested fix — protect lazy-init with a lock:

import threading
_ENGINE_CACHE: dict[tuple[int | None, str, str], Engine] = {}
_ENGINE_CACHE_LOCK = threading.Lock()

# in _get_sqla_engine, replace the cache block with:
with _ENGINE_CACHE_LOCK:
    if cached := _ENGINE_CACHE.get(cache_key):
        return cached
    engine = create_engine(sqlalchemy_url, **engine_kwargs)
    _ENGINE_CACHE[cache_key] = engine
return engine
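
As a self-contained illustration of the same lock-guarded lazy initialization, the sketch below races several threads on one key and counts how often the factory runs. `get_or_create`, `build_concurrently`, and the `"db-1"` key are hypothetical names for this example only; a plain `object()` stands in for an `Engine`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Hashable

_CACHE: dict[Hashable, Any] = {}
_CACHE_LOCK = threading.Lock()


def get_or_create(key: Hashable, factory: Callable[[], Any]) -> Any:
    """Return the cached value for `key`, building it at most once."""
    with _CACHE_LOCK:
        # Lookup and write happen under one lock, so no second thread
        # can observe a miss between the check and the assignment.
        if key not in _CACHE:
            _CACHE[key] = factory()
        return _CACHE[key]


def build_concurrently(workers: int = 8) -> tuple[int, int]:
    """Race `workers` threads on one key; return (factory calls, distinct objects)."""
    _CACHE.clear()
    calls: list[int] = []

    def factory() -> object:
        calls.append(1)  # only ever runs while the lock is held
        return object()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda _: get_or_create("db-1", factory), range(workers))
        )
    return len(calls), len({id(r) for r in results})
```

One trade-off of this shape is that the factory runs while the lock is held, so first-time construction for unrelated keys is serialized as well.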

MEDIUM — self.id is None for unsaved instances

The cache key uses self.id as its first element (core.py:583). Database.id is None before the object is saved to the database. Two distinct unsaved Database instances with the same URI and engine kwargs would silently share a cache entry. Production code always saves before connecting, so this is unlikely, but worth documenting or guarding (e.g. skip caching when self.id is None).

NITs

  • dict[tuple[Any, ...], Engine] → the key is always (int | None, str, str), so a more specific annotation is available.
  • cached = ...; if cached is not None: return cached → walrus: if cached := _ENGINE_CACHE.get(cache_key): return cached
  • Inline import at core_test.py:555 follows the existing file convention but worth cleaning up file-wide in a follow-up.

Praise

The regression test is exactly right: clears the global cache before running, mocks create_engine at the correct import path, calls twice, asserts call_count == 1 with a descriptive failure message. The docstring cites the SQLAlchemy docs and the originating issue. The nullpool guard is also correct — ephemeral engines should not be cached and you've handled both the lookup skip and the write skip.

@bito-code-review
Contributor

bito-code-review Bot commented May 22, 2026

Code Review Agent Run #904d8a

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 125624e..9a6f76c
    • superset/models/core.py
    • tests/unit_tests/models/core_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Comment thread superset/models/core.py Outdated
# size-1 queue). Skip the cache when ``nullpool`` is True — those
# engines are intentionally poolless and there's nothing to reuse.
cache_key: tuple[Any, ...] | None = None
if not nullpool:
Member

I think the cache is no longer engaged now. The new code is gated by if not nullpool: but every callsite of get_sqla_engine/_get_sqla_engine in superset/ accepts the default nullpool=True. The nullpool=True branch then forces poolclass=NullPool at :521, which is the exact behavior the issue says defeats DB_CONNECTION_MUTATOR's pool.

Member Author

Spot on — confirmed: every in-tree caller uses the default nullpool=True, so the cache as written was dormant exactly where it needs to engage. Fixed in 0555735 by removing the if not nullpool: gate so caching happens regardless. Even a NullPool engine has nontrivial construction cost (URL parsing, dialect resolution, connect_args setup, re-running DB_CONNECTION_MUTATOR), and the operator pool config #27897 is about can only persist if the engine object itself is reused.

Also updated the regression test to call with the default nullpool=True so it actually exercises the production path — if it had done that originally, you wouldn't have had to catch this for me. Thanks for the careful read!

@bito-code-review
Contributor

This question isn’t related to the pull request. I can only help with questions about the PR’s code or comments.

Comment thread superset/models/core.py Outdated
# (database_id, str(sqlalchemy_url), repr(sorted(engine_kwargs.items()))).
# Populated only when ``nullpool=False`` — pooled engines are the only ones
# that benefit from process-wide reuse.
_ENGINE_CACHE: dict[tuple[Any, ...], Engine] = {}
Member

will editing a Database (password rotation, host change, SSH tunnel reconfig) or deleting it ever pop the cache here?

Member Author

Good question. The cache key is (database_id, str(sqlalchemy_url), repr(sorted(engine_kwargs.items()))):

  • Password rotation / host change / SSH tunnel reconfig: sqlalchemy_url is the decrypted URL (built from the latest Database fields each call), and engine_kwargs includes whatever DB_CONNECTION_MUTATOR produces from the latest Database state. So a rotated password or changed host means a different key on the next call → cache miss → fresh engine. Existing in-flight requests on the old engine continue against the old credentials until they finish, which matches SQLAlchemy's normal behavior.
  • Deletion: stale entries linger until process restart — that's a memory footprint concern (a few hundred bytes per dead entry) rather than a correctness one. I'd rather not couple this PR to a SQLAlchemy event hook in Database.__delete__ since the right invalidation surface is bigger than just delete (would also want it on rename/clone), and worth scoping as a follow-up.

Updated the cache header comment to make the URL/kwargs key contract explicit.
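
A small sketch of how a key of that shape behaves, assuming the URL is already a plain string. `make_cache_key` is a hypothetical helper and the URLs and kwargs are made-up values.

```python
from typing import Any


def make_cache_key(
    database_id: int, url: str, engine_kwargs: dict[str, Any]
) -> tuple[int, str, str]:
    """Build a key shaped like (id, URL string, repr of sorted kwargs items)."""
    return (database_id, str(url), repr(sorted(engine_kwargs.items())))


# A rotated password changes the URL string, so the key changes too.
old_key = make_cache_key(1, "postgresql://user:old@host/db", {"pool_size": 5})
new_key = make_cache_key(1, "postgresql://user:new@host/db", {"pool_size": 5})

# Sorting the items makes the key independent of top-level kwarg order.
key_ab = make_cache_key(1, "sqlite://", {"echo": False, "pool_size": 5})
key_ba = make_cache_key(1, "sqlite://", {"pool_size": 5, "echo": False})
```

Here `old_key != new_key` while `key_ab == key_ba`. The sort only covers top-level kwargs: a nested dict value is rendered by `repr` in insertion order, so two nested dicts with the same entries in a different order would still produce different keys.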

@sadpandajoe (blocker): the previous cache was gated by `if not nullpool:`
but every in-tree callsite of `get_sqla_engine` / `_get_sqla_engine` uses
the default `nullpool=True`, which forces `poolclass=NullPool` at L521 —
i.e. the cache was dormant exactly where it needs to engage to fix #27897.
Remove the gate; cache regardless of `nullpool`. Even a NullPool engine
has nontrivial construction cost (URL parsing, dialect resolution,
connect_args setup, re-running DB_CONNECTION_MUTATOR), and the operator
pool config that #27897 is about can only persist if the engine object
itself is reused.

@aminghadersohi (HIGH): thread-safety on first access. Two concurrent
calls under multi-threaded gunicorn could both miss the cache, both
call `create_engine`, and both write. Wrap lookup + write in a
`threading.Lock` so the "one engine per process per (id, URL, kwargs)"
guarantee is strict, not probabilistic.

@aminghadersohi (MEDIUM): `self.id is None` on unsaved instances. Two
distinct in-memory `Database` objects with the same URI would have
collided on a shared cache entry. Skip caching when `self.id is None`.

@aminghadersohi (NITs):
- Tightened `dict[tuple[Any, ...], Engine]` → `dict[tuple[int, str, str], Engine]`.
- Replaced `cached = ...; if cached is not None:` with walrus.

Cache invalidation (sadpandajoe's L101 question): password rotation,
host change, or any mutation that touches the decrypted URL or
DB_CONNECTION_MUTATOR-produced kwargs naturally falls through to a fresh
engine because the cache key includes both. Stale entries remain until
process restart — that's a memory footprint concern but not a
correctness one, and worth a follow-up rather than blocking this PR.

Updated regression test to exercise the production default path
(`nullpool=True`) instead of `nullpool=False`, which would have masked
sadpandajoe's finding. Added a second test asserting unsaved instances
don't cache.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rusackas
Member Author

Thanks for the thorough review @aminghadersohi! All four addressed in 0555735:

  • HIGH (thread safety): added _ENGINE_CACHE_LOCK = threading.Lock(), wrap both lookup and write. The 'one engine per process per (id, URL, kwargs)' guarantee is now strict.
  • MEDIUM (self.id is None): skip caching when self.id is None so distinct unsaved instances don't collide. Added a second test test_get_sqla_engine_does_not_cache_unsaved_instances covering this.
  • NIT (type annotation): tightened dict[tuple[Any, ...], Engine] → dict[tuple[int, str, str], Engine].
  • NIT (walrus): if cached := _ENGINE_CACHE.get(cache_key): return cached.
  • ⏭️ NIT (file-wide inline import cleanup): agreed, follow-up — out of scope for this PR.

Also addressed @sadpandajoe's blocker (the cache was gated by if not nullpool: but every production callsite uses the nullpool=True default, so the cache was dormant in prod) by removing the gate. See his thread for details. Updated the regression test to use the default-arg path so it actually exercises the production code path.

Pre-commit (ruff/mypy/pylint) all pass locally.

@netlify

netlify Bot commented May 23, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 66adb6e
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a1238a412dda00008e87a3d
😎 Deploy Preview https://deploy-preview-40237--superset-docs-preview.netlify.app

Contributor

@aminghadersohi aminghadersohi left a comment

All three concerns from the previous round are addressed: threading.Lock() guards both the cache read and write (HIGH); the self.id is not None gate prevents unsaved instances from sharing cache entries, with a dedicated regression test (MEDIUM); and the type annotation is now dict[tuple[int, str, str], Engine] with the walrus operator in the lookup (NITs). LGTM.

@bito-code-review
Contributor

bito-code-review Bot commented May 23, 2026

Code Review Agent Run #daa67f

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 9a6f76c..0555735
    • superset/models/core.py
    • tests/unit_tests/models/core_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Labels

data:databases Related to database configurations and connections size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sqlalchemy engine should be created once per process

5 participants