
fix(models.core:Database): Allocate a SQLA engine once per process per URL. #27899

Open · wants to merge 5 commits into master
Conversation

pedro-r-marques

SUMMARY

As noted in issue #27897, the SQLAlchemy documentation recommends:

> The typical usage of create_engine() is once per particular database URL, held globally for the lifetime of a single application process. A single Engine manages many individual DBAPI connections on behalf of the process and is intended to be called upon in a concurrent fashion. The Engine is not synonymous to the DBAPI connect() function, which represents just one connection resource - the Engine is most efficient when created just once at the module level of an application, not per-object or per-function call.

The superset.models.core::Database class currently allocates Engine objects whenever a Database object is instantiated, which happens twice(?) per 'api/v1/chart/data' API access. This is not the intended usage of the SQLAlchemy API and causes mechanisms such as connection pooling to not work correctly.
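The fix can be sketched as a module-level cache that hands out one Engine per URL for the lifetime of the process. This is an illustrative sketch only; the names below are not Superset's actual implementation:

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine

# Illustrative sketch: one Engine per URL, created lazily and reused
# for the lifetime of the process instead of per Database instance.
_ENGINES: dict[str, Engine] = {}

def get_engine(url: str, **params) -> Engine:
    engine = _ENGINES.get(url)
    if engine is None:
        engine = _ENGINES[url] = create_engine(url, **params)
    return engine
```

A real implementation would also need to account for differing engine parameters and for thread safety around the cache.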

Connection pooling is an important feature for controlling access to databases. For instance, when using 'duckdb' it is recommended to issue a small number of concurrent requests (or a single concurrent request) and instead take advantage of the database engine's inherent parallelism. Other engines have other ideal settings.

TESTING INSTRUCTIONS

Configure a connection pool using the DB_CONNECTION_MUTATOR hook.
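For reference, a minimal superset_config.py sketch of such a hook. The signature follows the DB_CONNECTION_MUTATOR hook described in the Superset configuration docs; the pool values are illustrative, not a recommendation:

```python
# superset_config.py (sketch): constrain the connection pool via the
# DB_CONNECTION_MUTATOR hook. Pool sizes here are illustrative only.
def DB_CONNECTION_MUTATOR(uri, params, username, security_manager, source):
    params = dict(params or {})
    params.setdefault("pool_size", 1)     # a single pooled connection
    params.setdefault("max_overflow", 0)  # no burst connections
    return uri, params
```

With the per-process engine cache in place, these pool settings actually take effect, since the same Engine (and hence the same pool) is reused across requests.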

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Contributor

@github-actions github-actions bot left a comment


Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

Member

@betodealmeida betodealmeida left a comment


This is great, @pedro-r-marques, thanks for the PR!

There's been some discussion in the past regarding improving the engines and connection pooling (#8574), and your PR seems to be the first step in that direction.

```python
tpl = cls._sqla_engines.get(sqlalchemy_url)
if tpl is not None:
    engine, cparams = tpl
    if cparams == params:
```
Member


Can we have params be part of the key as well, in some deterministic serialized form? A few databases that support user impersonation will add the user info to params, and in that case the engine manager would not be very useful.

Author


PTAL

Transform potentially nested dict into a tuple.
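The commit above can be sketched roughly as follows. `freeze` is a hypothetical helper name; the idea is to produce a deterministic, hashable form of `params` so it can be part of the engine cache key, addressing the user-impersonation concern raised in the review:

```python
def freeze(value):
    """Recursively convert a possibly nested dict/list into a hashable tuple."""
    if isinstance(value, dict):
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value

# The cache key would then combine the URL with the frozen params, e.g.:
# key = (sqlalchemy_url, freeze(params))
```

Sorting the dict items makes the key deterministic regardless of insertion order, so two Database objects with equivalent params map to the same cached Engine.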
@michael-s-molina michael-s-molina linked an issue Apr 5, 2024 that may be closed by this pull request
3 tasks
@pedro-r-marques pedro-r-marques changed the title fix(models.core::Database) Allocate a SQLA engine once per process per URL. fix(models.core:Database): Allocate a SQLA engine once per process per URL. Apr 8, 2024
@pedro-r-marques
Author

pedro-r-marques commented Apr 8, 2024

The CI failures above:

  • Translations: a CI error (network failure).
  • The pre-commit check wants to reformat lines in a file I touched that have nothing to do with the PR. What is the project's policy? Should I add the change to make 'black' happy, or ignore it since it is unrelated to the intended change?

superset/models/core.py — outdated review thread, resolved
@eschutho
Member

@pedro-r-marques I restarted CI and fixed the formatting issue. We'll see if it passes now.

@pedro-r-marques
Author

@eschutho The formatting issue is fixed. Unfortunately the test-mysql test seems to be failing in test_chart_data_async. I've attempted to run the test multiple times in my setup (using the same MySQL version and Redis cache configuration as the CI) with no luck.

Any hints ?

From the failure code, I'm under the impression that the test expects result code 202 (meaning the job has been submitted to a Celery worker?) but instead gets a 200 (meaning the result is being served from cache?).

I've tried running the test both at the version of the code in the PR as well as at HEAD. It always passes for me.

@ShubhamDalmia

Any updates on this?

@villebro
Member

villebro commented Jul 8, 2024

> From the failure code, I'm under the impression that the test expects result code 202 (meaning the job has been submitted to a Celery worker?) but instead gets a 200 (meaning the result is being served from cache?).

@pedro-r-marques the particular test is hopelessly flaky (we will likely need to remove it, as fixing it has proved to be difficult for the MySQL integration test). Can you rebase the PR to fix the conflict?


Successfully merging this pull request may close these issues.

sqlalchemy engine should be created once per process
5 participants