Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Ensure Flask framework leverages the Flask-SQLAlchemy session (Phase II) #26909

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Jan 31, 2024

SUMMARY

This is a follow up to #26200 in order to try to rid the system of any non-global Flask-SQLAlchemy sessions (outside of Alembic migrations) via a more aggressive approach. This is somewhat of a prerequisite in order for us to define a consistent "unit of work" model.

Note I suspect passing of the SQLAlchemy session exists in part for testing purposes—though in reality one should never add code complexity to aid with testing when mocking is an option. Specifically there exists a mocked session with pre-loaded data which is leveraged by a slew of tests, however the session fixture (which relies on the get_session fixture), clearly mocks the Flask-SQLAlchemy session,

mocker.patch("superset.db.session", in_memory_session)

thus there's no need to reference the fixture session directly as db.session points to the same in memory session.

The TL;DR is this PR should provide a decent amount of code cleanup by removing unnecessary redundancy. I've also updated numerous tests to use db.session everywhere (as opposed to session or session_with_data) given it's monkey patched (as mentioned previously). This helps to ensure consistency throughout the code base which aids with readability. Additionally it reiterates that there is really only one session—the global Flask-SQLAlchemy session—which should be used.

The only place where passing db.session still persists in for functions which are related to Alembic migrations (including tests) as the migrations currently use a session other than the Flask-SQLAlchemy one. You can read more about that in #26172.

I've also declared that the db, sesh, and session variable be a disallowed name (even though there maybe the odd false positive). This should help prevent developers from trying to pass around either the global Flask-SQLAlchemy session or referencing another session as well as ensuring that we don't convolute the db name. There still exists instances where this is used, however this is deemed an anti-pattern which we should strive to remove.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Copy link

codecov bot commented Jan 31, 2024

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (e8e208d) 67.17% compared to head (77b87b0) 69.49%.

Files Patch % Lines
superset/utils/dashboard_import_export.py 0.00% 3 Missing ⚠️
superset/commands/dataset/importers/v1/utils.py 75.00% 2 Missing ⚠️
superset/commands/importers/v1/assets.py 80.00% 2 Missing ⚠️
superset/commands/importers/v1/examples.py 0.00% 2 Missing ⚠️
superset/cli/importexport.py 0.00% 1 Missing ⚠️
...perset/commands/dashboard/importers/v1/__init__.py 88.88% 1 Missing ⚠️
superset/connectors/sqla/utils.py 50.00% 1 Missing ⚠️
superset/models/dashboard.py 0.00% 1 Missing ⚠️
superset/tables/models.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26909      +/-   ##
==========================================
+ Coverage   67.17%   69.49%   +2.32%     
==========================================
  Files        1900     1900              
  Lines       74443    74433      -10     
  Branches     8293     8293              
==========================================
+ Hits        50009    51730    +1721     
+ Misses      22379    20648    -1731     
  Partials     2055     2055              
Flag Coverage Δ
hive 53.80% <31.96%> (?)
mysql 78.02% <86.06%> (-0.01%) ⬇️
postgres 78.12% <86.06%> (-0.01%) ⬇️
presto 53.75% <31.96%> (?)
python 83.08% <88.52%> (+4.82%) ⬆️
sqlite 77.64% <86.06%> (-0.01%) ⬇️
unit 56.42% <48.36%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@john-bodley john-bodley force-pushed the john-bodley--sip-99-non-celery-sessions-phase-2 branch from 25d5ce0 to 992da6a Compare January 31, 2024 03:10
@john-bodley john-bodley force-pushed the john-bodley--sip-99-non-celery-sessions-phase-2 branch from 992da6a to e045fd7 Compare January 31, 2024 05:56
@@ -131,7 +131,9 @@ def __str__(self) -> str:
return f"<TaggedObject: {self.object_type}:{self.object_id} TAG:{self.tag_id}>"


def get_tag(name: str, session: orm.Session, type_: TagType) -> Tag:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left tags as is. The way we use SQLAlchemy event handlers isn't great as i) these aren't invoked for batch operations, and ii) we're mutating records other than the record associated with that callback which requires the use of a new session. Ideally these operations should be handled via commands.

@john-bodley john-bodley force-pushed the john-bodley--sip-99-non-celery-sessions-phase-2 branch 10 times, most recently from 76a96b3 to 500fb4f Compare February 1, 2024 01:57
@john-bodley john-bodley marked this pull request as ready for review February 1, 2024 02:15
Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the cleanup work @john-bodley!

I left some first-pass comments. I tried to highlight the occurrences of session parameters that were not being used anymore but I definitely missed some. I suggest doing a full scan for unused session and session_with_data parameters.

@@ -1748,7 +1748,7 @@ def test_export_dataset(self):
assert rv.status_code == 200

cli_export = export_to_dict(
session=db.session,
db.session.db.session,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
db.session.db.session,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! My sed search and replace definitely screwed up a few things, but I thought I had addressed all of these given that CI was passing. Having mypy and pylint checks enabled for our tests would definitely be beneficial.

@@ -39,7 +39,7 @@
@pytest.fixture
def chart(app_context, load_world_bank_dashboard_with_slices) -> Slice:
session: Session = app_context.app.appbuilder.get_session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
session: Session = app_context.app.appbuilder.get_session

@@ -38,14 +39,14 @@ def test_import_chart(mocker: MockFixture, session: Session) -> None:

mocker.patch.object(security_manager, "can_access", return_value=True)

engine = session.get_bind()
engine = db.session.get_bind()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove test_import_chart's session parameter as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from superset.utils.core import DatasourceType


@pytest.fixture
def session_with_data(session: Session) -> Iterator[Session]:
from superset.models.slice import Slice

engine = session.get_bind()
engine = db.session.get_bind()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks weird because the function is receiving a session via parameter to enhance it with data but in reality the parameter is not being used to process the request. Maybe we should remove the session parameter and rename the function to add_data_to_session or add_slice_to_session.

Copy link
Member Author

@john-bodley john-bodley Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-s-molina I'll revert the logic so that all fixtures augment the session fixture as opposed to db.session. Also I don't think these fixtures behave as the author(s) originally intended in terms of how the records are committed/rollbacked—though that's a headache for a different day.

yield session
session.rollback()
db.session.rollback()


def test_slice_find_by_id_skip_base_filter(session_with_data: Session) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_slice_find_by_id_skip_base_filter(session_with_data: Session) -> None:
def test_slice_find_by_id_skip_base_filter() -> None:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from sqlalchemy.orm.session import Session

from superset import db


def test_table_model(session: Session) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_table_model(session: Session) -> None:
def test_table_model() -> None:

from superset.utils.core import DatasourceType


@pytest.fixture
def session_with_data(session: Session):
from superset import db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before about enhancing the session with data.

db.session.add(saved_query)
db.session.add(dashboard_obj)
db.session.commit()
yield db.session


def test_create_command_success(session_with_data: Session, mocker: MockFixture):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_create_command_success(session_with_data: Session, mocker: MockFixture):
def test_create_command_success(mocker: MockFixture):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -30,7 +31,7 @@ def session_with_data(session: Session):
from superset.models.sql_lab import Query, SavedQuery
from superset.tags.models import Tag

engine = session.get_bind()
engine = db.session.get_bind()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before about enhancing the session with data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -117,8 +120,8 @@ def test_update_command_success_duplicates(
from superset.models.slice import Slice
from superset.tags.models import ObjectType, TaggedObject

dashboard = session_with_data.query(Dashboard).first()
chart = session_with_data.query(Slice).first()
dashboard = db.session.query(Dashboard).first()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove session_with_data parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley
Copy link
Member Author

john-bodley commented Feb 1, 2024

Thanks @michael-s-molina for the review. I think there's a pytest pattern which can be somewhat confusing related to fixtures. There are numerous tests which have a signature of the form,

def test_foo(session: Session, ...):
    ...

or

def test_foo(session_with_data: Session, ...):
    ...

though these are in fact SQLAlchemy sessions (which can be accessed directly, i.e., session_with_data.query(...) they are pytest fixtures and their primary purpose in the function signature is for the test to leverage the fixture (given they're scope is non-global), i,.e., an in-memory session which may contain various objects (example) and thus are still required. These sessions are accessible via the Flask-SQLAlchemy global session (db.session) given that it's been patched here. The TL;DR you can think of these fixtures as providing the tests with pre-loaded ephemeral data which isn't persisted to the database. The challenge with fixtures (in conjunction with monkey patching) is that it's not overly apparent when reading the code how/where they are actually used.

Per the PR description I wanted to make it really clear that we should always be querying the session via the Flask-SQLAlchemy interface. It's a clearer/simpler pattern in my opinion.

@john-bodley john-bodley force-pushed the john-bodley--sip-99-non-celery-sessions-phase-2 branch 5 times, most recently from e6aabd1 to 5e99587 Compare February 2, 2024 01:30
@john-bodley john-bodley force-pushed the john-bodley--sip-99-non-celery-sessions-phase-2 branch from 5e99587 to 0a7beef Compare February 12, 2024 19:50
@john-bodley john-bodley force-pushed the john-bodley--sip-99-non-celery-sessions-phase-2 branch from 0a7beef to 77b87b0 Compare February 12, 2024 20:25
Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments @john-bodley.

The challenge with fixtures (in conjunction with monkey patching) is that it's not overly apparent when reading the code how/where they are actually used.

I'm not a fan of this pattern but I guess this is not in the scope of this PR so LGTM.

@john-bodley john-bodley merged commit 847ed3f into apache:master Feb 13, 2024
41 checks passed
@john-bodley john-bodley deleted the john-bodley--sip-99-non-celery-sessions-phase-2 branch February 13, 2024 17:20
sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.0.0 labels Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XXL 🚢 4.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants