Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Refactor SQL engine username logic #19914

Merged
merged 1 commit into from
May 13, 2022

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented May 1, 2022

SUMMARY

This PR fixes a number of issues (some cosmetic) with the SQL username. Specifically:

  1. The username was referred to as both username and user_name. The inconsistency added undue complexity.
  2. It a number of places the g.user.username was passed as the username argument of the get_sqla_engine function (or similar) even when it was the default fallback. This added unnecessary complexity, i.e., from reading the code it wasn't apparent when/why the username would be overridden. In reality it mostly isn't, i.e., it should always be the logged in user unless the query is being executed from Celery—where there is no user logged in.
  3. The logic introduced in fix: [alert] should run alert query from report account #17499 wasn't suffice, in actuality it likely added more complexity, as although it tried to ensure that the get_df for the alert was executed by the passed in user, if the underlying query used Jinja templating for fetching the the latest partition or similar those queries were executed by the user defined in the SQLAlchemy URI (if defined; falling back to the system user)—leveraging the SQLAlchemy inspector—per here.

The fix for (1) was to rename user_name to username where possible. For (2) and (3) the combined fix was to actually remove overriding the username in functions given the level of nesting, etc. and rely simply on using g.user.username, i.e., the logged in user, as the user (or effective user) for all queries. Given we're using globals via flask.g—Flask's versions of globals which persist for the duration of the request—why not update it rather than passing around another username everywhere. For instances where there was no user logged in: alerting, async SQL Lab queries, etc. we simply temporarily override the g.user using the provided username. Hopefully this approach simplified the code and brought more transparency.

The one doubt I had was with how the app context works with Celery and whether flask.g is thread safe, though per here it seems we create a new app context for every task.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI. Updated existing and added new unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley changed the title fix: Refactor SQL username logic fix: Refactor SQL engine username logic May 1, 2022
if not alive:
raise DBAPIError(None, None, None)

with override_user(self._actor):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic as previously, though indented due to the context manager.

errors = database.db_engine_spec.extract_errors(ex, context)
raise DatabaseTestConnectionFailedError(errors) from ex

with override_user(self._actor):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic as previously, though indented due to the context manager.

return database.get_sqla_engine(
schema=schema, nullpool=True, user_name=user_name, source=source
)
return database.get_sqla_engine(schema=schema, source=source)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullpool=True is the default.

)
return df

with override_user(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic as previously, though indented due to the context manager.

stats_logger.incr("error_sqllab_unhandled")
query = get_query(query_id, session)
return handle_query_error(ex, query, session)
with override_user(current_app.app_builder.sm.find_user(username=username)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic as previously, though indented due to the context manager.

@@ -149,37 +156,35 @@ def get_sql_results( # pylint: disable=too-many-arguments
rendered_query: str,
return_results: bool = True,
store_results: bool = False,
user_name: Optional[str] = None,
username: Optional[str] = None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to pass in the username to the async Celery task and then override.

@john-bodley john-bodley requested review from betodealmeida, villebro and dpgaspar and removed request for betodealmeida May 2, 2022 01:19
@john-bodley john-bodley marked this pull request as ready for review May 2, 2022 02:34
@john-bodley john-bodley force-pushed the john-bodley0--fix-username branch 7 times, most recently from 6e69f87 to 0fd917d Compare May 2, 2022 23:37
self.logout()
self.login(username=(user_name or "admin"))
self.login(username=username)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that on like #372 username is truthy there's no need for the username or "admin" logic.

@codecov
Copy link

codecov bot commented May 3, 2022

Codecov Report

Merging #19914 (fff9ad0) into master (fff9ad0) will not change coverage.
The diff coverage is n/a.

❗ Current head fff9ad0 differs from pull request most recent head ea901ba. Consider uploading reports for the commit ea901ba to get more accurate results

@@           Coverage Diff           @@
##           master   #19914   +/-   ##
=======================================
  Coverage   66.36%   66.36%           
=======================================
  Files        1712     1712           
  Lines       64088    64088           
  Branches     6744     6744           
=======================================
  Hits        42529    42529           
  Misses      19846    19846           
  Partials     1713     1713           
Flag Coverage Δ
hive 53.65% <0.00%> (ø)
mysql 82.04% <0.00%> (ø)
postgres 82.10% <0.00%> (ø)
presto 53.51% <0.00%> (ø)
python 82.53% <0.00%> (ø)
sqlite 81.83% <0.00%> (ø)
unit 49.01% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fff9ad0...ea901ba. Read the comment docs.

@john-bodley john-bodley force-pushed the john-bodley0--fix-username branch 2 times, most recently from 00c93e0 to 31c405e Compare May 4, 2022 03:39
@john-bodley john-bodley requested review from etr2460 and ktmud May 6, 2022 17:18
@@ -1400,6 +1401,30 @@ def get_username() -> Optional[str]:
return None


@contextmanager
def override_user(user: Optional[User]) -> Iterator[Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be def override_user(user: Optional[User]) -> Iterator[None]: ?

else:
g.user = user
yield
delattr(g, "user")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar dpgaspar mentioned this pull request May 10, 2022
9 tasks
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I've long been annoyed by the user_?name = g.user.username if g.user and hasattr(g.user, "username") else None snippet that is littered around the codebase. Super happy to see it removed and centralized behind a single codepath.

@john-bodley john-bodley force-pushed the john-bodley0--fix-username branch 2 times, most recently from 1792826 to 8d6dfb6 Compare May 12, 2022 23:52
@john-bodley john-bodley merged commit 449d08b into apache:master May 13, 2022
@john-bodley john-bodley deleted the john-bodley0--fix-username branch May 13, 2022 04:03
philipher29 pushed a commit to ValtechMobility/superset that referenced this pull request Jun 9, 2022
Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants