Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllab] Rendering query prior to sending to Celery #4669

Merged
merged 1 commit into from Mar 29, 2018

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Mar 22, 2018

We're running the Celery workers via

celery worker --app=superset.sql_lab:celery_app

rather than

superset worker

which throws a RuntimeError: Working outside of application context. when trying to use templates of the form '{{ presto.latest_partition("schema.table") }}' which doesn't have access to the Flask g variable which is used to determine whether the user has access to said table.

Note I'm not certain what logic should reside in Celery and what should be pre-computed on the Flask server, or whether the Celery worker should even be Flask aware, but this PR simply enforces that the template processing occurs within an application context.

to: @michellethomas @mistercrunch @timifasubaa

@mistercrunch
Copy link
Member

I thought there was custom logic already to not require g when in async mode (using the query's username field instead). What line is raising?

@john-bodley
Copy link
Member Author

john-bodley commented Mar 23, 2018

@mistercrunch here's the stack trace:

This typically means that you attempted to use functionality that needed
to interface with the current application object in a way.  To solve
this set up an application context with app.app_context().  See the
documentation for more information.
Traceback (most recent call last):
  File "/home/john_bodley/incubator-superset/superset/sql_lab.py", line 199, in execute_sql
    executed_sql, **tp)
  File "/home/john_bodley/incubator-superset/superset/jinja_context.py", line 109, in process_template
    return template.render(kwargs)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/sandbox.py", line 427, in call
    return __context.call(__obj, *args, **kwargs)
  File "/home/john_bodley/incubator-superset/superset/jinja_context.py", line 129, in latest_partition
    table_name, schema, self.database)[1]
  File "/home/john_bodley/incubator-superset/superset/db_engine_specs.py", line 726, in latest_partition
    indexes = database.get_indexes(table_name, schema)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 813, in get_indexes
    return self.inspector.get_indexes(table_name, schema)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 738, in inspector
    engine = self.get_sqla_engine()
  File "/home/john_bodley/incubator-superset/superset/utils.py", line 100, in __call__
    value = self.func(*args, **kwargs)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 655, in get_sqla_engine
    effective_username = self.get_effective_user(url, user_name)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 643, in get_effective_user
    hasattr(g, 'user') and hasattr(g.user, 'username') and
  File "/home/john_bodley/env/lib/python3.6/site-packages/werkzeug/local.py", line 347, in __getattr__
    return getattr(self._get_current_object(), name)
  File "/home/john_bodley/env/lib/python3.6/site-packages/werkzeug/local.py", line 306, in _get_current_object
    return self.__local()
  File "/home/john_bodley/env/lib/python3.6/site-packages/flask/globals.py", line 44, in _lookup_app_object
    raise RuntimeError(_app_ctx_err_msg)
RuntimeError: Working outside of application context.

@codecov-io
Copy link

codecov-io commented Mar 23, 2018

Codecov Report

Merging #4669 into master will increase coverage by <.01%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4669      +/-   ##
==========================================
+ Coverage   71.87%   71.88%   +<.01%     
==========================================
  Files         204      204              
  Lines       15323    15320       -3     
  Branches     1177     1177              
==========================================
- Hits        11014    11013       -1     
+ Misses       4306     4304       -2     
  Partials        3        3
Impacted Files Coverage Δ
superset/sql_lab.py 75.55% <ø> (+0.95%) ⬆️
superset/views/core.py 71.15% <66.66%> (-0.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68dec24...d41d747. Read the comment docs.

@timifasubaa
Copy link
Contributor

LGTM?

@john-bodley
Copy link
Member Author

@mistercrunch even though in sql_lab.py get_sqla_engine is called with the user_name parameter, the Jinja template issue we're seeing is when one is using presto.latest_partition which doesn't pass through the username, and from reading through the code this may neither be feasible nor non-trivial.

How do you feel about the change? Note I'm somewhat perplex how other portions of execute_sql work on a Celery worker like get_session which references app as I presume that would also be working outside of an app context.

@mistercrunch
Copy link
Member

mistercrunch commented Mar 27, 2018

It's a bit hard to read the PR since Github doesn't highlight the right lines at all here. Crazy how an indent completely confuses their diffing function.

But even with the Flask context manager you still won't get the user info I'm guessing as the user is not authenticated on the celery side of things. I guess it just fixes that error, but probably does not impersonate the right user deep down the call stack.

There should not be an app context at that point, so my thought is we have to clean up the reliance on g.user and pass it down the stack instead.

@mistercrunch
Copy link
Member

Seems like the template processor needs to be aware of the username and the calls in db_engine_specs should relay the username to calls that interact with the engine in models.py.

Passing it through gets pretty deep and nasty. Maybe mimicking the Flask app context isn't a bad idea, but I'm unclear on how to do that.

At first I thought "would it be crazy would the celery worker to call REST endpoints?" That's probably a very bad idea.

Now I'm thinking "how about doing the template rendering ahead of time in the web request and putting the rendered template in the db for the worker to run?".

@john-bodley
Copy link
Member Author

@mistercrunch I also raised the idea of pre-computing information on the Flask server which is inline with your last comment.

I sense the Celery worker should be app agnostic and simply run the query.

@mistercrunch
Copy link
Member

Agreed, it took me a moment to catch up with your thinking process. Had to go through similar reasoning.

For context at the time where the decision was made to process the template on the worker, there was no impersonation going on, and it somewhat made sense to do more work on the worker. At this point I think it makes sense to render the template on the web server. It shouldn't affect query time, only pending time may appear a little longer.

@john-bodley
Copy link
Member Author

john-bodley commented Mar 27, 2018

@mistercrunch how does this existing logic work , i.e., get_session is called by the Celery worker to get the DB session, using the configuration logic defined in app.config, yet the app context shouldn't be defined.

@john-bodley john-bodley force-pushed the john-bodley-jinja-fix branch 3 times, most recently from dea1f82 to bd0876e Compare March 28, 2018 00:56
@mistercrunch
Copy link
Member

mistercrunch commented Mar 28, 2018

The whole conf parsing depends on Flask's way of doing this. This predates SQL Lab and the Celery workers.

It could be nice to further dissociate the flask app from the celery worker, but currently they share their configuration which depends on Flask.

Is it a problem that the app context is defined though?

@john-bodley john-bodley changed the title [sqllab] Using app context for processing Jinja template in async mode [sqllab] Rendering query prior to sending to Celery Mar 28, 2018
@john-bodley
Copy link
Member Author

@mistercrunch I've updated the logic to pass through the rendered query to the Celery task.

@mistercrunch
Copy link
Member

It would be nice to have a unit test that would cover both presto.latest_partition along with impersonation.

@john-bodley
Copy link
Member Author

@mistercrunch would we need to mock the Presto connection? I haven't seen any Presto specific tests.

@john-bodley john-bodley force-pushed the john-bodley-jinja-fix branch 2 times, most recently from a30329a to 6d031f8 Compare March 28, 2018 03:20
@mistercrunch
Copy link
Member

Yes you're right there would be mocking involved. It might be tricky and out-of-scope for this PR here. Happy to merge as is.

@john-bodley
Copy link
Member Author

@mistercrunch are you ok with merging this PR? A number of our users are running into this issue.

Copy link

@graceguo-supercat graceguo-supercat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@graceguo-supercat graceguo-supercat merged commit d49a0e7 into apache:master Mar 29, 2018
john-bodley added a commit to john-bodley/superset that referenced this pull request Mar 29, 2018
john-bodley added a commit to john-bodley/superset that referenced this pull request Mar 30, 2018
john-bodley added a commit to john-bodley/superset that referenced this pull request Mar 30, 2018
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.25.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.25.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants