[sqllab] Rendering query prior to sending to Celery #4669

john-bodley · 2018-03-22T20:18:51Z

We're running the Celery workers via

celery worker --app=superset.sql_lab:celery_app

rather than

superset worker

which throws a RuntimeError: Working outside of application context. when trying to use templates of the form '{{ presto.latest_partition("schema.table") }}' which doesn't have access to the Flask g variable which is used to determine whether the user has access to said table.

Note I'm not certain what logic should reside in Celery and what should be pre-computed on the Flask server, or whether the Celery worker should even be Flask aware, but this PR simply enforces that the template processing occurs within an application context.

to: @michellethomas @mistercrunch @timifasubaa

mistercrunch · 2018-03-23T01:01:20Z

I thought there was custom logic already to not require g when in async mode (using the query's username field instead). What line is raising?

john-bodley · 2018-03-23T01:33:42Z

@mistercrunch here's the stack trace:

This typically means that you attempted to use functionality that needed
to interface with the current application object in a way.  To solve
this set up an application context with app.app_context().  See the
documentation for more information.
Traceback (most recent call last):
  File "/home/john_bodley/incubator-superset/superset/sql_lab.py", line 199, in execute_sql
    executed_sql, **tp)
  File "/home/john_bodley/incubator-superset/superset/jinja_context.py", line 109, in process_template
    return template.render(kwargs)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/home/john_bodley/env/lib/python3.6/site-packages/jinja2/sandbox.py", line 427, in call
    return __context.call(__obj, *args, **kwargs)
  File "/home/john_bodley/incubator-superset/superset/jinja_context.py", line 129, in latest_partition
    table_name, schema, self.database)[1]
  File "/home/john_bodley/incubator-superset/superset/db_engine_specs.py", line 726, in latest_partition
    indexes = database.get_indexes(table_name, schema)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 813, in get_indexes
    return self.inspector.get_indexes(table_name, schema)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 738, in inspector
    engine = self.get_sqla_engine()
  File "/home/john_bodley/incubator-superset/superset/utils.py", line 100, in __call__
    value = self.func(*args, **kwargs)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 655, in get_sqla_engine
    effective_username = self.get_effective_user(url, user_name)
  File "/home/john_bodley/incubator-superset/superset/models/core.py", line 643, in get_effective_user
    hasattr(g, 'user') and hasattr(g.user, 'username') and
  File "/home/john_bodley/env/lib/python3.6/site-packages/werkzeug/local.py", line 347, in __getattr__
    return getattr(self._get_current_object(), name)
  File "/home/john_bodley/env/lib/python3.6/site-packages/werkzeug/local.py", line 306, in _get_current_object
    return self.__local()
  File "/home/john_bodley/env/lib/python3.6/site-packages/flask/globals.py", line 44, in _lookup_app_object
    raise RuntimeError(_app_ctx_err_msg)
RuntimeError: Working outside of application context.

codecov-io · 2018-03-23T01:52:37Z

Codecov Report

Merging #4669 into master will increase coverage by <.01%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master    #4669      +/-   ##
==========================================
+ Coverage   71.87%   71.88%   +<.01%     
==========================================
  Files         204      204              
  Lines       15323    15320       -3     
  Branches     1177     1177              
==========================================
- Hits        11014    11013       -1     
+ Misses       4306     4304       -2     
  Partials        3        3

Impacted Files	Coverage Δ
superset/sql_lab.py	`75.55% <ø> (+0.95%)`	⬆️
superset/views/core.py	`71.15% <66.66%> (-0.03%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68dec24...d41d747. Read the comment docs.

timifasubaa · 2018-03-24T00:27:44Z

LGTM?

john-bodley · 2018-03-26T17:03:05Z

@mistercrunch even though in sql_lab.py get_sqla_engine is called with the user_name parameter, the Jinja template issue we're seeing is when one is using presto.latest_partition which doesn't pass through the username, and from reading through the code this may neither be feasible nor non-trivial.

How do you feel about the change? Note I'm somewhat perplex how other portions of execute_sql work on a Celery worker like get_session which references app as I presume that would also be working outside of an app context.

mistercrunch · 2018-03-27T07:51:46Z

It's a bit hard to read the PR since Github doesn't highlight the right lines at all here. Crazy how an indent completely confuses their diffing function.

But even with the Flask context manager you still won't get the user info I'm guessing as the user is not authenticated on the celery side of things. I guess it just fixes that error, but probably does not impersonate the right user deep down the call stack.

There should not be an app context at that point, so my thought is we have to clean up the reliance on g.user and pass it down the stack instead.

mistercrunch · 2018-03-27T08:09:15Z

Seems like the template processor needs to be aware of the username and the calls in db_engine_specs should relay the username to calls that interact with the engine in models.py.

Passing it through gets pretty deep and nasty. Maybe mimicking the Flask app context isn't a bad idea, but I'm unclear on how to do that.

At first I thought "would it be crazy would the celery worker to call REST endpoints?" That's probably a very bad idea.

Now I'm thinking "how about doing the template rendering ahead of time in the web request and putting the rendered template in the db for the worker to run?".

john-bodley · 2018-03-27T14:36:23Z

@mistercrunch I also raised the idea of pre-computing information on the Flask server which is inline with your last comment.

I sense the Celery worker should be app agnostic and simply run the query.

mistercrunch · 2018-03-27T15:27:31Z

Agreed, it took me a moment to catch up with your thinking process. Had to go through similar reasoning.

For context at the time where the decision was made to process the template on the worker, there was no impersonation going on, and it somewhat made sense to do more work on the worker. At this point I think it makes sense to render the template on the web server. It shouldn't affect query time, only pending time may appear a little longer.

john-bodley · 2018-03-27T23:09:29Z

@mistercrunch how does this existing logic work , i.e., get_session is called by the Celery worker to get the DB session, using the configuration logic defined in app.config, yet the app context shouldn't be defined.

mistercrunch · 2018-03-28T00:56:59Z

The whole conf parsing depends on Flask's way of doing this. This predates SQL Lab and the Celery workers.

It could be nice to further dissociate the flask app from the celery worker, but currently they share their configuration which depends on Flask.

Is it a problem that the app context is defined though?

john-bodley · 2018-03-28T00:58:37Z

@mistercrunch I've updated the logic to pass through the rendered query to the Celery task.

mistercrunch · 2018-03-28T01:03:07Z

It would be nice to have a unit test that would cover both presto.latest_partition along with impersonation.

john-bodley · 2018-03-28T01:41:52Z

@mistercrunch would we need to mock the Presto connection? I haven't seen any Presto specific tests.

mistercrunch · 2018-03-29T00:47:23Z

Yes you're right there would be mocking involved. It might be tricky and out-of-scope for this PR here. Happy to merge as is.

john-bodley · 2018-03-29T17:44:46Z

@mistercrunch are you ok with merging this PR? A number of our users are running into this issue.

graceguo-supercat

LGTM

(cherry picked from commit d49a0e7)

john-bodley force-pushed the john-bodley-jinja-fix branch from b3329ca to b19c249 Compare March 23, 2018 01:06

john-bodley force-pushed the john-bodley-jinja-fix branch from b19c249 to 95bd382 Compare March 23, 2018 01:36

john-bodley force-pushed the john-bodley-jinja-fix branch 3 times, most recently from dea1f82 to bd0876e Compare March 28, 2018 00:56

john-bodley force-pushed the john-bodley-jinja-fix branch from bd0876e to 6fe2ea1 Compare March 28, 2018 00:57

john-bodley changed the title ~~[sqllab] Using app context for processing Jinja template in async mode~~ [sqllab] Rendering query prior to sending to Celery Mar 28, 2018

john-bodley force-pushed the john-bodley-jinja-fix branch 2 times, most recently from a30329a to 6d031f8 Compare March 28, 2018 03:20

[sqllab] Using app context for Celery task

d41d747

john-bodley force-pushed the john-bodley-jinja-fix branch from 6d031f8 to d41d747 Compare March 29, 2018 02:27

graceguo-supercat approved these changes Mar 29, 2018

View reviewed changes

graceguo-supercat merged commit d49a0e7 into apache:master Mar 29, 2018

john-bodley added a commit to john-bodley/superset that referenced this pull request Mar 29, 2018

[sqllab] Using app context for Celery task (apache#4669)

f4c9d8d

(cherry picked from commit d49a0e7)

john-bodley mentioned this pull request Mar 29, 2018

John bodley cherry pick 4669 4686 airbnb/superset-fork#31

Closed

john-bodley added a commit to john-bodley/superset that referenced this pull request Mar 30, 2018

[sqllab] Using app context for Celery task (apache#4669)

bb984c7

(cherry picked from commit d49a0e7)

john-bodley mentioned this pull request Mar 30, 2018

John bodley cherry pick 4669 4686 airbnb/superset-fork#32

Closed

john-bodley added a commit to john-bodley/superset that referenced this pull request Mar 30, 2018

[sqllab] Using app context for Celery task (apache#4669)

75a2bdf

(cherry picked from commit d49a0e7)

john-bodley mentioned this pull request Mar 30, 2018

John bodley cherry picks airbnb/superset-fork#33

Merged

michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018

[sqllab] Using app context for Celery task (apache#4669)

b2e248b

wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018

[sqllab] Using app context for Celery task (apache#4669)

0a1fb5e

mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.25.0 labels Feb 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[sqllab] Rendering query prior to sending to Celery #4669

[sqllab] Rendering query prior to sending to Celery #4669

john-bodley commented Mar 22, 2018 •

edited

mistercrunch commented Mar 23, 2018

john-bodley commented Mar 23, 2018 •

edited

codecov-io commented Mar 23, 2018 •

edited

timifasubaa commented Mar 24, 2018

john-bodley commented Mar 26, 2018

mistercrunch commented Mar 27, 2018 •

edited

mistercrunch commented Mar 27, 2018

john-bodley commented Mar 27, 2018

mistercrunch commented Mar 27, 2018

john-bodley commented Mar 27, 2018 •

edited

mistercrunch commented Mar 28, 2018 •

edited

john-bodley commented Mar 28, 2018

mistercrunch commented Mar 28, 2018

john-bodley commented Mar 28, 2018

mistercrunch commented Mar 29, 2018

john-bodley commented Mar 29, 2018

graceguo-supercat left a comment

[sqllab] Rendering query prior to sending to Celery #4669

[sqllab] Rendering query prior to sending to Celery #4669

Conversation

john-bodley commented Mar 22, 2018 • edited

mistercrunch commented Mar 23, 2018

john-bodley commented Mar 23, 2018 • edited

codecov-io commented Mar 23, 2018 • edited

Codecov Report

timifasubaa commented Mar 24, 2018

john-bodley commented Mar 26, 2018

mistercrunch commented Mar 27, 2018 • edited

mistercrunch commented Mar 27, 2018

john-bodley commented Mar 27, 2018

mistercrunch commented Mar 27, 2018

john-bodley commented Mar 27, 2018 • edited

mistercrunch commented Mar 28, 2018 • edited

john-bodley commented Mar 28, 2018

mistercrunch commented Mar 28, 2018

john-bodley commented Mar 28, 2018

mistercrunch commented Mar 29, 2018

john-bodley commented Mar 29, 2018

graceguo-supercat left a comment

Choose a reason for hiding this comment

john-bodley commented Mar 22, 2018 •

edited

john-bodley commented Mar 23, 2018 •

edited

codecov-io commented Mar 23, 2018 •

edited

mistercrunch commented Mar 27, 2018 •

edited

john-bodley commented Mar 27, 2018 •

edited

mistercrunch commented Mar 28, 2018 •

edited