Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bugfix] Fixing regression introduced in #4396 #4500

Merged
merged 1 commit into from
Mar 6, 2018

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Feb 28, 2018

This PR fixes a regression introduced in #4396 which returned the error No data even when the query was unsuccessful for other reasons. This PR also refactors the No data logic (previously defined in two places) and also registered that the data was loaded if the query didn't fail. Note it seems like BaseViz.get_df(...) may never throw an exception give this (and thus I'm uncertain whether this logic is needed) hence the additional check that the query succeeded is necessary to confirm that the data was loaded.

@mistercrunch from our offline conversation I found this difficult to write specific test cases, i.e., we discovered this regression in Presto when the underlying table metadata had changed either a column or the table no-longer existed. When I tried to mock this behavior, I was able to exercise the issue as in SQLite the inconsistency was discovered whilst the query was being compiled rather than during query execution.

to: @mistercrunch
cc: @graceguo-supercat @michellethomas @timifasubaa

@@ -327,8 +322,9 @@ def get_df_payload(self, query_obj=None):
if query_obj and not is_loaded:
try:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/except block may no longer be required per the PR description. This would make the stacktrace obsolete as well.

superset/viz.py Outdated
if self.status != utils.QueryStatus.FAILED:
if df is None or df.empty:
raise Exception('No data')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note I'm not certain why we need to throw an exception here, as previously the error would be present in error_message.

@mistercrunch
Copy link
Member

I feel like we need some unit tests here making sure errors bubble up somehow. It should be easy to generate a No Data exception with a filter. For the other test case that is more representative (currently says No Data but should say something else) you could reference a metric that does not exist in the query, and change the query method to make sure it actually raises a specific message Metric referenced does not exist anymore and make sure it comes through.

@john-bodley
Copy link
Member Author

@mistercrunch for the later simply referencing an invalid metric doesn't actually cover the specific case we discovered as the compiler detects this, i.e., prior to this fix it wouldn't return No data. I can look more into this.

@john-bodley
Copy link
Member Author

@mistercrunch I've updated the logic and added a couple of unit tests which test the two scenarios.

@@ -151,9 +151,6 @@ def get_df(self, query_obj=None):
# If the datetime format is unix, the parse will use the corresponding
# parsing logic.
if df is None or df.empty:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the df.empty check is only required by the test_get_df_returns_empty_df unit test due to the mocking. Otherwise it would be safe to proceed with an empty pd.DataFrame.

if self.status != utils.QueryStatus.FAILED:
payload['data'] = self.get_data(df)
if df is None or df.empty:
payload['error'] = 'No data'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more consistent to log the payload error as opposed to throwing an exception here.

@@ -612,7 +611,7 @@ def query_obj(self):
return None

def get_df(self, query_obj=None):
return None
return pd.DataFrame()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this for consistency, i.e. get_df(...) should return a pd.DataFrame().

@john-bodley john-bodley changed the title [payload] Fixing regression introduced in ##4396 [bugfix] Fixing regression introduced in ##4396 Mar 1, 2018
@john-bodley john-bodley changed the title [bugfix] Fixing regression introduced in ##4396 [bugfix] Fixing regression introduced in #4396 Mar 1, 2018
@john-bodley
Copy link
Member Author

@mistercrunch would you mind taking another look at this? I made a few small tweaks after adding the addition of a couple of unit tests.

@michellethomas
Copy link
Contributor

lgtm

@john-bodley john-bodley merged commit 48430a1 into apache:master Mar 6, 2018
@john-bodley john-bodley deleted the john-bodley-fix-pr-4396 branch March 6, 2018 00:51
@mistercrunch
Copy link
Member

Sorry for the delay, was traveling last week with limited attention. Thanks @michellethomas for merging it!

@mistercrunch
Copy link
Member

mistercrunch commented Mar 6, 2018

I think this broke the brittle markup viz in trunk. Has to do with the fact that the payload gets a error key that says No data and the frontend then refuses to render the viz.

@john-bodley
Copy link
Member Author

Sorry @mistercrunch. I think I may have a fix and will add an additional unit test for the markup test. I think the solution should be:

class MarkupViz(BaseViz):
    def get_df(self, query_obj=None):
       return None  # like before

and

def get_payload(self, query_obj=None):
    ...
    if df is not None and df.empty:  # previously if df is None or df.empty
        payload['error'] = 'No data'

@mistercrunch
Copy link
Member

Sounds about right, this area is pretty brittle around the no-query special cases...

john-bodley pushed a commit to john-bodley/superset that referenced this pull request Mar 6, 2018
mistercrunch pushed a commit that referenced this pull request Mar 7, 2018
john-bodley added a commit to john-bodley/superset that referenced this pull request Mar 7, 2018
mistercrunch pushed a commit to lyft/incubator-superset that referenced this pull request Mar 14, 2018
hughhhh pushed a commit to lyft/incubator-superset that referenced this pull request Apr 1, 2018
* Cherry pick apache#4581

* Add flask-compress cherry

* Add shortner fix

* Add Return __time in Druid scan apache#4504

* Picking cherry Fixing regression from apache#4500 (apache#4549)

* [bugfix] SQL Lab 'MySQL has gone away'

It appears the 'MySQL has gone away' is triggered by the line of code
I wrapped in a try block here.

This is a temporary fix, there will be another PR shortly getting to the
bottom of this.

Related:
https://github.com/lyft/druidstream/issues/40
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants