Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix]: Subquery time filter logic for sqla datasources (partitions) #3892

Closed
wants to merge 1 commit into from

Conversation

fabianmenges
Copy link
Contributor

When you create a group-by query using the UI it will create a subquery which will contain the time filter of the outer query. If you configured Superset to automatically align queries to partitions (e.g. using time_secondary_columns of overriding append_timefilter of BaseEngineSpec in db_engine_spec.py) the inner time filter will not receive the partition alignment logic from the outer filter. This fixes this.

Example:

We are using append_timefilter to automatically add a filter on the partition column _PARTITIONTIME when we query for the column timestamp. This works great for the outer query but not for the subquery.
With this fix the time filter will get correctly passed down to the subquery.

Before

SELECT `accountName` AS accountName,
       timestamp AS __timestamp,
                    COUNT(*) AS count
FROM social_metrics_service.`FlattenedSocialMetricsRecords` AS `FlattenedSocialMetricsRecords_1`
JOIN
  (SELECT `accountName` AS accountName__,
          COUNT(*) AS mme_inner__
   FROM social_metrics_service.`FlattenedSocialMetricsRecords` AS `FlattenedSocialMetricsRecords_1`
   WHERE timestamp >= 1510185600000
     AND timestamp <= 1510866016000
   GROUP BY accountName__
   ORDER BY mme_inner__ DESC
   LIMIT 50) AS anon_1 ON `accountName` = `accountName__`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-11-09') AND TIMESTAMP('2017-11-16')
  AND timestamp >= 1510185600000
  AND timestamp <= 1510866016000
GROUP BY accountName,
         __timestamp
ORDER BY count DESC
LIMIT 50000

After

Note that the subquery also has the _PARTITIONTIME included

SELECT `accountName` AS accountName,
       timestamp AS __timestamp,
                    COUNT(*) AS count
FROM social_metrics_service.`FlattenedSocialMetricsRecords` AS `FlattenedSocialMetricsRecords_1`
JOIN
  (SELECT `accountName` AS accountName__,
          COUNT(*) AS mme_inner__
   FROM social_metrics_service.`FlattenedSocialMetricsRecords` AS `FlattenedSocialMetricsRecords_1`
   WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-11-09') AND TIMESTAMP('2017-11-16')
     AND timestamp >= 1510185600000
     AND timestamp <= 1510865767000
   GROUP BY accountName__
   ORDER BY mme_inner__ DESC
   LIMIT 50) AS anon_1 ON `accountName` = `accountName__`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-11-09') AND TIMESTAMP('2017-11-16')
  AND timestamp >= 1510185600000
  AND timestamp <= 1510865767000
GROUP BY accountName,
         __timestamp
ORDER BY count DESC
LIMIT 50000

…titions) is only applied to the outer not the inner query.
inner_to_dttm or to_dttm,
)
subq = subq.where(and_(*(where_clause_and + [inner_time_filter])))
subq = subq.where(and_(*(time_filters + where_clause_and)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner_time_filter logic is there to support the time_compare related logic so that the same series exist are used for both queries. Look for time_compare in viz.py to see how it's used on the caller side...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another look at this.

@kristw kristw added !deprecated-label:bug Deprecated label - Use #bug instead inactive Inactive for >= 30 days and removed !deprecated-label:bug Deprecated label - Use #bug instead labels Jan 23, 2019
@stale stale bot closed this Apr 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
!deprecated-label:bug Deprecated label - Use #bug instead inactive Inactive for >= 30 days
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants