Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postagg Theta Sketch Bug #2736

Closed
3 tasks done
joshwalters opened this issue May 9, 2017 · 6 comments
Closed
3 tasks done

Postagg Theta Sketch Bug #2736

joshwalters opened this issue May 9, 2017 · 6 comments

Comments

@joshwalters
Copy link
Contributor

joshwalters commented May 9, 2017

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.17.4

Expected results

Using a Theta sketch in a post agg should be valid, as it works for HyperUnique columns.

For example, if I have a longSum column A and a Theta sketch column B, I should be able to have a postagg metric that divides A by B.

A simple example JSON Druid query generated from a table view slice in Superset should generate a query like this:

{
  "aggregations": [
    {
      "fieldName": "A", 
      "type": "longSum", 
      "name": "A"
    },
    {
      "type": "thetaSketch", 
      "fieldName": "B", 
      "name": "B"
    }
  ], 
  "intervals": "2017-05-02T00:10:41+00:00/2017-05-09T00:10:41+00:00", 
  "dataSource": "table", 
  "granularity": "all", 
  "postAggregations": [
    {
      "type": "arithmetic", 
      "name": "A_DIV_B", 
      "fn": "/",
      "fields": [
        {
          "fieldName": "A", 
          "type": "fieldAccess", 
          "name": "A"
        }, 
        {
          "type": "thetaSketchEstimate", 
          "name": "B",
          "field": {
            "type": "fieldAccess", 
            "fieldName": "B", 
            "name": "B"
          }
        }
      ]
    }
  ], 
  "queryType": "timeseries"
}

Actual results

Using a Theta sketch in a postagg fails with a HTTP Error 500: Internal Server Error Druid Error: Unknown failure. This happens because the JSON query generated by Superset is invalid, it does not include the Theta sketch column in the aggregations field.

Here is the generated query (note that the Theta sketch column is missing in aggregations):

{
    "aggregations": [
        {
            "fieldName": "A", 
            "type": "longSum", 
            "name": "A"
        }
    ], 
    "intervals": "2017-05-02T00:03:45+00:00/2017-05-09T00:03:45+00:00", 
    "dataSource": "table", 
    "granularity": "all", 
    "postAggregations": [
        {
            "type": "arithmetic", 
            "name": "A_DIV_B", 
            "fn": "/",
            "fields": [
                {
                    "fieldName": "A", 
                    "type": "fieldAccess", 
                    "name": "A"
                }, 
                {
                    "type": "thetaSketchEstimate", 
                    "name": "B",
                    "field": {
                        "type": "fieldAccess", 
                        "fieldName": "B", 
                        "name": "B"
                    }
                }
            ]
        }
    ], 
    "queryType": "timeseries"
}

I believe the error is in this file: https://github.com/airbnb/superset/blob/master/superset/connectors/druid/models.py

It uses hyperUniqueCardinality but not thetaSketchEstimate.

Steps to reproduce

Have a Druid cluster with a Theta sketch column.

Add a postagg to divide some metric by the Theta sketch metric.

In the Edit Druid Datasource, you can create a new metric. Using postagg as the metric you can set the JSON as follows:

{
  "type"  : "arithmetic",
  "name"  : "A_DIV_B",
  "fn"    : "/",
  "fields": [
    {
           "type": "fieldAccess",
           "name": "A",
           "fieldName": "A"
    },
    {
            "type": "thetaSketchEstimate",
             "name": "B",
              "field": 
                {
                         "fieldName": "B",
                         "type": "fieldAccess",
                         "name": "B"
                  }
        }
   ]
}

Use a Table View slice on this new metric, and you will see a HTTP 500 error.

Issue discovered and investigated by: @jgodlew @ananthanithya

@ratb3rt
Copy link

ratb3rt commented May 10, 2017

This first needs pydruid to be updated to support theta sketches (druid-io/pydruid#72) and then support can be added in superset.

@kkalyan
Copy link
Contributor

kkalyan commented May 15, 2017

@joshwalters workaround until the fix is to include the aggregation required for the post aggregation as one of the metrics.

@joshwalters
Copy link
Contributor Author

@kkalyan Yes, that does work. It messes up the charts a bit (as we get a metric we don't want), but it does enable the post-agg to process.

@mistercrunch
Copy link
Member

Yes that's a bug as far as I'm concerned and needs to be improved.

@joshwalters
Copy link
Contributor Author

Just reviewed the code, @RichRadics fix (druid-io/pydruid#72) is needed to correct this issue in Superset. Once that is merged in, adding the fix to Superset should be a ~4 line change. If anyone is interested in seeing this fix, please comment in druid-io/pydruid#72

@mistercrunch
Copy link
Member

Notice: this issue has been closed because it has been inactive for 335 days. Feel free to comment and request for this issue to be reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants