SubQueries Support #135

pantlavanya · 2018-07-19T13:03:31Z

Hello All,

How can I generate subqueries using pydruid? The datasource field only accepts a str or a list, so anything else raises:
""" ValueError: Datasource definition not valid. Must be string or list of strings """

Below is a sample query, in which I pass the output of the first (inner) query to the outer query as its datasource.

{
  "queryType": "groupBy",
  "dataSource":{
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "druid_source",
      "granularity": {"type": "period", "period": "P1M"},
      "dimensions": ["source_dim"],
      "aggregations": [
        { "type": "doubleMax", "name": "value", "fieldName": "stream_value" }
      ],
      "intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
    }
  },
  "granularity": "hour",
  "dimensions": ["source_dim"],
  "aggregations": [
    { "type": "longSum", "name": "outerquerryvalue", "fieldName": "value" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
}
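
For reference, the same nested structure can be assembled as plain Python dicts. This is only a minimal sketch using the standard library, not a pydruid API: the helper name `as_query_datasource` is made up for illustration, and the field values are copied from the query above.

```python
import json


def as_query_datasource(inner_query):
    """Wrap a native query dict as a 'query'-type dataSource (hypothetical helper)."""
    return {"type": "query", "query": inner_query}


interval = "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000"

# Inner query: monthly max of stream_value per source_dim.
inner = {
    "queryType": "groupBy",
    "dataSource": "druid_source",
    "granularity": {"type": "period", "period": "P1M"},
    "dimensions": ["source_dim"],
    "aggregations": [
        {"type": "doubleMax", "name": "value", "fieldName": "stream_value"}
    ],
    "intervals": [interval],
}

# Outer query: reads the inner query's "value" column as its input.
outer = {
    "queryType": "groupBy",
    "dataSource": as_query_datasource(inner),
    "granularity": "hour",
    "dimensions": ["source_dim"],
    "aggregations": [
        {"type": "longSum", "name": "outerquerryvalue", "fieldName": "value"}
    ],
    "intervals": [interval],
}

# Serialized body of the native query.
payload = json.dumps(outer, indent=2)
print(payload)
```

The point of the sketch is that the outer "dataSource" is an object with "type": "query", not a string or a list, which is exactly what the datasource check rejects.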


mistercrunch · 2018-07-23T04:37:51Z

Not currently supported AFAICT

pantlavanya · 2018-10-17T10:50:52Z

Hi @mistercrunch,
Can I send a PR for this? I have found a way to do this.

mistercrunch · 2018-10-18T18:04:04Z

Please do!

pantlavanya · 2018-10-19T06:50:10Z

Hi @mistercrunch
Below is my PR. It may not be perfect, so let me know your suggestions.

#139

pantlavanya · 2019-06-10T16:25:26Z

This issue is fixed. Below is the way you can define sub queries.

group = query.groupby(
    datasource=query.sub_query(datasource='twitterstream',
            granularity='hour',
            intervals='2018-01-01/2018-05-31',
            dimensions=["dim_key", "dim_key2"],
            filter=(Dimension('user_lang') == 'en') & (Dimension('user_name') == 'ram'),
            aggregations={"first_value": doublefirst("stream_value"),"last_value": doublelast("stream_value")},
            post_aggregations={'final_value': (HyperUniqueCardinality('last_value') - HyperUniqueCardinality('first_value'))}
    ),
    granularity='day',
    intervals='2018-01-01/2018-05-31',
    dimensions=["dim_key"],
    aggregations={"outer_final_value": doublesum("final_value")}
)

ashexpert · 2019-09-16T10:34:40Z

It still raises a ValueError, because query.sub_query returns a dict while query.groupby checks that the given datasource is a string or a list of strings. Can you fix that too?

pantlavanya · 2019-09-16T11:02:46Z

Sure, I can fix this. What are you trying to do? Can you please give me an example?

pantlavanya · 2019-09-17T07:33:18Z

Hi @ashexpert,

I understand your problem. Yes, when you use sub_query, it returns a dict.
I will fix it.

I think you are talking about this, if I am not wrong.

def parse_datasource(datasource, query_type):
    """
    Parse an input datasource object into valid dictionary

    Input can be a string, in which case it is simply returned, or a
    list, when it is turned into a UNION datasource.

    :param datasource: datasource parameter
    :param string query_type: query type
    :raise ValueError: if input is not string or list of strings or dict
    """
    if not (
        isinstance(datasource, six.string_types)
        or (
            isinstance(datasource, list)
            and all([isinstance(x, six.string_types) for x in datasource])
        )  or
        isinstance(datasource, dict)
    ):
        raise ValueError(
            "Datasource definition not valid. Must be string or dict or list of strings"
        )
    if isinstance(datasource, six.string_types):
        return datasource
    else:
        return {"type": "union", "dataSources": datasource}

pantlavanya · 2019-09-17T08:03:20Z

PR
#179

ashexpert · 2019-09-20T11:24:05Z

Hi @pantlavanya,
Yes, that was exactly my problem.
Sorry I didn't answer, but you figured it out by yourself.

pantlavanya · 2019-09-20T11:25:19Z

Hi @mistercrunch,
Can we get this merged? Thanks @ashexpert for pointing this out.

#179

veerappans · 2022-09-01T13:19:56Z

This issue is fixed. Below is the way you can define sub queries.

group = query.groupby(
    datasource=query.sub_query(datasource='twitterstream',
            granularity='hour',
            intervals='2018-01-01/2018-05-31',
            dimensions=["dim_key", "dim_key2"],
            filter=(Dimension('user_lang') == 'en') & (Dimension('user_name') == 'ram'),
            aggregations={"first_value": doublefirst("stream_value"),"last_value": doublelast("stream_value")},
            post_aggregations={'final_value': (HyperUniqueCardinality('last_value') - HyperUniqueCardinality('first_value'))}
    ),
    granularity='day',
    intervals='2018-01-01/2018-05-31',
    dimensions=["dim_key"],
    aggregations={"outer_final_value": doublesum("final_value")}
)

This format does not work for me. I am getting this error:
{"error":"Unknown exception","errorMessage":"Cannot deserialize instance of java.util.ArrayList<org.apache.druid.query.TableDataSource> out of START_OBJECT token\\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 73]

Can you please help?

veerappans · 2022-09-02T17:43:18Z

It's not forming the query right. It's adding an extra union type, which makes the syntax incorrect.

veerappans · 2022-09-08T21:15:04Z

@pantlavanya, can you please help? When I use the pydruid query format that you gave above, it is converted to a native Druid query with an extra 'union', and the Druid query fails. I would appreciate it if you can help! Thanks.
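
A possible explanation, judging only from the parse_datasource snippet posted earlier in this thread: a dict passes the validation but then falls into the final else branch, so it gets wrapped as {"type": "union", "dataSources": <dict>}. That would match the error about deserializing an ArrayList out of START_OBJECT. Below is a minimal sketch of how the function could pass a dict through unchanged. It is an assumption about a fix, not pydruid's actual code, and it uses plain `str` (Python 3) instead of `six.string_types`.

```python
def parse_datasource(datasource, query_type=None):
    """Sketch of a datasource parser that leaves dict datasources untouched.

    query_type is kept only to mirror the signature quoted in the thread.
    """
    if isinstance(datasource, str):
        # A plain table name is sent as-is.
        return datasource
    if isinstance(datasource, dict):
        # Assumed fix: a dict is already a datasource definition
        # (for example a sub query), so do not wrap it in a union.
        return datasource
    if isinstance(datasource, list) and all(isinstance(x, str) for x in datasource):
        # A list of table names becomes a union datasource.
        return {"type": "union", "dataSources": datasource}
    raise ValueError(
        "Datasource definition not valid. Must be string or dict or list of strings"
    )


# Usage with a stand-in for what a sub query datasource would look like.
sub_query = {
    "type": "query",
    "query": {"queryType": "groupBy", "dataSource": "twitterstream"},
}
print(parse_datasource(sub_query))
print(parse_datasource(["table_a", "table_b"]))
```

With this ordering, only a list of strings is ever turned into a union, and the sub query dict reaches Druid in the {"type": "query", ...} shape shown in the first comment.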

pantlavanya closed this as completed Jun 10, 2019

pantlavanya reopened this Sep 20, 2019

pantlavanya closed this as completed May 25, 2020
