Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SubQueries Support #135

Closed
pantlavanya opened this issue Jul 19, 2018 · 14 comments
Closed

SubQueries Support #135

pantlavanya opened this issue Jul 19, 2018 · 14 comments

Comments

@pantlavanya
Copy link
Contributor

pantlavanya commented Jul 19, 2018

Hello All,

How i can generate subqueries using pydruid because datasource field only take either str or list?
""" ValueError: Datasource definition not valid. Must be string or list of strings """

Below is the sample query. On which I am passing query output of 1st query to another query as datasource.

{
  "queryType": "groupBy",
  "dataSource":{
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "druid_source",
      "granularity": {"type": "period", "period": "P1M"},
      "dimensions": ["source_dim"],
      "aggregations": [
        { "type": "doubleMax", "name": "value", "fieldName": "stream_value" }
      ],
      "intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
    }
  },
  "granularity": "hour",
  "dimensions": ["source_dim"],
  "aggregations": [
    { "type": "longSum", "name": "outerquerryvalue", "fieldName": "value" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2020-01-03T00:00:00.000" ]
}
@mistercrunch
Copy link
Member

Not currently supported AFAICT

@pantlavanya
Copy link
Contributor Author

Hi @mistercrunch,
Can I send a PR for this? I have find a way to do this.

@mistercrunch
Copy link
Member

Please do!

@pantlavanya
Copy link
Contributor Author

Hi @mistercrunch
Below is my PR. It may not be perfect, let me know you suggestions.

#139

@pantlavanya
Copy link
Contributor Author

pantlavanya commented Jun 10, 2019

This issue is fixed. Below is the way you can define sub queries.

group = query.groupby(
    datasource=query.sub_query(datasource='twitterstream',
            granularity='hour',
            intervals='2018-01-01/2018-05-31',
            dimensions=["dim_key", "dim_key2"],
            filter=(Dimension('user_lang') == 'en') & (Dimension('user_name') == 'ram'),
            aggregations={"first_value": doublefirst("stream_value"),"last_value": doublelast("stream_value")},
            post_aggregations={'final_value': (HyperUniqueCardinality('last_value') - HyperUniqueCardinality('first_value'))}
    ),
    granularity='day',
    intervals='2018-01-01/2018-05-31',
    dimensions=["dim_key"],
    aggregations={"outer_final_value": doublesum("final_value")}
)

@ashexpert
Copy link

it still raise ValueError because query.sub_query return dict while in query.groupby check for the given datasource to be string or list of string. can you fix that too?

@pantlavanya
Copy link
Contributor Author

Sure I can fix this, What your trying to do. Can you please give me an example?

@pantlavanya
Copy link
Contributor Author

pantlavanya commented Sep 17, 2019

Hi @ashexpert,

I Understand your problem. Yes when you use sub_query, it return dict.
I will fix it.

I think your talking about this, If i am not wrong.

def parse_datasource(datasource, query_type):
    """
    Parse an input datasource object into valid dictionary

    Input can be a string, in which case it is simply returned, or a
    list, when it is turned into a UNION datasource.

    :param datasource: datasource parameter
    :param string query_type: query type
    :raise ValueError: if input is not string or list of strings or dict
    """
    if not (
        isinstance(datasource, six.string_types)
        or (
            isinstance(datasource, list)
            and all([isinstance(x, six.string_types) for x in datasource])
        )  or
        isinstance(datasource, dict)
    ):
        raise ValueError(
            "Datasource definition not valid. Must be string or dict or list of strings"
        )
    if isinstance(datasource, six.string_types):
        return datasource
    else:
        return {"type": "union", "dataSources": datasource}

@pantlavanya
Copy link
Contributor Author

PR
#179

@ashexpert
Copy link

Hi @pantlavanya
yes that was exactly my problem
sorry i didn't answer but you figured it out by yourself

@pantlavanya
Copy link
Contributor Author

pantlavanya commented Sep 20, 2019

Hi @mistercrunch
Can we get this merge. Thanks @ashexpert for pointing this.

#179

@veerappans
Copy link

veerappans commented Sep 1, 2022

This issue is fixed. Below is the way you can define sub queries.

group = query.groupby(
    datasource=query.sub_query(datasource='twitterstream',
            granularity='hour',
            intervals='2018-01-01/2018-05-31',
            dimensions=["dim_key", "dim_key2"],
            filter=(Dimension('user_lang') == 'en') & (Dimension('user_name') == 'ram'),
            aggregations={"first_value": doublefirst("stream_value"),"last_value": doublelast("stream_value")},
            post_aggregations={'final_value': (HyperUniqueCardinality('last_value') - HyperUniqueCardinality('first_value'))}
    ),
    granularity='day',
    intervals='2018-01-01/2018-05-31',
    dimensions=["dim_key"],
    aggregations={"outer_final_value": doublesum("final_value")}
)

This format does not work for me. Getting this error:
{"error":"Unknown exception","errorMessage":"Cannot deserialize instance of java.util.ArrayList<org.apache.druid.query.TableDataSource> out of START_OBJECT token\\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 73]

Can you please help.

@veerappans
Copy link

Its not forming the query right. Its adding extra union type which is making the syntax incorrect.

@veerappans
Copy link

@pantlavanya , can you please help. When I use the pydruid query format that you have given above, it is getting converted to native druid with an extra 'union' and druid query is failing. Would appreciate if you can help! Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants