Adds Druid SQL query examples for the Stats aggregator Native Queries #16277

Merged

@nrao57 nrao57 commented Apr 12, 2024

Adds Druid SQL query examples for the Stats aggregator Native Queries

Fixes #13148.

Description

This PR adds Druid SQL query examples to the Stats aggregator documentation page. Currently, the page only has examples for native queries.

This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.

@abhishekrb19 abhishekrb19 left a comment

Thanks for the contribution @nrao57! I left some suggestions.

@@ -125,8 +126,24 @@ To acquire standard deviation from variance, user can use "stddev" post aggregat
}
```

#### Druid SQL

It would be preferable to move the SQL query section before the native query section, since SQL is more widely used and more familiar to users. The same suggestion applies to all the SQL sections on this page.

#### Druid SQL

```SQL
There is no equivalent SQL for this query.

Perhaps move this text outside the sql codeblock?

alias,
VARIANCE("index") AS index_var
FROM "testing"
WHERE TIME_IN_INTERVAL(__time, '2016-03-06T00:00:00/2016-03-06T23:59:59')

The end of the interval is exclusive, so I would think this should be the start of the next day? Same for the native query:

Suggested change
- WHERE TIME_IN_INTERVAL(__time, '2016-03-06T00:00:00/2016-03-06T23:59:59')
+ WHERE TIME_IN_INTERVAL(__time, '2016-03-06/2016-03-07')
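
To illustrate the exclusive end: assuming `TIME_IN_INTERVAL` treats the interval start as inclusive and the end as exclusive, the suggested filter should behave like an explicit half-open range. The `testing` datasource and `index` column come from the examples in this PR; the `TIMESTAMP` comparison in the comment is an illustrative rewrite, not part of the change.

```sql
-- Sketch only: selects all of 2016-03-06 and nothing from 2016-03-07.
SELECT VARIANCE("index") AS index_var
FROM "testing"
WHERE TIME_IN_INTERVAL(__time, '2016-03-06/2016-03-07')

-- The same half-open range, written out explicitly (assumed equivalent):
-- WHERE __time >= TIMESTAMP '2016-03-06 00:00:00'
--   AND __time <  TIMESTAMP '2016-03-07 00:00:00'
```

With an end of `2016-03-06T23:59:59`, rows stamped within the final second of the day would be left out.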

Comment on lines 132 to 140
SELECT
DATE_TRUNC('day', __time),
VARIANCE("index_var")
FROM
"testing"
WHERE
TIME_IN_INTERVAL(__time, '2013-03-01T00:00:00.000/2016-03-20T00:00:00.000')
GROUP BY
DATE_TRUNC('day', __time)

Small simplification:

Suggested change
- SELECT
- DATE_TRUNC('day', __time),
- VARIANCE("index_var")
- FROM
- "testing"
- WHERE
- TIME_IN_INTERVAL(__time, '2013-03-01T00:00:00.000/2016-03-20T00:00:00.000')
- GROUP BY
- DATE_TRUNC('day', __time)
+ SELECT
+ DATE_TRUNC('day', __time),
+ VARIANCE("index_var") AS index_var
+ FROM "testing"
+ WHERE TIME_IN_INTERVAL(__time, '2013-03-01/2016-03-20')
+ GROUP BY 1

nrao57 commented Apr 12, 2024

Hi @abhishekrb19!

I have updated the Pull Request to reflect your suggestions. Please let me know if there is anything else.


```SQL
SELECT
DATE_TRUNC('day', __time),

Should be removed now that it's moved up?

#### Druid SQL

```SQL
SELECT

Same here

nrao57 commented Apr 12, 2024

My mistake 😅
I removed the old duplicate sections.

@abhishekrb19 abhishekrb19 left a comment

Couple more comments. Thanks for the updates @nrao57!

]
}
```

### TopN query

#### Druid SQL

There is no equivalent SQL for this query.

Looking at this, it's possible to have a topN query: the groupBy query below with an ORDER BY on one dimension and a LIMIT clause should be planned as a topN. You could verify this by running EXPLAIN PLAN FOR on the query.
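
A sketch of how that check might look, assuming the `testing` datasource and the `alias` and `index` columns from the examples in this PR. The `LIMIT` value and the descending sort are illustrative assumptions, not taken from the change.

```sql
-- Sketch only: one grouping dimension, ORDER BY on the aggregate, plus LIMIT.
-- LIMIT 5 and DESC are placeholders chosen for illustration.
EXPLAIN PLAN FOR
SELECT
  alias,
  VARIANCE("index") AS index_var
FROM "testing"
WHERE TIME_IN_INTERVAL(__time, '2016-03-06/2016-03-07')
GROUP BY alias
ORDER BY index_var DESC
LIMIT 5
```

If the plan output reports a `topN` query type, the example can go under the TopN heading. Whether the planner chooses topN may depend on planner settings, so it is worth confirming on the target cluster.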

@@ -107,6 +107,18 @@ To acquire standard deviation from variance, user can use "stddev" post aggregat

### Timeseries query

#### Druid SQL

```SQL

It would also be nice to mention the SQL functions VARIANCE and STDDEV in the section above, where the native JSON syntax is described.
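
For reference, a minimal sketch of the two functions side by side. It assumes the stats extension is loaded and reuses the `testing` datasource and `index` column from the examples in this PR; the output aliases are made up for illustration.

```sql
-- Sketch only: variance and standard deviation of the same column.
SELECT
  VARIANCE("index") AS index_var,
  STDDEV("index") AS index_stddev
FROM "testing"
WHERE TIME_IN_INTERVAL(__time, '2016-03-06/2016-03-07')
```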

nrao57 commented Apr 13, 2024

I added the TopN Druid SQL query and a table describing the SQL variance and stddev functions.
Let me know if anything needs to change.

docs/development/extensions-core/stats.md (outdated, resolved)
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
@abhishekrb19 abhishekrb19 merged commit a805c56 into apache:master Apr 15, 2024
8 checks passed
@nrao57 nrao57 deleted the issue-13148-add-stat-functions-examples branch April 15, 2024 16:57
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Successfully merging this pull request may close these issues.

Add examples to stats functions for Druid SQL
4 participants