more examples needed for druid sql #6108

Closed
pdeva opened this issue Aug 5, 2018 · 10 comments

Comments

@pdeva
Contributor

pdeva commented Aug 5, 2018

druid sql is severely under-documented, to the point of being unusable.
at the very least, there should be example queries showing:

  1. how to select a time range, e.g. can you do something as simple as Interval(begin, end), or is a full begin >= x and end <= y needed every time?
  2. how to specify granularity
  3. how to select time series data
  4. edge cases, like what output is expected if data is missing in between
  5. explanation of the generated json, like 'virtual columns', which are completely undocumented
@gianm
Contributor

gianm commented Aug 5, 2018

Hi @pdeva, the answers to your questions are,

  1. A time filter looks like __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'. This syntax adheres to the SQL standard.
  2. If you group by a TIME_FLOOR function, that's like specifying granularity (this is mentioned in the docs under "Query execution" where it says we'll use timeseries if we can for this function).
  3. Not sure what you mean by this?
  4. Not sure what you mean by this?
  5. They are undocumented, but they won't be for long. http://druid.io/docs/latest/misc/math-expr.html might help you understand what they're doing.
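
For items 1 and 2, here is a minimal sketch of how the half-open time filter and a TIME_FLOOR grouping might be combined into a single hourly query. The helper name `hourly_count_sql`, the `wikipedia` datasource, and the `cnt` alias are assumptions for illustration; only `__time`, the `TIMESTAMP` literals, and `TIME_FLOOR` come from the answers above.

```python
def hourly_count_sql(datasource: str, start: str, end: str) -> str:
    """Build a Druid SQL string that counts rows per hour in [start, end).

    Hypothetical helper: values are interpolated directly, so this is only
    suitable for trusted input.
    """
    return (
        # TIME_FLOOR with an ISO 8601 period plays the role of "granularity".
        "SELECT TIME_FLOOR(__time, 'PT1H') AS \"hour\", COUNT(*) AS cnt\n"
        f'FROM "{datasource}"\n'
        # Half-open interval: start is inclusive, end is exclusive.
        f"WHERE __time >= TIMESTAMP '{start}' AND __time < TIMESTAMP '{end}'\n"
        "GROUP BY 1\n"
        "ORDER BY 1"
    )


print(hourly_count_sql("wikipedia", "2000-01-01 00:00:00", "2000-01-02 00:00:00"))
```

Changing `'PT1H'` to another ISO 8601 period such as `'P1D'` would change the bucket size.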

I agree an examples section would be nice; @jon-wei was working on some new tutorials for querying, maybe he can chime in with whether he had planned on adding SQL examples.

@jon-wei
Contributor

jon-wei commented Aug 5, 2018

> maybe he can chime in with whether he had planned on adding SQL examples.

I have one example query for "wikipedia top pages" in the new tutorials (the idea was more to show the tools/workflow than to be a tutorial on expressing queries in SQL). Do you have any suggestions for common queries that would be good examples?

@pdeva
Contributor Author

pdeva commented Aug 5, 2018

@gianm clarifying:

  3. show at least one query whose output is a time series rather than a single value. in all the existing slides and examples i could find, every query outputs a single value, vs something that would be shown on a graph as time series data.
  4. say a query for time series data between 1am and 10 am, grouped hourly, and assume there are no segments for the time period 10am-11am. will the output contain a result for 10-11 with a 'value' of 0, or will the 10-11 period simply be missing from the output? this is an important edge case to know about when showing data in a graph.

@gianm
Contributor

gianm commented Aug 5, 2018

(3) sounds like it's really the same question as (2), and TIME_FLOOR is the answer. (4) adheres to standard SQL: result rows are only emitted where there is actual data. So there would be no row for 10am–11am.
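
Since SQL results omit empty buckets, a client that draws a graph may want to fill the gaps itself. The following is a minimal sketch of that idea, not anything Druid provides; `fill_hourly_gaps` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta


def fill_hourly_gaps(rows, start, end, fill=0):
    """Return one (hour, value) pair per hour in [start, end).

    `rows` holds (hour, value) pairs for buckets that had data;
    hours with no row get `fill` instead.
    """
    by_hour = dict(rows)
    out = []
    t = start
    while t < end:
        out.append((t, by_hour.get(t, fill)))
        t += timedelta(hours=1)
    return out


# Hypothetical result set: the 02:00 bucket had no data, so no row came back for it.
rows = [(datetime(2000, 1, 1, 1), 7), (datetime(2000, 1, 1, 3), 4)]
filled = fill_hourly_gaps(rows, datetime(2000, 1, 1, 1), datetime(2000, 1, 1, 4))
print(filled)
```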

@gianm
Contributor

gianm commented Aug 5, 2018

> I have one example query for "wikipedia top pages" in the new tutorials (the idea was more to show the tools/workflow than to be a tutorial on expressing queries in SQL). Do you have any suggestions for common queries that would be good examples?

Maybe four queries: one that plans to a scan, one that plans to a timeseries, one that plans to a topN, and one that plans to a groupBy? And link to that from the SQL docs so people know where to go to find examples?
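
As a rough sketch of what those four examples might look like, the snippet below collects one candidate SQL string per native query type. The `wikipedia` datasource and the `page` and `user` columns are assumptions, and which native query each statement actually plans to depends on the Druid version and planner settings, so each one is wrapped in `EXPLAIN PLAN FOR` to check rather than taken on faith.

```python
# Candidate queries, keyed by the native query type each is *expected* to plan to.
# The expectations are assumptions to verify with EXPLAIN PLAN FOR, not guarantees.
EXAMPLES = {
    # No aggregation: plain row retrieval.
    "scan": 'SELECT __time, page FROM "wikipedia" LIMIT 10',
    # Grouping only on a time floor.
    "timeseries": (
        "SELECT TIME_FLOOR(__time, 'PT1H'), COUNT(*) "
        'FROM "wikipedia" GROUP BY 1'
    ),
    # One grouping column with ORDER BY on an aggregate and a LIMIT.
    "topN": (
        'SELECT page, COUNT(*) AS edits FROM "wikipedia" '
        "GROUP BY page ORDER BY edits DESC LIMIT 10"
    ),
    # Several grouping columns.
    "groupBy": (
        'SELECT page, "user", COUNT(*) AS edits FROM "wikipedia" '
        'GROUP BY page, "user"'
    ),
}


def explain(sql: str) -> str:
    """Prefix a statement so Druid returns its plan instead of its results."""
    return "EXPLAIN PLAN FOR " + sql


for name, sql in EXAMPLES.items():
    print(f"-- expected: {name}\n{explain(sql)}\n")
```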

@pdeva
Contributor Author

pdeva commented Aug 6, 2018

@gianm regarding 4: is the behavior of native queries (topn, timeseries, groupby) the same as sql for this case? that is, will they omit 10-11 from the results too?

@gianm
Contributor

gianm commented Aug 6, 2018

> @gianm regarding 4: is the behavior of native queries (topn, timeseries, groupby) the same as sql for this case? that is, will they omit 10-11 from the results too?

topN and groupBy will; for timeseries it depends on the skipEmptyBuckets setting.
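
To make the timeseries case concrete, here is a minimal sketch of a native timeseries query body with `skipEmptyBuckets` set in the query context. The helper name, the `wikipedia` datasource, the interval, and the `cnt` aggregator name are assumptions; as I understand the timeseries documentation, `true` omits empty buckets (matching the SQL behavior) while `false` zero-fills them.

```python
import json


def timeseries_query(datasource: str, interval: str, skip_empty_buckets: bool) -> dict:
    """Build a native Druid timeseries query body (hypothetical helper)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "cnt"}],
        # The setting mentioned above lives in the query context.
        "context": {"skipEmptyBuckets": skip_empty_buckets},
    }


query = timeseries_query(
    "wikipedia", "2000-01-01T01:00:00/2000-01-01T11:00:00", skip_empty_buckets=True
)
print(json.dumps(query, indent=2))
```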

@jon-wei
Contributor

jon-wei commented Aug 14, 2018

The new query tutorial has more SQL examples:

http://druid.io/docs/latest/tutorials/tutorial-query.html (0.12.2/0.12.3)
https://staging-druid.imply.io/docs/latest/tutorials/tutorial-query.html (master)

> explanation of the generated json, like 'virtual columns', which are completely undocumented

Virtual columns are now documented: http://druid.io/docs/latest/querying/virtual-columns.html

@vogievetsky
Contributor

The docs have been tremendously updated for this 0.16.0 release.

Furthermore, the web console will now WRITE QUERIES FOR YOU 🚀

I am considering this 'solved'. If you have some more specific ideas for how to improve the docs please do not hesitate to file a new issue ❤️

@WyattQi

WyattQi commented Apr 26, 2020

How do I convert Druid's __time to a datetime?
