Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement: Add toStartOfInterval() function #1201

Closed
AlexMAS opened this issue Sep 5, 2017 · 5 comments
Closed

Improvement: Add toStartOfInterval() function #1201

AlexMAS opened this issue Sep 5, 2017 · 5 comments
Assignees

Comments

@AlexMAS
Copy link
Contributor

AlexMAS commented Sep 5, 2017

Hello!

ClickHouse has a good kind of functions like toStartOf*(time) which round down a time to the specified part (minute, hour, day and etc.). It's quite enough but it would be good if there were an universal function which could accept an interval (in seconds, for example).

toStartOfInterval(time, interval [, units])

where time - a DateTime column, interval - an integer time interval for rounding, units - an optional time unit (by default seconds, possible values: seconds, minutes, hours, days and etc.).

For example, the function toStartOfFiveMinute(time) could be represented as toStartOfInterval(time, 5, 'minutes'). And consequently we could group our data by user defined time interval.

I think it will be powerful and useful feature to analyze time-series data.

@AlexMAS
Copy link
Contributor Author

AlexMAS commented May 16, 2018

Hello again! :)

I would like to add one more example which illustrates what I want to see in the nearest future (I hope). TimeScale database offers time_bucket() function which works as described above. Using this function you can aggregate data for example for every 5 minutes:

SELECT
    time_bucket('5 minutes', time) AS period,
    avg(value) AS avgValue
FROM history
GROUP BY period

Finally, I expect behaviour like in the example below.

Table

create table history
(
    sensorId String,
    date Date default toDate(time),
    time DateTime,
    value Float64
)
ENGINE = MergeTree(date, (sensorId, date, time), 8192)

Request

SELECT
    toStartOfInterval(time, 5, 'minutes') AS period,
    avg(value) AS avgValue
FROM history
GROUP BY period
ORDER BY period DESC
LIMIT 10

Result

         period      | avgValue
---------------------+----------
 2018-05-01 12:00:00 |   72202
 2018-05-01 12:05:00 |  837725
 2018-05-01 12:10:00 |  412237
 2018-05-01 12:15:00 | 1173393
 2018-05-01 12:20:00 |   90104
 2018-05-01 12:25:00 |  784596
 2018-05-01 12:30:00 |  574134
 2018-05-01 12:35:00 |  960404
 2018-05-01 12:40:00 |  963104
 2018-05-01 12:45:00 |  160104

P.s. Now I need to do an aggregation for every 1, 5, 10, and etc. minutes and think about some workaround to do that. Maybe you have some ideas how to do it now? Also I'm interesting is it real to implement described functionality? If 'yes' then when we can expect this?

@filimonov
Copy link
Contributor

It's very easy to workaround.

Use that as a template

toDateTime(intDiv(toUnixTimestamp(timestamp), 300) * 300)

replace 300 with your time interval (in seconds).

But that question is quite popular, may be the shortcut function should be added.

@sundy-li
Copy link
Contributor

Maybe the function like roundTime( time, '5m' ) , I think it's much better than toStartOfInterval(time, 5, 'minutes')

@AlexMAS
Copy link
Contributor Author

AlexMAS commented May 16, 2018

@sundy-li I don't mind. I just used the existing 'naming convention'.

@vitlibar
Copy link
Member

Implemented syntax toStartOfInterval(time_or_data, INTERVAL x unit [, time_zone]) (see #4304 (comment) for examples).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants