WITH FILL improvement (PARTITION BY, POPULATE) #33203
Comments
AFAIS, in |
We may not have |
Not sure I understand where from. It doesn't eliminate the necessity of |
After some discussions, I think:

(0) Currently, a WITH FILL column is independent of the order of any other columns. It depends on the starting value, either in the column or

SELECT
    rowNumberInAllBlocks() + 1 AS row_number,
    *
FROM values('sensor_id UInt64, timestamp DateTime64(3, \'UTC\'), value Float64',
    (234, '2021-12-01 00:00:01', 1),
    (234, '2021-12-01 00:00:07', 2),
    (432, '2021-12-01 00:00:02', 0),
    (432, '2021-12-01 00:00:10', 3))
ORDER BY
    sensor_id ASC,
    timestamp ASC WITH FILL STEP 2
INTERPOLATE ( sensor_id AS sensor_id, value AS value )

Query id: 2aadaa59-44d2-4e43-9429-a903bbd0fd5f

┌─row_number─┬─sensor_id─┬───────────────timestamp─┬─value─┐
│          1 │       234 │ 2021-12-01 00:00:01.000 │     1 │
│          0 │       234 │ 2021-12-01 00:00:03.000 │     1 │
│          0 │       234 │ 2021-12-01 00:00:05.000 │     1 │
│          2 │       234 │ 2021-12-01 00:00:07.000 │     2 │
│          3 │       432 │ 2021-12-01 00:00:02.000 │     0 │
│          0 │       432 │ 2021-12-01 00:00:07.000 │     0 │
│          0 │       432 │ 2021-12-01 00:00:09.000 │     0 │
│          4 │       432 │ 2021-12-01 00:00:10.000 │     3 │
└────────────┴───────────┴─────────────────────────┴───────┘

(1) A somewhat special case of the current behavior, which is probably unexpected if we consider grouping by the sorting prefix.
In the example below, there is filling between the last row of sensor 234 and the first row of sensor 432 (the generated rows at 00:00:03 and 00:00:04):

SELECT
    rowNumberInAllBlocks() + 1 AS row_number,
    *
FROM values('sensor_id UInt64, timestamp DateTime64(3, \'UTC\'), value Float64',
    (234, '2021-12-01 00:00:00', 1),
    (234, '2021-12-01 00:00:02', 2),
    (432, '2021-12-01 00:00:05', 0),
    (432, '2021-12-01 00:00:07', 3))
ORDER BY
    sensor_id ASC,
    timestamp ASC WITH FILL
INTERPOLATE ( sensor_id AS sensor_id, value AS value )

┌─row_number─┬─sensor_id─┬───────────────timestamp─┬─value─┐
│          1 │       234 │ 2021-12-01 00:00:00.000 │     1 │
│          0 │       234 │ 2021-12-01 00:00:01.000 │     1 │
│          2 │       234 │ 2021-12-01 00:00:02.000 │     2 │
│          0 │       234 │ 2021-12-01 00:00:03.000 │     2 │
│          0 │       234 │ 2021-12-01 00:00:04.000 │     2 │
│          3 │       432 │ 2021-12-01 00:00:05.000 │     0 │
│          0 │       432 │ 2021-12-01 00:00:06.000 │     0 │
│          4 │       432 │ 2021-12-01 00:00:07.000 │     3 │
└────────────┴───────────┴─────────────────────────┴───────┘

@alexey-milovidov Could you add some thoughts? |
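To make the difference concrete, here is a small Python model of the gap-filling in this example. It is not ClickHouse code and not its actual algorithm, only a sketch; the function `fill` and its parameters are hypothetical. With `by_prefix=False` it reproduces the sensor_id, timestamp and value columns of the table above, including the rows generated between the two sensors. With `by_prefix=True` no filling happens between rows of different sensors, which is the "grouping by sorting prefix" behavior discussed here.

```python
from datetime import datetime, timedelta

# Rows from the example above as (sensor_id, timestamp, value),
# already sorted by sensor_id, timestamp.
ROWS = [
    (234, datetime(2021, 12, 1, 0, 0, 0), 1.0),
    (234, datetime(2021, 12, 1, 0, 0, 2), 2.0),
    (432, datetime(2021, 12, 1, 0, 0, 5), 0.0),
    (432, datetime(2021, 12, 1, 0, 0, 7), 3.0),
]


def fill(rows, step, by_prefix):
    """Return (sensor_id, timestamp, value, generated) tuples.

    Generated rows copy sensor_id and value from the previous row, mimicking
    INTERPOLATE (sensor_id AS sensor_id, value AS value).
    by_prefix=False: gaps are filled across sensor boundaries too.
    by_prefix=True: no filling between rows of different sensors.
    """
    out = []
    prev = None
    for sensor_id, ts, value in rows:
        same_group = prev is not None and prev[0] == sensor_id
        if prev is not None and (same_group or not by_prefix):
            t = prev[1] + step
            while t < ts:
                out.append((prev[0], t, prev[2], True))
                t += step
        out.append((sensor_id, ts, value, False))
        prev = (sensor_id, ts, value)
    return out


for row in fill(ROWS, timedelta(seconds=1), by_prefix=False):
    print(row)
```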
+1 for this feature |
+1 |
@alexey-milovidov Are you able to provide any thoughts on this? It would be EXTREMELY useful for our application; the lack of this feature is a pretty big blocker at the moment. |
@LukeHarrimanHQ Currently, we have agreed to implement this proposal from here as the default behavior (i.e. w/o …). Please let us know if it would suffice for your use case |
+1 for this feature. I'm currently solving a similar use case as follows, but it would be super nice to have this feature to make things more straightforward.
1. Create a utility table containing all hours for 30 years.
2. Populate the hours table.
3. Create a metrics table; this is our main data table.
4. Insert some test data.
5. Select multiple time series and zero-fill the missing values.
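The SQL for these steps is not preserved here. The idea (a calendar table cross-joined with the list of series, then left-joined with the metrics, using 0 where nothing matches) can be sketched in Python. `hours_between`, `zero_fill` and the sample data are hypothetical stand-ins for the hours and metrics tables, not the commenter's code.

```python
from datetime import datetime, timedelta


def hours_between(start, end):
    """Steps 1-2: the 'hours table', every hour from start to end inclusive."""
    out = []
    t = start
    while t <= end:
        out.append(t)
        t += timedelta(hours=1)
    return out


def zero_fill(metrics, series_ids, start, end):
    """Steps 3-5: cross join every series with every hour, left-join the
    metrics, and use 0 where a (series, hour) pair has no data."""
    lookup = {(sid, ts): v for sid, ts, v in metrics}
    return [
        (sid, h, lookup.get((sid, h), 0))
        for sid in series_ids
        for h in hours_between(start, end)
    ]


METRICS = [("a", datetime(2023, 1, 1, 0), 5), ("b", datetime(2023, 1, 1, 2), 7)]
for row in zero_fill(METRICS, ["a", "b"], datetime(2023, 1, 1, 0), datetime(2023, 1, 1, 2)):
    print(row)
```

The cost of this approach is that the grid grows with (number of series) × (number of hours), even where most cells end up as 0.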
|
Running into this. I had hoped to
SELECT * FROM VALUES('id String, ts Date',
('aaa', '2023-04-20'),
('aaa', '2023-04-25'),
('bbb', '2023-04-22'),
('bbb', '2023-04-25')
) ORDER BY id, ts WITH FILL
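The output hoped for here is `ts` filled separately within each `id`: `aaa` covering 2023-04-20 to 2023-04-25 and `bbb` covering 2023-04-22 to 2023-04-25, each filled only between its own first and last date. A small Python model of that desired output follows; it is a sketch of the semantics, not ClickHouse code, and `fill_per_id` is a hypothetical name.

```python
from datetime import date, timedelta
from itertools import groupby

ROWS = [
    ("aaa", date(2023, 4, 20)),
    ("aaa", date(2023, 4, 25)),
    ("bbb", date(2023, 4, 22)),
    ("bbb", date(2023, 4, 25)),
]


def fill_per_id(rows, step=timedelta(days=1)):
    """Fill gaps in ts independently within each id.

    rows must already be sorted by (id, ts), as ORDER BY id, ts would do.
    """
    out = []
    for key, group in groupby(rows, key=lambda r: r[0]):
        prev = None
        for _, ts in group:
            if prev is not None:
                t = prev + step
                while t < ts:  # generate the missing dates for this id only
                    out.append((key, t))
                    t += step
            out.append((key, ts))
            prev = ts
    return out


for key, ts in fill_per_id(ROWS):
    print(key, ts.isoformat())
```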
|
Just for transparency, some groundwork related to this issue has been done: |
Thanks @devcrafter! Can't wait. |
@devcrafter Thanks for your implementation. I have a scenario which is hard to achieve even with your fix. I need to forward-fill some metrics along a timeline across multiple categories. Let me elaborate with an example. I would like to get forward-filled output as below. With your fix, I can get the output below with fill step 1. However, in the real world the interval between adjacent rows can be huge, so filling with step 1 is not feasible, and no other step size can guarantee that the later time slots are filled for each category. |
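One way to state the requested behavior: instead of stepping by a fixed STEP, use every timestamp that appears anywhere in the data as a fill point, and at each such point emit a row for every category seen so far with its last known value. A Python sketch of that idea follows; `ffill_on_union` is a hypothetical name and the sample events are illustrative only.

```python
def ffill_on_union(events):
    """Forward-fill per category on the union of observed timestamps.

    events: iterable of (ts, category, value). Returns (ts, category, value)
    rows sorted by ts, then category; a category appears from its first event on.
    """
    by_time = {}
    for ts, cat, val in events:
        by_time.setdefault(ts, {})[cat] = val

    last = {}  # category -> most recent value
    out = []
    for ts in sorted(by_time):
        last.update(by_time[ts])
        for cat in sorted(last):
            out.append((ts, cat, last[cat]))
    return out


# A large gap (5 -> 1000) costs nothing: only observed timestamps are visited.
EVENTS = [(1, "A", 10), (5, "B", 3), (1000, "A", 12)]
for row in ffill_on_union(EVENTS):
    print(row)
```

The number of output rows depends on how many distinct timestamps occur, not on the length of the time range, which is what makes a non-fixed step attractive for sparse data.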
This filling is inconsistent: the time values between the first and second rows have step 2, and the next steps are 3. With step 3 you can get the following: |
@yakov-olkhovskiy A real-world example is stock trading. I can open a position in a stock at any time and later reduce, increase, or close it; I want to get the total position across all stocks along a timeline. |
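Assuming each event records a stock's position size after the trade (0 meaning closed), which is a hypothetical layout, the total at any point in time is the sum of each stock's most recent size, i.e. a forward fill per stock followed by a sum. A Python sketch (`total_position_timeline` and the sample events are illustrative only):

```python
def total_position_timeline(events):
    """events: (ts, stock, position) with position = size after the trade.

    Returns one (ts, total) pair per distinct event time, where total is the
    sum of every stock's latest known position at that time.
    """
    current = {}  # stock -> latest position (carried forward)
    timeline = []
    for ts, stock, position in sorted(events):
        current[stock] = position
        total = sum(current.values())
        if timeline and timeline[-1][0] == ts:
            timeline[-1] = (ts, total)  # several trades at the same time
        else:
            timeline.append((ts, total))
    return timeline


EVENTS = [(1, "AAPL", 100), (3, "MSFT", 50), (7, "AAPL", 40), (7, "MSFT", 0), (9, "GOOG", 10)]
print(total_position_timeline(EVENTS))
```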
@canopenerda I don't quite understand what you want to achieve with |
@yakov-olkhovskiy I just described a forward-fill scenario which is not possible to achieve in ClickHouse. It would be possible if WITH FILL could interpolate along the values present in the filled column instead of interpolating in lockstep. |
@canopenerda we do have an interpolate feature for columns other than the filled ones (see |
I guess this example would be more illustrative:
It works, and for each point in time it shows the positions that are actually open at that time, so you need a WINDOW here. But I failed to find a way to do it without a Cartesian product built from an ugly combination of UNIONs. |
@yakov-olkhovskiy I'm aware of the INTERPOLATE expression; the problem is that WITH FILL only supports a fixed STEP, which would not work in the stock scenario I described. By the way, I'm not saying there is something wrong with the current implementation of WITH FILL and INTERPOLATE; I'm just facing a scenario where WITH FILL is the closest I can get within ClickHouse. If you know of any other workaround, I'd really appreciate it. |
@weres-sa Thanks for your solution. It works if the number of stocks is relatively small and stable. |
@canopenerda we position ClickHouse as well suited for time series, and we are always open to suggestions for improvements, so if you have any, please step forward. The idea of a non-fixed STEP sounds quite reasonable. |
Why not just?
|
Use case
Improve the usability of WITH FILL.
Allow using it with multiple time series.
Backfill some columns with the previous value instead of the default.
Describe the solution you'd like
Current behavior:
It's a bit unexpected that WITH FILL produces a different result set depending on the sorting condition.
PARTITION BY expr
POPULATE column_name
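The proposed semantics can be modelled in Python. This is a sketch, not ClickHouse code; the PARTITION BY / POPULATE syntax above is the proposal, and `fill_partitioned` is a hypothetical name. Each partition is filled on its own, POPULATE columns copy the previous row's value, and other columns get a placeholder default (None here, standing in for the column's default value). On the data of the first example with STEP 2, sensor 432 is filled at 00:00:04, 06 and 08 from its own first row, instead of the 00:00:07 and 00:00:09 rows in the current output shown earlier.

```python
from datetime import datetime, timedelta
from itertools import groupby


def fill_partitioned(rows, partition_by, fill_col, step, populate):
    """Model of: ORDER BY fill_col WITH FILL STEP step PARTITION BY partition_by
    POPULATE cols.

    rows: list of dicts sorted by (partition_by, fill_col). Gaps are filled
    independently in each partition; columns in `populate` copy the previous
    row's value, other columns are None as a stand-in for the column default.
    """
    out = []
    for part_value, part in groupby(rows, key=lambda r: r[partition_by]):
        prev = None
        for row in part:
            if prev is not None:
                t = prev[fill_col] + step
                while t < row[fill_col]:
                    filled = {col: None for col in row}
                    filled[partition_by] = part_value
                    filled[fill_col] = t
                    for col in populate:
                        filled[col] = prev[col]
                    out.append(filled)
                    t += step
            out.append(row)
            prev = row
    return out


ROWS = [
    {"sensor_id": 234, "timestamp": datetime(2021, 12, 1, 0, 0, 1), "value": 1.0},
    {"sensor_id": 234, "timestamp": datetime(2021, 12, 1, 0, 0, 7), "value": 2.0},
    {"sensor_id": 432, "timestamp": datetime(2021, 12, 1, 0, 0, 2), "value": 0.0},
    {"sensor_id": 432, "timestamp": datetime(2021, 12, 1, 0, 0, 10), "value": 3.0},
]

for r in fill_partitioned(ROWS, "sensor_id", "timestamp", timedelta(seconds=2), ["value"]):
    print(r["sensor_id"], r["timestamp"].time(), r["value"])
```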
POPULATE was implemented as the INTERPOLATE clause in PR #35349.