Limits of aggregate functions when used with windows #2614
Comments
Maybe @hawkfish can have a look?
There is a limit of 1024 (the standard vector size) right now for windowed aggregation that does not use segment trees. We can remove this (and I've been meaning to!), but it would require some work to avoid excess copying.
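As background for the segment-tree path mentioned above, here is an illustrative sketch (not DuckDB's implementation) of why a segment tree helps: it answers an arbitrary [begin, end) frame in O(log n) combine steps instead of re-scanning every row of the frame, so the frame size never has to be capped by a buffer. Sum stands in for any associative aggregate.

```python
class SegmentTree:
    """Iterative segment tree over a fixed list, combining with +.
    Any associative aggregate could be plugged in the same way."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = values               # leaves hold the raw values
        for i in range(self.n - 1, 0, -1):        # build internal nodes bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Aggregate over the half-open frame [lo, hi)."""
        result = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:            # lo is a right child: take it and step past
                result += self.tree[lo]
                lo += 1
            if hi & 1:            # hi is a right child: take hi - 1
                hi -= 1
                result += self.tree[hi]
            lo //= 2
            hi //= 2
        return result

values = [3, 1, 4, 1, 5, 9, 2, 6]
tree = SegmentTree(values)
print(tree.query(2, 6))  # sum over frame rows 2..5 -> 19
```

Each frame query touches at most two nodes per tree level, which is what makes moving frames over large partitions cheap.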
@the42 I think I have a way to do this, but it would help if I could have a rough idea of the data. Nothing important, but just the statistics of the
I've made a change that removes this limitation for non-segment-tree aggregation, and it seems to improve performance in the existing benchmarks for fixed frames. Variable frames seem about the same.
@hawkfish that sounds great. Will this change be generally available, possibly in the next public release? Since your performance tests don't show any adverse side effects, I guess you no longer need stats about the data I want to analyze.
I expect it will be in the next release. I have generated some data that I think looks like yours, but I don't think it matters much. It looks like you have about 15 minutes worth of timestamp data and want the moving median of a column with small integer values? |
select timestamp
, median(laufzeit) over w
, count(*) over w
from flog
window w as (order by timestamp asc range between interval 55 seconds preceding and interval 58 seconds following)
order by 3 desc
limit 1;
----
2020-10-15 16:53:22 51.000000 3403
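The RANGE frame in the query above can be read as: for each row, the frame holds every row whose timestamp lies within [t - 55 s, t + 58 s]. A hedged sketch of the same computation in plain Python (the data is made up; only the column names mirror the query):

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from statistics import median

# (timestamp, laufzeit) -- hypothetical sample, sorted by timestamp
rows = [
    (datetime(2020, 10, 15, 16, 53, 0) + timedelta(seconds=10 * i), v)
    for i, v in enumerate([40, 55, 51, 48, 60, 47, 52, 49, 45, 70])
]
timestamps = [t for t, _ in rows]

for t, _ in rows:
    # frame bounds: 55 seconds preceding to 58 seconds following, by value
    lo = bisect_left(timestamps, t - timedelta(seconds=55))
    hi = bisect_right(timestamps, t + timedelta(seconds=58))
    frame = [v for _, v in rows[lo:hi]]
    print(t, median(frame), len(frame))
```

Note that the frame is defined by timestamp values, not row counts, which is why the count(*) over w column in the query varies from row to row wherever the data has gaps.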
Make the bias argument explicit so the caller can optimise data usage.
Switch to using and appending chunks.
Add wide frame test and fix reference/copy transition.
Yeah, I somewhat abbreviated my report, as I originally did min(laufzeit), quantile_cont(laufzeit, [0.25, 0.5, 0.75]), max(laufzeit) to mimic a running boxplot. The window function should dampen server spikes that result in longer runtimes (runtime = duration to fulfil a request = laufzeit), yet a prolonged excessive load on the server needs inspection. I have yet to figure out which plotting library will give me a nice visualization: Vega, Shiny, R, .... BTW, the performance is fantastic! And my dataset spans an even longer period of time, yet with bigger gaps in the data, so the range between interval frame comes in handy.
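For reference, the per-frame statistics behind the running boxplot described above (min, the 0.25/0.5/0.75 continuous quantiles, max) can be sketched with Python's standard library: statistics.quantiles with method="inclusive" uses the same linear interpolation between adjacent values as SQL's quantile_cont. The frame values below are made up.

```python
from statistics import quantiles

# hypothetical laufzeit values falling inside one window frame
frame = [41, 47, 48, 51, 53, 55, 60, 62]

# quartiles with linear interpolation, matching quantile_cont semantics
q1, q2, q3 = quantiles(frame, n=4, method="inclusive")
print(min(frame), q1, q2, q3, max(frame))  # the five boxplot numbers
```

Running this per frame yields the five numbers a boxplot needs for each point on the time axis.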
Move coverage test to coverage suite.
This should be fixed now, correct? Can this be closed?
Yes.
Using v0.3.1 88aa81c 64-bit on Windows 10.
I have a table like this;
it contains ~26,000 rows.
When I run
select median(laufzeit) from flog
the query succeeds. When I run
the query succeeds:
When I change the interval so that the number of rows within a frame would exceed 1000 (my guess), I get the following error:
Error: INTERNAL Error: INTERNAL Error Cannot compute window aggregation: bounds are to large
This errors for the median or quantile_cont aggregates, but works, e.g., for avg.
Issue
1000 seems like an arbitrary and low limit. Can this limit be raised?
And if I am doing something wrong, how can I calculate the running median of laufzeit (= duration)?