Cannot create aggregate without a GROUP BY #430
Workaround:
Step 1: Create a dummy column:
CREATE STREAM TWITTER2 AS SELECT 1 AS FOO, * FROM TWITTER;
Step 2: Group by the dummy column:
SELECT FOO, COUNT(*) FROM TWITTER2 WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY FOO;
Optionally, instantiate as a table and view the window / aggregate:
CREATE TABLE TWEETS_PER_HOUR AS SELECT FOO, COUNT(*) AS TWEET_COUNT FROM TWITTER2 WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY FOO;
SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS') AS WINDOW_START, TWEET_COUNT FROM TWEETS_PER_HOUR;
2017-11-01 15:00:00.000 | 165
2017-11-01 16:00:00.000 | 72
...
My suggestion:
I'd strongly +1 adding workaround info to the error, if this can't be implemented quickly. Seasoned SQL hackers will be used to 'tricks' like this, but most others won't (and will totally expect KSQL to support this, and be disappointed if it doesn't).
One of the issues is that we are dealing with partitioned data, so to make this work we'd need to take the data from all partitions and pipe it into a single topic partition to perform the count and similar operations. For some operations, i.e,
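To make the partitioning point concrete, here is a minimal Python sketch (not KSQL or Kafka Streams code; the partition contents and function names are made up for illustration). Each partition can count its own records independently, but a global COUNT(*) only exists once all records meet in one group, which is what re-keying onto a single constant key (the dummy column in the workaround) achieves.

```python
from collections import Counter

# Hypothetical data: records spread over three partitions of one topic.
partitions = {
    0: ["tweet-a", "tweet-b"],
    1: ["tweet-c"],
    2: ["tweet-d", "tweet-e", "tweet-f"],
}


def partial_counts(parts):
    """Count records per partition; no single partition sees the whole stream."""
    return {p: len(records) for p, records in parts.items()}


def global_count(parts, key=1):
    """Re-key every record to one constant key so all records fall into one group."""
    grouped = Counter()
    for records in parts.values():
        for _ in records:
            grouped[key] += 1
    return grouped[key]


per_partition = partial_counts(partitions)
total = global_count(partitions)
print(per_partition, total)
```

The trade-off this illustrates is that the final aggregation step runs on a single key, so it cannot be spread across partitions.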
After talking to @dguy offline, here's my understanding:
Given that we are busy with other work right now, I'd suggest:
WDYT @rmoff?
+1 Sounds like a good approach to me (better error message immediately with workaround, longer term provide the functionality OOTB).
@miguno @hjafarpour If a verbose error message with the workaround is not an option, then I would advocate just giving a URL to this issue, so that people can find the workaround (and track progress on implementing the feature) for themselves. So instead of:
@dguy what's the issue with calculating average? Can not each partition track a I'm assuming this would require enhancements to KStreams, but I can't see any reason why it wouldn't work (except issues with numerical overflows, of course).
It doesn't need anything on the streams side; it is just another object you write to a topic that gets consumed and can be used to aggregate. It just isn't currently supported in KSQL.
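As an illustration of "another object you write to a topic", here is a small Python sketch of a mergeable partial aggregate for an average. The thread does not spell out what the object contains; the (sum, count) pair, the `PartialAvg` name, and the sample values are all assumptions made for this example, and this is not KSQL or Kafka Streams code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAvg:
    """Hypothetical per-partition partial aggregate: a running sum and count."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        # Fold one more value into this partition's partial result.
        return PartialAvg(self.total + value, self.count + 1)

    def merge(self, other):
        # Combine two partials; sums and counts add, averages do not.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def result(self):
        # None when no values were seen, to avoid dividing by zero.
        return self.total / self.count if self.count else None


def partial_for(values):
    acc = PartialAvg()
    for v in values:
        acc = acc.add(v)
    return acc


# Made-up values held by three partitions (one of them empty).
partition_values = [[1.0, 2.0, 3.0], [10.0], []]
partials = [partial_for(v) for v in partition_values]

merged = PartialAvg()
for p in partials:
    merged = merged.merge(p)
average = merged.result()
print(merged, average)
```

Merging the partials rather than averaging the per-partition averages matters here: the non-empty partitions average 2.0 and 10.0, whose mean is 6.0, while the true average over all four values is 4.0.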
Was wondering if there's any update on this issue after half a year?
Will there be any update on this issue? I think it will be really useful to have this soon in KSQL.
Hey @rmoff, just checking in on this. Is the workaround still the solution to the problem? In that case, I can just send in a dummy constant value in my original topic and use that for group by in aggregates.
/cc @MichaelDrogalis @derekjn ☝️
We don't have this one queued up yet, but hope to soon. Patches always welcome in the meantime. 🙏
Hi, I wanted to create a count or sum aggregation without GROUP BY and I found this workaround. Is it possible to receive a default initial value at the beginning of each time window? Currently it is not possible to handle the absence of events. If there was no tweet within an hour (see @rmoff's example above), the subscriber will not be notified about the lack of activity and won't be able to reset the aggregate to 0 in a UI text block. Example: In this case the UI would show the previous 22 message, because it is not aware of the start of the current or end of the previous time window. EDIT: Thank you Tomas
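One possible way to handle this on the consumer side, sketched in Python under stated assumptions: the subscriber knows the window size and tracks the time range it cares about, so it can fill in a count of 0 for any window that produced no events. The function names and timestamps are hypothetical, and this does not describe anything KSQL emits by itself.

```python
from collections import Counter

HOUR_MS = 60 * 60 * 1000  # tumbling window size, matching SIZE 1 HOUR above


def window_start(ts_ms, size_ms=HOUR_MS):
    """Align a timestamp (epoch millis) to the start of its tumbling window."""
    return ts_ms - (ts_ms % size_ms)


def counts_with_gaps(timestamps_ms, first_ms, last_ms, size_ms=HOUR_MS):
    """Return (window_start, count) for every window in range, 0 for empty ones."""
    counts = Counter(window_start(ts, size_ms) for ts in timestamps_ms)
    start = window_start(first_ms, size_ms)
    end = window_start(last_ms, size_ms)
    return [(w, counts.get(w, 0)) for w in range(start, end + 1, size_ms)]


# Made-up events: three in the first hour, none in the second, one in the third.
events = [0, 10, 20, 2 * HOUR_MS + 5]
result = counts_with_gaps(events, first_ms=0, last_ms=2 * HOUR_MS + 5)
print(result)
```

With this approach a UI could reset its display to 0 when the clock crosses a window boundary, instead of waiting for an update that never arrives.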
Hello @rmoff / @mjsax, what if I need to select a column but do not want to have it as part of GROUP BY? Example:
KSQL insists on having a GROUP BY column, even if logically it makes sense without. For example: how many tweets were there in each hourly period?