Tracking issue for dataflow framework #3187

Open
12 of 19 tasks
discord9 opened this issue Jan 18, 2024 · 2 comments

discord9 commented Jan 18, 2024

What problem does the new feature solve?

Being able to do simple continuous aggregation.

What does the feature do?

  • Not a complete stream-processing system. Only a minimal, essential subset of functionality is provided.
  • Can handle most aggregate operators within one table (e.g. sum, avg, min, max and comparison operators). Other features (join, trigger, txn, etc.) are out of scope.
    Framework
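
To make the scope above concrete, here is a minimal Python sketch of continuous aggregation over a single table: each incoming row updates running sum, avg, min and max values for its time window, so results stay current without rescanning older rows. The names `WindowAgg` and `ContinuousAggregator` and the fixed 60-second window are assumptions made for illustration; they are not taken from the RFC or from the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class WindowAgg:
    """Running aggregates for one time window (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Fold one new value into the running aggregates.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def avg(self) -> float:
        # avg is derived from sum and count, so it needs no state of its own.
        return self.total / self.count


class ContinuousAggregator:
    """Groups rows into fixed-size windows and aggregates them incrementally."""

    def __init__(self, window_secs: int = 60) -> None:
        self.window_secs = window_secs
        self.windows = {}  # window start (seconds) -> WindowAgg

    def insert(self, ts: int, value: float) -> None:
        # Align the timestamp to the start of its window.
        start = ts - ts % self.window_secs
        self.windows.setdefault(start, WindowAgg()).update(value)


agg = ContinuousAggregator(window_secs=60)
for ts, value in [(0, 1.0), (30, 3.0), (61, 10.0)]:
    agg.insert(ts, value)

for start, w in sorted(agg.windows.items()):
    print(start, w.total, w.avg, w.minimum, w.maximum)
```

Joins, triggers and transactions have no counterpart in this sketch, which mirrors the stated boundary of the feature.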

Implementation challenges

  • Persistent intermediate state for operators
  • Write more operators suitable for stream computation
  • Limit the scope of the system to simple aggregation.
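
The first challenge can be illustrated with a small sketch. An avg operator must keep its running sum and count between updates, and that state has to survive a restart. The example below stores the state in a JSON file, which is purely an assumption for illustration: the issue does not say how or where operator state is persisted, and `AvgState` is a hypothetical name.

```python
import json
import os
import tempfile


class AvgState:
    """Intermediate state of an avg operator (hypothetical sketch)."""

    def __init__(self, total: float = 0.0, count: int = 0) -> None:
        self.total = total
        self.count = count

    def update(self, value: float) -> None:
        self.total += value
        self.count += 1

    def result(self) -> float:
        return self.total / self.count

    def save(self, path: str) -> None:
        # Write to a temporary file first, then rename it over the target,
        # so a crash mid-write does not leave a half-written state file.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"total": self.total, "count": self.count}, f)
        os.replace(tmp_path, path)

    @classmethod
    def load(cls, path: str) -> "AvgState":
        with open(path) as f:
            data = json.load(f)
        return cls(total=data["total"], count=data["count"])


state_path = os.path.join(tempfile.mkdtemp(), "avg_state.json")

state = AvgState()
state.update(2.0)
state.update(4.0)
state.save(state_path)

# Simulate a restart: rebuild the operator from the persisted state
# and keep aggregating where it left off.
restored = AvgState.load(state_path)
restored.update(6.0)
print(restored.result())
```

Persisting the sum and count rather than the average itself is what lets the operator continue correctly after a reload.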

Implementation Progress

@discord9 discord9 added the Tracking issue A tracking issue for a feature. label Feb 26, 2024
@fengjiachun fengjiachun added this to the v0.8 milestone Feb 28, 2024
@killme2008 killme2008 pinned this issue Mar 5, 2024

tisonkun commented Mar 6, 2024

@discord9 I've now read the RFC and wonder what a complete example of this feature looks like.

I can see how to create a task (continuous query/materialized view):

CREATE TASK avg_over_5m WINDOW_SIZE = "5m" AS SELECT avg(value) FROM table WHERE time > now() - 5m GROUP BY time(1m)

Can we then use avg_over_5m as a normal table reference in queries?


discord9 commented Apr 8, 2024

Yes, and sorry for the late reply; GitHub's layout for issues is really terrible. This task also creates a result table avg_over_5m and writes to it with negligible delay, so naturally one can use avg_over_5m in a normal query.
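
As a rough illustration of that behavior, the sketch below uses Python's built-in `sqlite3` module as a stand-in: a hypothetical `run_task` function plays the role of the task and writes per-minute averages into a result table named `avg_over_5m`, which is then read with an ordinary SELECT. This only imitates the idea. The source table name `metrics`, its columns, and the full recomputation on every run are assumptions; the real task syntax and its incremental execution are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table (name and columns are assumptions for this sketch).
conn.execute("CREATE TABLE metrics (ts INTEGER, value REAL)")

# Result table that the task writes to, one row per one-minute window.
conn.execute(
    "CREATE TABLE avg_over_5m (window_start INTEGER PRIMARY KEY, avg_value REAL)"
)


def run_task(conn: sqlite3.Connection) -> None:
    """Stand-in for the task: refresh the per-minute averages.

    A real dataflow engine would update results incrementally; this
    sketch simply recomputes them and overwrites the existing rows.
    """
    conn.execute(
        "INSERT OR REPLACE INTO avg_over_5m (window_start, avg_value) "
        "SELECT ts - ts % 60, AVG(value) FROM metrics GROUP BY ts - ts % 60"
    )


conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [(0, 1.0), (30, 3.0), (61, 10.0)],
)
run_task(conn)

# The result table is queried like any other table.
rows = conn.execute(
    "SELECT window_start, avg_value FROM avg_over_5m ORDER BY window_start"
).fetchall()
print(rows)
```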
