Consider removing the concept of terminating a query #3967

Open
big-andy-coates opened this issue Nov 23, 2019 · 7 comments

@big-andy-coates
Contributor

With reference to the PR that introduced TERMINATE ALL syntax.

We might look to remove the concept of terminating queries altogether. We offer no way of restarting a terminated query, so why offer a way to terminate?

If we choose to introduce a way to restart a query, then terminating actually means something. Restarting would be useful for re-kicking a failed query. However, there are probably better ways of handling failed queries. After all, a traditional db does not expose the state of the processing used to build a materialized view.

Removing terminate would address issues such as:

@big-andy-coates big-andy-coates added this to the 6.0 milestone Nov 23, 2019
@big-andy-coates big-andy-coates added this to To Do (P0) in ksqlDB v1.0 via automation Nov 23, 2019
@agavra
Contributor

agavra commented Nov 25, 2019

I think terminates are useful in the case of a long-running INSERT INTO. You don't necessarily want to drop the source, but you do want the INSERT INTO to stop populating it.

@big-andy-coates big-andy-coates modified the milestones: 6.0, 0.7.0 Dec 2, 2019
@big-andy-coates
Contributor Author

Thinking on this more, I don't think we need to explicitly terminate a query. Our CREATE TABLE AS SELECT style statements are akin to materialized views (MVs) in the RDBMS world. An RDBMS exposes no concept of a persistent query to the user. Instead, when you drop an MV, any background 'process' that is updating it is automatically stopped.

The only thorn in our side is the INSERT INTO query which, as @almog points out, may still benefit from TERMINATE. However, there is an alternative... we remove INSERT INTO!

INSERT INTO is the black sheep of the family. It outputs a persistent query to an existing sink that's been created some other way. It was added to allow multiple queries to be started that all write to the same sink topic. However, I think this would better be represented using a SQL UNION, or more correctly a UNION ALL, e.g.

CREATE STREAM OUTPUT (...) WITH (...);
INSERT INTO OUTPUT SELECT * FROM SOURCE1 ...;
INSERT INTO OUTPUT SELECT * FROM SOURCE2 ...;

Becomes:

CREATE STREAM OUTPUT AS 
   SELECT * FROM SOURCE1 ...
   UNION ALL
   SELECT * FROM SOURCE2 ...
   ;

Which once again brings us to a one-to-one relationship between a persistent query and an MV. So now when the user drops OUTPUT we can stop the persistent query, and we no longer need TERMINATE.
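
To make the ownership model concrete, here is a minimal Python sketch of a catalog in which every materialized source owns exactly one persistent query, so dropping the source is the only "stop" operation needed. `Catalog`, `PersistentQuery`, `create_as` and `drop` are hypothetical names chosen for illustration; they are not ksqlDB APIs, and the real engine may well be structured differently.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PersistentQuery:
    """Hypothetical stand-in for the background process that fills a sink."""
    sql: str
    running: bool = True

    def stop(self) -> None:
        self.running = False


@dataclass
class Catalog:
    """Toy catalog: each source name maps to the single query that populates it."""
    queries: Dict[str, PersistentQuery] = field(default_factory=dict)

    def create_as(self, name: str, sql: str) -> PersistentQuery:
        # One-to-one: a source is created together with its only query.
        if name in self.queries:
            raise ValueError(f"source {name} already exists")
        query = PersistentQuery(sql)
        self.queries[name] = query
        return query

    def drop(self, name: str) -> None:
        # Dropping the source implicitly stops its query; no TERMINATE needed.
        self.queries.pop(name).stop()


catalog = Catalog()
q = catalog.create_as(
    "OUTPUT",
    "SELECT * FROM SOURCE1 UNION ALL SELECT * FROM SOURCE2",
)
catalog.drop("OUTPUT")
print(q.running)
```

With INSERT INTO in the picture, `queries` would have to map one source to many queries, and that is the case where an explicit per-query stop becomes necessary.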

@agavra agavra moved this from To Do (P0) to To Do in ksqlDB v1.0 Dec 3, 2019
@PeterLindner

An added benefit of UNION ALL would be deterministic output ordering with flow control.

In contrast, the record ordering with INSERT INTO depends on the starting point of the queries and the speed of the consumers.
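
The difference can be illustrated with a small Python sketch. It assumes that "flow control" means always consuming from whichever input has the lowest next timestamp, and it models INSERT INTO as independent writers whose records land in arrival order. The function names and sample records are hypothetical, and this is a toy model rather than a description of how ksqlDB or Kafka Streams actually schedules work.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[int, str]  # (event timestamp, payload)


def union_all(*sources: Iterable[Record]) -> List[Record]:
    """Merge timestamp-sorted sources into one timestamp-ordered output.

    heapq.merge always takes the record with the lowest next timestamp,
    which plays the role of flow control in this toy model.
    """
    return list(heapq.merge(*sources))


def insert_into(arrivals: Iterable[Tuple[str, Record]]) -> List[Record]:
    """Append records in whatever order the independent queries deliver them."""
    return [record for _source, record in arrivals]


source1 = [(1, "a1"), (4, "a2")]
source2 = [(2, "b1"), (3, "b2")]

merged = union_all(source1, source2)

# Two possible runs of independent INSERT INTO queries: in run_a the query on
# source1 gets ahead, in run_b the query on source2 does.
run_a = insert_into([("s1", r) for r in source1] + [("s2", r) for r in source2])
run_b = insert_into([("s2", r) for r in source2] + [("s1", r) for r in source1])
```

Here `merged` is the same on every run, while `run_a` and `run_b` contain the same records in different orders.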

@derekjn
Contributor

derekjn commented Dec 3, 2019

+1 from me on removing TERMINATE.

@agavra raises a good point about TERMINATE being useful for long-running INSERT statements, although I'm not sure that this is something that belongs in the syntax. This is more of an operational function, and I think those are best served by function calls on catalog data. For example, here's how you can kill a query in Postgres:

-- pg_stat_activity tracks queries that are currently running
SELECT pg_terminate_backend(pid) FROM pg_stat_activity;

@rodesai
Contributor

rodesai commented Dec 6, 2019

Another possible use case for terminate would be to leave a table around for pull queries (I know we can't query tables directly yet, but surely we plan to at some point), but terminate the queries that populate it (e.g. it's like a snapshot table).

@vcrfxia
Contributor

vcrfxia commented Dec 12, 2019

Do other streaming systems not support some form of INSERT INTO? I'm surprised since without it the graph of relationships between sources (streams/tables) must always be acyclic and I imagine there are use cases where having some sort of cyclic control flow makes sense. (Will have to think harder to come up with a concrete example, will report back if/when I do.)

@agavra
Contributor

agavra commented Dec 12, 2019

> Another possible use case for terminate would be to leave a table around for pull queries (I know we can't query tables directly yet, but surely we plan to at some point), but terminate the queries that populate it (e.g. it's like a snapshot table).

I was thinking about this and I think we can handle this with a query upgrade. We simply upgrade the source to have no query associated with it (but keep the DDL).
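
As a rough sketch of that idea in Python, assume a catalog entry holds its DDL and an optional populating query, and that an "upgrade" swaps the query while leaving the DDL alone. `Source`, `upgrade`, and the TOTALS/ORDERS names are all hypothetical, used only to show the shape of the operation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """Hypothetical catalog entry: DDL plus an optional populating query."""
    name: str
    ddl: str
    query_sql: Optional[str] = None


def upgrade(source: Source, new_query_sql: Optional[str]) -> Source:
    # Replace the populating query while leaving the DDL untouched.
    # Passing None detaches the query, leaving a snapshot-style table.
    return Source(source.name, source.ddl, new_query_sql)


live = Source("TOTALS", "CREATE TABLE TOTALS (...)", "SELECT ... FROM ORDERS ...")
snapshot = upgrade(live, None)
```

After the upgrade, `snapshot` keeps the name and DDL of `live` but has no query feeding it, which matches the snapshot-table use case without a separate TERMINATE statement.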

> I imagine there are use cases where having some sort of cyclic control flow makes sense

That's pretty trippy @vcrfxia - let me know if you come up with anything!
