-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DROP [STREAM|TABLE] should support termination of query started during creation. #2177
Comments
Duplicate of #1317? |
Related to #2112, which is more about a |
@big-andy-coates: Thanks for putting this together. At large, this make sense to me. Impact on dependencies?Could you please clarify the behavior for a stream's/table's dependencies:
Relation of this ticket to other ticketsAs you pointed out above, we also have the related ticket #2112 (~ My questions here:
I think your argument is that any "collateral damage" of this ticket (beyond terminating its own query from a CSAS/CTAS) should be limited to what I listed as (1) further above in this comment, whereas #2112 is more about (2)? Regarding existing RDBMS: What's their behavior for Syntax feedbackRegarding the syntax: I'd prefer an approach based on For what it's worth, here's what some RDBMS have chosen to implement. PostgreSQL (docs): Uses
MySQL (docs): Considers
SQL Server: From what I understand it doesn't support the DROP syntax directly. Instead, it says:
Note: There's also a related option, which is the |
I thought I'd made that pretty clear in my description - apparently not! In answer to Q1:
In answer to Q2:
Regarding Cascade drop:
Yeah,
I guess the closest thing an RDBMS will have is the cascade dropping of views built of a table. Which I guess is similar to having a hierarchy of streams and tables. So, like KSQL, they'll allow users to drop the leaves of the dependency graph, (.i.e. those things that have nothing reading from them), and will use a cascade drop to delete everything downstream of the entity being dropped. This is the same as (2). RDMBS's obviously don't have long running queries like KSQL, so (1) doesn't apply. Because (1) is all about queries, not tables/streams.
I read that differently: it is supported, but you need to start at the leaves first. i.e. if you have table b with a foreign key to table a, then you need to drop table b before you can drop table a. This is just referential integrity of the dbs metadata. |
Just having a chat with @hjafarpour about this and it struck me that this solution doesn't help when it comes to Affectively, what most people want to do in a script is drop and recreate on each statement. The above discussion focuses on drop and recreate of CSAS/CSAT statements. But won't work for So how does a user terminate and restart an insert into query? INSERT INTO D SELECT a, b, func(c) FROM S; ???
Maybe what we need is a way for users to query KSQL for query ids! SELECT QueryId FROM RUNNING_QUERIES WHERE SOURCE = 'S' AND DESTINATION = 'D' AND SQL CONTAINS 'Insert Into'; Then we can add the ability to select into a variable and/or subqueries: TERMINATE ALL (SELECT QueryId FROM RUNNING_QUERIES WHERE SOURCE = 'S' AND DESTINATION = 'D' AND SQL CONTAINS 'Insert Into'); Thoughts? |
Thanks for the follow-up, Andy.
Yeah. Handling And that's why then, in this ticket:
Oh, yes, of course you can do the manual work of first |
I think this is a valid concern, but: This ticket is about dropping tables/streams (CTAS/CSAS), and that we are making the case that the user should optionally be able to force the termination of any INSERT INTO queries that are writing into the table/stream. Your new concern above is about how users could stop/restart (drop and recreate) an INSERT INTO statement in isolation, i.e. without touching the target stream or table. Right? |
I'm just realising that any solution we come up with for CSAS / CTAS should also be generic enough to work with The only solution that I can think of right now that can handle CSAS and CTAS and INSERT INTO is some queryable metadata that allows users to get at the queryids. @hjafarpour @rodesai @dguy @apurvam @vcrfxia @rmoff can you think of any other statements we need to worry about? What's your opinion? |
Late to the convo but here're my thoughts:
I think it would be better to retain
This would still leave the stream object and make it a two-command process. I can't see a scenario where you'd want to do this, and not just the whole thing in one go
We should still support an option to drop a stream and all child dependencies. This would happen when redeploying a pipeline. So maybe we want Thinking about it, what's the reason not to automagically clean up queries if a drop is requested against a stream? I mean, should we default to "FORCE DROP" (i.e. drop a stream also terminates dependent queries)? Keep the simplest syntax for the most common operation, and then introduce a |
I think we should default to automatically terminating the implicit query created when the stream was created, when it is later dropped. However, if you're suggesting we should also terminate downstream queries, then I'd disagree - the reason we added referential integrity in the first place was to stop users dropping a query a down stream query relies on. |
What about the following flow then:
The one open question would then be how to handle any INSERT INTO statements for both (1) and (2), notably because for (2) the downstream streams/tables might have their own INSERT INTOs. |
It seems we have the following things to worry about when dropping a stream or table:
If any of the above exist then, at the moment, the So maybe it's enough to just have |
As I mentioned in my previous comment, I think Now, regarding UX, it's obvious that a How is this handled in RBDMS that support CASCADE? Is there any such safety measure? |
Oracle, for example, has a recycle bin concept.
I don't think we should be too nanny-state about this. DROP CASCADE should
do just that. Maybe we could have a dry run flag available for people who
are scared? ;) But mandating it feels like not such a nice workflow.
…On Mon, 10 Dec 2018 at 15:54, Michael G. Noll ***@***.***> wrote:
@big-andy-coates <https://github.com/big-andy-coates> :
So maybe it's enough to just have CASCADE handle both 2 & 3?
As I mentioned in my previous comment, I think CASCADE should certainly
cover (3), and my question was whether it should also cover (2).
Personally, I think CASCADE should cover (2) as well, and it should also
cover the recursive scenario where any downstream streams/tables have their
own INSERT INTOs.
Now, regarding UX, it's obvious that a DROP CASCADE is rather impactful
and potentially dangerous -- a human mistake can be costly. Should we
provide any functionality like a dry-run option to explain to the user what
exactly would happen if the DROP CASCADE would be carried out? Think:
"First, do a dry-run DROP CASCADE. If the results look good to you, you can
run the real DROP CASCADE."
How is this handled in RBDMS that support CASCADE? Is there any such
safety measure?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2177 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADgGHgKgDylDv9HK09X9aEyHWukZNJAxks5u3nWxgaJpZM4Ys8S_>
.
|
That's what I meant. A normal IMHO, the minimum work would be DROP CASCADE without such a dry-run option, and the dry-run option would be the extra. |
+1
…On Tue, 11 Dec 2018 at 08:41, Michael G. Noll ***@***.***> wrote:
Maybe we could have a dry run flag available for people who are scared? ;)
That's what I meant. A normal DROP CASCADE would "just do it". But
cautious people might want to have the option to do a dry run first (via an
optional flag).
IMHO, the minimum work would be DROP CASCADE without such a dry-run
option, and the dry-run option would be the extra.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2177 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADgGHmOPq9jnP_o_S9QKRJEe4Z2y805Yks5u32GlgaJpZM4Ys8S_>
.
|
Sounds sensible. What would the dry run syntax look like? DROP STREAM x CASCADE DRY RUN ??? |
|
#5987 looks to address this issue partially by terminating only the persistent query that belongs to the CAS statement. If other queries (inserts, cas, transient) are reading the stream/table then DROP will fail until any reference is terminated/dropped manually. Marking #5987 as duplicate. I will use this issue to track the work as seems it is most popular feature to implement. |
Why not add another optional |
@mingfang That would be useful too, but DROP STREAM is more natural and SQL standard than adding the The query running in the background for such stream/table is only used by the stream. A user shouldn't have to interact with the query. That's the idea of terminating the query automatically so users interact with the STREAM only (no queries). |
I know I'm pretty late to the party, have no clues about the internals, not much knowledge about ksql itself, but an outsider's opinion :). I've read almost all tickets & PRs about TERMINATE ALL, DROP ... CASCADE, INSERT vs. UNION, and I have a personal issue with DROP S/T while a query is still running, which brought me here in the first place. So please, bear with me if I miss something. IMHO you should take a step back and look from a higher level: The persistent query is not a first-class citizen in the KSQL language like streams or tables. I think this is the main reason why you have all these issues. We have CREATE STREAM ... Imagine the possibilities: CREATE QUERY name AS SELECT To me, an e.g. CREATE STREAM AS SELECT is nice as it consolidates 2 steps in one (create stream, create query) but I wouldn't bother so much about writing those two statements if I'd have to. Btw. I personally think that DROP ... CASCADE is a bad style (also in SQL) and a dinosaur. If you want to drop entire functional areas in a production database you would do this rather carefully and step by step. If you want to do this in your dev database you start it from scratch and apply your migrations. |
If I create a stream that starts a query in the background, e.g.
Then I should be able to drop this stream without needing to find the query Id and terminate the query. e.g. I should be able to do:
This currently complains, due to inter-stream referential integrity checking. (The query is seen as a source to stream 'S', so 'S' can't be dropped while the query is running).
The issue here is that the query id is non-deterministic, being based on an internal counter. So its not possible to submit a script to the rest api that drops and recreates a stream, as you can't add suitable
TERMINATE
call.The complexity comes about when we have
INSERT INTO S
queries running into the same stream/table. I think it is OK/correct in such a situation that theDROP STREAM S
to complain. However, we should have a way of overriding this so that even that query is stopped.I would propose that
DROP STREAM S
should automatically stop any query that was started as part of the originalCREATE STREAM S AS
statement. But the presence of other queries that insert into the stream should cause it to fail, unless the user requests us to stop them, e.g. we could support something like:-- use if you want any other source queries to automatically be stopped FORCE DROP STREAM S;
The
DROP
command will also fail if this stream is used as a source for other queries. This should continue to cause the DROP to fail.I guess an alternative would be some syntax to allow your to terminate a queries based on the stream/table they output too, e.g.
-- terminate all queries that output to stream S. TERMINATE ALL FOR STREAM S;
This approach differs from having a straight
DROP STREAM S
automatically stopping the query started by its CSAS statement in that with the aboveTERMINATE FOR STREAM
syntax the user can't choose to drop only if the only query running is the one auto created during the CSAS statement.I prefer the implicit terminate approach because it differentiates between those queries created during the CSAS/CTAS and those that weren't. So you can do this:
Straight create / drop example:
With insert into example:
With downstream dependencies example:
Ephemeral queries
Ephemeral / temporary queries, (e.g. from a user issuing a
SELECT * FROM BLAH;
statement in the CLI), should not be included when determining if an entity can be dropped, i.e. an ephemeral query should not stop an entity being dropped.To do so would be the same as not allowing a DB Admin of a RDBMS system to not be able to drop a table if any system was currently selecting from it.
Such ephemeral queries should terminate and, ideally, report a suitable message to the user.
The text was updated successfully, but these errors were encountered: