Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changefeedccl: don't allow multiple changefeeds with the same sink #30439

Closed
rolandcrosby opened this issue Sep 19, 2018 · 3 comments
Closed
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc

Comments

@rolandcrosby
Copy link

rolandcrosby commented Sep 19, 2018

Right now, you can create multiple changefeeds that output data to the same Kafka topic, as follows.

SQL session:

root@:26257/defaultdb> create table foo (bar text);
CREATE TABLE

Time: 13.470194ms

root@:26257/defaultdb> create changefeed for table foo into 'kafka://localhost:9092';
-[ RECORD 1 ]
job_id | 384437575409795073

Time: 27.383393ms

root@:26257/defaultdb> create changefeed for table foo into 'kafka://localhost:9092';
-[ RECORD 1 ]
job_id | 384437585173282817

Time: 31.02842ms

root@:26257/defaultdb> create changefeed for table foo into 'kafka://localhost:9092';
-[ RECORD 1 ]
job_id | 384437588237549569

Time: 23.118139ms

root@:26257/defaultdb> create changefeed for table foo into 'kafka://localhost:9092';
-[ RECORD 1 ]
job_id | 384437590001811457

Time: 27.917453ms

root@:26257/defaultdb> create changefeed for table foo into 'kafka://localhost:9092';
-[ RECORD 1 ]
job_id | 384437591439802369

Time: 23.085301ms

root@:26257/defaultdb> insert into foo values ('test');
INSERT 1

Time: 12.081446ms

root@:26257/defaultdb>

CockroachDB will happily send all changes to Kafka through all of the changefeeds at once:

roland@gceworker-roland:~/confluent-5.0.0$ ./bin/confluent consume foo --property print.key=true
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html

[384437625013927937]	{"bar": "test", "rowid": 384437625013927937}
[384437625013927937]	{"bar": "test", "rowid": 384437625013927937}
[384437625013927937]	{"bar": "test", "rowid": 384437625013927937}
[384437625013927937]	{"bar": "test", "rowid": 384437625013927937}
[384437625013927937]	{"bar": "test", "rowid": 384437625013927937}

While I suppose this is allowable in the spirit of at-least-once messaging, I can't imagine a user would ever want to do this intentionally. At the time a changefeed is created, we should check to make sure that only one changefeed is being written to a given Kafka topic at a time.

Jira issue: CRDB-4832

@rolandcrosby rolandcrosby added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cdc Change Data Capture labels Sep 19, 2018
@rolandcrosby rolandcrosby changed the title cdc: don't allow multiple changefeeds with the same sink changefeedccl: don't allow multiple changefeeds with the same sink Sep 20, 2018
@danhhz danhhz added this to 2.1.x in [DEPRECATED] CDC Sep 27, 2018
@danhhz
Copy link
Contributor

danhhz commented Oct 10, 2018

Fixing this requires looking at the jobs entries for all non-terminal changefeeds and erroring if any of them have the same sink uri. Defining "same" here has some complications. The easy choice is exact string match, but the spirit of the issue is that we shouldn't be talking to the same cluster twice, and it's easy to do that without an exact string match (query params on the uri, selecting a different node as the bootstrap, load balancer, etc). Better would be to talk to the kafka cluster and resolve some sort of unique cluster id (does kafka have that)? This better solution would require some code per-sink.

Note that #28670 will also have to examine all non-terminal changefeed entries (to reject if there's one with the same name).

This is definitely a ux surprise and a footgun, but it seems unlikely enough to me that even the exact sink uri match fix doesn't seem high priority enough to address in 2.1. Moving to 2.2

@danhhz danhhz moved this from 2.1.x to 2.2 in [DEPRECATED] CDC Oct 10, 2018
@danhhz danhhz moved this from 2.2 to Later in [DEPRECATED] CDC Jan 28, 2019
@mwang1026 mwang1026 moved this from Later to UX improvements in [DEPRECATED] CDC May 29, 2020
@mileswilson
Copy link

but the spirit of the issue is that we shouldn't be talking to the same cluster twice

Why is this the case? Or will this be dependent on the topic_prefix?

@miretskiy
Copy link
Contributor

I think the complexity that would need to be added to do something like this is simply
not worth it. Not to mention the fact that it would do nothing when running
multiple clusters (as lots of customers do).

Closing this issue; but @amruss feel free to reopen if disagree.

[DEPRECATED] CDC automation moved this from UX improvements to Closed Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc
Projects
[DEPRECATED] CDC
  
Closed
Development

No branches or pull requests

5 participants