
DBZ-2514 #43

Closed · wants to merge 9 commits

Conversation

smiklosovic
Member

No description provided.

@gunnarmorling
Member

Hey @smiklosovic, thanks a lot, in particular for doing the suggested commit reorganization. Will try and take a look asap. On a very high level, I am surprised by the high ratio of added vs. removed lines, it seems to indicate that there's quite a lot of code which isn't shared between 3.x and 4.x? Maybe this will get clearer for me when actually diving into the code.

@bingqinzhou, @fuxiao224, if you have some time, can you take a look as well? I think the most interesting part for you will probably be: will it break things with Cassandra 3.x? IIUC, it does not; this change does propose a significant reorganization of the connector's code base though, with two separate connectors for Cassandra 3.x and 4.x, and a core module for the parts shared between them.

@smiklosovic
Member Author

smiklosovic commented Dec 1, 2021

The high ratio of changed lines is because almost everything from 3 went to core. It has nothing to do with 4; 4 is just pulling stuff from core after the extraction was done.

To reiterate: the common stuff for 3 and 4 has all moved to core, and 3 and 4 just depend on it. There is virtually nothing more to extract; core is Cassandra-version agnostic (minus the dependency on DatabaseDescriptor and some classes which need to be present in core in order to compile).

I think that the current 3 impl is just a drop-in replacement for what is there now.

FYI, as I moved to version 4 of the Cassandra driver, you need to use a different way to connect to a Cassandra node: the family of properties related to connecting to that node was removed, and everything is done via the configuration file mechanism in the Cassandra 4 driver (this is configurable via your properties mechanism too; you will find it if you take a closer look at the configuration).

https://docs.datastax.com/en/developer/java-driver/4.2/manual/core/configuration/

@bingqinzhou
Member

I'll take a look later this week or early next week. Currently I'm a bit concerned about 119 files being changed.
@smiklosovic Can you list the major changes you're trying to make in the description section, please? From my understanding, you are working on the ticket to enable the use of Java 11 on the Cassandra connector. But with a quick look at this PR, it seems to switch to a different version of cassandra-driver? I want to understand why this is related to the Java 11 migration. Thanks!

@smiklosovic
Member Author

smiklosovic commented Dec 2, 2021

Hi guys,

no offence, but regardless of the other changes this branch incorporates, once I started with the extraction I very soon realised that the previous code base was not well prepared for it. It was not "elastic" enough to just move the code here and there; I had to figure out how to make it happen, because things were simply wired in without any significant effort that would make such an extraction easier in the future. So yes, that is why there are a lot of changes, but they are truly mostly about imports and reorganising the code to make this happen.

By the way, the 119 changed files in the first commit are literally just about moving files from one directory to another; what's the deal there? All the real changes are in the second commit.

I have changed the driver version from 3 to 4 (details are in the ticket). There were also some problems with mappers when one wanted to reuse them across Cassandra versions 3 and 4 (this is very low level and I could go into it if you want).

I have summarised all the details in the ticket itself in a very elaborate manner, so honestly I do not see why I should repeat it all here. Everything is there, in detail.

Next, it seems that the code was not / is not prepared for better unit testing. One reason I have not yet provided the latest approach to how Cassandra 4 processes the logs is that I was basically unable to test it reliably with the current way the project / code is structured.

In my opinion, the outcome of this set of changes should be that we agree on the general approach I took; if we do, I will then focus more on the testing part and on implementing the Cassandra 4 way of handling things.

Yes, this is more about Cassandra 3 and Cassandra 4 than about supporting Java 11, but I do not see any problem with having these efforts put together. I can definitely investigate how to make the Cassandra 3 / 4 impls build / run with Java 11.

If you think the ticket does not fit the changes (or the other way around), why don't we create a new ticket for Cassandra 4 support and put these changes under it, if that is your concern?

Thanks

@smiklosovic smiklosovic force-pushed the main-modularised branch 2 times, most recently from 422ab4d to 8bbc8e6 on December 7, 2021 10:19
@bingqinzhou
Member

For people who use the latest version of this connector, will they be able to run this connector on Cassandra 3.X cluster or not?

@smiklosovic
Member Author

smiklosovic commented Dec 9, 2021

Hi @bingqinzhou ,

the cassandra-3 impl should be a drop-in replacement, but the one user-facing change is that it is now on Datastax Cassandra driver version 4.11.1. That means that the configuration of the connection is not done in the Debezium properties file but in a dedicated file the driver can understand. A new configuration property is introduced into the current configuration, pointing to the file path of the driver configuration where everything is put.

Example configuration; this is the application.conf file, and users are meant to put the path to this file into the Debezium configuration property cassandra.driver.config.file.

datastax-java-driver {
  basic {
    request.timeout = 20 seconds
    contact-points = [ "spark-master-1:9042" ]
    load-balancing-policy {
      local-datacenter = "dc1"
    }
  }
  advanced {
    auth-provider {
      class = PlainTextAuthProvider
      username = admin
      password = admin
    }
    ssl-engine-factory {
      truststore-path = /submit/cassandra/ssl/server-truststore.jks
      truststore-password = cassandra
      keystore-path = /submit/cassandra/ssl/spark-master-1-server-keystore.p12
      keystore-password = cassandra
      cipher-suites = [ "TLS_RSA_WITH_AES_256_CBC_SHA" ]
    }
  }
}

Reference: https://docs.datastax.com/en/developer/java-driver/4.2/manual/core/configuration/reference/
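For completeness, the Debezium side then only needs the property mentioned above pointing at that file (the path here is purely illustrative):

```
cassandra.driver.config.file=/etc/cassandra/application.conf
```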

@gunnarmorling
Member

Hey @bingqinzhou, do you think you folks could take a closer look at this after the holidays? Would be great if we could converge on the path forward here for the upcoming 1.9 release (Alpha1 due some time in January).

@bingqinzhou
Member

@smiklosovic

Can you link the ticket you mentioned which includes the full context about this PR?

I read your comments above but they don't answer the following questions clearly:

  1. Without this change, can this connector run on Cassandra 4.x nodes?
  2. With this change, can this connector still run on Cassandra 3.x nodes?

This PR includes a major refactor of the original code base, which I don't think is necessary at this moment. If I understand the situation correctly, the goal of this PR is to upgrade the Cassandra driver version for this connector. In that case, I only want to keep the code changes which are closely related to the version upgrade. Currently, many of us have been running this connector stably on PRD for a while. A major refactoring of the code base brings the risk of breaking existing functionality or debugging patterns for us. If you feel it's necessary to make such a major refactor, please create a separate ticket and PR for us to discuss later.

@smiklosovic
Member Author

smiklosovic commented Jan 31, 2022

Hi @bingqinzhou ,

I was on leave the whole of January.

  1. no
  2. yes, but with the caveat of configuring respective bits as I wrote in my previous reply.

https://issues.redhat.com/browse/DBZ-2514?focusedCommentId=19260916&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-19260916

No, the goal is not to update the Cassandra driver. You have that part wrong. The whole philosophy of how logs are processed changed in Cassandra 4, and this connector needs to accommodate that; the internals of Cassandra 4 are simply different from Cassandra 3, the existing code is just not applicable to Cassandra 4 anymore, and so we need to make the difference explicit by having two modules. More to it, in order to support a Cassandra 4 Debezium connector, one needs to use version 4 of the Cassandra driver. And since the internals of this plugin are so hard-wired, that basically forces us to use the version 4 driver for Cassandra 3 as a side effect.

I think we are moving in circles here; I have explained what I did with the refactoring and it is done like that. This change introduces a refactoring into cassandra-3 and cassandra-4 modules, with one core module that both implementations depend on. You cannot do it with one module covering both Cassandra versions. That is just not possible. So we need to have two modules for that.

If you do not want to proceed with this, feel free to reject it and we will implement it on our own. I have a feeling we are actually doing you a favor, and if no route is viable ... well, what should we do?

@ahmedjami
Contributor

Hi @ALL,
First, thanks a lot @smiklosovic for this PR.

Personally, I have started testing the new implementation from the upstream and hope it will be discussed and merged into the main project here soon.

Cassandra-4 was released a few months ago and a lot of users have already started to upgrade to this new version; for those who use Debezium, it's very important to have a new version that supports Cassandra-4.

Cheers

@smiklosovic
Member Author

smiklosovic commented Jan 31, 2022

BTW I don't understand why this project cannot just branch off the current main to support the original plugin if there is a concern about a "too big" change. What is wrong with taking care of 1.7.0 or whatever, while having a plugin of version 2.0.0 where this refactoring would take place? I just don't consider the argument "this is too big a change" relevant. That is what Git is for. It is not like people cannot go back to the older plugin and patch things if they find it necessary; nobody prevents people from checking out the current code and applying fixes. So if the older plugin works for you, let's just stick with that one, I guess? One simply does not have to use "the latest and the greatest".

@ahmedjami
Contributor

@gunnarmorling @bingqinzhou
What do you think about what @smiklosovic suggests?
IMHO, it would be interesting to have a first version that will be tested and improved by the community.

We, like you @bingqinzhou, have Debezium working well in production with Cassandra-3, but with this version of the connector we will not have the "chance" to plan an upgrade to Cassandra-4 soon...
Cheers,

@gunnarmorling
Member

Hey all, sorry for the silence on my end. I need to dive into this again, as it has been a while. I'll get back to this by the end of this week. From my PoV, having a) support for Cassandra 4.x is a top priority, b) not breaking the existing 3.x support is equally important, and c) the ability to build with Java 11 is also critical but a bit less important than a) and b). I'll reply with some more thoughts on possible branching/versioning strategies once I've looked more closely and hopefully remembered the details. What I can say is that I'd love to have support for Cassandra 4.x in Debezium 1.9 (i.e. the current minor version that's being worked on and which is due by end of Q1).

@bingqinzhou
Member

bingqinzhou commented Feb 3, 2022

Hi @smiklosovic, thanks for the explanation. I'm not able to test the change at this moment since we haven't upgraded to Cassandra 4.x and don't have a plan to yet. But if @ahmedjami has tested it, I'm good with merging this PR if it looks good to @gunnarmorling as well.

@smiklosovic
Member Author

I am cooperating closely with @ahmedjami to get it merge-worthy. I'll keep you guys informed. Thank you very much.

@gunnarmorling
Member

Hey @smiklosovic, so I'm trying to build this locally, but I'm getting several exceptions, indicating ZooKeeper is started more than once ("java.net.BindException: Address already in use"). The CI build here is failing too (see above). Could you rebase this PR to the current main branch, resolving the merge conflict in the POM too? I'll comment in a sec on some things in the POM which will be needed to make this work with the latest main core (1.9.0-SNAPSHOT). Thanks!

@gunnarmorling left a comment (Member)

@smiklosovic, some minor comments inline for now, just to make that thing build with the current version of Debezium core:

  • Use Debezium 1.9.0-SNAPSHOT
  • Use Logback for testing instead of log4j 1.2.x

You can take a logback config from the connectors in the debezium-core repo as a template for changing the log config. Thx!

Inline review comments (outdated, resolved): cassandra-3/pom.xml (3 comments), pom.xml
@smiklosovic
Member Author

@gunnarmorling interesting, are you sure there isn't another ZooKeeper running? I am building it just fine locally. I'll update it to 1.9-SNAPSHOT.

@gunnarmorling
Member

gunnarmorling commented Feb 4, 2022

LOL, yes, I do have another ZK running indeed. Not sure why I didn't check that. Thx for pointing it out :) That said, why do we need ZK here to begin with?

Never mind, I see we test this differently than the other connectors, by starting AK and ZK in embedded mode.

@smiklosovic
Member Author

Yeah, I am not sure I am doing that right, but my idea was to have everything embedded (Kafka, ZooKeeper, Cassandra) so you do not need to run any helper Docker containers or so ...

@gunnarmorling
Member

Yeah, so I think that's ok for now. Eventually, I'd prefer it though if this connector also used containers for starting up Cassandra, similar to what we do for all the other connectors. It'll avoid issues like we currently have with the Cassandra 3.x connector, which can only be built with JDK 1.8 as it runs Cassandra 3.x in embedded mode. Assuming the driver itself can be used with later Java versions, we could move Debezium entirely beyond Java 1.8 by running Cassandra in a container (which would then have Java 1.8, but fully isolated from Debezium). But we can take that separately, so as not to further increase the scope of this change, which is big already.

@gunnarmorling
Member

@bingqinzhou, @smiklosovic, @ahmedjami, some thoughts on the overall strategy here:

  • Support for Cassandra 4.x is vital for Debezium, as otherwise we risk the connector becoming outdated, so it's very good to see this work by @smiklosovic, thank you so much for this!
  • We should try to minimize the risk of breaking the existing connector for 3.x; we do have tests for that connector though, so making sure they all still pass after this refactoring is a first line of defence against any potential regressions; in addition, it'd be awesome if, for instance, @bingqinzhou and @ahmedjami could take that reworked 3.x connector (i.e. build it from this PR), run it in your testing/staging environments and let us know how it goes; this may reveal any glitches we've missed before (in which case we should look into expanding the test suite in those areas)
  • We don't have much capacity to actively keep older branches alive; e.g. right now, while 1.9 is in the works, we still do 1.8.x releases, but 1.7.x releases only to a very limited degree, if at all (essentially only if it's required to fix a critical bug in the supported downstream Red Hat product based on Debezium). This means we cannot really keep the pre-refactoring 3.x connector alive for a longer time. That is, I don't mind any of the committers pushing fixes to that branch, but we probably would not publish any official releases based on it down the road. Also, from a project perspective, we're all better off combining forces and focusing on maintaining a single version.

This is all to say that I personally don't see any strong reason for not moving forward with this change (though I still need to dive a bit further into the code changes, and I may have some suggestions for adjustments); on the contrary, it's an important next step for this connector. As said, getting some testing of the reworked 3.x connector would be great for increased confidence. Thanks, all!

@smiklosovic
Member Author

I am letting you know that driver 4.14.0 was released with the fix. Is there any movement on this matter from your side?

@ahmedjami
Contributor

I tested it with Cassandra-4 and I confirm that it works :) DBZ detects new CDC tables while it's already running.

I still have a few more tests to do on the staging env next week; then I think @gunnarmorling can merge this PR.

Greetings,
Ahmed

@ahmedjami
Contributor

Hi @smiklosovic @gunnarmorling
I have finished all our tests and I can confirm that:

  • All tests are green for Cassandra 3.x and we didn't see any regression.
  • Tests passed for the new version, Cassandra 4.x.

PR can be merged :)

@bingqinzhou In our case, the delay with Cassandra 3.x is about 10 min, since we force a nodetool flush every 10 min. With Cassandra 4.x, this delay has become less than 1 min based on tests done locally; it should be confirmed in PROD :)
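The workaround mentioned above can be sketched as a crontab entry (the nodetool path is illustrative; without a keyspace argument the flush covers all keyspaces, which is what is needed for commit log segments to be released):

```
# flush memtables every 10 minutes so cdc commit log segments can be discarded
*/10 * * * * /usr/bin/nodetool flush
```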

Greetings

@gunnarmorling
Member

Ok, cool. So I'll take another look and I think we'll have this in for the CR1 release due in three weeks (Beta1 is due tomorrow, and I won't find the time before then to merge it), so it should be part of the 1.9 release train. Thanks a lot for reporting back, @ahmedjami!

@bingqinzhou
Member

> Hi @smiklosovic @gunnarmorling I have finished all our tests and I can confirm that:
>
> • All tests are green for Cassandra 3.x and we didn't see any regression.
> • Tests passed for the new version, Cassandra 4.x.
>
> PR can be merged :)
>
> @bingqinzhou In our case, the delay with Cassandra 3.x is about 10min since we force a nodetool flush every 10min. With Cassandra 4.x, this delay has become less than 1min based on tests done locally and it should be confirmed in PROD :)
>
> Greetings

@ahmedjami Great news! Thank you for confirming this :)

@smiklosovic
Member Author

@gunnarmorling any progress here? :)

@ahmedjami
Contributor

Hi @gunnarmorling,
Do you think that this MR will still be part of the 1.9 release? :)
Thanks

@gunnarmorling
Member

gunnarmorling commented Mar 18, 2022 via email

@gunnarmorling
Member

Rebased locally and applied. Huge thank you for this massive piece of work, @smiklosovic! Really cool to see that Debezium now supports Cassandra 4.x with that. Also thanks a lot for your patience with this; it took quite a while to get this in (apologies for any delays caused by me).

I will log a follow-up issue for reworking the 3.x tests so that they use Cassandra in a container, allowing us to move the build of all the Debezium connectors to Java 11.

@smiklosovic
Member Author

wow, this is great!

@fuxiao224
Contributor

fuxiao224 commented Feb 28, 2023

> Hi @smiklosovic @gunnarmorling I have finished all our tests and I can confirm that:
>
> • All tests are green for Cassandra 3.x and we didn't see any regression.
> • Tests passed for the new version, Cassandra 4.x.
>
> PR can be merged :)
>
> @bingqinzhou In our case, the delay with Cassandra 3.x is about 10min since we force a nodetool flush every 10min. With Cassandra 4.x, this delay has become less than 1min based on tests done locally and it should be confirmed in PROD :)
>
> Greetings

Hi @ahmedjami! Thanks for sharing this! May I know why we still need to force-run nodetool flush even with Cassandra 4? I thought Cassandra 4.0 had already fixed the issue and we could just change memtable_flush_period_in_ms to flush changes periodically. Ref: https://stackoverflow.com/questions/55573249/how-to-flush-cassandra-cdc-changes-periodically-to-disk
But I tried this locally (with only 3-5 records changed) and it doesn't work as I expected. So I want to get more insights from you. Thanks a lot!

@smiklosovic
Member Author

smiklosovic commented Feb 28, 2023

Hi @fuxiao224, please read this blog post: https://cassandra.apache.org/_/blog/Learn-How-CommitLog-Works-in-Apache-Cassandra.html

> Cassandra can only delete a segment after all its mutations are persisted in SSTables. Knowing if a file does not hold any mutations that haven't been flushed yet requires a bit of bookkeeping.

Cassandra can only delete a segment after all its mutations are persisted in SSTables.

Deletion of a segment = saving the commit log to the cdc_raw dir.

All mutations persisted in sstables = everything is flushed.

Since a commitlog contains mutations from all tables, flushing one particular table to an sstable does not mean that there are no other mutations in the commit log which have not been flushed yet. I think that you might achieve what you want if you set memtable_flush_period_in_ms for all tables in the database.
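To illustrate that last point (the keyspace and table names below are hypothetical; you would repeat this for every table so that no unflushed mutation pins a segment):

```
ALTER TABLE my_keyspace.my_table
  WITH memtable_flush_period_in_ms = 60000;  -- flush this memtable at least once a minute
```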

@ahmedjami
Contributor

Hi,
We have removed the nodetool flush "cron" after upgrading to Cassandra 4.

You have to check that you insert sufficient data so that the commitLog will be discarded and the index file (<segment_file>_cdc.idx in your cdc_raw directory) will contain the word COMPLETED.
Debezium reads this index file and waits until it is "COMPLETED", and then it will replay all of the commitLog.

So, we need a completed commitLog before seeing new messages in the kafka topic :)

@fuxiao224
Contributor

> Hi, We have removed the nodetool flush "cron" after upgrading to Cassandra 4.
>
> You have to check that you insert sufficient data so the commitLog will be discarded and the index file (<segment_file>_cdc.idx in your cdc_raw directory) will contain the word COMPLETED. Debezium reads this index file and waits until it is "COMPLETED" and then it will replay all of the commitLog.
>
> So, we need a completed commitLog before seeing new messages in the kafka topic :)

Hi @ahmedjami Thanks again for sharing! I wonder what the current delay time is after removing the nodetool flush cron job?
So if there's not much traffic in the database, can the delay still be long even with Cassandra 4?

@ahmedjami
Contributor

> So if there's not much traffic in the database, the delay can still be long even with Cassandra 4?

It depends on the traffic and on how much time the commitLog takes to reach the limit defined by the parameter commitlog_segment_size: https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L544

Note that you could also change the value of this parameter; in our test environment we set it to 8M since we have less traffic than production.
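For reference, the corresponding cassandra.yaml fragment might look like this (the key shown is the 4.0 name; as far as I can tell it was renamed to commitlog_segment_size with a unit suffix, e.g. 8MiB, in 4.1 — check the yaml shipped with your version):

```
# smaller segments fill up (and thus get handed over to cdc_raw) sooner
commitlog_segment_size_in_mb: 8
```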

@ahmedjami
Contributor

> For the low-traffic database or the low-traffic tables in a database, would the data stream become nearly real-time?

I haven't tested it yet, but this new parameter seems useful for you, as with it DBZ shouldn't wait for COMPLETED in the index file and can process the commitLog in "real time": https://github.com/debezium/debezium-connector-cassandra/blob/main/core/src/main/java/io/debezium/connector/cassandra/CassandraConnectorConfig.java#L310

@fuxiao224
Contributor

Thank you very much @ahmedjami! The info above is very helpful. I'll run some tests and dig more into it.

@smiklosovic
Member Author

smiklosovic commented Mar 1, 2023

@ahmedjami @fuxiao224

This is not completely true. It works differently.

> You have to check that you insert sufficient data so the commitLog will be discarded and the index file (<segment_file>_cdc.idx in your cdc_raw directory) will contain the word COMPLETED. Debezium reads this index file and waits until it is "COMPLETED" and then it will replay all of the commitLog.
>
> So, we need a completed commitLog before seeing new messages in the kafka topic :)

There is this work merged: #69

What it does is that it does not wait for COMPLETED to be there. Instead, Cassandra updates the cdc index file, writing into it the position in the segment file up to which mutations have been written; after some time, when new mutations are added, it updates that position again.

So what we did was start to parse this index file, and as soon as the position changes, we read the segment from the last position to the new one (you can seek). So we are basically reading this in near-realtime. We read it every 10 seconds by default:

https://github.com/debezium/debezium-connector-cassandra/blob/main/core/src/main/java/io/debezium/connector/cassandra/CassandraConnectorConfig.java#L310

https://github.com/debezium/debezium-connector-cassandra/blob/main/core/src/main/java/io/debezium/connector/cassandra/CassandraConnectorConfig.java#L325

This number should not be abused by setting it to something like 1 second; there is some performance penalty in seeking so often, and 10s is a nice compromise. So, as soon as your data reaches the commit log and the index file position changes, we scan the new mutations no later than 10 seconds after that.
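To make the mechanism concrete, here is a simplified, hypothetical sketch of that polling loop in Python (the real implementation is the connector's Java code; the idx file is assumed to hold the current offset on its first line and the word COMPLETED once the segment is final):

```python
import time

def tail_commitlog(segment_path, idx_path, handle_bytes, poll_interval=10.0):
    """Poll the cdc idx file; whenever the recorded offset advances, seek to
    the previous offset in the commit log segment and hand over the newly
    covered bytes. Returns the final offset once the idx file says COMPLETED."""
    last_offset = 0
    while True:
        with open(idx_path) as idx:
            tokens = idx.read().split()
        offset = int(tokens[0])              # latest written position in the segment
        completed = "COMPLETED" in tokens
        if offset > last_offset:
            with open(segment_path, "rb") as segment:
                segment.seek(last_offset)    # resume from where we stopped last time
                handle_bytes(segment.read(offset - last_offset))
            last_offset = offset
        if completed:
            return last_offset               # segment is final, nothing more will come
        time.sleep(poll_interval)
```

This is only meant to illustrate why polling too often is wasteful: every iteration re-reads the idx file and may seek into the segment, so the interval is a direct CPU/latency trade-off.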

What you described is what we did initially. But there was always this room for improvement, and I am glad somebody implemented it.

You do not need to call flush with this nearly-realtime feature. Every single mutation goes to the commit log first, then to a memtable. If you flush, the memtable goes to an sstable and the commitlog is "changed" so that it knows which mutations were flushed.

Flushing in a lot of cases writes all data to sstables in such a way that the cdc segment is eligible for deletion. But that is not always the case. If you want to process mutations in a more realtime-ish way, you should try the stuff I posted.

@nitinitt

nitinitt commented Mar 2, 2023

Hey @smiklosovic @fuxiao224
We worked on #69 for real-time processing of CDC Cassandra 4 commit logs, exactly as described above. If you reduce the 10 sec delay, extra CPU is consumed; that is the trade-off. 10 sec was a decent number, and we have included some of our benchmarks at: https://medium.com/walmartglobaltech/walmarts-cassandra-cdc-solution-6fc650031a3, subsection "Debezium Cassandra4 CDC Performance".
These numbers were relevant to our payload size and our VMs running in Azure; you can perform the same experiments to measure CPU usage if interested.

Also, there is Debezium documentation around this feature at: https://debezium.io/documentation/reference/stable/connectors/cassandra.html#reading-the-commitlog
Cassandra documentation: https://cassandra.apache.org/doc/latest/cassandra/operating/cdc.html#overview

@ahmedjami
Contributor

Hi @smiklosovic
Thanks for explaining :)

What I understand is that this new implementation (real-time processing) works only when commit.log.real.time.processing.enabled = true; this is set to false by default.

How does DBZ work without the real-time option? Is it reading the index file every 10s, waiting for COMPLETED before processing the complete commitLog? Or am I missing something?

Thanks again for explaining ;)

@smiklosovic
Member Author

@ahmedjami

as I described above already:

> What it does is that it will not wait for COMPLETED to be there. Instead, Cassandra updates the cdc index file and it will write there the position in the file where mutations are. And then after some time when new mutations are added, it will again update that position.
>
> So what we did was that we started to parse this index file and as soon as the position changed, we read it from the last position to the new (you can seek). So we are basically reading this near-realtime. We read it every 10 seconds by default.

@ahmedjami
Contributor

Thanks @smiklosovic for your reply.

Is what you described also what DBZ does with DEFAULT_COMMIT_LOG_REAL_TIME_PROCESSING_ENABLED = false?

We did not change this parameter, and we thought that we had not enabled the real-time processing.

@smiklosovic
Member Author

smiklosovic commented Mar 2, 2023

It is false by default, so it acts as previously. You need to set it to true to enable this "realtime-ish" behavior.

> How does DBZ work without the real-time option? Is it reading the index file every 10s, waiting for COMPLETED before processing the complete commitLog? Or am I missing something?

Yes, this is true.

Only if it is set to true do you get that realtime behavior.
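In connector-properties terms, enabling the near-realtime mode amounts to the following (only this property name is confirmed in the thread; the 10 s poll interval stays at its default):

```
# false by default; true switches to near-realtime idx-file tailing
commit.log.real.time.processing.enabled=true
```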

@fuxiao224
Contributor

Hi @smiklosovic @ahmedjami
Thanks for answering my questions and sharing the useful info above.
I wonder if we have a place where we can post and discuss Cassandra CDC related questions/topics?
I have some additional Cassandra CDC error-handling questions I want to discuss and get advice on. Thanks a lot!

@smiklosovic
Member Author

@jpechane ^^^^

@fuxiao224
Contributor

Hi @smiklosovic!
I recently tested commit.log.real.time.processing.enabled = true locally, but it still doesn't work. After setting this parameter to true, I think Cassandra CDC is processing the index file and reading the commitlog file, because I see logs like:
INFO io.debezium.connector.cassandra.Cassandra4CommitLogRealTimeParser - Polling for completeness of idx file for: ...
INFO io.debezium.connector.cassandra.Cassandra4CommitLogRealTimeParser - Resume to read the partial file:...
DEBUG io.debezium.connector.cassandra.AbstractCassandra4CommitLogParser - Starting to read commit log segments ...
DEBUG org.apache.cassandra.db.commitlog.CommitLogReader - Reading ...
However, I still don't see the mutations in the Kafka topic. 😢
I also see this log: DEBUG io.debezium.connector.cassandra.Cassandra4CommitLogReadHandlerImpl - not tracked by cdc... It looks like, for some reason, the mutation is not tracked by CDC? But I have no idea why. Am I missing anything?
Thanks in advance.

@jpechane
Contributor

@fuxiao224 Please use Debezium chat and mailing list.
