
Fix handling of topics with periods #70

Merged: ewencp merged 1 commit into confluentinc:master from jingw:fix on Jun 6, 2016

Conversation

@jingw (Contributor) commented Jun 2, 2016

Currently the filters / offset extraction rely on splitting on dot. If there's a topic called `namespace.topic`, it'll extract `namespace` and compare that with the topic name. This breaks recovery, where it looks for the max offset.

Also fixed a typo in `HdfsSinkConnecorConstants` and deleted the unused `CommittedFileWithEndOffsetFilter`.

Closes #45
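
To make the failure mode concrete, here is a minimal, self-contained Java sketch. The filename layout (topic+partition+startOffset+endOffset.extension), the "+" separators, and the regex below are illustrative assumptions; the connector's real COMMITTED_FILENAME_PATTERN may differ in detail:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommittedFilenameDemo {
    // Hypothetical stand-in for COMMITTED_FILENAME_PATTERN.
    static final Pattern COMMITTED =
        Pattern.compile("^(.+)\\+(\\d+)\\+(\\d+)\\+(\\d+)(\\..+)?$");

    public static void main(String[] args) {
        String filename = "namespace.topic+0+100+200.avro";

        // Broken approach: split on '.' and assume the first piece is the topic.
        String topicGuess = filename.split("\\.")[0];
        System.out.println(topicGuess); // "namespace" -- wrong

        // Matching the whole name against an anchored pattern keeps
        // multi-dot topics intact: the numeric groups after the '+'
        // separators mark where the topic name ends.
        Matcher m = COMMITTED.matcher(filename);
        if (m.matches()) {
            System.out.println(m.group(1));                 // "namespace.topic"
            System.out.println(Long.parseLong(m.group(4))); // 200 (end offset)
        }
    }
}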

@ConfluentJenkins (Contributor) commented:

Can one of the admins verify this patch?

@ghost commented Jun 2, 2016

Hey @jingw,
thank you for your Pull Request.

It looks like you haven't signed our Contributor License Agreement yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership of, or grants of rights over, all contributions, allowing them to be distributed under the chosen licence. (Wikipedia)

You can read and sign our full Contributor License Agreement here.

Once you've signed, reply with [clabot:check] to prove it.

Thanks for your efforts,

clabot

@jingw (Contributor, Author) commented Jun 2, 2016

Our legal folks filled out the CLA. Please let me know if I still need to do something for that.

if (!m.matches()) {
  throw new IllegalArgumentException(filename + " does not match COMMITTED_FILENAME_PATTERN");
}
return Long.parseLong(m.group(4));
@ewencp (Contributor) commented on this snippet:

Should we maybe change this to a named constant in HdfsSinkConnectorConstants, since it could change if, e.g., any grouping was adjusted in COMMITTED_FILENAME_PATTERN?
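
A minimal sketch of that suggestion; the constant name and placement are hypothetical:

public class HdfsSinkConnectorConstants {
    // Index of the end-offset capture group in COMMITTED_FILENAME_PATTERN.
    // Kept next to the pattern so a regrouping updates both together.
    public static final int PATTERN_END_OFFSET_GROUP = 4;
}

// The call site above would then read:
// return Long.parseLong(m.group(HdfsSinkConnectorConstants.PATTERN_END_OFFSET_GROUP));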

@ewencp (Contributor) commented Jun 4, 2016

@jingw Looks great. I left a couple of comments about cleanup and duplicated code in a test, but they should all be trivial to clean up and get committed.

@ewencp (Contributor) commented Jun 4, 2016

test this please

@jingw (Contributor, Author) commented Jun 6, 2016

I made the suggested cleanups. Let me know if you need anything else. Thanks!

@ewencp (Contributor) commented Jun 6, 2016

ok to test

@ewencp (Contributor) commented Jun 6, 2016

LGTM, thanks for the contribution!

@ewencp merged commit 6096733 into confluentinc:master on Jun 6, 2016
@jingw deleted the fix branch on June 6, 2016 at 18:56
@jingw mentioned this pull request on Jul 22, 2016
@blbradley commented:

Did this make it into 3.0.1? I have a topic named `okcoin.streaming.btcusd.trades` and I get this stack trace upon recovery:

[2016-09-09 14:36:25,225] ERROR Task cryptocoin-hdfs-sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:142)
java.lang.NumberFormatException: For input string: "streaming"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at io.confluent.connect.hdfs.filter.TopicPartitionCommittedFileFilter.accept(TopicPartitionCommittedFileFilter.java:37)
        at io.confluent.connect.hdfs.FileUtils.fileStatusWithMaxOffset(FileUtils.java:141)
        at io.confluent.connect.hdfs.FileUtils.fileStatusWithMaxOffset(FileUtils.java:130)
        at io.confluent.connect.hdfs.TopicPartitionWriter.readOffset(TopicPartitionWriter.java:390)
        at io.confluent.connect.hdfs.TopicPartitionWriter.resetOffsets(TopicPartitionWriter.java:453)
        at io.confluent.connect.hdfs.TopicPartitionWriter.recover(TopicPartitionWriter.java:203)
        at io.confluent.connect.hdfs.DataWriter.recover(DataWriter.java:239)
        at io.confluent.connect.hdfs.DataWriter.open(DataWriter.java:281)
        at io.confluent.connect.hdfs.HdfsSinkTask.open(HdfsSinkTask.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:428)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:54)
        at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:464)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:234)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$2.onSuccess(AbstractCoordinator.java:255)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$2.onSuccess(AbstractCoordinator.java:250)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
        at org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:459)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:445)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:702)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:681)
        at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:266)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:366)
        at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:975)
        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:316)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:222)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:170)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:142)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
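
For context, here is a hedged reconstruction of how a dot-split over that committed filename reaches Integer.parseInt("streaming"); the exact pre-fix indexing in TopicPartitionCommittedFileFilter is an assumption for illustration:

public class ParseFailureDemo {
    public static void main(String[] args) {
        // Hypothetical reconstruction of the pre-fix parsing; not the
        // connector's exact code. The filename layout is assumed to be
        // topic+partition+startOffset+endOffset.extension.
        String filename = "okcoin.streaming.btcusd.trades+0+100+200.avro";
        String[] parts = filename.split("\\.");
        // parts = [okcoin, streaming, btcusd, trades+0+100+200, avro]
        // A filter that assumes the second dot-separated piece is numeric
        // fails exactly as in the log above:
        int partition = Integer.parseInt(parts[1]); // NumberFormatException: "streaming"
    }
}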

@blbradley commented:

It did not. See you at the next release!

@blbradley commented:

I pushed an automated build to Docker Hub: a custom image based on the 3.0.1 images with this patch applied, in case anyone needs the fix before the next release.

@Radeep commented Dec 7, 2018

Is there an option to specify a custom database name for Hive instead of using the Kafka topic name? If this is already covered, could you briefly describe the procedure?
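
For what it's worth, the connector's Hive integration exposes a hive.database setting that controls which database tables are created in (the tables themselves are still named after the topic). A minimal config sketch follows; property names should be verified against the documentation for your connector version:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=okcoin.streaming.btcusd.trades
hdfs.url=hdfs://namenode:8020
flush.size=1000
hive.integration=true
hive.metastore.uris=thrift://metastore:9083
# Tables are created in this database instead of the default one:
hive.database=my_custom_db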

serssp pushed a commit to serssp/kafka-connect-hdfs that referenced this pull request on Dec 29, 2018:
MINOR: Fix compilation error due to new method in SinkTaskContext
…failure
Successfully merging this pull request may close these issues:
Topics with periods should be supported. (#45)

5 participants