Clarification Required: S3 Source Connector doesn't fetch new files #1570

Open
ajithcnambiar opened this issue Oct 12, 2023 · 7 comments


ajithcnambiar commented Oct 12, 2023

Use case:
I'm trying out the S3 source connector. The S3 bucket will be updated periodically, and I want the new files to be sourced to the Kafka topic without duplicates, without deleting them from the existing S3 bucket, and without moving them to a new bucket.

Test
I tested with deleteAfterRead set to false and idempotency enabled (using the Kafka repository type), with the configuration below:

Configuration:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: aws-s3-source-connector
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-connect
spec:
  class: org.apache.camel.kafkaconnector.awss3source.CamelAwss3sourceSourceConnector
  tasksMax: 1
  config:
    camel.kamelet.aws-s3-source.accessKey: <access-key>
    camel.kamelet.aws-s3-source.secretKey: <secret-key>
    camel.kamelet.aws-s3-source.region: <region>
    camel.kamelet.aws-s3-source.deleteAfterRead: false
    camel.kamelet.aws-s3-source.bucketNameOrArn: arn:aws:s3:::<bucket-name>

    camel.idempotency.enabled: true
    camel.idempotency.repository.type: kafka
    camel.idempotency.expression.type: header
    camel.idempotency.expression.header: CamelAwsS3Key
    camel.idempotency.kafka.topic: idem-topic
    camel.idempotency.kafka.bootstrap.servers: <kafka-servers>:9092
    camel.idempotency.kafka.poll.duration.ms: 150

    topics: bucket-topic

ISSUE:
New files are not fetched. From the Kafka Connect DEBUG logs, it looks like only the first few files (10 or so) are fetched on each poll of S3.

Other info:

  • My reference was this old thread and the idempotency blog to achieve the intended use case.
  • Could it be because maxMessagesPerPoll is 10? But then there seems to be no configuration property to set this for the S3 source connector? 🤔

Versions tested
camel-aws2-s3-kafka-connector 0.11.5
camel-aws-s3-source-kafka-connector 3.20.6
camel-aws-s3-source-kafka-connector 4.0.0

Question
Please let me know if the intended use case can be realized. And if so, what am I missing? Kindly advise 🙏

@oscerd
Contributor

oscerd commented Oct 13, 2023

To achieve this, you need to either enable deleteAfterRead or increase the max messages per poll. Your configuration will always poll the same 10 files, since you don't move or delete them. Another way to achieve what you are looking for is to use the following connector: https://github.com/apache/camel-kafka-connector/tree/camel-kafka-connector-4.0.0/connectors/camel-aws-s3-cdc-source-kafka-connector. With this one you should be able to consume new files without deleting them. Here are the docs: https://camel.apache.org/camel-kafka-connector/next/reference/connectors/camel-aws-s3-cdc-source-kafka-source-connector.html
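
A rough sketch of how the second option might be configured, in the same Strimzi KafkaConnector format as the configuration above. The connector class name and the `queueNameOrArn` property are assumptions inferred from the naming pattern of the S3 source connector and are not confirmed in this thread; the resource name and the `<queue-name>` placeholder are hypothetical. The sketch also assumes that a queue already receives the bucket's event notifications. The linked docs remain the authority on the actual property list.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: aws-s3-cdc-source-connector   # hypothetical resource name
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-connect
spec:
  # Assumed class name, following the pattern of the S3 source connector class
  class: org.apache.camel.kafkaconnector.awss3cdcsource.CamelAwss3cdcsourceSourceConnector
  tasksMax: 1
  config:
    camel.kamelet.aws-s3-cdc-source.accessKey: <access-key>
    camel.kamelet.aws-s3-cdc-source.secretKey: <secret-key>
    camel.kamelet.aws-s3-cdc-source.region: <region>
    # Assumed property: the queue that receives the bucket's event notifications
    camel.kamelet.aws-s3-cdc-source.queueNameOrArn: <queue-name>

    topics: bucket-topic
```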

@ajithcnambiar
Author

Thanks for the quick response.

Regarding "max messages per poll": is this value configurable for the connector?

@oscerd
Contributor

oscerd commented Oct 13, 2023

No, it's not exposed. I can add that, but whatever the value is, it won't cover your case unless you delete the files after reading them or move them.

@ajithcnambiar
Author

It would be great if it were exposed.
In my scenario, the S3 bucket is updated once or twice a day. So if I had a way to set the max to a value like 1000, it would solve my use case.
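
For illustration, the requested setting might look like the fragment below once it is exposed. The property name `maxMessagesPerPoll` under the `camel.kamelet.aws-s3-source` prefix is an assumption (no such option exists in the connector versions tested above), and 1000 is just the example value mentioned here.

```yaml
spec:
  config:
    camel.kamelet.aws-s3-source.deleteAfterRead: false
    # Hypothetical property, not available in the connector versions tested above
    camel.kamelet.aws-s3-source.maxMessagesPerPoll: 1000
```

As noted earlier in the thread, a larger limit would presumably only help while the bucket holds fewer objects than the limit, since the files are neither deleted nor moved.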

@oscerd
Contributor

oscerd commented Oct 13, 2023

I opened an issue on the camel-kamelets project.

@ajithcnambiar
Author

Thanks a lot for apache/camel-kamelets#1692 🙏
I'm just curious, when can we expect a version of the connector with this configuration?

@oscerd
Contributor

oscerd commented Oct 17, 2023

We first need to release camel-kamelets 4.1.0. I'm planning to do it this week or at the beginning of next week; then we can upgrade in ckc (camel-kafka-connector) and release, thanks to @valdar.
