
Kafka source: Use Segment's kafka-go client #105

Merged: 13 commits into main · Jan 28, 2022
Conversation

@hariso (Contributor) commented Jan 21, 2022

Description

List of major changes:

  1. The Kafka source connector now uses Segment's Kafka client.
  2. Test packages are now kafka_test. The tests (previously in the kafka package) use the mock package, which now imports the kafka package itself, creating an import cycle that Go rejects.
  3. Offsets are now managed by the Kafka brokers. The current plugin interface's Read() method should make it possible to read a record from an arbitrary position; to achieve that with Confluent's Kafka client, we were managing offsets manually. In practice, though, we never needed to read records from arbitrary positions. Furthermore, with the new plugin interfaces we won't have to do that at all: the plugin will stream messages to Conduit.

I'm about to update the plugin docs in a bit, and also do a bit more cleanup.

Fixes part of #52

Quick checks:

  • I have followed the Code Guidelines.
  • There is no other pull request for the same update/change.
  • I have written unit tests.
  • I have made sure that the PR is of reasonable size and can be easily reviewed. -- Well, kinda. I'm not sure if the PR can be made smaller.

@hariso requested review from a team and @ahmeroxa January 21, 2022 12:47
@neovintage added this to the v0.2.0 milestone Jan 24, 2022
@dylanlott (Contributor) left a comment

I was able to build and run the Docker image locally and received messages into my local Kafka from a file source.

When I was building the docker image, I compared the current image size of main to this version and this PR reduces the final build image size by ~79 MB. 👍 🍾

- # which uses librdkafka, a C library under the hood, so we set CGO_ENABLED=1.
- # Soon we should switch to another, CGo-free, client, so we'll be able to set CGO_ENABLED to 0.
- RUN GOOS=linux GOARCH=amd64 CGO_ENABLED=1 make build
+ RUN GOOS=linux GOARCH=amd64 CGO_ENABLED=0 make build
Contributor commented:

I was curious, so I checked the builds. This change, combined with switching to alpine from bitnami/minideb, nets a 79 MB reduction in our out-the-door Docker image. Nice! 🎉

@hariso (Author) commented:

To be honest, I can't really call it an improvement. I reverted to what you had put in there before I made things worse with a CGo dependency. :D

"github.com/google/uuid"
skafka "github.com/segmentio/kafka-go"
)

// todo try optimizing, the test takes 15 seconds to run!
Contributor commented:
I had some tests fail locally because I had -timeout set to a pretty aggressive 10s; when I went digging for the failure, I found this. Do we just have to wait on Kafka?

@hariso (Author) commented:

That appears to be the case. From what I could understand from the logs, tearing down (i.e. simply closing the client) takes around 5 seconds. :S

- underTest := Source{Consumer: consumerMock, Config: cfg}
- rec, err := underTest.Read(context.TODO(), pos)
+ underTest := kafka.Source{Consumer: consumerMock, Config: cfg}
+ rec, err := underTest.Read(context.TODO(), []byte(groupID))
Contributor commented:
Are these context.TODOs going to be changed in the future, or should they be context.Backgrounds instead?

@hariso (Author) commented:
Good point, I'll change this.

Acks skafka.RequiredAcks
// Required acknowledgments when writing messages to a topic:
// Can be: 0, 1, -1 (all)
Acks kafka.RequiredAcks

Are we going to make this configurable?

Oh wait, it already is. Never mind.

@ahmeroxa commented:
I'm confused by how the GroupID is being generated/handled here. It looks like we're using the position as the GroupID.

GroupID should be configurable (since a customer might want to use a specific consumer group ID). If it's not set, then we can generate one internally, but it needs to remain consistent throughout the entire lifespan of the connector.

@hariso (Author) commented Jan 28, 2022

> I'm confused by how the GroupID is being generated/handled here. It looks like we're using the position as the GroupID.
>
> GroupID should be configurable (since a customer might want to use a specific consumer group ID). If it's not set, then we can generate one internally, but it needs to remain consistent throughout the entire lifespan of the connector.

We talked about this offline, but for the rest: the group ID is currently generated the first time a source is read, and then re-used throughout the connector's lifespan. Ali mentioned that some Kafka deployments use fixed consumer groups (e.g. DevOps will explicitly create groups with certain ACLs), so I'll make the group ID configurable.

@hariso mentioned this pull request Jan 28, 2022
@ahmeroxa left a comment

LGTM 👍

@hariso enabled auto-merge (squash) January 28, 2022 16:24
auto-merge was automatically disabled January 28, 2022 16:28 (Pull Request is not mergeable)

@hariso merged commit 1067a17 into main Jan 28, 2022
@hariso deleted the haris/segment-consumer branch January 28, 2022 16:36
@neovintage added the housekeeping (Small improvements or chores) label Jan 28, 2022
@neovintage added the roadmap (Planned for a future release) label Feb 17, 2022