
Add plugin support for streaming packets to Kafka #20

Merged
17 commits merged into deepfence:main on Apr 25, 2022
Conversation

@r8ygun (Contributor) commented on Apr 21, 2022:

There are some assumptions that I've made in this PR that probably need to be addressed before this can be marked as ready for review.

  1. With the S3 plugin, we added the PCAP header to each file so that the objects in S3 were valid PCAP files. It would seem sensible to do the same with Kafka, but we might be more constrained: how do we know when to start a new "file"? We probably don't want to add the header to every 65 KB message unless we also want files to be that size, which seems odd. Maybe there should be a fileSize property, and a new header is added to the first packet after fileSize amount of data has been received (see the header sketch after this list). If this is the behaviour you want, there will be more work to do.
  2. Packets that are destined for the same file need to share a message key to guarantee ordering. I've hard-coded this as "packetstreamer" for now, but I'm happy to hear suggestions on other approaches. If we follow the approach suggested in point 1, we could generate a new value for each "file" and use it as the key.
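For context on what "adding the header" involves: a PCAP file starts with a fixed 24-byte global header. Below is a minimal Go sketch of building that header; the field values are the standard libpcap ones, while the snapLen parameter and the Ethernet link type are assumptions rather than the plugin's actual settings.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// pcapGlobalHeader builds the standard 24-byte libpcap file header that
// would be prepended to the first message of each new "file".
func pcapGlobalHeader(snapLen uint32) []byte {
	buf := new(bytes.Buffer)
	// binary.Write to a bytes.Buffer cannot fail, so errors are ignored here.
	binary.Write(buf, binary.LittleEndian, uint32(0xa1b2c3d4)) // magic number
	binary.Write(buf, binary.LittleEndian, uint16(2))          // major version
	binary.Write(buf, binary.LittleEndian, uint16(4))          // minor version
	binary.Write(buf, binary.LittleEndian, int32(0))           // timezone offset (GMT)
	binary.Write(buf, binary.LittleEndian, uint32(0))          // timestamp accuracy
	binary.Write(buf, binary.LittleEndian, snapLen)            // snapshot length
	binary.Write(buf, binary.LittleEndian, uint32(1))          // link type: Ethernet (assumption)
	return buf.Bytes()
}
```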

@r8ygun mentioned this pull request on Apr 23, 2022
@vadorovsky (Contributor) left a comment:
Thanks a lot for working on this. Looking forward to seeing the final solution! 🙂

contrib/config/receiver-kafka.yaml (review comments resolved)
pkg/plugins/kafka/kafka.go (review comments resolved)
@r8ygun (Contributor, Author) commented on Apr 24, 2022:

There is now a fileSize property that is set in the config. We will publish each "message" to Kafka, but after fileSize amount of data is received, we will start a new "file" (with accompanying header). Each "file" has its own unique ID that is used as the Kafka message key when publishing packets that will be contained in that "file".
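A minimal sketch of how that rotation and keying could fit together, assuming the segmentio/kafka-go client and github.com/google/uuid for the file IDs; the type and field names here are illustrative, not the PR's actual code, and it reuses pcapGlobalHeader from the earlier sketch.

```go
package main

import (
	"context"

	"github.com/google/uuid"
	"github.com/segmentio/kafka-go"
)

// kafkaPlugin is a hypothetical shape for the plugin's state.
type kafkaPlugin struct {
	writer       *kafka.Writer
	fileSize     int64  // rotation threshold, from the config's fileSize property
	bytesWritten int64  // bytes published under the current file ID
	fileID       string // unique ID for the current "file", used as the message key
}

func newKafkaPlugin(brokers []string, topic string, fileSize int64) *kafkaPlugin {
	return &kafkaPlugin{
		writer: &kafka.Writer{
			Addr:     kafka.TCP(brokers...),
			Topic:    topic,
			Balancer: &kafka.Hash{}, // route by message key
		},
		fileSize: fileSize,
	}
}

// Send publishes one chunk of packet data. Once fileSize bytes have been
// published under the current ID, it starts a new "file": a fresh ID (and
// hence a fresh Kafka message key) with a PCAP global header prepended to
// the first chunk, so consumers can reassemble valid PCAP files.
func (p *kafkaPlugin) Send(ctx context.Context, chunk []byte) error {
	payload := chunk
	if p.fileID == "" || p.bytesWritten >= p.fileSize {
		p.fileID = uuid.New().String()
		p.bytesWritten = 0
		payload = append(pcapGlobalHeader(65536), chunk...) // header from the earlier sketch
	}
	p.bytesWritten += int64(len(chunk))
	return p.writer.WriteMessages(ctx, kafka.Message{
		Key:   []byte(p.fileID),
		Value: payload,
	})
}
```

The &kafka.Hash{} balancer routes messages by key, which is what keeps every chunk of a given "file" on the same partition and so preserves its ordering.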

@r8ygun marked this pull request as ready for review on Apr 24, 2022 at 23:50
@r8ygun requested a review from vadorovsky on Apr 24, 2022 at 23:50
@vadorovsky merged commit 76b1786 into deepfence:main on Apr 25, 2022
@vadorovsky (Contributor) commented:
Thanks, LGTM.

We need to do a better job of handling errors, but the same is needed for the S3 plugin, so I'd be happy to make that change for both plugins at the same time.
