
Abstraction for using Codecs and serialization for Kafka wire format #45

Open · zmccoy opened this issue Aug 17, 2020 · 14 comments
zmccoy commented Aug 17, 2020

Our current Avro codecs handle only the raw Avro binary format, but KafkaStreams4s should be able to read/write the Confluent wire format (https://docs.confluent.io/current/schema-registry/serdes-develop/index.html#wire-format) so that it can work in a normal Kafka system. I haven't given a ton of thought to the abstractions in this case, but wanted to have this issue out there.
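For reference, the envelope that wire format adds is small: a zero magic byte, a four-byte big-endian schema ID, then the Avro-encoded payload. A minimal sketch of just that framing (the `WireFormat` object is illustrative, not part of the library):

```scala
import java.nio.ByteBuffer

// Confluent wire format per the linked docs:
// byte 0: magic byte (always 0); bytes 1-4: big-endian schema registry ID;
// remaining bytes: the Avro-encoded payload.
object WireFormat {
  val MagicByte: Byte = 0x0

  def encode(schemaId: Int, avroPayload: Array[Byte]): Array[Byte] = {
    val buf = ByteBuffer.allocate(1 + 4 + avroPayload.length)
    buf.put(MagicByte).putInt(schemaId).put(avroPayload)
    buf.array()
  }

  def decode(bytes: Array[Byte]): Either[String, (Int, Array[Byte])] =
    if (bytes.length < 5) Left("fewer than 5 bytes; not in wire format")
    else if (bytes(0) != MagicByte) Left(s"unknown magic byte: ${bytes(0)}")
    else {
      val buf = ByteBuffer.wrap(bytes)
      buf.get()                   // skip the magic byte
      val schemaId = buf.getInt() // 4-byte schema registry ID
      val payload = new Array[Byte](bytes.length - 5)
      buf.get(payload)
      Right((schemaId, payload))
    }
}
```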

zmccoy commented Aug 18, 2020

I'm poking around at this right now.

LukaJCB commented Aug 19, 2020

zmccoy commented Aug 19, 2020

@LukaJCB It does! What isn't great is that I think we'd need essentially that code's abstractions to make this usable. At this moment I feel like I'm working on a poor man's copy of it. Each time I've tried to trim an inner class from it, I understand why they did it, heh. Slowly working on an MVP.

zmccoy commented Aug 19, 2020

I'm struggling with whether we want to do pretty much exactly what they have, so that you can create your schema registry and settings safely with KafkaStreams4s, or whether we should rely on someone using both libraries together and only using the Ser/De portions. I can easily see applications using only KafkaStreams4s and not needing consumer/producer semantics outside of the streams API. The latter case would rely on the user wrapping up all of that ceremony.
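To make that trade-off concrete, here's a rough sketch of what "wrapping all of that ceremony up" could look like, assuming fs2-kafka 1.x (`Serializer[F, A]`, `Deserializer[F, A]`, `Headers`) and cats-effect 2's `unsafeRunSync`; `toKafkaSerde` is a hypothetical helper, not an existing API:

```scala
import cats.effect.IO
import fs2.kafka.{Deserializer, Headers, Serializer}
import org.apache.kafka.common.serialization.{Serde, Serdes}

// Bridge fs2-kafka's effectful ser/de into the synchronous Serde that the
// Kafka Streams runtime expects. The unsafeRunSync calls are the "ceremony":
// Streams gives us no effect type, so effects must be run at the boundary.
def toKafkaSerde[A](
    ser: Serializer[IO, A],
    des: Deserializer[IO, A]
): Serde[A] = {
  val kafkaSer = new org.apache.kafka.common.serialization.Serializer[A] {
    override def serialize(topic: String, a: A): Array[Byte] =
      ser.serialize(topic, Headers.empty, a).unsafeRunSync()
  }
  val kafkaDes = new org.apache.kafka.common.serialization.Deserializer[A] {
    override def deserialize(topic: String, data: Array[Byte]): A =
      des.deserialize(topic, Headers.empty, data).unsafeRunSync()
  }
  Serdes.serdeFrom(kafkaSer, kafkaDes)
}
```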

LukaJCB commented Aug 19, 2020

Maybe this could be moved to a module in vulcan itself? Not sure what the best course of action is here; maybe we should create an issue in vulcan or fs2-kafka to see what they think.

zmccoy commented Aug 19, 2020

Yeah, I think that's the best choice for avoiding possible regressions. I can make an issue on fs2-kafka and tie it back here today. I'm going to keep building a small example of what they're doing to see how it could tie into KafkaStreams4s, along with working with both Avro implementations.

One thing I'm seeing so far: we need to know whether a ser/de is going to be used for a key or for a value, since that changes the inner Kafka ser/de used. In KafkaStreams4s we'll need to know, or "pin", which one ends up being used.
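One possible way to pin that choice is to tag it at the type level; `KeySerde`/`ValueSerde` below are hypothetical names for illustration, but `configure(configs, isKey)` is the real hook the Kafka interface exposes:

```scala
import org.apache.kafka.common.serialization.Serde
import scala.jdk.CollectionConverters._

// Hypothetical wrappers that "pin" a serde's role at the type level, so the
// streams layer can require a KeySerde for keys and a ValueSerde for values.
final case class KeySerde[A](serde: Serde[A])
final case class ValueSerde[A](serde: Serde[A])

// The underlying Kafka interface learns its side via configure(configs, isKey);
// Confluent's Avro serde uses that flag to pick the schema registry subject
// (<topic>-key vs <topic>-value).
def asKeySerde[A](serde: Serde[A], configs: Map[String, AnyRef]): KeySerde[A] = {
  serde.configure(configs.asJava, true) // isKey = true
  KeySerde(serde)
}

def asValueSerde[A](serde: Serde[A], configs: Map[String, AnyRef]): ValueSerde[A] = {
  serde.configure(configs.asJava, false) // isKey = false
  ValueSerde(serde)
}
```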

zmccoy commented Aug 19, 2020

Got a minimal version of what the fs2-kafka encoding of this looks like going here: https://github.com/compstak/KafkaStreams4s/compare/master...zmccoy:kafka-avro-format?expand=1#diff-33e9d5e2212f25b9d29a6a7e756f130dR54-R55

I have some questions noted there about how to tie the serialize and deserialize methods on the corresponding classes into how the streams portion works, along with making sure the key serialization/deserialization is used, hopefully implicitly, for the keys only.

Before we make an issue in fs2-kafka, I wanted to make sure what we'd like in the end is possible without something snagging.

zmccoy commented Aug 21, 2020

Just wanted to post an update here: I've gotten quite a bit farther than the above, but the ergonomics of its usage right now, while keeping the keys and values separate internally, make it a bit of a pain to use. Kafka Summit is this coming week so I won't get back to it for a few days. (https://github.com/compstak/KafkaStreams4s/compare/master...zmccoy:kafka-avro-format?expand=1). The design leaves a lot to be desired at this moment, as it doesn't hide enough information away from the interfaces.

zmccoy commented Sep 8, 2020

The fix for this is going to need to wait for #50, and then I should be able to open some stuff up :)

keirlawson commented Mar 11, 2021

Hey, my team could really use this feature for our streams/vulcan integration. Any way I could help out getting this finished?

zmccoy commented Mar 11, 2021

@keirlawson Hey! I have it partially completed if you want to take that branch over, I got pulled off of doing this at $work and never got back around to it, which is totally my bad. The current portion of it is here: https://github.com/compstak/KafkaStreams4s/compare/master...zmccoy:kafka-avro-format?expand=1.

keirlawson commented Mar 11, 2021

Ok, will take a look. Any pointers as to what is outstanding from your perspective?

zmccoy commented Mar 11, 2021

Getting #50, in master now, merged into that branch, which then allows our keys and values to have different SerDes. A lot of that should be following the types. I'll need to dive back into it for a bit to see where I was, to get a better outline of what's really left. The ergonomics-of-usage work comes after the portion that allows keys and values separate SerDes and uses them.
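For comparison, this is how the same key/value distinction looks in the plain Kafka Streams API, where `Consumed.with` takes one Serde per side (the topic name and types here are illustrative, and this isn't the KafkaStreams4s surface):

```scala
import org.apache.kafka.common.serialization.{Serde, Serdes}
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.kstream.{Consumed, KStream}

// Each stream carries one Serde for the key and another for the value;
// the branch aims to expose the same distinction through the library's types.
val keySerde: Serde[String] = Serdes.String()
val valueSerde: Serde[java.lang.Long] = Serdes.Long()

val builder = new StreamsBuilder()
val events: KStream[String, java.lang.Long] =
  builder.stream("events", Consumed.`with`(keySerde, valueSerde))
```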

IvanDyachenko added a commit to IvanDyachenko/marketplace-scraper that referenced this issue Mar 15, 2021
IvanDyachenko added a commit to IvanDyachenko/marketplace-scraper that referenced this issue Mar 16, 2021
IvanDyachenko added a commit to IvanDyachenko/marketplace-scraper that referenced this issue Mar 19, 2021
IvanDyachenko added a commit to IvanDyachenko/marketplace-scraper that referenced this issue Mar 19, 2021
* Reuse SchemaRegistryClient for multiple Serializers and Deserializers (#242)
* Add org.apache.kafka %% kafka-streams % 2.7.0 to library dependencies
* Setting auto.register.schemas to false disables
  auto-registration of the event type, so that it does not
  override the union as the latest schema in the subject.
* Setting use.latest.version to true causes the Avro serializer
  to look up the latest schema version in the subject (which
  will be the union) and use that for serialization. Otherwise,
  if set to false, the serializer will look for the event type
  in the subject and fail to find it. (See the config sketch
  after this list.)
* Links:
    - https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-avro.html#multiple-event-types-in-the-same-topic
* OZON models:
  - Rename SearchResultsV2 to CategorySearchResultsV2.
  - Move InStock, OutOfStock, PreOrder and CannotBeShipped models into the model.ozon package.
  - Sale model is added. This model describes the number of sold ozon.Items.
  - Sale.aggregate method is implemented. This method calculates the number of sold items according to https://gitlab.com/dalytics/analytics/-/issues/26.
  - Item.AddToCart model is added. This model describes the 'add to cart' action.
* Enricher module is implemented.
* Add KafkaStreamsConfig which contains some useful configuration parameters for a Kafka Streams application.
* Implement VulcanSerde as a temporary solution until compstak/KafkaStreams4s#45
* Update docker-compose.yaml
* Clean up
* Improve comments
* Update Scala CD GitHub Actions Workflow
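As a hedged illustration of the two settings described in that commit message, here is how they could be passed to Confluent's Avro serializer (the registry URL is a placeholder; `KafkaAvroSerializer` and the config keys come from Confluent's serializer):

```scala
import io.confluent.kafka.serializers.KafkaAvroSerializer
import scala.jdk.CollectionConverters._

// Wire the two settings into a Confluent Avro value serializer for the
// multiple-event-types-per-topic setup described above.
val serializerConfig = Map[String, Any](
  "schema.registry.url"   -> "http://localhost:8081", // placeholder registry
  "auto.register.schemas" -> false, // don't override the union subject
  "use.latest.version"    -> true   // serialize against the latest (union) schema
)

val valueSerializer = new KafkaAvroSerializer()
valueSerializer.configure(serializerConfig.asJava, false) // isKey = false
```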
keirlawson commented:
Apologies, I never got round to implementing this and our team have now moved away from Streams
