Feature: Schema Registry support #984

Closed
6 tasks done
ahmeroxa opened this issue Apr 3, 2023 · 8 comments

@ahmeroxa

ahmeroxa commented Apr 3, 2023

Subtasks

Feature description

Native support for the Confluent Schema Registry API should be added.

This will allow the Kafka connector to translate incoming encoded messages (e.g. Avro) to structured data so that they may be processed by processors as well as be emitted downstream in other formats (e.g. OpenCDC/JSON).

Similarly, the connector will be able to produce data to destination Kafka clusters in encoded formats and register the schema automatically with a provided Schema Registry so that it may be used by downstream consumers.

My request would be to initially ship support for at least Avro, but I suspect that supporting the full suite (Avro, Protobuf and JSON Schema) will not be much more effort.

This would additionally add support for the Apicurio Registry as it exposes a Confluent Schema Registry compatible API.
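
For context on what "translating incoming encoded messages" involves: payloads written by Confluent serializers start with a 5-byte header (a zero magic byte followed by a 4-byte big-endian schema ID) ahead of the encoded data. Below is a minimal sketch of splitting that header off, assuming the message uses this framing; the function name `splitConfluentFrame` is made up for illustration and is not part of Conduit or any Confluent library.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// confluentMagicByte is the first byte of a message in the Confluent wire format.
const confluentMagicByte = 0x0

// splitConfluentFrame (hypothetical helper) separates the schema ID from the
// encoded payload. It only inspects the 5-byte header and does not decode
// the Avro/Protobuf/JSON Schema data that follows it.
func splitConfluentFrame(msg []byte) (schemaID int, payload []byte, err error) {
	if len(msg) < 5 {
		return 0, nil, errors.New("message is shorter than the 5-byte header")
	}
	if msg[0] != confluentMagicByte {
		return 0, nil, errors.New("unexpected magic byte, not Confluent wire format")
	}
	// Bytes 1..4 hold the schema ID as a big-endian unsigned 32-bit integer.
	id := binary.BigEndian.Uint32(msg[1:5])
	return int(id), msg[5:], nil
}
```

The returned schema ID would then be used to look up the schema in the registry, and the remaining bytes handed to the decoder for that schema's format.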

@ahmeroxa ahmeroxa added feature New feature or request triage Needs to be triaged labels Apr 3, 2023
@lovromazgon
Member

Should this be a feature specific to the Kafka connector, or should it be included in core Conduit? The connector currently reads the message and produces it as raw data to Conduit. I'm thinking that a processor could pick the correct schema from the registry and parse the raw data into structured data.

@ahmeroxa
Author

ahmeroxa commented Apr 4, 2023

I think having the functionality as a processor would be ideal, as it would make it possible to use a Schema Registry with any connector.

@lovromazgon lovromazgon transferred this issue from ConduitIO/conduit-connector-kafka Apr 4, 2023
@lovromazgon
Member

Here are some takeaways from our discussion:

  • Conduit should provide a processor that can take a schema from a schema registry and parse the raw payload into a structured payload.
  • Schemas should be cached in Conduit (we should fetch a schema from the registry only the first time we encounter it).
  • The user should be able to configure the schema subject based on record metadata (maybe using a Go template? e.g. my-static-prefix-{{record.Metadata["topic"]}}).
  • By default, the processor should use a predefined metadata field as the schema subject (e.g. schema-subject), unless configured otherwise by the user (as mentioned in the previous point). Connectors can populate this metadata field if the resource they connect to provides the info or if there's a convention (e.g. the Kafka connector can follow the same convention as Kafka Connect).
  • If a schema is not found the processor should fail.
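
The takeaways above could be sketched roughly as follows. This is an illustration only, not the processor's actual implementation: `Record`, `SubjectResolver`, `SchemaCache` and `FetchFunc` are hypothetical names, the registry call is abstracted behind a function, and the template uses Go's `text/template` syntax (`{{ index .Metadata "topic" }}`) for the metadata lookup.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"text/template"
)

// Record is a hypothetical stand-in for a Conduit record; only metadata is modeled.
type Record struct {
	Metadata map[string]string
}

// defaultSubjectKey is the predefined metadata field suggested in the discussion.
const defaultSubjectKey = "schema-subject"

// SubjectResolver derives the schema subject for a record.
type SubjectResolver struct {
	tmpl *template.Template // nil means: fall back to the default metadata field
}

// NewSubjectResolver parses the user-supplied template; an empty string
// selects the default metadata field instead.
func NewSubjectResolver(tmplText string) (*SubjectResolver, error) {
	if tmplText == "" {
		return &SubjectResolver{}, nil
	}
	t, err := template.New("subject").Parse(tmplText)
	if err != nil {
		return nil, err
	}
	return &SubjectResolver{tmpl: t}, nil
}

// Subject returns the schema subject for rec.
func (r *SubjectResolver) Subject(rec Record) (string, error) {
	if r.tmpl == nil {
		s, ok := rec.Metadata[defaultSubjectKey]
		if !ok {
			return "", fmt.Errorf("metadata field %q is not set", defaultSubjectKey)
		}
		return s, nil
	}
	var sb strings.Builder
	if err := r.tmpl.Execute(&sb, rec); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// FetchFunc retrieves a schema by subject from a registry (hypothetical signature).
type FetchFunc func(subject string) (string, error)

// SchemaCache calls the registry only the first time a subject is seen.
type SchemaCache struct {
	mu      sync.Mutex
	fetch   FetchFunc
	schemas map[string]string
}

func NewSchemaCache(fetch FetchFunc) *SchemaCache {
	return &SchemaCache{fetch: fetch, schemas: make(map[string]string)}
}

// Get returns the cached schema or fetches it; a missing schema is an error,
// which the caller would turn into a processor failure.
// The lock is held during the fetch, so concurrent lookups are serialized.
func (c *SchemaCache) Get(subject string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.schemas[subject]; ok {
		return s, nil
	}
	s, err := c.fetch(subject)
	if err != nil {
		return "", fmt.Errorf("schema for subject %q not found: %w", subject, err)
	}
	c.schemas[subject] = s
	return s, nil
}
```

With the template `my-static-prefix-{{ index .Metadata "topic" }}` and a record whose `topic` metadata is `orders`, `Subject` yields `my-static-prefix-orders`; repeated `Get` calls for that subject reach the registry once.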

@simonl2002 simonl2002 added this to the 1.0 milestone Apr 7, 2023
@neovintage neovintage modified the milestones: 1.0, 0.7.0 Apr 11, 2023
@maha-hajja maha-hajja removed the triage Needs to be triaged label Apr 12, 2023
@lovromazgon lovromazgon self-assigned this Apr 17, 2023
@gedw99

gedw99 commented Jun 20, 2023

This seems very good. Is this on the roadmap or stalled?

@lovromazgon
Member

It's actively being worked on and should be included in the next release.

@lovromazgon
Member

Done. The documentation is prepared in ConduitIO/conduit-site#37.

@gedw99

gedw99 commented Jul 14, 2023

Hey,

@lovromazgon @ahmeroxa

Not sure if you're interested, but NATS JetStream also has translate support.
For example, it can translate incoming encoded messages (e.g. Avro).

Code: https://github.com/search?q=repo%3Anats-io%2Fnatscli%20translate&type=code

This shows it off a bit: https://github.com/metatexx/msgcvt, which I use.
