Add support for Confluent Schema Registry in the druid-avro-extension module #3529

himanshug merged 1 commit into apache:master from ncolomer:feature-confluent-schema-registry
Conversation
@himanshug can you take a look?

@ncolomer we (the Druid project) generally don't release patches of old versions unless there's some critical issue; so this feature would be slated for 0.9.3. You could always build a custom Druid distro in the meantime, though.
does this bring kafka jars as well?
if yes, it doesn't look like the code here needs kafka jars in any way... is it possible to depend on something else that brings the SchemaRegistry stuff without kafka jars?
hello @himanshug, as seen with the command `mvn -pl extensions-core/avro-extensions dependency:tree`, kafka-schema-registry-client only pulls the org.slf4j:slf4j-log4j12:jar:1.7.6 transitive dependency.
Can you add some UTs? There are plenty of examples in the same module. Thanks.
The schema registry repo contains the Avro serializer/deserializer. Is it possible to use those instead of us knowing the format of the message?
Following my previous comment: sure, I could have used another, higher-level Confluent lib (such as io.confluent:kafka-avro-serializer), but it would have required pulling in all the Kafka stuff... that's why I chose to rely on kafka-schema-registry-client and implement the deserialization logic myself (which is not that complicated in the end). Anyway, I'm open to any suggestions here :)
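The approach described here (parsing the framing header yourself and using only the thin registry client) can be sketched roughly as follows. The class and method names are illustrative, not the PR's actual code, and the registry lookup is only hinted at in a comment:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hedged sketch of the decoding approach discussed above. Confluent's wire
// format prefixes each Avro payload with a null magic byte plus a 4-byte
// big-endian schema ID; everything after that 5-byte header is the raw
// Avro binary record.
public class ConfluentFrame {
    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte int ID

    // Extract the Schema Registry ID from a framed message.
    public static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("unknown magic byte");
        }
        return buf.getInt(); // ByteBuffer reads big-endian by default
    }

    // Return the Avro-encoded bytes that follow the header. A real decoder
    // would fetch the writer schema for schemaId(message) from the registry
    // (e.g. via kafka-schema-registry-client) and hand these bytes to Avro's
    // GenericDatumReader.
    public static byte[] avroPayload(byte[] message) {
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }
}
```

This is why the thin client suffices: the framing is trivial to parse by hand, and only the schema lookup needs the registry.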
Yup, I'll add some.
@himanshug tests added, see c597384 and
👍

@himanshug any more comments?

@ncolomer thanks

@ncolomer did you sign the CLA?

@fjy done, thanks
is this really supposed to work with nested fields?
Okay, for future reference: it only works for non-union nested fields.
Hi, apologies, but is there any example of how to set this up exactly? I have both the
I think as of today (0.19.0) streaming Avro isn't yet supported by the new `inputFormat`-based ingestion. In the future, we'll need to add streaming Avro `inputFormat`s.
Hi @gianm, thanks for that fast reply to my depressive comment. OK, now I also understand the docs better with the background information that this is currently only supported in the legacy parser. Here is the spec that works for me:

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "ticks",
    "parser": {
      "type": "avro_stream",
      "avroBytesDecoder": {
        "type": "schema_registry",
        "url": "<schema registry url>"
      },
      "parseSpec": {
        "format": "avro",
        "flattenSpec": {
          "fields": [
            {"name": "instrument", "type": "path", "expr": "$.id.instrument"},
            {"name": "currency", "type": "path", "expr": "$.id.currency"}
          ]
        },
        "timestampSpec": {
          "column": "timestamp",
          "format": "millis"
        },
        "dimensionsSpec": {
          "dimensions": [
            "instrument",
            "currency",
            {"name": "value", "type": "double"}
          ]
        }
      }
    }
  },
  "ioConfig": {
    "type": "kafka",
    "topic": "ticks",
    "consumerProperties": {
      "bootstrap.servers": "<bootstrap server addresses>"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "logParseExceptions": true
  }
}
```

The above assumes that there is a topic named `ticks` whose messages conform to the following Avro schema:

```json
{
  "type": "record",
  "name": "Tick",
  "namespace": "<some namespace>",
  "fields": [{
    "name": "id",
    "type": {
      "type": "record",
      "name": "Id",
      "fields": [{
        "name": "instrument",
        "type": "string"
      }, {
        "name": "currency",
        "type": "string"
      }]
    }
  }, {
    "name": "timestamp",
    "type": {
      "type": "long",
      "logicalType": "timestamp-millis"
    }
  }, {
    "name": "value",
    "type": "double"
  }]
}
```
Thank you for sharing @maxstreese! I, too, am looking forward to supporting this functionality in the new `inputFormat`.
This PR adds support for deserializing Confluent Schema Registry-encoded Avro (see documentation) in the druid-avro-extension module.

Schema Registry's binary prefix is different from schemarepo's and only contains the schema ID (1 null byte + a 4-byte int ID).

This submission only adds the `io.confluent:kafka-schema-registry-client:3.0.1` dependency to the `druid-avro-extension` module (no transitive ones). It was tested on some of our Avro-encoded Kafka topics.
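As a sketch of that prefix from the producer side, the 5-byte header (one null magic byte followed by the 4-byte big-endian int schema ID) can be built like this. `ConfluentFramer` is a hypothetical name for illustration, not part of this PR or Confluent's API:

```java
import java.nio.ByteBuffer;

// Illustrative producer-side framing for the layout described above:
// a null magic byte, then the 4-byte big-endian schema ID, then the
// Avro-encoded record bytes.
public class ConfluentFramer {
    public static byte[] frame(int schemaId, byte[] avroBytes) {
        return ByteBuffer.allocate(5 + avroBytes.length)
            .put((byte) 0x0)   // magic byte
            .putInt(schemaId)  // Schema Registry ID (big-endian)
            .put(avroBytes)    // raw Avro binary payload
            .array();
    }
}
```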
Ideally, we'd like to backport this to the 0.9.1.x branch since we use Imply's Druid distribution (currently stuck on Druid 0.9.1.1). Is another PR necessary?