Add support for Confluent Schema Registry in the druid-avro-extension module#3529

Merged
himanshug merged 1 commit into apache:master from ncolomer:feature-confluent-schema-registry on Nov 8, 2016

Conversation

@ncolomer (Contributor) commented Oct 3, 2016

This PR adds support to the druid-avro-extension module for deserializing Avro encoded with Confluent's Schema Registry (see documentation).

Schema Registry's binary prefix differs from schemarepo's and contains only the schema ID (one null magic byte followed by a 4-byte integer ID).
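For illustration, the prefix described above can be parsed in a few lines. This is a language-agnostic sketch of the wire format (a 0x00 magic byte followed by a 4-byte big-endian schema ID), not the extension's actual Java code:

```python
import struct

def parse_confluent_prefix(payload: bytes):
    """Split a Confluent wire-format message into (schema_id, avro_body)."""
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("not a Confluent wire-format message")
    # Bytes 1..4 hold the registry's schema ID as a big-endian 32-bit int.
    (schema_id,) = struct.unpack(">i", payload[1:5])
    # The rest of the payload is the Avro-encoded record itself.
    return schema_id, payload[5:]

schema_id, body = parse_confluent_prefix(b"\x00\x00\x00\x00\x2a" + b"avro bytes here")
print(schema_id)  # 42
```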

This submission only adds the io.confluent:kafka-schema-registry-client:3.0.1 dependency to the druid-avro-extension module (no transitive ones).

It was tested on some of our Avro-encoded Kafka topics.

Ideally, we'd like to backport this to the 0.9.1.x branch, since we use Imply's Druid distribution (currently stuck on Druid 0.9.1.1). Is another PR necessary?

@fjy (Contributor) commented Oct 3, 2016

@himanshug can you take a look?

@gianm (Contributor) commented Oct 3, 2016

@ncolomer we (the Druid project) generally don't release patches of old versions unless there's some critical issue; so, this feature would be slated for 0.9.3. You could always build a custom Druid distro in the meantime though.

fjy added this to the 0.9.3 milestone on Oct 4, 2016
Contributor (inline review comment):

Does this bring in the kafka jars as well?
If yes, it doesn't look like the code here needs the kafka jars in any way... is it possible to depend on something else that brings in the SchemaRegistry stuff without the kafka jars?

Contributor Author (inline review comment):

Hola @himanshug, as seen with the command mvn -pl extensions-core/avro-extensions dependency:tree, kafka-schema-registry-client only pulls in the org.slf4j:slf4j-log4j12:jar:1.7.6 transitive dependency.

@himanshug (Contributor) commented:
Can you add some UTs? There are plenty of examples in the same module. Thanks.

Contributor (inline review comment):
The schema registry repo contains the Avro serializer/deserializer; is it possible to use those instead of us knowing the format of the message?

@ncolomer (Contributor Author) commented Oct 5, 2016

Following up on my previous comment: sure, I could have used another, higher-level Confluent lib (such as io.confluent:kafka-avro-serializer), but it would have required pulling in all the Kafka stuff... that's why I chose to rely on kafka-schema-registry-client and implement the deserialization logic myself (which is not that complicated in the end). Anyway, I'm open to any suggestion here :)
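The approach described here (read the schema ID from the prefix, fetch the writer's schema from the registry, then feed the remaining bytes to an Avro reader) can be sketched roughly as follows. This is a toy illustration: InMemoryRegistry is a hypothetical stand-in for the HTTP-backed kafka-schema-registry-client, and the actual Avro decoding step is elided.

```python
class InMemoryRegistry:
    """Hypothetical stand-in for Confluent's schema registry client."""
    def __init__(self, schemas_by_id):
        self.schemas_by_id = schemas_by_id  # id -> Avro schema (a JSON string here)

    def get_by_id(self, schema_id):
        return self.schemas_by_id[schema_id]

def decode(payload, registry):
    # 1 null magic byte + 4-byte big-endian schema ID, then the Avro body.
    if payload[0] != 0:
        raise ValueError("bad magic byte")
    schema_id = int.from_bytes(payload[1:5], "big")
    writer_schema = registry.get_by_id(schema_id)
    # A real decoder would now parse payload[5:] with an Avro reader configured
    # with writer_schema; this sketch just returns both parts.
    return writer_schema, payload[5:]

registry = InMemoryRegistry({7: '{"type": "string"}'})
schema, avro_body = decode(b"\x00\x00\x00\x00\x07rest", registry)
```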

@ncolomer (Contributor Author) commented Oct 5, 2016

Yup, I'll add some.

@ncolomer (Contributor Author) commented Nov 3, 2016

@himanshug tests added, see c597384 and the SchemaRegistryBasedAvroBytesDecoderTest.java file.
EDIT: rebased the branch on the latest master commit.

@fjy (Contributor) commented Nov 8, 2016

👍

@fjy (Contributor) commented Nov 8, 2016

@himanshug any more comments?

himanshug merged commit 37ecffb into apache:master on Nov 8, 2016
@himanshug (Contributor) commented:

@ncolomer thanks

@fjy (Contributor) commented Nov 8, 2016

@ncolomer did you sign the CLA?
http://druid.io/community/cla.html

@ncolomer (Contributor Author) commented Nov 9, 2016

@fjy done, thanks

@kosii commented Jan 4, 2017

Is this really supposed to work with flattenSpec, as stated in #3714?

@kosii commented Jan 10, 2017

Okay, for future reference: it only works for non-union nested fields.
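For future readers, a hedged illustration of the distinction: a nullable nested field is declared as a union in the Avro schema, roughly like the fragment below, and fields inside such a union were not reachable via flattenSpec path expressions at the time. The field and record names here are made up:

```json
{
  "name": "id",
  "type": ["null", {
    "type": "record",
    "name": "Id",
    "fields": [{"name": "instrument", "type": "string"}]
  }],
  "default": null
}
```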

@maxstreese commented:

Hi, apologies, but is there any example of how to set this up exactly? I have both the druid-kafka-indexing-service and the druid-avro-extensions extensions loaded and can create a spec that discovers my cluster and topic just fine, but in the UI there is no Avro parser to be found anywhere. As for defining the spec in JSON directly, I could not find any example or documentation so far.

@gianm (Contributor) commented Aug 12, 2020

> Hi, apologies but is there any example on how to set this up exactly? I have both the druid-kafka-indexing-service as well as the druid-avro-extensions loaded and can create a spec that discovers my cluster and topic just fine but with the UI there is no Avro parser to be found anywhere. And as for the defining the spec in JSON directly I could not find any example or documentation so far.

I think as of today (0.19.0) that streaming Avro isn't yet supported by the new inputFormat API or by the web console UI. You can still use the legacy parser API, though. It's documented here: https://druid.apache.org/docs/latest/ingestion/data-formats.html#avro-stream-parser

In the future, we'll need to add streaming Avro inputFormat and web console UI support.

@maxstreese commented:

Hi @gianm,

Thanks for the fast reply to my rather despairing comment. OK, with the background information that this is currently only supported by the legacy parser, I now understand the docs better. Personally, the docs don't seem to reflect this properly; at least I got quite confused. In any case, I was able to make it work, so let me share my final config for a dummy topic in case someone else stumbles across this:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "ticks",
    "parser": {
      "type": "avro_stream",
      "avroBytesDecoder": {
        "type": "schema_registry",
        "url": "<schema registry url>"
      },
      "parseSpec": {
        "format": "avro",
        "flattenSpec": {
          "fields": [
            {"name": "instrument", "type": "path", "expr": "$.id.instrument"},
            {"name": "currency", "type": "path", "expr": "$.id.currency"}
          ]
        },
        "timestampSpec": {
          "column": "timestamp",
          "format": "millis"
        },
        "dimensionsSpec": {
          "dimensions": [
            "instrument",
            "currency",
            {"name": "value", "type": "double"}
          ]
        }
      }
    }
  },
  "ioConfig": {
    "type": "kafka",
    "topic": "ticks",
    "consumerProperties": {
      "bootstrap.servers": "<bootstrap server addresses>"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "logParseExceptions": true
  }
}

The above assumes that there is a topic named ticks in your cluster which contains data encoded with the following Avro schema:

{
    "type": "record",
    "name": "Tick",
    "namespace": "<some namespace>",
    "fields": [{
        "name": "id",
        "type": {
            "type": "record",
            "name": "Id",
            "fields": [{
                "name": "instrument",
                "type": "string"
            }, {
                "name": "currency",
                "type": "string"
            }]
        }
    }, {
        "name": "timestamp",
        "type": {
            "type": "long",
            "logicalType": "timestamp-millis"
        }
    }, {
        "name": "value",
        "type": "double"
    }]
}
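To illustrate what the flattenSpec path expressions above do, here is a rough, simplified sketch of "$.a.b"-style extraction against the decoded record viewed as a plain dict. The real flattening is performed by Druid's JSONPath support, not by user code, and the sample values are made up:

```python
def extract_path(record, expr):
    # Handles only the simple "$.a.b" dotted form used in the spec above.
    value = record
    for key in expr.lstrip("$.").split("."):
        value = value[key]
    return value

# Hypothetical decoded Tick record, as a plain dict.
tick = {"id": {"instrument": "AAPL", "currency": "USD"}, "timestamp": 0, "value": 1.5}
print(extract_path(tick, "$.id.instrument"))  # AAPL
```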

@gianm (Contributor) commented Aug 14, 2020

Thank you for sharing, @maxstreese! I, too, am looking forward to supporting this functionality in the new inputFormat API. It should make things simpler in the docs as well.

seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Feb 25, 2022

6 participants