Add protobuf schema registry #10839
Conversation
Hi @bananaaggle, thanks for the contribution. It seems like a useful addition and I would like to review it, but that is impossible if you keep closing and reopening as a new PR. Could you please stick to one PR and make any updates in place so that we can review it?
Sorry, I will stick to one PR.
overall lgtm
distribution/bin/check-licenses.py (Outdated)

@@ -225,6 +225,9 @@ def build_compatible_license_names():
compatible_licenses['Apache License v2.0'] = 'Apache License version 2.0'
compatible_licenses['Apache License, version 2.0'] = 'Apache License version 2.0'
compatible_licenses['Apache 2.0 License'] = 'Apache License version 2.0'
compatible_licenses['Apache License, 2.0'] = 'Apache License version 2.0'
compatible_licenses['Confluent Community License'] = 'Confluent Community License'
hmm, what requires the Confluent Community License? I don't see anything in licenses.yaml mentioning it. Last I knew, the clients to Confluent things were still Apache licensed. (I'm not sure we can distribute anything with this license in Druid packages; we would have to check into that, but I suspect not, since it has field-of-use restrictions.)
I looked at the source files for kafka-protobuf-provider here (https://github.com/confluentinc/schema-registry/tree/master/protobuf-provider/src/main/java/io/confluent/kafka/schemaregistry/protobuf) and there appears to be a mix of Apache and Confluent Community License headers, which I assume means both licenses apply.

Unfortunately, I think this means that we cannot distribute this jar, but instead we can provide instructions in the documentation on how users can download it (see https://www.apache.org/legal/resolved.html#optional), similar to how we do with the MySQL connector.

The pom.xml will need to mark this jar as provided, you will need to remove this license from this Python script, and you will need to remove kafka-protobuf-provider from licenses.yaml. I think it may be ok if tests still depend on it, since that wouldn't cause packaging to distribute it, but I'm unsure exactly...
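For reference, a provided-scope entry in the extension's pom.xml could look something like this sketch, based on the io.confluent:kafka-protobuf-provider:6.0.1 coordinates mentioned in this PR (version shown for illustration):

```xml
<dependency>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-protobuf-provider</artifactId>
  <version>6.0.1</version>
  <!-- provided: kept off the Druid distribution; users install the jar themselves -->
  <scope>provided</scope>
</dependency>
```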
private Parser<String, Object> parser;
private final List<String> dimensions;

@JsonCreator
public ProtobufInputRowParser(
    @JsonProperty("parseSpec") ParseSpec parseSpec,
    @JsonProperty("descriptor") String descriptorFilePath,
    @JsonProperty("protoMessageType") String protoMessageType
    @JsonProperty("protoBytesDecoder") ProtobufBytesDecoder protobufBytesDecoder
So that this change is not incompatible with previous Druid versions, I think we should probably consider leaving the old constructor properties in place and marking them as deprecated instead of removing them, then enforce that only one of `protoBytesDecoder` or `descriptor`/`protoMessageType` can be set, and automatically construct a `FileBasedProtobufBytesDecoder` if the old properties are set and the new one is null. Then in a future version we can consider dropping them.
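The suggested compatibility check might look roughly like this self-contained sketch. The stub types stand in for the real Druid classes, and `resolveDecoder` is a hypothetical helper, not the PR's actual code:

```java
// Sketch of the suggested backwards-compatible constructor logic:
// exactly one of the deprecated properties or the new decoder may be set.
public class ConstructorCompatSketch {
  public interface ProtobufBytesDecoder {}

  // Stand-in for the real FileBasedProtobufBytesDecoder.
  public static class FileBasedProtobufBytesDecoder implements ProtobufBytesDecoder {
    final String descriptorFilePath;
    final String protoMessageType;

    FileBasedProtobufBytesDecoder(String descriptorFilePath, String protoMessageType) {
      this.descriptorFilePath = descriptorFilePath;
      this.protoMessageType = protoMessageType;
    }
  }

  // Hypothetical helper a @JsonCreator constructor could delegate to.
  public static ProtobufBytesDecoder resolveDecoder(
      String descriptorFilePath,               // deprecated property
      String protoMessageType,                 // deprecated property
      ProtobufBytesDecoder protobufBytesDecoder // new property
  ) {
    if (protobufBytesDecoder != null) {
      if (descriptorFilePath != null || protoMessageType != null) {
        throw new IllegalArgumentException(
            "Cannot use both 'protoBytesDecoder' and the deprecated 'descriptor'/'protoMessageType'");
      }
      return protobufBytesDecoder;
    }
    if (descriptorFilePath == null) {
      throw new IllegalArgumentException("Either 'protoBytesDecoder' or 'descriptor' must be set");
    }
    // Old properties set, new one null: fall back to the file-based decoder.
    return new FileBasedProtobufBytesDecoder(descriptorFilePath, protoMessageType);
  }
}
```

With only the deprecated properties set, the helper transparently builds the file-based decoder, so old ingestion specs keep working.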
Good idea, I'll implement it.
  }
}

@SuppressWarnings("checkstyle:RightCurly")
what is this suppression for?
I'll delete it.
  DynamicMessage message = DynamicMessage.parseFrom(descriptor, rawMessage);
  return message;
}
catch (Exception e) {
maybe this `try` block should be split up so that the error messaging for failing to fetch the schema from the registry, which doesn't exactly seem like a parse exception, can be distinguished from an actual parse exception
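A rough sketch of the suggested split. The exception types and the `fetchSchema`/`parse` helpers here are illustrative stand-ins, not the real decoder code:

```java
// Splitting one try/catch into two so a registry fetch failure is reported
// separately from a genuine message parse failure.
public class SplitTryCatchSketch {
  public static class SchemaFetchException extends RuntimeException {
    SchemaFetchException(String msg, Throwable cause) { super(msg, cause); }
  }

  public static class ParseException extends RuntimeException {
    ParseException(String msg, Throwable cause) { super(msg, cause); }
  }

  // Stand-in for the schema registry client lookup.
  public interface SchemaSource {
    String fetchSchema(int id) throws Exception;
  }

  public static String decode(SchemaSource registry, int schemaId, byte[] rawMessage) {
    String schema;
    try {
      schema = registry.fetchSchema(schemaId);   // registry I/O can fail here
    } catch (Exception e) {
      throw new SchemaFetchException("Failed to fetch schema id " + schemaId, e);
    }
    try {
      return parse(schema, rawMessage);          // only real parse errors land here
    } catch (Exception e) {
      throw new ParseException("Failed to parse message with schema id " + schemaId, e);
    }
  }

  // Stand-in for DynamicMessage.parseFrom(descriptor, rawMessage).
  static String parse(String schema, byte[] rawMessage) {
    return schema + ":" + rawMessage.length;
  }
}
```

A caller can then report a registry outage differently from malformed data instead of seeing both as a parse failure.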
I'll split it.
…pt to old version
@clintropolis Hi, I changed the code following your suggestion two weeks ago. Could you please review this change?
Sorry for the delay @bananaaggle! The code changes lgtm 👍, but I think we need to fix the license stuff. One minor nitpick is that some of the exception handling could maybe be a bit more specific than catching `Exception`, probably?
Also, could you update the documentation for the protobuf extension, https://github.com/apache/druid/blob/master/docs/development/extensions-core/protobuf.md, to include the new stuff, and instructions to get the jar to use it? You can do something similar to here: https://github.com/apache/druid/blob/master/docs/development/extensions-core/mysql.md#installing-the-mysql-connector-library.
It's a good idea. I've separated this jar and written documentation for its usage.
@clintropolis Hi, I've written documentation for this feature. Can you help me refine it? Thanks.
lgtm, just a few minor suggestions on docs, thanks 👍
@@ -72,22 +72,47 @@ message Metrics {
}
```

-### Descriptor file
+### When use descriptor file
-### When use descriptor file
+### When using a descriptor file
@@ -56,7 +56,7 @@ Here is a JSON example of the 'metrics' data schema used in the example.

### Proto file

-The corresponding proto file for our 'metrics' dataset looks like this.
+The corresponding proto file for our 'metrics' dataset looks like this. You can use Protobuf parser with a proto file or Confluent schema registry.
-The corresponding proto file for our 'metrics' dataset looks like this. You can use Protobuf parser with a proto file or Confluent schema registry.
+The corresponding proto file for our 'metrics' dataset looks like this. You can use Protobuf parser with a proto file or [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html).
}
```

This feature uses Confluent's Protobuf provider which is not included in the Druid distribution and must be installed separately. You can fetch it and its dependencies from Maven Central at:
-This feature uses Confluent's Protobuf provider which is not included in the Druid distribution and must be installed separately. You can fetch it and its dependencies from Maven Central at:
+This feature uses Confluent's Protobuf provider which is not included in the Druid distribution and must be installed separately. You can fetch it and its dependencies from the Confluent repository and Maven Central at:
}
```

### When use schema registry
-### When use schema registry
+### When using Schema Registry
## Create Kafka Supervisor

Below is the complete Supervisor spec JSON to be submitted to the Overlord.
Make sure these keys are properly configured for successful ingestion.

### When use descriptor file
-### When use descriptor file
+### When using a descriptor file
Next, we use the `protoc` Protobuf compiler to generate the descriptor file and save it as `metrics.desc`. The descriptor file must be either in the classpath or reachable by URL. In this example the descriptor file was saved at `/tmp/metrics.desc`, however this file is also available in the example files. From your Druid install directory:

```
protoc -o /tmp/metrics.desc ./quickstart/protobuf/metrics.proto
```

### When use schema registry
-### When use schema registry
+### When using Schema Registry
At first make sure your schema registry version is later than 5.5. Next, we post this schema to schema registry.
-At first make sure your schema registry version is later than 5.5. Next, we post this schema to schema registry.
+Make sure your Schema Registry version is later than 5.5. Next, we can post a schema to add it to the registry:
@clintropolis Hi, thanks for your review. I've refined the documentation for this feature. What else should I do before merging?
lgtm, thanks 👍
I think this is g2g after CI. If you are interested in doing a follow-up PR or two, the protobuf extension is one of the few ingestion formats that has not been ported to the newer `InputFormat` interface.

Another thing: after #10929 goes in it would probably be possible to add an integration test for the functionality added in this PR. A fair warning, our integration tests are a bit rough around the edges to work with, but I think most of the parts would be there (I guess the protobuf jar we can't include would need to be added to the Dockerfile, similar to mysql). I'm not sure we actually have any integration tests using protobuf currently.
Yeah, I will learn about the `InputFormat` interface and implement it with protobuf. As for the integration test, I'm not very familiar with this component, but I think I can finish it.
@JsonCreator
public SchemaRegistryBasedProtobufBytesDecoder(
    @JsonProperty("url") String url,
    @JsonProperty("capacity") Integer capacity
Oops, I forgot about one thing, could you add support for supplying config and headers to the schema registry client? See https://github.com/apache/druid/pull/10314/files#diff-29d909859e829594d32e7d39a72d233c9947cedf9b35ddf9e4566547799adb0fR51-R52 for details (and docs could be more or less copied too probably https://github.com/apache/druid/pull/10314/files#diff-b5e5f4cf38f24309777ff1750a3ed7d5f55492afe2e8eded4f7f71337b2c4a90R1041)
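The requested change might look roughly like this sketch. The property names follow the Avro decoder linked above, but the class body is a simplified stand-in, not the real SchemaRegistryBasedProtobufBytesDecoder:

```java
import java.util.Map;

// Sketch: the decoder accepts optional "config" and "headers" JSON properties
// and would pass them through when constructing the schema registry client.
public class RegistryDecoderConfigSketch {
  private final String url;
  private final int capacity;
  private final Map<String, ?> config;       // registry client configs (e.g. auth settings)
  private final Map<String, String> headers; // extra HTTP headers sent on each request

  public RegistryDecoderConfigSketch(
      String url,
      Integer capacity,
      Map<String, ?> config,
      Map<String, String> headers
  ) {
    this.url = url;
    // Mirror the "unbounded cache when capacity is omitted" default.
    this.capacity = capacity == null ? Integer.MAX_VALUE : capacity;
    this.config = config;
    this.headers = headers;
  }

  public String getUrl() { return url; }
  public int getCapacity() { return capacity; }
  public Map<String, ?> getConfig() { return config; }
  public Map<String, String> getHeaders() { return headers; }
}
```

Both new properties would be optional, so existing specs that only set `url` and `capacity` keep deserializing unchanged.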
Hi, @clintropolis. I've added support for supplying config and headers, and I've updated the documentation accordingly. Can you review it? Thanks.
changes lgtm, thanks 👍
CI is failing, asking for `druid-protobuf-extensions` to have
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
<version>2.0.1</version>
<scope>provided</scope>
</dependency>
added to the pom.xml, not really sure which of the changes caused that to happen...
Confluent's Schema Registry now supports deserializing protobuf, so I added this feature to the druid-protobuf-extensions module. Previously, this module only worked when the user provided a .desc file; now it can fetch schemas from the schema registry dynamically.
This submission adds the io.confluent:kafka-schema-registry-client:6.0.1 and io.confluent:kafka-protobuf-provider:6.0.1 dependencies to the druid-protobuf-extensions module.
Old spec is:
New spec is: