Managing schema registry's schema references #49
Hi @FrancescoPessina, thanks a lot for your PR, I appreciate it. I've checked the proposed solution and I think I've found out a bit more. First of all, to save space, a reference in the target schema is not de-referenced. This means that if you apply the same approach I did, you will inject the referenced schema inline, and it can (and likely will) be registered as a new schema version containing the de-referenced schema. Since schema references have been introduced and Avrora already has something similar, I think we should agree on a common behavior that works for both cases. As a guess, we could keep the reference but internally store the schema de-referenced, and in cases where we can't use a reference (older Confluent versions) we can inject it.
@Strech thank you for your reply :) I'm sorry, but I don't understand what you are proposing. The approach in this PR takes the schema with the reference and de-references it only to parse the message read from Kafka. How would you change this approach?
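To make the de-referencing step concrete, here is a minimal sketch in Elixir that works on an Avro schema already decoded from JSON into plain maps, lists and strings (so no JSON library is needed). It replaces the first use of each referenced type name with that type's full definition and keeps later uses as bare names, since Avro allows a named type to be defined only once. The module name `DerefSketch` and the shape of the `definitions` map are hypothetical, not Avrora's API, and names are assumed to be fully qualified (namespaces are not resolved).

```elixir
defmodule DerefSketch do
  @moduledoc """
  Hypothetical sketch: inline referenced named types into an Avro schema
  that was already decoded from JSON into maps, lists and strings.
  Names are assumed to be fully qualified (namespaces are not resolved).
  """

  # Avro primitive type names never need de-referencing.
  @primitives ~w(null boolean int long float double bytes string)

  @doc """
  Replaces the first use of each referenced name with its definition taken
  from `definitions`; later uses keep the bare name, as Avro requires.
  """
  def inline(schema, definitions) do
    {result, _seen} = walk(schema, definitions, MapSet.new())
    result
  end

  # A bare type name: inline it the first time it appears.
  defp walk(name, defs, seen) when is_binary(name) do
    cond do
      name in @primitives ->
        {name, seen}

      MapSet.member?(seen, name) ->
        {name, seen}

      Map.has_key?(defs, name) ->
        # Mark the name as seen first, so recursive uses stay by name.
        walk(Map.fetch!(defs, name), defs, MapSet.put(seen, name))

      true ->
        {name, seen}
    end
  end

  # A union: walk each branch, threading the set of seen names through.
  defp walk(list, defs, seen) when is_list(list) do
    Enum.map_reduce(list, seen, fn item, acc -> walk(item, defs, acc) end)
  end

  # A complex type: record its own name, then walk the nested types.
  defp walk(%{} = map, defs, seen) do
    seen =
      case map do
        %{"name" => name} when is_binary(name) -> MapSet.put(seen, name)
        _ -> seen
      end

    {map, seen} = walk_key(map, "items", defs, seen)
    {map, seen} = walk_key(map, "values", defs, seen)

    case map do
      %{"fields" => fields} when is_list(fields) ->
        {fields, seen} =
          Enum.map_reduce(fields, seen, fn field, acc ->
            {type, acc} = walk(field["type"], defs, acc)
            {Map.put(field, "type", type), acc}
          end)

        {Map.put(map, "fields", fields), seen}

      _ ->
        {map, seen}
    end
  end

  # Numbers, nil and anything else are returned unchanged.
  defp walk(other, _defs, seen), do: {other, seen}

  # Walks the value stored under `key` (e.g. array "items"), if present.
  defp walk_key(map, key, defs, seen) do
    case Map.fetch(map, key) do
      {:ok, value} ->
        {value, seen} = walk(value, defs, seen)
        {Map.put(map, key, value), seen}

      :error ->
        {map, seen}
    end
  end
end
```

Because only the first occurrence is expanded, a record that references `io.confluent.Payment` twice ends up with one inline definition followed by a plain name, which is still a valid Avro schema for decoding.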
@FrancescoPessina The issue will happen with a sequence like this:
My proposal is to keep track of references without de-reference.
OK, I understand a bit better now :)
Likewise, in Java, the Avro Maven plugin works the same way as this PR: to decode Avro messages it uses the de-referenced schema (with the
About your first point, how can I prevent this instruction from failing when it receives JSON containing the reference? The problem is that the decoder looks for a type (the referenced type) that is not a native type, so I think the
On the registering side, yes, this should be fixed a bit: both the referenced and the referencing schemas should be registered.
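Registering both schemas implies an ordering: a referenced schema has to exist in the registry before a schema that points at it can be registered (Confluent's register endpoint accepts a `references` array whose entries carry `name`, `subject` and `version`). A minimal Elixir sketch of that ordering, assuming the dependencies are already known as a map from subject to the subjects it references; `RegistrationOrder` is a hypothetical helper, not part of Avrora, and it does not report cycles.

```elixir
defmodule RegistrationOrder do
  @moduledoc """
  Hypothetical sketch: order subjects so that every referenced schema is
  registered before any schema that references it (depth-first post-order).
  Cycles terminate (names are marked visited early) but are not reported.
  """

  @doc "`deps` maps a subject to the list of subjects it references."
  def order(deps) do
    {ordered, _visited} =
      deps
      |> Map.keys()
      |> Enum.sort()
      |> Enum.reduce({[], MapSet.new()}, fn subject, acc -> visit(subject, deps, acc) end)

    # Subjects were prepended after their dependencies, so reverse the list.
    Enum.reverse(ordered)
  end

  defp visit(subject, deps, {ordered, visited} = acc) do
    if MapSet.member?(visited, subject) do
      acc
    else
      visited = MapSet.put(visited, subject)

      # Visit (i.e. schedule for registration) every dependency first.
      {ordered, visited} =
        deps
        |> Map.get(subject, [])
        |> Enum.reduce({ordered, visited}, fn dep, inner -> visit(dep, deps, inner) end)

      {[subject | ordered], visited}
    end
  end
end
```

For example, `RegistrationOrder.order(%{"orders-value" => ["payment"], "payment" => []})` schedules `"payment"` before `"orders-value"`.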
@FrancescoPessina I've read a bit more about references, and here is what I can confirm:
This is already possible with the current settings.
You are right, and it's already used here (lines 159 to 164 in 81e1831).
So I refreshed my memory of how references are handled now (lines 125 to 133 in 81e1831).
If we can collect references anyway, I guess we can also resolve them via the registry. The complete solution hasn't quite come together in my head yet, but I think we can then leverage the existing mechanism or extend it. Then it should work for both cases, perhaps with some new (or existing) settings.
UPD1: Here: avrora/lib/avrora/storage/registry.ex line 36 in 5ca985c
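Collecting references "anyway" boils down to finding the type names a schema uses but does not define itself. A minimal Elixir sketch of that idea, assuming the schema is already decoded from JSON into maps and lists and that every name is fully qualified; `ReferenceCollector` is a hypothetical module and is not how Avrora implements its lookup.

```elixir
defmodule ReferenceCollector do
  @moduledoc """
  Hypothetical sketch: list the named types an Avro schema uses but does
  not define. Names are assumed fully qualified (no namespace resolution).
  """

  @primitives ~w(null boolean int long float double bytes string)

  def collect(schema) do
    {used, defined} = walk(schema, {MapSet.new(), MapSet.new()})
    used |> MapSet.difference(defined) |> MapSet.to_list() |> Enum.sort()
  end

  # A bare name is a use of a type unless it is a primitive.
  defp walk(name, {used, defined}) when is_binary(name) do
    if name in @primitives, do: {used, defined}, else: {MapSet.put(used, name), defined}
  end

  # A union: every branch may use types.
  defp walk(list, acc) when is_list(list), do: Enum.reduce(list, acc, &walk/2)

  # A complex type defines its own name and may use types in its children.
  defp walk(%{} = map, {used, defined}) do
    defined =
      case map do
        %{"name" => name} when is_binary(name) -> MapSet.put(defined, name)
        _ -> defined
      end

    acc = walk(Map.get(map, "items"), {used, defined})
    acc = walk(Map.get(map, "values"), acc)

    map
    |> Map.get("fields", [])
    |> Enum.reduce(acc, fn field, a -> walk(field["type"], a) end)
  end

  # nil (missing keys) and other values contribute nothing.
  defp walk(_other, acc), do: acc
end
```

The resulting names are exactly what a registry-backed resolver would then have to fetch.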
@Strech OK, I dug a bit more into the code and now understand how the reference lookup works.
This collects the references contained in a schema by inspecting the schema itself. For example, from this schema:
the extracted references will be
But there is one problem: the subject name can differ slightly from the Avro record name. For example, with the TopicRecordNameStrategy (see https://www.confluent.io/blog/put-several-event-types-kafka-topic/), the schema name would be something like
So, we have to pass down to
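The mismatch comes from how Confluent's built-in subject naming strategies derive a subject for a message value: TopicNameStrategy uses `<topic>-value`, RecordNameStrategy uses the fully qualified record name, and TopicRecordNameStrategy uses `<topic>-<fully qualified record name>`. A small Elixir sketch of those three rules; the module and atom names are made up for illustration, and the key variant of TopicNameStrategy (`<topic>-key`) is left out.

```elixir
defmodule SubjectName do
  @moduledoc """
  Hypothetical sketch of Confluent's built-in subject naming strategies for
  a message value. `record_name` is the fully qualified Avro record name.
  """

  @spec subject(:topic_name | :record_name | :topic_record_name, String.t(), String.t()) ::
          String.t()
  def subject(:topic_name, topic, _record_name), do: "#{topic}-value"
  def subject(:record_name, _topic, record_name), do: record_name
  def subject(:topic_record_name, topic, record_name), do: "#{topic}-#{record_name}"
end
```

Only the record-name strategy makes the subject equal to the Avro name, so if the referenced schema was registered under TopicRecordNameStrategy, looking it up by `io.confluent.Payment` alone misses `topic-io.confluent.Payment`.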
@RafaelCamarda I think I have an idea. Since the reference lookup will be defined in the schema registry storage, we can use a closure to resolve the naming issue without exposing the references anywhere. The potential code might look like this (avrora/lib/avrora/storage/registry.ex, line 29 in f51856a):
def get(key) when is_binary(key) do
with {:ok, schema_name} <- Name.parse(key),
{name, version} <- {schema_name.name, schema_name.version || "latest"},
{:ok, response} <- http_client_get("subjects/#{name}/versions/#{version}"),
{:ok, id} <- Map.fetch(response, "id"),
{:ok, references} <- Map.fetch(response, "$ref"), # <---- Meta-code of extracting references
{:ok, version} <- Map.fetch(response, "version"),
{:ok, schema} <- Map.fetch(response, "schema") do
# Meta-code of mapping a subject name to reference name
references = %{
"io.confluent.Payment" => "topic-io.confluent.Payment"
}
lookup_function = fn r ->
Logger.info("Called reference lookup with reference: " <> inspect(r))
Avrora.Storage.Registry.get(Map.get(references, r)) # <--- Meta-code of getting a real reference subject name
end
{:ok, schema} = Schema.parse(schema)
Logger.debug("obtaining schema `#{schema_name.name}` with version `#{version}`")
{:ok, %{schema | id: id, version: version}}
end
end
This approach allows us to keep reference knowledge within the
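Stripped of the HTTP details, the closure idea can be sketched in isolation: the name-to-subject mapping is captured by the returned function, so callers only ever pass an Avro reference name. In this Elixir sketch `LookupSketch` is hypothetical, and `fetch` stands in for a registry call such as `Avrora.Storage.Registry.get/1`; it is injected so the example runs without a registry.

```elixir
defmodule LookupSketch do
  @moduledoc """
  Hypothetical sketch of a reference lookup closure. The mapping from
  reference name to subject stays private to the returned function.
  """

  @type result :: {:ok, term()} | {:error, term()}

  @spec build(%{String.t() => String.t()}, (String.t() -> result)) :: (String.t() -> result)
  def build(references, fetch) do
    fn reference ->
      case Map.fetch(references, reference) do
        # Known reference: fetch it under its real subject name.
        {:ok, subject} -> fetch.(subject)
        # Unknown reference: fail explicitly instead of querying the registry.
        :error -> {:error, {:unknown_reference, reference}}
      end
    end
  end
end
```

Usage with a stub fetcher:

```elixir
fetch = fn subject -> {:ok, "schema for " <> subject} end
lookup = LookupSketch.build(%{"io.confluent.Payment" => "topic-io.confluent.Payment"}, fetch)
lookup.("io.confluent.Payment")
```

The parser only ever sees `lookup`, so the naming strategy never leaks out of the storage module.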
This PR enables Avro schema parsing with the new Schema Registry schema references feature.