Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect handling of inner records in AvroSchemaParser #3640

Closed
janvyhnanek opened this issue Sep 11, 2023 · 5 comments
Closed

Incorrect handling of inner records in AvroSchemaParser #3640

janvyhnanek opened this issue Sep 11, 2023 · 5 comments
Labels
Bug Something isn't working Impacts Documentation Use this to tag Pull Requests that introduce changes that should be documented.

Comments

@janvyhnanek
Copy link

We are migrating to newer version of apicurio-registry-serdes-avro-serde, but it seems there is a breaking change in handling of inner record definitions in AvroSchemaParser from version 2.2.x, it happens in the newest version 2.4.5.Final as well.

If there is an inner record in an avro schema, matching of schema by content during serialization fails, because serializator is tring to match schema without inner record definition. The problematic code is in the method io.apicurio.registry.serde.avro.AvroSchemaParser#getSchemaFromData(java.lang.Record), where setRawSchema sets modified schema without inner record definition.

Is this a bug or an intentional feature in the new versions of serde library? I cannot imagine that we would register every inner record as a separate schema in the registry so that it could be used as a reference, that would be not usable for us since we have large amount of complex structures. To overcome this, we have implemented our implementation of SchemaParser but we would appreciate if there was a standard solution.

Example

This is the original schema with inner record definition, registered in registry and used by serializator to generate java class:

{
  "type": "record",
  "name": "InnerNamespaceTestSchema",
  "namespace": "com.test.namespace.v1",
  "fields": [
    {
      "name": "testField",
      "type": {
        "type": "string",
        "avro.java.string": "String"
      }
    },
    {
      "name": "testField2",
      "type": {
        "type": "record",
        "name": "TestInnerType",
        "fields": [
          {
            "name": "testInnerField",
            "type": {
              "type": "string",
              "avro.java.string": "String"
            }
          }
        ]
      }
    }
  ]
}

And this is schema sent to schema registry during serialization when trying to match it by content. Note that there is only reference to record TestInnerType and not its definition as in the original schema.

{"type":"record","name":"InnerNamespaceTestSchema","namespace":"com.test.namespace.v1","fields":[{"name":"testField","type":{"type":"string","avro.java.string":"String"}},{"name":"testField2","type":"TestInnerType"}]}
@janvyhnanek janvyhnanek added the Bug Something isn't working label Sep 11, 2023
@apicurio-bot
Copy link

apicurio-bot bot commented Sep 11, 2023

Thank you for reporting an issue!

Pinging @andreaTP to respond or triage.

@carlesarnal
Copy link
Member

Hello @janvyhnanek,

This is expected in the latest releases, so artifact references can be re-used across artifacts. That said, we do understand that there are use-case like yours, so we implemented a mode for having the old behaviour, you must set apicurio.registry.dereference-schema=true in your configuration. @smccarthy-ie this is something we must document.

@janvyhnanek let me know if you have any problem.

@EricWittmann EricWittmann added the Impacts Documentation Use this to tag Pull Requests that introduce changes that should be documented. label Sep 14, 2023
@janvyhnanek
Copy link
Author

Thanks, with this settings it works as in previous versions.

I just have a remark concerning this feature, references to other schema would be useful for us as well but only sometimes - if it is really a reference to an external schema. I know, it might be a bit tricky to implement parser since avro does not contain explicit information about imports like in other descriptors (e.g. xsd). The decision would have to be made probably by discovering if referencing schema contains the definition of the referenced schema. What do you think about it?

thanks

@carlesarnal
Copy link
Member

Sorry for the delay, I'm catching up on my GH notifications. If resolving the external schema means resolving it in an external system for us that would be usually a no-go from a security perspective, even if we had the information to resolve it. If you're interested in having a broader conversation about the possibilities here, we can open a discussion for it, for now, I'm closing this issue as the problem you had has been resolved.

@janvyhnanek
Copy link
Author

Hi, my question was about references to schemas in the same registry, I have created a new discussion for it: #4182

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Impacts Documentation Use this to tag Pull Requests that introduce changes that should be documented.
Projects
None yet
Development

No branches or pull requests

3 participants