
Schema Registry considers avro.java.string as part of the schema comparison #868

Open
markush81 opened this issue Aug 15, 2018 · 16 comments

@markush81 commented Aug 15, 2018

Given

{
  "type": "record",
  "name": "Ping",
  "namespace": "de.markush.kafka",
  "fields": [
    {
      "name": "id",
      "type": "string",
      "logicalType": "uuid"
    },
    {
      "name": "created_at",
      "type": "string",
      "logicalType": "iso8601Timestamp"
    }
  ]
}

is published to Schema-Registry

when

a Java producer using POJOs generated with the avro-maven-plugin tries to serialize these events with the KafkaAvroSerializer, the lookup for the schema fails because the SCHEMA$ property in the generated classes looks a little different

then

org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema: {"type":"record","name":"Ping","namespace":"de.markush.kafka","fields":[{"name":"id","type":{"type":"string","avro.java.string":"String"},"logicalType":"uuid"},{"name":"created_at","type":{"type":"string","avro.java.string":"String"},"logicalType":"iso8601Timestamp"}]}
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
	at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:203) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:229) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:296) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:284) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getIdFromRegistry(CachedSchemaRegistryClient.java:132) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getId(CachedSchemaRegistryClient.java:264) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:82) ~[kafka-avro-serializer-5.0.0.jar:na]
	at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53) ~[kafka-avro-serializer-5.0.0.jar:na]
	at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65) ~[kafka-clients-1.0.2.jar:na]
	at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55) ~[kafka-clients-1.0.2.jar:na]
	at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:791) ~[kafka-clients-1.0.2.jar:na]
	at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:768) ~[kafka-clients-1.0.2.jar:na]
	at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:285) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
	at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:357) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
	at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:188) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
	at de.markush.kafka.schemaregistryexamples.producer.PingProducer.ping(PingProducer.java:29) ~[classes/:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_172]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_172]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_172]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_172]
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-5.0.8.RELEASE.jar:5.0.8.RELEASE]
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.0.8.RELEASE.jar:5.0.8.RELEASE]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_172]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_172]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_172]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_172]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_172]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_172]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]

SCHEMA$

{  
   "type":"record",
   "name":"Ping",
   "namespace":"de.markush.kafka",
   "fields":[  
      {  
         "name":"id",
         "type":{  
            "type":"string",
            "avro.java.string":"String"
         },
         "logicalType":"uuid"
      },
      {  
         "name":"created_at",
         "type":{  
            "type":"string",
            "avro.java.string":"String"
         },
         "logicalType":"iso8601Timestamp"
      }
   ]
}

it contains these avro.java.string properties.
(I still do not get why these are needed at all; because of them the schema is treated as not equal. Btw. consumers can handle this situation.)

Remark

I guess that #28 is related, but due to the Map<MD5, SchemaIdAndSubjects> schemaHashToGuid, AFAIK it is not enough to just change io.confluent.kafka.schemaregistry.avro.AvroUtils#parseSchema to a more or less schema-equal form, and especially not to Parsing Canonical Form for Schemas, since that lacks at a minimum the default properties (what about field order?). I already tried that, but I assume it is more of a concept change: storing canonical schemas in the registry from the start, not only changing the lookup.
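The mismatch is easy to demonstrate with plain Avro. The following self-contained sketch (my own code, not from the registry; a single id field stands in for the full Ping schema) shows that Schema.equals(), and therefore any hash computed over the full schema JSON, treats the two variants as different, while Parsing Canonical Form strips avro.java.string (but, as said above, the defaults too):

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;

public class SchemaCompareSketch {
    // Registry variant: plain "string" type
    static final String PLAIN =
        "{\"type\":\"record\",\"name\":\"Ping\",\"namespace\":\"de.markush.kafka\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";
    // SCHEMA$ variant: string type annotated with avro.java.string
    static final String ANNOTATED =
        "{\"type\":\"record\",\"name\":\"Ping\",\"namespace\":\"de.markush.kafka\","
      + "\"fields\":[{\"name\":\"id\",\"type\":"
      + "{\"type\":\"string\",\"avro.java.string\":\"String\"}}]}";

    static boolean schemasEqual() {
        // Schema.equals() compares custom properties, so the variants differ
        return new Schema.Parser().parse(PLAIN)
            .equals(new Schema.Parser().parse(ANNOTATED));
    }

    static boolean canonicalFormsEqual() {
        // Parsing Canonical Form drops all non-structural properties
        return SchemaNormalization.toParsingForm(new Schema.Parser().parse(PLAIN))
            .equals(SchemaNormalization.toParsingForm(new Schema.Parser().parse(ANNOTATED)));
    }

    public static void main(String[] args) {
        System.out.println(schemasEqual());        // false
        System.out.println(canonicalFormsEqual()); // true
    }
}
```

So a lookup keyed on the canonical form would match these two, but at the price of also ignoring default values.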

Can anyone from Confluent guide us towards a solution? Concerning the implementation, I would be happy to help.

Version

Confluent Open Source: 5.0.0

@markush81 changed the title "Schema Registry consideres avro.java.string as part of the schema comparison" to "Schema Registry considers avro.java.string as part of the schema comparison" Aug 17, 2018

@cricket007 commented Aug 23, 2018

I have one question: Why did you manually register the schema instead of letting the Serializer do it?

Also, can you post the rest of the stack trace and mention version numbers of the Registry, please?

@markush81 commented Aug 23, 2018

I edited my original post: added the complete stack trace and version.

To answer your question:

The setup is that there are multiple clients: Java, .NET, Ruby and even Node. So the process of defining schemas and maintaining the Schema Registry is done centralised (yet manually; we are in the process of writing some tooling for it) on the one hand and language-agnostic on the other. If each producer were allowed to register its own schemas, all consumers (example: Java is producing and all others are consuming) would need to be able to live with these language-specific properties.

@cricket007 commented Aug 23, 2018

Thanks! And yes, that makes sense. I come from an almost entirely Java producer environment, so this error is a new one to me, but it makes sense given that the MD5 computation is done on the raw schema string body, I believe, not the Parsing Canonical Form, as you mention.

@markush81 commented Aug 23, 2018

Yes, the key point is to find the right canonical form, because the one implemented by Avro has at least one big issue: it does not consider the default property, which is essential.

So something more like a language-agnostic form is needed, basically removing all custom properties.

Btw. do you know why avro.java.string is needed at all? AFAIK the consumer does not need it, and I have no clue why a producer should need it either.

@cricket007 commented Aug 23, 2018

Did you look at the PR mentioned in #28? It mentions that defaults are preserved.

The java.string property can be set to String or to CharSequence, the interface that String implements. That way the Avro schema can refer to any sibling of the String type, and it is generally only used by code generators, as far as I know.

@markush81 commented Aug 23, 2018

The PR did a custom implementation to preserve them, yes, but it seems to no longer work with 5.0.0.

Actually, the code generator does not need the property according to what I have experienced so far, but it does generate it, and that causes the issue. It would only make sense if the property is used by the serializer ... but until now I did not debug that deep; I will hopefully find time soon.

@cricket007 commented Aug 23, 2018

By code generator, I mean the Avro Maven Plugin's <stringType> tag.

https://github.com/apache/avro/blob/master/lang/java/maven-plugin/src/main/java/org/apache/avro/mojo/AbstractAvroMojo.java#L101-L106

That determines whether generated Java classes use String x or CharSequence x.
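For completeness, a typical configuration sketch for that tag (plugin version, phase and source paths are just examples, not from this thread):

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.8.2</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <!-- String: fields typed java.lang.String, and avro.java.string
             is embedded in SCHEMA$.
             CharSequence (the plugin default): fields typed CharSequence,
             no extra property in SCHEMA$. -->
        <stringType>String</stringType>
      </configuration>
    </execution>
  </executions>
</plugin>
```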

@markush81 commented Aug 23, 2018

Sure, I understand this part, but what I mean is: why does the generator add avro.java.string to the SCHEMA$ of each generated class? But I will find out.

@markush81 commented Aug 25, 2018

I believe no.

My process is the following:

  1. Take a schema without this property
  2. Generate code from this schema
  3. Look into the generated class and see the SCHEMA$ property

Only now does this property occur, so no later generator or compiler step needs it. Either the serializer needs this property when the generated class gets serialized, or I do not see the reason for it.

@markush81 commented Aug 28, 2018

I think I have now found out the purpose. It is used by the deserializer, which takes two things: the reader schema (the SCHEMA$ property, with avro.java.string) and the writer schema (from the Schema Registry, without avro.java.string), and somehow this works. So I question (at least for my setup) the need for it, but I am pretty sure there are reasons for it.

Anyhow, the original problem still exists: how to let the @confluentinc Schema Registry be language-agnostic and still use it in a situation where there is a Java client with generated code.
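That reader/writer split can be sketched with generic records (my own simplified example: a single id field, no generated classes, no registry involved). The record is written with the plain registry schema and read back with the annotated SCHEMA$ variant; Avro's schema resolution bridges the difference, which is presumably why consumers cope while the producer-side lookup fails:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ReaderWriterSketch {
    // Writer schema: the registry variant, no avro.java.string
    static final String WRITER_JSON =
        "{\"type\":\"record\",\"name\":\"Ping\",\"namespace\":\"de.markush.kafka\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";
    // Reader schema: the SCHEMA$ variant, with avro.java.string
    static final String READER_JSON =
        "{\"type\":\"record\",\"name\":\"Ping\",\"namespace\":\"de.markush.kafka\","
      + "\"fields\":[{\"name\":\"id\",\"type\":"
      + "{\"type\":\"string\",\"avro.java.string\":\"String\"}}]}";

    static String roundTrip(String id) throws IOException {
        Schema writer = new Schema.Parser().parse(WRITER_JSON);
        Schema reader = new Schema.Parser().parse(READER_JSON);

        // Serialize with the writer (registry) schema
        GenericRecord rec = new GenericData.Record(writer);
        rec.put("id", id);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
        enc.flush();

        // Deserialize with both schemas: Avro resolves writer -> reader
        BinaryDecoder dec = DecoderFactory.get()
            .binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded =
            new GenericDatumReader<GenericRecord>(writer, reader).read(null, dec);
        return decoded.get("id").toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("ping-1")); // ping-1
    }
}
```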

@rayokota added the enhancement label Nov 9, 2018

@chrisdoberman commented Dec 20, 2018

Yes, I am having the same issue. We have non-Java consumers. The Java producer, which uses the generated Avro class with the SCHEMA$ field, writes this, including the avro.java.string type, to the registry.

@mbieser commented Feb 22, 2019

+1
Having the Java libraries accept language-independent schemas would be helpful.

@plinioj commented Jun 18, 2019

+1

@cbsmith commented Jun 26, 2019

+1

@clande commented Jul 12, 2019

+1
