Writing to Kafka topic #2
Could you give us some more details on the exceptions and logs? Also, if possible (in case it is small enough), would you send us the Avro file you're using, along with the schema? Finally, would you also let us know what results you get when running it?
felip, I corrected the code; there is only one DataFrame, df.
I ran ./kafka-avro-console-consumer --topic exampletopic --zookeeper ip:2181 --from-beginning prior to running the producer code, and at that point there was no schema registered. My understanding is that this is fine, and that the schema will be registered once you produce a message. No exceptions are thrown from the REPL.
if I am using Confluent Kafka (and Schema Registry)
It seems that this is the issue behind the exception: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 1214396213. Would you be able to try the examples as explained here: https://github.com/AbsaOSS/ABRiS? You'll see snippets showing how to use Spark to read from Kafka, so that you can check the results, as well as examples of how to use the library to perform the schema registration.
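An implausibly large id like the one in that exception is consistent with a consumer decoding a plain (unframed) Avro payload as if it carried the Confluent wire-format header (a 0x00 magic byte followed by a 4-byte big-endian schema id). A minimal sketch of that decoding, assuming standard Confluent framing; the object and byte values here are illustrative, not from the thread:

```scala
import java.nio.ByteBuffer

object WireFormatDemo {
  // Confluent wire format: 1 magic byte (0x00) + 4-byte big-endian schema id + Avro body.
  // If the payload was NOT written with this framing, bytes 1..4 are just
  // Avro data, and interpreting them as a schema id yields a nonsense value.
  def readSchemaId(payload: Array[Byte]): Int =
    ByteBuffer.wrap(payload, 1, 4).getInt

  def main(args: Array[String]): Unit = {
    // Arbitrary unframed bytes, standing in for a raw Avro record.
    val unframed = Array[Byte](0x48, 0x65, 0x6c, 0x6c, 0x6f)
    println(readSchemaId(unframed)) // a large, meaningless "schema id"
  }
}
```

This is why the console consumer asks the registry for an id that was never registered.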
I deleted the topic, recreated it, and used the command line to produce a few test messages to the same topic. I again tried the same code.
Is this exception being generated by the program or the command line?
By the command line. I went through and verified the topic was created. I also generated a JSON file with the Avro file's schema and used that instead of schema inference. The first attempt seems to have an issue talking to Kafka, but after restarting the Kafka cluster it runs normally. When I try to use Confluent's command-line consumer tool with the library, that's the exception thrown. I have tried using the command-line producer, and it consumes correctly. I have reread your examples. Can you point me to a simple example that loads an Avro file and sends it to a Kafka topic? I would love to use your library.
Sure thing. Here is an example of how to write a DataFrame into Kafka: https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/examples/SampleKafkaDataframeWriterApp.scala Here you'll find an example of how to read it: https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/examples/SampleKafkaAvroFilterApp.scala Since Confluent enhances the payload with the schema version, there is a separate API for that, which you'll also find in the library.
https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/examples/SampleKafkaDataframeWriterApp.scala
So I have it registering with the Schema Registry and writing that same schema out to a file, then using it. The same error is present.
This is probably related to the fact that Confluent adds the schema id to the payload, and the command-line consumer expects it. The library does not add this id, since it is a Confluent-specific construct. Would you please try reading from Kafka using Spark, as in the example, just to be sure? If this is the case, I can add a
OK, I will. Just to point out: when I run
This is a Schema Registry-specific question, but the expected result is actually "schema not found". The GET API always expects you to inform the version. Check the spec for the GET method: https://docs.confluent.io/current/schema-registry/docs/api.html Also, please take a look at my previous comment.
Yes, I am able to read from the topic, so it appears to be a Schema Registry issue.
When you say that Confluent adds a schema id to the payload, do you mean the schema id that's sent to the Schema Registry, or are you saying that the schema id is added to the Kafka message itself?
I get the same error when I use .fromConfluentAvro as in the example. I do not get this error when I use the command-line producer and .fromConfluentAvro.
How can I verify that the id is being sent with the Kafka message?
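One Spark-independent way to check is to read the raw value bytes of a record and inspect the header: a Confluent-framed record starts with the 0x00 magic byte followed by the 4-byte big-endian schema id, while a record written without framing fails that check. A sketch assuming that framing; the object name and sample payloads are illustrative:

```scala
import java.nio.ByteBuffer

object FramingCheck {
  /** Returns Some(schemaId) if the payload looks Confluent-framed, None otherwise. */
  def confluentSchemaId(payload: Array[Byte]): Option[Int] =
    if (payload.length >= 5 && payload(0) == 0x00)
      Some(ByteBuffer.wrap(payload, 1, 4).getInt)
    else
      None

  def main(args: Array[String]): Unit = {
    // Magic byte 0x00, then schema id 42 big-endian, then the Avro body.
    val framed   = Array[Byte](0x00, 0x00, 0x00, 0x00, 0x2a) ++ "avro-body".getBytes("UTF-8")
    // The same body with no framing header at all.
    val unframed = "avro-body".getBytes("UTF-8")
    println(confluentSchemaId(framed))   // framed: id 42 recovered
    println(confluentSchemaId(unframed)) // unframed: no header found
  }
}
```

If the check returns None for messages your producer wrote, the id was never added, which would explain why Confluent's console consumer cannot look the schema up.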
In the context of the library, there is a
I get the same error when it tries to retrieve the schema.
"it" what? Also, have you been able to run the examples just changing the parameters? Finally, can you share the Avro record you're trying to send, along with the schema? |
I removed the unused imports.
I have tested .toAvro when writing to a Kafka topic. I believe .toAvro is serializing, but when the action portion runs, Spark just hangs and the job is never completed. Kafka never sees the message.
import org.apache.spark.sql.SparkSession
import za.co.absa.abris.avro.AvroSerDe._

object oppStreamingMatching {
  def main(args: Array[String]): Unit = {
  }
}