Low performance while using `latest` as schema.id #105
@agolovenko, which version are you using? In the past that used to be the case, but in the newest version, 3.1.1, it should be invoked only once per executor.
It is.
It is strange: this is supposed to be cached, but is not: https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/avro/sql/AvroDataToCatalyst.scala#L51-L55 Here's a typical trace:
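For context, the caching at the linked lines follows the transient-lazy-val pattern. A minimal illustration (not ABRiS code; `CachingDecoder` and `fetchSchema` are made-up names) of why this can still re-fetch in a streaming job: the cached value lives only as long as the enclosing expression instance, so if the expression is re-created (for example, when a streaming query is re-planned per micro-batch), the fetch runs again.

```scala
// Illustrative sketch, not ABRiS code: a @transient lazy val caches the
// schema once per instance. A new instance triggers a new fetch.
class CachingDecoder(fetchSchema: () => String) extends Serializable {
  @transient private lazy val readerSchema: String = fetchSchema()
  def decode(bytes: Array[Byte]): Int = { readerSchema; bytes.length }
}

// Usage: the fetch runs once per CachingDecoder instance, not once per call.
var fetches = 0
val decoder = new CachingDecoder(() => { fetches += 1; "{}" })
decoder.decode(Array[Byte](1))
decoder.decode(Array[Byte](1, 2))
// fetches is 1 here; a second instance would fetch again
```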
Maybe there are issues accessing Schema Registry? Also, are you using 3.1.1?
As I already mentioned, I use abris 3.1.1 with Spark 2.4.5 and Scala 2.11.12.
Tried with Spark
Just as an idea: could it all be for the reason that my app is in fact a structured streaming app? The code looks like this:

```scala
val schemaRegistryConfig = Map(
  SchemaManager.PARAM_SCHEMA_REGISTRY_URL -> "https://psrc-4kk0p.westeurope.azure.confluent.cloud",
  SchemaManager.PARAM_SCHEMA_REGISTRY_TOPIC -> "input2",
  SchemaManager.PARAM_VALUE_SCHEMA_NAMING_STRATEGY -> SchemaManager.SchemaStorageNamingStrategies.TOPIC_NAME,
  SchemaManager.PARAM_VALUE_SCHEMA_ID -> "latest", // "100009",
  "basic.auth.credentials.source" -> "USER_INFO",
  "schema.registry.basic.auth.user.info" -> "...",
  "auto.register.schemas" -> "false"
)

val upstream = consumeEventHub() // creates an upstream of messages

upstream
  .select(from_confluent_avro(col("body"), schemaRegistryConfig) as "value")
  .writeStream
  .option("checkpointLocation", "checkpoints")
  .format("console")
  .start()
  .awaitTermination()
```
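One possible workaround, hinted at by the commented-out "100009" in the config above: pinning a concrete schema id instead of "latest" avoids the repeated latest-version lookup, at the cost of manual updates when the schema evolves. A sketch, assuming the same `schemaRegistryConfig` and `SchemaManager` keys as the snippet above (this is a suggestion, not an ABRiS recommendation):

```scala
// Sketch only: same config as above, but with a pinned schema id.
// "100009" is the example id from the commented-out value; use your own.
val pinnedConfig = schemaRegistryConfig.updated(
  SchemaManager.PARAM_VALUE_SCHEMA_ID, "100009"
)
```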
Hi @agolovenko, sorry, I was on holidays. It shouldn't be the case, since the library was developed specifically for the structured API. I'll try to replicate your issue and come back to you ASAP.
Hi, I can confirm this is happening to us as well. We are seeing many calls to the Schema Registry API trying to get the latest version.
Hi @agolovenko and @algorri94, once again, thanks a lot for the help. We seem to have 2 situations here:

The former is definitely a "performance bug" and is something we can quickly address by retrieving the schema before the Catalyst expression is invoked, which would achieve the same performance as when the schema is provided as a plain JSON file.

The latter is a bit more involved. As you certainly know, Avro uses writer and reader schemas to provide evolution capabilities. We cannot assume a single writer schema for the whole execution, for compatibility reasons, so we rely on CachedSchemaRegistryClient to cache it. That client should be doing its job locally, i.e. it should not reach the Schema Registry back-end for a cached id, as we can see here.

@cerveada is currently off, but I'll ask him for a chat as soon as he's back so we can decide how to best address these questions. In the meantime, if you have ideas or would like to give a PR a try, please feel free. Cheers.
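The first fix described above (retrieving the schema before the Catalyst expression is invoked) can be sketched roughly like this; `resolveReaderSchema` and `fetchLatestSchemaJson` are hypothetical names standing in for the registry call, not the actual ABRiS API:

```scala
// Hypothetical sketch of the proposed fix: resolve "latest" once on the
// driver and pass the concrete schema JSON into the Catalyst expression,
// so executors never have to ask the registry which version is latest.
def resolveReaderSchema(
    configuredId: String,
    fetchLatestSchemaJson: () => String): String =
  if (configuredId == "latest") fetchLatestSchemaJson() // single round-trip
  else configuredId // a concrete id needs no latest-version lookup

// Usage: the registry is consulted exactly once, before planning.
var calls = 0
val schema = resolveReaderSchema("latest", () => { calls += 1; """{"type":"string"}""" })
```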
Thanks @felipemmelo! Here are my comments:

Not all the calls of CachedSchemaRegistryClient are cached, and the problem is that this caching model isn't the best fit for this library. You probably want to cache the result of this call, but not forever, only for some period of time...
Hi @agolovenko , my comments on your comments.
Anyway, thank you very much for coming back and we'll soon release an improvement for this. |
Hello, in the new version (
Could you test the new version and let us know if it works? |
Thanks guys! Great job! |
You are welcome. Since there seems to be no issue any more, I'm closing this ticket. Please open a new one if you have any problems. |
Looks like `SchemaLoader` uses an uncached call to get the latest version id: https://github.com/AbsaOSS/ABRiS/blob/master/src/main/scala/za/co/absa/abris/avro/schemas/SchemaLoader.scala#L103-L110
This happens quite often and results in a huge number of HTTP requests to the Schema Registry. This value could be cached for some time period, and the time period should be configurable.
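The configurable time-bounded cache proposed above could look roughly like this; `TtlCache` and its names are illustrative, not part of ABRiS:

```scala
// Illustrative sketch: cache the latest-version lookup for a configurable
// TTL, so repeated calls within the window skip the HTTP round-trip.
final class TtlCache[K, V](ttlMillis: Long, fetch: K => V) {
  private val entries = scala.collection.concurrent.TrieMap.empty[K, (V, Long)]

  def get(key: K): V = {
    val now = System.currentTimeMillis()
    entries.get(key) match {
      case Some((value, loadedAt)) if now - loadedAt < ttlMillis => value
      case _ =>
        val fresh = fetch(key) // expired or missing: hit the registry again
        entries.put(key, (fresh, now))
        fresh
    }
  }
}

// Usage: within the TTL window, only the first lookup invokes the fetch.
var hits = 0
val latestIds = new TtlCache[String, Int](60000, _ => { hits += 1; 100009 })
latestIds.get("input2-value")
latestIds.get("input2-value")
```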