Wrong seqNo is set when reading from eventhubs #462
Comments
Any update on this? I have also been seeing a similar error, even after ensuring that a dedicated consumer group is created specifically for the readers. The eventhub in question has 2 partitions and there are 2 worker nodes (databricks) part of the same consumer group trying to read from the event-hub. Any help would be much appreciated. |
Facing the same issue. Any update on this? As mentioned by others, I do have a dedicated consumer group and the notebook still throws this exception. |
I am facing the same issue. This is a pretty critical bug that makes it near impossible to reliably productionize any pipeline that uses this connector. Please prioritize a fix or workaround to this. |
Could you please tell us the version of the connector that you are using? Are you using the latest version (2.3.14.1) and still seeing this issue? |
I was currently using |
Please let us know if you face the error using 2.3.14.1. In that case, please let me know about your set up and send me the logs. |
When using this setup with Databricks, I noticed that this usually happened when auto-scaling scaled the worker nodes up or down, and also during development, when we were sending data to the eventhub only sporadically. In the proper staging/prod environment, I decided to have a fixed set of workers, 1 per consumer group/partition combo, and things seem to be stable for now. For development situations, setting the eventhub enqueued time to now() seems to be a stable option. Also, on Databricks especially, the checkpoint eventually gets corrupted and needs to be cleared, or the job needs to be restarted, for it to recover. Hope this helps someone. |
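For anyone wanting to try the "enqueued time = now()" workaround from a PySpark notebook, here is a minimal sketch of how the starting position could be built. It assumes the option keys and JSON layout from the connector's PySpark documentation (`eventhubs.connectionString`, `eventhubs.consumerGroup`, `eventhubs.startingPosition`); the helper names `starting_position_now` and `build_eventhubs_conf` are hypothetical, and the connection string and consumer group are placeholders. Please check it against the connector version you run.

```python
import json
from datetime import datetime, timezone


def starting_position_now(now=None):
    """Build a starting position keyed on enqueued time (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return {
        "offset": None,   # not positioning by offset
        "seqNo": -1,      # not positioning by sequence number
        "enqueuedTime": now.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "isInclusive": True,
    }


def build_eventhubs_conf(connection_string, consumer_group, now=None):
    """Assemble the options dict for the reader (hypothetical helper).

    Assumption: newer connector versions may expect the connection string
    to be encrypted before it is passed in; that step is not shown here.
    """
    return {
        "eventhubs.connectionString": connection_string,
        "eventhubs.consumerGroup": consumer_group,
        "eventhubs.startingPosition": json.dumps(starting_position_now(now)),
    }
```

In a notebook, the resulting dict would typically be passed along the lines of `spark.readStream.format("eventhubs").options(**conf).load()`. As far as I understand, the starting position only takes effect when the query starts without an existing checkpoint, so this is mainly useful for development runs with a fresh checkpoint folder.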
Thanks for sharing the info, @agilityhawk. |
@agilityhawk Did you experience this behavior even in the newest version (2.3.14.1)? |
@k4jiang - We're still using com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13. |
After upgrading to 2.3.14.1 on 2020-03-10 until now (2020-03-13), we have not encountered the bug. The bug happened maybe once/week so let me sit on it for another week before jumping to conclusions. |
I'm closing the issue. If you see the error again using 2.3.14.1 please reopen. |
@nyaghma We are seeing this issue even on 2.3.14.1.
|
Bug Report:
I am using the azure-event-hubs-spark connector to read data from eventhubs and write to an elasticsearch cluster, and I got the following errors. After hitting this error, the streaming job cannot resume, as it always fails with the same error message. I have to change the checkpoint folder to mitigate the issue, but this causes data loss because the offset is reset to head.
got the following errors
Expected behavior
The offset is set up correctly.
Spark version
2.4.3
spark-eventhubs artifactId and version
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13