After upgrade to 8.2: Kafka events fail to post #7123
Comments
I think I understand the problem now. I enabled DEBUG logging and I see:
so the connection fails. Why?
It looks like the IPv6 firewall is blocking port 9092, and since 8.2 dCache prefers IPv6. I just made this hack:
After a restart I am getting events into Kafka...
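The hack itself is not shown in the thread. As a way to confirm the diagnosis, here is a minimal sketch that resolves a broker host name to all of its addresses and tries a plain TCP connection to each one, so an IPv6-only failure stands out. The class name `BrokerProbe`, the default host `localhost`, and the 3 second timeout are assumptions made for illustration; none of this is part of dCache.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of dCache: probes every address a broker
// host name resolves to, so an IPv6-only failure becomes visible.
class BrokerProbe {

    // Label an address by protocol family.
    static String family(InetAddress address) {
        return (address instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    // Try a plain TCP connect; true if the port accepts the connection in time.
    static boolean canConnect(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unreachable networks.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost"; // assumed broker host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        for (InetAddress address : InetAddress.getAllByName(host)) {
            System.out.printf("%s %s:%d -> %s%n", family(address),
                    address.getHostAddress(), port,
                    canConnect(address, port, 3000) ? "reachable" : "blocked or closed");
        }
    }
}
```

Run from a door or pool host with the broker name as the first argument. If the IPv4 address is reachable and the IPv6 address is not, that would be consistent with a firewall dropping IPv6 traffic to port 9092.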
mksahakyan added a commit that referenced this issue on Apr 26, 2023:

Motivation

After we upgraded to 8.2 we no longer are getting events into Kafka. We have 3 dCache instances. One 7.2 remaining still publishing to Kafka with no issue. (#7123).

The issue is that, according to spring-projects/spring-kafka#2251, the kafka-clients provide no hooks to determine that a send failed because the broker is down (spring-projects/spring-kafka#2250). This is still not fixed upstream, so it should be addressed here.

Modification

Change the LoggingProducerListener so that when a TimeoutException is caught, the error message indicates that there is a connection issue or the broker is down.

Result

The log looks like this:

24 Apr 2023 16:17:04 (pool_write) [] Producer failed to send the message, the broker is down or the connection was refused
24 Apr 2023 16:17:04 (pool_write) [] Producer failed to send the message, the broker is down or the connection was refused
24 Apr 2023 16:17:04 (NFS-localhost) [] Producer failed to send the message, the broker is down or the connection was refused

or

24 Apr 2023 17:27:51 (NFS-localhost) [door:NFS-localhost@dCacheDomain:AAX6FqLTdag pool_write DoorTransferFinished 0000C9CFA47686574B43B1EF9CF037A24780] Producer failed to send the message, the broker is down or the connection was refused
24 Apr 2023 17:27:51 (pool_write) [door:NFS-localhost@dCacheDomain:AAX6FqLTdag NFS-localhost PoolAcceptFile 0000C9CFA47686574B43B1EF9CF037A24780] Topic billing not present in metadata after 60000 ms.
24 Apr 2023 17:27:51 (NFS-localhost) [door:NFS-localhost@dCacheDomain:AAX6FqLTdag pool_write DoorTransferFinished 0000C9CFA47686574B43B1EF9CF037A24780] TEST Topic billing not present in metadata after 60000 ms.
class org.springframework.kafka.KafkaException
24 Apr 2023 17:28:51 (NFS-localhost) [door:NFS-localhost@dCacheDomain:AAX6FqZqubA pool_write DoorTransferFinished 00002B30ED198C494F25A31F589AB91F903F] Producer failed to send the message, the broker is down or the co

Target: master 8.2, 9.0
Require-book: no
Require-notes: yes
Patch: https://rb.dcache.org/r/13967/
Acked-by: Lea Morschel, Abert Rossi, Tigran Mkrtchyan
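The idea in the Modification paragraph can be sketched roughly as follows. This is not the actual dCache patch: the class name `SendFailureMessages` and the method `describe` are hypothetical, and `java.util.concurrent.TimeoutException` is used as a stand-in for Kafka's `org.apache.kafka.common.errors.TimeoutException` so that the example runs without Kafka or Spring on the classpath. Walking the cause chain is also an assumption, made because the log excerpt shows the failure surfacing as an `org.springframework.kafka.KafkaException`.

```java
import java.util.concurrent.TimeoutException;

// Sketch only: java.util.concurrent.TimeoutException stands in for Kafka's
// org.apache.kafka.common.errors.TimeoutException so this runs without Kafka.
class SendFailureMessages {

    // Message text taken from the log excerpt in the commit message.
    static final String BROKER_DOWN =
            "Producer failed to send the message, the broker is down or the connection was refused";

    // Walk the cause chain; a timeout anywhere in it is reported as a connection problem.
    static String describe(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException) {
                return BROKER_DOWN;
            }
        }
        // Any other failure keeps its own message.
        return "Producer failed to send the message: " + failure.getMessage();
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("send failed",
                new TimeoutException("Topic billing not present in metadata after 60000 ms."));
        System.out.println(describe(wrapped));
        System.out.println(describe(new IllegalStateException("serializer misconfigured")));
    }
}
```

In a real listener this mapping would presumably sit in the error callback of the producer listener, where the exception from the failed send is available.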
mksahakyan added a commit to mksahakyan/dcache that referenced this issue on May 6, 2023 (same commit message as the Apr 26, 2023 commit, with the issue referenced as dCache#7123).
lemora pushed a commit that referenced this issue on May 12, 2023 (same commit message as the Apr 26, 2023 commit).
After we upgraded to 8.2 we are no longer getting events into Kafka. We have 3 dCache instances. The one remaining 7.2 instance is still publishing to Kafka with no issue.
I have tried:
I have also tried commenting out all of the above and using the defaults, with the same results:
I must add that the decision to sprinkle Kafka interaction across services was not the best idea. To try different configs I have to restart doors and pools, which is unacceptable in production.
If instead we limited Kafka interactions to the billing cell, no transfers would be affected while debugging.
Also note that the version we used has a print format error that ends up dumping a metric ton of data into the log files, which are filling up the log partitions.
Another bit:
Just doing random restores one at a time on ITB, I see events going to Kafka. Production seems to report some of the events.
So it is not a 100% failure, but it mostly fails.
Why is this bad? We want to rely on storage events for data handling. Right now this is completely broken.
This is 8.2.13
I have yet to deploy the format fix that Tigran added recently.
Correction: No. I see no events from any 8.2 installation except for one domain, the NFS domain. That one successfully reports to Kafka. There are no config differences between the NFS domain and the other doors/pools with regard to Kafka.