The 'upstream connect error or disconnect/reset before headers. reset reason: connection failure' error when using collector in OpenShift infrastructure #5091
Unanswered · art-iva-cente asked this question in Q&A
Replies: 1 comment 1 reply
-
Doesn't sound like a Jaeger issue to me; I bet if you placed an OTEL Collector between your SDK and Jaeger you'd get the same behavior. My guess is that it has something to do with how networking or service-mesh routing works, e.g. the mesh may not detect a downed server in time.
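The suggestion above — routing traces through an OTEL Collector on the way to Jaeger — can be sketched with a minimal collector configuration. This is an illustrative fragment, not from the original post; the Jaeger endpoint address is an assumption:

```yaml
# Minimal sketch: OTLP in from the SDK, OTLP out to Jaeger.
# Jaeger 1.48 accepts OTLP natively on port 4317.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    # Hypothetical Jaeger collector address; replace with your service name.
    endpoint: jaeger-collector-headless.qa-app-monitoring.svc:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```

If the same reconnect failure reproduces with this setup, that would point away from Jaeger and toward the network path.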
-
We are running a Jaeger collector (1.48.1) in OpenShift and send telemetry from our Java Spring applications using the OpenTelemetry javaagent, version 1.28.0. We use OpenShift service endpoints such as http://jaeger-collector-headless.qa-app-monitoring.svc:4317. The issue is that when the collector pod restarts, the javaagent cannot reconnect; instead, it continuously reports the "upstream connect error or disconnect/reset before headers. reset reason: connection failure" error from the title.
What could be the cause of this error, in terms of how it maps to networking/infrastructure problems?
If I simulate the situation locally, i.e. run the local jaeger-all-in-one container, stop it, and restart it, the behavior is different: while the container is down the javaagent reports "Failed to connect to localhost/0:0:0:0:0:0:0:1:4317", and it restores the connection successfully as soon as I restart the container.
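The local simulation described above can be reproduced with something like the following; the image tag and exposed port are assumptions based on the versions mentioned in the question:

```shell
# Start a local Jaeger all-in-one with the OTLP gRPC port exposed
docker run -d --name jaeger -p 4317:4317 jaegertracing/all-in-one:1.48

# Stop it to provoke the "Failed to connect to localhost/...:4317" error
docker stop jaeger

# Restart it; the javaagent re-establishes the connection on its own
docker start jaeger
```

Locally the agent talks to the container directly, with no proxy in between, which may be why the failure mode differs from the in-cluster one.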
What is different about the "upstream connect error or disconnect/reset before headers. reset reason: connection failure" error compared to the "Failed to connect to localhost/0:0:0:0:0:0:0:1:4317" error?
PS: We pass the following settings to the javaagent: OTEL_TRACES_EXPORTER=otlp, OTEL_METRICS_EXPORTER=none, OTEL_EXPORTER_OTLP_ENDPOINT=http://our-openshift.svc:4317, OTEL_EXPORTER_OTLP_PROTOCOL=grpc
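The settings above map to a launch command along these lines; the agent jar path and application jar name are placeholders, not taken from the original post:

```shell
# Hypothetical launch command; only the OTEL_* values come from the question
OTEL_TRACES_EXPORTER=otlp \
OTEL_METRICS_EXPORTER=none \
OTEL_EXPORTER_OTLP_ENDPOINT=http://our-openshift.svc:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
java -javaagent:/path/to/opentelemetry-javaagent.jar -jar app.jar
```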