
[JGRP000029]java.net.ConnectException: Connection refused during che-server update on OpenShift #15176

Closed
skabashnyuk opened this issue Nov 13, 2019 · 3 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.
Milestone

Comments

@skabashnyuk
Contributor

Is your task related to a problem? Please describe.

I've noticed the following messages during an update of che-server on OpenShift:

05-Nov-2019 09:31:42.112 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 23493 ms
2019-11-05 09:32:13,732[9d9-dxn8r-30537]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-30537: failed sending message to che-6cddfc74fb-frc9j-42521 (87 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: VERIFY_SUSPECT: [VERIFY_SUSPECT: ARE_YOU_DEAD], TP: [cluster=WorkspaceStateCache]
2019-11-05 09:32:13,733[9d9-dxn8r-60511]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-60511: failed sending message to che-6cddfc74fb-frc9j-43945 (93 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: VERIFY_SUSPECT: [VERIFY_SUSPECT: ARE_YOU_DEAD], TP: [cluster=EclipseLinkCommandChannel]
2019-11-05 09:32:13,762[9d9-dxn8r-51684]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-51684: failed sending message to che-6cddfc74fb-frc9j-13389 (93 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: VERIFY_SUSPECT: [VERIFY_SUSPECT: ARE_YOU_DEAD], TP: [cluster=RemoteSubscriptionChannel]
2019-11-05 09:32:13,762[9d9-dxn8r-19609]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-19609: failed sending message to che-6cddfc74fb-frc9j-57195 (82 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: VERIFY_SUSPECT: [VERIFY_SUSPECT: ARE_YOU_DEAD], TP: [cluster=WorkspaceLocks]
2019-11-05 09:32:14,269[9d9-dxn8r-19609]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-19609: failed sending message to che-6cddfc74fb-frc9j-57195 (132 bytes): java.net.ConnectException: Connection refused (Connection refused), headers: CENTRAL_LOCK: [LockingHeader], TP: [cluster=WorkspaceLocks]
2019-11-05 09:32:15,240[9d9-dxn8r-51684]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-51684: failed sending message to che-6cddfc74fb-frc9j-13389 (78 bytes): java.net.SocketException: Socket closed, headers: FD: heartbeat, TP: [cluster=RemoteSubscriptionChannel]
2019-11-05 09:32:15,254[9d9-dxn8r-19609]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-19609: failed sending message to che-6cddfc74fb-frc9j-57195 (67 bytes): java.net.SocketException: Socket closed, headers: FD: heartbeat, TP: [cluster=WorkspaceLocks]
2019-11-05 09:32:15,267[9d9-dxn8r-30537]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-30537: failed sending message to 10.128.1.43:7800 (144 bytes): java.net.SocketException: Socket closed, headers: KUBE_PING: [GET_MBRS_REQ cluster=WorkspaceStateCache initial_discovery=false], TP: [cluster=WorkspaceStateCache]
2019-11-05 09:32:15,286[9d9-dxn8r-60511]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-60511: failed sending message to che-6cddfc74fb-frc9j-43945 (78 bytes): java.net.SocketException: Socket closed, headers: FD: heartbeat, TP: [cluster=EclipseLinkCommandChannel]
2019-11-05 09:32:17,244[9d9-dxn8r-51684]  [ERROR] [org.jgroups.protocols.TCP 99]        - JGRP000029: che-ff45f99d9-dxn8r-51684: failed sending message to 10.128.1.43:7800 (156 bytes): java.net.SocketTimeoutException: connect timed out, headers: KUBE_PING: [GET_MBRS_REQ cluster=RemoteSubscriptionChannel initial_discovery=false], TP: [cluster=RemoteSubscriptionChannel]

Describe the solution you'd like

Figure out what is going on and how critical it is.

Describe alternatives you've considered

n/a

Additional context

n/a

@skabashnyuk skabashnyuk added kind/task Internal things, technical debt, and to-do tasks to be performed. severity/P1 Has a major impact to usage or development of the system. team/platform labels Nov 13, 2019
@skabashnyuk skabashnyuk added this to the Backlog - Platform milestone Nov 13, 2019
@skabashnyuk
Contributor Author

@mshaposhnik can you comment?

@skabashnyuk skabashnyuk pinned this issue Nov 13, 2019
@skabashnyuk skabashnyuk unpinned this issue Nov 13, 2019
@mshaposhnik
Contributor

This is JGroups replication-related output.
It usually means that the KUBE_PING discovery protocol has found a container that meets our requirements for being a replication-enabled member (i.e. it is a Che server instance), and JGroups tries to establish a connection with it, but for some reason this fails (the container is unhealthy and cannot accept TCP connections, or it has already shut down and KUBE_PING has not yet rediscovered the change, etc.). This is usually not critical when it happens during rolling updates and the number of errors is small, but if such messages are repeated continuously for a long time, it indicates an infrastructure problem that requires further investigation.
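To make the mechanism concrete, here is a minimal, hypothetical sketch in Java (assuming JGroups 4.x plus the jgroups-kubernetes extension) of a TCP + KUBE_PING stack similar to the channels named in the log above (WorkspaceStateCache, etc.). This is not Che's actual configuration; the class names are standard JGroups ones, port 7800 is taken from the log, and KUBE_PING is assumed to pick up its namespace/labels from environment variables such as KUBERNETES_NAMESPACE.

```java
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;
import org.jgroups.protocols.FD;
import org.jgroups.protocols.TCP;
import org.jgroups.protocols.UNICAST3;
import org.jgroups.protocols.VERIFY_SUSPECT;
import org.jgroups.protocols.kubernetes.KUBE_PING;
import org.jgroups.protocols.pbcast.GMS;
import org.jgroups.protocols.pbcast.NAKACK2;
import org.jgroups.protocols.pbcast.STABLE;

public class KubePingStackSketch {

    public static void main(String[] args) throws Exception {
        // Transport: plain TCP on the port that appears in the log messages above.
        TCP tcp = new TCP();
        tcp.setBindPort(7800);

        // Discovery: KUBE_PING queries the Kubernetes/OpenShift API for pods that match
        // the configured namespace/labels and treats each matching pod as a potential
        // member. A pod that is still listed but not (yet, or no longer) accepting TCP
        // connections produces exactly the JGRP000029 "Connection refused" errors above.
        KUBE_PING discovery = new KUBE_PING();

        JChannel channel = new JChannel(
                tcp,
                discovery,
                new FD(),              // heartbeat failure detection ("FD: heartbeat" in the log)
                new VERIFY_SUSPECT(),  // double-checks suspected members ("ARE_YOU_DEAD" in the log)
                new NAKACK2(),         // reliable multicast
                new UNICAST3(),        // reliable unicast
                new STABLE(),          // message garbage collection
                new GMS());            // group membership

        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                // During a rolling update the view shrinks and then stabilizes again.
                System.out.println("Cluster view changed: " + view);
            }
        });

        // One of the cluster names from the log; Che creates several such channels.
        channel.connect("WorkspaceStateCache");
    }
}
```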

@skabashnyuk skabashnyuk added kind/bug Outline of a bug - must adhere to the bug report template. and removed kind/task Internal things, technical debt, and to-do tasks to be performed. labels Nov 14, 2019
@mshaposhnik
Contributor

I think we can close the issue since there is not much left to do: we have added a property that prevents connection errors while a JGroups cluster node is still starting and not yet able to accept connections. A small number of errors on node shutdown does not seem to be an issue either.
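As a small, purely illustrative aside (not part of Che), one way to confirm the errors were only transient is to check that, once the rolling update finishes, the JGroups view on each surviving node again contains one member per che-server replica. The helper below uses only the standard JChannel/View API:

```java
import org.jgroups.JChannel;
import org.jgroups.View;

// Hypothetical helper, not part of Che: checks that the cluster has re-formed
// after a rolling update, i.e. the JGRP000029 errors were only transient.
public final class ClusterViewCheck {

    public static boolean clusterReformed(JChannel channel, int expectedMembers) {
        View view = channel.getView(); // membership as seen by this node (null if not connected)
        return view != null && view.size() == expectedMembers;
    }
}
```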
