Long boot process and crashes if OPCUA server is not reachable #5

Open
eddybl opened this issue Nov 15, 2022 · 2 comments

@eddybl

eddybl commented Nov 15, 2022

We use one IOC with multiple connections to several OPCUA servers.

Now it seems that one of these OPCUA servers is down. This appears to crash the IOC after a while and also makes startup very slow, because the IOC tries to connect for each individual channel of the unavailable server. As a result, the IOC init takes a couple of minutes while it is still working through the unavailable channels, and then the IOC crashes suddenly:

2022/11/15 16:32:18.709832 cbLow <PV_PREFIX_HIDDEN>:PowerMon:F10:Phase:L1 Record processing failed: Error monitoring node: BadInternalError
[2022-11-15 16:32:18.709 (UTC+0100)] info/client Connecting to endpoint opc.tcp://<PLC_URL_HIDDEN>:4840
[2022-11-15 16:32:18.709 (UTC+0100)] info/client SecurityPolicy not specified -> use default #None
[2022-11-15 16:32:18.710 (UTC+0100)] warn/securitypolicy Security policy None is used to create SecureChannel. Accepting all certificates
[2022-11-15 16:32:21.781 (UTC+0100)] warn/network        Connection to opc.tcp://<PLC_URL_HIDDEN>:4840 failed with error: No route to host
[2022-11-15 16:32:21.781 (UTC+0100)] error/client        Opening the TCP socket failed
[2022-11-15 16:32:21.781 (UTC+0100)] error/client        Couldn't connect the client to a TCP secure channel
2022/11/15 16:32:21.781605 non-EPICS_140462526551808 Could not connect to OPC UA server: BadConnectionClosed
[2022-11-15 16:32:21.781 (UTC+0100)] error/network       No connection to server.
[2022-11-15 16:32:21.781 (UTC+0100)] info/client Connecting to endpoint opc.tcp://<PLC_URL_HIDDEN>:4840
[2022-11-15 16:32:21.781 (UTC+0100)] info/client SecurityPolicy not specified -> use default #None
[2022-11-15 16:32:21.781 (UTC+0100)] warn/securitypolicy Security policy None is used to create SecureChannel. Accepting all certificates
2022/11/15 16:32:21.781700 cbLow <PV_PREFIX_HIDDEN>:PowerMon:F10:Phase:L2 Record processing failed: Error monitoring node: BadInternalError
[2022-11-15 16:32:24.857 (UTC+0100)] warn/network        Connection to opc.tcp://<PLC_URL_HIDDEN>:4840 failed with error: No route to host
[2022-11-15 16:32:24.857 (UTC+0100)] error/client        Opening the TCP socket failed
[2022-11-15 16:32:24.857 (UTC+0100)] error/client        Couldn't connect the client to a TCP secure channel
2022/11/15 16:32:24.857576 non-EPICS_140462526551808 Could not connect to OPC UA server: BadConnectionClosed
[2022-11-15 16:32:24.857 (UTC+0100)] error/network       No connection to server.
[2022-11-15 16:32:24.857 (UTC+0100)] info/client Connecting to endpoint opc.tcp://<PLC_URL_HIDDEN>:4840
[2022-11-15 16:32:24.857 (UTC+0100)] info/client SecurityPolicy not specified -> use default #None
[2022-11-15 16:32:24.857 (UTC+0100)] warn/securitypolicy Security policy None is used to create SecureChannel. Accepting all certificates
2022/11/15 16:32:24.857705 cbLow <PV_PREFIX_HIDDEN>:PowerMon:F10:Phase:L3 Record processing failed: Error monitoring node: BadInternalError

Both issues (slow init and crash) seem less than ideal. Is there something that could improve this situation?

@smarsching

The crashes definitely are a bug that needs to be addressed. Do you have any more details that might help with reproducing this problem (e.g. does it only happen when a server is unavailable or when there is more than one connection defined in the IOC)?

The startup takes so long because the code tries to reconnect whenever there is no working connection. We could implement a mechanism that blocks reconnection attempts for a certain time after a failed connection attempt. This would probably improve startup times in this scenario. The downside is that it might take longer for a connection to be reestablished after the cause of the problem has been resolved.
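The cooldown idea described in this comment could look roughly like the following sketch. It is written in Python for brevity and is not taken from the device support's code; every name in it (`ReconnectGate`, `connect_with_gate`, the `cooldown_seconds` parameter) is hypothetical and only illustrates the mechanism.

```python
import time


class ReconnectGate:
    """Blocks new connection attempts for a cooldown period after a failure.

    Hypothetical helper for illustration; not part of any existing API.
    """

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._blocked_until = None  # None means no recent failure

    def may_attempt(self):
        """Return True if a connection attempt is currently allowed."""
        return self._blocked_until is None or self._clock() >= self._blocked_until

    def record_failure(self):
        """Start (or restart) the cooldown after a failed attempt."""
        self._blocked_until = self._clock() + self._cooldown

    def record_success(self):
        """Clear the cooldown once a connection has been established."""
        self._blocked_until = None


def connect_with_gate(gate, connect):
    """Call connect() only if the gate allows it; return True on success.

    `connect` is any callable that raises ConnectionError on failure.
    """
    if not gate.may_attempt():
        # Fail fast instead of waiting for another TCP timeout.
        return False
    try:
        connect()
    except ConnectionError:
        gate.record_failure()
        return False
    gate.record_success()
    return True
```

With one gate per server connection, only the first record processed after a failure would pay the connection timeout; the remaining records fail immediately until the cooldown expires. The trade-off mentioned above remains: a server that comes back is noticed up to one cooldown period late.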

@eddybl

eddybl commented Nov 15, 2022

I added a branch, "test-opcua-issues", to the IOC for the power monitoring PLCs. It contains only the PLC which is currently offline.

It seems that without all the other channels the IOC init completes reasonably quickly. However, the IOC still tries (and fails) to connect for each individual channel, one after the other, at 3-4 seconds per channel, so right now it takes around 10 minutes to loop through all records. Once all records were processed, further errors show up as non-EPICS lines:

2022/11/15 23:13:05.273562 non-EPICS_139649138407168 Could not connect to OPC UA server: BadConnectionClosed

Without the additional channels it does not seem to crash, whereas with all the channels it does (as evidenced by the cmk e-mails related to the Power Monitoring PLC IOC).
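As a rough cross-check of the timing reported in this thread, the sketch below measures one failed attempt from the timestamps in the first log excerpt and works out how many sequential attempts the quoted "3-4 seconds per channel" and "around 10 minutes" would imply. The function names are hypothetical, and the resulting record count is an inference from those two figures, not a number stated in the issue.

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S.%f"):
    """Return the elapsed seconds between two same-day log timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def implied_attempts(total_seconds, seconds_per_attempt):
    """Number of sequential attempts that fit into the total duration."""
    return total_seconds / seconds_per_attempt


# "Connecting to endpoint" at 16:32:18.709, "No route to host" at 16:32:21.781
attempt_duration = seconds_between("16:32:18.709", "16:32:21.781")

# Around 10 minutes in total, at 3-4 seconds per attempt
fewest = implied_attempts(10 * 60, 4)  # 150 attempts at 4 s each
most = implied_attempts(10 * 60, 3)    # 200 attempts at 3 s each

print(f"one failed attempt took about {attempt_duration:.2f} s")
print(f"implied sequential attempts: {fewest:.0f} to {most:.0f}")
```

The measured gap of roughly 3 seconds per attempt matches the lower end of the reported range, which supports the reading that each record waits for its own connection timeout.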
