Crash upon connection from Ignition server #62

Open
eudoxos opened this issue Feb 13, 2024 · 6 comments

@eudoxos
Collaborator

eudoxos commented Feb 13, 2024

I am getting consistent crashes when an Ignition client connects to QUaServer. Line 775, where the crash is reported, is the following line (this is a slightly old version of QUaServer, from Dec 2021):

UA_Connection* connection = session->header.channel->connection;

Can someone shed light on this? The code is running on a site which is difficult to access (in Docker, so it is also not easy to debug interactively, add extra logging, etc.). It did not happen before the client upgraded their Ignition software (I would remember that).

This is the stack trace from the thread where the crash happens (other threads are in poll, syscall or sleep; and all crashes happen here):

19:14:13: Thread 1 (Thread 0x7fffeba841c0 (LWP 17) "mycode-main"):
19:14:13: #0  0x00005555558f1db2 in QUaServer::newSession(QUaServer*, UA_NodeId const*) (server=0x555556632c30, sessionId=0x555556250930) at /tmp/mycode/src/other/QUaServer/src/wrapper/quaserver.cpp:775
19:14:13: session = <optimized out>
19:14:13: clientDescription = {applicationUri = {length = 42, data = 0x5555576d5e20 "uri://127.0.0.1/Ignition%20OPC-UA%20Client"}, productUri = {length = 47, data = 0x5555576d5e60 "https://inductiveautomation.com/scada-software/"}, applicationName = {locale = {length = 2, data = 0x55555757dd70 "en"}, text = {length = 22, data = 0x5555576d6220 "Ignition OPC-UA Client"}}, applicationType = UA_APPLICATIONTYPE_CLIENT, gatewayServerUri = {length = 0, data = 0x0}, discoveryProfileUri = {length = 0, data = 0x0}, discoveryUrlsSize = 0, discoveryUrls = 0x0}
19:14:13: strApplicationUri = {d = 0x7fffc818a940}
19:14:13: strProductUri = {d = 0x555557567d10}
19:14:13: strApplicationName = {d = 0x555557552730}
19:14:13: strAddress = {d = 0x7ffff7df5ae0}
19:14:13: intPort = <optimized out>
19:14:13: connection = 0x0
19:14:13: sockFd = <optimized out>
19:14:13: address = {sa_family = 0, sa_data = "\000\000\000\000\000\000`\333\377\377\377\177\000"}
19:14:13: address_len = 16
19:14:13: remote_name = "\320\351JWUU\000\000\335\317_UUU\000\000P\327\377\377\377\177\000\000\002\307_UUU\000\000\234\000\000\000\000\000\000\000\340,cVUU\000\000\272\227\035\064\000\000\000\000~c\214UUU\000\000\060\t%VUU\000\000\340,cVUU\000\000\060,cVUU\000\000\060\t%VUU\000\000\000\000\000"
19:14:13: #1  0x00005555558f4064 in QUaServer::activateSession(UA_Server*, UA_AccessControl*, UA_EndpointDescription const*, UA_String const*, UA_NodeId const*, UA_ExtensionObject const*, void**) (server=<optimized out>, ac=<optimized out>, endpointDescription=<optimized out>, secureChannelRemoteCertificate=<optimized out>, sessionId=0x555556250930, userIdentityToken=<optimized out>, sessionContext=0x555556250928) at /tmp/mycode/src/other/QUaServer/src/wrapper/quaserver.cpp:687
19:14:13: token = <optimized out>
19:14:13: context = 0x5555574ae9d0
19:14:13: srv = 0x555556632c30
19:14:13: #2  0x00005555558afae2 in Service_ActivateSession ()
19:14:13: #3  0x00005555558bbbc9 in processMSG ()
19:14:13: #4  0x00005555558bc618 in processSecureChannelMessage ()
19:14:13: #5  0x00005555556064a9 in UA_SecureChannel_processBuffer ()
19:14:13: #6  0x00005555558b0452 in UA_Server_processBinaryMessage ()
19:14:13: #7  0x00005555558bd676 in ServerNetworkLayerTCP_listen ()
19:14:13: #8  0x00005555558af116 in UA_Server_run_iterate ()
19:14:13: #9  0x00005555558eea13 in operator() (__closure=0x555556dbd540) at /tmp/mycode/src/other/QUaServer/src/wrapper/quaserver.cpp:1639
19:14:13: this = 0x555556632c30
19:14:13: #10 QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, QUaServer::start()::<lambda()> >::call (arg=<optimized out>, f=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:146
19:14:13: #11 QtPrivate::Functor<QUaServer::start()::<lambda()>, 0>::call<QtPrivate::List<>, void> (arg=<optimized out>, f=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:256
19:14:13: #12 QtPrivate::QFunctorSlotObject<QUaServer::start()::<lambda()>, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void **, bool *) (which=<optimized out>, this_=0x555556dbd530, r=<optimized out>, a=<optimized out>, ret=<optimized out>) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:443
19:14:13: #13 0x00007ffff7d3841e in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
19:14:13: #14 0x00007ffff7d0ae07 in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
19:14:13: #15 0x00007ffff7d0df27 in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
19:14:13: #16 0x00007ffff7d64a67 in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
19:14:13: #17 0x00007ffff5dfdd3b in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
19:14:13: #18 0x00007ffff5e53258 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
19:14:13: #19 0x00007ffff5dfb3e3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
19:14:13: #20 0x00007ffff7d640b8 in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
19:14:13: #21 0x00007ffff7d0975b in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
19:14:13: #22 0x00007ffff7d11cf4 in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
19:14:13: #23 0x00005555555c3f87 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at /tmp/mycode/src/mycode-main-qt.cpp:266
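
The frame #0 locals show `connection = 0x0`, and the statement at line 775 dereferences `session` and `session->header.channel` without any check. Below is a minimal sketch of a defensive version, assuming the crash comes from one of those pointers being null. The trace alone does not show which one it is. The `Fake*` structs and `socketOfSession` are hypothetical stand-ins for illustration, not the real open62541 or QUaServer definitions.

```cpp
#include <cassert>

// Hypothetical stand-ins for the internals touched at quaserver.cpp:775.
// The real UA_Session / UA_SecureChannel / UA_Connection layouts differ.
struct FakeConnection { int sockfd; };
struct FakeChannel    { FakeConnection* connection; };
struct FakeHeader     { FakeChannel* channel; };
struct FakeSession    { FakeHeader header; };

// Walks session -> channel -> connection and checks every link.
// Returns the socket descriptor, or -1 if any pointer in the chain is null.
static int socketOfSession(const FakeSession* session)
{
    if (!session) {
        return -1;
    }
    const FakeChannel* channel = session->header.channel;
    if (!channel) {
        return -1;  // session exists but is not bound to a channel
    }
    const FakeConnection* connection = channel->connection;
    if (!connection) {
        return -1;  // channel exists but its connection is gone
    }
    return connection->sockfd;
}
```

With a guard like this, the caller can skip the peer address lookup and still register the session when the connection is missing, instead of dereferencing a null pointer.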
@juangburgos
Collaborator

It has been a long time, but looking at the code around it, this looks like a race condition when accessing the session object. I'd look into the open62541 library to see if there is some mutex that needs to be locked when calling activateSession (maybe I missed that at the time). Or, if you are not using any session-related features (keeping track of them, or user management), you can simply comment out the accessControl stuff in QUaServer::resetConfig:

	// static methods to reimplement custom behaviour
/*
	config->accessControl.activateSession           = &QUaServer::activateSession;
	config->accessControl.closeSession              = &QUaServer::closeSession;
	config->accessControl.getUserRightsMask         = &QUaServer::getUserRightsMask;
	config->accessControl.getUserAccessLevel        = &QUaServer::getUserAccessLevel;
	config->accessControl.getUserExecutable         = &QUaServer::getUserExecutable;
	config->accessControl.getUserExecutableOnObject = &QUaServer::getUserExecutableOnObject;
*/

You would have to check if this would break anything else, since I cannot remember the details anymore.

@juangburgos
Collaborator

You can also give https://github.com/juangburgos/QCrashHandler a try if you want to obtain a minidump that can shed some light on the issue, although I give the same support guarantees as for this library 😅 (welcome to open source).

@eudoxos
Collaborator Author

eudoxos commented Feb 14, 2024

Thank you for your reply. I have the gdb trace, which is enough (thanks for pointing me to QCrashHandler, though). I will try to see the open62541 code (though it is daunting to me). I have one observation; maybe it will make sense to you. The crash happens when there are many OPC/UA connections already (about 30, over 4 channels, according to the logs, monitoring some 800 items; sorry, I am not responsible for that...).

Right before the crash there is always a new connection (and, interestingly, another one closing shortly before or shortly after), and a new SecureChannel with a "revised lifetime of 600.00s" is opened.

Do you see any causality there, or a pointer on where to look? Does that confirm that the session needs to be locked?

19:18:26: <open62531>:1    [info] Connection 30 | New connection over TCP from 192.168.210.131
19:18:26: <open62531>:1    [info] Connection 29 | Closed
19:18:27: <open62531>:1    [info] Connection 30 | SecureChannel 5 | SecureChannel opened with SecurityPolicy http://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
19:18:27: Thread 1 "mycode-main" received signal SIGSEGV, Segmentation fault.

@juangburgos
Collaborator

> I will try to see the open62541 code (though it is daunting to me)....

It is daunting; actually, I never really went through it all, just parts of it. My recommendation: set up a local debug session and put a breakpoint at the place where the crash occurs in production. Once you are stopped at that point, inspect the open62541 functions that appear further up the debugger's stack.

In the end it is C code and, from what I remember, it is well written, with good variable names, and understandable. See if something stands out, such as a mutex that gets unlocked before calling the user-land callback. Find all references to the variables of interest and explore a little bit around them.

Sorry I cannot do this research for you, I am already retired from the industrial OT world due to low wages and bad working conditions. I moved to the IT side of things, grass is greener if you ever want to make the change 😜

@eudoxos
Collaborator Author

eudoxos commented Feb 14, 2024

The traceback is in the log above; I looked at Service_ActivateSession, but will have to look a bit more :) There are no UA_* functions in other threads; can there still be a race condition even if the UA server serves from a single thread?

19:14:13: #0  0x00005555558f1db2 in QUaServer::newSession(QUaServer*, UA_NodeId const*) (server=0x555556632c30, sessionId=0x555556250930) at /tmp/mycode/src/other/QUaServer/src/wrapper/quaserver.cpp:775
[...]
19:14:13: #1  0x00005555558f4064 in QUaServer::activateSession(UA_Server*, UA_AccessControl*, UA_EndpointDescription const*, UA_String const*, UA_NodeId const*, UA_ExtensionObject const*, void**) (server=<optimized out>, ac=<optimized out>, endpointDescription=<optimized out>, secureChannelRemoteCertificate=<optimized out>, sessionId=0x555556250930, userIdentityToken=<optimized out>, sessionContext=0x555556250928) at /tmp/mycode/src/other/QUaServer/src/wrapper/quaserver.cpp:687
[...]
19:14:13: #2  0x00005555558afae2 in Service_ActivateSession ()
19:14:13: #3  0x00005555558bbbc9 in processMSG ()
19:14:13: #4  0x00005555558bc618 in processSecureChannelMessage ()
19:14:13: #5  0x00005555556064a9 in UA_SecureChannel_processBuffer ()
19:14:13: #6  0x00005555558b0452 in UA_Server_processBinaryMessage ()
19:14:13: #7  0x00005555558bd676 in ServerNetworkLayerTCP_listen ()
19:14:13: #8  0x00005555558af116 in UA_Server_run_iterate ()
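
On the single-thread question: in general, an object can end up in an unexpected state without a second thread, simply because two events handled by the same loop touch it one after the other. The toy model below illustrates only that general point. It is not open62541 code, every name in it is hypothetical, and nothing in the trace confirms that this ordering is what happens here.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical model: a channel that may lose its underlying connection.
struct ToyChannel { bool hasConnection = true; };

// Hypothetical single-threaded event loop: events run strictly in FIFO order.
struct ToyLoop {
    std::queue<std::function<void()>> events;

    void post(std::function<void()> event) { events.push(std::move(event)); }

    void run() {
        while (!events.empty()) {
            std::function<void()> event = std::move(events.front());
            events.pop();
            event();
        }
    }
};

// Returns true if the "activate" event observed a channel whose connection
// had already been dropped by an earlier event in the same loop.
static bool activateSeesDetachedChannel()
{
    ToyLoop loop;
    ToyChannel channel;
    bool sawDetached = false;

    // Event 1: the transport under the channel is closed.
    loop.post([&channel] { channel.hasConnection = false; });
    // Event 2: an activation request that still refers to that channel.
    loop.post([&channel, &sawDetached] { sawDetached = !channel.hasConnection; });

    loop.run();
    return sawDetached;
}
```

If something like this were the cause, a mutex would not help, because no two threads are involved; the callback would need to check the state it depends on before using it.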

@juangburgos
Collaborator

I wanted to quickly build the latest master and take a look to see if I could spot something, but it seems master does not build with Qt5 anymore 😆 . Sorry amigo, I think you are on your own on this one, or maybe @sergey-kuzminov can help you out, he seems to be recently using the library.
