Wrong handling of applications that are started multiple times in parallel #404
Comments
For the re-registration process of an application we have to distinguish the following cases.
Gracefully terminated means that the main thread goes out of scope (e.g. by catching Ctrl-C with a signal handler that terminates an endless loop). Hard termination means that Ctrl-C is not caught, an assertion fails, etc. The difference is that our runtime d'tor is only executed in case of a graceful termination.
Current behavior is:
1. a gracefully terminated application re-registers with RouDi w/ active monitoring
Proposed future behavior is:
The following changes are proposed:
1. a gracefully terminated application re-registers with RouDi w/ active monitoring
@sculpordwarf @dkroenke @mossmaurice. What do you think? |
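The distinction between graceful and hard termination can be illustrated with a minimal sketch. It assumes a POSIX system; `RuntimeGuard`, `runUntilShutdown` and the other names are hypothetical stand-ins and not part of the iceoryx API. The handler only sets a flag, the loop ends, and the function returns normally so that the destructor runs:

```cpp
#include <chrono>
#include <csignal>
#include <signal.h>
#include <thread>

// Set from the signal handler; a handler may only do async-signal-safe work.
static volatile std::sig_atomic_t g_shutdownRequested = 0;

static void onSigint(int /*signum*/) { g_shutdownRequested = 1; }

// Installs the Ctrl-C handler; returns true on success.
bool installSigintHandler() {
    struct sigaction action {};
    action.sa_handler = onSigint;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGINT, &action, nullptr) == 0;
}

bool shutdownRequested() { return g_shutdownRequested != 0; }

// Hypothetical stand-in for a runtime object whose d'tor deregisters the app.
struct RuntimeGuard {
    explicit RuntimeGuard(bool& deregistered) : m_deregistered(deregistered) {}
    ~RuntimeGuard() { m_deregistered = true; } // only reached on graceful exit
    bool& m_deregistered;
};

// Loops until Ctrl-C was observed, then returns normally so the guard's
// destructor is executed. Without the handler, SIGINT would end the process
// immediately and the destructor would never run.
void runUntilShutdown(bool& deregistered) {
    RuntimeGuard guard(deregistered);
    while (!shutdownRequested()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}
```

With a hard termination (uncaught signal, failed assertion) the process ends inside the loop, so the daemon never receives the deregistration and has to detect the stale entry by other means.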
@budrus Proposal makes sense to me. I'll take up this issue and work on it in the coming weeks. I plan the following pull requests (in no particular order):
|
@budrus From my point of view:
|
I think you can send the kill to the process with signal 0, i.e. without actually delivering a signal. See here
If there is no monitoring and the PID was already reused, we cannot differentiate 5 and 6.
I'm wondering if you describe a case where an application registers with a PID that already belongs to a registered application. This is an additional scenario; here we could remove the application that is in the list, as there are no two processes with the same PID. But this is not the normal case of 5 and 6, where we would have the same process names but different PIDs.
The attack you describe is a problem with the current message queue design and must be fixed when switching to UDS and message passing. There we can get the information about which user / application registers from the OS. A user-provided PID is a bad idea for sure |
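The liveness probe mentioned in this comment can be sketched with the POSIX `kill()` call and signal 0, which performs the existence and permission checks without delivering a signal. `ProcessState` and `probeProcess` are hypothetical names chosen for illustration:

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h> // getpid(), used in the usage note below

enum class ProcessState { Alive, Gone, Unknown };

// Probes whether a process with the given PID currently exists.
// Note: this cannot tell whether the PID was reused by an unrelated process.
ProcessState probeProcess(pid_t pid) {
    if (pid <= 0) {
        // 0 and negative values address process groups, not a single process.
        return ProcessState::Unknown;
    }
    if (kill(pid, 0) == 0) {
        return ProcessState::Alive;
    }
    if (errno == ESRCH) {
        return ProcessState::Gone; // no such process
    }
    if (errno == EPERM) {
        return ProcessState::Alive; // exists, but we may not signal it
    }
    return ProcessState::Unknown;
}
```

For example, `probeProcess(getpid())` yields `ProcessState::Alive`. As noted in the comment, a positive result only proves that some process owns the PID, so after PID reuse the probe alone cannot separate cases 5 and 6.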
…neParser Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
… already registered at RouDi Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
After a chat with @sculpordwarf and @elBoberido we decided to switch to UNIX domain sockets first (#381), before addressing this bug report. The behaviour is implementation specific, so there is no need to fix the message queue behaviour, as message queues will be deprecated before v1.0.
Current behaviour example (3 & 6):
Ideas to allow more than one app with the same name:
|
This is the tricky part. Ideally the cleanup is not done by deleting and recreating the IPC channel, since this opens a window for a race: a third start of the application could notice that there is no IPC channel and try to open one. I would suggest the following sequence:
|
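One way to avoid the delete-and-recreate window described above is an advisory lock that exists independently of the IPC channel and is held for the lifetime of the process. The following is a sketch using `flock(2)` under stated assumptions, not the sequence proposed in the comment and not the iceoryx implementation; the `/tmp` lock path and the `InstanceLock` class are hypothetical:

```cpp
#include <cassert>
#include <fcntl.h>
#include <optional>
#include <string>
#include <sys/file.h>
#include <unistd.h>
#include <utility>

// Holds an exclusive advisory lock for one runtime name while it is alive.
class InstanceLock {
  public:
    // Returns a lock if no other holder exists, std::nullopt otherwise.
    static std::optional<InstanceLock> tryAcquire(const std::string& runtimeName) {
        const std::string path = "/tmp/" + runtimeName + ".lock"; // assumed location
        const int fd = open(path.c_str(), O_CREAT | O_RDWR | O_CLOEXEC, 0600);
        if (fd < 0) {
            return std::nullopt;
        }
        // LOCK_NB: fail immediately instead of blocking if the lock is taken.
        if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
            close(fd);
            return std::nullopt;
        }
        return InstanceLock(fd);
    }

    InstanceLock(InstanceLock&& other) noexcept : m_fd(std::exchange(other.m_fd, -1)) {}
    InstanceLock& operator=(InstanceLock&& other) noexcept {
        if (this != &other) {
            release();
            m_fd = std::exchange(other.m_fd, -1);
        }
        return *this;
    }
    InstanceLock(const InstanceLock&) = delete;
    InstanceLock& operator=(const InstanceLock&) = delete;
    ~InstanceLock() { release(); }

  private:
    explicit InstanceLock(int fd) : m_fd(fd) { assert(fd >= 0); }

    // Closing the descriptor drops the lock; the kernel also does this when
    // the process dies, so a hard termination leaves no stale lock behind.
    void release() {
        if (m_fd >= 0) {
            close(m_fd);
            m_fd = -1;
        }
    }

    int m_fd;
};
```

The lock file itself is never deleted, so a second or third start can only ever fail to take the lock; it never sees a missing resource and tries to create it.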
…#404-fix-verbose-output
…st cases Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…o_moving_other_way_round' into iox-eclipse-iceoryx#404-extend-timeout-registration-info
…ess.hpp and process_manager.hpp Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…n destruction Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…tion posix::FileLock Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…#404-extend-timeout-registration-info
…rocess c'tor const and check if callable contains value Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…r names Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…rocesses
* Add forwarding c'tor to smart_lock
* Remove sendMessageToRouDi and use sendRequestToRoudi
* Wrap ProcessManager in smart_lock and remove internal mutex
Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…#404-extend-timeout-registration-info_new
…IpcInterfaceCreator Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
… segfault and using EXPECT_DEATH Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
* Initialising member variables
* Reword doxygen comments
* Reword method names of `ProcessManager`
* Use std::unique_ptr in ProcessManager test
Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…ce without monitoring, const-correctness and make conversion of pid explicit Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
Signed-off-by: Dietrich Krönke <dietrich.kroenke@apex.ai>
…_test Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…#404-extend-timeout-registration-info_new3
* Use LogFatal instead of LogError
* Fix return values in registerProcess()
* Add possibility to omit ACK IpcMessage when removing process
Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
…meout-registration-info Iox eclipse-iceoryx#404 Extend timeout registration info and send termination request on destruction
Required information
Operating system:
Gentoo Linux
Compiler version:
GCC 9.3.0
Observed result or behaviour:
When a second instance of an application is started with the same Runtime name, it is handled incorrectly
With Monitoring: The second application tries to register again and again, as there is no response, until it gets a timeout that says there is no RouDi. RouDi ignores the registration because this app is already registered, but does not send an error response
Without Monitoring: The second application is accepted. For the first one the shared memory resources are removed. But the first application is still running and on a nice trip to undefined behavior
Expected result or behaviour:
RouDi should prevent that multiple instances of applications with the same runtime name are running in parallel and also instantly report a clear error
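The expected behaviour can be modelled with a small registry that answers every registration request immediately. This is a simplified sketch, not RouDi's actual code; `ProcessRegistry`, `tryRegister` and `RegistrationResult` are hypothetical names, and the liveness check is injected so that the policy can be exercised without real processes:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <sys/types.h>
#include <utility>

enum class RegistrationResult { Accepted, ReplacedStaleEntry, RejectedAlreadyRunning };

class ProcessRegistry {
  public:
    using LivenessCheck = std::function<bool(pid_t)>;

    explicit ProcessRegistry(LivenessCheck isAlive) : m_isAlive(std::move(isAlive)) {
        assert(m_isAlive && "a liveness check must be provided");
    }

    // Every path returns a result the daemon can send back to the requester,
    // so a rejected application gets a clear error instead of a timeout.
    RegistrationResult tryRegister(const std::string& runtimeName, pid_t pid) {
        auto entry = m_processes.find(runtimeName);
        if (entry == m_processes.end()) {
            m_processes.emplace(runtimeName, pid);
            return RegistrationResult::Accepted;
        }
        if (m_isAlive(entry->second)) {
            // The first instance keeps its resources; the newcomer is refused.
            return RegistrationResult::RejectedAlreadyRunning;
        }
        // The previous owner is gone (e.g. hard termination): take over the name.
        entry->second = pid;
        return RegistrationResult::ReplacedStaleEntry;
    }

  private:
    LivenessCheck m_isAlive;
    std::map<std::string, pid_t> m_processes;
};
```

In this model the resources of a running first instance are never removed on behalf of a second one, which is the undefined-behaviour path described under "Without Monitoring".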
Conditions where it occurred / Performed steps:
Start RouDi and the first instance of the application. Then start another instance of the application
Bugfix steps