Crashing with iceoryx shared memory enabled #1026

Closed
jwrl7 opened this issue Nov 10, 2021 · 5 comments

jwrl7 commented Nov 10, 2021

On several occasions I have seen iceoryx crash the system due to what look like memory access or out-of-memory errors.
In the latest test I doubled the default segment.mempool size from 16448 to 32896 and ran into the error below. The log is from after the system had been running for about 6 hours.

The system is an Nvidia Xavier running JP 4.4.1 and ROS 2 Galactic, using the default configuration:

```xml
<SharedMemory>
    <Enable>true</Enable>
    <SubQueueCapacity>256</SubQueueCapacity>
    <SubHistoryRequest>16</SubHistoryRequest>
    <PubHistoryCapacity>16</PubHistoryCapacity>
    <LogLevel>info</LogLevel>
</SharedMemory>
```

First, a basic question: what is the best way to change the default parameters based on our system? Would different parameters help in these cases?

```
[mav_external_node-6] 2021-11-10 16:56:51.211 [Warning]: ICEORYX error! MEPOO__MEMPOOL_GETCHUNK_POOL_IS_RUNNING_OUT_OF_CHUNKS
[ghost_connector_can-11] Mempool [m_chunkSize = 32936, numberOfChunks = 32768, used_chunks = 32768 ] has no more space left
[ghost_connector_can-11] 2021-11-10 16:56:51.211 [ Error ]: MemoryManager: unable to acquire a chunk with a chunk-payload size of 64
The following mempools are available: MemPool [ ChunkSize = 32936, ChunkPayloadSize = 32896, ChunkCount = 32768 ]
[ghost_connector_can-11] 2021-11-10 16:56:51.211 [Warning]: ICEORYX error! MEPOO__MEMPOOL_GETCHUNK_POOL_IS_RUNNING_OUT_OF_CHUNKS
[component_container-1] Mempool [m_chunkSize = 32936, numberOfChunks = 32768, used_chunks = 32768 ] has no more space left
[component_container-1] 2021-11-10 16:56:51.211 [ Error ]: MemoryManager: unable to acquire a chunk with a chunk-payload size of 64
The following mempools are available: MemPool [ ChunkSize = 32936, ChunkPayloadSize = 32896, ChunkCount = 32768 ]
[component_container-1] 2021-11-10 16:56:51.211 [Warning]: ICEORYX error! MEPOO__MEMPOOL_GETCHUNK_POOL_IS_RUNNING_OUT_OF_CHUNKS
[ros2_mscl_node-2] 2021-11-10 16:56:51.209 [ Error ]: MemoryManager: unable to acquire a chunk with a chunk-payload size of 64
The following mempools are available: MemPool [ ChunkSize = 32936, ChunkPayloadSize = 32896, ChunkCount = 32768 ]
[ros2_mscl_node-2] 2021-11-10 16:56:51.212 [Warning]: ICEORYX error! MEPOO__MEMPOOL_GETCHUNK_POOL_IS_RUNNING_OUT_OF_CHUNKS
[mav_external_node-6] Mempool [m_chunkSize = 32936, numberOfChunks = 32768, used_chunks = 32768 ] has no more space left
[mav_external_node-6] 2021-11-10 16:56:51.212 [ Error ]: MemoryManager: unable to acquire a chunk with a chunk-payload size of 64
The following mempools are available: MemPool [ ChunkSize = 32936, ChunkPayloadSize = 32896, ChunkCount = 32768 ]
[mav_external_node-6] 2021-11-10 16:56:51.212 [Warning]: ICEORYX error! MEPOO__MEMPOOL_GETCHUNK_POOL_IS_RUNNING_OUT_OF_CHUNKS
[ghost_connector_can-11] Mempool [m_chunkSize = 32936, numberOfChunks = 32768, used_chunks = 32768 ] has no more space left
```

Here are some other examples of the system crashing when the segment.mempool size = 16448. To replicate, I just let the system run and eventually this happens.

```
ghost@nvidia-desktop:~$ ros2 topic echo /gx5/nav/odom
1636141734.074318 [217] ros2: using network interface eth0 (udp/192.168.168.105) selected arbitrarily from: eth0, eth1, docker0
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/shared_memory_object/shared_memory.cpp:149 { bool iox::posix::SharedMemory::open(int, mode_t, uint64_t) } ::: [ 2 ] No such file or directory
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/shared_memory_object/shared_memory.cpp:149 { bool iox::posix::SharedMemory::open(int, mode_t, uint64_t) } ::: [ 2 ] No such file or directory
Shared Memory does not exist. Unable to create shared memory with the following properties [ name = /iceoryx_mgmt, access mode = AccessMode::READ_WRITE, ownership = OwnerShip::OPEN_EXISTING, mode = 0000, sizeInBytes = 60138536 ]
Unable to create SharedMemoryObject since we could not acquire a SharedMemory resource
Unable to create a shared memory object with the following properties [ name = /iceoryx_mgmt, sizeInBytes = 60138536, access mode = AccessMode::READ_WRITE, ownership = OwnerShip::OPEN_EXISTING, baseAddressHint = 0, permissions = 0000 ]
2021-11-05 15:48:54.080 [ Error ]: ICEORYX error! POSH__SHM_APP_MAPP_ERR
python3: /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:56: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted (core dumped)
```

```
2021-11-04 16:54:57.696 [Warning]: Error in sending keep alive
[ghost_connector_can-11] internal logic error in unix domain socket "/tmp/roudi" occurred
::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const } ::: [ 107 ] Transport endpoint is not connected
[static_transform_publisher-12] internal logic error in unix domain socket "/tmp/roudi" occurred
[static_transform_publisher-12] 2021-11-04 16:54:57.712 [Warning]: Error in sending keep alive
[async_mav_comms_node-4] internal logic error in unix domain socket "/tmp/roudi" occurred
[async_mav_comms_node-4] 2021-11-04 16:54:57.728 [Warning]: Error in sending keep alive
[mpc_ros_planner-7] internal logic error in unix domain socket "/tmp/roudi" occurred
[mpc_ros_planner-7] 2021-11-04 16:54:57.732 [Warning]: Error in sending keep alive
[ros2_mscl_node-2] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcC
[ros2_mscl_node-2] internal logic error in unix domain socket "/tmp/roudi" occurred
[ros2_mscl_node-2] 2021-11-04 16:54:57.736 [Warning]: Error in sending keep alive
[gps_to_utm_publisher_node-10] internal logic error in unix domain socket "/tmp/roudi" occurred
[gps_to_utm_publisher_node-10] 2021-11-04 16:54:57.737 [Warning]: Error in sending keep alive
[mission_control_node-9] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const } ::: [ 107 ] Transport endpoint is not connected
[mission_bridge_node-8] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix:
[ghost_runner_node-3] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const } ::: [ 107 ] Transport endpoint is not connected
[mav_external_node-6] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const } ::: [ 107 ] Transport endpoint is not connected
[mpc_ros_planner-7] [sdkUpdateA] ERROR!! ERROR!! updateAtime was too long: 4876 out of 2000
[component_container-1] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix:
[component_container-1] 2021-11-04 16:54:57.948 [Warning]: Error in sending keep alive
[mpc_ros_planner-7] [sdkUpdateA] ERROR!! ERROR!! updateAtime was too long: 2636 out of 2000
[async_mav_control_node-5] internal logic error in unix domain socket "/tmp/roudi" occurred
[async_mav_control_node-5] 2021-11-04 16:54:57.989 [Warning]: Error in sending keep alive
[ghost_connector_can-11] internal logic error in unix domain socket "/tmp/roudi" occurred
[ghost_connector_can-11] 2021-11-04 16:54:57.995 [Warning]: Error in sending keep alive
internal logic error in unix domain socket "/tmp/roudi" occurred
2021-11-04 16:54:58.003 [Warning]: Error in sending keep alive
[static_transform_publisher-12] internal logic error in unix domain socket "/tmp/roudi" occurred
[static_transform_publisher-12] 2021-11-04 16:54:58.013 [Warning]: Error in sending keep alive
[ghost_connector_can-11] [INFO] [1636059298.016157833] [ghost_connector_can_node]: ghost_bms_interfaces.msg.GhostBMSInfo(state_of_charge=254, temperature=26, voltage=40494, current=-8910, sys_stat=128)
[mpc_ros_planner-7] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const } ::: [ 107 ] Transport endpoint is not connected
[mpc_ros_planner-7] internal logic error in unix domain socket "/tmp/roudi" occurred
[mpc_ros_planner-7] 2021-11-04 16:54:58.033 [Warning]: Error in sending keep alive
[ros2_mscl_node-2] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcC
[ros2_mscl_node-2] internal logic error in unix domain socket "/tmp/roudi" occurred
:posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const } ::: [ 107 ] Transport endpoint is not connected
[gps_to_utm_publisher_node-10] internal logic error in unix domain socket "/tmp/roudi" occurred
[mission_control_node-9] internal logic error in unix domain socket "/tmp/roudi" occurred
[mission_bridge_node-8] internal logic error in unix domain socket "/tmp/roudi" occurred
```

MatthiasKillat (Contributor) commented Nov 11, 2021

@jwrl7 I have to take a closer look at the error logs, but the first one appears to be running out of chunks, even though you only need very small chunks of a little more than 64 bytes:

... unable to acquire a chunk with a chunk-payload size of 64 ...

We need a little more than 64 bytes due to additional internal information stored in each chunk (it is unfortunate that this is not transparent; there will be improvements in this regard). You can read the overhead off your own log: ChunkSize 32936 minus ChunkPayloadSize 32896 leaves 40 bytes of internal chunk metadata.

The second error log (from a separate run, I assume) is presumably the consequence of RouDi terminating, but I would need more details. The application then fails to communicate with RouDi

[Warning]: Error in sending keep alive

and terminates as well. To my knowledge the internal socket communication with RouDi breaks down, and this is the result. RouDi is therefore something of a single point of failure, as it controls the shared memory communication. What should happen when RouDi terminates is up for debate, but as it stands, all other applications relying on it are compromised and will output errors or terminate.

So you can optimize your memory config by adding

```toml
[[segment.mempool]]
size = 128
count = 1000000
```

to have 1000000 chunks (set this number according to your system) of size 128 at your disposal, which is enough to store the 64 payload bytes plus the hidden extra information. Note that the size should be divisible by 32 for technical reasons.
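For reference, a complete RouDi config file could look roughly like the sketch below. This assumes the TOML config format of iceoryx 1.x as used with Galactic; the counts are placeholders to tune for your system, and the second pool just reuses the size from your original configuration:

```toml
# roudi_config.toml (illustrative sketch, counts are placeholders)
[general]
version = 1

[[segment]]

# many small chunks for the ~64-byte payloads that exhausted the pool above
[[segment.mempool]]
size = 128
count = 1000000

# the large pool from the original configuration; if this one runs out,
# raise its count rather than (only) its size
[[segment.mempool]]
size = 32896
count = 32768
```

RouDi would then be started against this file, e.g. `iox-roudi -c roudi_config.toml`.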

I think you only changed the size, but what matters more when you run out of chunks is to increase the corresponding chunk count. Note that iceoryx will always use the smallest chunk the data fits into, and if there is no such chunk you run into the error from the log.

There are considerations to make this easier to configure. We plan to add the option to just define how much total shared memory should be used (say 1 GB) and let the system allocate it in a (semi-)optimal way on its own; then there would be no need to specify individual sizes. It is a matter of time and priorities, though (it is high on my priority list).
Note that individual sizes still make sense if the system and its memory usage profile are well known.

Regarding the options:

```xml
<SharedMemory>
    <Enable>true</Enable>
    <SubQueueCapacity>256</SubQueueCapacity>
    <SubHistoryRequest>16</SubHistoryRequest>
    <PubHistoryCapacity>16</PubHistoryCapacity>
    <LogLevel>info</LogLevel>
</SharedMemory>
```

These extra options will soon disappear in ROS 2 Rolling; they should be derived internally from the QoS settings.
You can try reducing SubQueueCapacity to something smaller, like 16, if you are mainly interested in the last few data samples. This will reduce the risk of running out of chunks.
You can also reduce both of the other settings to 1 if you are not interested in historical data from a publisher; a combined sketch follows below. Note that SubHistoryRequest <= PubHistoryCapacity is required.
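Putting those suggestions together, a reduced configuration could look like the following. This is a sketch only: it assumes the `<SharedMemory>` element sits inside the usual CycloneDDS/Domain wrapper of the XML config, and whether these values fit depends on your QoS and traffic:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain id="any">
    <SharedMemory>
      <Enable>true</Enable>
      <!-- keep only the most recent samples instead of 256 -->
      <SubQueueCapacity>16</SubQueueCapacity>
      <!-- no history needed; SubHistoryRequest must be <= PubHistoryCapacity -->
      <SubHistoryRequest>1</SubHistoryRequest>
      <PubHistoryCapacity>1</PubHistoryCapacity>
      <LogLevel>info</LogLevel>
    </SharedMemory>
  </Domain>
</CycloneDDS>
```

The file is picked up via the CYCLONEDDS_URI environment variable, e.g. `export CYCLONEDDS_URI=file:///path/to/cyclonedds.xml`.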

If you need more information about this I can elaborate further, but as I said, those options will disappear.

Finally, I ran into a phenomenon with the ROS 2 executor (which runs implicitly) myself:
when I took data from a subscription in a callback and performed the computation inside the callback, I started running out of chunks. I identified the reason; it requires fixes in rmw_cyclonedds, which we will add soon. This will only take effect in Rolling, though.
In the meantime you may work around it by offloading the computation to a separate thread. See for example line 32 of https://github.com/MatthiasKillat/ros2_shm_vision_demo/blob/main/src/edge_detector_node.cpp (highly experimental code, but it illustrates the potential problem and a solution). There the message is received and moved (by shared pointer) into a simple buffer, and the actual work happens in another thread as soon as that thread has processed the previous message. This is only feasible for keep-last-1 in this case, but it could be extended to keep-last-n with a more sophisticated buffer.
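The pattern boils down to the minimal sketch below (my own illustration, not the demo code; the node name, topic, and message type are hypothetical): the callback only moves the shared pointer into a one-slot buffer and returns immediately, so the executor can do its post-callback chunk bookkeeping, while a worker thread does the expensive part.

```cpp
// Sketch: offload heavy work from the subscription callback (keep-last-1).
#include <atomic>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"

class OffloadNode : public rclcpp::Node
{
public:
  OffloadNode() : Node("offload_node")
  {
    sub_ = create_subscription<sensor_msgs::msg::Image>(
      "input", rclcpp::QoS(1),
      [this](sensor_msgs::msg::Image::SharedPtr msg) {
        // Cheap callback: just stash the pointer and return quickly.
        {
          std::lock_guard<std::mutex> lock(mutex_);
          latest_ = std::move(msg);  // keep-last-1: overwrite older sample
        }
        cv_.notify_one();
      });
    worker_ = std::thread([this] { run(); });
  }

  ~OffloadNode() override
  {
    running_ = false;
    cv_.notify_one();
    worker_.join();
  }

private:
  void run()
  {
    while (running_) {
      sensor_msgs::msg::Image::SharedPtr msg;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return latest_ != nullptr || !running_; });
        msg = std::move(latest_);
      }
      if (msg) {
        // Expensive computation happens here, off the executor thread.
      }
    }
  }

  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr sub_;
  sensor_msgs::msg::Image::SharedPtr latest_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::thread worker_;
  std::atomic<bool> running_{true};
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<OffloadNode>());
  rclcpp::shutdown();
  return 0;
}
```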

The reason processing inside the callback is a problem: there will be used memory chunks in iceoryx that are not recycled until the ROS executor calls a specific function, which only happens after the subscription callback returns. If many new samples arrive in the meantime (during an expensive computation), those chunks are not available for sending new data.

I can elaborate more. The user-side workaround is not ideal; ideally the internal queues would do all the work with KeepLast. Once this is fixed in Rolling (in maybe two weeks), using ROS 2 Rolling would be an option as well. Let me know whether you need further assistance.

jwrl7 (Author) commented Nov 11, 2021

I really appreciate the elaborate response. This all makes sense, and I will give the recommended settings a try.
I'll be sure to capture as much debug info as I can if I see similar issues after making the change.

While I was typing this, and before making the recommended changes, I got another crash, so here is more info:

[async_mav_control_node-5] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[async_mav_control_node-5] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[async_mav_control_node-5] internal logic error in unix domain socket "/tmp/roudi" occurred
[async_mav_control_node-5] 2021-11-11 11:28:18.976 [Warning]: Error in sending keep alive
[ghost_runner_node-3] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[ghost_runner_node-3] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[ghost_runner_node-3] internal logic error in unix domain socket "/tmp/roudi" occurred
[ghost_runner_node-3] 2021-11-11 11:28:18.992 [Warning]: Error in sending keep alive
[static_transform_publisher-12] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[static_transform_publisher-12] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[static_transform_publisher-12] internal logic error in unix domain socket "/tmp/roudi" occurred
[static_transform_publisher-12] 2021-11-11 11:28:19.007 [Warning]: Error in sending keep alive
[mission_control_node-9] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
[mission_control_node-9] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
[mission_control_node-9] No server for unix domain socket "/tmp/roudi"
[mpc_ros_planner-7] [INFO] [1636648099.056240722] [mpc_ros_planner_node]: waiting for service...
[ghost_connector_can-11] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[ghost_connector_can-11] /home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:254 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::timedSend(const string&, const iox::units::Duration&) const }  :::  [ 107 ]  Transport endpoint is not connected
[ghost_connector_can-11] internal logic error in unix domain socket "/tmp/roudi" occurred
[ghost_connector_can-11] 2021-11-11 11:28:19.063 [Warning]: Error in sending keep alive
ghost@nvidia-desktop:~$ ros2 topic echo /mcu/state/joint_states
1636648168.842359 [123]       ros2: using network interface eth0 (udp/192.168.168.105) selected arbitrarily from: eth0, eth1, docker0
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
No server for unix domain socket "/tmp/roudi"
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
No server for unix domain socket "/tmp/roudi"
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
/home/ghost/builds/uWbfTNKP/0/ghostrobotics/autonomy/galactic_ws/src/eclipse-iceoryx/iceoryx/iceoryx_utils/source/posix_wrapper/unix_domain_socket.cpp:407 { iox::cxx::expected<iox::posix::IpcChannelError> iox::posix::UnixDomainSocket::initalizeSocket(iox::posix::IpcChannelMode) }  :::  [ 111 ]  Connection refused
No server for unix domain socket "/tmp/roudi"
2021-11-11 11:29:28.843 [Warning]: RouDi not found - waiting ...

ghost@nvidia-desktop:~$ tail -f iox_roudi.log
Log level set to: [Warning]
Reserving 61193080 bytes in the shared memory [/iceoryx_mgmt]
[ Reserving shared memory successful ]
Reserving 1079246848 bytes in the shared memory [/ghost]
[ Reserving shared memory successful ]
RouDi is ready for clients
2021-11-11 11:08:20.755 [Warning]: Application iceoryx_rt_10213_1636643309186394197 not responding (last response 1544 milliseconds ago) --> removing it


ghost@nvidia-desktop:~$ sudo systemctl status iox_roudi.service
[sudo] password for ghost:
● iox_roudi.service - Ghost ICE ORYX
Loaded: loaded (/etc/systemd/system/iox_roudi.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Thu 2021-11-11 11:08:21 EST; 23min ago
Docs: man:ghost()
Process: 5406 ExecStart=/bin/bash /home/ghost/current_ros2/run_iceoryx.sh (code=exited, status=0/SUCCESS)
Main PID: 8085 (code=killed, signal=SEGV)

Nov 11 10:07:58 nvidia-desktop systemd[1]: Starting Ghost ICE ORYX...
Nov 11 10:07:58 nvidia-desktop bash[5406]: Starting ICE ORYX
Nov 11 10:07:58 nvidia-desktop bash[5406]: Lets Get (a little bit) RouDI!!!
Nov 11 10:08:03 nvidia-desktop systemd[1]: Started Ghost ICE ORYX.
Nov 11 11:08:21 nvidia-desktop systemd[1]: iox_roudi.service: Main process exited, code=killed, status=11/SEGV
Nov 11 11:08:21 nvidia-desktop systemd[1]: iox_roudi.service: Failed with result 'signal'.

budrus commented Nov 11, 2021

@MatthiasKillat @sumanth-nirmal I think this could be the issue we discussed this week: when the reader cache overflows, the cleanup callback provided by rmw_cyclonedds might not properly release the loan.
@jwrl7 We'll check this and come back to you if that is the case and we have a fix. Then we have to see whether the other errors are consequential ones.

thijsmie (Contributor) commented Jun 21, 2022
I am closing this because no new information is forthcoming, and it seems this was not an issue with Cyclone but with iceoryx or the ROS 2 RMW. Feel free to open an issue on the relevant project if you still run into problems.

thijsmie closed this as not planned on Jun 21, 2022
sumanth-nirmal (Contributor) commented:
I think this is related to the issue of chunks not being freed correctly in rmw_cyclonedds, which is fixed in ros2/rmw_cyclonedds#365.

(For tracking purposes.)
