Error running COVINS with your own camera #2

Closed
RoZhong opened this issue Nov 15, 2021 · 42 comments

Comments

@RoZhong

RoZhong commented Nov 15, 2021

To use my own camera, I changed the camera topic subscribed to in ros_mono.cc in ~/ROS/ORB_SLAM3/src. Every time, after running normally for a while, it fails with the error "bash: line 1: 19366 Segmentation fault (core dumped) rosrun ORB_SLAM3 Mono". I would like to know what went wrong, or how to run COVINS with my own camera.

@RoZhong
Author

RoZhong commented Nov 15, 2021

The EuRoC dataset shows a similar problem: "bash: line 1: 19520 Aborted (core dumped)".

@ThomasZiegler
Member

Do you get the same error when you test with the EuRoC data and the provided launch file launch_ros_euroc.launch? You might need to adapt the path of voc and cam in the launch file.

@RoZhong
Author

RoZhong commented Nov 16, 2021

I used ~/covins_examples/euroc_examples_mh123_vigba.sh with the EuRoC dataset, but the same problem occurs:
"./euroc_examples_mh123_vigba.sh: line 11: 14968 Aborted (core dumped)"

@RoZhong
Author

RoZhong commented Nov 16, 2021

Selection_003
Selection_004

@RoZhong
Author

RoZhong commented Nov 16, 2021

I am looking forward to your help to solve this problem. As a newcomer, I am honored to have this opportunity to communicate with you.

@RoZhong
Author

RoZhong commented Nov 17, 2021

I ran the EuRoC dataset with the ROS launch file, and then again with the sh file in covins_examples. Running two datasets one after the other works, but running them at the same time does not: the run stops before the datasets have finished. When the datasets run one after the other, "Aborted (core dumped)" appears at the end of the last dataset, although the dataset itself completes. Is this normal? I also hope you can write a tutorial on using one's own camera. Thank you very much.

@patriksc
Collaborator

Hi! I just tested COVINS with 2 ROS launch files running in parallel, it seems to be working on my side. It would be great if you could provide some more information about your problem, potentially a screenshot of a case when the error occurs would be helpful, and a more exact description of what commands you are executing. From the 2 screenshots you shared above, everything seems OK.

Particularly, what is working and what not?

  • If you execute euroc_examples_mh123_vigba.sh, is this working fine?
  • And if you do the following: run COVINS with ROS and MH1 - wait until finished, restart ROS node for ORB-SLAM3 - run with MH2, is this also working?

A remark regarding using your own camera: COVINS is designed as a visual-inertial system, and has only been tested with the visual-inertial mode of ORB-SLAM3. We do not support running the monocular version so far. For this, you would have to adjust the code to your needs, particularly extending the back-end to use SIM(3) transformations instead of SE(3).
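For context on the SE(3) versus SIM(3) remark: a rigid-body transformation in SE(3) consists of a rotation and a translation, while a similarity transformation in Sim(3) adds a scale factor. That matters here because a monocular-only front-end cannot observe metric scale. As a generic textbook statement (not taken from the COVINS code), the two act on a 3D point p as follows:

```latex
\begin{aligned}
\mathrm{SE}(3):\quad  & p' = R\,p + t,     && R \in SO(3),\; t \in \mathbb{R}^3 \\
\mathrm{Sim}(3):\quad & p' = s\,R\,p + t,  && s > 0
\end{aligned}
```

With s fixed to 1, Sim(3) reduces to SE(3). A visual-inertial front-end can rely on this because the IMU makes the scale observable.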

@RoZhong
Author

RoZhong commented Nov 19, 2021

① If you execute euroc_examples_mh123_vigba.sh, is this working fine?
I first ran euroc_examples_mh123_vigba.sh, modified to use only two datasets because I had downloaded only two. It worked, but at the end of the second dataset this error appeared: "./euroc_examples_mh123_vigba.sh: line 11: 6339 Aborted (core dumped)" (the process ID is different every time).
Selection_001
I then used my own sh file (adapted from CCM-SLAM) to run the two datasets at the same time:
#!/bin/bash
gnome-terminal -t "roscore" -x bash -c "source ~/ws/covins_ws/devel/setup.bash; roscore; exec bash;"
gnome-terminal -t "covins_backend_node" -x bash -c "source ~/ws/covins_ws/devel/setup.bash; rosrun covins_backend covins_backend_node; exec bash;"
sleep 3
gnome-terminal -t "visual launch" -x bash -c "roslaunch ~/ws/covins_ws/src/covins/covins_backend/launch/tf.launch; exec bash;"
gnome-terminal -t "visual rviz" -x bash -c "rviz -d ~/ws/covins_ws/src/covins/covins_backend/config/covins.rviz; exec bash;"
sleep 3
gnome-terminal -t "ORBSLAM3 Front-END1" -x bash -c "source ~/ws/covins_ws/devel/setup.bash; cd /home/zyf/ws/covins_ws/src/covins/orb_slam3/covins_examples; ./../Examples/Monocular-Inertial/mono_inertial_euroc ./../Vocabulary/ORBvoc.txt ./../Examples/Monocular-Inertial/EuRoC.yaml /home/zyf/download/MH_01_easy ./../Examples/Monocular-Inertial/EuRoC_TimeStamps/MH01.txt dataset-MH01_monoi; exec bash"
gnome-terminal -t "ORBSLAM3 Front-END2" -x bash -c "source ~/ws/covins_ws/devel/setup.bash; cd /home/zyf/ws/covins_ws/src/covins/orb_slam3/covins_examples; ./../Examples/Monocular-Inertial/mono_inertial_euroc ./../Vocabulary/ORBvoc.txt ./../Examples/Monocular-Inertial/EuRoC.yaml /home/zyf/download/MH_02_easy ./../Examples/Monocular-Inertial/EuRoC_TimeStamps/MH02.txt dataset-MH02_monoi; sleep 10; rosservice call /covins_gba 01; exec bash"
Selection_002
The resulting map is shown in figure 2: the blue trajectory does not finish. In the terminal of the second dataset (figure 3), "bash: line 11: 7672 Aborted (core dumped)" also appeared (again, the process ID is different every time). In other words, with my sh file the two datasets cannot run at the same time. What went wrong?
Selection_003
② Running COVINS with ROS on MH1 and then MH2 works without any problem.
③ Could you add a tutorial to the README on how to configure one's own camera + IMU? Thank you very much for taking the time to review my problem and help me solve it. Thanks again!

@patriksc
Collaborator

Thanks for the details. OK, so that's interesting - apparently, it fails only with the second client, on the MH2 sequence, and also rather towards the end. Generally, you should be able to run both scripts in parallel. Unfortunately, it's not directly obvious what the source of the error is, but some things we could try:

  • change the order, run first the agent with MH2 and then the agent with MH1
  • run the front-end with GDB attached, to get more info on the actual code line that causes the failure - like this (you need to adjust it accordingly to your paths etc.): gdb --args ./Examples/Monocular-Inertial/mono_inertial_euroc ./Vocabulary/ORBvoc.txt ./Examples/Monocular-Inertial/EuRoC.yaml "$pathDatasetEuroc"/MH_01 ./Examples/Monocular-Inertial/EuRoC_TimeStamps/MH01.txt dataset-MH01_monoi
  • what is the output when you execute ldd libORB_SLAM3.so | grep opencv in <your_ws_path>/covins/orb_slam3/lib?

Regarding your own data - we will consider adding a tutorial for this, but we will probably not be able to do this very soon. Generally, I would recommend to look in more detail into the provided examples with different types of data, and try to get your data into the same format. Also, since we use ORB-SLAM3 as the VI front-end, looking in the issues there might help as well, e.g.:

@RoZhong
Author

RoZhong commented Nov 20, 2021

Thank you very much for your reply. I have carried out the experiments you suggested, and the results are as follows:
① Swapping the order of datasets 1 and 2. When I use your euroc_examples_mh123_vigba.sh (that is, the datasets are played one after the other), the map is drawn. However, I noticed a small problem that I overlooked when I described the issue last time: both dataset 1 and dataset 2 end with "terminate called without an active exception" followed by "bash: line 1: 17518 Aborted (core dumped)". When I played the datasets one after the other before, I simply did not notice that dataset 1 also reported this error, because the map was completed. The errors are shown in figures 1 and 2.
Selection_004
Selection_005
When I still run the datasets one after the other but swap the order of datasets 1 and 2, the map is also drawn, and the same error occurs. The errors are shown in figures 3 and 4.
Selection_006
Selection_007
In the cases above, the map was still completed despite the error. When I ran the datasets in parallel, the result was different. When I first started dataset 1 in one terminal and then opened another terminal to run dataset 2, the same error occurred at the end and the map of dataset 2 stopped being drawn. The map and the error messages are shown in figures 5, 6 and 7.
Selection_013
Selection_011
Selection_012
But if I first start dataset 2 in one terminal and then dataset 1 in another, the map is completed, although the same error message still appears, as shown in figures 8, 9 and 10. One more point: why are there so many gray lines this time, when the two trajectories are drawn at the same time? These lines completely cover the trajectories. This has happened occasionally before, regardless of the playback order.
Selection_010
Selection_008
Selection_009

② The result of executing ldd libORB_SLAM3.so | grep opencv in <your_ws_path>/covins/orb_slam3/lib is shown in figure 11.
Selection_014

③ Using gdb to debug the program, the final error is shown in the following figures. It seems to report that there is no such file, but the file is saved in /home/zyf/ws/covins_ws/src/covins/orb_slam3/covins_examples, and its modification time also matches.
Selection_015
Selection_017

@RoZhong
Author

RoZhong commented Nov 20, 2021

image
Using simple cout debugging, I found that saving the KF files does not seem to be the problem: the output shows that the saving has finished.

@patriksc
Collaborator

  • Regarding the gray lines: this is the full covisibility graph, which is hidden by default, but changing the order of the datasets has probably bypassed that. You can switch it off by deactivating the corresponding message or namespace in the left selection window in RVIZ.

  • Regarding your actual problem: looking at the trajectories, I think it's actually the first agent that does not finish (MH1), not the second one. If I understand correctly, you are running almost everything from a single terminal, so I assume that the otherwise irrelevant error terminate called without an active exception shuts down the second, still active agent, which has a longer trajectory. This error most likely comes from a thread that is neither correctly joined nor detached and goes out of scope at the end of the run. We will try to push a fix for this soon. Until then, we would recommend running the 2 agents from different terminals, since this seems to be working (particularly, we would recommend running every instruction - roscore, RVIZ, TF launch file, server, agent1 and agent2 - in a different terminal, as described in the manual, since this is also the way we usually run the system).
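To illustrate the kind of error described here: in C++, destroying a std::thread object that is still joinable makes the runtime call std::terminate(), which with GCC's standard library typically prints "terminate called without an active exception". The following is a minimal, generic sketch of one way to avoid this by stopping and joining the thread in a destructor. The class Receiver and the function RunAndShutDownCleanly are hypothetical names for illustration; this is not the COVINS communicator code and not necessarily how the fix was implemented.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a class that owns a background receive thread.
// It is NOT the COVINS communicator; all names are illustrative only.
class Receiver {
public:
    Receiver() : running_(true), iterations_(0), worker_([this] { Loop(); }) {}

    // If a std::thread is still joinable when it is destroyed, the C++
    // runtime calls std::terminate(). Stopping and joining here avoids that.
    ~Receiver() {
        running_ = false;
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    int iterations() const { return iterations_.load(); }

private:
    void Loop() {
        // do/while guarantees at least one pass even if shutdown is immediate.
        do {
            ++iterations_;
            std::this_thread::yield();
        } while (running_);
    }

    std::atomic<bool> running_;
    std::atomic<int> iterations_;
    std::thread worker_;  // declared last so the flags exist before it starts
};

// Creates and destroys a Receiver and returns the number of loop passes seen.
// Reaching the return statement means the destructor joined cleanly.
int RunAndShutDownCleanly() {
    int passes = 0;
    {
        Receiver receiver;                    // worker thread starts here
        while (receiver.iterations() == 0) {
            std::this_thread::yield();        // wait until the worker ran once
        }
        passes = receiver.iterations();
    }                                         // destructor stops and joins
    assert(passes > 0);
    return passes;
}
```

Calling RunAndShutDownCleanly() from a main function should return normally. With the destructor body removed, the same program would be expected to abort at the closing brace of the inner scope. On Linux, building with -pthread may be required.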

@RoZhong
Author

RoZhong commented Nov 23, 2021

Thank you very much for taking the time to answer my question! Over the past few days I tried again and found that I must have made a mistake when writing the sh file that runs the datasets in the two terminals. When I instead open the terminals and enter the commands one by one, the overall map is completed, whether I run the datasets one after the other or in two terminals at the same time, and regardless of the order of the datasets.
But, just like the bug you mentioned, "terminate called without an active exception" appears, followed by "Aborted (core dumped)".
Selection_021
Selection_022
Selection_023
Is my situation the same as yours? Looking forward to your reply and your updates.

@patriksc
Collaborator

Yes, that looks correct! The 3D landmarks are still a bit noisy, if you now also execute the bundle adjustment (which currently cannot be called), it will look even better.

I get the same error at the end of the run. As mentioned, when the program exits, a thread goes out of scope that is neither detached nor joined. Since it happens on exit, it's nothing that affects the SLAM estimate, you can ignore it. For this reason, it's a minor flaw, but it should still be fixed.

Regarding the Unable to load type [...]: this usually happens when the workspace is not correctly sourced - I would recommend to double-check this. You can start the server node, and then from a second terminal, source the workspace and check whether you can see and call any service from COVINS. Also, check this page, e.g. rosservice list might help.

@patriksc
Collaborator

Recent commit e21f25992369d39c9abb920259dd8fef3f30d973 should resolve the terminate called without an active exception issue

@RoZhong
Author

RoZhong commented Dec 18, 2021

When I re-downloaded and compiled, the following error occurred:

  Error: The file /home/zyf/ws/covins_ws/src/libnabo/package.xml is an invalid package.xml file. See below for details:
  Error(s):
         - The "run_depend" tag must not have the following attributes: condition

@RoZhong
Author

RoZhong commented Dec 18, 2021

File directory: /home/zyf/ws_new/covins_ws/src/covins
Error: There already is a workspace config file .rosinstall at ".". Use wstool install/modify.
Merge caused no change, no new elements found
[aslam_cv2] Updating /home/zyf/ws_new/covins_ws/src/aslam_cv2
[aslam_cv2] Done.
[catkin_boost_python_buildtool] Updating /home/zyf/ws_new/covins_ws/src/catkin_boost_python_buildtool
[catkin_boost_python_buildtool] Done.
[catkin_simple] Updating /home/zyf/ws_new/covins_ws/src/catkin_simple
[catkin_simple] Done.
[ceres_catkin] Updating /home/zyf/ws_new/covins_ws/src/ceres_catkin
[ceres_catkin] Done.
[doxygen_catkin] Updating /home/zyf/ws_new/covins_ws/src/doxygen_catkin
[doxygen_catkin] Done.
[eigen_catkin] Updating /home/zyf/ws_new/covins_ws/src/eigen_catkin
[eigen_catkin] Done.
[eigen_checks] Updating /home/zyf/ws_new/covins_ws/src/eigen_checks
[eigen_checks] Done.
[gflags_catkin] Updating /home/zyf/ws_new/covins_ws/src/gflags_catkin
[gflags_catkin] Done.
[glog_catkin] Updating /home/zyf/ws_new/covins_ws/src/glog_catkin
[glog_catkin] Done.
[libnabo] Updating /home/zyf/ws_new/covins_ws/src/libnabo
[libnabo] Done.
[minkindr] Updating /home/zyf/ws_new/covins_ws/src/minkindr
[minkindr] Done.
[numpy_eigen] Updating /home/zyf/ws_new/covins_ws/src/numpy_eigen
[numpy_eigen] Done.
[opencv3_catkin] Updating /home/zyf/ws_new/covins_ws/src/opencv3_catkin
[opencv3_catkin] Done.
[opengv] Updating /home/zyf/ws_new/covins_ws/src/opengv
[opengv] Done.
[pangolin] Updating /home/zyf/ws_new/covins_ws/src/pangolin
[pangolin] Done.
[protobuf_catkin] Updating /home/zyf/ws_new/covins_ws/src/protobuf_catkin
[protobuf_catkin] Done.
[robopt_open] Updating /home/zyf/ws_new/covins_ws/src/robopt_open
[robopt_open] Done.
[suitesparse] Updating /home/zyf/ws_new/covins_ws/src/suitesparse
[suitesparse] Done.
[yaml_cpp_catkin] Updating /home/zyf/ws_new/covins_ws/src/yaml_cpp_catkin
[yaml_cpp_catkin] Done.
Ubuntu release: bionic
File directory: /home/zyf/ws_new/covins_ws/src/covins
eigen_catkin already set to version 3.3.4
ceres_catkin already set to version 3.3.4
opengv already set to version 3.3.4

Profile: default
Extending: [explicit] /opt/ros/melodic
Workspace: /home/zyf/ws_new/covins_ws

Build Space: [exists] /home/zyf/ws_new/covins_ws/build
Devel Space: [exists] /home/zyf/ws_new/covins_ws/devel
Install Space: [unused] /home/zyf/ws_new/covins_ws/install
Log Space: [missing] /home/zyf/ws_new/covins_ws/logs
Source Space: [exists] /home/zyf/ws_new/covins_ws/src
DESTDIR: [unused] None

Devel Space Layout: merged
Install Space Layout: None

Additional CMake Args: -DCMAKE_BUILD_TYPE=RelWithDebInfo
Additional Make Args: -j8
Additional catkin Make Args: None
Internal Make Job Server: True
Cache Job Environments: False

Whitelisted Packages: None
Blacklisted Packages: None

Workspace configuration appears valid.

Error: The file /home/zyf/ws_new/covins_ws/src/libnabo/package.xml is an invalid package.xml file. See below for details:

Error(s):

  • The "run_depend" tag must not have the following attributes: condition

@patriksc
Collaborator

Did you run install_file.sh a second time? Otherwise, I don't see why wstool should become active when you just pull the most recent version from master and re-build COVINS.

Regarding The "run_depend" tag must not have the following attributes: condition: libnabo has updated their package.xml yesterday. Since COVINS doesn't need the libnabo dependency anymore, we have removed it. Build should work again now.

@RoZhong
Author

RoZhong commented Dec 19, 2021

I have built it again and it worked, but there is a new problem: when I test euroc_examples_mh12345_vigba.sh, a segfault occurs. It is not the same as before; this time it fails right from the beginning.
Selection_040

@Ramseyous0109

I met the same problem as you. Before the update, there was no problem running on the EuRoC dataset. The segmentation fault only occurred when I used my own camera with the monocular-only ORB-SLAM3 front-end. However, after updating to the recent commit e21f25992369d39c9abb920259dd8fef3f30d973, the program running on the dataset crashes with a segmentation fault just after the front-end has finished initializing, and COVINS is now unable to run at all.

@patriksc
Collaborator

I cannot reproduce this error - on my machine, 'euroc_examples_mh12345_vigba.sh' runs through.

Potentially, something went wrong when applying the recent changes locally. I would recommend to re-build the thirdparty libraries '.../covins_backend/thirdparty/DBoW2', '.../orb_slam3/Thirdparty/DBoW2' and '.../orb_slam3/Thirdparty/g2o', then re-compile 'covins_comm' and 'covins_backend', and finally 'ORB-SLAM3', as well as the ROS version, if required.

Or, even more recommended, re-install COVINS in a clean workspace and from a freshly cloned version.

If the problem still persists after that, please provide some more information, e.g. whether it is the server or the agent that fails with the segfault, what the last output before the segfault was, and potentially some gdb output. In the attached image, I cannot spot any segfault or similar.

@RoZhong
Author

RoZhong commented Dec 21, 2021

When the error occurs, the terminal output of covins_backend_node is as shown in figure 1.
Selection_042
The terminal output of ORB_SLAM3 is shown in figure 2.
Selection_043

I tested it in gdb and found that something goes wrong in communicator_base.cpp.
Selection_045

@patriksc
Collaborator

Did you re-install in a clean workspace? If you change the communicator code, and e.g. do not re-compile the ORB-SLAM3 front-end, you are linking against different versions of covins_comm for front- and back-end, this can cause such errors. We have tested building the current master and running the script on both Ubuntu 18 and 20, and we do not see this error.

@RoZhong
Author

RoZhong commented Dec 22, 2021

I just tried it again in a new workspace (I did not delete the previous version, which is in another workspace), but it still produces the segmentation fault.
Selection_046

@RoZhong
Author

RoZhong commented Dec 22, 2021

And when I try the previous version, it runs as before.

@Ramseyous0109

Is it because COVINS just doesn't support the monocular-only front-end of ORB-SLAM3? I just read the README and found out that it only supports monocular-inertial VIO. Maybe IMU data is necessary for the back-end?

@RoZhong
Author

RoZhong commented Dec 22, 2021

I use EuRoC, which is a VIO dataset. With the previous version I could run a ZED2 and it worked, but now even the dataset does not run.

@patriksc
Collaborator

Thanks for the feedback. Hm, OK - this is difficult to debug, since I cannot reproduce the error. It's the ORB-SLAM3 front-end that produces the segfault, right? One thing I could imagine, maybe it's some sort of runtime effect that for some reason occurs on your machine and only with this recent change.
From your logs, I do not see that the client has received the ID from the server yet. One quick thing we could try is, insert a usleep with enough time before line 173 (// Get ID from back-end) in System.cc, like usleep(100000); or usleep(1000000);. If that helps, it's definitely a runtime effect. Could you try this out, share the outcome, and as well share the logs again like in above image?
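The fixed usleep suggested here is a diagnostic for a timing-dependent effect. If the cause does turn out to be a race between start-up and receiving the ID, one common alternative to a fixed delay is to block until the value has arrived, with a timeout. The following is a minimal, generic sketch of that pattern using a condition variable. IdMailbox, Set, WaitFor and ReceiveIdOnce are hypothetical names; they do not exist in COVINS or ORB-SLAM3, and this is not a proposed patch.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: hands a single integer ID from one thread to another.
class IdMailbox {
public:
    IdMailbox() : has_id_(false), id_(-1) {}

    // Called by the receiving side once the ID has arrived.
    void Set(int id) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            id_ = id;
            has_id_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an ID is available or the timeout expires.
    // Returns true and writes *out on success, false on timeout.
    bool WaitFor(std::chrono::milliseconds timeout, int* out) {
        std::unique_lock<std::mutex> lock(mtx_);
        const bool ready =
            cv_.wait_for(lock, timeout, [this] { return has_id_; });
        if (ready) {
            *out = id_;
        }
        return ready;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    bool has_id_;
    int id_;
};

// Example: a sender thread delivers the ID while the caller waits for it.
int ReceiveIdOnce(int id_to_send) {
    IdMailbox box;
    std::thread sender([&box, id_to_send] { box.Set(id_to_send); });
    int received = -1;
    const bool ok = box.WaitFor(std::chrono::milliseconds(2000), &received);
    sender.join();
    assert(ok);
    return received;
}
```

Unlike a fixed sleep, the waiting side continues as soon as the ID is set, and a timeout is reported explicitly instead of silently proceeding without an ID.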

@patriksc
Collaborator

@Ramseyous0109 in case you want to deal with monocular data only, maybe check out CCM-SLAM, one of our past projects: https://github.com/VIS4ROB-lab/ccm_slam

@RoZhong
Author

RoZhong commented Dec 23, 2021

I tried what you suggested; the screenshots are shown below.
Selection_052
Selection_051
Selection_053
Selection_054

@Ramseyous0109

Thank you for your reply. I'll try it.

@patriksc
Collaborator

Can you check whether pthread is installed correctly - e.g. run sudo apt-get install libpthread-stubs0-dev, and see whether this installs anything?

@RoZhong
Author

RoZhong commented Dec 23, 2021

sudo apt-get install libpthread-stubs0-dev

Of course. I tried it: the package was already installed, so nothing new was installed.

@patriksc
Collaborator

OK, thanks for the feedback. Some more things you could try:

  • Try to comment out line 107 (thread_recv.detach();) in orb_slam3/src/comm/communicator.cpp and see whether this has any effect
  • After line 407 of communicator_base.cpp: print ContainerSize and msg_type_container_.size() using std::cout

Generally, at this point, you need to gradually remove the changes introduced with commit e21f25992369d39c9abb920259dd8fef3f30d973 to find out what exactly causes the segfault. Since I cannot reproduce the problem on multiple machines, I will not revert the changes at this point. However, we need to monitor this issue and see whether it occurs for other users as well. I will also branch out a version without the changed thread handling from commit e21f25992369d39c9abb920259dd8fef3f30d973, which you can then try out to see whether it resolves your issue. However, this will most likely not happen before 2022.
If you find out which changes exactly introduced the segfault on your machine, I would appreciate if you would share this here.

@patriksc
Collaborator

By the way, could you share the specs of the machine you are using to run your experiments - OS version, and CPU / RAM?

@RoZhong
Author

RoZhong commented Dec 24, 2021

Of course:
Ubuntu 18.04.6 LTS
CPU:Intel® Core™ i7-6700HQ CPU @ 2.60GHz × 8
GPU:NVIDIA GeForce GTX 960M/PCIe/SSE2

@RoZhong
Author

RoZhong commented Dec 24, 2021

  • comment out line 107

After commenting out line 107, the behavior is the same as before, including the segfault and the error in gdb.
After adding a cout after "mtx_recv_buffer_.lock()", no cout output appears, and the error message is the same.
I made sure to run "./src/covins/install_file.sh 8" after the changes.

@zhouleiqiang

Hello, can you share the tutorial of running COVINS on ZED2? look forward to your reply.

@patriksc
Collaborator

patriksc commented Jan 8, 2022

@RoZhong we have created a branch named nothreadfix without the corrected thread handling. Please try this one out and let us know, whether you are still experiencing the segfault.

As already mentioned, we cannot reproduce the error, and therefore cannot fix it on our side. However, we have created an issue (#11) to keep track of the problem you have reported.

@patriksc
Collaborator

Problem seems resolved for the moment - closing this issue.
