Optimization step hangs on exit #247

Closed
ojura opened this issue Jan 12, 2017 · 8 comments
@ojura
Contributor

ojura commented Jan 12, 2017

This is a general issue I have noticed while running Cartographer, so I haven't attached a particular bag or launch file (I have run Cartographer on, e.g., the Deutsches Museum bag and observed the issue with the newest builds at the time of writing, e6e3730 and cartographer-project/cartographer@1f27268). I'm using Ubuntu 16.04.1 and ROS Kinetic.

When an interrupt is issued, e.g. using ctrl-c, Cartographer starts an additional optimization step. I've tried to let it run for hours, but it barely makes any progress. In case you issue ctrl-c when using roslaunch, the roslaunch server detects that the Cartographer node hasn't quit after 10-20 seconds and kills it (with a message Escalating to SIGTERM).

Besides the fact that this hang-kill-on-exit behaviour isn't quite regular, I'm wondering what this additional optimization step actually is, I couldn't find it documented anywhere.

@SirVer
Contributor

SirVer commented Jan 13, 2017

The optimization step on the sample bags should terminate in a few seconds. If it does not, this indicates that you did not build an optimized (-O3) version of Cartographer. The difference between -g and -O3 is hours of runtime for Cartographer.

> Besides the fact that this hang-kill-on-exit behaviour isn't quite regular, I'm wondering what this additional optimization step actually is, I couldn't find it documented anywhere.

Cartographer queues up global loop closures and processes them in the background. Once you ask it to terminate, it finishes all the optimization work still in the queues to get a properly loop-closed result. In normal usage this queue is cleared quickly enough that the final optimization is often a no-op, but in your case the loop closing is also running slowly, so the queue gets very long.
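
The queue-then-drain behaviour described above can be illustrated with a minimal Python sketch. It is not Cartographer's implementation (which is C++); the class `BackgroundOptimizer` and its methods are hypothetical names that only model the pattern of queuing work during the run and blocking on shutdown until the queue is empty.

```python
import queue
import threading


class BackgroundOptimizer:
    """Toy model: work items are queued and processed by one background thread."""

    def __init__(self, process):
        self._process = process          # callable applied to each queued item
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:         # sentinel: no more work will arrive
                    return
                self._process(item)
            finally:
                self._queue.task_done()

    def add(self, item):
        """Queue one work item (a stand-in for a loop closure search)."""
        self._queue.put(item)

    def backlog(self):
        """Approximate number of items still waiting to be processed."""
        return self._queue.qsize()

    def finish(self):
        """Block until everything queued so far is processed, then stop the worker."""
        self._queue.put(None)
        self._queue.join()
        self._worker.join()


results = []
optimizer = BackgroundOptimizer(results.append)
for constraint_id in range(5):
    optimizer.add(constraint_id)
optimizer.finish()   # returns only once the queue has been drained
```

If each item is slow to process, `finish()` blocks for as long as the backlog takes to clear, which is the kind of hang on exit reported in this issue.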

@SirVer SirVer closed this as completed Jan 13, 2017
@ojura
Contributor Author

ojura commented Jan 13, 2017

Thanks for the advice! Maybe you could consider pointing this out in the install instructions; it could help someone else avoid the same issue. Setting the CMake build type to "Release" fixed the issue.

One other thing - I haven't noticed Cartographer complaining about this queue of optimizations building up during execution (in fact, I recall seeing messages like "We caught up. Hooray!"). Perhaps the user should be alerted if Cartographer fails to complete loop closures in time.

@ojura
Contributor Author

ojura commented Jan 24, 2017

Hi, I would like to follow up on this. It seems that the CMake flags are not to blame here. @damonkohler opened cartographer-project/cartographer#183 based on my suggestion above; I think you can close it and reopen this one.

I am still getting this error. My conclusion above, that forcing the build mode to Release fixes the issue, was wrong (I had probably built it correctly in the first place). To verify, I have rebuilt Cartographer once more from scratch (with cartographer-project/cartographer@0fe51185beda3 and cartographer_ros@0783398 checked out).

I've added -frecord-gcc-switches in cartographer/cmake/functions.cmake and dumped the recorded gcc options section from the executable. The whole dump is in this gist - you can verify that -O3 is there.
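
A check like this can be scripted. The sketch below assumes the recorded options have already been dumped to text, for example with `readelf -p .GCC.command.line <binary>` (how the dump in the gist was actually produced is not stated here). The helper `optimization_levels` and the sample dump are hypothetical and for illustration only.

```python
import re


def optimization_levels(dump_text):
    """Return the set of -O<level> flags found in a dump of recorded GCC switches."""
    # Matches -O, -O0..-O3, -Os, -Og and -Ofast as whole whitespace-delimited tokens.
    pattern = re.compile(r"(?<!\S)-O(?:[0-3sg]|fast)?(?!\S)")
    return set(pattern.findall(dump_text))


# Hypothetical excerpt of a dump, not taken from the gist.
sample_dump = """
  [     0]  -std=c++11
  [     b]  -O3
  [     f]  -DNDEBUG
  [    18]  -frecord-gcc-switches
"""
levels = optimization_levels(sample_dump)
```

An empty result, or only `-O0`/`-Og`, would point to an unoptimized build; in this thread `-O3` was present, so the cause had to be elsewhere.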

After compiling, I ran demo_backpack_2d.launch, with cartographer_paper_deutsches_museum.bag, and I had let it run for about a minute. The log is available in the same gist above. Then, I issued a finish_trajectory call, and the problem occurs again. After waiting for an hour or two, the optimization progress is around 30%.

As @SirVer explained above, this implies that optimization is far too slow on my machine - I haven't witnessed any cool loop closure moments like this one when watching Cartographer's map in Rviz.

I am running Cartographer on a Thinkpad T520 with a Core i7-2670QM - I plan to try this out on different hardware and perhaps with a fresh install of Ubuntu and ROS.

@SirVer
Contributor

SirVer commented Jan 24, 2017

@ojura Try running our Docker container and see if the problem exists there too.

It should be simple:

docker build . -f Dockerfile.indigo -t cartographer_container
docker run -it --net=host cartographer_container

Then, inside the container, roslaunch Cartographer; outside it, launch rviz and replay your bag. If that runs fast (which is likely), the slowdown is due to something on your system.

@ojura
Contributor Author

ojura commented Jan 25, 2017

@SirVer, thank you for the quick response. I have tried building on a fresh install of Ubuntu and compared the build logs - I then realized I had been building Ceres without a sparse linear algebra library. This happened because I forgot to run rosdep install --from-paths src ... on the machine I was working on previously. When I finally did, it installed the packages libatlas-base-dev and libsuitesparse-dev. After rebuilding, everything works okay.

The reason I got confused above is that at one point I had tried passing -DEIGENSPARSE to use Eigen as a sparse linear algebra library, which helped a little bit, but at one point I forgot about that and rebuilt it without this. Anyway, that is not a configuration that you are supporting officially, and the true fix was to properly follow the install instructions.
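
One way to catch a Ceres build without sparse linear algebra support before running anything is to inspect its generated configuration header. The sketch below assumes that the header contains an active (uncommented) `#define CERES_NO_SUITESPARSE` when SuiteSparse support is missing; that assumption, and the location of the header on a given system, should be verified against the installed Ceres version. The helper `built_without_suitesparse` and both header excerpts are hypothetical.

```python
import re


def built_without_suitesparse(config_header_text):
    """True if the text has an active (uncommented) CERES_NO_SUITESPARSE define."""
    active_define = re.compile(
        r"^[ \t]*#[ \t]*define[ \t]+CERES_NO_SUITESPARSE\b", re.MULTILINE
    )
    return bool(active_define.search(config_header_text))


# Hypothetical header excerpts, for illustration only.
header_without = "#define CERES_NO_SUITESPARSE\n#define CERES_NO_CXSPARSE\n"
header_with = "// #define CERES_NO_SUITESPARSE\n"

missing_sparse_support = built_without_suitesparse(header_without)
has_sparse_support = not built_without_suitesparse(header_with)
```

A commented-out define is deliberately ignored, so only a header that actively disables SuiteSparse is flagged.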

To conclude:

@SirVer
Contributor

SirVer commented Jan 25, 2017

Thanks for getting back with the conclusion.

> maybe issue a big red warning that loop closing will be a few orders of magnitude slower with a build without ATLAS and SuiteSparse

Where would you have expected such a warning as a first time user?

@ojura
Contributor Author

ojura commented Jan 25, 2017

I am not sure - the easiest place for a warning not to skip that step would be the Readthedocs installation page. Alternatively, a warning could be emitted during the build, or at runtime (for example, during startup of the Cartographer node).

@ojura
Contributor Author

ojura commented Jan 25, 2017

@SirVer, I think enforcing this at build time should be enough; have a look at cartographer-project/cartographer#189.
