
Velocity speed estimate is noisy and unstable with a higher frequency laser (previously: Cartographer crashes with a Pioneer 3DX bag) #193

Closed
ojura opened this issue Jan 26, 2017 · 8 comments

ojura commented Jan 26, 2017

I am getting a crash when mapping with Cartographer and a Pioneer robot. I am using Ubuntu 16.04, ROS Kinetic, cartographer @ddb3c890a6 and cartographer-project/cartographer_ros@f7057fc5a35aed4. A sample crash log can be found here. The launch file and the appropriate bag can be found here. To run, extract the archive, change into the launch directory and call roslaunch pioneer_cartographer.launch from there.

The crash usually occurs somewhere around 60-70 seconds into the bag (counting from the 25th second, where playback starts in the launch file), after the robot makes a right turn into the hallway heading up. The pose estimate makes wild jumps just before it crashes, with a message like the following one: [FATAL]: ... probability_grid.h:161] Check failed: limits_.Contains(xy_index) -215

The crash seems to be related to odometry - if use_odometry is set to false in the .lua file, Cartographer doesn't crash. However, scan matching in that case doesn't work very well; it gets stuck in the same hallway where it crashes when using odometry.

wohe commented Feb 8, 2017

Hi, this kind of crash happens when the laser scan that is about to be inserted is no longer contained in the current submap. The size of the submap can be configured in trajectory_builder_2d.lua and defaults to 400 meters. I'm not suggesting changing it, though; it seems appropriate for your setup. The underlying problem is that the SLAM is not tracking well, i.e. this is a tuning issue.

I took a quick look at your bag and it seems neither the laser nor the odometry is providing the quality expected by the default settings. If you set

TRAJECTORY_BUILDER_2D.use_online_correlative_scan_matching = true
TRAJECTORY_BUILDER_2D.real_time_correlative_scan_matcher.linear_search_window = 0.
TRAJECTORY_BUILDER_2D.submaps.resolution = 0.1

you can generate a map from your bag. I chose the coarser resolution because of the quality of the laser scan, and the correlative scan matcher in order to figure out the rotation with neither IMU nor odometry available. At this point you can even disable the use of odometry and still get a good map.

Please let me know if you have further questions.

@wohe wohe closed this as completed Feb 8, 2017
@ojura ojura changed the title Cartographer crashes with a Pioneer 3DX bag Velocity speed estimate is noisy and unstable with a higher frequency laser (previously: Cartographer crashes with a Pioneer 3DX bag) Mar 16, 2017

ojura commented Mar 16, 2017

Hi @wohe,

tackling this has recently been a mini-project of mine, and I have now reached a point where I am convinced that you should reopen this issue and take a more in-depth look. It seems that the problem is not that the quality of our data is too low, but that it is too high :-)

More specifically, the frequency of our laser data is too high (50 Hz; odometry is 10 Hz). Precision-wise, it is definitely true that the SICK LMS100 used in this bag is not the world's most precise 2D LIDAR. However, the odometry on the Pioneer is fairly well tuned, and discarding it was not an option for me. After visually examining the laser and odometry data, I have not found any sudden jumps or anomalies that would explain why Cartographer makes such a wild jump in its pose estimate. This eventually causes out-of-map insertion of laser measurements, as you explained above, and is the cause of the crash. On the other hand, e.g. GMapping manages to build a map without exhibiting such jumpy behavior.

It is also interesting that the crash is at the same time highly deterministic and random: it either crashes in the middle of the first hallway, or in the second hallway after taking a right turn. Which of the two happens is influenced by seemingly unrelated variables, such as whether you are visualizing the map in rviz, or whether publishing of bag transformations in the offline node (from cartographer-project/cartographer_ros#266) is enabled.

This all suggested to me that there is a numeric anomaly in Cartographer, and that the quality of our data is not the primary cause. So, I started digging around Cartographer and debugging it.

(As a side note on debugging: I removed the check in functions.cmake which disables building Cartographer with CMAKE_BUILD_TYPE=Debug. I still used an optimised Ceres build, though. Built this way, the Cartographer offline node takes around 25 minutes to get through the 60 seconds of the bag up to the place where the crash happens. This sounds completely unusable; however, I used Mozilla's fantastic rr debugger, which can rewind execution back in time. This way, in order to get to the crash, I only had to wait those 25 minutes once, and could then look around and quickly jump between various calls of e.g. AddHorizontalLaserFan using the reverse-continue/reverse-next commands in gdb.)

I have found that the core of the problem is in the local trajectory builder, more specifically, the way it calculates the velocity estimate. Every time AddHorizontalLaserFan is invoked, the following steps are performed, if I have understood everything correctly:

  • call the Predict method, which applies a constant velocity model to form a prediction:

      previous_pose_estimate = pose_estimate_
      pose_estimate_ = pose_estimate_ + delta_t * velocity_estimate_

  • call the Ceres scan matcher with the new pose_estimate_ as the initial guess:

      initial_guess = pose_estimate_
      pose_estimate_ = ScanMatch(initial_guess, occupancy_grid, ...)

    • besides considering the correspondence with the occupancy grid, the objective function prevents the new solution from straying too far from the initial guess
  • update the velocity estimate, effectively this (it is written slightly differently in the source code, but if I have thought it through correctly, it amounts to the same thing):

      velocity_estimate_ = (pose_estimate_ - previous_pose_estimate) / delta_t

A significant problem with this is that no filtering is employed here. When the period delta_t between measurements is small, velocity_estimate_ becomes completely dominated by scan matching noise. In essence, the scan matcher is fighting itself, because this noise becomes part of the initial guess via the Predict method in the next call of AddHorizontalLaserFan, and a feedback loop is established. Because the estimate is unstable, at some point the scan matcher noise prevails and the velocity and pose estimates diverge.
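
To make the 1/delta_t amplification concrete, here is an idealized one-dimensional sketch (an assumption-laden illustration, not Cartographer code: the "scan matcher" here simply returns the true pose plus 5 mm of independent noise and ignores the prior, which leaves out the feedback through the initial guess but already shows how the noise is amplified):

    import numpy as np

    # Idealized 1-D sketch of the update cycle above. A 5 mm matching error at
    # 50 Hz (delta_t = 0.02 s) turns into velocity errors on the order of
    # 0.25 m/s, because the error is divided by delta_t.
    rng = np.random.default_rng(0)
    true_velocity = 0.5          # m/s, roughly the robot speed in the bag
    delta_t = 1.0 / 50.0         # s, laser period
    match_noise_std = 0.005      # m, assumed scan matching noise

    true_pose = 0.0
    pose_estimate_ = 0.0
    velocity_estimate_ = 0.0
    for _ in range(10):
        true_pose += true_velocity * delta_t
        previous_pose_estimate = pose_estimate_
        pose_estimate_ += delta_t * velocity_estimate_                 # Predict
        pose_estimate_ = true_pose + rng.normal(0.0, match_noise_std)  # "ScanMatch"
        velocity_estimate_ = (pose_estimate_ - previous_pose_estimate) / delta_t
        print(f"velocity_estimate_ = {velocity_estimate_:+.2f} m/s")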

Here is a plot of the x component of velocity_estimate_ in the moments before the crash (for the same bag as in the first message). The plot has been cut off where it diverges so you can see the behavior while it still "works".

[plot: cartographer_velocity]

Each point in the plot corresponds to one invocation of AddHorizontalLaserFan, right after velocity_estimate_ is updated. The speed of the robot is constant, almost entirely in the x direction and very close to 0.5 m/s. The orange line is the average of the points from the beginning of the graph to the point of divergence, showing that Cartographer is globally doing a good job (as you can see in the video). Here is the above graph extended to include all the velocity estimates until the crash happens. Notice the y scale.

[plot: cartographer_velocity_divergence]

There is already a binary "use or discard the measurement" motion filter in Cartographer; however, in AddHorizontalLaserFan it only prevents updating the map, not updating velocity_estimate_.

I have manually filtered the bag to keep every third laser scan. Before, there were five laser scan messages for every odometry message (50 Hz vs 10 Hz); decimating the laser scans helped improve the stability of the velocity estimate (shown at the same y-axis scale as the first graph):

[plot: every_third]

Cartographer does not crash now, and builds a nice map:
[map: every_third_map]
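
For reference, the decimation can be done with something like the following rosbag Python API sketch (the topic name /scan and the file names are assumptions, not taken from this bag):

    import rosbag

    # Keep every third message on the laser topic, pass everything else through.
    KEEP_EVERY_NTH = 3
    scan_count = 0
    with rosbag.Bag('pioneer_decimated.bag', 'w') as out_bag:
        for topic, msg, t in rosbag.Bag('pioneer.bag').read_messages():
            if topic == '/scan':
                scan_count += 1
                if (scan_count - 1) % KEEP_EVERY_NTH != 0:
                    continue  # drop two out of every three laser scans
            out_bag.write(topic, msg, t)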

However, I think this is not something the user should have to do; it should be handled automatically. Some thoughts on how this could be fixed:

  • add some kind of filtering/smoothing for the velocity estimate, or use the existing motion filter for this too
  • calculate the time variance of the velocity estimate and take some action to fix it (e.g. at least warn the user if it is too high); maybe also add a unit test. The criterion could be the ratio of the variance to the mean, both calculated within a sliding window
  • maybe average the dropped scans instead of throwing them away; this would be beneficial when using sensors which have higher sample rates and lower precision, whose noise could actually carry information if it is integrated (similar to e.g. pulse width modulation)

Thanks,
Juraj

@wohe wohe reopened this Mar 17, 2017

wohe commented Mar 17, 2017

Hi @ojura,

First I want to thank you for the thorough investigation. As far as I see it, your explanations of what is happening are correct.

I am wondering, though: did you also observe the issue without using odometry? It would be nice to know for certain that it is related purely to the high frequency of scan matches.

Regarding how to fix it, I think your first suggestion of smoothing the velocity estimate is the easiest stop-gap solution. The way the update to velocity_estimate_ is written should make it very easy to implement: you just have to multiply the correction that gets added by a factor 0 < r < 1 and you get an exponential moving average. Do you want to give it a try and add the option? Alternatively I can prepare a PR next week.
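
A minimal sketch of such an update (the names and the value of r are illustrative, not Cartographer's actual code or option names):

    # Scaling the correction by 0 < r < 1 turns the raw difference quotient
    # into an exponential moving average of the instantaneous velocities.
    def update_velocity_estimate(velocity_estimate, pose_estimate,
                                 previous_pose_estimate, delta_t, r=0.3):
        instantaneous_velocity = (pose_estimate - previous_pose_estimate) / delta_t
        return velocity_estimate + r * (instantaneous_velocity - velocity_estimate)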

Now to the reason I consider this a stop-gap solution. Looking at each revolution of the LIDAR independently is certainly convenient to implement, but there is nothing forcing us to do it, and it might not give the best result. Instead, in the scan matching step we could look at the data as individual returns (or groups of returns), each taken at a known point in time. If we do this, we can decide which time interval we want to look at, use data from multiple revolutions or even multiple sensors, and unwarp each revolution based on how the LIDAR moved while scanning. This approach is used in the 3D SLAM implementation, and I would like the 2D code to move in this direction. This is a larger change, of course.
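
As a rough illustration of the unwarping idea, assuming a constant velocity over one revolution (an illustration only, not the 3D implementation):

    import numpy as np

    # time_offsets are the per-return times relative to the end of the
    # revolution (<= 0 for earlier returns); v_xy and omega are the estimated
    # linear and angular velocities of the sensor.
    def unwarp_revolution(ranges, angles, time_offsets, v_xy, omega):
        points = []
        for r, a, dt in zip(ranges, angles, time_offsets):
            theta = omega * dt                   # sensor rotation at this return
            tx, ty = v_xy[0] * dt, v_xy[1] * dt  # sensor translation at this return
            x, y = r * np.cos(a), r * np.sin(a)  # return in the sensor frame
            # Move the return into the common frame at the end of the revolution.
            points.append((np.cos(theta) * x - np.sin(theta) * y + tx,
                           np.sin(theta) * x + np.cos(theta) * y + ty))
        return np.array(points)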

ojura commented Mar 17, 2017

Hi,

I have not observed the issue without using odometry. But, to be honest, I have not experimented much without odometry (as you suggested above, e.g. turning on the correlative scan matcher; it noticeably slowed down Cartographer on my machine, did not perform as well, and I was confident that the odometry was fine :-) )

It seems that this specific scenario, when using odometry, just happened to step into the region where the velocity estimate diverges, and when it does, it diverges in a dramatic fashion. I think this slipped under your radar because it does not cause visible effects otherwise (even up until the crash in our bag, you could say that Cartographer is working fine). The velocity estimate is quite noisy even in the Deutsches Museum dataset, just not noisy enough to be unstable:

[plot: velocity estimate on the Deutsches Museum dataset]

I believe that the combination of somewhat lower LIDAR precision and a higher sampling rate is the primary factor that pushed us into the unstable region, and that the use of odometry was coincidental. However, I cannot tell you that I am 100% certain it is not related to odometry at all; the best I can give you is strong suspicion :-)

I have already tried a simple exponential filter, exactly as you mentioned, and could not get a satisfactory result with my bag. With r around 0.1-0.3, we manage to tame the scan matching noise in the velocity estimate, but we also stop trusting odometry. This happens because the odometry correction reaches the pose estimate only through scan matching, and only in the first scan match after an odometry message arrives; for all the following scan matches, the odometry correction is an identity transformation (until the next odometry message arrives). With r this low (0.1-0.3), the odometry correction enters the velocity estimate with importance f_odom / f_laser * r: in my case, the odometry correction is applied only in one out of five scan matches, and its effect on the velocity estimate is then reduced further by r = 0.1-0.3. Effectively, this means that the pose estimate stops moving when the hallway becomes monotonic and scan matching concludes that the robot is stationary.
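
To put numbers on that, using the rates from this bag and an illustrative r:

    # Effective odometry weight per laser period, per the formula above.
    f_laser, f_odom, r = 50.0, 10.0, 0.2
    effective_weight = f_odom / f_laser * r
    print(effective_weight)  # 0.04: odometry enters the velocity EMA with ~4%
                             # weight, and that contribution is further scaled
                             # by (1 - r) on each following scan-only update.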

Raising r to 0.5 is not enough to start trusting odometry in these monotonic hallway situations, but it already brings us into unstable behavior where the pose estimate starts wildly jumping around (although it does not crash yet, because we do not jump out of the map yet).

I have also tried changing the scan matching objective function parameters. I thought that lowering the translation_weight in the trajectory builder's scan matcher options would improve stability, but I was fascinated to find that I could not get Cartographer's behavior to budge even a little by modifying that or other scan matching parameters. It did not matter how insanely large or small the values I put in there were; it always behaved the same. Only when I went absolutely overboard, e.g. 1e30 or larger for the occupancy grid correspondence weight, did Ceres start complaining about some matrix having too low a rank.

The only way I managed to get a satisfactory result with this bag was to reduce the frequency of the laser scans by a factor of 3. This way, odometry messages arrive more often between scan matches, which also has a stabilizing effect (besides the increased delta_t). With a factor of 2, it was still unstable, and higher factors such as 4 or 5 did not work as well. I think that could be fixed with further parameter tuning, but I did not bother because I was happy with the performance when the laser frequency is in the sweet spot of 50 / 3 Hz.

If we take the route of adding an exponential filter to the velocity estimate, I think that we need to have a separate velocity estimate for odometry (and perhaps for the IMU tracker as well), which would perhaps have its own smoothing factor, and then we need to combine the separate velocity estimates somehow.

Thanks for reading all of this :-)
Juraj

By the way, could you explain what you mean by "return" in the last paragraph? Is it the pose change from one scan to another, and by "group of returns", did you mean a chain of pose changes from a series of multiple scans?

ojura commented Mar 20, 2017

I will add another thing to the wish list here :-)

As far as I have seen, the covariance of the pose estimate obtained by scan matching is not used during the prediction step, where the constant velocity motion model is applied (although this covariance is used in building the sparse pose graph).

It would be cool if we could automatically discriminate between, e.g., a monotonic hallway, where we should trust the odometry speed, and wheel slip, where the odometry is incorrect and we should not trust it. In the first case, the scan matcher would conclude that the covariance of the pose is large in the direction of the hallway and would place greater confidence in the motion model driven by the odometry speed. In the second case, if the scan matcher is confident enough to conclude that we are in fact stationary, it could discard the incorrect information from odometry. (Of course, if we get wheel slip in a monotonic hallway, then we're really unlucky and can only hope that the global pose optimiser will cope with it.)
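
As a rough sketch of what such covariance-based weighting could look like (hypothetical names, not an existing Cartographer API):

    import numpy as np

    # Per axis, weight the scan matcher's velocity and the odometry velocity by
    # the inverse of their estimated variances. In a featureless hallway the
    # scan-match variance along the hallway axis is large, so odometry dominates
    # along that axis; during wheel slip a slip detector would instead inflate
    # the odometry variance.
    def blend_velocities(v_scan, v_odom, scan_variance, odom_variance):
        w_scan = 1.0 / np.asarray(scan_variance)
        w_odom = 1.0 / np.asarray(odom_variance)
        return (w_scan * np.asarray(v_scan) + w_odom * np.asarray(v_odom)) / (w_scan + w_odom)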

Entropy512 commented:

@ojura Since you have an LMS100: have you tried changing the scan rate to 25 Hz, which allows 0.25 degree angular resolution instead of 0.5 degree? Note that you need to make some changes to the current lms1xx driver node to handle this; see clearpathrobotics/LMS1xx#26 for some discussion. (In my case, I removed the scan rate check and switched everything that used outputRange.angleResolution to cfg.angleResolution in LMS1xx_node.cpp, as my LMS100 seems to lie about the angular resolution in its response to the output range query. I'm still investigating this, so it will be a while before I make any sort of pull request, as what I have may prove to be a dirty hack.)

@wohe When you say the data from ojura's scanner was of low quality: was the issue noise in the range estimates, or too low an angular resolution?

ojura commented Apr 21, 2017

Hi @Entropy512! I haven't tried that, but thanks for the suggestion! That might yield higher quality maps.

For now, I'm mostly concerned with the existing bags that I have, in which the scan rate is 50 Hz, and the solution I am currently using is scan message decimation (you can cherry-pick the commits from #225, where I added this, if you need it). @wohe is right: the real solution is to make the 2D local trajectory builder more sophisticated, like the 3D trajectory builder, and a better velocity estimate will follow naturally. If I find the time, I will maybe take that on.

ojura commented Sep 6, 2017

I believe this can be considered fixed, so I will close it. Since the new pose extrapolator got merged, bags with the 50 Hz laser do not cause the velocity estimate to diverge any more.
