
[rplidar users] Potential errors with drivers + 360 lidar support #198

Closed
saschroeder opened this issue Apr 24, 2020 · 53 comments · Fixed by #326

Comments

@saschroeder

Required Info:

  • Operating System:
    • Ubuntu 16.04
  • Installation type:
    • from source
  • ROS Version
    • ROS Kinetic
  • Version or commit hash:
    • kinetic-devel, latest commit (9b14cf8)

Steps to reproduce issue

I created a minimal repo with bags, config and launch files:
https://github.com/saschroeder/turtlebot_slam_testing

The bug always occurs with rosbags and sometimes in simulation, too. Disabling loop closure prevents the bug.
I can upload more rosbags if necessary.

Expected behavior

Record an accurate map and correctly use loop closure to improve the map.

Actual behavior

After some time the map is rotated by around 180° (which may or may not be related), and some time after this (seconds to minutes) the pose graph is optimized in a wrong way, causing the map to break (see images below).

Additional information

Example 1 (before/after optimization):
ex1_before_optimization
ex1_after_optimization

Example 2 (before/ after optimization):
ex2_before_optimization
ex2_after_optimization

@SteveMacenski
Owner

SteveMacenski commented Apr 24, 2020

First off, I recommend the Melodic branch. It has a bunch more features and is better structured. It will work in Kinetic.

Second, it would be helpful to provide me with a bag file I can use to reproduce.

My steps to reproduce:

  1. Downloaded your bag file
  2. Ran
rosbag filter bag_name.bag output.bag 'topic != "/tf" or topic == "/tf" and m.transforms[0].header.frame_id != "map" and m.transforms[0].child_frame_id != "odom"'

on the bag file, to remove the transformations you recorded from your SLAM/localization runs, so I was just left with the raw data to provide my own.

  3. Ran roslaunch slam_toolbox online_sync.launch with a roscore and sim time.

So make sure you see this error when following those steps on your machines.

Third, how did you install Ceres? I've heard of similar issues that were the result of a dirty Ceres install that comes with Google Cartographer. If you've installed Google Cartographer, you will need to remove that Ceres install and install it again via rosdep. The Ceres that ships with Cartographer is only suitable for Cartographer. This thread #167 may be helpful.

@saschroeder
Author

saschroeder commented Apr 25, 2020

Thanks for the quick response.

  1. Sounds good, I will have a look at it.

  2. I did literally the same, except that I used the launch file provided in the repo (https://github.com/saschroeder/turtlebot_slam_testing) which loads a slightly different config file.
    Just in case, I reset all parameters to match the config file used by online_sync.launch (except for laser_frame and max_laser_range) and tested again. In fact it took much longer for the error to occur (~10min), but it still did.

  3. I installed Ceres via apt (libceres-dev, libceres1). Cartographer is not and was never installed.
    However, after reading Map making error #167 I'm wondering if I have a similar issue and need to invert the laser.
    I'm going to test this on Monday and give you feedback.

@saschroeder
Author

Update:

Nothing wrong with the laser frame.

I did some further testing, all with the same rosbag and the steps you described above. I tested both the sync/async nodes with the default configs (adapted laser frame and range).

  1. Previous system (Ubuntu 16.04, ROS Kinetic, libceres-dev 1.12.0)
  • purged and installed Ceres via rosdep
  • kinetic-devel and melodic-devel branch
  2. Clean ROS Melodic install (Ubuntu 18.04, libceres-dev 1.13.0)
  • Ceres via rosdep
  • melodic-devel branch

The error occurred in every case.
Are you using one of these Ceres versions @SteveMacenski ?

@SteveMacenski
Owner

I have 1.13.0+dfsg0-2 installed via rosdep.

Please post a bag file and I can take a look. Include your tf and tf static and laser scan. Specify how your laser is mounted on the robot. Also specify how you've installed this.

@saschroeder
Author

This is the robot:
photo5447159618695441923
The laser is an RPLIDAR A2M8. It's installed with the cable pointing to the robot's back.

This is my launch file for the lidar:
https://github.com/saschroeder/rplidar_ros/blob/master/launch/rplidar.launch

This is the complete rosbag (previously cut to fit on github):
https://drive.google.com/file/d/1Kly5dnjB9VTyjvI6uO32_Yzn-tEqmYA_/view?usp=sharing

@SteveMacenski
Owner

SteveMacenski commented May 2, 2020

What is angle_compensate? I can't seem to find any documentation about what it does, and the logic in the rplidar code isn't simple enough for me to understand at a glance.

Do you see any logging / errors from SLAM Toolbox or anything else when running?

image

Your TF tree also looks weird: why is your laser frame oriented backwards relative to the robot? Your wheel encoder TF also updates insanely slowly, it looks like only 2-3 Hz (non-scientifically, from looking). I don't know if that's related, but that's far too slow for quality autonomous navigation. You should be getting it at 50 Hz with the Kobuki if memory serves.

image

I'm getting past where you had issues in your first example images, no issue.

image

Also made it past your second issue example, no problem.

2

I see a good loop closure here.

I got through the loop closure of the main corridor and a bit more and finally had it happen around 610 seconds into the bag. And I get that pretty deterministically from the 3 times I ran in a row.

3

I reset all parameters to match the config file used by online_sync.launch (except for laser_frame and max_laser_range)

Where do you see a laser_frame param? That's ancient. You shouldn't even know about that - are you using Melodic devel like I suggested? You shouldn't need to set any laser frame.

Do you reset odometry or otherwise do anything that affects TF / laser scans in your stack? Is the IMU acting up? Anything? I've mapped spaces 10x larger than this on a regular basis and never had an issue. This is unique, and the other users that have reported similar things (if you check the closed tickets) all turned out to have messed-up URDFs or inverted lasers or something wacky. I also don't have any robots with 360° lasers, so perhaps there's something there to look at; there were a couple of PRs to enable 360° lidars but maybe something got messed up.

My recommendation would be to first critically analyze your URDF, frames, odom, and transformations, since that seems to fix things for all the users who have reported similar issues so far. My second recommendation would be to use laser_filters to turn your 360° lidar into a 270° lidar and feed that scan into SLAM Toolbox as a test. If there are no issues, then it's something 360-lidar related. If it still happens, then it's robot related.
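
If it helps anyone trying this offline, the same crop can be applied directly to the bags in this thread rather than running laser_filters live. This is only a minimal sketch, assuming the rear of the scan sits at the two ends of the ranges array (angle_min/angle_max near ±180°, as with the A2M8 bags here); the file names are placeholders:

import rosbag

FOV_DEG = 270  # keep the front-facing 270 degrees, drop the rear 90

bagin = rosbag.Bag('input.bag')
bagout = rosbag.Bag('cropped.bag', 'w')

for topic, msg, t in bagin.read_messages():
    if topic == '/scan':
        n = len(msg.ranges)
        keep = int(round(n * FOV_DEG / 360.0))
        cut = (n - keep) // 2
        # trim the same amount off both ends so the kept arc stays centered on the front
        msg.ranges = msg.ranges[cut:n - cut]
        if msg.intensities:
            msg.intensities = msg.intensities[cut:n - cut]
        msg.angle_min += cut * msg.angle_increment
        msg.angle_max -= cut * msg.angle_increment
    bagout.write(topic, msg, t)

bagout.close()
bagin.close()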

I ran 2 much larger bags that I had lying around while I was out for dinner, and I confirmed that mapping a much larger space over the course of several hours still works fine as a sanity check. Both of these bags were collected with different robots and different lasers from different manufacturers, and each exceeded 2 hours in length.

@tik0

tik0 commented May 3, 2020

I am using vanilla Ubuntu 16.04/ROS Kinetic and slam_toolbox on melodic-devel as of today. I know the building and it is literally a glasshouse. I've inspected the bag topics and transforms, and everything looks OK: /odom publishes at 20 Hz and the "inverted" frame rplidar_frame is correctly in place. My first guess is that the solver is not optimally configured for this recording, as broken loop closures happen only in very particular situations. However, I ran

$ rosbag filter part1.bag output.bag 'topic != "/tf" or topic == "/tf" and m.transforms[0].header.frame_id != "map" and m.transforms[0].child_frame_id != "odom"'
$ roscore
$ rosparam set use_sim_time true
$ rosbag play --clock output.bag
$ roslaunch slam_toolbox online_sync.launch

If I use vanilla online_sync.launch with ceres_loss_function=None (aka squared loss) I get the same reproducible result after ~600 seconds:

Screenshot from 2020-05-03 20-16-59

If I change ceres_loss_function=HuberLoss (which is more robust to outliers), it looks quite good:

Screenshot from 2020-05-03 20-10-48

@saschroeder and @SteveMacenski can you confirm this?
@saschroeder can you elaborate on this behaviour on other bags?

@SteveMacenski
Owner

SteveMacenski commented May 4, 2020

That is really, really interesting. Is the area where the error happens also made of glass? I could see how this could happen if it gets some scans that look flipped from reflections off the glass; then, when it looks to loop close, it matches those warped scans, and when Ceres runs it has some internal conflicts.

A different loss function would help with that. I had not included one before because I felt that each "measurement" was supposed to be "legit" and so we shouldn't remove some outliers. However, with glass or mirrors, you're right in thinking that would totally break my assumption that they're all "legit".

I would be interested to hear if that fixes your issues in other bag files as well. If so, that seems to be a completely valid solution to the problem.

I'd appreciate it if you submitted a PR to mention that behavior in the README.md file, so future folks can learn from our experience that in environments with glass/mirrors we should consider using a loss function. Thanks for the debugging @tik0. I also believe in trying to give folks a really excellent out-of-box experience, so I would also be OK with that PR changing the default to HuberLoss. It may also be useful to expose the loss function weight (currently hardcoded to 0.7, which I found generally good while developing) so that you can tune this as well, since that capability seems necessary for your requirements.
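
For anyone not familiar with it, this is the classical Huber function the option refers to; just a rough sketch (Ceres applies its HuberLoss to the squared residual norm, but the idea is the same):

def huber(r, delta=0.7):
    # quadratic near zero, linear beyond delta, so large outlier residuals
    # stop dominating the optimization
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)

# with delta = 0.7 a residual of 5 contributes ~3.3 instead of 12.5;
# a larger delta treats more residuals as "legit" (closer to the squared loss)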

I can confirm that the loss function fixes it for me too. I stopped around 1000 seconds because it seemed fine and I needed my CPU back.

PS: I'm reasonably happy with the performance of the RP lidar in this environment / dataset. That wasn't expected.

@tik0

tik0 commented May 6, 2020

I'd like to add some more information to keep this topic alive. I managed to get very stable results on 2020-04-08-11-01-32.bag (both cleaned-up packages available here) after tuning the optimizer's parameters one after another, as follows:

Change in mapper_params_online_sync.yaml
ceres_loss_function: NONE
to
ceres_loss_function: HuberLoss
stabilizes big loop-closure in the early stage of recording

Change in mapper_params_online_sync.yaml
ceres_trust_strategy: LEVENBERG_MARQUARDT
to
ceres_trust_strategy: DOGLEG
stabilizes smaller loop-closures in the later stage of recording

Change in solvers/ceres_solver.cpp
loss_function_ = new ceres::HuberLoss(0.7);
to
loss_function_ = new ceres::HuberLoss(50.0);
stabilizes very small loop-closures in the later stage of recording
(Yes, 50. I first thought that HuberLoss might then be replaceable by NONE, but this is not the case. It might be very beneficial to see a distribution to find optimal values. Values over 70 let the graph collide in loop closures.)

Change in mapper_params_online_sync.yaml
scan_buffer_size: 10
to
scan_buffer_size: 20
stabilizes some medium-size loop-closures in the later stage of recording (values like 30 still managed the loop closures, but caused misaligned walls)

Maybe @SteveMacenski can tell a story about the parameters behind my empirical observations?
However, I've made the most intriguing observation while playing with the bag's replay speed. As long as I keep the CPU load below the number of real cores, everything looks perfect and Ceres never taints a loop closure. As soon as I exceed the CPU capacity with too-high replay speeds, Ceres messes up loop closures very frequently, as seen in the posts above.

@SteveMacenski
Owner

ceres_loss_function: HuberLoss

I think we've talked about this one already so I'll skip

ceres_trust_strategy: LEVENBERG_MARQUARDT

I'm really surprised you had to change that. I don't have a good answer off the cuff on that. Are you sure after changing all the other things that this actually made a beneficial impact?

loss_function_ = new ceres::HuberLoss(50.0);

Like I mentioned above, I really didn't expect this to need to be used. I work with mostly professional long range lidars so this is not much of an issue. As you mention, there could be some benefits to adding this for mirrors/glass/cheap(?) lidars. Can you please submit a PR exposing this parameter (the value to assign to the loss function)? This makes sense to me as something to optimize if you have graph closure problems from outliers. Higher here will mean more rigor applied to each point about whether it really fits the graph.

scan_buffer_size: 20

From the description

    "Scan buffer size is the length of the scan chain stored for scan "
    "matching. \"ScanBufferSize\" should be set to approximately "
    "\"ScanBufferMaximumScanDistance\" / \"MinimumTravelDistance\". The "
    "idea is to get an area approximately 20 meters long for scan "
    "matching. For example, if we add scans every MinimumTravelDistance == "
    "0.3 meters, then \"scanBufferSize\" should be 20 / 0.3 = 67.)",

This lets you have a large window of memory to match against, but what this also does (and what I think you see happening) is affect the minimum chain size for what is considered a loop closure. E.g. if this were 5, on scan 6 you could "loop close" against scan 1. If this is 100, you cannot loop close against scans 1-99 on scan 100 since they're all chained together. Making this larger will effectively disable smaller loop closures, which could be beneficial to reduce the number of instances of this problem you see.
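
To make the sizing rule in that description concrete, a quick back-of-the-envelope check (the numbers are just the ones from the quoted text, not recommendations):

scan_buffer_maximum_scan_distance = 20.0  # meters of chain kept for matching
minimum_travel_distance = 0.3             # meters travelled between stored scans

scan_buffer_size = scan_buffer_maximum_scan_distance / minimum_travel_distance
print(round(scan_buffer_size))  # ~67, per the quoted description
# with scan_buffer_size = 20 (the value above) the matching window is only ~6 m,
# and no loop closure can happen against the 20 most recent scans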

@saschroeder
Author

I tested the parameters proposed by @tik0 on my laptop, which worked well as long as I played the bags at normal speed, even though the CPU load was temporarily higher than the number of real cores.
However, I'm still having issues with the bag 04-27-a.bag (https://drive.google.com/open?id=1BxGWGJKLLBVZoQjnzj2JAQtSkixL-gM-)

I already did some testing based on @tik0's first comment and found that the HuberLoss function does have an interesting effect on the mapping behavior:

  • I still get the initial 180 degree rotation of the map in every run (due to a 180 degree jump in the map to odom transform)
  • besides this, the behavior is less deterministic (there are distinct areas where a loop closure might break the map, but it's possible to have a run without any issues)
  • these loop closures don't necessarily distort the map as we've experienced before, but the transform from map to odom is suddenly jumping by 180 degrees (+translation), usually causing the mapping to continue with a 180 degree offset

I uploaded some gifs to illustrate this:
https://drive.google.com/open?id=1BxGWGJKLLBVZoQjnzj2JAQtSkixL-gM-

  • turn1.gif shows the initial 180 degree turn, which doesn't break anything because the whole map is adjusted
  • turn2.gif shows the next turn, which "breaks" the map
  • turn3.gif another loop closure fixes everything
  • turn4.gif next turn, breaks the map again (mapping continues, but no more turns so it's broken in the end)
    I'm not sure if there's a pattern of repetitive breaks and fixes or if I was just lucky in that case.

Seeing all these 180 degree turns and considering that I can reproduce this in simulation, I don't think this is about an unreliable sensor/reflections, but rather a systematic error.
So, why does this problem occur with my setup and not with @SteveMacenski's?
Option 1: 360 degree lidar
Option 2: The fact that my lidar frame is oriented backwards. This is absolutely correct considering the frame definition from the rplidar_ros package and the lidar installation, but probably different from your setup @SteveMacenski and thus not tested?

I applied a laser filter to test with a range of 270, 230 and 180 degrees. For 270 I get the same initial rotation, but not with 230 and 180 degrees. I stopped the bag (2020-03-27-...) after several minutes as the map quality got really bad.

I used the gazebo sim for some quick testing. I set the lidar frame's z-axis rotation to 0, 90, 135 and 180 degrees compared to base_link and tried to reproduce the loop closures with 180° rotation.
For a 180° lidar rotation (as before) I could easily trigger these (the first is fine, the second breaks the map...). For 0°, no strange rotations, but plenty of good loop closures. For 90° and 135° I could trigger one map rotation each, but this time by about 90° and 135°. If that isn't suspicious...

I will do some more tests with the lidar frame facing forward (possibly on the robot) to verify that it solves the problem.

@tik0

tik0 commented May 7, 2020

I've written a small script to flip the tf by 180° and the laser scans by -180° inside the bags, so I am effectively doing nothing to the data. However, after this, everything works perfectly even with the standard parameters. I can also push the CPU to even higher loads after the flip and slam_toolbox still does its job.

There might be a real issue in slam_toolbox if the laser frame is not co-aligned with the base_link frame. This shouldn't actually be an issue if all transforms are defined correctly. But, in fact, it is.

import rosbag
from sensor_msgs.msg import LaserScan
from tf2_msgs.msg import TFMessage

baginname = '2020-03-27-13-47-13_filtered.bag'
bagoutname = '2020-03-27-13-47-13_filtered_flipped.bag'

bagout = rosbag.Bag(bagoutname, 'w')
bagin = rosbag.Bag(baginname)


for topic, msg, t in bagin.read_messages():
    if topic == '/scan':
        # flip all scans by 180° (assumes 360 readings at 1° increments, as in these bags)
        msg.ranges = msg.ranges[180:] + msg.ranges[0:180]
        msg.intensities = msg.intensities[180:] + msg.intensities[0:180]
    elif topic == '/tf_static':
        for tf in msg.transforms:
            if tf.child_frame_id == 'rplidar_link' and tf.header.frame_id == 'base_link':
                # was 0 0 1 0 before (180° about z); set the identity quaternion to undo that rotation
                tf.transform.rotation.x = 0.0
                tf.transform.rotation.y = 0.0
                tf.transform.rotation.z = 0.0
                tf.transform.rotation.w = 1.0
    bagout.write(topic, msg, t)

bagout.close()
bagin.close()

@SteveMacenski
Owner

There might be a real issue in slam_toolbox if the laser frame is not co-aligned with the base_link frame

I take your bet and raise you a laser frame yaw parameterization

https://github.com/SteveMacenski/slam_toolbox/blob/eloquent-devel/src/laser_utils.cpp#L95-L127

@tik0

tik0 commented May 8, 2020

I printed all variables in the parametrization and they look OK in all cases. However, I've found another minor issue regarding the detection of whether the scanner is 360°. I don't know if this holds for all 360° lidars, but for the RPLidar A2M8 from the bags the following code changes should be considered:

The scanner has a valid 360° configuration and publishes as follows:
angle_min: -3.124139 (which is -180° + angle_increment)
angle_max: 3.141539 (which is the last reading at 180° that completes the circle)
angle_increment: 0.017453 (which is 1°)
len(ranges) = 360

Therefore, the condition in the parameterization always fails because of std::fabs(scan_.angle_min + M_PI) < 1e-3. One should rather check whether the residual is less than the increment. Something like

std::fabs(std::fabs(scan_.angle_max - scan_.angle_min - 2. * M_PI) - scan_.angle_increment) < 1e-3

Furthermore, if the scanner is then correctly detected as a 360° scanner, this condition shows faulty behavior because of the later calculation of the number of scans. The residual should always be 1, because the calculation only counts the spaces between the scans, even in the 360° case. However, fixing this has no impact on the erroneous behavior from @saschroeder.
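
For what it's worth, plugging the A2M8 numbers quoted above into both conditions (a throwaway check, not the package's code) shows the old test failing and the proposed one passing:

import math

angle_min = -3.124139
angle_max = 3.141539
angle_increment = 0.017453

old_check = math.fabs(angle_min + math.pi) < 1e-3
new_check = math.fabs(math.fabs(angle_max - angle_min - 2.0 * math.pi) - angle_increment) < 1e-3
print(old_check, new_check)  # False True -> only the new test detects the scanner as 360 degrees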

@SteveMacenski as the scan matcher always seems to work, I assume that maybe the orientation of the lidar is not respected when Ceres does the loop closure, because the faulty behavior only occurs in these cases. @SteveMacenski any ideas on that?

@SteveMacenski
Owner

Therefore, the condition in the parameterization always fails because of std::fabs(scan_.angle_min + M_PI) < 1e-3. One should rather check whether the residual is less than the increment. Something like

Please submit a PR and I'll merge that. Looks good to me. That was a user-submitted issue with an RPLidar, and apparently it didn't totally work out.

@SteveMacenski as the scan matcher always seems to work, I assume that maybe the orientation of the lidar is not respected when Ceres does the loop closure, because the faulty behavior only occurs in these cases. @SteveMacenski any ideas on that?

I'm fairly certain it does; you can test yourself with the magazino bags here: https://google-cartographer-ros.readthedocs.io/en/latest/data.html#id78. Their robot has 2 laser scanners at 90-degree angles from the base link and it builds fine. I used these bags in developing the (still incomplete) multi-robot synchronous distributed mapping features (so... damn... close... just need someone to help me troubleshoot some optimizer errors).

@saschroeder
Author

I ended up turning the lidar on my robot as this was the fastest way to fix it. As described by @tik0, everything works fine now. I think the overall performance has improved, too.

Out of curiosity I tested the hallway_return.bag from the Cartographer docs with scan_front and scan_rear each, and the mapping does look strange to me. Though the map itself is built nicely, you can see in rviz that the laser scans are misaligned to the map, and I see the same characteristic map rotation that I described in my previous comments shortly after the robot turns back in the hallway. To me this is a strong indicator that there might be more issues. I guess the bag is just too short or the map too simple to expose more of them.
You could also use @tik0's script to shift the laser frames in your own bags and try to reproduce the issues with those.

@SteveMacenski SteveMacenski changed the title from "Graph optimization error" to "Potential errors in lidars not facing forward in optimizer" May 14, 2020
@SteveMacenski
Owner

SteveMacenski commented May 14, 2020

Hi,

I renamed the ticket to reflect that (mostly because the original name was nondescript).

This is an interesting topic, and maybe there is some issue. @saschroeder @tik0 do either of you see where that laser utils code for handling the angle may be wrong? Or, tracing back into Karto-land, where that may not be properly utilized? I'd love to help fix this, but without dropping everything else to try to debug this, I'm not sure I can. If we can at least identify an area that might be wrong, that's something I can sit down and attack.

I think the overall performance has improved, too.

Can you qualify that a little? (or quantify, though I think probably not :-) )

... and I see the same characteristic map rotation

I'd also like a description of what "same" means here, particularly pertaining to the behaviors you see in this bag that you also saw in your setup. We might be able to get at something here.

Keep in mind that that bag has TF built into it. That offset, I think, is because you didn't use the rosbag filter command I showed above to remove the existing TF odom->map transforms. I've seen that as well, but as a result of not conditioning the bag for "raw" data without SLAM or localization TF.

@joekeo

joekeo commented May 28, 2020

I think this is the same issue I was having in:
#167
I ended up playing a lot with the URDF files to modify the lidar pose and match the IMU to it.
I have now come back to this, so I will post some updates with my own robot.

@joekeo

joekeo commented May 28, 2020

We are using the RPLidar S1 on some of our robots, and for some reason it provides the laser scan rotated by 180 degrees.
When running slam_toolbox with loop_closure set to true, the map gets flipped 180 degrees and then, after a while, it becomes very inaccurate.
I just flipped the lidar 180 degrees on the robot (physically, along with the URDF) and now slam_toolbox works properly; loop closure is stable.

@SteveMacenski
Owner

SteveMacenski commented May 28, 2020

I just flipped the lidar 180 degrees in the robot

Mhm... that also makes me suspect that maybe the laser scans are backwards (though logic would say flipping both 180 should be identical). There are standards about how the laser scan messages should be formulated (e.g. sweep direction). If flipping it works, maybe that relates?

The annoying thing is that this is only reported on RP-Lidars and by no one else. Their ROS driver is really bad. It sounds like this may be the result of multiple issues in the drivers, not just the frame. I'm not sure what I can do here until someone with the sensor finds what's wrong with either the driver or this package regarding it, or someone with another (non-RP-lidar or derivative) 360° sensor verifies it happens as well. This isn't something I can debug without a robot equipped with this sensor in front of me. I'd be more than happy to help whoever is willing to work on it in any way I can, including explaining things, looking at datasets, and seeing if I can find the issue, etc. @tik0 was pretty on point in figuring out the other issues, maybe they're interested?

@joekeo

joekeo commented May 29, 2020

I will dig into the RPLidar S1 part; I will try to fix the frame issue and see if that sorts it.

@SteveMacenski
Owner

If you need justification to upstream that fix, REP-103 has your back: https://www.ros.org/reps/rep-0103.html. RP-lidar should be in compliance with it.

@chrisl8
Contributor

chrisl8 commented Jul 20, 2020

I also have an RPLIDAR. I also have the cord facing the back of the robot. I am also seeing this same behavior.

  1. Is flipping my RPLIDAR around on the robot to face the cord forward the best solution? Or is that only going to partially help?

  2. Is there anything I can add in the way of testing, uploads, etc. that could help anybody else here to find the solution?

Unfortunately I am somewhat of a novice at this stuff along with this being a hobby project so I don't have the time or skills to dig into the RPLIDAR driver myself.

I don't hold out a lot of hope at the moment that Slamtec will fix their driver. The last responses I see from the maintainer to issues on that repo are from 2018.
I did do some diving into the "Network Graph" and found a few forks that look interesting. I may try some of them out to see if they behave differently. I can post more info here, but I didn't know if this was the right place to have that conversation or not.

@joekeo

joekeo commented Jul 20, 2020

Hello, we solved this issue by modifying the rplidar_ros package: we made a fork of it, modified how the laser scan is read, and exposed the change via a parameter that can be set in the launch file.
The fork is here:
https://github.com/Ross-Robotics/rplidar_ros

and for it to do the 'flipping' you need to set the parameter rotate_scan to true

This is what our launch file looks like:

<launch>
    <arg name="frame_id" default="horizontal_laser_link"/>
    <arg name="serial_port" default="/dev/RP_LiDAR"/>
    <arg name="remap_as_raw" default="false"/>

    <node name="rplidarNode" pkg="rplidar_ros" type="rplidarNode" output="screen">
        <param name="serial_port" type="string" value="$(arg serial_port)"/>
        <param name="serial_baudrate" type="int" value="256000"/>
        <param name="frame_id" type="string" value="$(arg frame_id)"/>
        <param name="inverted" type="bool" value="false"/>
        <param name="angle_compensate" type="bool" value="true"/>
        <param name="scan_mode" type="string" value="Standard"/>
        <remap from="/scan" to="/scan_raw" if="$(arg remap_as_raw)"/>
        <param name="rotate_scan" type="bool" value="true"/>
    </node>
</launch>

Clone the repository into your workspace, run catkin clean then catkin build, then source it (source devel/setup.bash) and it should work. Let me know if it works for you.

@SteveMacenski SteveMacenski pinned this issue Jul 20, 2020
@chrisl8
Contributor

chrisl8 commented Jul 25, 2020

I tried the patch recommended by @joekeo and I think it changed things for me, but didn't fix them.
Before, the entire map would rotate and offset from the obvious position of the world based on the scan data.

Now, instead of rotating, it just shifts.

It always happens though after 10 to 15 minutes of running.

Unfortunately I don't have a bag file for you, but I will work on that. What I do have is an image.
I run the robot back and forth over this area repeatedly using 2D Nav Goal input.
Despite the messy environment, it maps very well and navigates very well using teb_local_planner.
However, as I said, after about 10 to 15 minutes, the map offsets from the obvious scan position and navigation then becomes impossible.
Screenshot from 2020-07-25 02-35-01

It is obvious in the image that it is offset in one direction significantly.

This was harder to see before when the map would rotate instead of only shift.

I may just try rotating the scanner on the robot too, to eliminate needing to perform code tricks to deal with this, although I'm also concerned this may not be the issue.
Already it was thought by some to be a bad URDF file and a bad Cartographer install, both of which turned out to be false.

I appreciate any input you may have for how I can debug this further, and I understand that I haven't given you enough information to actually diagnose this yet. I wanted to share my findings before I forgot though.

Update: It still rotates too. My issue might not even be related to this?
Interestingly on the last run, it shifted almost twice the width of the map, but then recovered after a while, and then later rotated about 90 degrees and could not recover.
I did make a bag file. I will take a look at it and see what I can discover.
Screenshot from 2020-07-25 03-05-53

Update: What happens at each of these points (pictured above) is that the transform from map to odom suddenly becomes all zeros. So the "offset" is whatever the accumulated odometry drift has been since the start, which is why it seems somewhat "random" between runs and is sometimes highly rotated and other times just offset.

---
transforms:
  -
    header:
      seq: 0
      stamp:
        secs: 1595664325
        nsecs: 766761376
      frame_id: "map"
    child_frame_id: "odom"
    transform:
      translation:
        x: -2.3483314927042844
        y: -4.429779868765097
        z: 0.0
      rotation:
        x: 0.0
        y: 0.0
        z: -0.5969529408360271
        w: 0.8022762531866555
---
transforms:
  -
    header:
      seq: 0
      stamp:
        secs: 1595664325
        nsecs: 846779109
      frame_id: "map"
    child_frame_id: "odom"
    transform:
      translation:
        x: -0.0
        y: -0.0
        z: -0.0
      rotation:
        x: 0.0
        y: 0.0
        z: 0.0
        w: 1.0
---

Update: It is 100% repeatable and consistent with the bag file.
I am using the Melodic ROS Slam Toolbox package on the robot and the Noetic package on my laptop, with the same results.
The package version says 1.5.4, so I believe it includes the 360 lidar changes from January.

@chrisl8
Contributor

chrisl8 commented Jul 27, 2020

Update with bag file, config, and video showing what happens and when.
SLAM Failure 2020-07-27 16-35
I am using ROS Noetic installed via apt. This also happens on ROS Melodic.
This image is from playing the bag file.

Here is the bag file:
https://www.dropbox.com/s/fuma3x9c57os3iw/slamIssue-07-25-2020.bag?dl=1

Here is the configuration file used:
https://www.dropbox.com/s/l1l2b6n3xod9jr2/slamIssue-07-25-2020-mapper_params_online_sync.yaml?dl=1

The RPLIDAR A3 is mounted backwards. This happens with or without using the RPLIDAR driver code that flips the lidar image 180 degrees, but this bag file is using that code, hence the transform for the LIDAR appears facing forward.

@SteveMacenski
Owner

1.5.4 was incremented a month ago, so yeah, it's pretty new.

Overall, this issue is only being reported by people with RP-lidars (or potentially 360° 2D lidars, but so far only RP-lidar users have commented on this thread). I have now run this package for hundreds of hours across dozens (if not low hundreds) of facilities of various shapes and sizes using typical 2D laser scanners (SICK, Hokuyo, and similar) without seeing an issue, so there's not much I can do so far since I cannot reproduce it.

I'd love to merge in a change to fix what you're seeing, but that requires someone to do the leg work to identify the issue and potentially resolve it. But even just telling what the specific issue is could be something I may be able to fix and have a user test.

That identity transform is interesting. Are you seeing errors like: https://github.com/SteveMacenski/slam_toolbox/blob/noetic-devel/slam_toolbox/src/slam_toolbox_common.cpp#L358? That's the only place an identity transform would be set at runtime after the first iteration. I looked through the convert() method in TF2 and I don't see anywhere it might set an identity if it fails (you may want to double-check me), though I didn't track down the transform() method.

If you can deterministically cause this to happen, awesome. I'd print or set breakpoints in the setTransformFromPoses() method and check the values at various steps. Maybe you'll see a clear location where it occurs.
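
A complementary external check (not a fix, just a sketch that assumes the map/odom frame names used in this thread) is to watch /tf and log the moment map->odom collapses to the identity, so the timestamp can be lined up against the bag playback and the solver output:

import rospy
from tf2_msgs.msg import TFMessage

def cb(msg):
    for tf in msg.transforms:
        if tf.header.frame_id == 'map' and tf.child_frame_id == 'odom':
            t, q = tf.transform.translation, tf.transform.rotation
            if abs(t.x) < 1e-9 and abs(t.y) < 1e-9 and abs(q.z) < 1e-9 and q.w > 0.999999:
                rospy.logwarn('map->odom became identity at %s', tf.header.stamp)

rospy.init_node('identity_tf_watchdog')
rospy.Subscriber('/tf', TFMessage, cb, queue_size=100)
rospy.spin()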

@tik0

tik0 commented Aug 4, 2020

@chrisl8 I am using the RPLIDAR A2M8 and rplidar_ros on this commit. I also had issues with a faulty calculation of the residual scan, which I have explained here in detail. However, PR #257 does not show this effect on my scanner.

@chrisl8
Contributor

chrisl8 commented Aug 4, 2020

Here are my summarized findings for future RPLIDAR users who may find themselves here

Map catastrophic rotation due to RPLIDAR mounting axis

Sometimes the Map rotates itself 180 degrees underneath you. I have had this happen even in Localization mode:
ROS Map Flipping

I currently have a ROS Answers post open on this issue in case it has anything to do with Slam Toolbox:
https://answers.ros.org/question/359654/what-may-be-causing-my-map-to-sometimes-rotates-180-degrees-during-operation/

A similar or possibly the same problem is noted under Issue #281

This appears to be caused by the RPLIDAR being mounted "backwards", that is, with the cord toward the back of the robot.
The RPLIDAR's front is the point where the cord exits the unit.
Many people have solved this issue by simply ensuring the RPLIDAR is mounted with the hardware "front" toward the literal front of the robot. That is, the edge the cord extends from is oriented toward the front of the robot.

My RPLIDAR was mounted backwards, and this happened to me. I was researching the cause, but eventually rotated the RPLIDAR myself. My RPLIDAR is now facing "forward", and this has not happened again.

TL;DR: Ensure the "front" (cord end) of your RPLIDAR is oriented toward the FRONT of your robot. Nobody quite knows why, but it fixes this.

NOTE: As of September 2020 there are some alternate solutions if you are using the ROS2 driver provided by allenh1 as of this PR.

Angle Increment Calculation

You may see this error as soon as you start up Slam Toolbox:
LaserRangeScan contains <number> range readings, expected <number-1> which prevents mapping

As of September 2020 this issue is fixed by this PR.

Sweep Direction

The Sweep Direction of the RPLIDAR was called into question a few times.

I used this code to hack the output into a curve starting at the first point in the scan distance output array:

float read_value = (float) nodes[i].dist_mm_q2/4.0f/1000;
// TODO: This will clearly demonstrate the sweep start, end, and direction.
read_value = scan_msg.range_min + (0.001 * i);

and the output is in a counter-clockwise direction as ROS expects:
RPLidarSweepDirection-InvertedFalse
NOTE: The RED axis is "pointing" in the direction of the cord on the RPLIDAR A3 unit.

It may not be immediately obvious, but the code actually lays the data down into the ranges list backwards, so that it ends up being "counter-clockwise":

header:
  seq: 50
  stamp:
    secs: 1596956675
    nsecs: 279228181
  frame_id: "rplidar"
angle_min: -3.12413907051
angle_max: 3.14159274101
angle_increment: 0.00435422640294
time_increment: 5.38406420674e-05
scan_time: 0.0774766877294
range_min: 0.15000000596
range_max: 25.0
ranges: [1.5889999866485596, 1.5880000591278076, 1.5870000123977661, 1.5859999656677246,

Clearly the 1.5... meter entry is the first in the list.

Based on some .bag files I downloaded this seems to be the same data order that the Hokuyo uses, with the angle_min being negative, and to the "right" of center, and ending with angle_max as positive and being to the left of center.

The sweep direction from the RPLidar driver is fine.
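
A quick index-to-angle check on the numbers above supports that (a throwaway sketch; the reading count is approximate):

import math

angle_min = -3.12413907051
angle_increment = 0.00435422640294
n = 1440  # roughly (angle_max - angle_min) / angle_increment + 1 for this scan

for i in (0, n // 2, n - 1):
    print(i, round(math.degrees(angle_min + i * angle_increment), 1))
# 0 -179.0, 720 0.6, 1439 180.0 -> angles increase with index (counter-clockwise), per REP-103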

If anybody finds themselves here and needing help with RPLIDAR and Slam Toolbox, feel free to ping me or contact me directly and I will share anything further I've found, or offer personal help on hacking things up to where I am.

For me, with my RPLIDAR driver edits, Slam Toolbox works very well.
If you have trouble, be sure that you are sticking to the default settings for Slam Toolbox, and don't mount it backwards.

@SteveMacenski
Owner

SteveMacenski commented Aug 31, 2020

I don't think mounting the cable forward is 100% necessary, but if you don't, you must correct the wrong axes in the rplidar driver since they haven't patched it themselves. Both fixing the axes and swapping the readings with the Python script above would produce similar output.

allenh1 pushed a commit to allenh1/rplidar_ros that referenced this issue Sep 13, 2020
* Fix incompatibilities with slam_toolbox:
- Fix angle compensate mode to publish angle compensated values
- Fix angle_increment calculation
- Add optional flip_x_axis option to deal with issue discussed here: SteveMacenski/slam_toolbox#198.  Flip x-axis can be used when laser is mounted with motor behind it as rotated TF laser frame doesn't seem to work with slam_toolbox.

* Fix whitespace
WLwind added a commit to WLwind/slam_toolbox-1 that referenced this issue Feb 3, 2021
…ap 180° if the lidar is mounted backwards

Set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).
@SteveMacenski
Owner

SteveMacenski commented Feb 3, 2021

As an update, @WLwind has found something interesting in #325. I need to track through it, but it looks like the reason the rplidar had issues being "backwards" with its origin frame (which, by the way, should still be fixed by RPLidar to be REP-105 compatible!) is that openKarto assumed it to be aligned with the base frame. I need to verify that, since I thought I'd checked that and GetSensorPose should contain orientation information, but assuming this PR works, that should put the final nail in this issue without specific rplidar driver fixes (though they should still be made).

I'm then a little uncertain why this never came up on some of my testing bag files where the robot has lidars at 45-degree angles to forward. They're relatively short bags (15 minutes), so maybe the optimizer is just good at correcting for it? Maybe that's why changing the loss function had an outsized impact in hiding the issue?
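
To illustrate the suspicion (purely a conceptual 2D sketch, not slam_toolbox/openKarto code and not what the PR actually changes): if constraint edges are built from poses expressed in the laser frame and the base->laser mount yaw is never composed back in, a backwards-mounted lidar shows up as a 180° error in the constraint.

import math

def compose(a, b):
    # compose 2D poses a then b, each given as (x, y, yaw)
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            at + bt)

def invert(p):
    x, y, t = p
    return (-(math.cos(t) * x + math.sin(t) * y),
            math.sin(t) * x - math.cos(t) * y,
            -t)

laser_in_base = (0.0, 0.0, math.pi)   # lidar mounted backwards, as in this thread
laser_in_map = (2.0, 1.0, 0.0)        # pose the scan matcher reports for the laser

base_in_map = compose(laser_in_map, invert(laser_in_base))
print(base_in_map)  # (2.0, 1.0, -pi): the base actually faces the opposite way,
                    # so an edge that ignores the mount yaw is 180 degrees off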

@chrisl8 can you test this PR? From your last comment it sounded like you wanted to have the lidar mounted the opposing way. This might be a good way to see if this fixes that map-flipping issue for you once and for all!

@chrisl8
Contributor

chrisl8 commented Mar 13, 2021

@SteveMacenski I rotated my RPLIDAR to set the cord toward the front of my robot last year, which solved the spurious map rotation issue for me.

I will try running some bag files through the new and old code this weekend to see if I can obtain a consistent positive and negative result with the old/new code.

@chrisl8
Contributor

chrisl8 commented Mar 15, 2021

Noetic #325 appears to entirely fix the reported issue!

I ran the original bag file from this issue through the current main branch a few times and got different results every time, none of which looked reasonable:

slam_toolbox_previous-Mach2021a

slam_toolbox_previous-March2021b

Then I used Noetic #325 many times and obtained this result every time:

slam_toolbox_fixed-March2021

The fact that Noetic #325 not only creates a map that is reasonable, but that it is also 100% consistent on every run for this bag file is wonderful!

@SteveMacenski
Owner

Great to hear! Now we're just waiting to see how localization is impacted; see this comment (#355 (comment)) if you have a few moments to test in localization mode too and see if you get any warnings out of the ordinary. The only thing blocking me from merging and releasing these updates to binaries for all distros is validation from someone on the potentially reported localization warnings. I couldn't get them locally in my sandbox environment, but I don't have a large production space like the one above or a localization dataset different from the SLAM run.

@chrisl8
Contributor

chrisl8 commented Mar 15, 2021

@SteveMacenski Just to be clear, "localization" is when I load an existing map and send the robot to points on that map, correct?

I did save a map, restart ROS, and load the map using Noetic #325 and let my robot randomly navigate to various points on the map for about 6 hours. The orientation of the map stayed true to the scan data the entire time.

This is the area I am testing in:
My Robot Testing Area

It is in my home, so also not an enormous area.

However, I did not look at the log at all. I will set it up and run it again this week and look at the log output. Worst case I might not get to it until this weekend again.

@SteveMacenski
Owner

SteveMacenski commented Mar 15, 2021

We mean the localization mode available via the localization node https://github.com/SteveMacenski/slam_toolbox/blob/noetic-devel/slam_toolbox/launch/localization.launch#L3 rather than the sync/async mapping modes. You can load a pose graph in those modes and continue adding to the map, but that's different from the pure localization mode. I don't know, practically speaking, whether I'd expect you to see any warnings in one versus the other, but just to play it safe, the localization mode is where I was told about this potential warning issue introduced in this PR.

That test, if you just did continued mapping in the async/sync nodes, is also a good test and a good datapoint that it A) worked and B) you didn't see a terminal flooded with warnings after 6 hours.

SteveMacenski pushed a commit that referenced this issue Mar 23, 2021
…e lidar is mounted backwards (#326)

* fix the issue (#198) that loop closures can rotate the map 180° if the lidar is mounted backwards

Set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).

* linting things

* add function GetCorrectedAt to simplify getting robot pose

* add update() when setting corrected pose after optimizing
SteveMacenski pushed a commit that referenced this issue Mar 23, 2021
…e lidar is mounted backwards (#325)

* fix the issue (#198) that loop closures can rotate the map 180° if the lidar is mounted backwards

Set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).

* linting things

* add function GetCorrectedAt to simplify getting robot pose

* add update() when setting corrected pose after optimizing
SteveMacenski pushed a commit that referenced this issue Apr 27, 2021
…th exception about pthread_mutex_lock (#386)

* fix the issue (#198) that loop closures can rotate the map 180° if the lidar is mounted backwards

Set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).

* linting things

* add function GetCorrectedAt to simplify getting robot pose

* add update() when setting corrected pose after optimizing

* fix issue (#180) that localization mode may terminate unexpectedly with exception about pthread_mutex_lock
SteveMacenski pushed a commit that referenced this issue Apr 29, 2021
…th exception about pthread_mutex_lock (#390)

* fix the issue (#198) that loop closures can rotate the map 180° if the lidar is mounted backwards

Set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).

* linting things

* add function GetCorrectedAt to simplify getting robot pose

* add update() when setting corrected pose after optimizing

* fix issue (#180) that localization mode may terminate unexpectedly with exception about pthread_mutex_lock