Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the issue (#198) that loop closures can rotate the map 180° if the lidar is mounted backwards #326

Merged
merged 4 commits into from
Mar 23, 2021

Conversation

WLwind
Copy link
Contributor

@WLwind WLwind commented Feb 3, 2021

Set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).


Basic Info

Info Please fill out this column
Ticket(s) this addresses (#198 )
Primary OS tested on (Ubuntu 20.04 ROS Noetic and Ubuntu 18.04 ROS Melodic)
Robotic platform tested on (Handsfree mini with RPLIDAR A2 mounted backwards)

Description of contribution in a few bullet points

The reason of weird results of loop closures when lidars are mounted backwards is that the vertices and the edges of the solver are not based on the same frame. The original code sets the pose of the base link (base_footprint) as the vertex pose, but the edges are constraints of 2 poses of sensor frame (laser). That's really unreasonable because the tf between base poses and the tf between their corresponding sensor poses are not the same! The differences are significantly big especially when the sensor is at the opposite orientation of the robot base. So this uncommon loop closure will always happen (more or less) unless the sensor frame is at the exact pose of the base. This is regardless of whether the lidar is 360° or whether it's an RPLIDAR.

I made the edges and vertices based on the same frame. I simply set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).
It works well with my backwards mounted RPLIDAR A2.

Description of documentation updates required from your changes


Future work that may be required in bullet points

I don't have a ROS2 robot yet. Is anyone there be kind to test this modification on a ROS2 robot?

…ap 180° if the lidar is mounted backwards

Set the edges (constraints) based on the robot base frame (base_footprint) instead of sensor frame (laser).
@SteveMacenski
Copy link
Owner

SteveMacenski commented Feb 3, 2021

So by backwards, you do not mean up-side down, correct? By the way, just going to chat on this PR since all the others are the same, I'll just merge them all at once but use this as the proxy.

#198 (comment) <-- Can you show that you can work on this dataset and have it work? They give a pretty good backwards-lidar dataset here. We resolved this by changing a loss function, so please make sure you don't use that one for your testing to show that it also fixes the issue (and does so more reliably).

So I think we're getting more to the root of all these rplidar problems then. It sounds like there were CPU (fixed), backwards (or rather, non-directionally aligned), and 360 param read in wrong (fixed). There's also this "flipping" problem only 360 lidar users report (#281) that I also wonder if this would fix if this indeed is a functional solution... (also makes me wonder if this was my issue with the multi-lidar support testing... mhm...). It would be mighty convenient if this fixed that too. I've never been able to replicate on professional 270 lidars so its been hard for me to be able to spend the time to fix it. Something along these lines could be the root cause or at least a new way of thinking I can use to see if I can spot other locations in the Karto code this is wrong.

I'll review in a moment. I super appreciate you digging into this code. It's been hard to fix all these issues without someone else to bounce ideas off of that's dug into the core SLAM code. 👏

slam_toolbox/lib/karto_sdk/src/Mapper.cpp Show resolved Hide resolved
slam_toolbox/lib/karto_sdk/src/Mapper.cpp Outdated Show resolved Hide resolved
slam_toolbox/lib/karto_sdk/src/Mapper.cpp Outdated Show resolved Hide resolved
@WLwind
Copy link
Contributor Author

WLwind commented Feb 4, 2021

So by backwards, you do not mean up-side down, correct? By the way, just going to chat on this PR since all the others are the same, I'll just merge them all at once but use this as the proxy.

#198 (comment) <-- Can you show that you can work on this dataset and have it work? They give a pretty good backwards-lidar dataset here. We resolved this by changing a loss function, so please make sure you don't use that one for your testing to show that it also fixes the issue (and does so more reliably).

So I think we're getting more to the root of all these rplidar problems then. It sounds like there were CPU (fixed), backwards (or rather, non-directionally aligned), and 360 param read in wrong (fixed). There's also this "flipping" problem only 360 lidar users report (#281) that I also wonder if this would fix if this indeed is a functional solution... (also makes me wonder if this was my issue with the multi-lidar support testing... mhm...). It would be mighty convenient if this fixed that too. I've never been able to replicate on professional 270 lidars so its been hard for me to be able to spend the time to fix it. Something along these lines could be the root cause or at least a new way of thinking I can use to see if I can spot other locations in the Karto code this is wrong.

I'll review in a moment. I super appreciate you digging into this code. It's been hard to fix all these issues without someone else to bounce ideas off of that's dug into the core SLAM code. clap

I've recorded a video of mapping process using #198 dataset.
https://user-images.githubusercontent.com/46443331/106920506-1ef9ef80-6746-11eb-9145-7b2a9c26ff76.mp4
It works pretty well with 18 loop closures.

@SteveMacenski
Copy link
Owner

Thanks for the verification. @coderkarl also verified which is good to have an external thumbs up. I agree this is an issue and now I agree that your solution is workable. Now we're just on the details of making sure this is the right place to put it / if there's a better way to accomplish the same task.

@WLwind
Copy link
Contributor Author

WLwind commented Feb 5, 2021

When I try localization mode with new code, every time loop closure comes, there comes a warning of ceres CeresSolver: Ceres could not find a usable solution to optimize.. So I think there is something I didn't consider.

@SteveMacenski
Copy link
Owner

SteveMacenski commented Feb 5, 2021

The only difference I can tell is that old L1611 SetCorrectedPose() sets m_CorrectedPose but did not call Update() so from your pose-swapping, the values might be different? Also in SetCorrectedPose() it sets m_IsDirty = true but I don't think that should cause something. Both of those would serialize though for a localization session.

Did you check the inputs / outputs of the new function to make sure its working properly? If mapping works OK and just localization, I feel like it does, but just checking. I highly doubt its the issue, but also try removing inline. Maybe after serialization that's making something unhappy.

@WLwind
Copy link
Contributor Author

WLwind commented Feb 5, 2021

The only difference I can tell is that old L1611 SetCorrectedPose() sets m_CorrectedPose but did not call Update() so from your pose-swapping, the values might be different? Also in SetCorrectedPose() it sets m_IsDirty = true but I don't think that should cause something. Both of those would serialize though for a localization session.

Did you check the inputs / outputs of the new function to make sure its working properly? If mapping works OK and just localization, I feel like it does, but just checking. I highly doubt its the issue, but also try removing inline. Maybe after serialization that's making something unhappy.

I mean b0800ed, not just 5cb6690.
Both of them show warning CeresSolver: Ceres could not find a usable solution to optimize. in localization mode. But 5cb6690 shows more:
W0206 02:13:16.236030 10238 parameter_block.h:349] Local parameterization Jacobian computation returnedan invalid matrix for x: -nan Jacobian matrix : -nan
I think the problem is from deserialization. Because when I tried to make that "1 line change" to make sensor poses as ceres nodes, the deserialization process threw an exception terminate called after throwing an instance of 'karto::Exception'. For this reason I switched to another way, making constraints between 2 base poses. Although it doesn't threw exceptions during deserialization now, I still think that there might be some faults.

@SteveMacenski
Copy link
Owner

SteveMacenski commented Feb 5, 2021

To repeat that back to you, you think this change makes the serialized data wrong so things are failing? that would make sense, we're messing with things at a somewhat low level. ABI breaking changes like this might impact any data you've previously recorded, but its necessary to fix the bug.

What happens if you run it over a newly generated serialized file done with the code in this PR after the change?

@WLwind
Copy link
Contributor Author

WLwind commented Feb 7, 2021

Firstly I tried to find the problem with deserialization when I use "1 line change" solution, and (after adding lots of print function) I found that the problem is from addNode. The function need to GetCorrectedPose and it calls GetLaserRangeFinder. But I don't think the LaserRangeFinder is correctly set before deserialization in localization mode. So I think that is what makes the program throw the exception.
Let's go back to "all base pose" strategy. I've carefully compared the differences between 5cb6690 and the original code 44c9f84 and the main difference is in MapperGraph::CorrectPoses(). When it calls SetSensorPose it does an Update() after setting the pose, but SetCorrectedPose doesn't do that updating. I added the update and test with the dataset with mapping and localization mode. There was nothing new with mapping (18 loop closures). The localization mode made 9 loop closures but only 1 popped the warning CeresSolver: Ceres could not find a usable solution to optimize.. I can clearly see some loop closures optimizing the "walls", so I think this new commit solves the problem with localization mode mostly. It's workable now.

@SteveMacenski
Copy link
Owner

I don't understand why calling Update would change Ceres's ability to find a solution. Update is messing with the point values and resetting m_isDirty. It seems like is dirty is the major issue but every time we try to access the dirty info, there's a check and update is called anyhow in the object

class LocalizedRangeScan : public LaserRangeScan

Does CeresSolver: Ceres could not find a usable solution to optimize. pop up when using a new serialized file from this current set of code? I suspect that it now has nothing to do with Update() if it happens on occasion and more do with this change pEdge->SetLabel(new LinkInfo(pFromScan->GetCorrectedPose(), pToScan->GetCorrectedAt(rMean), rCovariance)); since the graph now would have data in it assuming rMean and then new data that is pToScan->GetCorrectedAt(rMean). If it goes away from new serialized data, then I think we're good. If it still appears, that's still a problem we need to solve.

A single small dataset of 9 loop closures having it appear once might not be the worst-case situation. I don't think it was happening at all prior. If we have new serialized files with the new code, then it should be acting at least as good as before in localization mode if we did everything correctly, yes?

@SteveMacenski
Copy link
Owner

SteveMacenski commented Feb 16, 2021

@WLwind any update? Just trying to iterate on the best solution to this issue so that it doesn't come back and bite you

@WLwind
Copy link
Contributor Author

WLwind commented Feb 17, 2021

@WLwind any update? Just trying to iterate on the best solution to this issue so that it doesn't come back and bite you

I'm busy these days and have little time to consider about this issue, sorry.

@SteveMacenski
Copy link
Owner

Closed the other 2 PRs since we're still working on getting this one passing completely

@WLwind
Copy link
Contributor Author

WLwind commented Mar 24, 2021

Thank you for testing! I really don't have time to test it these weeks.

@SteveMacenski
Copy link
Owner

Thanks for identifying this issue so we can finally put this issue to bed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[rplidar users] Potential errors with drivers + 360 lidar support
2 participants