Reimplement boost::split and optimize tokenization by djanekovic · Pull Request #5285 · PointCloudLibrary/pcl

djanekovic · 2022-06-04T14:13:44Z

Hi all!

This PR is addressing #4333.
Boost::split was pretty aggressive with allocations and they were slowing down parallel and sequential execution.
New implementation is generic in way that it will work with current std::vector<std::string> result type but it should also work with C++17 std::string_view instead of std::string.

Implementation of pcl::split is currently used for pcd and obj file parsing, see benchmarks below.

Before:

PCD - temp.pcd
11,119,959 allocs, 635,551,067 bytes allocated

OBJ - power_lines.obj
107,061 allocs, 6,143,143 bytes allocated

After:

PCD - temp.pcd
50 allocs, 76,378,011 bytes allocated

OBJ - power_lines.obj
12,957 allocs, 1,087,645 bytes allocated

Just a minor comment, majority of allocations in obj_io.cpp is now done in boost::lexical_cast. I think I could optimize it away since it is a bit of overkill there but that would be a new PR and I would need to experiment a bit.

Thanks in advance and kind regards,
Darko

mvieth · 2022-06-07T19:21:49Z

Thank you! This looks very promising.

Could you write some individual tests for pcl::split?
Do you have a guess why boost::split is apparently so much worse? Is it just badly implemented? Does it offer extra functionality that we don't need in PCL, that adds a lot of complexity? Some other reason?

djanekovic · 2022-06-17T20:40:18Z

Hi,

Thanks for taking a look at this and for nice initial comments :)

Could you write some individual tests for pcl::split?

Sure thing! Currently, I am a bit swarmed by everything but I will try to find time in order to do it. I think I will be able to do it in a week or something like that.

Do you have a guess why boost::split is apparently so much worse? Is it just badly implemented? Does it offer extra functionality that we don't need in PCL, that adds a lot of complexity? Some other reason?

First and biggest thing why boost::split is slower is that split will generate a vector of tokens called tmp and then swap that vector with a result. In other words all the allocations we have in our st vector will be wasted on every line we read from file.

Reads:     666,423,021 bytes (40.7%, 37,509.97/Minstr), 1.87/byte
Writes:    355,834,528 bytes (44.78%, 20,028.34/Minstr), 1/byte
^1: 0x48DF502: allocate (new_allocator.h:115)
^2: 0x48DF502: allocate (alloc_traits.h:460)
^3: 0x48DF502: _M_allocate (stl_vector.h:346)
^4: 0x48DF502: void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (vector.tcc:440)
^5: 0x48D880C: emplace_back<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (vector.tcc:121)
^6: 0x48D880C: _M_range_initialize<boost::iterators::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::__cxx11::basic_string<char>, __gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::use_default, boost::use_default> > (stl_vector.h:1564)
^7: 0x48D880C: vector<boost::iterators::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::__cxx11::basic_string<char>, __gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::use_default, boost::use_default> > (stl_vector.h:657)
^8: 0x48D880C: std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >& boost::algorithm::iter_split<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > >(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> >) [clone .isra.0] (iter_find.hpp:186)
#9: 0x48D8ECB: split<std::vector<std::__cxx11::basic_string<char> >, std::__cxx11::basic_string<char>&, boost::algorithm::detail::is_any_ofF<char> > (split.hpp:158)
#10: 0x48D8ECB: pcl::PCDReader::readBodyASCII(std::istream&, pcl::PCLPointCloud2&, int) (pcd_io.cpp:453)

see iter_find.hpp:186 here https://github.com/boostorg/algorithm/blob/00c6f1d6c10508d06598466a15190fc9812c9100/include/boost/algorithm/string/iter_find.hpp#L186

The other thing is functor copy constructions that happen because functors that do the splitting are not trivial and they need a heap allocation. Quick note, I don't fully understand this part so this is maybe gibberish.. Implementation in the boost starts to get magical around this point...

^1: 0x48DE05C: manager (function_base.hpp:349)
^2: 0x48DE05C: manager (function_base.hpp:379)
^3: 0x48DE05C: boost::detail::function::functor_manager<boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > >::manage(boost::detail::function::function_buffer const&, boost::detail::function::function_buffer&, boost::detail::function::functor_manager_operation_type) (function_base.hpp:404)
^4: 0x48D8126: assign_to_own (function_template.hpp:915)
^5: 0x48D8126: function2 (function_template.hpp:746)
^6: 0x48D8126: find_iterator_base (find_iterator.hpp:46)
^7: 0x48D8126: split_iterator (find_iterator.hpp:248)
^8: 0x48D8126: iterator_adaptor (iterator_adaptor.hpp:264)
^9: 0x48D8126: transform_iterator (transform_iterator.hpp:92)
^10: 0x48D8126: make_transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::__cxx11::basic_string<char>, __gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > > > (transform_iterator.hpp:137)
^11: 0x48D8126: std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >& boost::algorithm::iter_split<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > >(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> >) [clone .isra.0] (iter_find.hpp:176)
#12: 0x48D8ECB: split<std::vector<std::__cxx11::basic_string<char> >, std::__cxx11::basic_string<char>&, boost::algorithm::detail::is_any_ofF<char> > (split.hpp:158)
#13: 0x48D8ECB: pcl::PCDReader::readBodyASCII(std::istream&, pcl::PCLPointCloud2&, int) (pcd_io.cpp:453)

I am attaching full DHAT report generated from master branch without changes in this PR. You can check it out by opening: file:///usr/libexec/valgrind/dh_view.html in your browser and then loading the attached file. I wouldn't say the implementation is bad, I would say it is a bit weird because there is no notion of reusing allocations and that becomes very painful when the function is called in a loop several million times..

pcd_master_dhat.txt

If I get smarter about this in the meantime, I will try to correct myself here.

Thanks, Darko

This commit reimplements boost::split in a io/split.h. Implementation is then used for obj and pcd file parsing. Boost::split was reimplemented because the function was aggressively allocating new memory in a hot parsing loops and that was slowing down sequential and parallel execution. New implementation is generic in way that it will work with current std::vector<std::string> result type but also with a std::vector<std::string_view> result type. Signed-off-by: Darko Janekovic <darko.janekovic@gmail.com>

djanekovic · 2022-06-22T09:33:05Z

Hi, I pushed one set of tests for pcl::split so I think this is ready for another look.
Please advise if you think there are some more edge cases we would need to cover.

mvieth · 2022-06-22T18:53:01Z

Thank you for the insight into boost:split!
Two cases you haven't covered yet would be a non-empty string with only delimiter-characters (e.g. " \t "), and a non-empty string without delimiter-characters (e.g. "abcd")

Signed-off-by: Darko Janekovic <darko.janekovic@gmail.com>

djanekovic · 2022-06-24T17:45:52Z

These two new test cases are added

mvieth

Great, thank you! This is a really nice speed/memory improvement!

djanekovic · 2022-07-03T19:56:07Z

Thanks! First and hopefully not the last improvement from me :)

djanekovic force-pushed the djanekovic/reimplement_boost_split branch from c32ba62 to 7402a6e Compare June 4, 2022 14:14

djanekovic commented Jun 4, 2022

View reviewed changes

Comment thread io/src/obj_io.cpp

Comment thread io/src/obj_io.cpp

Comment thread io/include/pcl/io/split.h Outdated

mvieth added changelog: enhancement Meta-information for changelog generation module: io labels Jun 7, 2022

djanekovic force-pushed the djanekovic/reimplement_boost_split branch 2 times, most recently from 4068d49 to eeb2e20 Compare June 21, 2022 21:05

Added tests for pcl::split tokenization function

0f42b35

Signed-off-by: Darko Janekovic <darko.janekovic@gmail.com>

djanekovic force-pushed the djanekovic/reimplement_boost_split branch from eeb2e20 to 0f42b35 Compare June 24, 2022 17:42

mvieth approved these changes Jul 3, 2022

View reviewed changes

larshg approved these changes Jul 4, 2022

View reviewed changes

mvieth merged commit b7c2537 into PointCloudLibrary:master Jul 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reimplement boost::split and optimize tokenization#5285

Reimplement boost::split and optimize tokenization#5285
mvieth merged 2 commits intoPointCloudLibrary:masterfrom
djanekovic:djanekovic/reimplement_boost_split

djanekovic commented Jun 4, 2022 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mvieth commented Jun 7, 2022

Uh oh!

djanekovic commented Jun 17, 2022

Uh oh!

djanekovic commented Jun 22, 2022

Uh oh!

mvieth commented Jun 22, 2022 •

edited

Loading

Uh oh!

djanekovic commented Jun 24, 2022

Uh oh!

mvieth left a comment

Uh oh!

djanekovic commented Jul 3, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

djanekovic commented Jun 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mvieth commented Jun 7, 2022

Uh oh!

djanekovic commented Jun 17, 2022

Uh oh!

djanekovic commented Jun 22, 2022

Uh oh!

mvieth commented Jun 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

djanekovic commented Jun 24, 2022

Uh oh!

mvieth left a comment

Choose a reason for hiding this comment

Uh oh!

djanekovic commented Jul 3, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

djanekovic commented Jun 4, 2022 •

edited

Loading

mvieth commented Jun 22, 2022 •

edited

Loading