Skip to content

Reimplement boost::split and optimize tokenization#5285

Merged
mvieth merged 2 commits intoPointCloudLibrary:masterfrom
djanekovic:djanekovic/reimplement_boost_split
Jul 4, 2022
Merged

Reimplement boost::split and optimize tokenization#5285
mvieth merged 2 commits intoPointCloudLibrary:masterfrom
djanekovic:djanekovic/reimplement_boost_split

Conversation

@djanekovic
Copy link
Copy Markdown
Contributor

@djanekovic djanekovic commented Jun 4, 2022

Hi all!

This PR is addressing #4333.
Boost::split was pretty aggressive with allocations and they were slowing down parallel and sequential execution.
New implementation is generic in way that it will work with current std::vector<std::string> result type but it should also work with C++17 std::string_view instead of std::string.

Implementation of pcl::split is currently used for pcd and obj file parsing, see benchmarks below.

Before:

PCD - temp.pcd
11,119,959 allocs, 635,551,067 bytes allocated

OBJ - power_lines.obj
107,061 allocs, 6,143,143 bytes allocated

After:

PCD - temp.pcd
50 allocs, 76,378,011 bytes allocated

OBJ - power_lines.obj
12,957 allocs, 1,087,645 bytes allocated

Just a minor comment, majority of allocations in obj_io.cpp is now done in boost::lexical_cast. I think I could optimize it away since it is a bit of overkill there but that would be a new PR and I would need to experiment a bit.

Thanks in advance and kind regards,
Darko

@djanekovic djanekovic force-pushed the djanekovic/reimplement_boost_split branch from c32ba62 to 7402a6e Compare June 4, 2022 14:14
Comment thread io/src/obj_io.cpp
Comment thread io/src/obj_io.cpp
Comment thread io/include/pcl/io/split.h Outdated
@mvieth mvieth added changelog: enhancement Meta-information for changelog generation module: io labels Jun 7, 2022
@mvieth
Copy link
Copy Markdown
Member

mvieth commented Jun 7, 2022

Thank you! This looks very promising.

  • Could you write some individual tests for pcl::split?
  • Do you have a guess why boost::split is apparently so much worse? Is it just badly implemented? Does it offer extra functionality that we don't need in PCL, that adds a lot of complexity? Some other reason?

@djanekovic
Copy link
Copy Markdown
Contributor Author

Hi,

Thanks for taking a look at this and for nice initial comments :)

Could you write some individual tests for pcl::split?

Sure thing! Currently, I am a bit swarmed by everything but I will try to find time in order to do it. I think I will be able to do it in a week or something like that.

Do you have a guess why boost::split is apparently so much worse? Is it just badly implemented? Does it offer extra functionality that we don't need in PCL, that adds a lot of complexity? Some other reason?

First and biggest thing why boost::split is slower is that split will generate a vector of tokens called tmp and then swap that vector with a result. In other words all the allocations we have in our st vector will be wasted on every line we read from file.

Reads:     666,423,021 bytes (40.7%, 37,509.97/Minstr), 1.87/byte
Writes:    355,834,528 bytes (44.78%, 20,028.34/Minstr), 1/byte
^1: 0x48DF502: allocate (new_allocator.h:115)
^2: 0x48DF502: allocate (alloc_traits.h:460)
^3: 0x48DF502: _M_allocate (stl_vector.h:346)
^4: 0x48DF502: void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (vector.tcc:440)
^5: 0x48D880C: emplace_back<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (vector.tcc:121)
^6: 0x48D880C: _M_range_initialize<boost::iterators::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::__cxx11::basic_string<char>, __gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::use_default, boost::use_default> > (stl_vector.h:1564)
^7: 0x48D880C: vector<boost::iterators::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::__cxx11::basic_string<char>, __gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::use_default, boost::use_default> > (stl_vector.h:657)
^8: 0x48D880C: std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >& boost::algorithm::iter_split<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > >(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> >) [clone .isra.0] (iter_find.hpp:186)
#9: 0x48D8ECB: split<std::vector<std::__cxx11::basic_string<char> >, std::__cxx11::basic_string<char>&, boost::algorithm::detail::is_any_ofF<char> > (split.hpp:158)
#10: 0x48D8ECB: pcl::PCDReader::readBodyASCII(std::istream&, pcl::PCLPointCloud2&, int) (pcd_io.cpp:453)

see iter_find.hpp:186 here https://github.com/boostorg/algorithm/blob/00c6f1d6c10508d06598466a15190fc9812c9100/include/boost/algorithm/string/iter_find.hpp#L186

The other thing is functor copy constructions that happen because functors that do the splitting are not trivial and they need a heap allocation. Quick note, I don't fully understand this part so this is maybe gibberish.. Implementation in the boost starts to get magical around this point...

^1: 0x48DE05C: manager (function_base.hpp:349)
^2: 0x48DE05C: manager (function_base.hpp:379)
^3: 0x48DE05C: boost::detail::function::functor_manager<boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > >::manage(boost::detail::function::function_buffer const&, boost::detail::function::function_buffer&, boost::detail::function::functor_manager_operation_type) (function_base.hpp:404)
^4: 0x48D8126: assign_to_own (function_template.hpp:915)
^5: 0x48D8126: function2 (function_template.hpp:746)
^6: 0x48D8126: find_iterator_base (find_iterator.hpp:46)
^7: 0x48D8126: split_iterator (find_iterator.hpp:248)
^8: 0x48D8126: iterator_adaptor (iterator_adaptor.hpp:264)
^9: 0x48D8126: transform_iterator (transform_iterator.hpp:92)
^10: 0x48D8126: make_transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::__cxx11::basic_string<char>, __gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::__cxx11::basic_string<char> > > > (transform_iterator.hpp:137)
^11: 0x48D8126: std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >& boost::algorithm::iter_split<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> > >(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::algorithm::detail::token_finderF<boost::algorithm::detail::is_any_ofF<char> >) [clone .isra.0] (iter_find.hpp:176)
#12: 0x48D8ECB: split<std::vector<std::__cxx11::basic_string<char> >, std::__cxx11::basic_string<char>&, boost::algorithm::detail::is_any_ofF<char> > (split.hpp:158)
#13: 0x48D8ECB: pcl::PCDReader::readBodyASCII(std::istream&, pcl::PCLPointCloud2&, int) (pcd_io.cpp:453)

I am attaching full DHAT report generated from master branch without changes in this PR. You can check it out by opening: file:///usr/libexec/valgrind/dh_view.html in your browser and then loading the attached file. I wouldn't say the implementation is bad, I would say it is a bit weird because there is no notion of reusing allocations and that becomes very painful when the function is called in a loop several million times..

pcd_master_dhat.txt

If I get smarter about this in the meantime, I will try to correct myself here.

Thanks, Darko

This commit reimplements boost::split in a io/split.h. Implementation is
then used for obj and pcd file parsing. Boost::split was reimplemented
because the function was aggressively allocating new memory in a hot
parsing loops and that was slowing down sequential and parallel
execution.

New implementation is generic in way that it will work with current
std::vector<std::string> result type but also with a
std::vector<std::string_view> result type.

Signed-off-by: Darko Janekovic <darko.janekovic@gmail.com>
@djanekovic djanekovic force-pushed the djanekovic/reimplement_boost_split branch 2 times, most recently from 4068d49 to eeb2e20 Compare June 21, 2022 21:05
@djanekovic
Copy link
Copy Markdown
Contributor Author

Hi, I pushed one set of tests for pcl::split so I think this is ready for another look.
Please advise if you think there are some more edge cases we would need to cover.

@mvieth
Copy link
Copy Markdown
Member

mvieth commented Jun 22, 2022

Thank you for the insight into boost:split!
Two cases you haven't covered yet would be a non-empty string with only delimiter-characters (e.g. " \t "), and a non-empty string without delimiter-characters (e.g. "abcd")

Signed-off-by: Darko Janekovic <darko.janekovic@gmail.com>
@djanekovic djanekovic force-pushed the djanekovic/reimplement_boost_split branch from eeb2e20 to 0f42b35 Compare June 24, 2022 17:42
@djanekovic
Copy link
Copy Markdown
Contributor Author

These two new test cases are added

Copy link
Copy Markdown
Member

@mvieth mvieth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you! This is a really nice speed/memory improvement!

@djanekovic
Copy link
Copy Markdown
Contributor Author

Thanks! First and hopefully not the last improvement from me :)

@mvieth mvieth merged commit b7c2537 into PointCloudLibrary:master Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog: enhancement Meta-information for changelog generation module: io

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants