Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-8113: [C++] Lighter weight variant<> #8472

Closed
wants to merge 6 commits into from

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Oct 15, 2020

  • Unit tests for util::Variant<>
  • Compilation time and code size comparison to ensure this refactoring is beneficial
  • Benchmark comparison to ensure compute and nested parquet are not impacted
  • More docstrings and comments

Locally (release build, no ccache):

$ ninja
ninja: no work to do.
$ export OBJS=`ls src/arrow/CMakeFiles/arrow_objlib.dir/compute/**/*.o`

$ git checkout master && rm $OBJS && time ninja $OBJS
real    0m16.918s
user    2m41.702s
sys     0m2.495s

$ du -h --total $OBJS | tail -n 1
total 7.3M

$ git checkout 8113-Implement-a-lighter-weigh && rm $OBJS && time ninja $OBJS
real    0m14.363s
user    2m19.939s
sys     0m2.125s

$ du -h --total $OBJS | tail -n 1
6.6M    total

@bkietz bkietz requested a review from pitrou October 15, 2020 17:51
@github-actions
Copy link

Comment on lines 208 to 212
detail::conditional_t<
detail::all<(std::is_copy_constructible<T>::value &&
std::is_copy_assignable<T>::value)...>::value,
detail::explicit_copy_constructor,
detail::delete_copy_constructor>::template type<Variant<T...>>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to borrow this idiom to disable copy construction for Result<std::unique_ptr<T>>

@emkornfield
Copy link
Contributor

drive by comment. just want to make sure we aren't using the same idioms for variant (i.e. they are compatible with C++ std)?

@bkietz
Copy link
Member Author

bkietz commented Oct 19, 2020

@emkornfield I'm implementing a subset of std::variant's functionality but all members which are present are compatible; if we switch to using std::variant at some point no code will break. We will need to keep the mutable-pointer visit overload since that's non standard, but IMHO that's acceptable overhead

@wesm
Copy link
Member

wesm commented Oct 20, 2020

@bkietz presumably this yields smaller code size, too?

@bkietz
Copy link
Member Author

bkietz commented Oct 20, 2020

Updated description with build times and code sizes for a release build

@pitrou
Copy link
Member

pitrou commented Oct 22, 2020

This seems to speed up the compute layer a bit.

  • Before:
ArrayArrayKernel<Add, Int64Type>/32768/100       3781 ns         3780 ns       738685 bytes_per_second=8.07385G/s items_per_second=1083.65M/s null_percent=1 size=32.768k
  • After:
ArrayArrayKernel<Add, Int64Type>/32768/100       3615 ns         3614 ns       777365 bytes_per_second=8.44344G/s items_per_second=1.13326G/s null_percent=1 size=32.768k

@pitrou
Copy link
Member

pitrou commented Nov 10, 2020

@bkietz Do you plan to update this?

@bkietz bkietz force-pushed the 8113-Implement-a-lighter-weigh branch from 27560d0 to 40430af Compare November 11, 2020 20:21
@bkietz
Copy link
Member Author

bkietz commented Nov 13, 2020

@pitrou yes but not soon

@pitrou pitrou force-pushed the 8113-Implement-a-lighter-weigh branch from 40430af to a03eb14 Compare November 30, 2020 13:32
@pitrou
Copy link
Member

pitrou commented Nov 30, 2020

Rebased, I'm writing some micro-benchmarks.

@pitrou pitrou force-pushed the 8113-Implement-a-lighter-weigh branch from a03eb14 to c1b3a75 Compare November 30, 2020 15:34
@pitrou pitrou changed the title ARROW-8113: [C++][WIP] Lighter weight variant<> ARROW-8113: [C++] Lighter weight variant<> Nov 30, 2020
@pitrou
Copy link
Member

pitrou commented Nov 30, 2020

Variant benchmarks:

  • before:
ConstructTrivialVariant         13290 ns        13287 ns        52532 items_per_second=752.595M/s
ConstructNonTrivialVariant     111230 ns       111215 ns         6202 items_per_second=89.916M/s
VisitTrivialVariant             25744 ns        25740 ns        27269 items_per_second=388.497M/s
VisitNonTrivialVariant          24389 ns        24385 ns        28649 items_per_second=410.082M/s
ConstructDatum                 102925 ns       102911 ns         7190 items_per_second=97.1714M/s
VisitDatum                       6204 ns         6204 ns       112708 items_per_second=1.61199G/s
  • after:
ConstructTrivialVariant         13952 ns        13951 ns        50116 items_per_second=716.811M/s
ConstructNonTrivialVariant     108507 ns       108489 ns         6412 items_per_second=92.1753M/s
VisitTrivialVariant              5848 ns         5847 ns       119628 items_per_second=1.71014G/s
VisitNonTrivialVariant           5973 ns         5972 ns       118440 items_per_second=1.6745G/s
ConstructDatum                  98938 ns        98923 ns         7038 items_per_second=101.088M/s
VisitDatum                       5566 ns         5565 ns       125741 items_per_second=1.79701G/s

@pitrou
Copy link
Member

pitrou commented Nov 30, 2020

Compute overhead benchmarks:

  • before:
BM_CastDispatch                      971704 ns       971533 ns          698 items_per_second=1054k/s
BM_CastDispatchBaseline               97365 ns        97352 ns         6955 items_per_second=10.5185M/s
BM_AddDispatch                          165 ns          165 ns      4232825 items_per_second=6.07671M/s
BM_ExecuteScalarFunctionOnScalar    4666801 ns      4665955 ns          149 items_per_second=2.14318M/s
BM_ExecuteScalarKernelOnScalar       809563 ns       809419 ns          849 items_per_second=12.3545M/s
  • after:
BM_CastDispatch                      914006 ns       913869 ns          759 items_per_second=1.12051M/s
BM_CastDispatchBaseline               95985 ns        95973 ns         7287 items_per_second=10.6697M/s
BM_AddDispatch                          167 ns          167 ns      4137185 items_per_second=5.99082M/s
BM_ExecuteScalarFunctionOnScalar    4345926 ns      4345337 ns          161 items_per_second=2.30132M/s
BM_ExecuteScalarKernelOnScalar       847525 ns       847411 ns          835 items_per_second=11.8007M/s

@pitrou
Copy link
Member

pitrou commented Nov 30, 2020

@bkietz Do you know why all Github CI checks are skipped here?

@pitrou
Copy link
Member

pitrou commented Nov 30, 2020

AppVeyor seems to indicate that MSVC isn't picking up / implementing the copy constructor correctly :-(

@pitrou pitrou force-pushed the 8113-Implement-a-lighter-weigh branch 2 times, most recently from 16a69ec to 9a725a6 Compare December 1, 2020 19:55
@pitrou
Copy link
Member

pitrou commented Dec 2, 2020

@bkietz You may want to review the changes I've done. I think only docs and comments remain to be done.

@pitrou
Copy link
Member

pitrou commented Dec 3, 2020

I added some docstrings. I think this is ready now.

@pitrou
Copy link
Member

pitrou commented Dec 3, 2020

Hmm, I forgot to push some changes...

@pitrou pitrou force-pushed the 8113-Implement-a-lighter-weigh branch from 9ab2798 to 50cef42 Compare December 3, 2020 16:46
@bkietz
Copy link
Member Author

bkietz commented Dec 7, 2020

@pitrou thanks for picking this up! Looks good to me

@pitrou
Copy link
Member

pitrou commented Dec 8, 2020

Ok, I'll rebase a last time to make sure this doesn't break anything.

@pitrou pitrou force-pushed the 8113-Implement-a-lighter-weigh branch from 50cef42 to 168aa54 Compare December 8, 2020 09:50
@pitrou pitrou closed this in 6c831cd Dec 8, 2020
@kou
Copy link
Member

kou commented Dec 9, 2020

I don't know why but It seems that this causes g++ 9 crash:

https://github.com/ursa-labs/crossbow/runs/1522097163#step:4:1237

during GIMPLE pass: pre
In file included from /usr/include/c++/9/functional:59,
                 from /arrow/cpp/src/arrow/vendored/string_view.hpp:1480,
                 from /arrow/cpp/src/arrow/util/string_view.h:25,
                 from /arrow/cpp/src/arrow/buffer.h:31,
                 from /arrow/cpp/src/arrow/array/data.h:26,
                 from /arrow/cpp/src/arrow/array/array_base.h:26,
                 from /arrow/cpp/src/arrow/array/builder_binary.h:30,
                 from /arrow/cpp/src/arrow/compute/kernels/codegen_internal.h:26,
                 from /arrow/cpp/src/arrow/compute/kernels/codegen_internal.cc:18:
/usr/include/c++/9/bits/std_function.h: In static member function 'static void std::_Function_handler<void(_ArgTypes ...), _Functor>::_M_invoke(const std::_Any_data&, _ArgTypes&& ...) [with _Functor = arrow::compute::internal::MakeFlippedBinaryExec(arrow::compute::ArrayKernelExec)::<lambda(arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*)>; _ArgTypes = {arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*}]':
/usr/include/c++/9/bits/std_function.h:298:7: internal compiler error: Segmentation fault
  298 |       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
      |       ^~~~~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-9/README.Bugs> for instructions.
make[2]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/build.make:1610: src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/codegen_internal.cc.o] Error 1

We can reproduce this on local:

(cd cpp/examples/minimal_build && docker-compose build static && docker-compose run --rm static)

@kou
Copy link
Member

kou commented Dec 10, 2020

@bkietz bkietz deleted the 8113-Implement-a-lighter-weigh branch February 25, 2021 16:12
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
- [x] Unit tests for `util::Variant<>`
- [x] Compilation time and code size comparison to ensure this refactoring is beneficial
- [x] Benchmark comparison to ensure compute and nested parquet are not impacted
- [x] More docstrings and comments

Locally (release build, no ccache):

```
$ ninja
ninja: no work to do.
$ export OBJS=`ls src/arrow/CMakeFiles/arrow_objlib.dir/compute/**/*.o`

$ git checkout master && rm $OBJS && time ninja $OBJS
real    0m16.918s
user    2m41.702s
sys     0m2.495s

$ du -h --total $OBJS | tail -n 1
total 7.3M

$ git checkout 8113-Implement-a-lighter-weigh && rm $OBJS && time ninja $OBJS
real    0m14.363s
user    2m19.939s
sys     0m2.125s

$ du -h --total $OBJS | tail -n 1
6.6M    total
```

Closes apache#8472 from bkietz/8113-Implement-a-lighter-weigh

Lead-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants