
Use cpp11 for rpkg #2974

Merged · 40 commits · merged into duckdb:master on Jan 26, 2022

Conversation

@nbenn (Contributor) commented on Jan 22, 2022

This supersedes PR #2948 (which I will close in favor of this). In addition to #2948, this

  • fixes a compiler warning that snuck into #2948 (Use cpp11 package)
  • adds some infrastructure for performance regression testing of the R package

Currently, this generates data (2 integer columns, 1 double column, 1 string column and 1 factor column) with {1e3, 1e5, 1e7} rows, and the following benchmarks are run:

  • write_df: data is written using dbWriteTable() and removed again with dbRemoveTable()
  • register_df: data is registered using duckdb_register() and unregistered with duckdb_unregister()
  • register_arrow: an arrow InMemoryDataset is registered using duckdb_register_arrow() and unregistered with duckdb_unregister_arrow()
  • read_from_tbl: a native table is read using dbReadTable()
  • select_from_tbl: a row subset of a native table is read using dbGetQuery()
  • read_from_df: dbReadTable() is used to read a registered data.frame
  • select_from_df: a row subset is read from a registered data.frame
  • read_from_arw: dbReadTable() is used to read a registered arrow table
  • select_from_arw: a row subset is read from a registered arrow table

Row subsetting is done as SELECT * FROM table WHERE column > 50, where column is generated as U(0, 100). Benchmarks are repeated 50 times. A sketch of the setup is shown below.
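For reference, this is a minimal sketch of what such a setup could look like with DBI, duckdb and bench; the column names, the make_data() helper and the two benchmarks shown are illustrative, not the exact code added in this PR:

```r
library(DBI)
library(duckdb)

# Illustrative data generator: 2 integer, 1 double, 1 string and 1 factor column,
# with the numeric columns drawn from U(0, 100).
make_data <- function(n) {
  data.frame(
    int1 = sample.int(100L, n, replace = TRUE),
    int2 = sample.int(100L, n, replace = TRUE),
    dbl  = runif(n, 0, 100),
    chr  = sample(letters, n, replace = TRUE),
    fct  = factor(sample(LETTERS[1:5], n, replace = TRUE)),
    stringsAsFactors = FALSE
  )
}

con <- dbConnect(duckdb())
df  <- make_data(1e5)

res <- bench::mark(
  write_df = {
    dbWriteTable(con, "tbl", df)
    dbRemoveTable(con, "tbl")
  },
  register_df = {
    duckdb_register(con, "df", df)
    duckdb_unregister(con, "df")
  },
  iterations = 50,
  check = FALSE  # the two expressions do not return comparable values
)

dbDisconnect(con, shutdown = TRUE)
```

The read/select variants follow the same pattern, running dbReadTable() or the filtering dbGetQuery() against a persisted table, a registered data.frame or a registered arrow dataset.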

@hannesmuehleisen I'm happy to add (or remove) benchmarks if there are relevant aspects that are not yet covered. This is intended as a starting point.

I have attached results of such a benchmark, comparing this (cpp11) to both master and v0.3.1 (release). In the spirit of

# If there is a diference of 10% in regression on any query the build breaks.

red indicates slower and green faster than master +/- 10% (comparing median runtimes). If the threshold is set to 15%, as actually seems to be the case in

regression_test(0.15,benchmark)

failures only remain in the nrows = 1000 setting. We could of course automate some of this in CI, e.g. for a 1e7-row setting, and fail if slower than master + 15%, similarly to what is done in regression_test.py.
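For illustration, a hypothetical helper (not the code in regression_test.py) that applies such a median-runtime threshold on the R side; it assumes two data frames with a name and a median column, e.g. summarised from bench::mark() results for master and the candidate branch:

```r
# Hypothetical threshold check: flag every benchmark whose median runtime on the
# candidate branch exceeds the corresponding master median by more than `threshold`.
flag_regressions <- function(master, candidate, threshold = 0.15) {
  stopifnot(identical(master$name, candidate$name))
  ratio <- as.numeric(candidate$median) / as.numeric(master$median)
  data.frame(
    name      = master$name,
    ratio     = ratio,
    regressed = ratio > 1 + threshold
  )
}
```

CI could then fail the build whenever any(flag_regressions(master, candidate)$regressed) is TRUE.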

(attached plot: perf_reg — benchmark comparison of cpp11 against master and v0.3.1)

@hannes (Member) commented on Jan 23, 2022

Thanks for the benchmark, what was the GC level again?

@nbenn (Contributor, Author) commented on Jan 24, 2022

R uses a generational garbage collector, which divides allocated nodes into generations based on some notion of age. Younger generations are collected more frequently than older ones. From R-ints:

There are three levels of collections. Level 0 collects only the youngest generation, level 1 collects the two youngest generations and level 2 collects all generations. After 20 level-0 collections the next collection is at level 1, and after 5 level-1 collections at level 2. Additionally, if a level-n collection fails to provide 20% free space (for each of nodes and the vector heap), the next collection will be at level n+1. The R-level function gc() performs a level-2 collection.

The people behind the benchmarking package bench, which is used here, decided that it makes the most sense to filter out runs during which garbage collections occurred, since such runs are less directly comparable. So for the slower/same/faster comparison, only the red dots are actually taken into account. Personally, I don't find it obvious that this is always a good idea. After all, if you're comparing an implementation that triggers GC more frequently, you'd want that penalized in some form. We do have the number of GC invocations per run available (together with other things, such as total allocated memory), and if you're interested in seeing this as well, I'm happy to incorporate that information somehow, or to share the raw data.
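To make the GC point concrete, here is a small sketch of how GC activity shows up in a bench::mark() result; the benchmarked expression is a placeholder, and filter_gc controls whether iterations that triggered a collection are dropped from the summary statistics:

```r
library(bench)

res <- bench::mark(
  alloc_heavy = { x <- replicate(100, rnorm(1e4)); NULL },
  iterations = 50,
  filter_gc = FALSE  # keep iterations with GC in the summary statistics
)

# n_gc is the total number of garbage collections across all iterations;
# mem_alloc is the memory allocated while running the expression.
res[, c("expression", "median", "n_itr", "n_gc", "mem_alloc")]
```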

As a side remark: results are not super stable. I just reran and this time, for both thresholds of 0.1 and 0.15, "slower" cases are only present in the nrow: 1000 setting.

@nbenn requested a review from @hannes on Jan 26, 2022, 07:56
@hannes (Member) commented on Jan 26, 2022

Can you please merge with master? That should fix the failing test.

@nbenn (Contributor, Author) commented on Jan 26, 2022

@hannesmuehleisen Thanks!

Can you please merge with master? That should fix the failing test.

If this was directed at me: GitHub tells me that I'm not authorized to merge the PR.

@hannes (Member) commented on Jan 26, 2022

@nbenn no, I mean merge with the latest upstream master and commit the merge to this branch.

@hannes merged commit c8fc684 into duckdb:master on Jan 26, 2022