implement frev as fast base::rev alternative #5907

Open
wants to merge 73 commits into
base: master
73 commits
d8db6e3
add macro version
ben-schwen Jan 12, 2024
c468589
write explicit parallel version
ben-schwen Jan 13, 2024
95208ac
copy attributes
ben-schwen Jan 13, 2024
fe474d7
add tests
ben-schwen Jan 13, 2024
ed845d3
Merge branch 'master' into frev
ben-schwen Jan 13, 2024
8ba2d2d
add to NAMESPACE
ben-schwen Jan 13, 2024
30ae580
add to tests
ben-schwen Jan 13, 2024
9839ef5
copy names
ben-schwen Jan 13, 2024
3b6fa52
add man page
ben-schwen Jan 13, 2024
1d1b0df
update man
ben-schwen Jan 13, 2024
320678d
fix typos
ben-schwen Jan 13, 2024
6943ebd
update tests
ben-schwen Jan 13, 2024
67bd0c9
add coverage
ben-schwen Jan 13, 2024
812a854
add benchmark example
ben-schwen Jan 13, 2024
529028a
coverage
ben-schwen Jan 13, 2024
73d2fdb
NEWS
ben-schwen Jan 13, 2024
b9e167c
trim NEWS
ben-schwen Jan 13, 2024
59b59ab
update NEWS
ben-schwen Jan 13, 2024
88d1ff9
add bit64
ben-schwen Jan 13, 2024
f85922a
update naming in NEWS
ben-schwen Jan 14, 2024
e4324cf
1.15.0 on CRAN. Bump to 1.15.99
MichaelChirico Jan 6, 2024
18a7209
Fix transform slowness (#5493)
OfekShilon Jan 6, 2024
b6bd964
Improvements to the introductory vignette (#5836)
Anirban166 Jan 6, 2024
68f0e41
Vignette typo patch (#5402)
davidbudzynski Jan 6, 2024
7e1a950
Improved handling of list columns with NULL entries (#4250)
sritchie73 Jan 7, 2024
d9d17a7
clarify that list input->unnamed list output (#5383)
MichaelChirico Jan 8, 2024
da24f85
fix subsetting issue in split.data.table (#5368)
MichaelChirico Jan 8, 2024
58608a2
switch to 3.2.0 R dep (#5905)
MichaelChirico Jan 12, 2024
c84a123
Allow early exit from check for eval/evalq in cedta (#5660)
MichaelChirico Jan 12, 2024
513f20f
frollmax1: frollmax, frollmax adaptive, left adaptive support (#5889)
jangorecki Jan 12, 2024
daee139
Friendlier error in assignment with trailing comma (#5467)
MichaelChirico Jan 14, 2024
f5ef168
Link to ?read.delim in ?fread to give a closer analogue of expected b…
MLopez-Ibanez Jan 13, 2024
f658ff4
Run GHA jobs on 1-15-99 dev branch (#5909)
MichaelChirico Jan 14, 2024
53149ed
prohibit matrix
ben-schwen Jan 14, 2024
a99d32f
readd deleted line
ben-schwen Jan 14, 2024
a56b796
Make declarations static for covr (#5910)
MichaelChirico Jan 15, 2024
1bef92c
reorder code
ben-schwen Jan 15, 2024
6d6d1cd
Merge branch 'frev' of github.com:Rdatatable/data.table into frev
ben-schwen Jan 15, 2024
a6907ad
return invisible if inplace
ben-schwen Jan 15, 2024
1e9f481
cut to 1 line
ben-schwen Jan 15, 2024
07fbea8
use isTRUE for copy=NA
ben-schwen Jan 15, 2024
a285661
speedup strings and lists
ben-schwen Jan 15, 2024
4318bb7
add Hughs comments
ben-schwen Jan 16, 2024
86d3d59
add coverage
ben-schwen Jan 16, 2024
c507fa5
dedup INTSXP LGLSXP
ben-schwen Jan 16, 2024
08b3591
make tests lighter
ben-schwen Jan 16, 2024
97ea3ff
rm altrep include
ben-schwen Jan 16, 2024
df4f160
change testnum
ben-schwen Jan 17, 2024
461a97a
Merge branch '1-15-99' into frev
ben-schwen Jan 17, 2024
025a3c5
remove altrep
ben-schwen Jan 17, 2024
48ded0b
remove duplicated tests
ben-schwen Jan 17, 2024
526a4ed
Merge branch 'master' into frev
MichaelChirico Feb 22, 2024
be50528
mostly fix botched merge
MichaelChirico Feb 22, 2024
f15ae3c
migrate NEWS item
MichaelChirico Feb 22, 2024
976d3ba
revert bad search+replace
MichaelChirico Feb 22, 2024
796828d
update NEWS wording
ben-schwen Mar 15, 2024
181957e
add small body
ben-schwen Mar 15, 2024
c751124
Merge branch 'master' into frev
ben-schwen Mar 15, 2024
d02df36
add additional test cases
ben-schwen Mar 18, 2024
2001816
rerun benchmarks single threaded
ben-schwen Mar 18, 2024
3cc839c
update doc
ben-schwen Mar 18, 2024
e27a6f3
remove unnecessary assignment
ben-schwen Mar 18, 2024
276cdeb
Merge branch 'master' into frev
ben-schwen Mar 18, 2024
a7de0f8
change to frev/setrev
ben-schwen Mar 19, 2024
300ea93
add symbol for setrev
ben-schwen Mar 19, 2024
17319c6
update docs
ben-schwen Mar 19, 2024
b4fe534
update NEWS
ben-schwen Mar 20, 2024
832324c
add details about attributes
ben-schwen Mar 20, 2024
7d6aea9
Merge branch 'master' into frev
ben-schwen Mar 20, 2024
ccd9ee6
drop attributes except names, class and levels
ben-schwen May 18, 2024
6b6da26
Merge branch 'master' into frev
ben-schwen May 18, 2024
b2cde13
update docs
ben-schwen May 18, 2024
cabddd2
Merge branch 'master' into frev
ben-schwen May 18, 2024
1 change: 1 addition & 0 deletions NAMESPACE
@@ -202,3 +202,4 @@ S3method(format_list_item, default)

export(fdroplevels)
S3method(droplevels, data.table)
export(frev)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -58,6 +58,8 @@

7. `melt` returns an integer column for `variable` when `measure.vars` is a list of length=1, consistent with the documented behavior, [#5209](https://github.com/Rdatatable/data.table/issues/5209). Thanks to @tdhock for reporting and fixing. Any users who were relying on this behavior can change `measure.vars=list("col_name")` (output `variable` was column name, now is column index/integer) to `measure.vars="col_name"` (`variable` still is column name).

8. New `frev(x)` as a faster analogue to `base::rev()` for atomic vectors/lists, [#5885](https://github.com/Rdatatable/data.table/issues/5885). Twice as fast as `base::rev()` on large inputs, and faster with more threads. Thanks to Benjamin Schwendinger for suggesting and implementing.

## NOTES

1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
3 changes: 3 additions & 0 deletions R/wrappers.R
@@ -16,3 +16,6 @@ isRealReallyInt = function(x) .Call(CisRealReallyIntR, x)
isReallyReal = function(x) .Call(CisReallyReal, x)

coerceAs = function(x, as, copy=TRUE) .Call(CcoerceAs, x, as, copy)

frev = function(x) .Call(Cfrev, x, TRUE)
setrev = function(x) invisible(.Call(Cfrev, x, FALSE))
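
Both wrappers call the same C entry point and differ only in the copy flag: `frev` reverses a duplicate of its input, while `setrev` reverses the input itself and returns it invisibly. A minimal standalone sketch of that contract on a plain `int` buffer follows. The helpers `reverse_inplace` and `reverse_copy` are hypothetical names for illustration and are not part of data.table, whose real code works on SEXPs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reverse buf[0..n-1] in place by swapping mirrored pairs. */
static void reverse_inplace(int *buf, size_t n) {
  assert(buf != NULL || n == 0);
  for (size_t i = 0; i < n / 2; ++i) {
    const size_t k = n - 1 - i;   /* mirror index of i */
    const int tmp = buf[i];
    buf[i] = buf[k];
    buf[k] = tmp;
  }
}

/* Hypothetical helper: return a newly allocated reversed copy and leave src
   untouched. Returns NULL if allocation fails; the caller frees the result. */
static int *reverse_copy(const int *src, size_t n) {
  int *out = malloc((n ? n : 1) * sizeof *out);
  if (out == NULL) return NULL;
  if (n) memcpy(out, src, n * sizeof *out);
  reverse_inplace(out, n);
  return out;
}
```

In this sketch the copying variant is just "duplicate, then reverse in place", which mirrors how the PR's C function duplicates `x` first when the copy flag is true and then runs the same swap loop.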
56 changes: 56 additions & 0 deletions inst/tests/tests.Rraw
@@ -65,6 +65,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) {
setcoalesce = data.table:::setcoalesce
setdiff_ = data.table:::setdiff_
setreordervec = data.table:::setreordervec
setrev = data.table:::setrev
shallow = data.table:::shallow # until exported
.shallow = data.table:::.shallow
split.data.table = data.table:::split.data.table
@@ -18566,3 +18567,58 @@ test(2261.04, setNumericRounding(2L), 1L)
# or not an object is an invisible copy or not, and prints it anyways.
test(2261.05, capture.output(setNumericRounding(2L)), character(0))
setNumericRounding(old)

# 5885 implement frev
d = c(NA, NaN, Inf, -Inf)
test(2262.00, frev(c(FALSE, NA)), c(NA, FALSE))
test(2262.01, frev(c(0L, NA)), c(NA, 0L))
test(2262.02, frev(d), c(-Inf, Inf, NaN, NA))
test(2262.03, frev(c(NA, 1, 0+2i)), c(0+2i, 1, NA))
test(2262.04, frev(as.raw(0:1)), as.raw(1:0))
test(2262.05, frev(NULL), NULL)
test(2262.06, frev(character(5)), character(5))
test(2262.07, frev(integer(0)), integer(0))
test(2262.08, frev(list(1, "a")), list("a", 1))
test(2262.09, setrev(c(0L, NA)), c(NA, 0L))
test(2262.10, setrev(d), c(-Inf, Inf, NaN, NA))
test(2262.11, setrev(c(NA, 1, 0+2i)), c(0+2i, 1, NA))
test(2262.12, setrev(as.raw(0:1)), as.raw(1:0))
test(2262.13, setrev(NULL), NULL)
test(2262.14, setrev(character(5)), character(5))
test(2262.15, setrev(integer(0)), integer(0))
test(2262.16, setrev(list(1, "a")), list("a", 1))
test(2262.17, frev(1:1e2), rev(1:1e2))
# copy arguments
x = 1:3
test(2262.21, {frev(x); x}, 1:3)
test(2262.22, {setrev(x); x}, 3:1)
test(2262.23, address(x) == address(setrev(x)))
test(2262.24, address(x) != address(frev(x)))
# do not alter on subsets
test(2262.25, {setrev(x[1:2]); x}, 1:3)
# levels
f = as.factor(letters)
test(2262.31, frev(f), rev(f))
test(2262.32, frev(as.IDate(1:10)), as.IDate(10:1))
test(2262.33, setrev(as.IDate(1:10)), as.IDate(10:1))
# names
x = c(a=1L, b=2L, c=3L)
test(2262.41, frev(x), rev(x))
test(2262.42, setrev(copy(x)), rev(x))
# attributes
x = structure(1:10, class = c("IDate", "Date"), att = 1L)
test(2262.51, attr(frev(x), "att"), attr(rev(x), "att"))
test(2262.52, class(frev(x)), class(rev(x)))
test(2262.53, attr(setrev(x), "att"), 1L)
test(2262.54, class(setrev(x)), c("IDate", "Date"))
x = structure(integer(0), att = 1L)
test(2262.55, attr(frev(x), "att"), attr(rev(x), "att"))
# errors
test(2262.61, frev(data.table()), error="should not be data.frame or data.table")
test(2262.62, frev(expression(1)), error="is not supported by frev")
test(2262.63, frev(matrix(1)), error="should not be matrix or array")
if (test_bit64) {
x = as.integer64(c(1, NA, 3))
test(2262.71, frev(x), rev(x))
test(2262.72, setrev(copy(x)), rev(x))
}
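
The `bit64` tests above exercise the branch where an `integer64` vector, which lives in the 8-byte slots of a double vector, is reversed as raw 64-bit integers. A small standalone sketch of that idea, assuming a hypothetical helper `reverse_slots64` and using `memcpy` instead of the pointer cast in the PR so the example stays within strict aliasing rules:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse n 8-byte slots, treating each slot as an opaque
   64-bit pattern so that integer payloads (including INT64_MIN, which bit64
   uses as its NA) are moved bit for bit. */
static void reverse_slots64(double *slots, size_t n) {
  assert(sizeof(double) == sizeof(int64_t));
  for (size_t i = 0; i < n / 2; ++i) {
    const size_t k = n - 1 - i;
    int64_t a, b;
    memcpy(&a, &slots[i], sizeof a);
    memcpy(&b, &slots[k], sizeof b);
    memcpy(&slots[i], &b, sizeof b);
    memcpy(&slots[k], &a, sizeof a);
  }
}
```

Swapping bit patterns rather than floating-point values means no slot is ever interpreted as a double during the reversal.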
32 changes: 32 additions & 0 deletions man/frev.Rd
@@ -0,0 +1,32 @@
\name{frev}
\alias{frev}
\alias{rev}
\title{Fast reverse}
\description{
Similar to \code{\link[base]{rev}} but \emph{much faster}.
}

\usage{
frev(x)
}
\arguments{
\item{x}{ An atomic \code{vector} or \code{list}. }
}

\details{
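\code{frev} retains only the \code{names}, \code{class} and \code{levels} attributes of \code{x}; all other attributes are dropped. Names are reversed together with the values.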
}

\value{
\code{frev} returns the input reversed.
}

\examples{
# on vectors
x = setNames(1:26, letters)
frev(x[1:10])

# list
frev(list(1, "a"))
}
\keyword{ data }
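
In the C code of this PR, the names attribute is reversed along with the data so that each name stays attached to its element. A standalone sketch of that behaviour, using a hypothetical `named_ints` struct as a stand-in for a named R integer vector (none of these names exist in data.table):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a named R integer vector. */
typedef struct {
  int *values;
  const char **names;   /* NULL when the vector is unnamed */
  size_t n;
} named_ints;

/* Reverse values and, if present, names in lockstep. */
static void named_ints_reverse(named_ints *v) {
  assert(v != NULL);
  for (size_t i = 0; i < v->n / 2; ++i) {
    const size_t k = v->n - 1 - i;
    const int tv = v->values[i];
    v->values[i] = v->values[k];
    v->values[k] = tv;
    if (v->names != NULL) {
      const char *tn = v->names[i];
      v->names[i] = v->names[k];
      v->names[k] = tn;
    }
  }
}
```

The PR itself handles names with a second in-place pass over the names vector rather than swapping both inside one loop; the end result is the same pairing of names and values.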
1 change: 1 addition & 0 deletions src/data.table.h
@@ -250,6 +250,7 @@ SEXP islockedR(SEXP x);
bool need2utf8(SEXP x);
SEXP coerceUtf8IfNeeded(SEXP x);
SEXP coerceAs(SEXP x, SEXP as, SEXP copyArg);
SEXP frev(SEXP x, SEXP copyArg);

// types.c
char *end(char *start);
1 change: 1 addition & 0 deletions src/init.c
@@ -143,6 +143,7 @@ R_CallMethodDef callMethods[] = {
{"CconvertDate", (DL_FUNC)&convertDate, -1},
{"Cnotchin", (DL_FUNC)&notchin, -1},
{"Cwarn_matrix_column_r", (DL_FUNC)&warn_matrix_column_r, -1},
{"Cfrev", (DL_FUNC)&frev, -1},
{NULL, NULL, 0}
};

106 changes: 106 additions & 0 deletions src/utils.c
@@ -435,3 +435,109 @@ SEXP startsWithAny(const SEXP x, const SEXP y, SEXP start) {
return ScalarLogical(false);
}

SEXP frev(SEXP x, SEXP copyArg) {
SEXP names, klass, levels;
if (INHERITS(x, char_dataframe))
error(_("'x' should not be data.frame or data.table."));
if (!isNull(getAttrib(x, R_DimSymbol)))
error(_("'x' should not be matrix or array"));
if (!IS_TRUE_OR_FALSE(copyArg))
error(_("%s must be TRUE or FALSE."), "copy"); // # nocov
bool copy = LOGICAL(copyArg)[0];
R_xlen_t n = xlength(x);
int nprotect = 0;
if (copy) {
x = PROTECT(duplicate(x));
nprotect++;
}
if (n==0) {
UNPROTECT(nprotect);
return x;
}
switch (TYPEOF(x)) {
case LGLSXP: case INTSXP: {
int *restrict xd = INTEGER(x);
#pragma omp parallel for num_threads(getDTthreads(n, true))
for (uint64_t i=0; i<n/2; ++i) {
const R_xlen_t k = n-1-i; // R_xlen_t, not int: n can exceed INT_MAX for long vectors
const int tmp = xd[i];
xd[i] = xd[k];
xd[k] = tmp;
}
} break;
case REALSXP: if (INHERITS(x, char_integer64)) {
int64_t *xd = (int64_t *)REAL(x);
#pragma omp parallel for num_threads(getDTthreads(n, true))
for (uint64_t i=0; i<n/2; ++i) {
const int k = n-1-i;
const int64_t tmp = xd[i];
xd[i] = xd[k];
xd[k] = tmp;
}
} else {
double *xd = REAL(x);
#pragma omp parallel for num_threads(getDTthreads(n, true))
for (uint64_t i=0; i<n/2; ++i) {
const R_xlen_t k = n-1-i;
const double tmp = xd[i];
xd[i] = xd[k];
xd[k] = tmp;
}
} break;
case STRSXP: {
const SEXP *xd = SEXPPTR_RO(x);
for (uint64_t i=0; i<n/2; ++i) {
const R_xlen_t k = n-1-i;
const SEXP tmp = xd[i];
SET_STRING_ELT(x, i, xd[k]);
SET_STRING_ELT(x, k, tmp);
}
} break;
case VECSXP: {
const SEXP *xd = SEXPPTR_RO(x);
for (uint64_t i=0; i<n/2; ++i) {
const R_xlen_t k = n-1-i;
const SEXP tmp = xd[i];
SET_VECTOR_ELT(x, i, xd[k]);
SET_VECTOR_ELT(x, k, tmp);
}
} break;
case CPLXSXP: {
Rcomplex *xd = COMPLEX(x);
#pragma omp parallel for num_threads(getDTthreads(n, true))
for (uint64_t i=0; i<n/2; ++i) {
const R_xlen_t k = n-1-i;
const Rcomplex tmp = xd[i];
xd[i] = xd[k];
xd[k] = tmp;
}
} break;
case RAWSXP: {
Rbyte *xd = RAW(x);
#pragma omp parallel for num_threads(getDTthreads(n, true))
for (uint64_t i=0; i<n/2; ++i) {
const R_xlen_t k = n-1-i;
const Rbyte tmp = xd[i];
xd[i] = xd[k];
xd[k] = tmp;
}
} break;
default:
error(_("Type '%s' is not supported by frev"), type2char(TYPEOF(x)));
}
names = PROTECT(getAttrib(x, R_NamesSymbol));
klass = PROTECT(getAttrib(x, R_ClassSymbol));
levels = PROTECT(getAttrib(x, R_LevelsSymbol));
nprotect += 3;
if (copy) {
SET_ATTRIB(x, R_NilValue);
setAttrib(x, R_NamesSymbol, names);
setAttrib(x, R_ClassSymbol, klass);
setAttrib(x, R_LevelsSymbol, levels);
}
if (!isNull(names)) {
frev(names, ScalarLogical(FALSE));
}
UNPROTECT(nprotect);
return x;
}
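
Each numeric branch above is a mirrored swap over the first half of the vector. Every iteration touches a distinct pair `(i, n-1-i)`, so iterations are independent and can be divided among threads. A standalone sketch on a `double` buffer, assuming a hypothetical `reverse_doubles` helper and a plain `nthreads` argument standing in for `getDTthreads()`; it uses a 64-bit mirror index so lengths beyond `INT_MAX` would not overflow, and the pragma is skipped when compiled without OpenMP:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse xd[0..n-1] in place, optionally in parallel. */
static void reverse_doubles(double *xd, int64_t n, int nthreads) {
  assert(n >= 0);
  if (nthreads < 1) nthreads = 1;
  const int64_t half = n / 2;   /* middle element of an odd-length vector stays put */
#ifdef _OPENMP
  #pragma omp parallel for num_threads(nthreads)
#endif
  for (int64_t i = 0; i < half; ++i) {
    const int64_t k = n - 1 - i;   /* 64-bit mirror index */
    const double tmp = xd[i];
    xd[i] = xd[k];
    xd[k] = tmp;
  }
}
```

The string and list branches in the PR run the same loop serially and write through `SET_STRING_ELT` and `SET_VECTOR_ELT` instead of assigning to the buffer directly.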