Sugar functions - rowSums, colSums, rowMeans, colMeans #549

@nathan-russell

I put together sugar functions for rowSums, colSums, rowMeans, and colMeans (for use with matrices, not data.frames), and they all seem to be working as expected. However, early on in the process I switched from Rcpp::traits::is_na<> to the R macro ISNAN for checking NAs / NaNs in numeric matrices, because I noticed a large performance difference (roughly 2x on the median in the benchmark below). As an example,

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::LogicalVector macro_na(Rcpp::NumericVector x) {
    R_xlen_t i = 0, sz = x.size();
    Rcpp::LogicalVector res(sz);

    for ( ; i < sz; i++) {
        res[i] = ISNAN(x[i]);
    }

    return res;
}

// [[Rcpp::export]]
Rcpp::LogicalVector traits_na(Rcpp::NumericVector x) {
    R_xlen_t i = 0, sz = x.size();
    Rcpp::LogicalVector res(sz);

    for ( ; i < sz; i++) {
        res[i] = Rcpp::traits::is_na<REALSXP>(x[i]);
    }

    return res;
}

/*** R

set.seed(123); x <- sample(c(rnorm(4), NA), 1e6, TRUE)

microbenchmark::microbenchmark(
    "Macro" = macro_na(x),
    "Traits" = traits_na(x),
    times = 200L
)
# Unit: milliseconds
#    expr       min        lq      mean    median        uq      max neval
#   Macro  6.075743  6.158497  7.524288  6.259146  8.587615 26.61165   200
#  Traits 13.508780 13.621339 16.530404 13.753101 15.818473 92.80678   200

*/ 

I believe the difference is due to the memcmp call behind traits::is_na. @kevinushey It looks like you authored this, so can you comment on whether or not it is safe to use ISNAN in place of traits::is_na? I read the explanatory comment,

motivation: on 32bit architectures, we only see 'LargeNA'
as defined ahead; on 64bit architectures, R defaults to
'SmallNA' for R_NaReal, but this can get promoted to 'LargeNA'
if a certain operation can create a 'signalling' NA, e.g. NA_real_+1

and it made sense to me, but despite the fact that I am on a 64-bit machine with unsigned long long support, I was not able to reproduce the situation described. If I'm only using ISNAN to check values on input objects, is there any risk in it giving erroneous results? If so, is there some way to test this directly?
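For anyone who wants to poke at this outside of R, here is a minimal standalone sketch of the kind of test I have in mind. It is not Rcpp's implementation. The two bit patterns are my assumption about what 'SmallNA' and 'LargeNA' refer to (a NaN with 1954 in the low word, without and with the quiet bit set), and every function name below is hypothetical. `nan_check` stands in for ISNAN, and `na_bits_check` stands in for a memcmp-style comparison against fixed NA patterns.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// ASSUMPTION: bit patterns for the 'SmallNA' / 'LargeNA' values mentioned in
// the quoted comment. Both carry 1954 (0x7A2) in the low word and differ only
// in the quiet bit (bit 51).
static const std::uint64_t kSmallNA = 0x7FF00000000007A2ULL;
static const std::uint64_t kLargeNA = 0x7FF80000000007A2ULL;

// Reinterpret a 64-bit pattern as a double without undefined behaviour.
double from_bits(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Reinterpret a double as its 64-bit pattern.
std::uint64_t to_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Stand-in for ISNAN: true for any NaN, whatever its payload.
bool nan_check(double x) {
    return std::isnan(x);
}

// Hypothetical memcmp-style check: true only for the two NA patterns above,
// so an ordinary NaN is not reported as NA.
bool na_bits_check(double x) {
    return std::memcmp(&x, &kSmallNA, sizeof x) == 0 ||
           std::memcmp(&x, &kLargeNA, sizeof x) == 0;
}

// Bit pattern of (NA + 1). Whether the quiet bit ends up set depends on the
// platform and compiler, so inspect the result rather than assume it.
std::uint64_t bits_after_add_one(std::uint64_t na_bits) {
    volatile double x = from_bits(na_bits);  // volatile: discourage constant folding
    return to_bits(x + 1.0);
}

// Checks that should hold regardless of which NA pattern the platform uses.
void self_check() {
    assert(nan_check(from_bits(kSmallNA)));
    assert(nan_check(from_bits(kLargeNA)));
    assert(nan_check(from_bits(bits_after_add_one(kSmallNA))));
    assert(!nan_check(1.0));
}
```

If the assumptions hold, the NaN-style check accepts both patterns as well as the result of NA + 1, which would suggest that ISNAN does not miss an NA on input. The difference is in the other direction: it also returns true for a plain NaN, which the bit-pattern check rejects. Comparing `bits_after_add_one(kSmallNA)` against `kLargeNA` on a given machine is one way to see whether the promotion described in the comment actually happens there.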
