Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal cbind sugar function #407

Closed
nathan-russell opened this issue Dec 7, 2015 · 15 comments
Closed

Minimal cbind sugar function #407

nathan-russell opened this issue Dec 7, 2015 · 15 comments

Comments

@nathan-russell
Copy link
Contributor

I've been working on an implementation for a cbind sugar function, but since my current design lacks some of the features of base::cbind I'm not sure it's on par with the rest of the Rcpp::sugar functions, and was hoping to get some feedback and / or suggestions for improving it. The source can be viewed in this gist (I didn't want to dump 300 lines of code into this ticket), but to summarize the pros and cons:

Good:
  • Handles Matrix/Matrix, Matrix/Vector, Matrix/scalar, Vector/Vector, Vector/Matrix, Vector/scalar, scalar/Matrix, scalar/Vector argument pairs
  • Does appropriate checking for equal number of rows (or length(s), in the case of Vectors)
Shortcomings:
  • Does not handle arbitrary number of arguments in a single function call, as in base::cbind's ... argument (I'm not sure how to address this without dipping into C++11, or if it's even possible in C++98) , although nested calls seem to work correctly (e.g. cbind(cbind(cbind(arg1, arg2), arg3), arg4))
  • The dimnames attribute is not preserved
  • Does not handle DataFrames

From the testing I've done so far, the supported features above correctly mimic the behavior of base::cbind, but I'm not sure what the general consensus is on including less-than-comprehensive sugar functions in the package. Is it worthwhile to have a version of cbind with limited functionality? And of course, I'd be more than happy to look into any suggestions for improving this.

@eddelbuettel
Copy link
Member

Nice! And as a real quick reply: could we not 'trick it' by macro expansion to provide 2, 3, 4, 5, .... arguments? I can live without dimnames, and support for data.frame objects...

@kevinushey
Copy link
Contributor

I get a compile error + a segfault in the following example:

// [[Rcpp::export]]
NumericMatrix my_cbind(NumericVector x, NumericVector y) {
  return cbind(x, y);
}

/*** R
my_cbind(1, 2)
*/
> Rcpp::sourceCpp("Untitled.cpp")
Untitled.cpp:119:11: warning: binding reference member 'lhs' to a temporary value [-Wdangling-field]
    : lhs(lhs_), rhs(rhs_)
          ^~~~
Untitled.cpp:222:12: note: in instantiation of member function 'Cbind<14, Rcpp::Vector<14,
      PreserveStorage>, Rcpp::Vector<14, PreserveStorage> >::Cbind' requested here
    return Cbind< RTYPE, Rcpp::Vector<RTYPE>, Rcpp::Vector<RTYPE> >(lhs, rhs);
           ^
Untitled.cpp:289:10: note: in instantiation of function template specialization 'cbind<14>' requested
      here
  return cbind(x, y);
         ^
Untitled.cpp:114:23: note: reference member declared here
      const LHS_TYPE& lhs;
                      ^
Untitled.cpp:119:22: warning: binding reference member 'rhs' to a temporary value [-Wdangling-field]
    : lhs(lhs_), rhs(rhs_)
                     ^~~~
Untitled.cpp:115:21: note: reference member declared here
    const RHS_TYPE& rhs;
                    ^
2 warnings generated.

> my_cbind(1, 2)

 *** caught segfault ***
address 0x0, cause 'unknown'

Traceback:
 1: .Primitive(".Call")(<pointer: 0x111219530>, x, y)
 2: my_cbind(1, 2)
 3: eval(expr, envir, enclos)
 4: eval(ei, envir)
 5: withVisible(eval(ei, envir))
 6: source(file = srcConn, echo = TRUE)
 7: Rcpp::sourceCpp("Untitle.cpp")
aborting ...
Segmentation fault: 11

@nathan-russell
Copy link
Contributor Author

Yikes - I guess g++ didn't catch that for some reason. I'll take this back to the drawing board for now.

@nathan-russell
Copy link
Contributor Author

Compiling with clang++, I ran into the above issue. Removing the const and reference qualifiers from the offending Cbind members seems to have solved that problem. @kevinushey How does this version fare on your machine?

@nathan-russell
Copy link
Contributor Author

Regarding the macro suggestion, I was able to work something out with __VA_ARGS__ (presumably this is what you had in mind Dirk?). A sample header file can be found here, generated with this R script. A quick test for 20 arguments seemed to work correctly on clang and gcc. If this seems like it may be a plausible route, I will look into putting together some much more rigorous tests, as well as defensive code for macro portion.

@eddelbuettel
Copy link
Member

That comes close. We used to keep the scripts in sync with the generated code, but that fell to the side at some point. I'd rather keep it sync'ed but I wasn't the only one committing at the time ...

Scripts are in this repo and you should get an idea of which created what.

By the way, I quite like this. Thanks for doing it.

@kevinushey
Copy link
Contributor

One of the really unfortunate issues with __VA_ARGS__ (or, variadic macros in general) is that, strictly speaking, it is not part of the C++98 standard, which Rcpp must conform to (given the CRAN requirements). Even though, in practice, every compiler implements it as it is part of the C99 standard ...

@eddelbuettel
Copy link
Member

Doh. I forgot about that--the reason we do not use __VA_ARGS__ in the other scripted / macro-created interfaces.

@nathan-russell
Copy link
Contributor Author

That is unfortunate, but understandable. I'll see what else I can come up with.

@nathan-russell
Copy link
Contributor Author

I had some free time this past week and was able to take another look at this. I worked out a different approach to allowing for a variable number of arguments by overloading operator, on an expression template class + a few other underhanded tricks, instead of relying on __VA_ARGS__ (or any other C99 / C++11 features). An initial proof-of-concept is available here.

By and large, this implementation seems to be working correctly. I still have to iron out a couple of known issues -- mainly the current interface not accepting Rcpp::Logical* and Rcpp::Character* types -- plus whatever is yielded by more rigorous testing, but can hopefully have something more polished put together (with unit tests) within a couple of weeks, assuming this is still desired.

@eddelbuettel
Copy link
Member

💯
I like that plan. A lot.

nathan-russell added a commit to nathan-russell/Rcpp that referenced this issue Mar 12, 2016
@nathan-russell
Copy link
Contributor Author

Regarding the commit referenced above, I was able fix most of the issues noted in my previous comment: cbind handles vectors, matrices, and scalars correctly for each of the Numeric, Integer, Complex, and Logical types, and handles vectors and matrices correctly for Character / STRSXP. However, I'm having a hell of a time trying to get, e.g., the following to compile:

// devtools::install_github("nathan-russell/Rcpp")
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::CharacterMatrix err1(Rcpp::CharacterMatrix x, std::string y) {
    return Rcpp::cbind(x, y);
}

// [[Rcpp::export]]
Rcpp::CharacterMatrix err2(Rcpp::CharacterMatrix x, Rcpp::String y) {
    return Rcpp::cbind(x, y);
}
// devtools::install_github("RcppCore/Rcpp")

I suspect this will take a bit longer to work out given the nature of R strings.

@eddelbuettel
Copy link
Member

Strings are officially 'difficult' with encoding issues, char vs wide char, and what not. I think the dplyr team learned the meaning of the blues over that too.

More seriously, don't let the perfect be the enemy of the (more than just reasonably) good.

If you have something with numbers that is clean I'd take it. Comments, anyone?

@nathan-russell
Copy link
Contributor Author

Okay that sounds good to me. I'll submit a PR shortly for easier code review.

@eddelbuettel
Copy link
Member

Thanks for staying on top of this and closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants