Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second pass at Rcpp::algorithm #481

Merged
merged 8 commits into from
May 22, 2016
Merged

Second pass at Rcpp::algorithm #481

merged 8 commits into from
May 22, 2016

Conversation

dcdillon
Copy link
Contributor

So here's a second pass (with an existing fork behind it) at implementing some iterator-based algorithms for Rcpp. Implemented so far for real types (to be extended) are:

  • min/max
  • sum
  • prod
  • sqrt
  • exp
  • log

At this point I need some other eyes on this, specifically with regards to the unit tests. These need to be as complete as possible and any help is welcome.

@eddelbuettel
Copy link
Member

Will peruse.

@dcdillon
Copy link
Contributor Author

dcdillon commented May 19, 2016

@eddelbuettel asked me to send a motivating reason for all of this so here it is:

  • For one, all of sugar seems to be coded to work on Rcpp::Vector which is just a small subset of what could be useful (think Matrix::Row).
  • Second, all of Rcpp::sugar is implemented as a variety of specialized classes that either accidentally reimplement the important functionality of what they're trying to replace...or in the worst case give classes with sometimes inappropriately const'ed cast operators (I could get into the evil of cast operators here, but it's a holy war that probably isn't worth having)

See this example:

namespace Rcpp{
namespace sugar{

    template <int RTYPE, bool NA, typename T>
    class Min {
    public:
        typedef typename Rcpp::traits::storage_type<RTYPE>::type STORAGE ;

        Min( const T& obj_) : obj(obj_) {}

        operator STORAGE() const {
            STORAGE min, current ;
            min = obj[0] ;
            if( Rcpp::traits::is_na<RTYPE>( min ) ) return min ;

            R_xlen_t n = obj.size() ;
            for( R_xlen_t i=1; i<n; i++){
                current = obj[i] ;
                if( Rcpp::traits::is_na<RTYPE>( current ) ) return current;
                if( current < min ) min = current ;
            }
            return min ;
        }

        const T& obj ;
    } ;

    // version for NA = false
    template <int RTYPE, typename T>
    class Min<RTYPE,false,T> {
    public:
        typedef typename Rcpp::traits::storage_type<RTYPE>::type STORAGE ;

        Min( const T& obj_) : obj(obj_) {}

        operator STORAGE() const {
            STORAGE min, current ;
            min = obj[0] ;

            R_xlen_t n = obj.size() ;
            for( R_xlen_t i=1; i<n; i++){
                current = obj[i] ;
                if( current < min ) min = current ;
            }
            return min ;
        }

        const T& obj ;
    } ;


} // sugar


template <int RTYPE, bool NA, typename T>
sugar::Min<RTYPE,NA,T> min( const VectorBase<RTYPE,NA,T>& x){
    return sugar::Min<RTYPE,NA,T>(x.get_ref()) ;
}

} // Rcpp

First off we have here the Rcpp::min function returning a sugar::Min< RTYPE, NA, T > type (which as it turns out happens to have a cast operator to Rcpp::traits::storage_type<RTYPE>). Fundamentally this is just weird. Rcpp::min should be a function that simply returns a Rcpp::traits::storage_type<RTYPE> for the sake of the sanity of everyone involved.

Additionally, I just recently had to add the const to the operator STORAGE() functions here to work with otherwise innocuous looking templates.

Finally, the Rcpp::sugar code would become simpler and potentially more general. Two options are indicated below (mind you this isn't tested code but just me coding on the spot without a compiler).

namespace Rcpp
{

template< int RTYPE, typename T >
typename traits::storage_type< RTYPE >::type min(const Vector< RTYPE, true, T> &v) {
    return Rcpp::algorithm::min(v.begin(), v.end());
}

template< int RTYPE, typename T >
typename traits::storage_type< RTYPE >::type min(const Vector< RTYPE, false, T> &v) {
    return Rcpp::algorithm::min_nona(v.begin(), v.end());
}

} // namespace Rcpp

This also gives the freedom to either the user (or the implementers) to do something equivalent to:

arma::vec minByRow(const NumericMatrix &m) {
    arma::colvec retval(m.nrow());

    for (R_xlen_t i = 0; i < m.nrow(); ++i) {
        retval[i] = algorithm::min(m.Row(i).begin(), m.Row(i).end());
    }

    return retval;
}

These could either be easy-to-maintain extensions of Rcpp::sugar or just added functionality for the user to expose.

Either way, I think the STL got it right here and we should use a similar pattern to what has been presented to us by the language for executing "algorithms" on our data structures.

@dcdillon
Copy link
Contributor Author

I decided to leave this point out because this is an Rcpp pull request, but @eddelbuettel suggested I add it.

This can also be used to apply R "algorithms" to any iterator based structure. Specifically RcppParallel::RVector and RcppParallel::RMatrix come to mind.

@kevinushey
Copy link
Contributor

FWIW, I think there were two main motivations behind the Rcpp sugar work:

  1. The Rcpp code a user will write is often nearly identical to the code one might write when working in R,
  2. The use of expression templates allows operations to compose efficiently; e.g. there will be no temporaries when the compiler is done optimizing x + y + z + w.

Overall, this makes Rcpp a lot more accessible to R developers new to C++, which I think is something we'd like to keep Rcpp tailored towards when possible.

That said, I do think that having iterator-based algorithm methods will be useful, but these are somewhat more 'expert-friendly' (coming from the perspective of an R user).

… the primitive arithmetic types or to std::string and applied this to the current algorithms.
@dcdillon
Copy link
Contributor Author

I buy the expression templates, but I am suspicious of whether it's been done correctly. Just looking at the cumsum implementation. It pretends to be an expression template, but really isn't.

So, I guess I will amend my earlier comments to suggest that things that aren't really expression templates should not be implemented to look like they are. They should just be implemented as the functions that they are. This applies mostly to things in include/Rcpp/sugar/functions and not to include/Rcpp/sugar/operators.

@kevinushey
Copy link
Contributor

Sounds good!

@eddelbuettel
Copy link
Member

Now that the conference week is behind me, some more time for Rcpp.

This PR is purely additive with new code, does not alter any existing interfaces, and adds tests. No reason not to like it. In it goes then.

@eddelbuettel eddelbuettel merged commit 90bc32c into RcppCore:master May 22, 2016
eddelbuettel added a commit that referenced this pull request May 23, 2016
rolled Version and Date
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants