Rcpp Beginners Vignette / New Rcpp Introduction #524

@coatless

Per #506, one of the objectives is to introduce a new RMarkdown-based vignette to replace the current Rcpp-introduction (which moves to Rcpp-JSS-Paper). Before creating a PR with the desired changes, I would like to lay out the contents of the guide and solicit feedback in advance. The aim of this vignette is to get a user up and going within the first 1-2 pages, add a bit more depth over the next 3-4 pages, and then do something amazing for 5-7 pages.

The first few parts are mostly done as I had to prep them for class. The later sections are not.

Outline:

  • Introduction
    • What is Rcpp?
      • The Rcpp package enables a seamless transition between R and C or C++ code by providing a powerful, feature-rich collection of functions and classes.
    • What are some uses for Rcpp?
      • Speeding up slow loops, subsetting, and recursive operations
      • Accessing advanced data structures and algorithms afforded by the STL
      • Extending R via external C or C++ libraries.
    • Prerequisite: Installing Rcpp
# RHEL/CentOS (add EPEL?)
# sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum update
sudo yum install gcc-c++ R

# Debian
sudo apt-get update
sudo apt-get install r-base r-base-dev r-cran-rcpp
  • Using Rcpp
    • Quick check to see if it works via: Rcpp::evalCpp("2 + 2")
      • Explain that evalCpp is meant for quick expressions.
    • Next a brief "Hello World" example via Rcpp::sourceCpp().
      • Use color boxes within the hello world example to highlight the different components of the R vs. C++ function such as the different type declarations.
      • Relate #include <Rcpp.h> to library(), and using namespace to the package prefix of an automatically loaded package (show via base::sum()).
      • Show call in R and result.
      • Emphasize that the function is provided by Rcpp by printing its contents, which reveals a pointer.
      • ISSUE: When a new C++ file is created within RStudio, it is pre-populated with a bunch of filler text... Maybe better to explain that directly and then move to the traditional hello world?
    • Proceed to demoing an addition or multiplication example that accepts 2 double parameters
      • Use cppFunction() to demonstrate inline usage?
# function written in R
hello_from_r = function(){
    cat("Hello R/C++ World!\n")
}
// Hello world example
#include <Rcpp.h>
using namespace Rcpp;   // Import Statements

// [[Rcpp::export]]
void hello_from_Rcpp() {
    Rcout << "Hello R/C++ World!\n";
}
# Function written in R
add_in_r = function(a, b){
   return(a + b) # maybe just use a + b.... 
}
// Addition function in C++
#include <Rcpp.h>
using namespace Rcpp; // Import Statements

// [[Rcpp::export]]
double add_in_rcpp(double a, double b) { // Declaration
     double out = a + b; // Add `a` to `b`
     return out; // Return output
}
  • Data Structures
    • Cover scalar data structures in a table format that connects R's atomic vectors of size 1 to double, int, bool, and std::string
      • Emphasize that data types matter by changing the input type of the add() function from double to int to show the truncation effects.
    • Introduce Rcpp data structures (homogeneous)
      • Connect double => NumericVector, int => IntegerVector, complex => ComplexVector, bool => LogicalVector, and std::string => CharacterVector (leave out StringVector?)
        • Provide implementation of the mean and variance operating on NumericVectors.
        • Call the mean() function from within variance() and emphasize need to embed connected functions together or use a package for a shared library.
      • Using the typeof(x) command, provide a link from R's matrix to NumericMatrix, IntegerMatrix, ComplexMatrix, LogicalMatrix, CharacterMatrix (leave out StringMatrix?)
        • Demo usage by writing a scale() matrix operation using previously defined mean() and variance().
    • Introduce Rcpp data structures (heterogeneous)
      • List and DataFrame
        • Show how to create via ::create() and Named("y") = x
        • Mention that DataFrame has a 20 column limit and that the way to get around it is to not specify a size on List creation.
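// Typing error example
#include <Rcpp.h>
using namespace Rcpp; // Import Statements

// Defined in this file so that the call below compiles
double add_in_rcpp(double a, double b) {
    return a + b;
}

// [[Rcpp::export]]
int add_ints_rcpp(int a, int b) { // Declaration: inputs and output are int
    return add_in_rcpp(a, b); // Call previous function; result converts to int
}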
mean_r = function(x){
   sum_x = 0
   n = length(x)
   for(i in seq_along(x)){
        sum_x = sum_x + x[i]
   }

   return(sum_x/n)
}
// mean example
#include <Rcpp.h>
using namespace Rcpp; // Import Statements

// [[Rcpp::export]]
double mean_rcpp(NumericVector x){
    int n = x.size();      // Size of vector
    double sum = 0;   // Sum value         

    // For loop, note cpp index shift to 0
    for(int i = 0; i < n; i++){ 
        // Shorthand for sum = sum + x[i]
        sum += x[i];            
    }

    return sum/n;  // Obtain and return the Mean
}
var_r = function(x, bias = TRUE){

   sum_total = 0
   n = length(x)
   mu = mean_r(x)

   for(i in seq_along(x)){
        sum_total = sum_total + (x[i] - mu)^2
   }

   return(sum_total/(n-bias))
}
// variance example
#include <Rcpp.h>
using namespace Rcpp; // Import Statements

// [[Rcpp::export]]
double var_rcpp(NumericVector x, bool bias = true){
    // Calculate the mean using the C++ function mean_rcpp();
    // it must be defined earlier in this same file to compile
    double mean = mean_rcpp(x);
    double sum = 0;
    int n = x.size();

    for(int i = 0; i < n; i++){   
        sum += pow(x[i] - mean, 2.0); // Square
    }

    return sum/(n-bias); // Return variance 
}
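The scale() matrix operation proposed in the outline is not written yet. As a rough sketch of what it might build on, the block below uses plain std::vector columns instead of NumericMatrix so that it compiles without Rcpp. The names mean_cpp, var_cpp, and scale_columns are hypothetical, and the n - 1 divisor is an assumption that mirrors bias = true above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical plain C++ stand-in for mean_rcpp().
double mean_cpp(const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i];
    }
    return sum / x.size();
}

// Hypothetical stand-in for var_rcpp(); divisor n - 1 is an assumption.
double var_cpp(const std::vector<double>& x) {
    double mu = mean_cpp(x);
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += (x[i] - mu) * (x[i] - mu);
    }
    return sum / (x.size() - 1);
}

// Center each column and divide by its standard deviation.
// Each inner vector holds one column (an assumption about the layout).
std::vector<std::vector<double> > scale_columns(std::vector<std::vector<double> > cols) {
    for (std::size_t j = 0; j < cols.size(); ++j) {
        double mu = mean_cpp(cols[j]);
        double sd = std::sqrt(var_cpp(cols[j]));
        for (std::size_t i = 0; i < cols[j].size(); ++i) {
            cols[j][i] = (cols[j][i] - mu) / sd;
        }
    }
    return cols;
}
```

A column with fewer than two elements, or with zero variance, would divide by zero here; the vignette version would need to decide how to handle that.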
  • Rcpp Features
    • Introduce Rcpp sugar (and vectorization)
      • Show subset operation that separates positive and negative numbers
    • Emphasize plugin system by taking from quickrefs.
    • Reference other Rcpp ecosystem packages.
#include <Rcpp.h>
using namespace Rcpp; // Import Statements

// [[Rcpp::export]]
List separate_numbers(NumericVector x) {

    LogicalVector subset = (x >= 0);
    NumericVector positives = x[subset];
    NumericVector negatives = x[!subset]; // Not

    List out = List::create(Named("pos") = positives,
                            Named("neg") = negatives);

    return out;
}
// built-in C++11 plugin
// [[Rcpp::plugins(cpp11)]]

// built-in C++11 plugin for older g++ compiler
// [[Rcpp::plugins(cpp0x)]]

// built-in C++14 plugin for C++14 standard
// [[Rcpp::plugins(cpp14)]]

// built-in C++1y plugin for C++14 and C++17 standard under development
// [[Rcpp::plugins(cpp1y)]]

// built-in OpenMP plugin
// [[Rcpp::plugins(openmp)]]
// Use the RcppArmadillo package
// Requires different header file from Rcpp.h
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// Use the RcppEigen package
// Requires different header file from Rcpp.h
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
  • Working with STL
    • Demo algorithms like find, accumulate, et cetera...
    • Advanced data structures like map, iterator, and so on.
      • Provide counting example using a map and compare it against an implementation of a CharacterVector over a known set of values.
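As a possible starting point for this section, here is a minimal sketch of find, accumulate, and a map-based count. It is written against std::vector and std::string rather than Rcpp types so that it compiles on its own; the function names are hypothetical, and in the vignette the inputs would presumably be NumericVector and CharacterVector.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Sum the elements with std::accumulate (the 0.0 seed keeps the sum a double).
double sum_stl(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// Return the zero-based position of the first match, or -1 if absent.
int find_position(const std::vector<double>& x, double value) {
    std::vector<double>::const_iterator it = std::find(x.begin(), x.end(), value);
    if (it == x.end()) {
        return -1;
    }
    return static_cast<int>(it - x.begin());
}

// Count occurrences of each string; std::map keeps its keys in sorted order.
std::map<std::string, int> count_values(const std::vector<std::string>& x) {
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i < x.size(); ++i) {
        ++counts[x[i]]; // operator[] inserts a zero count for a new key
    }
    return counts;
}
```

The map version needs no prior knowledge of the values, which is the contrast the outline draws with a CharacterVector implementation over a known set.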
  • Common Mistakes
    • Bad file names and/or path names.
    • Missing // [[Rcpp::export]]
    • Avoid using .push_back() on any Rcpp data object.
    • Data Typing Errors
      • double => int truncation
      • unsigned errors
      • Extracting a string from CharacterMatrix
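The double => int truncation and one common form of unsigned error can be reproduced in plain C++, without Rcpp. A small sketch with hypothetical function names:

```cpp
#include <cstddef>
#include <vector>

// Converting a double to int discards the fractional part (toward zero).
int truncate_to_int(double x) {
    return static_cast<int>(x);
}

// Buggy on empty input: x.size() is unsigned, so 0 - 1 wraps to a huge value.
std::size_t last_index_unsigned(const std::vector<double>& x) {
    return x.size() - 1;
}

// Safer: convert to a signed type before subtracting.
long last_index_signed(const std::vector<double>& x) {
    return static_cast<long>(x.size()) - 1;
}
```

Calling truncate_to_int(2.7) gives 2 rather than 3, and last_index_unsigned() on an empty vector returns the largest std::size_t value instead of -1.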
