Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable caching of sourceCpp compilations across R sessions #504

Merged
merged 17 commits into from
Jul 13, 2016

Conversation

jjallaire
Copy link
Member

This PR adds a cacheDir argument to sourceCpp to enable caching of shared libraries across R sessions. The argument defaults to tempdir() which emulates the existing default behavior of only caching compilations within the current R session.

The implementation changes the storage of the data structure that tracks previous compilations from a static C++ object to disk based storage (a serialized R list) within the specified cacheDir.

This feature is motivated by the desire to cache Rcpp chunks within knitr documents (particularly important for book authoring). See rstudio/bookdown#113 for additional discussion.

@eddelbuettel
Copy link
Member

eddelbuettel commented Jul 11, 2016

Regarding the Travis timeout, I am wrapping the test call in travis_wait which "works", but is annoying as all output is suppressed til done.

Do we know whether Travis is just slow? We used to get done in about 8 minutes ... so can we be sure it is not the changeset here?

@coatless
Copy link
Contributor

Apologies in advance for not contributing anything of substance next, but...

This is awesome!

@jjallaire
Copy link
Member Author

I think that it's the changeset. Here's why: the cache of compiled R
functions is now persisted to disk rather than stored in memory. Further,
when we lookup a function in the cache we read the entire "database" (an R
list) and then re-write the entire database after the compilation. This
introduces some O(N) behavior, which mostly doesn't matter for order of a
couple dozen C++ compilation units, but our unit tests introduce so many
C++ compilation units that we're hitting this in a material way.

Let me think about some possible ways out of this box....

On Mon, Jul 11, 2016 at 1:49 PM, Dirk Eddelbuettel <notifications@github.com

wrote:

Regarding the Travis timeout, I am wrapping the test call in travis-wait
https://github.com/eddelbuettel/rquantlib/blob/master/.travis.yml#L30-L31
which "works", but is annoying as all output is suppressed til done.

Do we know whether Travis is just slow? We used to get done in about 8
minutes ... so can we be sure it is not the changeset here?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXx8RhaRxbIMgDC6OgGXQc0Brxzfgfks5qUoJCgaJpZM4JJmwK
.

@eddelbuettel
Copy link
Member

eddelbuettel commented Jul 11, 2016

Good analysis. In which case we can surely just prefix with travis_wait which is documented here.

At least until we have a better fix.

@jjallaire
Copy link
Member Author

Kevin just helped me get to the bottom of things (it was actually a bug
that was preventing 2nd and subsequent calls to sourceCpp from working
correctly, which led to failed tests and ultimately a crash).

The latest commit should get us back to green on Travis.

On Mon, Jul 11, 2016 at 2:11 PM, JJ Allaire jj.allaire@gmail.com wrote:

I think that it's the changeset. Here's why: the cache of compiled R
functions is now persisted to disk rather than stored in memory. Further,
when we lookup a function in the cache we read the entire "database" (an R
list) and then re-write the entire database after the compilation. This
introduces some O(N) behavior, which mostly doesn't matter for order of a
couple dozen C++ compilation units, but our unit tests introduce so many
C++ compilation units that we're hitting this in a material way.

Let me think about some possible ways out of this box....

On Mon, Jul 11, 2016 at 1:49 PM, Dirk Eddelbuettel <
notifications@github.com> wrote:

Regarding the Travis timeout, I am wrapping the test call in travis-wait
https://github.com/eddelbuettel/rquantlib/blob/master/.travis.yml#L30-L31
which "works", but is annoying as all output is suppressed til done.

Do we know whether Travis is just slow? We used to get done in about 8
minutes ... so can we be sure it is not the changeset here?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXx8RhaRxbIMgDC6OgGXQc0Brxzfgfks5qUoJCgaJpZM4JJmwK
.

@eddelbuettel
Copy link
Member

Yay yay yay -- I was starting to fear you two would get turned off working on this forever and ever :)

@eddelbuettel
Copy link
Member

I just merged #503 which you can probably bring in here too.

@eddelbuettel
Copy link
Member

Running a rev.dep now with this PR #504 and the just merged #503.

@jjallaire
Copy link
Member Author

Note that you'll need to update from the Rcpp branch again to try this out.

@eddelbuettel
Copy link
Member

Nice work -- using the Fibonacci example referenced by @yihui:

R> library(Rcpp)
R> sourceCpp("fib.cpp", cacheDir=".")
R> sourceCpp("fib.cpp", cacheDir=".")
R> sourceCpp("fib.cpp", cacheDir=".")
R> system("tree")
.
├── fib.cpp
└── sourceCpp-0.12.5.6-x86_64-pc-linux-gnu-70800
    ├── cache.rds
    ├── sourcecpp_209b23f927e0
    │   ├── fib.cpp
    │   ├── fib.cpp.R
    │   ├── fib.o
    │   └── sourceCpp_2.so
    └── token.rds

2 directories, 7 files
R> 

@yihui
Copy link

yihui commented Jul 12, 2016

I quickly inspected cache.rds and fib.cpp.R, and saw absolute paths were used. I wonder if it is okay to just use relative paths, because absolute paths means you cannot move the cache directory to other places.

@jjallaire
Copy link
Member Author

I agree that it would be better to use relative paths here however the
absolute path assumption runs pretty deep through the sourceCpp
implementation so I think it would be too risky to try to unwind it right
now.

On Tue, Jul 12, 2016 at 2:22 PM, Yihui Xie notifications@github.com wrote:

I quickly inspected cache.rds and fib.cpp.R, and saw absolute paths were
used. I wonder if it is okay to just use relative paths, because absolute
paths means you cannot move the cache directory to other places.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXx_mFtO6NNvlIM3dU1OUTg-f0QQZJks5qU9t_gaJpZM4JJmwK
.

@dcdillon
Copy link
Contributor

What happens if you sourceCpp on two files with the same relative name in the same session? After a setwd for instance.

@jjallaire
Copy link
Member Author

That will work fine (the cache is keyed by absolute path not relative path).

@jjallaire
Copy link
Member Author

Note that this PR doesn't change anything about how the cache works other than it's now stored on disk rather than in memory. For cacheDir = tempdir() (the default) this will result in identical behavior to the current system. For cacheDir = ~/tmp/mycache it will result in a cache that persists across sessions. The motivation for this is interoperability with the knitr cache (see yihui/knitr#1239)

@kevinushey
Copy link
Contributor

What about allowing users to set a cache directory with an R option or something like that? E.g. rcpp.cache.dir?

@eddelbuettel
Copy link
Member

eddelbuettel commented Jul 12, 2016

I like that idea.

All it would take is to change

cacheDir = tempDir()

to

cacheDir = getOption("rcpp.cache.dir"), tempDir() )

The second argument (default=NULL) for getOption() is really useful.

@jjallaire
Copy link
Member Author

Okay, just added this.

On Tue, Jul 12, 2016 at 5:15 PM, Dirk Eddelbuettel <notifications@github.com

wrote:

I like that idea.

All it would take is to change

cacheDir = tempDir()

to

cacheDir = getOption("rcpp.cache.dir"), tempDir()

The second argument (default=NULL) for getOption() is really useful.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXxw_bAil3wGSR3cjqZPPIihH57Yvvks5qVAQBgaJpZM4JJmwK
.

@eddelbuettel
Copy link
Member

This too should be ready per the reverse depends check summarized here so I will merge this.

@eddelbuettel eddelbuettel merged commit cf4c162 into master Jul 13, 2016
@yihui
Copy link

yihui commented Jul 13, 2016

Cool. The knitr PR was also merged, so ````{Rcpp cache=TRUE}` is ready to be used with the dev versions of knitr and Rcpp now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants