Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable caching of sourceCpp compilations across R sessions #504

Merged
merged 17 commits into from Jul 13, 2016

Conversation

@jjallaire
Copy link
Member

commented Jul 11, 2016

This PR adds a cacheDir argument to sourceCpp to enable caching of shared libraries across R sessions. The argument defaults to tempdir() which emulates the existing default behavior of only caching compilations within the current R session.

The implementation changes the storage of the data structure that tracks previous compilations from a static C++ object to disk based storage (a serialized R list) within the specified cacheDir.

This feature is motivated by the desire to cache Rcpp chunks within knitr documents (particularly important for book authoring). See rstudio/bookdown#113 for additional discussion.

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 11, 2016

Regarding the Travis timeout, I am wrapping the test call in travis_wait which "works", but is annoying as all output is suppressed til done.

Do we know whether Travis is just slow? We used to get done in about 8 minutes ... so can we be sure it is not the changeset here?

@coatless

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2016

Apologies in advance for not contributing anything of substance next, but...

This is awesome!

@jjallaire

This comment has been minimized.

Copy link
Member Author

commented Jul 11, 2016

I think that it's the changeset. Here's why: the cache of compiled R
functions is now persisted to disk rather than stored in memory. Further,
when we lookup a function in the cache we read the entire "database" (an R
list) and then re-write the entire database after the compilation. This
introduces some O(N) behavior, which mostly doesn't matter for order of a
couple dozen C++ compilation units, but our unit tests introduce so many
C++ compilation units that we're hitting this in a material way.

Let me think about some possible ways out of this box....

On Mon, Jul 11, 2016 at 1:49 PM, Dirk Eddelbuettel <notifications@github.com

wrote:

Regarding the Travis timeout, I am wrapping the test call in travis-wait
https://github.com/eddelbuettel/rquantlib/blob/master/.travis.yml#L30-L31
which "works", but is annoying as all output is suppressed til done.

Do we know whether Travis is just slow? We used to get done in about 8
minutes ... so can we be sure it is not the changeset here?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXx8RhaRxbIMgDC6OgGXQc0Brxzfgfks5qUoJCgaJpZM4JJmwK
.

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 11, 2016

Good analysis. In which case we can surely just prefix with travis_wait which is documented here.

At least until we have a better fix.

jjallaire added 2 commits Jul 11, 2016
@jjallaire

This comment has been minimized.

Copy link
Member Author

commented Jul 11, 2016

Kevin just helped me get to the bottom of things (it was actually a bug
that was preventing 2nd and subsequent calls to sourceCpp from working
correctly, which led to failed tests and ultimately a crash).

The latest commit should get us back to green on Travis.

On Mon, Jul 11, 2016 at 2:11 PM, JJ Allaire jj.allaire@gmail.com wrote:

I think that it's the changeset. Here's why: the cache of compiled R
functions is now persisted to disk rather than stored in memory. Further,
when we lookup a function in the cache we read the entire "database" (an R
list) and then re-write the entire database after the compilation. This
introduces some O(N) behavior, which mostly doesn't matter for order of a
couple dozen C++ compilation units, but our unit tests introduce so many
C++ compilation units that we're hitting this in a material way.

Let me think about some possible ways out of this box....

On Mon, Jul 11, 2016 at 1:49 PM, Dirk Eddelbuettel <
notifications@github.com> wrote:

Regarding the Travis timeout, I am wrapping the test call in travis-wait
https://github.com/eddelbuettel/rquantlib/blob/master/.travis.yml#L30-L31
which "works", but is annoying as all output is suppressed til done.

Do we know whether Travis is just slow? We used to get done in about 8
minutes ... so can we be sure it is not the changeset here?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXx8RhaRxbIMgDC6OgGXQc0Brxzfgfks5qUoJCgaJpZM4JJmwK
.

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 11, 2016

Yay yay yay -- I was starting to fear you two would get turned off working on this forever and ever :)

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 12, 2016

I just merged #503 which you can probably bring in here too.

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 12, 2016

Running a rev.dep now with this PR #504 and the just merged #503.

@jjallaire

This comment has been minimized.

Copy link
Member Author

commented Jul 12, 2016

Note that you'll need to update from the Rcpp branch again to try this out.

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 12, 2016

Nice work -- using the Fibonacci example referenced by @yihui:

R> library(Rcpp)
R> sourceCpp("fib.cpp", cacheDir=".")
R> sourceCpp("fib.cpp", cacheDir=".")
R> sourceCpp("fib.cpp", cacheDir=".")
R> system("tree")
.
├── fib.cpp
└── sourceCpp-0.12.5.6-x86_64-pc-linux-gnu-70800
    ├── cache.rds
    ├── sourcecpp_209b23f927e0
    │   ├── fib.cpp
    │   ├── fib.cpp.R
    │   ├── fib.o
    │   └── sourceCpp_2.so
    └── token.rds

2 directories, 7 files
R> 
@yihui

This comment has been minimized.

Copy link

commented Jul 12, 2016

I quickly inspected cache.rds and fib.cpp.R, and saw absolute paths were used. I wonder if it is okay to just use relative paths, because absolute paths means you cannot move the cache directory to other places.

@jjallaire

This comment has been minimized.

Copy link
Member Author

commented Jul 12, 2016

I agree that it would be better to use relative paths here however the
absolute path assumption runs pretty deep through the sourceCpp
implementation so I think it would be too risky to try to unwind it right
now.

On Tue, Jul 12, 2016 at 2:22 PM, Yihui Xie notifications@github.com wrote:

I quickly inspected cache.rds and fib.cpp.R, and saw absolute paths were
used. I wonder if it is okay to just use relative paths, because absolute
paths means you cannot move the cache directory to other places.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXx_mFtO6NNvlIM3dU1OUTg-f0QQZJks5qU9t_gaJpZM4JJmwK
.

@dcdillon

This comment has been minimized.

Copy link
Contributor

commented Jul 12, 2016

What happens if you sourceCpp on two files with the same relative name in the same session? After a setwd for instance.

@jjallaire

This comment has been minimized.

Copy link
Member Author

commented Jul 12, 2016

That will work fine (the cache is keyed by absolute path not relative path).

@jjallaire

This comment has been minimized.

Copy link
Member Author

commented Jul 12, 2016

Note that this PR doesn't change anything about how the cache works other than it's now stored on disk rather than in memory. For cacheDir = tempdir() (the default) this will result in identical behavior to the current system. For cacheDir = ~/tmp/mycache it will result in a cache that persists across sessions. The motivation for this is interoperability with the knitr cache (see yihui/knitr#1239)

@kevinushey

This comment has been minimized.

Copy link
Contributor

commented Jul 12, 2016

What about allowing users to set a cache directory with an R option or something like that? E.g. rcpp.cache.dir?

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 12, 2016

I like that idea.

All it would take is to change

cacheDir = tempDir()

to

cacheDir = getOption("rcpp.cache.dir"), tempDir() )

The second argument (default=NULL) for getOption() is really useful.

@jjallaire

This comment has been minimized.

Copy link
Member Author

commented Jul 12, 2016

Okay, just added this.

On Tue, Jul 12, 2016 at 5:15 PM, Dirk Eddelbuettel <notifications@github.com

wrote:

I like that idea.

All it would take is to change

cacheDir = tempDir()

to

cacheDir = getOption("rcpp.cache.dir"), tempDir()

The second argument (default=NULL) for getOption() is really useful.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#504 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXxw_bAil3wGSR3cjqZPPIihH57Yvvks5qVAQBgaJpZM4JJmwK
.

@eddelbuettel

This comment has been minimized.

Copy link
Member

commented Jul 13, 2016

This too should be ready per the reverse depends check summarized here so I will merge this.

@eddelbuettel eddelbuettel merged commit cf4c162 into master Jul 13, 2016
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@yihui

This comment has been minimized.

Copy link

commented Jul 13, 2016

Cool. The knitr PR was also merged, so ````{Rcpp cache=TRUE}` is ready to be used with the dev versions of knitr and Rcpp now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.