Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.2.4
Bug fix
- typo in date
version 1.2.3
Bug fix
- minor url fix
version 1.2.2
New features
- removed several uses of `rgeos`
- moved `paddedFloatToChar` to `reproducible` from `SpaDES.core`.
- increased code coverage
- Pull in legacy `%>%` code from `magrittr` to allow the cached alternative, `%C%`. With the new `magrittr` pipe now in compiled source code, more of the legacy code is required here.
Bug fixes
- several minor
version 1.2.1
New features
- harmonized message colours that are user-adjustable via options: `reproducible.messageColourPrepInputs` for all `prepInputs` functions; `reproducible.messageColourCache` for all `Cache` functions; and `reproducible.messageColourQuestion` for questions that require user input. Defaults are `cyan`, `blue` and `green`, respectively. These are user-visible colour changes.
- improved messaging for `Cache` cases where a `file.link` is used instead of saving.
- with improved messaging, `options(reproducible.verbose = 0)` will now turn off almost all messaging.
- `postProcess` and family now have `filename2 = NULL` as the default, so results are not saved to disk. This is a change.
- `verbose` is now an argument throughout, whose default is `getOption("reproducible.verbose")`, which is set by default to `1`. Thus, individual function calls can be more or less verbose, or the whole session can be adjusted via the option.
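The interplay between a session-wide option and a per-call `verbose` argument can be sketched in base R. This is only an illustration under stated assumptions: `sayIfVerbose` is a hypothetical helper, not a function from `reproducible`, and the fallback value of `1` simply mirrors the default described above.

```r
# Hypothetical helper (not part of reproducible): emits a message only when
# the requested verbosity is at least 1.
sayIfVerbose <- function(msg, verbose = getOption("reproducible.verbose", 1)) {
  if (verbose >= 1) message(msg)
  invisible(verbose >= 1)
}

old <- options(reproducible.verbose = 0)                  # silence the whole session
quiet <- sayIfVerbose("loading from cache")               # option is 0, so nothing is printed
loud  <- sayIfVerbose("loading from cache", verbose = 1)  # per-call override prints the message
options(old)                                              # restore the previous setting
```

Reading the option inside the argument default means a single call can be made louder or quieter without touching the session setting.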
Bug fixes
- `RasterStack` objects were not correctly saved to disk under some conditions in `postProcess`; fixed
- several minor
version 1.2.0
New features
- `postProcess` now uses a simpler single call to `gdalwarp`, if available, for `RasterLayer` class to accomplish `cropInputs`, `projectInputs`, `maskInputs`, and `writeOutputs` all at once. This should be faster, simpler and, perhaps, more stable. It will only be invoked if the `RasterLayer` is too large to fit into RAM. To force it to be used, the user must set `useGDAL = "force"` in `prepInputs` or `postProcess`, or globally with `options("reproducible.useGDAL" = "force")`
- `postProcess`, when using the new `gdalwarp`, has better persistence of colour table and NA values, as these are kept with better reliability
- concurrent `Cache` now works as expected (e.g., with parallel processing, it will avoid collisions) with SQLite thanks to suggestion here: https://stackoverflow.com/a/44445010
- updated digesting of `Raster` class objects to account for more of the metadata (including the colortable). This will change the digest value of all `Raster` layers, causing re-run of `Cache`
- removed `Require`, `pkgDep`, `trimVersionNumber`, `normPath`, `checkPath` that were moved to `Require` package. For backwards compatibility, these are imported and reexported
- address permanently or temporarily new changes in GDAL>3 and PROJ>6 in the spatial packages.
- new function `file.move` used to rename/copy files across disks (a situation where `file.rename` would fail)
- all `DBI` type functions now have default `cachePath` of `getOption("reproducible.cachePath")`
- `Cache(prepInputs, ...` on a file-backed `Raster*` class object now gives the non-Cache repository folder as the `filename(returnRaster)`. Previously, the return object would contain the cache repository as the folder for the file-backed `Raster*`
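The motivation for `file.move` in the list above (renaming can fail across disks) can be illustrated with a rename-then-copy fallback in base R. This is a minimal sketch, not the package's implementation; `moveFileSketch` is a hypothetical name introduced for illustration.

```r
# Hypothetical stand-in for the idea behind file.move; not the package's code.
moveFileSketch <- function(from, to) {
  # file.rename() returns FALSE (with a warning) when it cannot rename,
  # e.g. when `from` and `to` live on different disks.
  renamed <- suppressWarnings(file.rename(from, to))
  if (renamed) return(invisible(TRUE))
  # Fallback: copy the file, then delete the original only if the copy worked.
  copied <- file.copy(from, to, overwrite = TRUE)
  if (copied) unlink(from)
  invisible(copied)
}

src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines("hello", src)
moved <- moveFileSketch(src, dst)
```

Deleting the source only after a successful copy keeps the original intact if the fallback also fails.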
Dependency changes
- net reduction in number of packages that are imported from by 14. Removed completely: `backports`, `memoise`, `quickPlot`, `R.utils`, `remotes`, `tools`, and `versions`; moved to Suggests: `fastdigest`, `gdalUtils`, `googledrive`, `httr`, `qs`, `rgdal`, `sf`, `testthat`; added: `Require`. Now there are 12 non-base packages listed in Imports. This is down from 31 prior to Ver 1.0.0.
bug fixes
- fix over-wide tables in PDF manual (#144)
- use `file.link` not `file.symlink` for `saveToCache`. This would have resulted in C Stack overflow errors due to missing original file in the `file.symlink`
- use system call to `unzip` when extracting large (>= 4GB) files (#145, @tati-micheletti)
- several minor, including `projectInputs` when converting to longlat projections, `setMinMax` for `gdalwarp` results
- `Filenames` now consistently returns a character vector (#149)
- improvements to file-backed Raster caching to accommodate a few more edge cases
version 1.1.1
New features
- none
Dependency changes
- none
bug fixes
- fix CRAN test failure when `file.link` does not succeed.
version 1.1.0
New features
- begin to accommodate changes in GDAL/PROJ and associated updates to other spatial packages. More updates are expected as other spatial packages (namely `raster`) are updated.
- can now change `options('reproducible.cacheSaveFormat')` on the fly; cache will look for the file by `cacheId` and write it using `options('reproducible.cacheSaveFormat')`. If it is in another format, Cache will load it and resave it with the new format. Experimental still.
- new `Copy` methods for `refClass` objects, `SQLite` and moved `environment` method into `ANY` as it would be dispatched for unknown classes that inherit from `environment`, of which there are many and this should be intercepted
- `Require` can now handle minimum version numbers, e.g., `Require("bit (>=1.1-15.2)")`; this can be worked into downstream tools. Still experimental.
- Cache will do `file.link` or `file.symlink` if an existing Cache entry with identical output exists and it is large (currently `1e6` bytes); this will save disk space.
- Cache database now has tags for elapsed time of "digest", "original call", and "subsequent recovery from file": `elapsedTimeDigest`, `elapsedTimeFirstRun`, and `elapsedTimeLoad`, respectively.
- Better management of temporary files in package and tests, e.g., during downloading (`preProcess`). Includes 2 new functions, `tempdir2` and `tempfile2`, for use with `reproducible` package
- New option: `reproducible.tempPath`, which is used for the new control of temporary files. Defaults to `file.path(tempdir(), "reproducible")`. This feature was requested to help manage large amounts of temporary objects that were not being easily and automatically cleaned
- Copying or moving of Cache directories now works automatically if using default `drv` and `conn`; user may need to manually call `movedCache` if cache is not responding correctly. File-backed Rasters are automatically updated with new paths.
- Cache now treats file-backed Rasters as though they had a relative path instead of their absolute path. This means that Cache directories can be copied from one location to another and the file-backed `Raster*` will have their filenames updated on the fly during a Cache recovery. User doesn't need to do anything.
- `postProcess` now will perform simple tests and skip `cropInputs` and `projectInputs` with a message if it can, rather than using `Cache` to "skip". This should speed up `postProcess` in many cases.
- messaging with `Cache` has changed. Now, `cacheId` is shown in all cases, making it easier to identify specific items in the cache.
- Automatically cleanup temporary (intermediate) raster files (with #110).
Dependency changes
- none
bug fixes
- `Copy` only creates a temporary directory for file-backed rasters; previously any `Copy` command was creating a temporary directory, regardless of whether it was needed
- `cropInputs.spatialObjects` had a bug when object was a large non-Raster class.
- `cropInputs` may have failed due to "self intersection" error when x was a `SpatialPolygons*` object; now catches error, runs `fixErrors` and retries `crop`. Great reprex by @tati-micheletti. Fixed in commit `89e652ef111af7de91a17a613c66312c1b848847`.
- `Filenames` bugfix related to `RasterBrick`
- `prepInputs` does a better job of keeping all temporary files in a temporary folder; and cleans up after itself better.
- `prepInputs` now will not show message that it is loading object into R if `fun = NULL` (#135).
version 1.0.0
New features
- This version is not backwards-compatible out of the box. To maintain backwards compatibility, set: `options("reproducible.useDBI" = FALSE)`
- A new backend was introduced that uses `DBI` package directly, without `archivist`. This has much improved speed.
- New option: `options("reproducible.cacheSaveFormat")`. This can be either `rds` (default) or `qs`. All cached objects will be saved with this format. Previously it was `rda`.
- Cache objects can now be saved with `qs::qsave`. In many cases, this has much improved speed and file sizes compared to `rds`; however, testing across a wide range of conditions will occur before it becomes the default.
- Changed default behaviour for memoising ... because `Cache` is now much faster, the default is to turn memoising off, via `options("reproducible.useMemoise" = FALSE)`. In cases of large objects, memoising should still be faster, so user can still activate it, setting the option to `TRUE`.
- Much better SQLite database handling for concurrent write attempts. Tested with dozens of write attempts per second by 3 cores with abundant locked database occurrences.
- `postProcess` arg `useGDAL` can now take `"force"`, as the default behaviour is to not use GDAL if the problem can fit into RAM and `sf` or `raster` tools will be faster than `GDAL` tools
- `useCloud` argument in `Cache` and family has slightly modified functionality (see ?Cache new section `useCloud`) and now has more tests including edge cases, such as `useCloud = TRUE, useCache = 'overwrite'`. The cloud version now will also follow the `"overwrite"` command.
Dependency changes
- deprecating `archivist`; moved to Suggests.
- removed imports for `bitops`, `dplyr`, `fasterize`, `flock`, `git2r`, `lubridate`, `RcppArmadillo`, `RCurl` and `tidyselect`. Some of these went to Suggests.
bug fixes
- `postProcess` calls that use GDAL made more robust (including #93).
- Several minor, edge cases were detected and fixed.
version 0.2.11
Dependency changes
- remove `dplyr` as a direct dependency. It is still an indirect dependency through `DiagrammeR`
New features
- new option: `reproducible.showSimilarDepth` allows for a deeper assessment of nested lists for differences between the nearest cached object and the present object. This greater depth may allow more fine tuned understanding of why an object is not correctly caching
- for downloading large files from GoogleDrive (currently only implemented), if user has set `options("reproducible.futurePlan")` to something other than `FALSE`, then it will show download progress if the file is "large".
bug fixes
- Several minor, edge cases were detected and fixed.
version 0.2.10
Dependency changes
- made compatible with `googledrive` v 1.0.0 (#119)
New features
- `pkgDep2`, a new convenience function to get the dependencies of the "first order" dependencies.
- `useCache`, used in many functions (incl. `Cache`, `postProcess`), can now be numeric, a qualitative indicator of "how deep" nested `Cache` calls should set `useCache = TRUE` -- implemented as 1 or 2 in `postProcess` currently. See `?Cache`
bug fixes
- `pkgDep` was becoming unreliable for unknown reasons. It has been reimplemented, much faster, without memoising. The speed gains should be immediately noticeable (6 seconds to 0.1 seconds for `pkgDep("reproducible")`)
- improved `retry` to use exponential backoff when attempting to access online resources (#121)
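Exponential backoff, as mentioned for `retry` above, means doubling the wait after each failed attempt. The sketch below shows the general technique in base R under stated assumptions: `retrySketch`, its arguments, and the `flaky` test function are hypothetical and do not reflect the signature or internals of the package's `retry`.

```r
# Hypothetical illustration of retrying with exponential backoff.
retrySketch <- function(fun, retries = 5, baseDelay = 0.01) {
  for (i in seq_len(retries)) {
    result <- try(fun(), silent = TRUE)
    if (!inherits(result, "try-error")) return(result)
    # Wait baseDelay, then 2x, 4x, ... before the next attempt.
    if (i < retries) Sys.sleep(baseDelay * 2^(i - 1))
  }
  stop("all ", retries, " attempts failed")
}

# A function that fails twice and then succeeds, standing in for a flaky download.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("temporary failure")
  "ok"
}
out <- retrySketch(flaky)
```

Passing a function rather than an expression matters here: a plain argument would be evaluated only once, so it could not be re-run on failure.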
version 0.2.9
New features
- Cache has 2 new arguments, `useCloud` and `cloudFolderID`. This is a new approach to cloud caching. It has been tested with file backed `RasterLayer`, `RasterStack` and `RasterBrick` and all normal R objects. It will not work for any other class of disk-backed files, e.g., `ff` or `bigmatrix`, nor is it likely to work for R6 class objects.
- Slowly deprecating cloudCache and family of functions in favour of a new approach using arguments to `Cache`, i.e., `useCache` and `cloudFolderID`
- `downloadData` from Google Drive now protects against HTTP2 error by capturing error and retrying. This is a curl issue for interrupted connections.
Bug fixes
- fixes for `rcnst` errors on R-devel, tested using `devtools::check(env_vars = list("R_COMPILE_PKGS"=1, "R_JIT_STRATEGY"=4, "R_CHECK_CONSTANTS"=5))`
- other minor improvements, including fixes for #115
version 0.2.8
New features
- new functions for accessing specific items from the `cacheRepo`: `getArtifact`, `getCacheId`, `getUserTags`
- `retry`, a new function, wraps `try` with an explicit attempt to retry the same code upon error. Useful for flaky functions, such as `googledrive::drive_download`, which sometimes fails due to `curl` HTTP2 error.
- removed all `Rcpp` functionality as the functions were no longer faster than their R base alternatives.
Bug fixes
- `prepInputs` was not correctly passing `useCache`
- `cropInputs` was reprojecting extent of y as a time saving approach, but this was incorrect if `studyArea` is a `SpatialPolygon` that is not close to filling the extent. It now reprojects `studyArea` directly which will be slower, but correct. (#93)
- other minor improvements
version 0.2.7
New features
- `CHECKSUMS.txt` should now be ordered consistently across operating systems (note: `base::order` will not succeed in doing this --> now using `.orderDotsUnderscoreFirst`)
- `cloudSyncCache` has a new argument: `cacheIds`. Now user can control entries by `cacheId`, so can delete/upload individual objects by `cacheId`
- Experimental support within the `postProcess` family for `sf` class objects
bug fixes
- mostly minor
- `cloudCache` bugfixes for more cases
version 0.2.6
Dependency changes
- remove `tibble` from Imports as it's no longer being used
New features
- remove `%>%` pipe that was long ago deprecated. User should use `%C%` if they want a pipe that is Cache-aware. See examples.
- Full rewrite of all `options` descriptions now in `reproducible`, see `?reproducibleOptions`
- now `cacheRepo` and `options("reproducible.cachePath")` can take a vector of paths. Similar to how `.libPaths()` works for libraries, `Cache` will search first in the first entry in the `cacheRepo`, then the second etc. until it finds an entry. It will only write to the first entry.
- new value for the option: `options("reproducible.useCache" = "devMode")`. The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In `devMode`, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the `cacheRepo`, but it does find an entry that matches based on `userTags`. In this case, it will delete the old entry in the `cacheRepo` (identified based on matching `userTags`), then continue with normal `Cache`. For this to work correctly, `userTags` must be unique for each function call. This should be used with caution as it is still experimental.
- change to how hashes are calculated. This will cause existing caches to not work correctly. To allow a user to keep old behaviour (during a transition period), the "old" algorithm can be used, with `options("reproducible.useNewDigestAlgorithm" = FALSE)`. There is a message of this change on package load.
- add experimental `cloud*` functions, especially `cloudCache` which allows sharing of Cache among collaborators. Currently only works with `googledrive`
- updated `assessDataType` to consolidate `assessDataTypeGDAL` and `assessDataType` into single function (#71, @ianmseddy)
- `cc`: new function -- a shortcut for some commonly used options for `clearCache()`
- added experimental capacity for `prepInputs` to handle `.rar` archives, on systems with correct binaries to deal with them (#86, @tati-micheletti)
- remove `fastdigest::fastdigest` as it does not return the identical hash across operating systems
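The `.libPaths()`-style lookup described above for a vector of cache paths (search each path in order, write only to the first) can be sketched with plain directories. This is an illustration only: `findEntrySketch` and `writeEntrySketch` are hypothetical names, and real cache entries live in a database-backed repository rather than as loose text files.

```r
# Hypothetical sketch: read from the first path that has the entry,
# but always write to the first path.
findEntrySketch <- function(fileName, cachePaths) {
  for (p in cachePaths) {
    candidate <- file.path(p, fileName)
    if (file.exists(candidate)) return(candidate)
  }
  NULL  # not found in any path
}

writeEntrySketch <- function(fileName, lines, cachePaths) {
  target <- file.path(cachePaths[1], fileName)
  writeLines(lines, target)
  target
}

primary <- tempfile("cachePrimary")
secondary <- tempfile("cacheSecondary")
dir.create(primary)
dir.create(secondary)
paths <- c(primary, secondary)

writeLines("shared result", file.path(secondary, "a.txt"))
found <- findEntrySketch("a.txt", paths)           # only present in the second path
written <- writeEntrySketch("b.txt", "new", paths) # lands in the first path
missing <- findEntrySketch("zzz.txt", paths)       # absent everywhere
```

This ordering lets a shared, read-mostly cache sit behind a personal one without ever being modified.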
Bug fixes
- `prepInputs` on GIS objects that don't use `raster::raster` to load object were skipping `postProcess`. Fixed.
- under some circumstances, `prepInputs` would cause virtually all entries in `CHECKSUMS.txt` to be deleted. 2 cases where this happened were identified and corrected.
- `data.table` class objects would give an error sometimes due to use of `attr(DT)`. Internally, attributes are now added with `data.table::setattr` to deal with this.
- calling `gdalwarp` from `postProcess` now correctly matches extent (#73, @tati-micheletti)
- files from url that have unknown extension are now guessed by `preProcess` (#92, @tati-micheletti)
version 0.2.5
Dependency changes
- Added `remotes` to Imports and removed `devtools`
New features
- New value possible for `options(reproducible.useCache = 'overwrite')`, which allows use of `Cache` in cases where the function call has an entry in the `cacheRepo`, will purge it and add the output of the current call instead.
- New option `reproducible.inputPaths` (default `NULL`) and `reproducible.inputPathsRecursive` (default `FALSE`), which will be used in `prepInputs` as possible directory sources (searched recursively or not) for files being downloaded/extracted/prepared. This allows the using of local copies of files in (an)other location(s) instead of downloading them. If local location does not have the required files, it will proceed to download so there is little cost in setting this option. If files do exist on local system, the function will attempt to use a hardlink before making a copy.
- `dlGoogle()` now sets `options(httr_oob_default = TRUE)` if using Rstudio Server.
- Files in `CHECKSUMS` now sorted alphabetically.
- `Checksums` can now have a `CHECKSUMS.txt` file located in a different place than the `destinationPath`
- Attempt to select raster resampling method based on raster type if no method supplied (#63, @ianmseddy)
- `projectInputs`
  - new function `assessDataTypeGDAL`, used in `postProcess`, to identify smallest `datatype` for large Raster* objects passed to GDAL system call
  - when masking and reprojecting large `Raster` objects, enact `gdalwarp` system call if `raster::canProcessInMemory(x,4) = FALSE` for faster and memory-safe processing
  - better handling of various data types in `Raster` objects, including factor rasters
Bug fixes
- Work around internally inside `extractFromArchive` for large (>2GB) zip files. In the `R` help manual, `unzip` fails for zip files >2GB. This uses a system call if the zip file is too large and fails using `base::unzip`.
- Work around for `raster::getData` issues.
- Speed up of `Cache()` when deeply nested, due to `grep(sys.calls(), ...)` that would take long and hang.
- Bugfix for `preProcess(url = NULL)` (#65, @tati-micheletti)
- Improved memory performance of `clearCache` (#67), especially for large `Raster` objects that are stored as binary `R` files (i.e., `.rda`)
- Other minor bugfixes
Other changes
- Deal with new `raster` package changes in development version of `raster` package
- Added checks for float point number issues in raster resolutions produced by `raster::projectRaster`.
- `robustDigest` now does not include `Cache`-added attributes
- Additional tests for `preProcess()` (#68, @tati-micheletti)
- Many new unit tests written, which caught several minor bugs
version 0.2.3
- fix and skip downloading test on CRAN
version 0.2.2
Dependency changes
- Add `future` to Suggests.
New features
- new option on non-Windows OSs to use `future` for `Cache` saving to SQLite database, via `options("reproducible.futurePlan")`, if the `future` package is installed. This is `FALSE` by default.
- If a `do.call` function is Cached, previously, it would be labelled in the database as `do.call`. Now it attempts to extract the actual function being called by the `do.call`. Messaging is similarly changed.
- new option `reproducible.ask`, logical, indicating whether `clearCache` should ask for deletions when in an interactive session
- `prepInputs`, `preProcess` and `downloadFile` now have `dlFun`, to pass a custom function for downloading (e.g., "raster::getData")
- `prepInputs` will automatically use `readRDS` if the file is a `.rds`.
- `prepInputs` will return a `list` if `fun = "base::load"`, with a message; can still pass an `envir` to obtain standard behaviour of `base::load`.
- `clearCache` - new argument `ask`.
- new function `assessDataType`, used in `postProcess`, to identify smallest `datatype` for Raster* objects, if user does not pass an explicit `datatype` in `prepInputs` or `postProcess` (#39, @CeresBarros).
Bug fixes
- fix problems with tests introduced by recent `git2r` update (@stewid, #36).
- `.prepareRasterBackedFile` -- now will postpend an incremented numeric to a cached copy of a file-backed Raster object, if it already exists. This mirrors the behaviour of the `.rda` file. Previously, if two Cache events returned the same file name backing a Raster object, even if the content was different, it would allow the same file name. If either cached object was deleted, therefore, it would cause the other one to break as its file-backing would be missing.
- options were wrongly pointing to `spades.XXX` and should have been `reproducible.XXX`.
- `copyFile` did not perform correctly under all cases; now better handling of these cases, often sending to `file.copy` (slower, but more reliable)
- `extractFromArchive` needed a new `Checksum` function call under some circumstances
- several other minor bug fixes.
- `extractFromArchive` -- when dealing with nested zips, not all args were passed in recursively (#37, @CeresBarros)
- `prepInputs` -- arguments that were same as `Cache` were not being correctly passed internally to `Cache`, and if wrapped in Cache, it was not passed into prepInputs. Fixed.
- `.prepareFileBackedRaster` was failing in some cases (specifically if it was inside a `do.call`) (#40, @CeresBarros).
- `Cache` was failing under some cases of `Cache(do.call, ...)`. Fixed.
- `Cache` -- when arguments to Cache were the same as the arguments in `FUN`, Cache would "take" them. Now, they are correctly passed to the `FUN`.
- `preProcess` -- writing to checksums may have produced a warning if `CHECKSUMS.txt` was not present. Now it does not.
- numerous other minor bugfixes
Other changes
- most tests now use a standardized approach to attaching libraries, creating objects, paths, enabling easier, error resistant test building
version 0.2.1
New features
- new functions: `convertPaths` and `convertRasterPaths` to assist with renaming moved files.
- `prepInputs` -- new features
  - `alsoExtract` now has more options (`NULL`, `NA`, `"similar"`) and defaults to extracting all files in an archive (`NULL`).
  - skips `postProcess` altogether if no `studyArea` or `rasterToMatch`. Previously, this would invoke Cache even if there was nothing to `postProcess`.
Bug fixes
- `copyFile` correctly handles directory names containing spaces.
- `makeMemoisable` fixed to handle additional edge cases.
- other minor bug fixes.
version 0.2.0
New features
- new functions:
  - `prepInputs` to aid in data downloading and preparation problems, solved in a reproducible, Cache-aware way.
  - `postProcess` which is a wrapper for sequences of several other new functions (`cropInputs`, `fixErrors`, `projectInputs`, `maskInputs`, `writeOutputs`, and `determineFilename`)
  - `downloadFile` can handle Google Drive and ftp/http(s) files
  - `zipCache` and `mergeCache`
  - `compareNA` does comparisons with NA as a possible value e.g., `compareNA(c(1,NA), c(2, NA))` returns `FALSE, TRUE`
- Cache -- new features:
  - new arguments `showSimilar`, `verbose` which can help with debugging
  - new argument `useCache` which allows turning caching on and off at a high level (e.g., options("useCache"))
  - new argument `cacheId` which allows user to hard code a result from a Cache
  - deprecated arguments: `digestPathContent` --> `quick`, `compareRasterFileLength` --> `length`
  - Cache arguments now propagate inward to nested `Cache` function calls, unless explicitly set on the inner functions
  - more precise messages provided upon each use
  - many more `userTags` added automatically to cache entries so much more powerful searching via `showCache(userTags="something")`
- `checksums` now returns a data.table with the same columns whether `write = TRUE` or `write = FALSE`.
- `clearCache` and `showCache` now give messages and require user intervention if request to `clearCache` would be large quantities of data deleted
- `memoise::memoise` now used on 3rd run through an identical `Cache` call, dramatically speeding up in most cases
- new options: `reproducible.cachePath`, `reproducible.quick`, `reproducible.useMemoise`, `reproducible.useCache`, `reproducible.useragent`, `reproducible.verbose`
- `asPath` has a new argument indicating how deep should the path be considered when included in caching (only relevant when `quick = TRUE`)
- New vignette on using Cache
- Cache is `parallel`-safe, meaning there are `tryCatch` around every attempt at writing to SQLite database so it can be used safely on multi-threaded machines
- bug fixes, unit tests, more `imports` for packages e.g., `stats`
- updates for R 3.6.0 compact storage of sequence vectors
- experimental pipes (`%>%`, `%C%`) and assign `%<%`
- several performance enhancements
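The NA-aware comparison described for `compareNA` in this version's list can be illustrated in a few lines of base R. This is a sketch of the behaviour stated above (two `NA`s count as equal), not the package's source; `compareNASketch` is a hypothetical name.

```r
# Hypothetical re-implementation of the idea behind compareNA.
compareNASketch <- function(v1, v2) {
  # Equal values, or both missing, count as a match.
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  # Comparing a value with a single NA yields NA; treat that as a mismatch.
  same[is.na(same)] <- FALSE
  same
}

res <- compareNASketch(c(1, NA), c(2, NA))  # FALSE TRUE
mixed <- compareNASketch(c(1, NA), c(NA, 1)) # FALSE FALSE
```

Plain `==` would return `NA` wherever either side is missing, which is why the extra `is.na` handling is needed.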
version 0.1.4
- `mergeCache`: a new function to merge two different Cache repositories
- `memoise::memoise` is now used on `loadFromLocalRepo`, meaning that the 3rd time `Cache()` is run on the same arguments (and the 2nd time in a session), the returned Cache will be from a RAM object via memoise. To stop this behaviour and use only disk-based Caching, set `options(reproducible.useMemoise = FALSE)`.
- Cache assign -- `%<%` can be used instead of normal assign, equivalent to `lhs <- Cache(rhs)`.
- new option: `reproducible.verbose`, set to FALSE by default, but if set to TRUE may help understand caching behaviour, especially for complex highly nested code.
- all options now described in `?reproducible`.
- All Cache arguments other than FUN and ... will now propagate to internal, nested Cache calls, if they are not specified explicitly in each of the inner Cache calls.
- Cached pipe operator `%C%` -- use to begin a pipe sequence, e.g., `Cache() %C% ...`
- Cache arg `sideEffect` can now be a path
- Cache arg `digestPathContent` default changed from FALSE (was for speed) to TRUE (for content accuracy)
- New function, `searchFull`, which shows the full search path, known alternatively as "scope", or "binding environments". It is where R will search for a function when requested by a user.
- Uses `memoise::memoise` for several functions (`loadFromLocalRepo`, `pkgDep`, `package_dependencies`, `available.packages`) for speed -- this gains speed at the expense of memory.
- New `Require` function
  - attempts to create a lighter weight package reproducibility chain. This function is usable in a reproducible workflow: it includes both installing and loading of packages, it can maintain version numbers, and uses smart caching for speed. In tests, it can evaluate whether 20 packages and their dependencies (~130 packages) are installed and loaded quickly (i.e., if all TRUE, ~0.1 seconds). This is much slower than running `require` on those 20 packages, but `require` does not check for dependencies and deal with them if missing: it just errors. This speed should be fast enough for many purposes.
  - can accept unquoted name, if length 1.
- remove `dplyr` from Imports
- Add `RCurl` to Imports
- change name of `digestRaster` to `.digestRaster`
version 0.1.3
- fix R CMD check errors on Solaris that were not previously resolved
version 0.1.2
- fix R CMD check errors on Solaris
- fix bug in `digestRaster` affecting in-memory rasters
- move `rgdal` to Suggests
version 0.1.1
- cleanup examples and do run them (per CRAN)
- add tests to ensure all exported (non-dot) functions have examples
version 0.1.0
- A new package, which takes all caching utilities out of the `SpaDES` package.