Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.2.1
New features
- harmonized message colours: blue for all Cache functions; cyan for all prepInputs functions; green for questions that require user input. These are therefore user-visible colour changes.
- improved messaging for Cache cases where a file.link is used instead of saving.
- postProcess and family now have filename2 = NULL as the default, so output is not saved to disk. This is a change.
- moved paddedFloatToChar to reproducible from SpaDES.core.
Bug fixes
- RasterStack objects were not correctly saved to disk under some conditions in postProcess; fixed
- several minor fixes
version 1.2.0
New features
- postProcess now uses a simpler single call to gdalwarp, if available, for RasterLayer class to accomplish cropInputs, projectInputs, maskInputs, and writeOutputs all at once. This should be faster, simpler and, perhaps, more stable. It will only be invoked if the RasterLayer is too large to fit into RAM. To force it to be used, the user must set useGDAL = "force" in prepInputs or postProcess, or globally with options("reproducible.useGDAL" = "force")
- postProcess, when using the new gdalwarp, has better persistence of colour table and NA values, as these are kept with better reliability
- concurrent Cache now works as expected (e.g., with parallel processing, it will avoid collisions) with SQLite thanks to suggestion here: https://stackoverflow.com/a/44445010
- updated digesting of Raster class objects to account for more of the metadata (including the colortable). This will change the digest value of all Raster layers, causing re-run of Cache
- removed Require, pkgDep, trimVersionNumber, normPath, checkPath that were moved to Require package. For backwards compatibility, these are imported and reexported
- address permanently or temporarily new changes in GDAL>3 and PROJ>6 in the spatial packages.
- new function file.move used to rename/copy files across disks (a situation where file.rename would fail)
- all DBI type functions now have default cachePath of getOption("reproducible.cachePath")
- Cache(prepInputs, ... on a file-backed Raster* class object now gives the non-Cache repository folder as the filename(returnRaster). Previously, the return object would contain the cache repository as the folder for the file-backed Raster*
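
The rename/copy behaviour described for file.move can be illustrated with a minimal base R sketch. This is not the package's implementation: moveFileSketch is a hypothetical name, and the fallback logic (try file.rename, then copy and delete) is an assumption about one reasonable way to handle moves across disks.

```r
# Hypothetical helper; not the package's file.move implementation.
moveFileSketch <- function(from, to) {
  # file.rename() returns FALSE (with a warning) when it cannot rename,
  # for example when source and destination are on different disks.
  renamed <- suppressWarnings(file.rename(from, to))
  if (!renamed) {
    # Fall back to copy-then-delete.
    copied <- file.copy(from, to, overwrite = TRUE)
    if (!copied) stop("could not move ", from, " to ", to)
    unlink(from)
  }
  invisible(to)
}

src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines("some data", src)
moveFileSketch(src, dst)
```

After the call, the content exists only at dst, whichever of the two code paths was taken.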
Dependency changes
- net reduction of 14 in the number of packages that are imported from. Removed completely: backports, memoise, quickPlot, R.utils, remotes, tools, and versions; moved to Suggests: fastdigest, gdalUtils, googledrive, httr, qs, rgdal, sf, testthat; added: Require. Now there are 12 non-base packages listed in Imports. This is down from 31 prior to Ver 1.0.0.
Bug fixes
- fix over-wide tables in PDF manual (#144)
- use file.link not file.symlink for saveToCache. This would have resulted in C Stack overflow errors due to missing original file in the file.symlink
- use system call to unzip when extracting large (>= 4GB) files (#145, @tati-micheletti)
- several minor including projectInputs when converting to longlat projections, setMinMax for gdalwarp results
- Filenames now consistently returns a character vector (#149)
- improvements to file-backed Raster caching to accommodate a few more edge cases
version 1.1.1
New features
- none
Dependency changes
- none
Bug fixes
- fix CRAN test failure when file.link does not succeed.
version 1.1.0
New features
- begin to accommodate changes in GDAL/PROJ and associated updates to other spatial packages. More updates are expected as other spatial packages (namely raster) are updated.
- can now change options('reproducible.cacheSaveFormat') on the fly; cache will look for the file by cacheId and write it using options('reproducible.cacheSaveFormat'). If it is in another format, Cache will load it and resave it with the new format. Experimental still.
- new Copy methods for refClass objects and SQLite; moved the environment method into ANY, as it would be dispatched for unknown classes that inherit from environment, of which there are many, and this should be intercepted
- Require can now handle minimum version numbers, e.g., Require("bit (>=1.1-15.2)"); this can be worked into downstream tools. Still experimental.
- Cache will do file.link or file.symlink if an existing Cache entry with identical output exists and it is large (currently 1e6 bytes); this will save disk space.
- Cache database now has tags for elapsed time of "digest", "original call", and "subsequent recovery from file": elapsedTimeDigest, elapsedTimeFirstRun, and elapsedTimeLoad, respectively.
- Better management of temporary files in package and tests, e.g., during downloading (preProcess). Includes 2 new functions, tempdir2 and tempfile2, for use with reproducible package
- New option: reproducible.tempPath, which is used for the new control of temporary files. Defaults to file.path(tempdir(), "reproducible"). This feature was requested to help manage large amounts of temporary objects that were not being easily and automatically cleaned
- Copying or moving of Cache directories now works automatically if using default drv and conn; user may need to manually call movedCache if cache is not responding correctly. File-backed Rasters are automatically updated with new paths.
- Cache now treats file-backed Rasters as though they had a relative path instead of their absolute path. This means that Cache directories can be copied from one location to another and the file-backed Raster* will have their filenames updated on the fly during a Cache recovery. User doesn't need to do anything.
- postProcess now will perform simple tests and skip cropInputs and projectInputs with a message if it can, rather than using Cache to "skip". This should speed up postProcess in many cases.
- messaging with Cache has changed. Now, cacheId is shown in all cases, making it easier to identify specific items in the cache.
- Automatically cleanup temporary (intermediate) raster files (with #110).
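
The space-saving idea behind linking large, identical Cache outputs can be sketched in base R. This is only an illustration under stated assumptions: linkOrCopySketch is a hypothetical name, the comparison against 1e6 bytes is assumed to be "at least", and the package's internal logic for detecting identical output is not shown.

```r
# Hypothetical helper illustrating the idea; not the package's internal code.
linkOrCopySketch <- function(existing, new, minSize = 1e6) {
  isLarge <- file.size(existing) >= minSize
  # Try a hard link for large files; file.link() returns FALSE
  # (with a warning) if the link cannot be made.
  linked <- isLarge && suppressWarnings(file.link(existing, new))
  # Small files, or failed links, fall back to an ordinary copy.
  if (!linked) file.copy(existing, new)
  invisible(linked)
}

existing <- tempfile()
writeBin(raw(2e6), existing)  # a file of 2e6 zero bytes
new <- tempfile()
linkOrCopySketch(existing, new)
```

Falling back to file.copy when the link fails keeps the result usable on file systems that do not support hard links.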
Dependency changes
- none
Bug fixes
- Copy only creates a temporary directory for filebacked rasters; previously any Copy command was creating a temporary directory, regardless of whether it was needed
- cropInputs.spatialObjects had a bug when object was a large non-Raster class.
- cropInputs may have failed due to "self intersection" error when x was a SpatialPolygons* object; now catches error, runs fixErrors and retries crop. Great reprex by @tati-micheletti. Fixed in commit 89e652ef111af7de91a17a613c66312c1b848847.
- Filenames bugfix related to RasterBrick
- prepInputs does a better job of keeping all temporary files in a temporary folder; and cleans up after itself better.
- prepInputs now will not show message that it is loading object into R if fun = NULL (#135).
version 1.0.0
New features
- This version is not backwards-compatible out of the box. To maintain backwards compatibility, set: options("reproducible.useDBI" = FALSE)
- A new backend was introduced that uses DBI package directly, without archivist. This has much improved speed.
- New option: options("reproducible.cacheSaveFormat"). This can be either rds (default) or qs. All cached objects will be saved with this format. Previously it was rda.
- Cache objects can now be saved with qs::qsave. In many cases, this has much improved speed and file sizes compared to rds; however, testing across a wide range of conditions will occur before it becomes the default.
- Changed default behaviour for memoising: because Cache is now much faster, the default is to turn memoising off, via options("reproducible.useMemoise" = FALSE). In cases of large objects, memoising should still be faster, so user can still activate it, setting the option to TRUE.
- Much better SQLite database handling for concurrent write attempts. Tested with dozens of write attempts per second by 3 cores with abundant locked database occurrences.
- postProcess arg useGDAL can now take "force", as the default behaviour is to not use GDAL if the problem can fit into RAM and sf or raster tools will be faster than GDAL tools
- useCloud argument in Cache and family has slightly modified functionality (see ?Cache new section useCloud) and now has more tests including edge cases, such as useCloud = TRUE, useCache = 'overwrite'. The cloud version now will also follow the "overwrite" command.
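
As a rough illustration of the options named in this list, the following base R snippet sets two of them for a session and then restores the previous state. The option names come from the notes above; combining them this way is only an example, and their effect depends on the installed version of the package.

```r
# Option names are taken from the notes above; the combination is illustrative only.
old <- options(
  "reproducible.useDBI" = FALSE,     # keep the pre-1.0.0 behaviour
  "reproducible.useMemoise" = TRUE   # turn memoising back on
)

dbiSetting <- getOption("reproducible.useDBI")

# Restore whatever was set before.
options(old)
```

Capturing the return value of options() makes it easy to undo the change at the end of a script or test.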
Dependency changes
- deprecating archivist; moved to Suggests.
- removed imports for bitops, dplyr, fasterize, flock, git2r, lubridate, RcppArmadillo, RCurl and tidyselect. Some of these went to Suggests.
Bug fixes
- postProcess calls that use GDAL made more robust (including #93).
- Several minor edge cases were detected and fixed.
version 0.2.11
Dependency changes
- remove dplyr as a direct dependency. It is still an indirect dependency through DiagrammeR
New features
- new option: reproducible.showSimilarDepth allows for a deeper assessment of nested lists for differences between the nearest cached object and the present object. This greater depth may allow more fine-tuned understanding of why an object is not correctly caching
- for downloading large files from GoogleDrive (currently the only source for which this is implemented), if user has set options("reproducible.futurePlan") to something other than FALSE, then it will show download progress if the file is "large".
Bug fixes
- Several minor edge cases were detected and fixed.
version 0.2.10
Dependency changes
- made compatible with googledrive v 1.0.0 (#119)
New features
- pkgDep2, a new convenience function to get the dependencies of the "first order" dependencies.
- useCache, used in many functions (including Cache and postProcess), can now be numeric, a qualitative indicator of "how deep" nested Cache calls should set useCache = TRUE -- implemented as 1 or 2 in postProcess currently. See ?Cache
bug fixes
- pkgDep was becoming unreliable for unknown reasons. It has been reimplemented, much faster, without memoising. The speed gains should be immediately noticeable (6 seconds to 0.1 seconds for pkgDep("reproducible"))
- improved retry to use exponential backoff when attempting to access online resources (#121)
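
Exponential backoff, as mentioned for retry above, can be sketched generically in base R. The function below is a hypothetical illustration (retrySketch, its arguments and its default delays are assumptions) and does not reproduce the package's retry signature.

```r
# Hypothetical sketch of retrying with exponential backoff; not the package's retry().
retrySketch <- function(fn, retries = 5, baseDelay = 0.1) {
  for (attempt in seq_len(retries)) {
    result <- try(fn(), silent = TRUE)
    if (!inherits(result, "try-error")) return(result)
    # Wait baseDelay, then 2x, 4x, ... before the next attempt.
    if (attempt < retries) Sys.sleep(baseDelay * 2^(attempt - 1))
  }
  stop("all ", retries, " attempts failed")
}

# A stand-in for a flaky download: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "ok"
}
out <- retrySketch(flaky, baseDelay = 0.01)
```

Doubling the delay after each failure gives an interrupted connection progressively more time to recover without waiting long when the first retry succeeds.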
version 0.2.9
New features
- Cache has 2 new arguments, useCloud and cloudFolderID. This is a new approach to cloud caching. It has been tested with file backed RasterLayer, RasterStack and RasterBrick and all normal R objects. It will not work for any other class of disk-backed files, e.g., ff or bigmatrix, nor is it likely to work for R6 class objects.
- Slowly deprecating cloudCache and family of functions in favour of a new approach using arguments to Cache, i.e., useCloud and cloudFolderID
- downloadData from Google Drive now protects against HTTP2 error by capturing error and retrying. This is a curl issue for interrupted connections.
Bug fixes
- fixes for rcnst errors on R-devel, tested using devtools::check(env_vars = list("R_COMPILE_PKGS"=1, "R_JIT_STRATEGY"=4, "R_CHECK_CONSTANTS"=5))
- other minor improvements, including fixes for #115
version 0.2.8
New features
- new functions for accessing specific items from the cacheRepo: getArtifact, getCacheId, getUserTags
- retry, a new function, wraps try with an explicit attempt to retry the same code upon error. Useful for flaky functions, such as googledrive::drive_download which sometimes fails due to curl HTTP2 error.
- removed all Rcpp functionality as the functions were no longer faster than their R base alternatives.
Bug fixes
- prepInputs was not correctly passing useCache
- cropInputs was reprojecting extent of y as a time saving approach, but this was incorrect if studyArea is a SpatialPolygon that is not close to filling the extent. It now reprojects studyArea directly which will be slower, but correct. (#93)
- other minor improvements
version 0.2.7
New features
- CHECKSUMS.txt should now be ordered consistently across operating systems (note: base::order will not succeed in doing this --> now using .orderDotsUnderscoreFirst)
- cloudSyncCache has a new argument: cacheIds. Now user can control entries by cacheId, so can delete/upload individual objects by cacheId
- Experimental support within the postProcess family for sf class objects
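
The ordering problem noted for CHECKSUMS.txt arises because base::order, with its default method, sorts character vectors according to the collation rules of the current locale, which differ between systems. The sketch below shows one generic, locale-independent alternative in base R (radix ordering, which sorts as in the C locale). It is not what the package does: the package uses its internal .orderDotsUnderscoreFirst, whose exact ordering is not reproduced here, and the file names are made up.

```r
# Illustrative file names only.
files <- c("b.tif", "A.tif", "_meta.txt", ".hidden")

# method = "radix" sorts character vectors as in the C locale,
# so the result does not depend on the session's locale.
ordered <- files[order(files, method = "radix")]
ordered
```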
Bug fixes
- mostly minor cloudCache bugfixes for more cases
version 0.2.6
Dependency changes
- remove tibble from Imports as it's no longer being used
New features
- remove %>% pipe that was long ago deprecated. User should use %C% if they want a pipe that is Cache-aware. See examples.
- Full rewrite of all options descriptions now in reproducible, see ?reproducibleOptions
- now cacheRepo and options("reproducible.cachePath") can take a vector of paths. Similar to how .libPaths() works for libraries, Cache will search first in the first entry in the cacheRepo, then the second etc. until it finds an entry. It will only write to the first entry.
- new value for the option: options("reproducible.useCache" = "devMode"). The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In devMode, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the cacheRepo, but it does find an entry that matches based on userTags. In this case, it will delete the old entry in the cacheRepo (identified based on matching userTags), then continue with normal Cache. For this to work correctly, userTags must be unique for each function call. This should be used with caution as it is still experimental.
- change to how hashes are calculated. This will cause existing caches to not work correctly. To allow a user to keep old behaviour (during a transition period), the "old" algorithm can be used, with options("reproducible.useNewDigestAlgorithm" = FALSE). There is a message of this change on package load.
- add experimental cloud* functions, especially cloudCache which allows sharing of Cache among collaborators. Currently only works with googledrive
- updated assessDataType to consolidate assessDataTypeGDAL and assessDataType into single function (#71, @ianmseddy)
- cc: new function -- a shortcut for some commonly used options for clearCache()
- added experimental capacity for prepInputs to handle .rar archives, on systems with correct binaries to deal with them (#86, @tati-micheletti)
- remove fastdigest::fastdigest as it does not return identical hashes across operating systems
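
The search-then-write rule described above for a vector of cache paths (search each path in order, write only to the first) can be sketched with plain files in base R. The helper names and the use of ordinary text files are assumptions for illustration; the real lookup inside Cache works on its own repository entries.

```r
# Hypothetical helpers; not the package's internal lookup.
findInCachePaths <- function(cachePaths, fileName) {
  for (p in cachePaths) {
    candidate <- file.path(p, fileName)
    if (file.exists(candidate)) return(candidate)  # first hit wins
  }
  NULL  # not found in any path
}

writeToCachePaths <- function(cachePaths, fileName, lines) {
  target <- file.path(cachePaths[1], fileName)  # only the first path is written to
  writeLines(lines, target)
  invisible(target)
}

primary <- file.path(tempdir(), "cachePrimary")
shared <- file.path(tempdir(), "cacheShared")
dir.create(primary, showWarnings = FALSE)
dir.create(shared, showWarnings = FALSE)
writeLines("old result", file.path(shared, "entry.txt"))

cachePaths <- c(primary, shared)
found <- findInCachePaths(cachePaths, "entry.txt")  # located in the second path
written <- writeToCachePaths(cachePaths, "new.txt", "new result")
```

This mirrors the .libPaths() analogy in the notes: later paths act as read-only fallbacks, while new entries always land in the first one.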
Bug fixes
- prepInputs on GIS objects that don't use raster::raster to load object were skipping postProcess. Fixed.
- under some circumstances, prepInputs would cause virtually all entries in CHECKSUMS.txt to be deleted. 2 cases where this happened were identified and corrected.
- data.table class objects would give an error sometimes due to use of attr(DT). Internally, attributes are now added with data.table::setattr to deal with this.
- calling gdalwarp from postProcess now correctly matches extent (#73, @tati-micheletti)
- for files from a url that have an unknown extension, the extension is now guessed by preProcess (#92, @tati-micheletti)
version 0.2.5
Dependency changes
- Added remotes to Imports and removed devtools
New features
- New value possible for options(reproducible.useCache = 'overwrite'), which allows use of Cache in cases where the function call has an entry in the cacheRepo; Cache will purge that entry and add the output of the current call instead.
- New option reproducible.inputPaths (default NULL) and reproducible.inputPathsRecursive (default FALSE), which will be used in prepInputs as possible directory sources (searched recursively or not) for files being downloaded/extracted/prepared. This allows the use of local copies of files in (an)other location(s) instead of downloading them. If local location does not have the required files, it will proceed to download so there is little cost in setting this option. If files do exist on local system, the function will attempt to use a hardlink before making a copy.
- dlGoogle() now sets options(httr_oob_default = TRUE) if using Rstudio Server.
- Files in CHECKSUMS now sorted alphabetically.
- Checksums can now have a CHECKSUMS.txt file located in a different place than the destinationPath
- Attempt to select raster resampling method based on raster type if no method supplied (#63, @ianmseddy)
- projectInputs
  - new function assessDataTypeGDAL, used in postProcess, to identify smallest datatype for large Raster* objects passed to GDAL system call
  - when masking and reprojecting large Raster objects, enact gdalwarp system call if raster::canProcessInMemory(x,4) = FALSE for faster and memory-safe processing
  - better handling of various data types in Raster objects, including factor rasters
Bug fixes
- Work around internally inside extractFromArchive for large (>2GB) zip files. According to the R help manual, unzip fails for zip files >2GB. This uses a system call if the zip file is too large and fails using base::unzip.
- Work around for raster::getData issues.
- Speed up of Cache() when deeply nested, due to grep(sys.calls(), ...) that would take long and hang.
- Bugfix for preProcess(url = NULL) (#65, @tati-micheletti)
- Improved memory performance of clearCache (#67), especially for large Raster objects that are stored as binary R files (i.e., .rda)
- Other minor bugfixes
Other changes
- Deal with new raster package changes in development version of raster package
- Added checks for floating point number issues in raster resolutions produced by raster::projectRaster
- .robustDigest now does not include Cache-added attributes
- Additional tests for preProcess() (#68, @tati-micheletti)
- Many new unit tests written, which caught several minor bugs
version 0.2.3
- fix and skip downloading test on CRAN
version 0.2.2
Dependency changes
- Add future to Suggests.
New features
- new option on non-Windows OSs to use future for Cache saving to SQLite database, via options("reproducible.futurePlan"), if the future package is installed. This is FALSE by default.
- If a do.call function is Cached, previously, it would be labelled in the database as do.call. Now it attempts to extract the actual function being called by the do.call. Messaging is similarly changed.
- new option reproducible.ask, logical, indicating whether clearCache should ask for deletions when in an interactive session
- prepInputs, preProcess and downloadFile now have dlFun, to pass a custom function for downloading (e.g., "raster::getData")
- prepInputs will automatically use readRDS if the file is a .rds.
- prepInputs will return a list if fun = "base::load", with a message; can still pass an envir to obtain standard behaviour of base::load.
- clearCache - new argument ask.
- new function assessDataType, used in postProcess, to identify smallest datatype for Raster* objects, if user does not pass an explicit datatype in prepInputs or postProcess (#39, @CeresBarros).
Bug fixes
- fix problems with tests introduced by recent git2r update (@stewid, #36).
- .prepareRasterBackedFile -- now will postpend an incremented numeric to a cached copy of a file-backed Raster object, if it already exists. This mirrors the behaviour of the .rda file. Previously, if two Cache events returned the same file name backing a Raster object, even if the content was different, it would allow the same file name. If either cached object was deleted, therefore, it would cause the other one to break as its file-backing would be missing.
- options were wrongly pointing to spades.XXX and should have been reproducible.XXX.
- copyFile did not perform correctly under all cases; now better handling of these cases, often sending to file.copy (slower, but more reliable)
- extractFromArchive needed a new Checksum function call under some circumstances
- several other minor bug fixes.
- extractFromArchive -- when dealing with nested zips, not all args were passed in recursively (#37, @CeresBarros)
- prepInputs -- arguments that were same as Cache were not being correctly passed internally to Cache, and if wrapped in Cache, it was not passed into prepInputs. Fixed.
- .prepareFileBackedRaster was failing in some cases (specifically if it was inside a do.call) (#40, @CeresBarros).
- Cache was failing under some cases of Cache(do.call, ...). Fixed.
- Cache -- when arguments to Cache were the same as the arguments in FUN, Cache would "take" them. Now, they are correctly passed to the FUN.
- preProcess -- writing to checksums may have produced a warning if CHECKSUMS.txt was not present. Now it does not.
- numerous other minor bugfixes
Other changes
- most tests now use a standardized approach to attaching libraries, creating objects, paths, enabling easier, error resistant test building
version 0.2.1
New features
- new functions: convertPaths and convertRasterPaths to assist with renaming moved files.
- prepInputs -- new features
  - alsoExtract now has more options (NULL, NA, "similar") and defaults to extracting all files in an archive (NULL).
  - skips postProcess altogether if no studyArea or rasterToMatch. Previously, this would invoke Cache even if there was nothing to postProcess.
Bug fixes
- copyFile correctly handles directory names containing spaces.
- makeMemoisable fixed to handle additional edge cases.
- other minor bug fixes.
version 0.2.0
New features
- new functions:
  - prepInputs to aid in data downloading and preparation problems, solved in a reproducible, Cache-aware way.
  - postProcess which is a wrapper for sequences of several other new functions (cropInputs, fixErrors, projectInputs, maskInputs, writeOutputs, and determineFilename)
  - downloadFile can handle Google Drive and ftp/http(s) files
  - zipCache and mergeCache
  - compareNA does comparisons with NA as a possible value, e.g., compareNA(c(1,NA), c(2, NA)) returns FALSE, TRUE
- Cache -- new features:
  - new arguments showSimilar, verbose which can help with debugging
  - new argument useCache which allows turning caching on and off at a high level (e.g., options("useCache"))
  - new argument cacheId which allows user to hard code a result from a Cache
  - deprecated arguments: digestPathContent --> quick, compareRasterFileLength --> length
  - Cache arguments now propagate inward to nested Cache function calls, unless explicitly set on the inner functions
  - more precise messages provided upon each use
  - many more userTags added automatically to cache entries so much more powerful searching via showCache(userTags="something")
- checksums now returns a data.table with the same columns whether write = TRUE or write = FALSE.
- clearCache and showCache now give messages and require user intervention if a request to clearCache would delete large quantities of data
- memoise::memoise now used on 3rd run through an identical Cache call, dramatically speeding up in most cases
- new options: reproducible.cachePath, reproducible.quick, reproducible.useMemoise, reproducible.useCache, reproducible.useragent, reproducible.verbose
- asPath has a new argument indicating how deep the path should be considered when included in caching (only relevant when quick = TRUE)
- New vignette on using Cache
- Cache is parallel-safe, meaning there are tryCatch around every attempt at writing to SQLite database so it can be used safely on multi-threaded machines
- bug fixes, unit tests, more imports for packages e.g., stats
- updates for R 3.6.0 compact storage of sequence vectors
- experimental pipes (%>%, %C%) and assign %<%
- several performance enhancements
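
The compareNA behaviour listed above (NA treated as a comparable value) can be sketched in base R. compareNASketch is a hypothetical stand-in written for illustration, not the package's own code; only the example input and its result (FALSE, TRUE) come from the notes.

```r
# Hypothetical stand-in for illustration; not the package's compareNA().
compareNASketch <- function(a, b) {
  same <- a == b                     # NA wherever either side is NA
  same[is.na(same)] <- FALSE         # NA on one side only counts as a mismatch
  same[is.na(a) & is.na(b)] <- TRUE  # NA on both sides counts as equal
  same
}

compareNASketch(c(1, NA), c(2, NA))
```

Unlike a plain ==, the result never contains NA, so it can be used directly in conditions or passed to all().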
version 0.1.4
- mergeCache: a new function to merge two different Cache repositories
- memoise::memoise is now used on loadFromLocalRepo, meaning that the 3rd time Cache() is run on the same arguments (and the 2nd time in a session), the returned Cache will be from a RAM object via memoise. To stop this behaviour and use only disk-based Caching, set options(reproducible.useMemoise = FALSE).
- Cache assign -- %<% can be used instead of normal assign, equivalent to lhs <- Cache(rhs).
- new option: reproducible.verbose, set to FALSE by default, but if set to TRUE may help understand caching behaviour, especially for complex highly nested code.
- all options now described in ?reproducible.
- All Cache arguments other than FUN and ... will now propagate to internal, nested Cache calls, if they are not specified explicitly in each of the inner Cache calls.
- Cached pipe operator %C% -- use to begin a pipe sequence, e.g., Cache() %C% ...
- Cache arg sideEffect can now be a path
- Cache arg digestPathContent default changed from FALSE (was for speed) to TRUE (for content accuracy)
- New function, searchFull, which shows the full search path, known alternatively as "scope", or "binding environments". It is where R will search for a function when requested by a user.
- Uses memoise::memoise for several functions (loadFromLocalRepo, pkgDep, package_dependencies, available.packages) for speed -- this uses more memory in exchange for speed.
- New Require function
  - attempts to create a lighter weight package reproducibility chain. This function is usable in a reproducible workflow: it includes both installing and loading of packages, it can maintain version numbers, and uses smart caching for speed. In tests, it can evaluate whether 20 packages and their dependencies (~130 packages) are installed and loaded quickly (i.e., if all TRUE, ~0.1 seconds). This is much slower than running require on those 20 packages, but require does not check for dependencies and deal with them if missing: it just errors. This speed should be fast enough for many purposes.
  - can accept an unquoted name, if length 1.
- remove dplyr from Imports
- Add RCurl to Imports
- change name of digestRaster to .digestRaster
version 0.1.3
- fix R CMD check errors on Solaris that were not previously resolved
version 0.1.2
- fix R CMD check errors on Solaris
- fix bug in digestRaster affecting in-memory rasters
- move rgdal to Suggests
version 0.1.1
- cleanup examples and do run them (per CRAN)
- add tests to ensure all exported (non-dot) functions have examples
version 0.1.0
- A new package, which takes all caching utilities out of the SpaDES package.