Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize `archivePackages()` and `pruneRepo()` #93

Merged
merged 21 commits into from Jun 10, 2020

Conversation

@pat-s
Copy link
Contributor

@pat-s pat-s commented Jun 5, 2020

Besides vectorizing both functions, I've also added the ability to archive packages for arbitrary R versions for every instance of "type".

By default (version = NULL), existing R versions within the given repopaths are checked and used.

Hence, by default all available "type"s are executed with all existing R version found in their respective directories.

I've already tested the approach on mlr3learners.drat in which we have 21 packages with binaries for different R versions.
So far it works fine.

I added two calls to {stringr} which could probably replaced by base R functions. If that's a blocker I can get rid of them. However, extracting substr with base R is not so nice because substr() does not support regexs.

If you want to try it out, you can do the following locally while having this branch installed (remotes::install_github("pat-s/drat@vectorize-archive")):

git clone git@github.com:mlr3learners/mlr3learners.drat.git
rm -rf bin/
rm -rf src/
git reset --hard 6c4ba1625f337e5dcf733eebd9fd0a7a73f071f8

and then call drat::archivePackages(".").

DESCRIPTION Outdated Show resolved Hide resolved
DESCRIPTION Outdated Show resolved Hide resolved
DESCRIPTION Outdated Show resolved Hide resolved
NAMESPACE Outdated Show resolved Hide resolved
##' This function is still undergoing development and polish and may
##' change in subsequent versions.
##'
##' @title Move older copies of packages to an archive
##' @importFrom stringr str_extract

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 5, 2020
Owner

Please remove / replace.

# respective "type" directory is scraped
if (is.null(version)) {
if (grepl("binary", type)) {
r_versions <- stringr::str_extract(list.dirs(gsub(

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 5, 2020
Owner

I appreciate your offer to rewrite this and not use extra dependencies for a function that is a bit more tangential to the package.

And if and when you rewite please avoid excess line breaks. 100 cols are fine. You can see in the rest of the file that we often use close to 100 cols (and the one at 111 cols could be considered too long ... but it works so no need for a PR or change of working code).

@pat-s pat-s requested a review from eddelbuettel Jun 7, 2020
Copy link
Owner

@eddelbuettel eddelbuettel left a comment

Thanks, looks better now.

"mac.binary.mavericks", "win.binary"
),
pkg = NULL,
version = NULL) {

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 7, 2020
Owner

The suggested change is now simpler and more focussed. One thing some people have learned to not be cavalier about is the change if existing interfaces. pkg and version defaulted to "possibly empty" with a test for is.missing. Having NULL is similar but not the same, and it is different. There may be users out there with an existing script calling this (I am actually not entirely sure if/how anybody uses archivePackages() but I cannot know). The use case of 'called with no argument' is preserved so I guess I can call that good enough.

dirs = list.dirs(gsub(
"/[0-9].[0-9]$", "", contrib.url(repopath, type)),
recursive = FALSE)
r_versions = substring(dirs, regexpr("[0-9].[0-9]$", dirs))
}

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 7, 2020
Owner

Still not my style of writing but then I just like others think how R Core writes code matters most, and i happen to use the same editor they use (or at least a number of them). So whatevs. Thanks for adhering to four space indentation though, that is appreciated. (And this repo is so small that I never bothered with ,editoconfig which may be overkill.)

file.rename(file.path(repodir, old$file), file.path(repodir, "Archive", old$package, old$file))

invisible(NULL)
return(invisible(NULL))

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 7, 2020
Owner

Adding invibsible is good style now that df may be larger.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 7, 2020

@jangorecki Any comments?

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 7, 2020

Also, and that is of course a little out there and not a "formal" requirement: how would you go about testing this? You told use several times already that you use it for mlr3 repo or subrepo so is the best answer "in prod" ?

Copy link
Contributor

@jangorecki jangorecki left a comment

not a functionality that I am after so not really following how it address requirements

for (f in rmfiles) {
fullfile <- file.path(repodir, f)
unlink(fullfile)
stop("Unknown package type. Valid values are 'source', 'mac.binary', 'mac.binary.el-capitan', 'mac.binary.mavericks' and 'win.binary'.", call. = FALSE)

This comment has been minimized.

This comment has been minimized.

@FelixErnst

FelixErnst Jun 8, 2020
Contributor

However, binary is not specific. binary means in this case all binary cases, win.binary, mac.binary and so on.

A logic for this is now implemented including also the broadest term both.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 7, 2020

The PR is also a roxygen run behind. I just added that.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 7, 2020

And just for completeness, the littler script dratInsert.r still works with this code and just up a new Rcpp dev version 1.0.4.12 so the core functionality is still there too...

@pat-s
Copy link
Contributor Author

@pat-s pat-s commented Jun 8, 2020

Also, and that is of course a little out there and not a "formal" requirement: how would you go about testing this? You told use several times already that you use it for mlr3 repo or subrepo so is the best answer "in prod" ?

Yes, and given the complexity in testing this, it is sufficient for me here.

One could of course replicate a small drat structure on the fly during tests and then reset it's state at the end of the tests. However, testing the results automatically is also not that easy.
Looking at the results in practice/production after triggering the function works.
The code snippet I posted should also work in the future since the repo won't go away.
Also there is only few risk that the initial repo structure is in a different state than what it should be (possibly causing the function to fail).


Feel free to make additional changes, including indention/coding style.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 8, 2020

@FelixErnst This PR changes some of the code you added recently and extends it. As you, if I remember correctly, use this "in prod" could you take a look and (if you have the time) a quick test drive of the branch / pr ?

Copy link
Contributor

@FelixErnst FelixErnst left a comment

I think that the comments made in the issue discussion maybe were not considered to the full extent. The redundancies and and short cuts taken are unnecessary and I think the issue, which is a really good and valid issue, can be solved more elegantly.

I suggested certain aspects of, what my ideas would be in the branch on my fork. Have a look at it here: FelixErnst/drat@master...FelixErnst:vectorize-archive-prune

edit: just to summarise what I think is a good solution for solving the issue and not just flapping some for in the code:

  • the first part of pruneRepo is actually solving the problem of getting information in the repo. Therefore this should be split of.
  • if it is split of, this is the only part, which needs to be vectorized, since both pruneRepo and archivePackge are using rows of a data.frame to figure out, what to do where
  • additional changes to contrib.url2 to allow for different types
  • matching the repodirs to type, which can be done by simple character subsetting

These changes are included in the branch on my fork

edit2: the code in this PR works as intended and solves the issue

if (!file.exists(archive)) {
if (!dir.create(archive, recursive = TRUE)) {
stop("Archive directory not found and couldn't be created\n", call. = FALSE)
for (type in type) {

This comment has been minimized.

@FelixErnst

FelixErnst Jun 8, 2020
Contributor

This is not good style and I would argue, that there is an underlying issue.

I would argue the following:

  • for is ok to be used, if there is a connection between each iteration.
  • That is not the case here, since path for different types should be independent.
  • types are overlapping(e.g. binary and win.binary), which makes iteration over them not ideal
}
}

mkArchive <- function(x) {

This comment has been minimized.

@FelixErnst

FelixErnst Jun 8, 2020
Contributor

The same as above. The function should not need to be redefined. I know, that repodir changes, but good style, would the refactoring of the mkArchive function for an additional parameter

@@ -35,93 +35,108 @@
##' be removed.
##' @author Dirk Eddelbuettel
pruneRepo <- function(repopath = getOption("dratRepo", "~/git/drat"),
type = "source",
pkg,
type = c(

This comment has been minimized.

@FelixErnst

FelixErnst Jun 8, 2020
Contributor

There is not check for enforcing correct type values before starting the loop. Therefore incorrect types leave the repo in an "undefined" state, if the second or later type produce a stop condition.

if (is.null(version)) {
if (grepl("binary", type)) {
dirs = list.dirs(gsub(
"/[0-9].[0-9]$", "", contrib.url(repopath, type)),

This comment has been minimized.

@FelixErnst

FelixErnst Jun 8, 2020
Contributor

I am not sure, but this regex does not masks . It works here, but I think /336 would also match. No problem here, but /[0-9]\\.[0-9]$ would be better

@pat-s
Copy link
Contributor Author

@pat-s pat-s commented Jun 8, 2020

Felix,

lot's can be made more sophisticated here, one could always add more assertions or tests.
This is a bare-bone version of extending functionality and making things easier for the user at the same time. The "bare-bone state" of most functions in this package before this PR was not different than with this new addition.

It works for the job that is should do and I also do not see much danger in wrong arguments inputs from users as the usage is pretty much tied to a properly structured drat repo anyways.

I do not have more time budget to refine this PR substantially. Happy to give you push access to the fork if you want to do so.

edit2: the code in this PR works as intended and solves the issue

I'm not sure what this means here - does it invalidate some of your previous comments?

@FelixErnst
Copy link
Contributor

@FelixErnst FelixErnst commented Jun 8, 2020

Hi Patrick @pat-s,

The code in your PR, works as intended. So my comments are just about the logic in the code, which I think need to be improved, so that the code can be modified in incremental steps from this point forward, so that user contributions can be made easily.

In R, in my opinion, and this just me talking, if you use the broad stroke functions for or lapply it has the danger of putting everything in a jacket and thus, hinders small step development.

I would be very happy to push the things I have in mind to your fork, if you like.

Copy link
Contributor

@FelixErnst FelixErnst left a comment

to work on all R versions, version should be set to NA, since NULL is usually interpreted by base R as the current version. Added dedicated versions for working on all R versions to avoid any unexpected behaviour, when upgrading drat

@FelixErnst
Copy link
Contributor

@FelixErnst FelixErnst commented Jun 9, 2020

I added fixed test packages in extdata for macOS 3.6 and 4.0, windows 4.0 and source with version 1.0, 1.1 and 1.2 each. I hope that is ok. It adds 100Kb to the package, but the tests did unearth some small fixes needed.

these test might break, when the R version gets bumped (probably April 2021)
@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 9, 2020

@FelixErnst That is pretty awesome -- but it also broke the tests:

image

Can you please look into restoring the package to passing tests. either by correcting what is currently broken (preferred) or removing broken tests (less ideal)?

Also, can we slow down a little and maybe discuss what may get committed next and get agreement by all (or most) before just going off and doing it? We're almost there here and I really appreciate the input but the repo has gotten a little hyperactive.

@FelixErnst
Copy link
Contributor

@FelixErnst FelixErnst commented Jun 9, 2020

@FelixErnst That is pretty awesome -- but it also broke the tests:

Can you please look into restoring the package to passing tests. either by correcting what is currently broken (preferred) or removing broken tests (less ideal)?

The error occurs because a certain version of the .Rbuildignore file is used on travis-CI. The last line ^.*\.tar\.gz$ is the problem and this was corrected on this branch, but somehow persists for the travis-CI build. I don't know, how I could fix this for travis-CI. (In addition, I don't know, why it was there in the first place. Maybe I overlooked something)

Also, can we slow down a little and maybe discuss what may get committed next and get agreement by all (or most) before just going off and doing it? We're almost there here and I really appreciate the input but the repo has gotten a little hyperactive.

Of course. I had the time and I was on a roll and really like the issues raised by @pat-s and the additional functionality if offers. I am done with the restructuring I had in mind.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 9, 2020

I see. The Travis setup uses the simple run.sh from this repo, notably the version in this file from docs/run.sh. I was worried it may tamper with that expression, but it doesn't. It does however add three items (if missing) on lines 65 to 73. Could it be as simple as a missing newline?

You can always cat a cat .Rbuildignore to .travis.yml to see what it is like...

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 9, 2020

Slight thinko on my part. The existing ^.*\.tar\.gz$ is in there because I tend to build tar.gz with the a simpler littler script that leaves them in the repo directory (to not clutter the parent directory). But one then needs to exclude the previous source tar.gz from the current one being built. So here maybe we want drat_.*\.tar\.gz$ instead as the final line. And ... make sure a new line gets added. I should probably robustify run.sh for that too.

@FelixErnst
Copy link
Contributor

@FelixErnst FelixErnst commented Jun 9, 2020

hmm. the error persists. Can one get a hold of the status of the virtual machine from travis-CI?

I encountered the same problem locally and fixed it, but without investigating what is going one travis-CI, I don't have any idea what might be going on here.

I can of course remove the tests...

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 9, 2020

I think we should remove the tests and table them for a later addition and get on with our lifes.

I think you can come close to mimicking Travis CI by starting in a Ubuntu 18.04 "bionic" container, but it is not perfect. (Which is one of the reasons I went [in some larger repos] to fully Dockerized CI tests where Travis CI simply calls the Docker container I provide -- much better control over what is present and how. Overkill for drat, of course).

drat_.*\.tar\.gz$

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 9, 2020
Owner

Change may now not be needed but whatevs.

Version: 0.1.6
Date: 2020-05-29
Version: 0.1.6.4
Date: 2020-06-09

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 9, 2020
Owner

I told @pat-s for "mucking" with Version and Date as that is what the release manager aka maintainer does. But at least you used the pattern I use which is a minimal delta increment (whereas he forcefully suggested that his changes should bump immediately to 0.2.0 which is .... something we agreed to reverse). This is fine.

Author: Dirk Eddelbuettel with contributions by Carl Boettiger, Neal Fultz,
Sebastian Gibb, Colin Gillespie, Jan Górecki, Matt Jones, Thomas Leeper,
Steven Pav, Jan Schulz, Christoph Stepper and Felix G.M. Ernst.
Steven Pav, Jan Schulz, Christoph Stepper, Felix G.M. Ernst and Patrick
Schratz.

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 9, 2020
Owner

Similar. My project. At the end of the day shouldn't I decide whom I anoint as a contributor? No need to change now, but please consider it next time.

This comment has been minimized.

@FelixErnst

FelixErnst Jun 9, 2020
Contributor

I completely agree. However, I didn't want to highjack Patrick's idea. It was his and he did the hard work of coming up with it.

archivePackages)
pruneRepoForAllRversions,
archivePackages,
archivePackagesForAllRversions)

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 9, 2020
Owner

I really want to be done with the PR but dropping new functions into the API without first discussing it is ... a little on the strong-willed side too. Here I am ok with your changes; the process was a little on the heavy side.

This comment has been minimized.

@FelixErnst

FelixErnst Jun 9, 2020
Contributor

Sorry for that. I didn't want to interfere with the expectation of current users, that more than R version would be targeted. I couldn't think of an alternative.

This comment has been minimized.

@eddelbuettel

eddelbuettel Jun 9, 2020
Owner

I think the design is entirely defensible and possibly even good :) (I don't really use these archiver and pruner functions in my workflow so I am mostly 🤷‍♂️ here) but the process of dropping them unannounced is not one I prefer in my repos. That was all.

This comment has been minimized.

@FelixErnst

FelixErnst Jun 9, 2020
Contributor

Totally agree, However, I didn't drop anything, just slotted the additional functions in there at their appropriate places.

drat.Rproj Show resolved Hide resolved
Copy link
Owner

@eddelbuettel eddelbuettel left a comment

See comments within.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 9, 2020

Lastly, I really like the code changes you brought. They help.

We had bad luck with the test, and had to revert. Stuff happens. Now that we reverted, should these not be deleted?

edd@rob:~/git/drat(vectorize-archive)$ tree inst/extdata/
inst/extdata/
├── 3.6
│   ├── foo_1.0.tgz
│   ├── foo_1.1.tgz
│   └── foo_1.2.tgz
├── 4.0
│   ├── foo_1.0.tgz
│   ├── foo_1.0.zip
│   ├── foo_1.1.tgz
│   ├── foo_1.1.zip
│   ├── foo_1.2.tgz
│   └── foo_1.2.zip
└── src
    ├── foo_1.0.tar.gz
    ├── foo_1.1.tar.gz
    └── foo_1.2.tar.gz

3 directories, 12 files
edd@rob:~/git/drat(vectorize-archive)$ 
@FelixErnst
Copy link
Contributor

@FelixErnst FelixErnst commented Jun 9, 2020

Nope. The CI problem was two fold.

  1. I didn't check my git add carefully enough. a .gitignore statement interefered with .tar.gz files being added (Hence, "the missing files... -_-" commit a32e27e)
  2. The .Rbuildignore removed them from the check dir.

The test works now as intended and the files listed above are used for testing.

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 9, 2020

I will let this setlle for a bit and ponder, but it may now be ready for merge, maybe by tomorrow. Having an actual test is step up in rigour so thanks for putting it together!

@eddelbuettel
Copy link
Owner

@eddelbuettel eddelbuettel commented Jun 10, 2020

So after 65 comments and 21 commits we appear to be ready.

@eddelbuettel eddelbuettel merged commit 5f7e03b into eddelbuettel:master Jun 10, 2020
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@pat-s
Copy link
Contributor Author

@pat-s pat-s commented Jun 14, 2020

I haven't had time to follow up here.
Thanks for everyone pushing this forward.

AFAICS there is now a new archivePackagesForAllRversions(). Why is there an extra function needed to loop over different R versions? This is a quite unusual design decision.

In addition, I tried archivePackagesForAllRversions() on the example repo of the first post in this PR and the macOS binaries for R 4.0 did not get archived. @FelixErnst Could you have a look? Setting up the repo should only take a few seconds (see the snippet in the first post).

@FelixErnst
Copy link
Contributor

@FelixErnst FelixErnst commented Jun 15, 2020

@FelixErnst
Copy link
Contributor

@FelixErnst FelixErnst commented Jun 20, 2020

So I looked into this

  • bin/macosx/mac.binary/contrib/4.0 is not a path valid for R 4.0 packages to be installed from. bin/macosx/contrib/4.0 is the correct path. So I guess your repo is broken at least for the mac binaries.
  • archivePackages used to work on packages of the current R version (current as in R version the user did run drat from). This changed by introducing the version argument in 0.1.6 and by the changes you introduced for vectorization. (You specified version = NULL, which I changed to version = NA to distinguish no version == all versions (NA), current version (NULL) and specific version.) By having version = NA as default, the user can very easily work on all folders of the repo, but there is a danger, that a surgical removal of packages for a specific version is not available or always overwritten by default. For your conveniance and a clear behavior I added the archivePackagesForAllRversions() function. The name is very bulky, but I couldn't think of an alternative.
@pat-s
Copy link
Contributor Author

@pat-s pat-s commented Jun 20, 2020

Thanks for taking the time.

bin/macosx/mac.binary/contrib/4.0 is not a path valid for R 4.0 packages to be installed from. bin/macosx/contrib/4.0 is the correct path. So I guess your repo is broken at least for the mac binaries.

The current repo state is ok but you are right, it had the wrong paths for the revision given in the first post.

Seems to work as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.