New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems #12849
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work on this PR I commented some minor things but other than that I think it is ready to go. I tested it on my windows machine and it works!
I noticed that there are no tests for create_package_with_all_dependencies
so that might be something we could add in this PR.
As I am not a committer I will ask for a second opinion from someone who can merge this.
r/R/install-arrow.R
Outdated
# Only supports a small subset of bash substitution patterns. May have multiple | ||
# bash variables in `one_string` | ||
# Used as a helper to parse versions.txt | ||
..install_substitute_like_bash <- function(one_string, possible_values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is not necessary to use the dot-prefix for internal functions as it is done nowhere else in the package. Nice job keeping a consistent naming scheme though! 👍
r/R/install-arrow.R
Outdated
# Substitute like the bash shell | ||
# | ||
# @param one_string A length-1 character vector | ||
# @param possible_values A dictionary-ish set of variables that could provide | ||
# values to substitute in. | ||
# @return `one_string`, with values substituted like bash would. | ||
# | ||
# Only supports a small subset of bash substitution patterns. May have multiple | ||
# bash variables in `one_string` | ||
# Used as a helper to parse versions.txt | ||
..install_substitute_like_bash <- function(one_string, possible_values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote some roxygen-like documentation, but didn't actually want to generate the
*.Rd
files, so just started the lines with#
instead of#'
.
You can prevent the creation of an Rd file by using @noRd
.
r/R/install-arrow.R
Outdated
)[[1]] # Subset [[1]] because one_string has length 1 | ||
# `matched_substrings` is a character vector with length equal to the number | ||
# of non-overlapping matches of `version_regex` in `one_string`. `match_list` | ||
# is a list (same length as `matched_substrings`), where each list element is | ||
# a length-3 character vector. The first element of the vector is the value | ||
# from `matched_substrings` (e.g. "${ARROW_ZZZ_VERSION//./_}"). The following | ||
# two values are the captured groups specified in `version_regex` e.g. | ||
# "ARROW_ZZZ_VERSION" and "//./_". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
r/R/install-arrow.R
Outdated
) | ||
|
||
if (any(failed_to_parse)) { | ||
stop( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use rlang::abort
here but that is very heterogenous in the package so 🤷 (and you aimed for base R which is good!)
@karldw thanks for taking a stab at this and apologies for missing your earlier ping on jira (I get lots of jira notifications). From the error message reported on the issue, it looks like the problem is the call to
(macOS has readlink but doesn't support -f; In general I think trying to download_dependencies.sh work on more platforms is the right solution, rather than hacking around it in R. |
Is there a setup agnostic way to run bash scripts on windows from R? The only possibility I see is using WSL or checking for git-bash (which is not on path by default afaik). |
On Windows I think we can take for granted that if a user is building a package they have already installed RTools (or should do so). It looks like pkgbuild has a function |
Yeah it is safe to assume bash these days, especially for scripts that CRAN won't run. |
@wjones127 In the end the R solution works everywhere and does not require new dependencies but adds a lot of code and is brittle to changes in I'll defer to your experience with the package and userbase @wjones127 @nealrichardson |
IMO the bash script should be fixed anyway since it's not robust. I suspect that will solve the issue here too, in which case we don't need all of this R code (though I recognize and appreciate the effort). |
Okay, great! @nealrichardson, just to confirm: do you want to go with the suggestion to add Specific replies: I think Just to repeat @assignUser's earlier comment, the use case I have in my head is that the package is downloaded on one machine, then installed on another. For that reason, I was trying not to make too many assumptions about the build capabilities on the downloading machine. But this offline build is a pretty niche demand, and it's probably okay to ask those users to make sure they have bash available when downloading.
Running @assignUser and @wjones127, thanks for the tips! |
There are three categories of failed test here:
I can add code to skip the test if |
I'll take a look tomorrow on my Windows machine. |
@karldw I checked and it looks like rtools40 does not come with |
Yeah maybe we need to download with curl instead? |
What about a different approach? We don't need to use the download script itself to download the files, but we do need to use versions.txt. We could use a slightly different bash script to generate R syntax that would download the files, then use R to download--no wget or curl or other system things required beyond what R already has, just need bash. To illustrate, I patched the existing download script:
in R we can get:
which you could then source. I'm also ok with just fixing the one |
Thanks, that's clever! Since we're only using I also left the |
If any of the folks here want to try this out: git clone --single-branch --branch=fix-15092 --depth 1 git@github.com:karldw/arrow.git
cd arrow/r
make sync-cpp
R CMD build --no-build-vignettes .
R -e 'source("R/install-arrow.R"); create_package_with_all_dependencies(source_file = "arrow_7.0.0.9000.tar.gz")' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this out, and while the tarball built successfully, it did not install because we needed the thirdparty dir as it was before. With the suggested changes, I was able to install successfully.
Not immediately relevant here but in tools/nixlibs.R, there is code in (Feel free to ignore for now or ticket for later, just commenting because I noticed it when trying to debug why it wasn't installing before.) |
One option might be to modify @@ -44,20 +44,25 @@ download_dependency() {
echo 'download.file("'${url}'", "'${out}'", quiet = TRUE)'
}
+print_tar_name() {
+ local url_var=$1
+ local tar_name=$2
+ echo "'${url_var}','${tar_name}'"
+}
+
main() {
- mkdir -p "${DESTDIR}"
# Load `DEPENDENCIES` variable.
source ${SOURCE_DIR}/cpp/thirdparty/versions.txt
+ echo 'env_var,filename'
for ((i = 0; i < ${#DEPENDENCIES[@]}; i++)); do
local dep_packed=${DEPENDENCIES[$i]}
# Unpack each entry of the form "$home_var $tar_out $dep_url"
IFS=" " read -r dep_url_var dep_tar_name dep_url <<< "${dep_packed}"
- local out=${DESTDIR}/${dep_tar_name}
- download_dependency "${dep_url}" "${out}"
+ print_tar_name "${dep_url_var}" "${dep_tar_name}"
done
}
Output
|
Yeah, something like that is probably the right solution. Would you mind making a JIRA for that? I don't think we need to tackle that here, and I want to make sure this fix gets in the upcoming release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Benchmark runs are scheduled for baseline = 20bc63a and contender = c73870a. c73870a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…ne build As Neal mentioned in #12849 (comment), the current code in nixlibs.R doesn't handle URL variable names components that have multiple words (because of the way it parses variable names from filenames). Until now, we've had a special case for the AWS variables, but `ARROW_GOOGLE_CLOUD_CPP_URL` and `ARROW_NLOHMANN_JSON_URL` also need handling. Instead of adding special cases, we can provide the correct `ARROW_*_URL` values with the new bash script added as part of ARROW-15092 (in PR #12849). Please let me know what you think! Closes #12973 from karldw/fix-16297 Lead-authored-by: karldw <karldw@users.noreply.github.com> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
This PR aims to address ARROW-15092 and ARROW-15608, allowing the function
create_package_with_all_dependencies()
to be run on non-linux systems.To accomplish this, the code parses the versions.txt file in R. This process is a little hairy, because the existing code in versions.txt and download_dependencies.sh depend on shell substitution and array parsing.
In writing this PR, I assumed that only base R functions were available, and the format of versions.txt couldn't be changed to make things easier. I wrote some roxygen-like documentation, but didn't actually want to generate the
*.Rd
files, so just started the lines with#
instead of#'
.I'd be very grateful for feedback here. The first question is whether this approach makes sense at a macro level, and then whether my implementation seems reasonable.