New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13398: [R] Update install.Rmd vignette #11521
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments!
L421 appears to be a duplicate of L417, might be an easy fix if you have it handy still. |
Thanks, good spot, have removed that now. |
c0f8fde
to
40b35c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good a few questions / comments
I think the answer is "we don't want to here" but do we want to discuss CMAKE_UNITY_BUILD
we had it on for a while but it's been blowing up memory on compilation which has led to some issues so I've disabled it (though people won't see that
until 7.0.0 release if they're not using a dev build).
One additional comment, it would be nice if there were a recommended section, either at the point of the initial
I say this anecdotally, but I can't imagine many people using arrow via R without at least parquet and dataset. |
Thanks for the suggestion @1beb. Our intention is to suggest folks use Additionally, the parquet/snappy flags are also now on by default so hopefully regardless of env vars set they should be enabled (well, unless you explicitly override them and set them to no, then they won't...). |
Oh this is great! When I read the article it wasn't immediately apparent that these were on by default and I must have glazed over the part of NOT_CRAN=TRUE. One thing that could be helpful is adding some additional description to each of the ENV var descriptions starting around L415. Answer the question: are these "ON" by default when using |
1d17c94
to
331f33b
Compare
Just marking this as WIP, folks, and converting it to a draft PR as I'm going to make some fairly substantial changes to this. Just want to say, thanks again for all your input so far, @1beb, as you've really helped me work out which bits could be a lot more intuitive here. |
Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
…into ARROW-13398_install.Rmd
|
||
This documented is intended specifically for arrow _developers_ who wish to know | ||
more about these scripts. If you are an arrow _user_ looking for help with | ||
installing arrow, please see [the installation guide](../install.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking better and better on each iteration (which is saying a lot cause it started off very nice!)
A few (mostly small) comments
The actions taken by these scripts to resolve dependencies and install the | ||
correct components are described below. | ||
|
||
There are a number of different ways you can install libarrow: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two paragraphs seem a bit disjointed? Should there be a ### libarrow
line between them? Or some other kind of introduction to talking about libarrow (specifically)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree - have rearranged content now to hopefully flow better
The diagram below shows how the R package finds a libarrow installation on Windows. | ||
|
||
```{r, echo=FALSE, out.width="70%"} | ||
knitr::include_graphics("../img/install_diagram_windows.png") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno why, but the diagrams didn't seem to get copied over when I ran pkgdown. I might have something wrong, but we should check it on the dev site when it merges (or double check it on someone else's machine now?)
Super minor: the windows diagram has a transparent background and the linux one has a white background. In most cases transparent would be better, but since the only other color is black, I think we actually want a white background for these (or maybe whatever the background color of the site is to fake transparency?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be to do with me putting them in a folder; I've now promoted them to the same level as the .Rmd file.
Good catch on the background - will update them all.
# libarrow dependencies | ||
|
||
You'll need to install `libparquet-dev` on Debian and Ubuntu, or `parquet-devel` on CentOS. | ||
This will also automatically install libarrow as a dependency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree. Maybe we could include the openssl dependencies here too as "general system dependencies"? I think it's totally ok to duplicate in this case
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
…g, and make it flow better Merge branch 'ARROW-13398_install.Rmd' of github.com:thisisnic/arrow into ARROW-13398_install.Rmd # Conflicts: # r/vignettes/developers/install_details.Rmd
r/vignettes/install.Rmd
Outdated
When you install libarrow, its dependencies will be automatically downloaded. | ||
The environment variable `ARROW_DEPENDENCY_SOURCE` controls whether the libarrow | ||
installation also downloads or installs all dependencies (when set to `BUNDLED`), | ||
uses only system-installed dependencies (when set to `SYSTEM`) or checks | ||
system-installed dependencies first and only installs dependencies which aren't | ||
already present (when set to `AUTO`). | ||
|
||
These dependencies vary by platform; however, if you wish to install these | ||
yourself prior to libarrow installation, we recommend that you take a look at | ||
the [docker file for whichever of our CI builds](https://github.com/apache/arrow/tree/master/ci/docker) | ||
(the ones ending in "cpp" are for building Arrow's C++ libaries aka libarrow) | ||
corresponds most closely to your setup. This will contain the most up-to-date | ||
information about dependencies and minimum versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section added in response to #11881 (review)
I deleted the section in the end as this information is no longer true, and have an alternative dependency section now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This continues to be awesome — a few comments
r/vignettes/install.Rmd
Outdated
|
||
When compiling libarrow from source, you have the power to really fine-tune | ||
which features to install. You can set the environment variable | ||
`LIBARROW_MINIMAL` to enable a more full-featured build including S3 support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have something like LIBARROW_MINIMAL=FALSE
here since it's a little bit odd with this one variable that FALSE
is the one that someone wants and not TRUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
r/vignettes/install.Rmd
Outdated
``` | ||
|
||
By default this variable is unset; if set to `TRUE` a trimmed-down version of | ||
arrow is installed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mention that this is almost never needed? I doubt people will be confused but it might forestall questions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mentioned in the more expanded section so I've added a half-sentence about disabling features to make it clear what it does.
r/vignettes/install.Rmd
Outdated
|
||
Building libarrow from source requires more time and resources than installing | ||
a binary. We recommend that you set the environment variable `ARROW_R_DEV` to | ||
`TRUE` for more verbose output during the installation process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super minor / matter of opinion: Should we add "... if anything goes wrong."? It seems _ a little bit weird_ to recommend something that is not the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% agree, updated
r/vignettes/install.Rmd
Outdated
* Install the `arrow` package as usual. | ||
* Install the arrow package as usual. | ||
|
||
# Installing libarrow dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section should probably be further up. Especially the S3 dependencies are needed if you use a binary or not, so we should flag those up there before going into more detail about building from source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the top
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just sent a commit that reshuffles and renames a few of the sections. I'm trying to balance "having the dependency descriptions not be too buried" and "get a command that should work reasonably well very early on". Feel free to totally revert my change if you don't like splitting the release install into "the easy way" and "the less easy way". This seems the best to me right now, but I think we (I) might be splitting hairs right now and there is no perfect / we should just go ahead and merge it.
I love dividing it into "the easy way" and "the less easy way" and agree with your point on dependencies, though I'm not 100% sure about not having the different install methods back-to-back in the contents. However I am definitely splitting hairs at this point, and would rather get this merged and have my glass of wine - we can always adjust this content again at a later date depending on user feedback. |
Benchmark runs are scheduled for baseline = b9ac245 and contender = d3763a3. d3763a3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Benchmark runs are scheduled for baseline = b9ac245 and contender = d3763a3. d3763a3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
No description provided.