Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13398: [R] Update install.Rmd vignette #11521

Closed
wants to merge 48 commits into from

Conversation

thisisnic
Copy link
Member

No description provided.

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@thisisnic thisisnic marked this pull request as ready for review November 17, 2021 15:55
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments!

r/vignettes/install.Rmd Show resolved Hide resolved
r/vignettes/install.Rmd Outdated Show resolved Hide resolved
@1beb
Copy link

1beb commented Nov 19, 2021

L421 appears to be a duplicate of L417, might be an easy fix if you have it handy still.

@thisisnic
Copy link
Member Author

L421 appears to be a duplicate of L417, might be an easy fix if you have it handy still.

Thanks, good spot, have removed that now.

@thisisnic thisisnic force-pushed the ARROW-13398_install.Rmd branch 2 times, most recently from c0f8fde to 40b35c2 Compare November 22, 2021 20:54
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good a few questions / comments

I think the answer is "we don't want to here" but do we want to discuss CMAKE_UNITY_BUILD we had it on for a while but it's been blowing up memory on compilation which has led to some issues so I've disabled it (though people won't see that
until 7.0.0 release if they're not using a dev build).

r/vignettes/install.Rmd Show resolved Hide resolved
r/vignettes/install.Rmd Show resolved Hide resolved
@1beb
Copy link

1beb commented Nov 23, 2021

One additional comment, it would be nice if there were a recommended section, either at the point of the initial install.packages() or shortly after. Perhaps also consider that the next vignette, you at the very least need ARROW_S3 set to "ON" when you install. I don't recall if there are others required for the vignettes but I think something basic like the following would cover a lot of bases:

Sys.setenv(
  "ARROW_S3" = "ON",
  "ARROW_PARQUET" = "ON",
  "ARROW_DATASET" = "ON",
  "ARROW_WITH_SNAPPY" = "ON"
)

install.packages("arrow")

I say this anecdotally, but I can't imagine many people using arrow via R without at least parquet and dataset.

@jonkeane
Copy link
Member

Thanks for the suggestion @1beb. Our intention is to suggest folks use NOT_CRAN=TRUE which should set all of those to on (along with a few other niceties like downloading binaries). Hopefully that helps get full builds with minimal hassle (only needing to remember one env var) Have you experienced that not working? If so, we would love to hear about it so we can figure out what happened + fix it.

Additionally, the parquet/snappy flags are also now on by default so hopefully regardless of env vars set they should be enabled (well, unless you explicitly override them and set them to no, then they won't...).

@1beb
Copy link

1beb commented Nov 24, 2021

Oh this is great! When I read the article it wasn't immediately apparent that these were on by default and I must have glazed over the part of NOT_CRAN=TRUE. One thing that could be helpful is adding some additional description to each of the ENV var descriptions starting around L415. Answer the question: are these "ON" by default when using install.packages()

@thisisnic thisisnic changed the title ARROW-13398: [R] Update install.Rmd vignette ARROW-13398: [R] Update install.Rmd vignette [WIP] Nov 29, 2021
@thisisnic thisisnic marked this pull request as draft November 29, 2021 14:28
@thisisnic
Copy link
Member Author

thisisnic commented Nov 29, 2021

Just marking this as WIP, folks, and converting it to a draft PR as I'm going to make some fairly substantial changes to this.

Just want to say, thanks again for all your input so far, @1beb, as you've really helped me work out which bits could be a lot more intuitive here.


This documented is intended specifically for arrow _developers_ who wish to know
more about these scripts. If you are an arrow _user_ looking for help with
installing arrow, please see [the installation guide](../install.html)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking better and better on each iteration (which is saying a lot cause it started off very nice!)

A few (mostly small) comments

r/vignettes/developers/install_details.Rmd Show resolved Hide resolved
The actions taken by these scripts to resolve dependencies and install the
correct components are described below.

There are a number of different ways you can install libarrow:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two paragraphs seem a bit disjointed? Should there be a ### libarrow line between them? Or some other kind of introduction to talking about libarrow (specifically)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree - have rearranged content now to hopefully flow better

The diagram below shows how the R package finds a libarrow installation on Windows.

```{r, echo=FALSE, out.width="70%"}
knitr::include_graphics("../img/install_diagram_windows.png")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno why, but the diagrams didn't seem to get copied over when I ran pkgdown. I might have something wrong, but we should check it on the dev site when it merges (or double check it on someone else's machine now?)

Super minor: the windows diagram has a transparent background and the linux one has a white background. In most cases transparent would be better, but since the only other color is black, I think we actually want a white background for these (or maybe whatever the background color of the site is to fake transparency?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be to do with me putting them in a folder; I've now promoted them to the same level as the .Rmd file.

Good catch on the background - will update them all.

Comment on lines 104 to 107
# libarrow dependencies

You'll need to install `libparquet-dev` on Debian and Ubuntu, or `parquet-devel` on CentOS.
This will also automatically install libarrow as a dependency.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree. Maybe we could include the openssl dependencies here too as "general system dependencies"? I think it's totally ok to duplicate in this case

thisisnic and others added 3 commits December 6, 2021 13:12
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
…g, and make it flow better

Merge branch 'ARROW-13398_install.Rmd' of github.com:thisisnic/arrow into ARROW-13398_install.Rmd

# Conflicts:
#	r/vignettes/developers/install_details.Rmd
Comment on lines 336 to 348
When you install libarrow, its dependencies will be automatically downloaded.
The environment variable `ARROW_DEPENDENCY_SOURCE` controls whether the libarrow
installation also downloads or installs all dependencies (when set to `BUNDLED`),
uses only system-installed dependencies (when set to `SYSTEM`) or checks
system-installed dependencies first and only installs dependencies which aren't
already present (when set to `AUTO`).

These dependencies vary by platform; however, if you wish to install these
yourself prior to libarrow installation, we recommend that you take a look at
the [docker file for whichever of our CI builds](https://github.com/apache/arrow/tree/master/ci/docker)
(the ones ending in "cpp" are for building Arrow's C++ libaries aka libarrow)
corresponds most closely to your setup. This will contain the most up-to-date
information about dependencies and minimum versions.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section added in response to #11881 (review)

@thisisnic
Copy link
Member Author

Yeah I agree. Maybe we could include the openssl dependencies here too as "general system dependencies"? I think it's totally ok to duplicate in this case

I deleted the section in the end as this information is no longer true, and have an alternative dependency section now.

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This continues to be awesome — a few comments


When compiling libarrow from source, you have the power to really fine-tune
which features to install. You can set the environment variable
`LIBARROW_MINIMAL` to enable a more full-featured build including S3 support
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have something like LIBARROW_MINIMAL=FALSE here since it's a little bit odd with this one variable that FALSE is the one that someone wants and not TRUE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

```

By default this variable is unset; if set to `TRUE` a trimmed-down version of
arrow is installed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention that this is almost never needed? I doubt people will be confused but it might forestall questions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mentioned in the more expanded section so I've added a half-sentence about disabling features to make it clear what it does.


Building libarrow from source requires more time and resources than installing
a binary. We recommend that you set the environment variable `ARROW_R_DEV` to
`TRUE` for more verbose output during the installation process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor / matter of opinion: Should we add "... if anything goes wrong."? It seems _ a little bit weird_ to recommend something that is not the default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree, updated

* Install the `arrow` package as usual.
* Install the arrow package as usual.

# Installing libarrow dependencies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section should probably be further up. Especially the S3 dependencies are needed if you use a binary or not, so we should flag those up there before going into more detail about building from source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the top

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just sent a commit that reshuffles and renames a few of the sections. I'm trying to balance "having the dependency descriptions not be too buried" and "get a command that should work reasonably well very early on". Feel free to totally revert my change if you don't like splitting the release install into "the easy way" and "the less easy way". This seems the best to me right now, but I think we (I) might be splitting hairs right now and there is no perfect / we should just go ahead and merge it.

@thisisnic
Copy link
Member Author

I just sent a commit that reshuffles and renames a few of the sections. I'm trying to balance "having the dependency descriptions not be too buried" and "get a command that should work reasonably well very early on". Feel free to totally revert my change if you don't like splitting the release install into "the easy way" and "the less easy way". This seems the best to me right now, but I think we (I) might be splitting hairs right now and there is no perfect / we should just go ahead and merge it.

I love dividing it into "the easy way" and "the less easy way" and agree with your point on dependencies, though I'm not 100% sure about not having the different install methods back-to-back in the contents. However I am definitely splitting hairs at this point, and would rather get this merged and have my glass of wine - we can always adjust this content again at a later date depending on user feedback.

@thisisnic thisisnic closed this in d3763a3 Dec 7, 2021
@ursabot
Copy link

ursabot commented Dec 7, 2021

Benchmark runs are scheduled for baseline = b9ac245 and contender = d3763a3. d3763a3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Scheduled] ursa-i9-9960x
[Scheduled] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented Dec 8, 2021

Benchmark runs are scheduled for baseline = b9ac245 and contender = d3763a3. d3763a3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.22% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants