Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13403: [R] Update developing.Rmd vignette #10930

Closed

Conversation

thisisnic
Copy link
Member

No description provided.

@thisisnic thisisnic marked this pull request as draft August 13, 2021 12:21
@github-actions
Copy link

@thisisnic thisisnic marked this pull request as ready for review August 16, 2021 11:09
@thisisnic
Copy link
Member Author

CI failure is due to wider CI issues around a new Debian release, rather than a problem with the docs.

@thisisnic
Copy link
Member Author

The main changes in this PR were:

  • removing some redundant content to bring the vignette length down a bit
  • re-ordering the content, and adding extra headings/subheadings to make it easier to find the relevant sections and signpost what content is where
  • move some of the detail about the internals of the configure script and other installation scripts to install.Rmd
  • renaming headings to make it easier to browse the vignette
  • add in a warning about Windows development#
  • change "we" to "you" where appropriate

@thisisnic thisisnic force-pushed the ARROW-13403_developing_vignette branch from 73ce7f2 to fd45fd7 Compare August 23, 2021 20:00
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of this polishing. I think there are really great improvements here. I have a few comments + suggestions, the biggest one is if we do want to remove the tabbing from the developing vignette.

output:
html_document:
toc: true
toc_depth: 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if there's a general movement away from rmarkdown::html_vignette? I've always used rmarkdown::html_vignette because I thought it was streamlined for inclusion in the package (and pkgdown overrides this stuff anyway for the web docs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I chucked this in so it made it easier to preview the list of contents when I was rendering this as RMarkdown

vignette: >
%\VignetteIndexEntry{Arrow R Developer Guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit picky: is this new line needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove

r/vignettes/developing.Rmd Outdated Show resolved Hide resolved
2. Building the Arrow library — this actually compiles the Arrow library
3. Install the Arrow library — this organizes and moves the compiled Arrow library files into the location specified in the configuration
4. Building the R package — this builds the C++ code in the R package, and installs the R package for you
There are five major steps to the process — the first four are relevant to all Arrow developers, and the last one is specific to developers making changes to the R package.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very minor point, so we might want to ignore it totally / gloss over it here, but technically "the first four are relevant to all Arrow developers" isn't quite true. There are (small) differences in building the cpp code between the languages that use it (for example the flags recommended for python development are slightly different — we could in principle recommend that folks use a union of the flags and then they would be able to do either, though compilation would be longer, so I'm not sure we want to do that). Additionally, other languages, like Rust, don't rely on the cpp code at all AFAIK, so for those arrow developers this isn't super relevant.

All of that said, it might be best to leave it as is, or say something like "are similar to steps that Arrow developers working on other languages use" and leave it be. The details above in this comment are definitely not relevant for someone building Arrow for R development for the first time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for highlighting this, I'll come up with a different way of phrasing this to be more accurate!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just deleted it as it doesn't really add anything.


### Install dependencies {.tabset}
### Step 1 - Install dependencies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to get rid of {.tabset} here? It's a bit funky how it works (and is very much non-standard/hacked on to pkgdown), but it gives us a nice tabbed interface for interacting with the setup steps (e.g. https://arrow.apache.org/docs/r/articles/developing.html#install-dependencies).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my intention was to make it really easy to navigate through the page, using the table of contents that appears at the side, and the heading levels, to signpost what bits applies to what.

I could add it back in though - will see how including it affects the contents list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to keep the tab titles in the table of contents, it's possible we could do that by changing some of the javascript that we manipulate to make this whole thing work. Something funny that we do there is that we load the javascript for sticky tabs, and then alter some of the functions to do the things we need them to do in-place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another think and actually, the tabset stuff is totally fine. When I built the docs and previewed them, I realised it doesn't actually matter to have those bits in the tables of contents. Good to know there are options there though

r/vignettes/developing.Rmd Outdated Show resolved Hide resolved
r/vignettes/developing.Rmd Outdated Show resolved Hide resolved
``` shell
R CMD build .
R CMD check arrow_*.tar.gz --as-cran
If rebuilding the Arrow library doesn't work and you are [installing from a user-level directory](#configure-for-installing-to-a-user-directory) and you already have a previous installation of libarrow in a system directory or you get you may get errors like the following when you install the R package:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If rebuilding the Arrow library doesn't work and you are [installing from a user-level directory](#configure-for-installing-to-a-user-directory) and you already have a previous installation of libarrow in a system directory or you get you may get errors like the following when you install the R package:
If rebuilding the Arrow library doesn't work and you are [installing from a user-level directory](#configure-for-installing-to-a-user-directory) and you already have a previous installation of libarrow in a system directory or you get errors like the following when you install the R package:

I'm not totally sure this is what you intended, but I think it is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, I think I just moved this section from elsewhere rather than rewriting it and had just skim-read it so didn't catch this. I'll rewrite the whole sentence to break it down a little.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have now rephrased this section a little.


## Other installation issues

There are a number of scripts that are triggered when the arrow R package is installed. For package users who are not interacting with the underlying code, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host). However, knowing about these scripts can help package developers troubleshoot if things go wrong in them or things go wrong in an install. See [the installation vignette](./install.html) for more information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could/should we link to the How dependencies are resolved section in the install vignette here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -123,6 +123,39 @@ you'll need to reinstall the package in order to enable S3 support.

# How dependencies are resolved

There are a number of scripts that are triggered when `R CMD INSTALL .` is run.
For Arrow users, these should all just work without configuration and pull in
the most complete pieces (e.g. official binaries that we host).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be good to flag that most of this information is really only intended to help developers and not day-to-day installers of Arrow? I don't mind (and actually kind of like!) that it's in the install vignette as opposed to the developing vignette, but most of the information is still probably targets at (and most relevant to) developers and not standard installers of Arrow.

There's also a reference at the end that says "(If you're looking at this script, and you've gotten this far, it might look
incredibly familiar: it's basically the contents of this guide in script form —
with a few important changes)" which we probably should link over to the developing vignette there.

I don't think we want to take out some of the content just because it's not relevant to installers (since it's useful info and we don't want to have two different largely overlapping sections in both vignettes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will have a look at doing a better job of signposting who this is for and change up that later sentence!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

thisisnic and others added 3 commits August 25, 2021 08:22
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
@thisisnic
Copy link
Member Author

thisisnic commented Aug 25, 2021

When changing the R package C++, I've again run into the issue that something needs editing to make sure that the lint script uses Python 3 (I think, can't remember the specific cause). I'm going to add something to this PR which covers that (or open a ticket to do something to the script) as I won't be the only person to run into this. (Now have submitted a PR for this: #11002)

@thisisnic thisisnic force-pushed the ARROW-13403_developing_vignette branch from 3b34e88 to 95aacf7 Compare August 26, 2021 08:03
@thisisnic thisisnic force-pushed the ARROW-13403_developing_vignette branch from daf3278 to 3cc6df7 Compare August 26, 2021 08:16
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I built it locally with pkgdown and re-reviewed it there. I caught a few typos (none of them introduced in this PR!) that I suggested corrections for.

* "Apache Arrow" or "Arrow" - the Apache Arrow project, including implementations in different languages
* "libarrow" or "the Arrow C++ library" - Arrow's C++ library (N.B. this term might not be used in documentation in other parts of the Arrow project)
* "arrow" or "the Arrow R package" - the R package
* `arrow` - a directory into which the Arrow GitHub repo has been checked out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more polish than anything else, but I wonder if this style section would be better as a footnote? It's really good info, but it feels a bit of a sidestep right at the beginning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - whilst writing this down was useful while I was working on the vignette, now that terminology is consistent, it's a little redundant.

I'm not 100% on board with the info as a footnote as I think it's important to have it up front still, so what I think will satisfy all requirements here is just explain what libarrow is, in parentheses, after the first mention of the term, and then leave everything else to be implicit.

The main point of doing this was the libarrow vs. arrow package distinction, so I think we've ticked those boxes now - though happy to adjust again if you think further changes are needed.

r/vignettes/developing.Rmd Outdated Show resolved Hide resolved
r/vignettes/developing.Rmd Outdated Show resolved Hide resolved
thisisnic and others added 5 commits September 2, 2021 11:42
@jonkeane jonkeane closed this in 5876e3f Sep 6, 2021
zeroshade pushed a commit to zeroshade/arrow that referenced this pull request Sep 12, 2021
Closes apache#10930 from thisisnic/ARROW-13403_developing_vignette

Lead-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Nic <thisisnic@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
Closes apache#10930 from thisisnic/ARROW-13403_developing_vignette

Lead-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Nic <thisisnic@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants