Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Package Arrow 11.0.0 for R/CRAN #33702

Closed
42 tasks done
paleolimbot opened this issue Jan 16, 2023 · 39 comments
Closed
42 tasks done

[R] Package Arrow 11.0.0 for R/CRAN #33702

paleolimbot opened this issue Jan 16, 2023 · 39 comments

Comments

@paleolimbot
Copy link
Member

paleolimbot commented Jan 16, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Packaging checklist for CRAN release

For a high-level overview of the release process see the
Apache Arrow Release Management Guide.

Before the release candidate is cut:

  • Create a GitHub issue
    entitled [R] CRAN packaging checklist for version X.X.X
    and copy this checklist to the issue.
  • Evaluate the status of any failing
    nightly tests and nightly packaging builds. These checks
    replicate most of the checks that CRAN runs, so we need them all to be passing
    or to understand that the failures may (though won't necessarily) result in a rejection from CRAN.
  • Check current CRAN check results
  • Ensure the contents of the README are accurate and up to date
  • Run urlchecker::url_check() on the R directory at the release candidate
    commit. Ignore any errors with badges as they will be removed in the CRAN release branch.
  • Polish NEWS but do not update version numbers (this is done automatically later).
  • For major releases, prepare tweet thread highlighting new features

Wait for the release candidate to be cut:

Make pull requests into the autobrew and
rtools-packages repositories
used by the configure script on MacOS and Windows. These pull requests will
use the release candidate as the source.

Prepare and check the .tar.gz that will be released to CRAN.

  • git fetch upstream && git checkout release-X.X.X-rcXX && git clean -f -d
  • Run make build. This copies Arrow C++ into tools/cpp, prunes some
    unnecessary components, and runs R CMD build to generate the source tarball.
    Because this will install the package, you will need to ensure that the version
    of Arrow C++ available to the configure script is the same as the version
    that is vendored into the R package (e.g., you may need to unset ARROW_HOME).
  • devtools::check_built("arrow_X.X.X.tar.gz") locally
  • Run reverse dependency checks. Currently this is a
    manual process;
    however, in the future it may be a crossbow nightly job.

Wait for the official release...

  • Release vote passed!
  • If the release candidate commit updated, rebase the CRAN release branch
    on that commit.
  • Pick any commits that were made to master since the release commit that
    were needed to fix CRAN-related submission issues identified in the above
    steps.
  • Remove badges from README.md
  • Run urlchecker::url_check() on the R directory
  • Create a PR entitled WIP: [R] Verify CRAN release-10.0.1-rc0. Add
    a comment @github-actions crossbow submit --group r to run all R crossbow
    jobs against the CRAN-specific release branch.
  • Regenerate arrow_X.X.X.tar.gz (i.e., make build)

Create new autobrew and r-windows PRs such that they use the release
instead of the release candidate; ensure linux binary packages are available:

  • PR into autobrew/homebrew-core (apache-arrow autobrew formula)
  • PR into autobrew/homebrew-core (apache-arrow-static autobrew formula)
  • PR into autobrew/scripts
  • PR into r-windows/rtools-packages
  • Ensure linux binaries are available in the artifactory:
    https://apache.jfrog.io/ui/repos/tree/General/arrow/r

Check binary Arrow C++ distributions specific to the R package:

  • Upload the .tar.gz to win-builder (r-devel only)
    and confirm (with Nic, who will automatically receive an email about the results) that the check is clean.
    This step cannot be completed before Jeroen has put the binaries in the MinGW repository, i.e. here, here, and here.
  • Upload the .tar.gz to MacBuilder
    and confirm that the check is clean
  • Check install.packages("arrow_X.X.X.tar.gz") on Ubuntu and ensure that the
    hosted binaries are used
  • devtools::check_built("arrow_X.X.X.tar.gz") locally one more time (for luck)

Submit!

  • Upload arrow_X.X.X.tar.gz to the
    CRAN submit page
  • Confirm the submission email

Wait for CRAN...

  • Accepted!
  • Tag the tip of the CRAN-specific release branch
  • Add a new line to the matrix in the backwards compatability job
  • (patch releases only) Update the package version in ci/scripts/PKGBUILD, dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb, r/DESCRIPTION, and r/NEWS.md
  • Update the packaging checklist template to reflect any new realities of the
    packaging process.
  • Wait for CRAN-hosted binaries on the
    CRAN package page to reflect the
    new version
  • Tweet!

Component(s)

R

@paleolimbot
Copy link
Member Author

@thisisnic @nealrichardson

@paleolimbot
Copy link
Member Author

Two nightly builds are failing: test-r-linux-valgrind (#33094) and test-r-rhub-debian-gcc-devel-lto-latest (#33701). The valgrind failures won't be solved for this release; however, I've run them locally with archery to make sure they are the same as they were before the last release (and they appear to be). I haven't been able to replicate those errors outside of that docker image and we haven't had any valgrind issues reported by CRAN. I'm working to see if the LTO error is transient or cache-related.

@paleolimbot
Copy link
Member Author

The current CRAN check status indicates a few issues. I checked to make sure the package would compile under gcc-13 (unreleased, requires building gcc from source) and I wasn't able to get any build errors (#33635). The warning for suppressed diagnostics may be a problem; however, we can't solve it without actively attempting to circumvent a CRAN policy which would be bad (I think): #33637.

@paleolimbot
Copy link
Member Author

README/urlchecker fix is (#33705 / #33706)

@paleolimbot
Copy link
Member Author

The link-time optimization nightly failure is probably related to our CI config...I'll continue to investigate and if there's a fix that's needed for CRAN we can pick it into the packaging branch.

@kou kou changed the title Package Arrow 11.0.0 for R/CRAN [R] Package Arrow 11.0.0 for R/CRAN Jan 17, 2023
@nealrichardson
Copy link
Member

gcc13 no longer shows up on https://cran.r-project.org/web/checks/check_results_arrow.html so we're good there. Just have to deal with clang 16 now :/

@paleolimbot
Copy link
Member Author

Is there an Arrow issue for clang16 yet? I didn't get to it today but I'm sure I can get a docker image together on Monday with clang16 and R.

@nealrichardson
Copy link
Member

Is there an Arrow issue for clang16 yet? I didn't get to it today but I'm sure I can get a docker image together on Monday with clang16 and R.

I haven't made one yet, just raised it on Zulip and the ML. I think @jonkeane tried to reproduce yesterday so he may have a dockerfile started.

@paleolimbot
Copy link
Member Author

I made #33819 to track. LLVM provides a Dockerfile for a source build on Debian 10 so we should be able to reproduce.

@nealrichardson
Copy link
Member

Re: maintainer change, CRAN policy says "Explain any change in the maintainer’s email address and if possible send confirmation from the previous address (by a separate email to CRAN-submissions@R-project.org) or explain why it is not possible."

I will do this at the same time as the submission.

@paleolimbot
Copy link
Member Author

paleolimbot commented Jan 26, 2023

Potential list of things we need to pick into the CRAN-specific release branch:

Neal mentioned removing the badges from the readme (but that's already on the checklist 🙂 )

@thisisnic
Copy link
Member

@paleolimbot As discussed separately, mind just clarifying what we need to do to proceed here?

@paleolimbot
Copy link
Member Author

The "just update boost" fix turned out to be more complicated than I expected and is a poor candidate for this point in the release cycle. The macOS install timeout thing is only an issue if when you run crossbow submit --group r you see that job fail (and re-running doesn't fix); the Boost fix is only an issue if our CRAN submission gets rejected.

That leaves the LTO fix. I think the best way to proceed would be to pick the LTO fix, remove the badges, and carry on until we have a reason to pick anything else.

@thisisnic
Copy link
Member

@paleolimbot Would you mind running the revdepchecks? I remember last time I spent ages and didn't even get it done, due to some weird quirk in the difference between our respective setups, and we did discuss moving it to a Crossbow job at some point but haven't yet.

@thisisnic
Copy link
Member

Builth the package locally and I got the output below, so nothing unexpected there :

── R CMD check results ──────────────────────────────────────────────────── arrow 11.0.0 ────
Duration: 7m 20.2s

❯ checking top-level files ... WARNING
  A complete check needs the 'checkbashisms' script.
  See section ‘Configure and cleanup’ in the ‘Writing R Extensions’
  manual.

❯ checking compilation flags used ... WARNING
  Compilation used the following non-portable flag(s):
    ‘-Wno-noexcept-type’ ‘-msse4.2’
  including flag(s) suppressing warnings

❯ checking installed package size ... NOTE
    installed size is 84.3Mb
    sub-directories of 1Mb or more:
      R      4.2Mb
      libs  79.5Mb

@paleolimbot
Copy link
Member Author

The first one you can probably solve by doing apt-get install checkbashisms (or brew install checkbashisms, or maybe something else if you're not on ubuntu or MacOS).

The second one is showing up because you have ARROW_R_DEV=true...you can maybe unset it to get a more realistic check of what CRAN will see.

The third one always shows up for us...we just have a really big shared library.

@thisisnic
Copy link
Member

Yep, re-ran after doing exactly that and the warnings disappeared.

@nealrichardson
Copy link
Member

The "just update boost" fix turned out to be more complicated than I expected

How so? I don't see problems mentioned on #33890

Up to y'all (ultimately Nic, since they'll be the one receiving the CRAN emails) if you want to submit without the boost upgrade. std::unary_function, what BDR suggested was the problem for some, only appears in one place I can find in the boost we use, and it is conditional on a check for its existence: https://github.com/boostorg/container_hash/blob/171c012d4723c5e93cc7cffe42919afdf8b27dfa/include/boost/container_hash/hash.hpp#L131

It's reasonable to submit without the upgrade, and if it still fails, respond that we ran on latest clang16 before submission and it passed so we believed the issue to be resolved. Downside is that I doubt we would get any extra time to fix, so I think it will still need to be resolved by Feb 16.

@paleolimbot
Copy link
Member Author

See #33890 (comment) ...something is missing in the bundle I created via the trim-boost.sh script. If you have a workflow for debugging what it might be I'd love to know before I have to invent one!

@thisisnic
Copy link
Member

That fix needs completing and merging at some point anyway, and if it causes us to fail CRAN checks we'd need to submit it again with the change made, so it seems like the only real consequence of waiting for it to be done ahead of the CRAN release (given we're not 100% sure it'll cause a failure) is delaying the initial submission and adding more to @paleolimbot's to-do list in the short-term? If that's the case, I'm happy to submit to CRAN without it, if only so that we know sooner whether it matters or not. If any of my assumptions above don't hold though, let's reconsider.

@thisisnic
Copy link
Member

On a different note, we still have the issue of the URL check failing as there is a URL in our docs which currently exists on the dev docs but not the live docs yet. Is there anything to do about this other than wait for the arrow-site update before submitting to CRAN?

@nealrichardson
Copy link
Member

Which link? Is it something that should be a relative URL? Otherwise sure, we can wait for the 11.0.0 docs to be published.

@nealrichardson
Copy link
Member

FTR the docs update PR is apache/arrow-site#308

@thisisnic
Copy link
Member

thisisnic commented Jan 30, 2023

Thanks for linking to that, I didn't realise that that had just been done.

Broken link is one of the new vignettes after the big docs refactor.

> urlchecker::url_check()
✖ Error: README.md:83:101 404: Not Found
need to install nightly builds, but if you do please see the article on [installing nightly builds](https://arrow.apache.org/docs/r/articles/install_nightly.html) for more information.

@thisisnic
Copy link
Member

Oh, I see, a relative URL would fix that.

@thisisnic
Copy link
Member

Actually...would it? It's in the README

@assignUser
Copy link
Member

I think that is only 404'ing because the website is not updated for 11.0.0 yet: https://arrow.apache.org/docs/dev/r/articles/install_nightly.html

@nealrichardson
Copy link
Member

pkgdown would handle the relative URL, but it would be a broken link in the GitHub web version, so yeah, let's merge the arrow-site PR first.

@nealrichardson
Copy link
Member

Looks like the docs PR has merged.

@thisisnic
Copy link
Member

Great; I'm just waiting for the crossbow builds to run now + Dewey is doing revdepchecks, so after that it'll just be the Linux binary to upload + winbuilder/macbuilder checks and then off to CRAN.

@nealrichardson
Copy link
Member

👍 sounds good, I'll email about the maintainer change now.

@thisisnic
Copy link
Member

The macOS install timeout thing is only an issue if when you run crossbow submit --group r you see that job fail (and re-running doesn't fix)

It did fail; re-running now 🤞

@paleolimbot
Copy link
Member Author

Reverse dependencies are clear, both via archery and via my hacked-together local thing. Output from the hacked together local thing below...we have 4 more revdeps since the last version!

Changes

Check Summary

CRAN

# A tibble: 32 x 5
   label pkg               notes     warnings  errors   
   <chr> <chr>             <list>    <list>    <list>   
 1 CRAN  analogsea         <chr [0]> <chr [0]> <chr [0]>
 2 CRAN  arkdb             <chr [0]> <chr [0]> <chr [0]>
 3 CRAN  CDMConnector      <chr [0]> <chr [0]> <chr [0]>
 4 CRAN  ClickHouseHTTP    <chr [0]> <chr [0]> <chr [0]>
 5 CRAN  dataversionr      <chr [0]> <chr [0]> <chr [0]>
 6 CRAN  diffdfs           <chr [0]> <chr [0]> <chr [0]>
 7 CRAN  disk.frame        <chr [0]> <chr [0]> <chr [1]>
 8 CRAN  duckdb            <chr [1]> <chr [0]> <chr [0]>
 9 CRAN  fastverse         <chr [1]> <chr [0]> <chr [0]>
10 CRAN  gbifdb            <chr [0]> <chr [0]> <chr [0]>
11 CRAN  MolgenisArmadillo <chr [0]> <chr [0]> <chr [0]>
12 CRAN  mrgsim.parallel   <chr [0]> <chr [0]> <chr [0]>
13 CRAN  nflreadr          <chr [0]> <chr [0]> <chr [0]>
14 CRAN  noctua            <chr [0]> <chr [0]> <chr [0]>
15 CRAN  opencpu           <NULL>    <NULL>    <NULL>   
16 CRAN  parqr             <chr [0]> <chr [0]> <chr [0]>
17 CRAN  parquetize        <chr [1]> <chr [0]> <chr [0]>
18 CRAN  pins              <chr [0]> <chr [0]> <chr [0]>
19 CRAN  plumber           <chr [0]> <chr [0]> <chr [0]>
20 CRAN  pointblank        <chr [1]> <chr [0]> <chr [0]>
21 CRAN  RAthena           <chr [0]> <chr [0]> <chr [0]>
22 CRAN  raveio            <chr [0]> <chr [0]> <chr [0]>
23 CRAN  receptiviti       <chr [0]> <chr [0]> <chr [1]>
24 CRAN  rio               <chr [0]> <chr [0]> <chr [0]>
25 CRAN  sfarrow           <chr [0]> <chr [0]> <chr [0]>
26 CRAN  sparklyr          <chr [1]> <chr [0]> <chr [0]>
27 CRAN  starvz            <chr [0]> <chr [0]> <chr [0]>
28 CRAN  strand            <chr [1]> <chr [0]> <chr [0]>
29 CRAN  targets           <chr [0]> <chr [0]> <chr [0]>
30 CRAN  tidyquery         <chr [0]> <chr [0]> <chr [0]>
31 CRAN  tradestatistics   <chr [0]> <chr [0]> <chr [0]>
32 CRAN  vetiver           <chr [0]> <chr [0]> <chr [0]>

Local

# A tibble: 32 x 5
   label pkg               notes     warnings  errors   
   <chr> <chr>             <list>    <list>    <list>   
 1 local analogsea         <chr [0]> <chr [0]> <chr [0]>
 2 local arkdb             <chr [0]> <chr [0]> <chr [0]>
 3 local CDMConnector      <chr [0]> <chr [0]> <chr [0]>
 4 local ClickHouseHTTP    <chr [0]> <chr [0]> <chr [0]>
 5 local dataversionr      <chr [0]> <chr [0]> <chr [0]>
 6 local diffdfs           <chr [0]> <chr [0]> <chr [0]>
 7 local disk.frame        <chr [0]> <chr [0]> <chr [1]>
 8 local duckdb            <chr [1]> <chr [0]> <chr [0]>
 9 local fastverse         <chr [1]> <chr [0]> <chr [0]>
10 local gbifdb            <chr [0]> <chr [0]> <chr [0]>
11 local MolgenisArmadillo <chr [0]> <chr [0]> <chr [0]>
12 local mrgsim.parallel   <chr [0]> <chr [0]> <chr [0]>
13 local nflreadr          <chr [0]> <chr [0]> <chr [0]>
14 local noctua            <chr [0]> <chr [0]> <chr [0]>
15 local opencpu           <NULL>    <NULL>    <NULL>   
16 local parqr             <chr [0]> <chr [0]> <chr [0]>
17 local parquetize        <chr [1]> <chr [0]> <chr [0]>
18 local pins              <chr [0]> <chr [0]> <chr [0]>
19 local plumber           <chr [0]> <chr [0]> <chr [0]>
20 local pointblank        <chr [1]> <chr [0]> <chr [0]>
21 local RAthena           <chr [0]> <chr [0]> <chr [0]>
22 local raveio            <chr [0]> <chr [0]> <chr [0]>
23 local receptiviti       <chr [0]> <chr [0]> <chr [1]>
24 local rio               <chr [0]> <chr [0]> <chr [0]>
25 local sfarrow           <chr [0]> <chr [0]> <chr [0]>
26 local sparklyr          <chr [1]> <chr [0]> <chr [0]>
27 local starvz            <chr [0]> <chr [0]> <chr [0]>
28 local strand            <chr [1]> <chr [0]> <chr [0]>
29 local targets           <chr [0]> <chr [0]> <chr [0]>
30 local tidyquery         <chr [0]> <chr [0]> <chr [0]>
31 local tradestatistics   <chr [0]> <chr [0]> <chr [0]>
32 local vetiver           <chr [0]> <chr [0]> <chr [0]>

All notes and errors

disk.frame

-- R CMD check results ----------------------------------- disk.frame 0.7.2 ----
Duration: 51.9s

> checking examples ... ERROR
  Running examples in ‘disk.frame-Ex.R’ failed
  The error most likely occurred in:
  
  > ### Name: anti_join.disk.frame
  > ### Title: Performs join/merge for disk.frames
  > ### Aliases: anti_join.disk.frame full_join.disk.frame
  > ###   inner_join.disk.frame left_join.disk.frame semi_join.disk.frame
  > 
  > ### ** Examples
  > 
  > df.df = as.disk.frame(data.frame(x = 1:3, y = 4:6), overwrite = TRUE)
  > df2.df = as.disk.frame(data.frame(x = 1:2, z = 10:11), overwrite = TRUE)
  > 
  > anti_joined.df = anti_join(df.df, df2.df) 
  Warning in anti_join.disk.frame(df.df, df2.df) :
    merge_by_chunk_id = FALSE. This will take significantly longer and the preparations needed are performed eagerly which may lead to poor performance. Consider making y a data.frame or set merge_by_chunk_id = TRUE for better performance.
  Appending disk.frames: 
  Appending disk.frames: 
  Error in anti_join(.x, .y, by = by, copy = copy, ..., overwrite = overwrite) : 
    `...` must be empty.
  ✖ Problematic arguments:
  • ..1 = xch
  • ..2 = ych
  • overwrite = overwrite
  ℹ Did you forget to name an argument?
  Calls: anti_join ... resolve.list -> signalConditionsASAP -> signalConditions
  Execution halted

1 error x | 0 warnings v | 0 notes v


receptiviti

-- R CMD check results ---------------------------------- receptiviti 0.1.3 ----
Duration: 24.4s

> checking tests ...
  See below...

-- Test failures ------------------------------------------------- testthat ----

> library(testthat)
> library(receptiviti)
> 
> test_check("receptiviti")
[ FAIL 1 | WARN 0 | SKIP 2 | PASS 12 ]

══ Skipped tests ═══════════════════════════════════════════════════════════════
• no API key (2)

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-receptiviti_status.R:4'): failures works ───────────────────────
Error in `curl_fetch_memory(url, handler)`: Could not resolve host: example.com
Backtrace:
    ▆
 1. ├─testthat::expect_identical(...) at test-receptiviti_status.R:4:2
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. ├─utils::capture.output(...)
 5. │ └─base::withVisible(...elt(i))
 6. └─receptiviti::receptiviti_status("example.com", key = 123, secret = 123)
 7.   └─curl::curl_fetch_memory(url, handler)

[ FAIL 1 | WARN 0 | SKIP 2 | PASS 12 ]
Error: Test failures
Execution halted

1 error x | 0 warnings v | 0 notes v


duckdb

-- R CMD check results --------------------------------------- duckdb 0.6.2 ----
Duration: 2m 37.1s

> checking installed package size ... NOTE
    installed size is 32.5Mb
    sub-directories of 1Mb or more:
      libs  31.9Mb

0 errors v | 0 warnings v | 1 note x


fastverse

-- R CMD check results ------------------------------------ fastverse 0.3.1 ----
Duration: 31.1s

> checking dependencies in R code ... NOTE
  Namespaces in Imports field not imported from:
    ‘collapse’ ‘data.table’ ‘kit’ ‘magrittr’
    All declared Imports should be used.

0 errors v | 0 warnings v | 1 note x


parquetize

-- R CMD check results ----------------------------------- parquetize 0.5.1 ----
Duration: 1m 0.5s

> checking dependencies in R code ... NOTE
  Namespace in Imports field not imported from: ‘dplyr’
    All declared Imports should be used.

0 errors v | 0 warnings v | 1 note x


pointblank

-- R CMD check results ---------------------------------- pointblank 0.11.2 ----
Duration: 1m 10.6s

> checking data for non-ASCII characters ... NOTE
    Note: found 1 marked UTF-8 string

0 errors v | 0 warnings v | 1 note x


sparklyr

-- R CMD check results ------------------------------------- sparklyr 1.7.9 ----
Duration: 1m 20.5s

> checking installed package size ... NOTE
    installed size is  6.8Mb
    sub-directories of 1Mb or more:
      R      2.1Mb
      java   3.4Mb

0 errors v | 0 warnings v | 1 note x


strand

-- R CMD check results --------------------------------------- strand 0.2.0 ----
Duration: 2m 57.9s

> checking data for non-ASCII characters ... NOTE
    Note: found 1 marked UTF-8 string

0 errors v | 0 warnings v | 1 note x


@thisisnic
Copy link
Member

Thanks @paleolimbot, will plough on through the list!

@thisisnic
Copy link
Member

thisisnic commented Feb 7, 2023

Latest:

  • I now have created branch r-11.0.0.1 which is being checked here: WIP: [R] Verify CRAN release-11.0.0.1 #34072
  • I've built locally and checked with devtools::check_built() - all clear
  • Submitted to Winbuilder/Macbuilder - all clear

Once the crossbow checks are done and we're happy with the results, I'll submit the latest version to CRAN.

@paleolimbot
Copy link
Member Author

I ran R CMD check with the clang-16 image and that was all clear, too ✅

@thisisnic
Copy link
Member

thisisnic commented Feb 9, 2023

CRAN's clang16 build had an outdated version of Boost, which the build ended up using and failing. New branch 11.0.0.2 contains fixes for that. Checks are being done here: #34103

Winbuilder passed. MacBuilder: passed

@thisisnic
Copy link
Member

I got a failure with devtools::check_built() but I've been messing around with my setup to try to get a working Arrow C++ build, and this may have interfered with this. Have asked @paleolimbot also to check; if it works on his machine, then I'm happy.

CMake Error at cmake_modules/ThirdpartyToolchain.cmake:2312 (add_library):
  add_library cannot create imported target "xsimd" because another target
  with the same name already exists.
Call Stack (most recent call first):
  CMakeLists.txt:498 (include)

@thisisnic
Copy link
Member

Follow-up tasks now completed so closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants