[Release] Cherry-pick commits to 1.0.x maintenance branch #7933

Merged 37 commits on Aug 17, 2020

kszucs
Member

@kszucs kszucs commented Aug 11, 2020

No description provided.

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

@kszucs
Member Author

kszucs commented Aug 11, 2020

I was using `-X theirs` during the cherry-pick. I'm going to remove it so that any cherry-pick conflicts are resolved explicitly.

@kszucs kszucs force-pushed the maint-1.0.x branch 2 times, most recently from 7703bbc to 20d6479 Compare August 17, 2020 12:21
xhochy and others added 26 commits August 17, 2020 15:10
Closes apache#7810 from xhochy/test-windows-fix

Authored-by: Uwe L. Korn <uwe.korn@quantco.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#7827 from jorgecarleitao/fix_version

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
Closes apache#7836 from xhochy/ARROW-9560

Authored-by: Uwe L. Korn <uwelk@xhochy.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
…ating release notes for the website

Closes apache#7828 from kszucs/release-notes-archery-changelog

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Following r-windows/rtools-packages@2babb63#diff-fec826feae04e51c0d94076385408bdcR24-R25

Closes apache#7840 from nealrichardson/fix-msys-keys

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Closes apache#7832 from nealrichardson/r-post-1.0.0

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Setting the version argument in `write_parquet()` did not work due to an incorrect function name. This PR fixes the bug, adds tests and amends the documentation.

Closes apache#7831 from Plebejer/master

Authored-by: Matthias <matthias.gomolka@posteo.de>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Saying "Arrow" is redundant and consumes unnecessary screen real estate, leading to:

![image](https://user-images.githubusercontent.com/2975928/88416853-ca88b700-cd95-11ea-82f8-fe67d80fdf66.png)

Closes apache#7841 from nealrichardson/sphinx-sidebar

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
See also conda-forge/r-arrow-feedstock#25

(yes, I'll also add the conda recipe to CI in the next days)

Closes apache#7856 from xhochy/r-conda-fixes

Authored-by: Uwe L. Korn <uwe.korn@quantco.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Closes apache#7855 from kszucs/macos-brew

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#7860 from nealrichardson/fix-homebrew-again-again

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
The cause of the failure itself turns out to be in cpp/src/parquet, not the R bindings.

This patch reworks the existing r-sanitizer nightly job to (1) build the bundled C++ library the way it is built on CRAN, and (2) actually fail the build if there is a UBSAN error. Previously no UBSAN error was reported because the Arrow C++ library was not built with sanitizers, and even if it had been, the build was not set up to detect the errors and fail.

Closes apache#7858 from nealrichardson/r-ubsan

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
When used as a crate dependency, arrow-flight is rebuilt on every invocation of cargo build

# Repro:

Create a new repo, add `arrow=1.0.0` as a dependency, and then run `cargo build`

*Expected behavior*: After the first successful invocation of `cargo build`, arrow-flight will not recompile if no other changes are made.

*Actual behavior*: After every invocation of `cargo build`, arrow-flight is recompiled, even when nothing has changed

Here is an example:

Create a new crate
```
alamb@ip-192-168-0-129 arrow_rebuilds % cargo new too_many_rebuilds --bin
     Created binary (application) `too_many_rebuilds` package
```

Add arrow as a dependency in Cargo.toml:

```
diff --git a/Cargo.toml b/Cargo.toml
index a239680..44ed358 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -5,3 +5,6 @@ authors = ["alamb <andrew@nerdnetworks.org>"]
 edition = "2018"

 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+
+[dependencies]
+arrow = "1.0.0"

```
Now, all invocations of `cargo build` will rebuild arrow, even though nothing in the code has changed:

```
alamb@ip-192-168-0-129 too_many_rebuilds % cargo build
   Compiling arrow-flight v1.0.0
   Compiling arrow v1.0.0
   Compiling too_many_rebuilds v0.1.0 (/Users/alamb/Software/bugs/arrow_rebuilds/too_many_rebuilds)
    Finished dev [unoptimized + debuginfo] target(s) in 8.70s
alamb@ip-192-168-0-129 too_many_rebuilds % cargo build
   Compiling arrow-flight v1.0.0
   Compiling arrow v1.0.0
   Compiling too_many_rebuilds v0.1.0 (/Users/alamb/Software/bugs/arrow_rebuilds/too_many_rebuilds)
    Finished dev [unoptimized + debuginfo] target(s) in 8.65s
```
You can see what is happening by checking out a fresh copy of arrow/master (no Cargo.lock) and running `cargo build`: you'll see that your local checkout has changes in rust/arrow-flight/src/arrow.flight.protocol.rs.

There is more detail on https://issues.apache.org/jira/browse/ARROW-9600

# Proposed Fix:
This proposed patch pins to the same version of proc-macro2 that was used to create the currently checked in version of rust/arrow-flight/src/arrow.flight.protocol.rs.

Alternately, I could pin to the newer version of proc-macro2 and update the checked in version of rust/arrow-flight/src/arrow.flight.protocol.rs.

Closes apache#7867 from alamb/alamb/arrow-9600

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
… different C and C++ compilers

CMake 3.18 was released two weeks ago. We may want to report this issue upstream; until it is resolved, pinning CMake to version 3.17 fixes the toolchain build.

Closes apache#7865 from kszucs/appveyor

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
The previous PR that was merged for this did not correctly pin the version and did not check in the corresponding generated code for that version.

We need to use `=1.0.18` to pin an exact version, since a bare `1.0.18` is a caret requirement that also accepts later semver-compatible releases (>=1.0.18, <2.0.0).

https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html
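
To make the difference between the two requirement forms concrete, here is a minimal Python sketch. It is not Cargo's resolver: the helpers `parse`, `caret_matches`, and `exact_matches` are hypothetical, and they only cover plain `MAJOR.MINOR.PATCH` versions with a non-zero major component.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_matches(requirement: str, candidate: str) -> bool:
    """A bare `1.0.18` behaves like `^1.0.18`: >=1.0.18 and <2.0.0.

    Assumption: only non-zero major versions are handled here; Cargo
    applies different rules to 0.x requirements.
    """
    req, cand = parse(requirement), parse(candidate)
    if req[0] == 0:
        raise ValueError("this sketch does not cover 0.x requirements")
    return cand >= req and cand[0] == req[0]


def exact_matches(requirement: str, candidate: str) -> bool:
    """A requirement written as `=1.0.18` accepts only that exact version."""
    return parse(requirement.lstrip("=")) == parse(candidate)


if __name__ == "__main__":
    for candidate in ("1.0.17", "1.0.18", "1.0.19", "2.0.0"):
        print(
            candidate,
            caret_matches("1.0.18", candidate),
            exact_matches("=1.0.18", candidate),
        )
```

Under these assumptions, `1.0.19` satisfies the bare requirement but not the `=` form, which is why the generated code could drift from the checked-in file.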

Closes apache#7893 from andygrove/pin-proc-macro

Authored-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
…round

The core `arrow` crate had a dependency on the `arrow-flight` crate, which doesn't make sense. Arrow should have minimal dependencies and should not depend on protocols or servers. Flight should depend on Arrow instead.

I also changed the name of the lib from `flight` to `arrow_flight` to match the Cargo manifest. I did this because I ran into compilation issues, but I think this is worth changing anyway since there is already a `flight` crate on crates.io (unrelated to Arrow). I can try to roll back this change if there are objections.

Closes apache#7892 from andygrove/ARROW-9631

Authored-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
… null

`ConvertOptions::include_missing_columns = true` was insufficient to produce the required behavior with missing columns: we need to read the csv file's header to find the names of columns actually present in the file before instantiating a StreamingReader. Otherwise the StreamingReader will fill absent columns with `null`, which prevents the projector from materializing them correctly later.
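
The header-first idea can be sketched in a few lines of Python. This uses the standard `csv` module rather than Arrow's C++ reader, and `split_present_and_missing` is a hypothetical helper name: it reads only the header row and separates the requested columns into those the file actually contains and those that would have to be materialized later.

```python
import csv
import io
from typing import List, Sequence, TextIO, Tuple


def split_present_and_missing(
    source: TextIO, requested: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Read only the header row and partition `requested` by presence in the file."""
    # An empty file yields no header, so every requested column is missing.
    header = next(csv.reader(source), [])
    present_names = set(header)
    present = [name for name in requested if name in present_names]
    missing = [name for name in requested if name not in present_names]
    return present, missing


if __name__ == "__main__":
    data = io.StringIO("a,b\n1,2\n3,4\n")
    present, missing = split_present_and_missing(data, ["a", "b", "c"])
    print("present:", present)
    print("missing:", missing)
```

Only the `present` names would then be handed to the streaming reader, so it has no reason to fill the absent ones with `null`.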

Closes apache#7896 from bkietz/9609-csv-empty-virtual

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Closes apache#7900 from bkietz/9573-expose-ignore_prefixes

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Missing parameters in PlasmaOutOfMemoryException.java

Closes apache#7815 from offthewall123/miss_parameter_bug_fix

Authored-by: offthewall123 <dingyu.xu@intel.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
…etic

Vendor relevant code from the portable-snippets library (~ public domain):
https://github.com/nemequ/portable-snippets/tree/master/safe-math

Also fix some bugs in checked arithmetic (null values had their value slots checked).
Add compute scaffolding for stateful binary scalar functions.
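
The null-slot bug is easier to see in a small example. The sketch below is plain Python, not the Arrow kernel or the portable-snippets code; `checked_add_int64` and its explicit `valid` mask are assumptions made for illustration. The point is that overflow is only checked where the slot is valid.

```python
from typing import List, Optional, Sequence

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def checked_add_int64(
    left: Sequence[int],
    right: Sequence[int],
    valid: Sequence[bool],
) -> List[Optional[int]]:
    """Add element-wise, raising on int64 overflow only where the slot is valid."""
    if not (len(left) == len(right) == len(valid)):
        raise ValueError("inputs must have equal length")
    out: List[Optional[int]] = []
    for lhs, rhs, is_valid in zip(left, right, valid):
        if not is_valid:
            # A null slot may hold arbitrary values; never inspect it.
            out.append(None)
            continue
        total = lhs + rhs
        if total < INT64_MIN or total > INT64_MAX:
            raise OverflowError(f"int64 overflow adding {lhs} and {rhs}")
        out.append(total)
    return out


if __name__ == "__main__":
    # The second slot would overflow, but it is null, so no error is raised.
    print(checked_add_int64([1, INT64_MAX], [2, 1], [True, False]))
```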

Closes apache#7784 from pitrou/ARROW-9402-overflow-arith

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Re-exports flight from arrow if enabled.
Removes unnecessary datafusion deps.
By default, arrow now uses no features; features should be hand-picked by the user.

Closes apache#7894 from vertexclique/vcq/ARROW-9631-make-arrow-not-depend-on-flight

Authored-by: Mahmut Bulut <vertexclique@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
This enables predicate pushdown of `%in%` filters in the presence of compound partition information
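
As a rough illustration of what pushing an `%in%` filter down to partitions means, the following Python sketch prunes partitions that are described by more than one key. It is not the Arrow dataset implementation; `partition_may_match`, `prune`, and the `year`/`month` keys are hypothetical.

```python
from typing import Dict, Iterable, List


def partition_may_match(
    partition: Dict[str, int], field: str, allowed: Iterable[int]
) -> bool:
    """Return False only when the partition fixes `field` to a value outside `allowed`."""
    if field not in partition:
        # The partition says nothing about this field, so it cannot be skipped.
        return True
    return partition[field] in set(allowed)


def prune(
    partitions: List[Dict[str, int]], field: str, allowed: Iterable[int]
) -> List[Dict[str, int]]:
    """Keep only the partitions that could contain rows passing the filter."""
    allowed_set = set(allowed)
    return [p for p in partitions if partition_may_match(p, field, allowed_set)]


if __name__ == "__main__":
    partitions = [
        {"year": 2019, "month": 1},
        {"year": 2019, "month": 2},
        {"year": 2020, "month": 1},
    ]
    # Equivalent in spirit to filtering on `month %in% c(1)`.
    print(prune(partitions, "month", [1]))
```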

@mpjdem

Closes apache#7911 from bkietz/9606-simplify-isin-query-nested-partitions

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Traverse the node hierarchy to ensure we capture the right value count.

Closes apache#7862 from emkornfield/verify_parquetfg

Authored-by: Micah Kornfield <emkornfield@gmail.com>
Signed-off-by: Wes McKinney <wesm@apache.org>
Closes apache#7929 from nealrichardson/r-find-cmake

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Also fix the error message given when `posix_madvise()` fails for another reason (its error code is given as the function return value, not as `errno`).

Closes apache#7904 from pitrou/ARROW-9577-madvise-error-message

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Should fix the following issues:
- https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=24202
- https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=24363

Closes apache#7927 from pitrou/ARROW-9684-oss-fuzz

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
trxcllnt and others added 7 commits August 17, 2020 15:21
…erReader

Related JIRA: [ARROW-9659](https://issues.apache.org/jira/browse/ARROW-9659)

Prior to 1.0.0, the `RecordBatchStreamReader` was capable of reading source CudaBuffers wrapped in a `CudaBufferReader`. In 1.0.0, the Array validation routines call into Buffer::data(), which throws an error if the source isn't in host memory.

This PR guards the call-sites I was able to find, but I may have missed others. I considered skipping Array validation if the buffers aren't on the host, but the other Array validation checks are still safe and useful to perform.

Closes apache#7909 from trxcllnt/fix/ARROW-9659

Lead-authored-by: ptaylor <paul.e.taylor@me.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#7950 from nealrichardson/r-1.0.1-news

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
If the path contains any `.`, as in `/Users/xxx/anaconda3/envs/ray/lib/python3.6/site-packages/pyarrow/libarrow.100.dylib`, the current implementation generates a wrong symlink path such as `/Users/xxx/anaconda3/envs/ray/lib/python3.dylib`.
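
One way such a result can arise is by cutting the full path at its first `.` instead of cutting only the file name. The sketch below reproduces that behavior and shows a safer variant; it is an assumption about the shape of the bug, not the actual code from the repository, and both function names are hypothetical.

```python
import posixpath


def naive_symlink_path(lib_path: str) -> str:
    """Buggy variant: cuts at the first '.' anywhere in the full path."""
    stem = lib_path.split(".", 1)[0]
    ext = posixpath.splitext(lib_path)[1]
    return stem + ext


def symlink_path(lib_path: str) -> str:
    """Safer variant: only the file name is split, the directory is left untouched."""
    directory, filename = posixpath.split(lib_path)
    stem = filename.split(".", 1)[0]
    ext = posixpath.splitext(filename)[1]
    return posixpath.join(directory, stem + ext)


if __name__ == "__main__":
    path = (
        "/Users/xxx/anaconda3/envs/ray/lib/python3.6/"
        "site-packages/pyarrow/libarrow.100.dylib"
    )
    print(naive_symlink_path(path))
    print(symlink_path(path))
```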

Closes apache#7937 from chaokunyang/ARROW-9700

Authored-by: mubai <chaokun.yck@antfin.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…l code improvements

Closes apache#7947 from andygrove/parquetscan-error-handling

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
…ase_dir

I still apply ignore_prefixes to all segments of paths yielded by a selector which lie *outside* an explicit partition base directory.

Closes apache#7907 from bkietz/9644-ignore_prefixes-base_dir

Lead-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Fixes problem of path expansion (`open_dataset("~/path")` now works). Unfortunately, tests can only write to tmp, so I can't add a test that confirms this, but I did verify locally.

Closes apache#7960 from nealrichardson/r-dataset-path

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Closes apache#7952 from kszucs/ARROW-9556

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
@kszucs
Member Author

kszucs commented Aug 17, 2020

@github-actions crossbow submit -g integration -g conda -g wheel -g linux

@github-actions

Revision: af3cc93

Submitted crossbow builds: ursa-labs/crossbow @ actions-486

Task Status
centos-6-amd64 Github Actions
centos-7-aarch64 TravisCI
centos-7-amd64 Github Actions
centos-8-aarch64 TravisCI
centos-8-amd64 Github Actions
conda-clean Azure
conda-linux-gcc-py36-cpu Azure
conda-linux-gcc-py36-cuda Azure
conda-linux-gcc-py37-cpu Azure
conda-linux-gcc-py37-cuda Azure
conda-linux-gcc-py38-cpu Azure
conda-linux-gcc-py38-cuda Azure
conda-osx-clang-py36 Azure
conda-osx-clang-py37 Azure
conda-osx-clang-py38 Azure
conda-win-vs2017-py36 Azure
conda-win-vs2017-py37 Azure
conda-win-vs2017-py38 Azure
debian-buster-amd64 Github Actions
debian-buster-arm64 TravisCI
debian-stretch-amd64 Github Actions
debian-stretch-arm64 TravisCI
test-conda-python-3.6-pandas-0.23 Github Actions
test-conda-python-3.7-dask-latest Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-pandas-master Github Actions
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.7-turbodbc-latest Github Actions
test-conda-python-3.7-turbodbc-master Github Actions
test-conda-python-3.8-dask-master Github Actions
test-conda-python-3.8-jpype Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-spark-master Github Actions
ubuntu-bionic-amd64 Github Actions
ubuntu-bionic-arm64 TravisCI
ubuntu-eoan-amd64 Github Actions
ubuntu-eoan-arm64 TravisCI
ubuntu-focal-amd64 Github Actions
ubuntu-focal-arm64 TravisCI
ubuntu-xenial-amd64 Github Actions
ubuntu-xenial-arm64 TravisCI
wheel-manylinux1-cp35m Azure
wheel-manylinux1-cp36m Azure
wheel-manylinux1-cp37m Azure
wheel-manylinux1-cp38 Azure
wheel-manylinux2010-cp35m Azure
wheel-manylinux2010-cp36m Azure
wheel-manylinux2010-cp37m Azure
wheel-manylinux2010-cp38 Azure
wheel-manylinux2014-cp35m Azure
wheel-manylinux2014-cp36m Azure
wheel-manylinux2014-cp37m Azure
wheel-manylinux2014-cp38 Azure
wheel-osx-cp35m TravisCI
wheel-osx-cp36m TravisCI
wheel-osx-cp37m TravisCI
wheel-osx-cp38 TravisCI
wheel-win-cp35m Appveyor
wheel-win-cp36m Appveyor
wheel-win-cp37m Appveyor
wheel-win-cp38 Appveyor

Awaiting a proper fix (ARROW-9621, fsspec/filesystem_spec#367), let's disable the test so it doesn't cause noise in our CI.

Closes apache#7890 from jorisvandenbossche/ARROW-9621-skip-test

Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kszucs
Member Author

kszucs commented Aug 17, 2020

@github-actions crossbow submit -g integration -g conda -g wheel -g linux

@github-actions

Revision: 2c362d6

Submitted crossbow builds: ursa-labs/crossbow @ actions-487

Task Status
centos-6-amd64 Github Actions
centos-7-aarch64 TravisCI
centos-7-amd64 Github Actions
centos-8-aarch64 TravisCI
centos-8-amd64 Github Actions
conda-clean Azure
conda-linux-gcc-py36-cpu Azure
conda-linux-gcc-py36-cuda Azure
conda-linux-gcc-py37-cpu Azure
conda-linux-gcc-py37-cuda Azure
conda-linux-gcc-py38-cpu Azure
conda-linux-gcc-py38-cuda Azure
conda-osx-clang-py36 Azure
conda-osx-clang-py37 Azure
conda-osx-clang-py38 Azure
conda-win-vs2017-py36 Azure
conda-win-vs2017-py37 Azure
conda-win-vs2017-py38 Azure
debian-buster-amd64 Github Actions
debian-buster-arm64 TravisCI
debian-stretch-amd64 Github Actions
debian-stretch-arm64 TravisCI
test-conda-python-3.6-pandas-0.23 Github Actions
test-conda-python-3.7-dask-latest Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-pandas-master Github Actions
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.7-turbodbc-latest Github Actions
test-conda-python-3.7-turbodbc-master Github Actions
test-conda-python-3.8-dask-master Github Actions
test-conda-python-3.8-jpype Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-spark-master Github Actions
ubuntu-bionic-amd64 Github Actions
ubuntu-bionic-arm64 TravisCI
ubuntu-eoan-amd64 Github Actions
ubuntu-eoan-arm64 TravisCI
ubuntu-focal-amd64 Github Actions
ubuntu-focal-arm64 TravisCI
ubuntu-xenial-amd64 Github Actions
ubuntu-xenial-arm64 TravisCI
wheel-manylinux1-cp35m Azure
wheel-manylinux1-cp36m Azure
wheel-manylinux1-cp37m Azure
wheel-manylinux1-cp38 Azure
wheel-manylinux2010-cp35m Azure
wheel-manylinux2010-cp36m Azure
wheel-manylinux2010-cp37m Azure
wheel-manylinux2010-cp38 Azure
wheel-manylinux2014-cp35m Azure
wheel-manylinux2014-cp36m Azure
wheel-manylinux2014-cp37m Azure
wheel-manylinux2014-cp38 Azure
wheel-osx-cp35m TravisCI
wheel-osx-cp36m TravisCI
wheel-osx-cp37m TravisCI
wheel-osx-cp38 TravisCI
wheel-win-cp35m Appveyor
wheel-win-cp36m Appveyor
wheel-win-cp37m Appveyor
wheel-win-cp38 Appveyor

@kszucs
Member Author

kszucs commented Aug 17, 2020

The three integration build failures and the two GHA errors are unrelated, so I'm merging this into the upstream maintenance branch before creating the tag.

@kszucs
Member Author

kszucs commented Aug 17, 2020

Rebase and merge is disabled, so I'm pushing it directly to the upstream branch.

@kszucs kszucs merged commit 193ba7e into apache:maint-1.0.x Aug 17, 2020