DO NOT MERGE: Testing create Java-jars against 15.0.0 commit release to test Linux dataset errors #40003

Closed
davisusanibar wants to merge 22 commits into apache:main from davisusanibar:FIX-DATASET-ERROR-LINUX-ENV

@davisusanibar
Contributor

Rationale for this change

We are seeing errors when using the Apache Arrow Java Dataset module in a Linux environment. This PR aims to create Java jars against the 15.0.0 release commit to test the Linux dataset errors.

What changes are included in this PR?

The same as V15 release https://github.com/apache/arrow/tree/a61f4af724cd06c3a9b4abd20491345997e532c0

Are these changes tested?

Yes

Are there any user-facing changes?

No

jonkeane and others added 21 commits January 9, 2024 09:57
### Rationale for this change

We sometimes need to use a more modern cmake. Before this change, although we downloaded a functioning cmake on macOS, we didn't have the correct path for it.

### What changes are included in this PR?

Resolves apache#38811 so that cmake is usable when downloaded on macOS. This also restores the local source build jobs so that they test that source builds work (which is what the CI jobs say they are doing). I believe these jobs started using binaries when we overhauled the build system in the last release.

### Are these changes tested?

Yes, in CI with the local (source) install jobs in crossbow.

### Are there any user-facing changes?

* Closes: apache#38811

Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
…ghtly CI build (apache#39498)

Update version checks and assertions of pyarrow array equality for pandas failing tests on the CI: [test-conda-python-3.10-pandas-nightly](https://github.com/ursacomputing/crossbow/actions/runs/7391976015/job/20109720695)

* Closes: apache#39437

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…ct (apache#39522)

### Rationale for this change

With CMake > 3.28 the generated Makefile fails on the jemalloc_ep due to 'bad file descriptor'.

### What changes are included in this PR?

Force a sequential build for jemalloc by setting -j1.

### Are these changes tested?

CI

### Are there any user-facing changes?

No.
* Closes: apache#39517

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
### Rationale for this change

The CRAN check on `fedora clang devel` builds with clang against libc++ and has a system re2 installed that was built with the C++11 ABI, which causes linking to fail due to the [abi:cxx11]-symbol annotation on the system version.

A user could manually use the bundled build or path hint a clang version of the library. To avoid extra work for the CRAN maintainers we can just default to the bundled build. The re2 build is small enough that users building from source will not really feel the difference and can still opt to use the system re2 via `EXTRA_CMAKE_FLAGS`. 

### What changes are included in this PR?

Default to use our bundled build to prevent the problems. 

### Are these changes tested?

On a local dev container replicating the cran env.

### Are there any user-facing changes?

Source builds now default to the bundled re2 version; this can be overridden.

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
…mpty and test_view (apache#39534)

Skipping dask tests `test_dataframe.py::test_describe_empty` and `test_dataframe.py::test_view` on our CI to stop the nightly dask test jobs from failing.
* Closes: apache#39531

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: AlenkaF <frim.alenka@gmail.com>
…requirements for the 15.x release branch (apache#39538)

### Rationale for this change

PyArrow wheels for the 15.0.0 release will not be compatible with future numpy 2.0 packages, so it is recommended to add this upper pin now for _releases_. We will keep the more flexible pin on the development branch by reverting this commit on main; committing it there first lets it be cherry-picked into the release branch.

* Closes: apache#39537

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…pache#39535)

### Rationale for this change

Removing usage of `np.core`, as that is deprecated and will be removed in numpy 2.0. 

For this specific case, we can just hardcode the list of data types instead of using a numpy api (this list doesn't typically change).

* Closes: apache#39533

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…e#39516)

### Rationale for this change

For Iceberg we want to add metadata to the type (the field-id), so we need to pass in the type, analogous to what we do for `ListArray.from_arrays(self, offsets, values, DataType type=None, MemoryPool pool=None, mask=None)`.

### What changes are included in this PR?

Updated the signature to take a keyword argument for the `type`, and made sure that the static method to create the MapType is exposed from the C++ side.

### Are these changes tested?

I've added a simple test.

### Are there any user-facing changes?

* Closes: apache#39515

Authored-by: Fokko Driesprong <fokko@tabular.io>
Signed-off-by: AlenkaF <frim.alenka@gmail.com>
…to run integration tests (apache#39502)

Integration verification tasks are currently failing on CI.

Install jpype and build JNI c-data to run integration tests

Yes via archery

No

* Closes: apache#38470

Lead-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
)

### Rationale for this change

The version set currently on the maintenance branch is incorrect for Java BOM.

### What changes are included in this PR?

Suggested changes to set the version specifically for the BOM and Maven.

### Are these changes tested?

I will trigger java-jars via archery but I think this is currently only reproducible on the maintenance branch. So we will have to merge and validate there.

### Are there any user-facing changes?
No
* Closes: apache#39564

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
… to fix macOS build with conda (apache#39589)

### Rationale for this change

CI job has been failing since we added integration tests.

### What changes are included in this PR?

Add `CGO_ENABLED=1` to the go build of cdata_integration in the verification script.

### Are these changes tested?

Yes via archery.

### Are there any user-facing changes?

No
* Closes: apache#39588

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…apache#39602)

See apache#39601 

### Are these changes tested?

Existing CI should pass. This should also pass on macbuilder without downloading cmake, and if hardcoding `download_ok <- FALSE`, it should exit cleanly and informatively.

### Are there any user-facing changes?

Define "user".
* Closes: apache#39601

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
### Rationale for this change

Resolves apache#39584 

### What changes are included in this PR?

We now only check the checksum after the download has succeeded, and try to be quieter about it when we do. We also use bundled boost and lz4 source on macOS by default (to avoid the system versions of each on CRAN, which seem to have issues).
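The ordering described above (verify the checksum only once the download has succeeded) can be sketched as follows. This is a minimal Python illustration, not the R package's actual code: `fetch_and_verify`, `sha256_of`, the `download` callable and its boolean return value are all hypothetical names chosen for the example, and SHA-256 is an assumed digest.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fetch_and_verify(
    download: Callable[[Path], bool], dest: Path, expected_sha256: str
) -> bool:
    """Run `download`, then verify the checksum only if it succeeded.

    `download` is a hypothetical callable that writes to `dest` and
    returns True on success. When it fails, the checksum step is skipped,
    so no misleading checksum error is reported for a missing file.
    """
    if not download(dest):
        return False  # download failed: do not attempt the checksum
    return sha256_of(dest) == expected_sha256
```

A caller can treat a `False` result as "fall back or exit cleanly" without having to distinguish a failed download from a bad checksum.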

### Are these changes tested?

I submitted a download-malignant (and verbose) build to [CRAN's macbuilder](https://mac.r-project.org/macbuilder/results/1705088784-991a5beacf4ec26e/) and it succeeds.

### Are there any user-facing changes?

In principle the macos source build is slightly altered + we have a cleaner path when file downloads fail. But both of these should be relatively non-impactful since most macos users are getting binaries from CRAN. Most importantly it helps us stay on CRAN. 

**This PR contains a "Critical Fix".**
* Closes: apache#39584

Lead-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
…pache#39625)

### Rationale for this change

CMake is now a sysreq and we don't want to default to using nightly builds in CI

### Are these changes tested?

Crossbow
* Closes: apache#39624

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
### What changes are included in this PR?

The verification script is modified to look for the versions of .NET now supported by the package.

### Are these changes tested?

Manually tested the verification command.

* Closes: apache#39598

Authored-by: Curt Hagenlocher <curt@hagenlocher.org>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
…_filtering (apache#39632)

### Rationale for this change

`ParquetFileFragment` stores a `SchemaManifest` that has a raw pointer to a `SchemaDescriptor`. The `SchemaDescriptor` is originally provided by a `FileMetadata` instance but, in some cases, the `FileMetadata` instance can be destroyed while the `ParquetFileFragment` is still in use. This can typically lead to bugs or crashes.

### What changes are included in this PR?

Ensure that `ParquetFileFragment` keeps an owning pointer to the `FileMetadata` instance that provides its `SchemaManifest`'s schema descriptor.
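The ownership pattern behind this fix can be illustrated with a small Python analogy. It is not the Arrow C++ code: the classes below are hypothetical stand-ins, a `weakref` plays the role of the raw pointer, and the sketch relies on CPython's reference counting to destroy objects as soon as the last strong reference goes away. In C++ the failure mode would be a dangling pointer rather than a clean exception.

```python
import weakref


class SchemaDescriptor:
    """Stand-in for the schema description owned by the file metadata."""

    def __init__(self, columns):
        self.columns = list(columns)


class FileMetadata:
    """Stand-in owner: the descriptor lives only as long as this object."""

    def __init__(self, columns):
        self.schema = SchemaDescriptor(columns)


class SchemaManifest:
    """Holds a non-owning (weak) reference, mimicking a raw pointer."""

    def __init__(self, descriptor):
        self._descriptor = weakref.ref(descriptor)

    def columns(self):
        descriptor = self._descriptor()
        if descriptor is None:
            raise RuntimeError("schema descriptor was destroyed")
        return descriptor.columns


class Fragment:
    """With keep_metadata=True the fragment also owns the metadata."""

    def __init__(self, metadata, keep_metadata):
        self.manifest = SchemaManifest(metadata.schema)
        # The fix: hold an owning reference so the descriptor stays alive.
        self._metadata = metadata if keep_metadata else None
```

With `keep_metadata=False`, dropping the caller's last reference to the metadata invalidates the manifest; with `keep_metadata=True`, the fragment keeps the descriptor alive for as long as it is in use.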

### Are these changes tested?

An assertion is added that would fail deterministically in the Python test suite.

### Are there any user-facing changes?

No.

* Closes: apache#39562

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@github-actions

github-actions bot commented Feb 8, 2024

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
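As a rough illustration, the three title formats above could be checked with a few regular expressions. This is a sketch, not the bot's actual validation logic: `is_valid_title` and `TITLE_PATTERNS` are hypothetical names, and the sketch assumes numeric issue IDs, one or more bracketed components, and a non-empty summary.

```python
import re

# One pattern per accepted prefix. The numeric-ID and bracketed-component
# rules are assumptions made for this sketch.
TITLE_PATTERNS = [
    re.compile(r"^GH-\d+: (\[[^\]]+\])+ .+"),
    re.compile(r"^MINOR: (\[[^\]]+\])+ .+"),
    re.compile(r"^PARQUET-\d+: (\[[^\]]+\])+ .+"),
]


def is_valid_title(title: str) -> bool:
    """Return True if the title matches one of the accepted formats."""
    return any(pattern.match(title) for pattern in TITLE_PATTERNS)
```

Under these assumptions, a title such as `MINOR: [Docs] Fix typo` passes, while this pull request's own `DO NOT MERGE: ...` title does not, which is consistent with the bot asking for a rename.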

See also:

@davisusanibar
Contributor Author

@assignUser is it possible to run java-jar task again to rebuild jar using v15 release?

@danepitkin

@davisusanibar
Contributor Author

@github-actions crossbow submit java-jars

@github-actions

github-actions bot commented Feb 8, 2024

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/7834804015

@vibhatha
Contributor

vibhatha commented Feb 8, 2024

@github-actions crossbow submit java-jars

@github-actions

github-actions bot commented Feb 8, 2024

Revision: 0853744

Submitted crossbow builds: ursacomputing/crossbow @ actions-33a76f212b

Task Status
java-jars GitHub Actions
