ARROW-14625: [Python][CI] Enable Python test on s390x #11688

Closed
wants to merge 21 commits into from

Conversation

kiszk
Member

@kiszk kiszk commented Nov 12, 2021

No description provided.

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

.travis.yml Outdated
<<: *global_env
ARCH: s390x
ARROW_CI_MODULES: "PYTHON"
DOCKER_IMAGE_ID: python-sdist
Member

You should probably use ubuntu-python instead.

Member Author

Thank you!

@pitrou
Member

pitrou commented Nov 22, 2021

It seems there is an issue with the dataset CMake script requiring Parquet to be present: https://issues.apache.org/jira/browse/ARROW-14793

@kiszk
Member Author

kiszk commented Nov 24, 2021

Another issue is that dataset.py and the related Cython file contain Parquet code unconditionally.

If I declare ARROW_PARQUET=OFF, a compilation error occurs because the Parquet include file is missing.
If I declare ARROW_PARQUET=ON, a test error occurs.

@pitrou
Member

pitrou commented Nov 25, 2021

> Another issue is that dataset.py and the related Cython file contain Parquet code unconditionally.

Hmm, interesting. The fix may be non-trivial... @jorisvandenbossche

@kiszk
Member Author

kiszk commented Nov 25, 2021

I am trying to use conditional compilation for the definitions in the Cython files. However, I still have to investigate how to handle references to them.

In addition, how do we handle conditional imports in a Python file?

@pitrou
Member

pitrou commented Nov 25, 2021

The proper way to do it would be to split Dataset Parquet support into a separate module, as is already done for ORC.

@kiszk
Member Author

kiszk commented Nov 25, 2021

Thank you for the good suggestion. I will follow this approach.

@jorisvandenbossche
Member

Yes, for ORC we indeed did it separately, so you can have Dataset enabled in the Python bindings without having ORC. We discussed in the past whether we should do the same for Parquet, but at that point we didn't have a pressing use case to change it. But it should indeed be possible to follow the same approach.
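
For illustration, the conditional-import question raised above could be handled on the Python side roughly as follows. This is a minimal sketch of the general pattern, not the code from this PR: the helper `optional_import`, the function `make_parquet_format`, and the module name `_hypothetical_dataset_parquet` are all made up for the example, and `json` merely stands in for an optional extension module that was built.

```python
import importlib


def optional_import(module_name):
    """Return the named module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# "json" stands in for an optional extension that is present in this build.
built = optional_import("json")

# Hypothetical name standing in for an extension that was not compiled.
missing = optional_import("_hypothetical_dataset_parquet")


def make_parquet_format():
    """Raise a clear error only when the optional feature is actually used."""
    if missing is None:
        raise ImportError(
            "this build was compiled without Parquet dataset support"
        )
    # ParquetFileFormat is assumed to be provided by the optional module.
    return missing.ParquetFileFormat()
```

With this shape, importing the core package succeeds whether or not the optional extension exists, and the failure is deferred to the point where the Parquet-specific feature is requested.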

@kiszk
Member Author

kiszk commented Dec 1, 2021

@jorisvandenbossche I tried to split the code into separate modules. However, I still got the compilation error regarding the Parquet include files. I think this is because the Parquet include files are required transitively through _dataset_parquet.pyx and _parquet.pxd.

What do you think?

@pitrou
Member

pitrou commented Dec 7, 2021

@kiszk I'm trying to fix this.

@pitrou
Member

pitrou commented Dec 7, 2021

@github-actions crossbow submit -g python

@github-actions

github-actions bot commented Dec 7, 2021

Revision: de4de1e

Submitted crossbow builds: ursacomputing/crossbow @ actions-1264

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.6 Github Actions
test-conda-python-3.6-pandas-0.23 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-hdfs-3.2.1 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-0.24 Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-dask-latest Github Actions
test-conda-python-3.9-dask-master Github Actions
test-conda-python-3.9-pandas-master Github Actions
test-conda-python-3.9-spark-master Github Actions
test-debian-11-python-3 Azure
test-fedora-33-python-3 Azure
test-ubuntu-18.04-python-3 Azure

@pitrou
Member

pitrou commented Dec 7, 2021

@github-actions crossbow submit -g python

@github-actions

github-actions bot commented Dec 7, 2021

Revision: e617823

Submitted crossbow builds: ursacomputing/crossbow @ actions-1265

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.6 Github Actions
test-conda-python-3.6-pandas-0.23 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-hdfs-3.2.1 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-0.24 Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-dask-latest Github Actions
test-conda-python-3.9-dask-master Github Actions
test-conda-python-3.9-pandas-master Github Actions
test-conda-python-3.9-spark-master Github Actions
test-debian-11-python-3 Azure
test-fedora-33-python-3 Azure
test-ubuntu-18.04-python-3 Azure

@kiszk
Member Author

kiszk commented Dec 7, 2021

@pitrou I really appreciate your cooperation

Member

@westonpace westonpace left a comment

This looks great. Thanks for doing this. Only a few minor thoughts. I didn't look too closely at the code that was moved as I assume it was unchanged (except for the writtenfile stuff which did change and looks better).

ci/scripts/util_checkout.sh
cpp/cmake_modules/FindArrowDataset.cmake Outdated
cpp/cmake_modules/FindArrowDataset.cmake Outdated

# cython: language_level = 3

"""Dataset support for Parquest file format."""
Member

Suggested change
- """Dataset support for Parquest file format."""
+ """Dataset support for Parquet file format."""

@pitrou
Member

pitrou commented Dec 7, 2021

Travis build: https://app.travis-ci.com/github/pitrou/arrow/jobs/551279512

I note that Pandas and perhaps Numpy are built from source, which takes a lot of time. Perhaps we should use the Ubuntu packages instead?

@pitrou
Member

pitrou commented Dec 7, 2021

The s390x test failures are due to scipy/oldest-supported-numpy#29

Member

@jorisvandenbossche jorisvandenbossche left a comment

Looks good!

.travis.yml
python/pyarrow/_dataset.pyx Outdated
@jorisvandenbossche
Member

> The s390x test failures are due to scipy/oldest-supported-numpy#29

Are those failures depending on the runtime version of numpy? Or actually on the build time version?
Because if it is only runtime, we can install a more recent version of numpy in the test environment (we are misusing oldest-supported-numpy a bit here to also use it for installing the runtime version of numpy)

@jorisvandenbossche
Member

> I note that Pandas and perhaps Numpy are built from source, which takes a lot of time. Perhaps we should use the Ubuntu packages instead?

And it's probably also building numpy twice (once more when building pandas in the isolated build env).

@pitrou
Member

pitrou commented Dec 8, 2021

> Are those failures depending on the runtime version of numpy? Or actually on the build time version?

The runtime version.

> Because if it is only runtime, we can install a more recent version of numpy in the test environment

Hmm, perhaps, but we should find a nice way to do that while using our Docker setup.

@pitrou
Member

pitrou commented Dec 8, 2021

scipy/oldest-supported-numpy#29 has now been fixed and a new version was uploaded to PyPI. Hopefully that will fix the issues once the mirrors catch up.

@pitrou
Member

pitrou commented Dec 8, 2021

Hmm, there's something weird here. First oldest-supported-numpy version 0.13 is downloaded, then version 0.12 is selected...
https://app.travis-ci.com/github/pitrou/arrow/jobs/551380321#L1997-L2034

@jorisvandenbossche Do you know what might be happening?

edit: I've found this issue, it seems to be a bug in oldest-supported-numpy

@pitrou
Member

pitrou commented Dec 9, 2021

Ok, oldest-supported-numpy 0.14 should (hopefully this time :-)) fix the issue.

@pitrou
Member

pitrou commented Dec 9, 2021

Also, ideally we shouldn't install pandas in this build if binaries are not available (because it takes very long to build). How should we go about that? @jorisvandenbossche @kszucs

@jorisvandenbossche
Member

That might only be possible by splitting the requirements files?

@pitrou
Member

pitrou commented Dec 9, 2021

That's possible indeed. There could be a requirements-test-minimal.txt. Also can a requirements file inherit from another one?
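
On the inheritance question: pip requirements files can pull in another requirements file with a `-r <file>` line, so a full test file could start from a minimal one and add the heavy packages on top. The sketch below only illustrates that layout. The package lists, the name `requirements-test.txt` for the full file, and the `expand_requirements` helper are assumptions made for the example; the helper is a toy flattener, not pip's implementation.

```python
import tempfile
from pathlib import Path


def expand_requirements(path):
    """Flatten a requirements file, following '-r other.txt' includes."""
    path = Path(path)
    result = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-r "):
            # Resolve the included file relative to the including file.
            included = path.parent / line[3:].strip()
            result.extend(expand_requirements(included))
        else:
            result.append(line)
    return result


with tempfile.TemporaryDirectory() as tmp:
    minimal = Path(tmp) / "requirements-test-minimal.txt"
    full = Path(tmp) / "requirements-test.txt"

    # Illustrative contents only: light test dependencies in the minimal
    # file, slow-to-build packages added by the full file.
    minimal.write_text("pytest\nhypothesis\n")
    full.write_text("-r requirements-test-minimal.txt\npandas\n")

    minimal_reqs = expand_requirements(minimal)
    full_reqs = expand_requirements(full)
```

A build without prebuilt pandas binaries would then install only the minimal file, while the other builds keep using the full one.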

@pitrou
Member

pitrou commented Dec 9, 2021

Note that if/when we're able to cache Docker images again (https://issues.apache.org/jira/browse/ARROW-15023), the build should be much faster (Numpy and Pandas are compiled during the image build phase).


- if(ARROW_FOUND AND PARQUET_FOUND)
+ if(ARROW_FOUND)
Member

@kou Do you have any concerns about the CMake changes here?

Member

No problem.

@pitrou
Member

pitrou commented Dec 9, 2021

@github-actions crossbow submit -g python

@github-actions

github-actions bot commented Dec 9, 2021

Revision: c4a1f4d

Submitted crossbow builds: ursacomputing/crossbow @ actions-1272

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.6 Github Actions
test-conda-python-3.6-pandas-0.23 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-hdfs-3.2.1 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-0.24 Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-dask-latest Github Actions
test-conda-python-3.9-dask-master Github Actions
test-conda-python-3.9-pandas-master Github Actions
test-conda-python-3.9-spark-master Github Actions
test-debian-11-python-3 Azure
test-fedora-33-python-3 Azure
test-ubuntu-18.04-python-3 Azure

@pitrou pitrou closed this in ba273db Dec 13, 2021
@kiszk
Member Author

kiszk commented Dec 13, 2021

Thank you very much. Sorry for being late to come here.

@ursabot

ursabot commented Dec 13, 2021

Benchmark runs are scheduled for baseline = 3f7f245 and contender = ba273db. ba273db is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.9% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.62% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

6 participants