ARROW-14625: [Python][CI] Enable Python test on s390x #11688

kiszk · 2021-11-12T10:29:52Z

No description provided.

github-actions · 2021-11-12T10:30:13Z

https://issues.apache.org/jira/browse/ARROW-14625

github-actions · 2021-11-12T10:30:14Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

pitrou · 2021-11-17T11:36:17Z

.travis.yml

+        <<: *global_env
+        ARCH: s390x
+        ARROW_CI_MODULES: "PYTHON"
+        DOCKER_IMAGE_ID: python-sdist


You should probably use ubuntu-python instead.

pitrou · 2021-11-22T15:47:13Z

It seems there is an issue with the dataset CMake script requiring Parquet to be present: https://issues.apache.org/jira/browse/ARROW-14793

kiszk · 2021-11-24T22:12:42Z

Another issue is that dataset.py and the related Cython files contain Parquet code unconditionally.

If I set ARROW_PARQUET=OFF, compilation fails because the Parquet include files are missing.
If I set ARROW_PARQUET=ON, a test fails.

pitrou · 2021-11-25T11:32:43Z

> Another issue is that dataset.py and the related Cython files contain Parquet code unconditionally.

Hmm, interesting. The fix may be non-trivial... @jorisvandenbossche

kiszk · 2021-11-25T12:50:36Z

I am trying to use conditional compilation for the definitions in the Cython files, but I still have to investigate how to handle references to them.

In addition, how do we handle conditional imports in a Python file?

pitrou · 2021-11-25T12:54:32Z

The proper way to do it would be to move Dataset Parquet support into a separate module, as is already done for ORC.
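
With a separate module, the question about conditional imports in a Python file has a simple answer: the pure-Python layer tries to import the optional extension module and records whether it is available. The following is a minimal sketch of that pattern using only the standard library. The helper `optional_import`, the module name `hypothetical_dataset_parquet_ext` and the flag names are hypothetical illustrations, not pyarrow's actual code.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, else None.

    Hypothetical helper: an optional extension (for example a
    Parquet-specific dataset module) may simply not have been compiled.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# "json" always exists; the second name stands in for an optional
# extension module that was not built (assumed not to be importable).
_core = optional_import("json")
_parquet_ext = optional_import("hypothetical_dataset_parquet_ext")

core_available = _core is not None
parquet_available = _parquet_ext is not None


def require_parquet_ext() -> ModuleType:
    """Fail with a clear message only when the optional feature is used."""
    if _parquet_ext is None:
        raise ImportError("Dataset Parquet support was not built")
    return _parquet_ext
```

The design choice is that importing the main module never fails because of a missing optional component; the error is deferred to the call site that actually needs Parquet. One caveat of catching `ImportError` broadly is that it also hides genuine import errors raised inside a module that *was* built.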

kiszk · 2021-11-25T13:15:42Z

Thank you for the good suggestion. I will follow this approach.

jorisvandenbossche · 2021-11-25T16:39:35Z

Yes, for ORC we indeed did it separately, so you can have Dataset enabled in the Python bindings without having ORC. We discussed in the past whether we should do the same for Parquet, but at that point we didn't have a pressing use case to change it. But it should indeed be possible to follow the same approach.

kiszk · 2021-12-01T10:42:06Z

@jorisvandenbossche I tried splitting it into separate modules. However, I still got compilation errors regarding the Parquet include files. I think this is because the Parquet include files are required transitively through _dataset_parquet.pyx and _parquet.pxd.

What do you think?
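
Splitting the code only helps if the Parquet-dependent module is also left out of the build when Parquet is disabled. The small model below illustrates that rule: each Cython module lists the C++ components it needs and the modules it `cimport`s, and a module is compiled only if its components are enabled and all of its dependencies are compiled. The module names mirror the files mentioned in this thread, but the dependency table and `modules_to_build` are a hypothetical sketch, not pyarrow's actual setup.py or CMake logic.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table: module -> (required C++ components, modules it cimports)
MODULES: Dict[str, Tuple[Set[str], List[str]]] = {
    "lib": ({"arrow"}, []),
    "_parquet": ({"arrow", "parquet"}, ["lib"]),
    "_dataset": ({"arrow", "dataset"}, ["lib"]),
    "_dataset_orc": ({"arrow", "dataset", "orc"}, ["_dataset"]),
    "_dataset_parquet": ({"arrow", "dataset", "parquet"}, ["_dataset", "_parquet"]),
}


def modules_to_build(enabled: Set[str]) -> List[str]:
    """Return the modules whose components are enabled and whose deps are built."""
    built: Set[str] = set()
    changed = True
    # Iterate to a fixed point so the order of the table does not matter.
    while changed:
        changed = False
        for name, (components, deps) in MODULES.items():
            if name in built:
                continue
            if components <= enabled and all(dep in built for dep in deps):
                built.add(name)
                changed = True
    return sorted(built)


print(modules_to_build({"arrow", "dataset"}))             # ['_dataset', 'lib']
print(modules_to_build({"arrow", "dataset", "parquet"}))  # ['_dataset', '_dataset_parquet', '_parquet', 'lib']
```

In this model, if `_dataset` itself listed `_parquet` among its cimports, turning Parquet off would also drop Dataset support, which is why the Parquet-specific code has to live in its own module that nothing else depends on.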

pitrou · 2021-12-07T17:02:40Z

@kiszk I'm trying to fix this.

pitrou · 2021-12-07T18:18:34Z

@github-actions crossbow submit -g python

github-actions · 2021-12-07T18:19:45Z

Revision: de4de1e

Submitted crossbow builds: ursacomputing/crossbow @ actions-1264

Task	Status
test-conda-python-3.10
test-conda-python-3.6
test-conda-python-3.6-pandas-0.23
test-conda-python-3.7
test-conda-python-3.7-hdfs-2.9.2
test-conda-python-3.7-hdfs-3.2.1
test-conda-python-3.7-kartothek-latest
test-conda-python-3.7-kartothek-master
test-conda-python-3.7-pandas-0.24
test-conda-python-3.7-pandas-latest
test-conda-python-3.7-spark-v3.1.2
test-conda-python-3.8
test-conda-python-3.8-hypothesis
test-conda-python-3.8-pandas-latest
test-conda-python-3.8-pandas-nightly
test-conda-python-3.8-spark-v3.2.0
test-conda-python-3.9
test-conda-python-3.9-dask-latest
test-conda-python-3.9-dask-master
test-conda-python-3.9-pandas-master
test-conda-python-3.9-spark-master
test-debian-11-python-3
test-fedora-33-python-3
test-ubuntu-18.04-python-3

pitrou · 2021-12-07T18:22:44Z

@github-actions crossbow submit -g python

github-actions · 2021-12-07T18:23:57Z

Revision: e617823

Submitted crossbow builds: ursacomputing/crossbow @ actions-1265

Task	Status
test-conda-python-3.10
test-conda-python-3.6
test-conda-python-3.6-pandas-0.23
test-conda-python-3.7
test-conda-python-3.7-hdfs-2.9.2
test-conda-python-3.7-hdfs-3.2.1
test-conda-python-3.7-kartothek-latest
test-conda-python-3.7-kartothek-master
test-conda-python-3.7-pandas-0.24
test-conda-python-3.7-pandas-latest
test-conda-python-3.7-spark-v3.1.2
test-conda-python-3.8
test-conda-python-3.8-hypothesis
test-conda-python-3.8-pandas-latest
test-conda-python-3.8-pandas-nightly
test-conda-python-3.8-spark-v3.2.0
test-conda-python-3.9
test-conda-python-3.9-dask-latest
test-conda-python-3.9-dask-master
test-conda-python-3.9-pandas-master
test-conda-python-3.9-spark-master
test-debian-11-python-3
test-fedora-33-python-3
test-ubuntu-18.04-python-3

kiszk · 2021-12-07T19:10:02Z

@pitrou I really appreciate your cooperation

westonpace

This looks great. Thanks for doing this. Only a few minor thoughts. I didn't look too closely at the code that was moved, as I assume it was unchanged (except for the WrittenFile code, which did change and looks better).

ci/scripts/util_checkout.sh

cpp/cmake_modules/FindArrowDataset.cmake

westonpace · 2021-12-07T19:44:59Z

python/pyarrow/_dataset_parquet.pyx

+
+# cython: language_level = 3
+
+"""Dataset support for Parquest file format."""


Suggested change

-"""Dataset support for Parquest file format."""
+"""Dataset support for Parquet file format."""

pitrou · 2021-12-07T23:02:56Z

Travis build: https://app.travis-ci.com/github/pitrou/arrow/jobs/551279512

I note that Pandas and perhaps Numpy are built from source, which takes a lot of time. Perhaps we should use the Ubuntu packages instead?

pitrou · 2021-12-07T23:14:10Z

The s390x test failures are due to scipy/oldest-supported-numpy#29

jorisvandenbossche

Looks good!

.travis.yml

python/pyarrow/_dataset.pyx

jorisvandenbossche · 2021-12-08T09:04:45Z

> The s390x test failures are due to scipy/oldest-supported-numpy#29

Do those failures depend on the runtime version of numpy, or on the build-time version?
Because if it is only the runtime version, we can install a more recent version of numpy in the test environment (we are misusing oldest-supported-numpy a bit here, by also using it to install the runtime version of numpy).

jorisvandenbossche · 2021-12-08T09:05:40Z

> I note that Pandas and perhaps Numpy are built from source, which takes a lot of time. Perhaps we should use the Ubuntu packages instead?

And it's probably also building numpy twice (once more for building pandas in the isolated build env).

pitrou · 2021-12-08T10:32:00Z

> Do those failures depend on the runtime version of numpy, or on the build-time version?

The runtime version.

> Because if it is only the runtime version, we can install a more recent version of numpy in the test environment

Hmm, perhaps, but we should find a nice way to do that while using our Docker setup.

pitrou · 2021-12-08T19:06:56Z

scipy/oldest-supported-numpy#29 has now been fixed and a new version was uploaded to PyPI. Hopefully that will fix the issues once the mirrors catch up.

pitrou · 2021-12-08T19:35:05Z

Hmm, there's something weird here. First oldest-supported-numpy version 0.13 is downloaded, then version 0.12 is selected...
https://app.travis-ci.com/github/pitrou/arrow/jobs/551380321#L1997-L2034

@jorisvandenbossche Do you know what might be happening?

Edit: I've found this issue; it seems to be a bug in oldest-supported-numpy.

pitrou · 2021-12-09T16:55:46Z

Ok, oldest-supported-numpy 0.14 should (hopefully this time :-)) fix the issue.

pitrou · 2021-12-09T17:08:09Z

Also, ideally we shouldn't install pandas in this build if binaries are not available (because it takes very long to build). How should we go about that? @jorisvandenbossche @kszucs

jorisvandenbossche · 2021-12-09T17:11:13Z

That might only be possible by splitting the requirements files?

pitrou · 2021-12-09T17:25:00Z

That's possible indeed. There could be a requirements-test-minimal.txt. Also, can a requirements file inherit from another one?
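
pip's requirements-file format does allow one file to pull in another with a `-r <file>` (or `--requirement <file>`) line, and a relative path there is resolved against the directory of the including file. So a full test requirements file could include a minimal one and add pandas on top. The sketch below models that inclusion with a small resolver; the file names follow the suggestion above, the package lists are hypothetical, and the parser deliberately handles only plain requirement lines, `#` comments and `-r`/`--requirement` lines, none of pip's other options.

```python
import os
import tempfile
from typing import List, Optional, Set


def read_requirements(path: str, _seen: Optional[Set[str]] = None) -> List[str]:
    """Flatten a requirements file, following nested -r includes."""
    seen = set() if _seen is None else _seen
    path = os.path.abspath(path)
    if path in seen:  # guard against include cycles
        return []
    seen.add(path)
    result: List[str] = []
    with open(path) as f:
        for raw in f:
            # Simplification: drop everything after '#' (breaks URLs with fragments).
            line = raw.split("#", 1)[0].strip()
            if not line:
                continue
            if line.startswith("-r ") or line.startswith("--requirement "):
                included = line.split(None, 1)[1].strip()
                # Relative includes are resolved against the including file's directory.
                nested = os.path.join(os.path.dirname(path), included)
                result.extend(read_requirements(nested, seen))
            else:
                result.append(line)
    return result


# Hypothetical layout: the full file extends the minimal one with pandas.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "requirements-test-minimal.txt"), "w") as f:
        f.write("pytest\nhypothesis  # property-based tests\n")
    with open(os.path.join(tmp, "requirements-test.txt"), "w") as f:
        f.write("-r requirements-test-minimal.txt\npandas\n")
    minimal = read_requirements(os.path.join(tmp, "requirements-test-minimal.txt"))
    full = read_requirements(os.path.join(tmp, "requirements-test.txt"))

print(minimal)  # ['pytest', 'hypothesis']
print(full)     # ['pytest', 'hypothesis', 'pandas']
```

A CI job on a platform without pandas wheels would then install only the minimal file, while other jobs keep using the full one.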

pitrou · 2021-12-09T17:26:48Z

Success! https://app.travis-ci.com/github/pitrou/arrow/jobs/551589637

pitrou · 2021-12-09T17:27:38Z

Note that if/when we're able to cache Docker images again (https://issues.apache.org/jira/browse/ARROW-15023), the build should be much faster (Numpy and Pandas are compiled during the image build phase).

pitrou · 2021-12-09T18:25:07Z

cpp/cmake_modules/FindArrowDataset.cmake


-if(ARROW_FOUND AND PARQUET_FOUND)
+if(ARROW_FOUND)


@kou Do you have any concerns about the CMake changes here?

No problem.

pitrou · 2021-12-09T18:38:29Z

@github-actions crossbow submit -g python

github-actions · 2021-12-09T18:41:45Z

Revision: c4a1f4d

Submitted crossbow builds: ursacomputing/crossbow @ actions-1272

Task	Status
test-conda-python-3.10
test-conda-python-3.6
test-conda-python-3.6-pandas-0.23
test-conda-python-3.7
test-conda-python-3.7-hdfs-2.9.2
test-conda-python-3.7-hdfs-3.2.1
test-conda-python-3.7-kartothek-latest
test-conda-python-3.7-kartothek-master
test-conda-python-3.7-pandas-0.24
test-conda-python-3.7-pandas-latest
test-conda-python-3.7-spark-v3.1.2
test-conda-python-3.8
test-conda-python-3.8-hypothesis
test-conda-python-3.8-pandas-latest
test-conda-python-3.8-pandas-nightly
test-conda-python-3.8-spark-v3.2.0
test-conda-python-3.9
test-conda-python-3.9-dask-latest
test-conda-python-3.9-dask-master
test-conda-python-3.9-pandas-master
test-conda-python-3.9-spark-master
test-debian-11-python-3
test-fedora-33-python-3
test-ubuntu-18.04-python-3

kiszk · 2021-12-13T13:45:58Z

Thank you very much. Sorry for joining the discussion late.

ursabot · 2021-12-13T20:41:13Z

Benchmark runs are scheduled for baseline = 3f7f245 and contender = ba273db. ba273db is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.9% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.62% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

kiszk mentioned this pull request Nov 17, 2021

ARROW-14686: [Python][C++] make byte order detection for numpy builtin type correct #11687

Closed

pitrou reviewed Nov 17, 2021

github-actions bot added the Component: C++ label Nov 24, 2021

github-actions bot added the Component: Python label Nov 26, 2021

kiszk and others added 15 commits December 7, 2021 18:16

add Python for s390 to .travis.yml

041af1d

change docker image name

7747059

test for ARROW-14793

8975b3f

enable PARQUET

dd2194f

put dataset parquet into a separate module

e373f43

reorder build dataset packages

6f7ee13

fix build failure

32dc9b9

put dataset parquet into a separate module

16a306c

fix build failure

7347fa5

fix build failure

cab62d6

workaround to avoid compilation error

aebf751

fix build failure

818b9a6

fix compilation errors

56023e1

fix build error w parquet

eb51dfd

More fixes to Cython Parquet-dataset integration

3759133

Disable more arcane git code

e617823

Remove print() calls

8575517

westonpace approved these changes Dec 7, 2021

jorisvandenbossche reviewed Dec 8, 2021

.travis.yml

python/pyarrow/_dataset.pyx (outdated)

pitrou added 2 commits December 9, 2021 18:29

Set minimum oldest-supported-numpy version

ed90b25

Apply review comments

c4a1f4d

pitrou reviewed Dec 9, 2021

pitrou closed this in ba273db Dec 13, 2021

