ARROW-14625: [Python][CI] Enable Python test on s390x #11688
Conversation
.travis.yml (outdated):
```yaml
<<: *global_env
ARCH: s390x
ARROW_CI_MODULES: "PYTHON"
DOCKER_IMAGE_ID: python-sdist
```
You should probably use `ubuntu-python` instead.
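For reference, a minimal sketch of the env block with that change applied (the surrounding Travis matrix entry is assumed unchanged):

```yaml
<<: *global_env
ARCH: s390x
ARROW_CI_MODULES: "PYTHON"
DOCKER_IMAGE_ID: ubuntu-python
```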
Thank you!
It seems there is an issue with the dataset CMake script requiring Parquet to be present: https://issues.apache.org/jira/browse/ARROW-14793
Another issue is that dataset.py and the related Cython file contain Parquet code unconditionally. If declare
Hmm, interesting. The fix may be non-trivial... @jorisvandenbossche
I am trying to use conditional compilation for definitions in the Cython files, but I have to investigate how to handle references. In addition, how do we handle conditional imports in a Python file?
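For illustration, a hedged sketch of what Cython compile-time conditionals look like (the `PARQUET_ENABLED` flag and how it would be injected are assumptions, and the thread below settles on a separate module instead):

```cython
# PARQUET_ENABLED is a compile-time constant; a build system could inject it,
# e.g. cythonize(..., compile_time_env={"PARQUET_ENABLED": True}).
IF PARQUET_ENABLED:
    # FileFormat would have to be cimported from the dataset bindings.
    cdef class ParquetFileFormat(FileFormat):
        # Parquet-specific definitions compiled only when enabled.
        ...
```

This guards the definitions, but as noted above it does not by itself solve references from other modules or conditional imports at the Python level.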
The proper way to do it would be to split Dataset Parquet support into a separate module, as is already done for ORC.
Thank you for the good suggestion. I will follow this approach.
Yes, for ORC we indeed did it separately, so you can have Dataset enabled in the Python bindings without having ORC. We discussed in the past whether we should do the same for Parquet, but at that point we didn't have a pressing use case to change it. It should indeed be possible to follow the same approach.
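At the Python level, the ORC-style split typically surfaces as a guarded import; a hedged sketch (the names follow pyarrow's eventual layout, but treat them as illustrative):

```python
# pyarrow/dataset.py (sketch): Parquet-backed classes live in an optional
# extension module that only exists when Arrow was built with Parquet.
try:
    from pyarrow._dataset_parquet import ParquetFileFormat
    _parquet_available = True
except ImportError:
    _parquet_available = False  # dataset still works, minus Parquet
```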
@jorisvandenbossche I tried to split the modules. However, I still get the compilation error about the Parquet include files. I think this is because the Parquet include files are required transitively through _dataset_parquet.pyx and _parquet.pxd. What do you think?
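A hedged illustration of the transitive dependency being described (the exact cimported name is an assumption):

```cython
# _dataset_parquet.pyx (sketch): cimporting from _parquet.pxd pulls in the
# Parquet C++ declarations it wraps, so the Parquet headers must be on the
# include path even though this file never includes them directly.
from pyarrow._parquet cimport FileMetaData
```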
@kiszk I'm trying to fix this.
@github-actions crossbow submit -g python
Revision: de4de1e
Submitted crossbow builds: ursacomputing/crossbow @ actions-1264
@github-actions crossbow submit -g python
Revision: e617823
Submitted crossbow builds: ursacomputing/crossbow @ actions-1265
@pitrou I really appreciate your cooperation.
This looks great, thanks for doing this. Only a few minor thoughts. I didn't look too closely at the code that was moved, as I assume it was unchanged (except for the WrittenFile stuff, which did change and looks better).
```cython
# cython: language_level = 3

"""Dataset support for Parquest file format."""
```
"""Dataset support for Parquest file format.""" | |
"""Dataset support for Parquet file format.""" |
Travis build: https://app.travis-ci.com/github/pitrou/arrow/jobs/551279512
I note that Pandas and perhaps Numpy are built from source, which takes a lot of time. Perhaps we should use the Ubuntu packages instead?
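A hedged sketch of what "use the Ubuntu packages" could mean inside the Docker image (these are the stock Ubuntu package names; whether the packaged versions are recent enough for the test suite is untested here):

```sh
# Install prebuilt distro packages instead of compiling from source.
apt-get update && apt-get install -y python3-numpy python3-pandas
```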
The s390x test failures are due to scipy/oldest-supported-numpy#29.
Looks good!
Do those failures depend on the runtime version of numpy, or actually on the build-time version?
And it's probably also building numpy twice (once more for building pandas in the isolated build env).
The runtime version.
Hmm, perhaps, but we should find a nice way to do that while using our Docker setup.
scipy/oldest-supported-numpy#29 has now been fixed and a new version was uploaded to PyPI. Hopefully that will fix the issues once the mirrors catch up.
Hmm, there's something weird here. First, @jorisvandenbossche, do you know what might be happening? Edit: I've found this issue; it seems to be a bug in
Ok, oldest-supported-numpy 0.14 should (hopefully this time :-)) fix the issue.
Also, ideally we shouldn't install pandas in this build if binaries are not available (because it takes very long to build). How should we go about that? @jorisvandenbossche @kszucs
That might only be possible by splitting the requirements files?
That's possible indeed. There could be a
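One hedged way to express the idea with plain pip, independent of how the requirements files end up being split (the split itself is hypothetical):

```sh
# Install pandas only if a binary wheel exists for this platform, so slow
# source builds are skipped on architectures like s390x.
pip install --only-binary :all: pandas || echo "no pandas wheel; skipping"
```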
Note that if/when we're able to cache Docker images again (https://issues.apache.org/jira/browse/ARROW-15023), the build should be much faster (Numpy and Pandas are compiled during the image build phase).
```diff
-if(ARROW_FOUND AND PARQUET_FOUND)
+if(ARROW_FOUND)
```
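For context, a hedged sketch of how the guard can move inward, so the dataset bindings build without Parquet while the Parquet-specific module stays conditional (variable and target names are illustrative, not the actual CMake script):

```cmake
if(ARROW_FOUND)
  # The core _dataset module no longer needs Parquet.
  list(APPEND CYTHON_EXTENSIONS _dataset)
  if(PARQUET_FOUND)
    # Only the Parquet-specific dataset bindings require Parquet headers.
    list(APPEND CYTHON_EXTENSIONS _dataset_parquet)
  endif()
endif()
```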
@kou Do you have any concerns about the CMake changes here?
No problem.
@github-actions crossbow submit -g python
Revision: c4a1f4d
Submitted crossbow builds: ursacomputing/crossbow @ actions-1272
Thank you very much, and sorry for being late to the discussion.
Benchmark runs are scheduled for baseline = 3f7f245 and contender = ba273db. ba273db is a master commit associated with this PR. Results will be available as each benchmark for each run completes.