Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41910: [Python] Add support for Pyodide #37822

Open
wants to merge 125 commits into
base: main
Choose a base branch
from

Conversation

joemarshall
Copy link
Contributor

@joemarshall joemarshall commented Sep 21, 2023

pyarrow knows about ARROW_ENABLE_THREADING and doesn't use threads if they are not enabled in libarrow.

Split from #37696

@joemarshall
Copy link
Contributor Author

@kou And these are the python changes

@jorisvandenbossche jorisvandenbossche changed the title GH23221 - python changes for pyodide build GH-23221: [Python] python changes for pyodide build Sep 25, 2023
@apache apache deleted a comment from github-actions bot Sep 25, 2023
@jorisvandenbossche
Copy link
Member

@joemarshall thanks for the PR!

We might want to expose is_threading_enabled() in pyarrow publicly (it might be useful for downstream packages as well?), in __init__.py

2. pyarrow sets defaults for inclusion of submodules based on their inclusion in the arrow build. e.g. pyarrow.parquet is built only if ARROW_PARQUET is set. This makes it possible to build in situations where you don't have access to set the build environment variables (e.g. in cross compiling situations like pyodide).

This part is not actually included here? (or I don't understand the sentence)
(and it might also make sense to leave that for a third PR since it is changing the build process beyond emscriptem?)

@joemarshall
Copy link
Contributor Author

@joemarshall thanks for the PR!

We might want to expose is_threading_enabled() in pyarrow publicly (it might be useful for downstream packages as well?), in __init__.py

  1. pyarrow sets defaults for inclusion of submodules based on their inclusion in the arrow build. e.g. pyarrow.parquet is built only if ARROW_PARQUET is set. This makes it possible to build in situations where you don't have access to set the build environment variables (e.g. in cross compiling situations like pyodide).

This part is not actually included here? (or I don't understand the sentence) (and it might also make sense to leave that for a third PR since it is changing the build process beyond emscriptem?)

Sorry, I missed out putting in the setup.py changes. They're in now.

About is_threading_enabled(), it is currently in pyarrow.lib.is_threading_enabled(). Does it need to be top-level?

@joemarshall
Copy link
Contributor Author

joemarshall commented Sep 28, 2023

Oh and for now I have blocked the auto-setting of PYARROW_* to happen only on emscripten - I don't know if that makes sense or not, but it isn't possible to build for emscripten without that change or something similar right now.

python/CMakeLists.txt Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Oct 3, 2023
python/pyarrow/io.pxi Outdated Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
python/CMakeLists.txt Outdated Show resolved Hide resolved
@github-actions github-actions bot removed the awaiting committer review Awaiting committer review label Oct 3, 2023
@joemarshall
Copy link
Contributor Author

@jorisvandenbossche That's a weird error - I think it must have not loaded pyodide in node that time for some reason. Looking at the output I think it may be because on the CI system buffering of the pipe between node and python runner is happening differently and it is losing some data to stdin. I've pushed a fix which should hopefully make this not happen.

@joemarshall
Copy link
Contributor Author

@github-actions crossbow submit test-conda-python-emscripten

Copy link

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/9695347629

@kou
Copy link
Member

kou commented Jun 27, 2024

@github-actions crossbow submit test-conda-python-emscripten

Copy link

Revision: 1a19b36

Submitted crossbow builds: ursacomputing/crossbow @ actions-73fe2c1343

Task Status
test-conda-python-emscripten GitHub Actions

ci/docker/conda-python-emscripten.dockerfile Outdated Show resolved Hide resolved

# Install basic build stuff, don't pin versions, hence ignore lint
# hadolint ignore=DL3008
RUN apt-get update && apt-get install --no-install-recommends -y -q unzip zip libpthread-stubs0-dev build-essential && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use conda to install them?

ci/scripts/install_chromedriver.sh Outdated Show resolved Hide resolved
ci/scripts/python_test_emscripten.sh Outdated Show resolved Hide resolved
ci/docker/conda-python-emscripten.dockerfile Show resolved Hide resolved
ci/docker/conda-python-emscripten.dockerfile Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
ci/scripts/python_test_emscripten.sh Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 27, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 27, 2024
@joemarshall
Copy link
Contributor Author

@kou I've updated based on those comments - it now gets emscripten version from pyodide, and uses conda to install nodejs etc. I also removed the use of multiprocessing in the python test runner, to see if that is causing the weird hang in node tests, which I just can't reproduce in docker here even using archery. If you could run tests again that would be great.

@ianmcook
Copy link
Member

@github-actions crossbow submit test-conda-python-emscripten

Copy link

Revision: e2d8d73

Submitted crossbow builds: ursacomputing/crossbow @ actions-a69d7a8e48

Task Status
test-conda-python-emscripten GitHub Actions

@joemarshall
Copy link
Contributor Author

Aha, it doesn't work if there isn't a tty. I think I've fixed the node.js tests now (at least they work with nohup now)

@ianmcook
Copy link
Member

@github-actions crossbow submit test-conda-python-emscripten

Copy link

Revision: 46e179a

Submitted crossbow builds: ursacomputing/crossbow @ actions-255fa63cba

Task Status
test-conda-python-emscripten GitHub Actions

@bitsondatadev
Copy link

@joemarshall every time the build fails...

snow

Keep it up it's soooo close! You probably deserve another huge cash dump for all the work done here.

@kou
Copy link
Member

kou commented Jun 28, 2024

FYI: We can run the job on local: PYTHON=3.12 UBUNTU=22.04 archery docker run conda-python-emscripten
In general, it's faster than @github-actions crossbow submit test-conda-python-emscripten.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants