[Python] pyarrow does not disable SIMD CPU optimizations when set to do so. #34277
Comments
Hi, did you compile pyarrow by yourself or use a prebuilt pyarrow package? ARROW_USER_SIMD_LEVEL would only be effective if you compile pyarrow by yourself. |
I have tried building/installing it using pip and conda. I have also tried building it from source, but I get stuck here:
|
error: no matching function for call to ‘arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::RunEndEncodedType&)’ |
I tried another method to build it using cmake instead of setup.py and I got: |
How did you specify Does |
Nope: |
Trying build from source code I keep getting: |
What is the instruction that is failing? You should be able to get this by reproducing the crash in I seem to recall that popcnt was required (regardless of SIMD support): #21840 |
popcnt appears to be one issue. Searching the internet I am finding a ton of people having similar issues. |
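One quick way to narrow this down is to check whether the CPU advertises the instruction at all. The sketch below is only an illustration and assumes an x86 Linux machine, where `/proc/cpuinfo` has a `flags` line; the helper names `cpu_flags` and `missing_features` are made up for this example and are not part of Arrow.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text.

    Assumption: x86 Linux, where the relevant line is named "flags".
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("popcnt",)):
    """List the required features that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]
```

On an affected machine, `missing_features(open("/proc/cpuinfo").read())` would be expected to return `["popcnt"]`; an empty list would suggest that a different instruction is the one failing.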
The issue I see above is that you are trying to build pyarrow 11.0.0 against the 12.0.0 C++ libarrow. That is likely the cause of the build errors: main has diverged from the 11.0.0 library, so the two are incompatible. Regarding the SIMD level, you will have to build C++ libarrow from source as well for the switch to work. But as you mentioned, popcnt is required regardless of SIMD level, which means that the fix for this would be #23013 |
I have tried both downloading the pyarrow code from PyPI (pip) and downloading the current GitHub arrow code. What exactly do I need to download to build a working pyarrow Python package? What are the prerequisites? |
You will have to build arrow from source (with SIMD turned off) first and install it. Afterwards you can build pyarrow against that. The exact process is detailed here: https://arrow.apache.org/docs/developers/python.html#building-on-linux-and-macos This is aimed at developing on main but you can of course use the tag The only modification you will need is to add But this will likely still fail due to the missing popcnt. If you want to test a fix, though, this would be the way. (I have no idea how to add a software implementation of a CPU instruction, so I cannot assist in that regard.) |
I am still struggling to compile and test pyarrow, partly because I don't know where to patch the popcnt instruction, but... Quote: I have no idea how to add a software implementation of a cpu instruction so I can not assist in that regard. THAT answer I believe I have... |
I think you would need to update the preprocessor macros in this file (maybe with #ifdef something?) to not return the builtin |
I came up with this patch which seems to work and is cross platform. If you have hardware that supports the popcnt instruction, it will still use that, but if not it will use a standard C++ library function. Actually the function uses the hardware if present. Please feel free to test and benchmark this, I think the performance will be very close. So far I have built my patch against version 4.0.1 code because I had trouble building newer code from source. I am still working on that. I am using gcc version 7 and it seems your project has also included some code that may require gcc version 11. You can look at the link I sent earlier for some existing benchmarks. |
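The patch itself is not reproduced in this thread. Purely as an illustration of what a software fallback for popcnt computes, here is a minimal Python sketch of the classic parallel bit-counting technique for a 64-bit value. It is not the C++ code from the patch, and the function name `popcount64_software` is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount64_software(x):
    """Count the set bits in a 64-bit value without a popcnt instruction."""
    x &= MASK64
    # Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Multiplying adds all bytes into the top byte; shift it down.
    return ((x * 0x0101010101010101) & MASK64) >> 56
```

A fallback along these lines does a fixed number of arithmetic steps per word, which is why the performance gap to the hardware instruction is usually small outside of tight bit-counting loops.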
This patch seems to work on arrow versions 4.0.1 and 5.0.0, anything newer than 5.0.0 doesn't link using gcc 7, at least on my system. The patch seems like it applies even to arrow version 12.0.0 but my gcc version 7 compiler will not build a shared libarrow library, so I can't test the patch yet on anything newer than arrow 5.0.0 because I can't build anything newer with or without this patch. I will keep working on this, but at least now I have a working pyarrow 5.0.0 and streamlit 1.19 needs pyarrow 4.0 or newer, so it's working. |
We should support gcc version 7. We have a nightly test that runs on Ubuntu 18.04 and (I think) uses gcc 7.5.0 so we should support that. However, that test is failing. Also, it looks like we fixed a somewhat related (fails on older gcc) issue (#34317) recently so maybe that will help. |
Re: your patch. Did you confirm that was actually needed? Or would compiling with |
Probably what we want to do is create a small file that crashes and then use cmake's try_run to set a definition that we can use to fallback to the software implementation. I can do that part if you can help me figure out what program crashes. Can you see if the following program crashes on your system?
|
Interestingly THAT does NOT crash. Perhaps one of the optimization flags being passed to the compiler is at fault? I am testing more. |
I am running gcc 7.1. |
OK, with the proper settings/compile flags I was able to build, from unpatched version 5 source, a pyarrow version 5 wheel file that installs and works on my hardware. I made this somewhat portable and I have tested it on the system I had the issue on and also on a modern CentOS 8 virtual machine. After downloading, it can be installed with the command: It is built for Python 3.9, and I have tested it with Python 3.9.16. At 49MB it is too big to attach/upload directly, so here is a Google Drive link: Try this if you have been getting an illegal instruction error when running import pyarrow. |
PS: If possible, please archive the wheel file I linked to, I have very little space left on my Google Drive and I may not leave it there forever. |
Can you help describe what some of the changes you made were or what the process was? The only official builds from an Apache project are builds that have been signed by the PMC and so, unfortunately, I am hesitant to offer hosting or any form of legitimacy to your wheel (there are security implications with downloading untrusted wheels). I suppose we can leave the link up though for others to download at their own risk. |
The Google Drive link is to a folder that contains both the whl and the source code for the python folder. I used gcc 7.1.1 to build that source. The main change I ended up making was to python/CMakeLists.txt, changing both settings for SIMD level to "NONE". To build it portably, you will also need a Linux system running a fairly old distribution. This is the build script I used for the cpp folder...
Then I ran this script to build the python (pyarrow) part...
|
I used openssl version 1.1.1s static libs to link against, so the resulting whl file is not dependent on an openssl shared library. |
Describe the bug, including details regarding any error messages, version, and platform.
pyarrow exits immediately on my server with illegal instruction after running
python
import pyarrow
It does this even when setting the environment variable ARROW_USER_SIMD_LEVEL to NONE.
This problem has been reported by several users and occurs on a few of my systems.
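Because the crash kills the interpreter, it is easier to reproduce from a child process. The following is a minimal sketch, assuming a POSIX system where a negative return code means the child was killed by that signal; the helper names `try_import` and `describe` are hypothetical.

```python
import os
import signal
import subprocess
import sys


def try_import(module="pyarrow", simd_level="NONE"):
    """Import `module` in a child process with ARROW_USER_SIMD_LEVEL set.

    Returns the child's return code so a crash cannot take down the caller.
    """
    env = dict(os.environ, ARROW_USER_SIMD_LEVEL=simd_level)
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        capture_output=True,
        text=True,
    )
    return proc.returncode


def describe(returncode):
    """Translate a child return code into a short diagnosis."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGILL:
        return "illegal instruction"
    return f"failed ({returncode})"
```

If `describe(try_import())` reports `illegal instruction` even with the level set to `NONE`, that matches the behavior described in this report.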
Component(s)
Python