[Python] parquet.read_table causes crashes on Windows Server 2016 w/ Xeon Processor #25432

Open
asfimport opened this issue Jul 7, 2020 · 9 comments

@asfimport

The call to read_all() crashes immediately with a Windows fatal error on versions 0.16.0, 0.17.0, and 0.17.1. Downgrading to 0.15.1 fixes the problem.

The same Python scripts work fine on all other PCs, but on a server running Windows Server 2016 with a Xeon processor, they crash immediately.

ERROR:

Windows fatal exception: code 0xc000001d

Current thread 0x00001950 (most recent call first):
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 253 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 605 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1137 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1281 in read_table
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 888 in upload_file
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 938 in upload_df_chunk
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in call
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib_parallel_backends.py", line 572 in init
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib_parallel_backends.py", line 208 in apply_async
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 765 in _dispatch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 847 in dispatch_one_batch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 1029 in call
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\utils\api_utils.py", line 32 in threaded
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in push_table
  File "pythonscript.py", line 152 in main
  File "pythonscript.py", line 171 in
Fatal Python error: Illegal instruction

Current thread 0x00001950 (most recent call first):
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 253 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 605 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1137 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1281 in read_table
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 888 in upload_file
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 938 in upload_df_chunk
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in call
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib_parallel_backends.py", line 572 in init
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib_parallel_backends.py", line 208 in apply_async
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 765 in _dispatch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 847 in dispatch_one_batch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 1029 in call
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\utils\api_utils.py", line 32 in threaded
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in push_table
  File "pythonscript.py", line 152 in main
  File "pythonscript.py", line 171 in

Environment:
OS Name Microsoft Windows Server 2016 Standard
Version 10.0.14393 Build 14393
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name XXXXXXXXXXXXXXXXXXXXXXXXXXX
System Manufacturer VMware, Inc.
System Model VMware7,1
System Type x64-based PC
System SKU Unsupported
Processor Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
BIOS Version/Date VMware, Inc. VMW71.00V.0.B64.1704120155, 4/12/2017
SMBIOS Version 2.7
BIOS Mode UEFI
BaseBoard Manufacturer Intel Corporation
BaseBoard Model Not Available
BaseBoard Name Base Board
Platform Role Desktop
Secure Boot State Unsupported
PCR7 Configuration Not Available
Windows Directory C:\Windows
System Directory C:\Windows\system32
Boot Device \Device\HarddiskVolume2
Locale United States
Hardware Abstraction Layer Version = "10.0.14393.3297"
User Name Not Available
Time Zone Eastern Daylight Time
Installed Physical Memory (RAM) 32.0 GB
Total Physical Memory 32.0 GB
Available Physical Memory 29.7 GB
Total Virtual Memory 40.0 GB
Available Virtual Memory 37.9 GB
Page File Space 8.00 GB
Page File C:\pagefile.sys
Device Guard Virtualization based security Not enabled
A hypervisor has been detected. Features required for Hyper-V will not be displayed.
Reporter: Kristopher Jong

Note: This issue was originally created as ARROW-9349. Please see the migration documentation for further details.
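
The crash in the report above kills the interpreter rather than raising a Python exception, so one way to confirm that is to run the failing call in a child process and inspect its exit code. The following is a minimal sketch using only the standard library; the helper names `run_isolated` and `describe_exit` are hypothetical, and the assumption that CPython on Windows reports NTSTATUS codes as large positive integers should be checked on the affected machine.

```python
import subprocess
import sys

# NTSTATUS code shown in the report (illegal instruction).
ILLEGAL_INSTRUCTION_NT = 0xC000001D


def run_isolated(code):
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    Returns the child's exit code, so a native crash cannot take down
    the calling process.
    """
    result = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    return result.returncode


def describe_exit(returncode):
    """Turn an exit code into a short human-readable description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX: the child was terminated by a signal.
        return f"killed by signal {-returncode}"
    if returncode >= 0xC0000000:
        # Assumption: Windows fatal exceptions surface as NTSTATUS values.
        return f"Windows fatal exception 0x{returncode:08x}"
    return f"exited with status {returncode}"
```

A call such as `describe_exit(run_isolated("import pyarrow.parquet as pq; pq.read_table('data.parquet')"))` would then report whether the read crashed natively; the file name `data.parquet` is a placeholder.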

@asfimport

Wes McKinney / @wesm:
Can you let us know what the processor ID is? This is almost certainly ARROW-7939.

@asfimport

Wes McKinney / @wesm:
Sorry, I missed that it's an E5-2699. This processor has AVX2, so I don't think it's ARROW-7939. If you can find out which illegal instruction is crashing the application, that would help us figure out what's wrong.

@asfimport

Wes McKinney / @wesm:
Could you try a nightly build? It can be installed with:
pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --pre pyarrow
That would help us rule out the BMI2 issue.

@asfimport

Kristopher Jong:
I installed the pyarrow-0.18.0.dev551 build and it shows the same behavior.

@asfimport

Kristopher Jong:
I can't figure out what the illegal instruction is. The crash appears to happen at the point where execution switches into the compiled DLL code, which is why my Python script isn't picking up an exception stack. Originally there were no errors at all; the script simply failed without raising any exception. Once I turned on faulthandler, I found that a Windows fatal error was causing the failure.
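
For anyone reproducing this, the faulthandler step mentioned above can be sketched as follows. This is a minimal example using the standard-library `faulthandler` module; the helper name `enable_crash_tracebacks` and the log file name are hypothetical, and writing to a file rather than stderr is just one possible choice.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Write Python tracebacks to log_path when the process hits a fatal error.

    The returned file object must stay open for as long as faulthandler
    is enabled, because the traceback is written at crash time.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread, which helps when the crash
    # happens inside a worker thread (for example one started by joblib).
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_tracebacks("crash.log")` at the top of the script, before the Parquet read, should capture a stack like the one in the report. The same effect is available without code changes by running `python -X faulthandler script.py` or setting the `PYTHONFAULTHANDLER` environment variable.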

@asfimport

Wes McKinney / @wesm:
[~mparry] can you offer any guidance to [~kmj1104213] on how to determine which illegal instruction is causing the problem, like you did in ARROW-9114?

@asfimport

Morgan Parry:
We just attached the Visual Studio debugger to the Python process. It trapped the illegal instruction and it was then apparent from the disassembly what the offender was.

I see from the reported environment that this is running under VMware. Note that VMware can mask CPU features, depending on the software version, the configuration, the specs of other machines in the cluster (it may mask down to the lowest common denominator), and so on. This is exactly what was happening in our case, and it took a while to figure out: the physical CPU had AVX2 but the virtual one didn't.
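
A quick way to test for this kind of masking is to compare the CPU flags visible inside the guest against the features the thread has discussed. The sketch below is only an illustration: the feature list is an assumption drawn from this discussion (AVX2, BMI2) plus SSE4.2, not pyarrow's documented requirements, the helper names are hypothetical, and `visible_cpu_flags` relies on the third-party py-cpuinfo package.

```python
# Assumption: illustrative feature names, not pyarrow's actual build requirements.
REQUIRED_FEATURES = ("sse4_2", "avx2", "bmi2")


def missing_features(visible_flags, required=REQUIRED_FEATURES):
    """Return the required features that are absent from visible_flags."""
    visible = {flag.lower() for flag in visible_flags}
    return [name for name in required if name not in visible]


def visible_cpu_flags():
    """Return the CPU flags seen by this OS instance (guest, if virtualized).

    Requires the third-party package py-cpuinfo (pip install py-cpuinfo).
    """
    import cpuinfo

    return cpuinfo.get_cpu_info().get("flags", [])


# Example usage on the affected machine:
# print(missing_features(visible_cpu_flags()))
```

If the list printed on the VM contains a feature that the physical E5-2699 v3 supports, the hypervisor is likely hiding it from the guest.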

@asfimport

Charles Surett:
Can someone make a debug build? I've been attempting to get a build with debug symbols but haven't had any luck.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
[~kmj1104213] do you know if you still run into this issue with the latest pyarrow release?
