[Python] ORC Reader aborts when timezone file is missing #40633

Open
WillAyd opened this issue Mar 18, 2024 · 10 comments

@WillAyd
Contributor

WillAyd commented Mar 18, 2024

Describe the bug, including details regarding any error messages, version, and platform.

This is an upstream report of pandas-dev/pandas#56292

I noticed when running the pandas test suite I was getting this error:

pandas/tests/io/test_orc.py::test_orc_reader_basic terminate called after throwing an instance of 'orc::TimezoneError'
  what():  Can't open /usr/share/zoneinfo/US/Pacific
Fatal Python error: Aborted

Current thread 0x00007eff1a912780 (most recent call first):

The workaround is to create that timezone file:

$ sudo mkdir -p /usr/share/zoneinfo/US
$ sudo ln -s /usr/share/zoneinfo/America/Los_Angeles /usr/share/zoneinfo/US/Pacific

Although I think the error should be handled more gracefully than via abort.
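Because the failure is a C++ `terminate`/abort rather than a Python exception, a `try`/`except` around the read cannot catch it. Until the reader handles this more gracefully, one option is to run the read in a child interpreter so that only the child dies. The following is a minimal sketch using only the standard library; the helper names `run_isolated` and `describe` are made up for illustration and are not part of pyarrow or pandas.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter and capture its result.

    An abort kills the whole process, so the parent cannot catch it with
    try/except; running the snippet in a child confines the damage.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Turn a child's return code into a short human-readable status."""
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return f"killed by signal {-result.returncode}"
    return f"exited with status {result.returncode}"
```

For example, `describe(run_isolated("import pyarrow.orc as orc; orc.read_table('data.orc')"))` would report a signal or a nonzero status instead of taking the calling process down. Here `data.orc` is a placeholder file name.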

Component(s)

Python

@kou
Member

kou commented Mar 19, 2024

@wgtmac will improve this.
See also:

wgtmac self-assigned this Mar 19, 2024
@wgtmac
Member

wgtmac commented Mar 19, 2024

This seems to be related to the installed version of the tz database on the test machine. I checked my laptop and the path /usr/share/zoneinfo/US/Pacific exists. Could you verify the version by checking the /usr/share/doc/tzdata/version file? @WillAyd

@WillAyd
Contributor Author

WillAyd commented Mar 19, 2024

That file does not exist for me. This is running popOS 22.04

@kou
Member

kou commented Mar 20, 2024

Could you try installing the tzdata-legacy package?

@WillAyd
Contributor Author

WillAyd commented Mar 20, 2024

I don't see that package for 22.04 - I think it first appeared in 23.04?

@kou
Member

kou commented Mar 21, 2024

Oh, sorry. Could you install tzdata?

@WillAyd
Contributor Author

WillAyd commented Mar 21, 2024

It is already installed - tzdata is already the newest version (2024a-0ubuntu0.22.04).

@kou
Member

kou commented Mar 21, 2024

Hmm. tzdata must install /usr/share/zoneinfo/US/Pacific: https://packages.ubuntu.com/jammy/all/tzdata/filelist

@WillAyd
Contributor Author

WillAyd commented Mar 21, 2024

Ah OK - interesting indeed. That must have been deleted off of my system somehow, but I do see that in a recovery OS.

Happy to close this issue if we want to chalk it up to an unsupported system configuration

dongjoon-hyun pushed a commit to apache/orc that referenced this issue Mar 22, 2024
### What changes were proposed in this pull request?

Enable TestTimezone.testMissingTZDB unit test to run on Windows.

### Why are the changes needed?

When /usr/share/zoneinfo is unavailable and TZDIR env is unset, creating C++ ORC reader will crash on Windows. We need to better deal with this case. See context from the Apache Arrow community: apache/arrow#36026 and apache/arrow#40633

### How was this patch tested?

Make sure the test passes on Windows.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #1856 from wgtmac/win_tz_test.

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
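
The commit message above suggests that the ORC C++ reader consults the `TZDIR` environment variable when `/usr/share/zoneinfo` is unavailable. Under that assumption, one possible workaround is to point `TZDIR` at the `zoneinfo` directory shipped inside the PyPI `tzdata` package before any ORC reader is created. This is only a sketch: the helper names are hypothetical, and whether the bundled data contains every zone name a given file needs is not verified here.

```python
import importlib.util
import os
from pathlib import Path


def pypi_tzdata_zoneinfo():
    """Return the zoneinfo directory of the PyPI tzdata package, or None."""
    spec = importlib.util.find_spec("tzdata")
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "zoneinfo"
        if candidate.is_dir():
            return candidate
    return None


def ensure_tzdir(default="/usr/share/zoneinfo"):
    """Return the timezone directory in effect, setting TZDIR only if needed.

    Order of preference: an existing TZDIR value, then the system default
    directory, then the PyPI tzdata package. Returns None if none is found.
    """
    if os.environ.get("TZDIR"):
        return os.environ["TZDIR"]
    if Path(default).is_dir():
        return default
    candidate = pypi_tzdata_zoneinfo()
    if candidate is not None:
        # Assumption: the native reader reads TZDIR from the process environment.
        os.environ["TZDIR"] = str(candidate)
        return str(candidate)
    return None
```

Calling `ensure_tzdir()` early, before importing or using the ORC reader, keeps an existing system setup untouched and only steps in when nothing else is available.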
@aureliobarbosa

> Could you try installing the tzdata-legacy package?

I also observed pyarrow crashing while processing ORC files because of nonexistent IANA keys. I first saw this when running the pandas test suite locally, but simply trying to read some pre-existing ORC files also crashed python and ipython completely. My setup includes Ubuntu Mantic, Python 3.11, and tzdata version 2024.1.

At least in my case, installing tzdata-legacy system-wide was enough to get rid of those errors.
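
A quick way to tell whether a machine is affected is to check for the zone files before reading any ORC data. The sketch below assumes the lookup directory is `TZDIR` if set and `/usr/share/zoneinfo` otherwise; the function names are illustrative only.

```python
import os
from pathlib import Path

DEFAULT_TZDIR = "/usr/share/zoneinfo"


def zoneinfo_dir() -> Path:
    """Directory assumed to hold the tz database files."""
    return Path(os.environ.get("TZDIR") or DEFAULT_TZDIR)


def missing_zones(names, tzdir=None):
    """Return the zone names that have no readable file under tzdir.

    is_file() follows symlinks, so a dangling link counts as missing.
    """
    base = Path(tzdir) if tzdir is not None else zoneinfo_dir()
    return [name for name in names if not (base / name).is_file()]
```

For instance, `missing_zones(["US/Pacific", "America/Los_Angeles"])` lists whichever of the two names is absent. If only the legacy alias `US/Pacific` is reported, installing `tzdata-legacy` (where available) or recreating the link as described in the original report are the fixes discussed in this thread.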
