Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Allow setting path to timezone db through python API #35600

Closed
rohanjain101 opened this issue May 15, 2023 · 7 comments · Fixed by #37436
Closed

[Python] Allow setting path to timezone db through python API #35600

rohanjain101 opened this issue May 15, 2023 · 7 comments · Fixed by #37436

Comments

@rohanjain101
Copy link

Describe the enhancement requested

The timezone DB is required to be at %USERPROFILE%\Downloads\tzdata in pyarrow on windows. It would be nice to have a python API to change the path where this db should be found, I believe this is currently only possible through the C++ implementation using arrow:initialize.

It would also be nice if the timezone db was not required if tzdata/pytz are installed.

Component(s)

Python

@jorisvandenbossche
Copy link
Member

It would also be nice if the timezone db was not required if tzdata/pytz are installed.

If I remember correctly, the problem here is that this uses a different format than what the tz implementation we use (https://howardhinnant.github.io/date/tz.html) expects.
See eg #28868 (comment) and #31472 for previous discussion about this.

@jorisvandenbossche jorisvandenbossche changed the title Allow setting path to timezone db in pyarrow through python API [Python] Allow setting path to timezone db through python API May 16, 2023
@jorisvandenbossche
Copy link
Member

And the R bindings already expose this, which essentially is a small wrapper around the setting a C++ option (arrow::GlobalOptions.timezone_db_path):

arrow/r/src/config.cpp

Lines 38 to 47 in f6e4479

void set_timezone_database(cpp11::strings path) {
auto paths = cpp11::as_cpp<std::vector<std::string>>(path);
if (path.size() != 1) {
cpp11::stop("Must provide a single path to the timezone database.");
}
arrow::GlobalOptions options;
options.timezone_db_path = std::make_optional(paths[0]);
arrow::StopIfNotOk(arrow::Initialize(options));
}

So it should certainly be possible to add a similar function to pyarrow.

@mattijn
Copy link

mattijn commented Jun 7, 2023

Thanks for all the work on arrow!

I'm on windows and when running the following:

from datetime import datetime
import pyarrow.compute as pc

pc.strftime(datetime(2004, 8, 1))

I get the following error:

ArrowInvalid: Cannot locate timezone 'UTC': Timezone database not found at "D:\Users\Hoek\Downloads\tzdata"

Observe, that the error path refers to D:.

But in OP the following is mentioned:

The timezone DB is required to be at %USERPROFILE%\Downloads\tzdata in pyarrow on windows.

But if I check:

! echo %USERPROFILE%
C:\Users\Hoek

and

from pathlib import Path
Path.home()
WindowsPath('C:/Users/Hoek')

It is both linking to C:. How you know it is on D:? D: is right though..
There is no Downloads folder for me on C:\Users\Hoek (therefor https://stackoverflow.com/a/74292266/2459096 errored)

I assume it is just early days and this user experience will improve soon.
Again, thanks for all the hard work!

@jorisvandenbossche
Copy link
Member

@mattijn what does import os; os.path.expanduser("~/Downloads") return for you?

@mattijn
Copy link

mattijn commented Jun 7, 2023

'C:\\Users\\Hoek/Downloads'

Apparently my Downloads location is not on C. See Downloads dialog:

@jorisvandenbossche
Copy link
Member

So then the error message could be expected? On the C++ side, it is using wordexp for the path expansion of ~\Downloads, no idea how that exactly behaves on Windows.

@mattijn
Copy link

mattijn commented Jun 7, 2023

Yes. The error message is right indeed, C++ gets the right location using wordexp.
Using %USERPROFILE% or os.path.expanduser("~/Downloads") not.

@AlenkaF AlenkaF self-assigned this Aug 29, 2023
@AlenkaF AlenkaF added this to the 14.0.0 milestone Oct 3, 2023
jorisvandenbossche pushed a commit that referenced this issue Oct 5, 2023
…PI (#37436)

### Rationale for this change

Add a function to change the path where timezone db should be found as a small wrapper around the setting of a C++ option `GlobalOptions`.

### What changes are included in this PR?

New function `configure_tzdb`.

### Are these changes tested?

### Are there any user-facing changes?

No.
* Closes: #35600

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
jorisvandenbossche pushed a commit that referenced this issue Oct 10, 2023
…nstall docs (#38146)

### What changes are included in this PR?

The option to set custom path to timezone database through python API has been implemented in #35600 and is documented in `docs/source/python/install.rst` with this PR.

* Closes: #38145

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
…thon API (apache#37436)

### Rationale for this change

Add a function to change the path where timezone db should be found as a small wrapper around the setting of a C++ option `GlobalOptions`.

### What changes are included in this PR?

New function `configure_tzdb`.

### Are these changes tested?

### Are there any user-facing changes?

No.
* Closes: apache#35600

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
…thon install docs (apache#38146)

### What changes are included in this PR?

The option to set custom path to timezone database through python API has been implemented in apache#35600 and is documented in `docs/source/python/install.rst` with this PR.

* Closes: apache#38145

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…thon API (apache#37436)

### Rationale for this change

Add a function to change the path where timezone db should be found as a small wrapper around the setting of a C++ option `GlobalOptions`.

### What changes are included in this PR?

New function `configure_tzdb`.

### Are these changes tested?

### Are there any user-facing changes?

No.
* Closes: apache#35600

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…thon install docs (apache#38146)

### What changes are included in this PR?

The option to set custom path to timezone database through python API has been implemented in apache#35600 and is documented in `docs/source/python/install.rst` with this PR.

* Closes: apache#38145

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…thon API (apache#37436)

### Rationale for this change

Add a function to change the path where timezone db should be found as a small wrapper around the setting of a C++ option `GlobalOptions`.

### What changes are included in this PR?

New function `configure_tzdb`.

### Are these changes tested?

### Are there any user-facing changes?

No.
* Closes: apache#35600

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…thon install docs (apache#38146)

### What changes are included in this PR?

The option to set custom path to timezone database through python API has been implemented in apache#35600 and is documented in `docs/source/python/install.rst` with this PR.

* Closes: apache#38145

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants