timestampformat option for csv copy ignores configured timezone with timestamptz #11660

Closed
tjbrown-intel opened this issue Apr 15, 2024 · 8 comments · Fixed by #11711


tjbrown-intel commented Apr 15, 2024

What happens?

The timestampformat option (described in the CSV documentation) lets you customize how timestamps are formatted when writing CSV, but it ignores the configured TimeZone. When converting a TIMESTAMPTZ to CSV without a timestampformat, the generated string respects the configured TimeZone; with a timestampformat, the generated string ignores it and renders the timestamp in UTC.

To Reproduce

With the TimeZone set to America/Phoenix:

COPY (
    SELECT make_timestamptz(1713193669561000) AS t
) TO 'timestamp-noformat.csv' (FORMAT CSV);

generates 2024-04-15 08:07:49.561-07. Note the -07 timezone offset.

COPY (
    SELECT make_timestamptz(1713193669561000) AS t
) TO 'timestamp-format.csv' (FORMAT CSV, timestampformat '%x %X.%g%z');

generates 2024-04-15 15:07:49.561+00. Note that the timezone offset is now +00 even though the TimeZone is set to America/Phoenix.
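
For reference, both renderings describe the same instant; a quick sanity check (AT TIME ZONE requires the ICU extension, which is bundled with most distributions):

SELECT make_timestamptz(1713193669561000) AT TIME ZONE 'UTC' AS utc_wall,
       make_timestamptz(1713193669561000) AT TIME ZONE 'America/Phoenix' AS phx_wall;
-- utc_wall: 2024-04-15 15:07:49.561, phx_wall: 2024-04-15 08:07:49.561
-- America/Phoenix is UTC-7 year-round (no DST), hence the -07 offset above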

OS:

Windows, Linux

DuckDB Version:

0.10.1

DuckDB Client:

JDBC

Full Name:

TJ Brown

Affiliation:

Intel

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
Tishj (Contributor) commented Apr 15, 2024

With the TimeZone set to America/Phoenix
This can be done in multiple ways; please provide the code showing how you set it.
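
For instance (a sketch, using standard SET syntax):

SET TimeZone = 'America/Phoenix';          -- current session
SET GLOBAL TimeZone = 'America/Phoenix';   -- instance-wide default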

I have a hunch this might be related to #11546

tjbrown-intel (Author)

I hadn't explicitly set the TimeZone myself, but the following query

SELECT value
FROM duckdb_settings()
WHERE name = 'TimeZone';

returns America/Phoenix.

Tishj (Contributor) commented Apr 15, 2024

I think the cause is this:

        Timestamp::Convert(input, date, time);
        idx_t len = GetLength(date, time, 0, nullptr);
        string_t target = StringVector::EmptyString(result, len);
->      FormatString(date, time, target.GetDataWriteable());
        target.Finalize();
        return target;
} else {

FormatString has a version that takes a time zone, but that version is never used here.

hawkfish (Contributor)

It can't use the TZ because it doesn't know how to do the conversion from the instant in the column to the offset value. ICU overrides strftime for TSTZ in ICUStrftime::Operation and does exactly this. So I would say the problem is here:

		} else if (!csv_data.options.write_date_format[LogicalTypeId::TIMESTAMP].Empty() &&
		           (csv_data.sql_types[col_idx].id() == LogicalTypeId::TIMESTAMP ||
		            csv_data.sql_types[col_idx].id() == LogicalTypeId::TIMESTAMP_TZ)) {
			// use the timestamp format to cast the chunk
=>			csv_data.options.write_date_format[LogicalTypeId::TIMESTAMP].ConvertTimestampVector(
			    input.data[col_idx], cast_chunk.data[col_idx], input.size());
		} else {

as this function is not extension-aware.
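
As an illustration (assuming the ICU extension is loaded), the scalar strftime already shows the behavior the CSV writer path misses:

SET TimeZone = 'America/Phoenix';
SELECT strftime(make_timestamptz(1713193669561000), '%x %X.%g%z') AS s;
-- ICUStrftime::Operation localizes first, so this should print
-- 2024-04-15 08:07:49.561-07, matching the unformatted CSV output above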

Tishj (Contributor) commented Apr 15, 2024

It can't use the TZ because it doesn't know how to do the conversion from the instant in the column to the offset value. ... as this function is not extension-aware.

If we want to support this, I wonder if we need to make the CopyFunctionCatalogEntry more modular: CSV/JSON etc. would create a CopyFunctionCatalogEntry that contains a CatalogSet, inside which you can register CopyFunctionFormatCatalogEntry entries; then we could autoload the CopyFunctionFormatCatalogEntry for TZ when required.

Though this is part of a bigger problem of interleaving extensions (even though CSV isn't an extension, it still makes use of the Catalog to be discovered).

Tishj (Contributor) commented Apr 15, 2024

... as this function is not extension-aware.

Though, thinking more about this, we have Cast functions registered by ICU, right? I imagine that's how the CSV COPY would solve this, by looking up the Cast function and using that?

hawkfish (Contributor)

Though, thinking more about this, we have Cast functions registered by ICU, right? I imagine that's how the CSV COPY would solve this, by looking up the Cast function and using that?

Casting doesn't take a format, but ICU also overrides strftime, so maybe we could hook into that? Looking at the built-in strftime, it is just a wrapper for ConvertTimestampVector, so it wouldn't be a big change.
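
In the meantime, a possible user-side workaround (a sketch, assuming ICU is loaded, since its TIMESTAMPTZ to TIMESTAMP cast converts the instant to wall-clock time in the session TimeZone):

COPY (
    SELECT make_timestamptz(1713193669561000)::TIMESTAMP AS t
) TO 'timestamp-format.csv' (FORMAT CSV, timestampformat '%x %X.%g');
-- should write the Phoenix wall-clock time 2024-04-15 08:07:49.561;
-- %z is dropped because the cast result is time-zone-naive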

Tishj (Contributor) commented Apr 19, 2024

This has been fixed 👍

Tishj closed this as completed Apr 19, 2024