Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FORMAT clause to convert datetime types to string and vice versa #2388 #7629

Conversation

TreeHunter9
Copy link
Contributor

@TreeHunter9 TreeHunter9 commented Jun 15, 2023

1. DATETIME TO STRING

The following flags are currently implemented for datetime to string conversion:

Format Pattern Description
YEAR Year (1 - 9999)
YYYY Last 4 digits of Year (0001 - 9999)
YYY Last 3 digits of Year (000 - 999)
YY Last 2 digits of Year (00 - 99)
Y Last 1 digits of Year (0 - 9)
Q Quarter of the Year (1 - 4)
MM Month (01 - 12)
MON Short Month name (Apr)
MONTH Full Month name (APRIL)
RM Roman representation of the Month (I - XII)
WW Week of the Year (01 - 53)
W Week of the Month (1 - 5)
D Day of the Week (1 - 7)
DAY Full name of the Day (MONDAY)
DD Day of the Month (01 - 31)
DDD Day of the Year (001 - 366)
DY Short name of the Day (Mon)
J Julian Day (number of days since January 1, 4712 BC)
HH / HH12 Hour of the Day (01 - 12) with period (AM, PM)
HH24 Hour of the Day (00 - 23)
MI Minutes (00 - 59)
SS Seconds (00 - 59)
SSSSS Seconds after midnight (0 - 86399)
FF1 - FF9 Fractional seconds with the specified accuracy
TZH Time zone in Hours (-14 - 14)
TZM Time zone in Minutes (00 - 59)
TZR Time zone Name

The dividers are:

Dividers
.
/
,
;
:
'space'
-

Patterns can be used without any dividers:

SELECT CAST(CURRENT_TIMESTAMP AS VARCHAR(50) FORMAT 'YEARMMDD HH24MISS') FROM RDB$DATABASE;
=========================
20230719 161757

However, be careful with patterns like DDDDD, it will be interpreted as DDD + DD.

It is possible to insert raw text into a format string with "": ... FORMAT '"Today is" DAY' - Today is MONDAY.
Also the format is case-insensitive, so YYYY-MM == yyyy-mm.
Example:

SELECT CAST(CURRENT_TIMESTAMP AS VARCHAR(45) FORMAT 'DD.MM.YEAR HH24:MI:SS "is" J "Julian day"') FROM RDB$DATABASE;
=========================
14.6.2023 15:41:29 is 2460110 Julian day

2. STRING TO DATETIME

The following flags are currently implemented for string to datetime conversion:

Format Pattern Description
YEAR Year
YYYY Last 4 digits of Year
YYY Last 3 digits of Year
YY Last 2 digits of Year
Y Last 1 digits of Year
MM Month (1 - 12)
MON Short Month name (Apr)
MONTH Full Month name (APRIL)
RM Roman representation of the Month (I - XII)
DD Day of the Month (1 - 31)
J Julian Day (number of days since January 1, 4712 BC)
HH / HH12 Hour of the Day (1 - 12) with period (AM, PM)
HH24 Hour of the Day (0 - 23)
MI Minutes (0 - 59)
SS Seconds (0 - 59)
SSSSS Seconds after midnight (0 - 86399)
FF1 - FF4 Fractional seconds with the specified accuracy
TZH Time zone in Hours (-14 - 14)
TZM Time zone in Minutes (0 - 59)

Dividers are the same as for datetime to string conversion and can also be omitted.
Example:

SELECT CAST('2000.12.08 12:35:30.5000' AS TIMESTAMP FORMAT 'YEAR.MM.DD HH24:MI:SS.FF4') FROM RDB$DATABASE;
=====================
2000-12-08 12:35:30.5000

@sim1984
Copy link

sim1984 commented Jun 16, 2023

This functionality is in the SQL:2016 standard, so it would be nice to add it to Firebird. The only thing that seems to me it would be nice to expand it with named time zones.

@omachtandras
Copy link

SELECT CAST(CURRENT_TIMESTAMP AS VARCHAR(45) FORMAT 'DD.MM.YEAR HH24:MI:SS "is" J "Julian day"') FROM RDB$DATABASE;

14.6.2023 15:41:29 is 2460110 Julian day

It would be good if it could be controlled in the case of the month and day so that it appears with a leading zero.

14.06.2023 15:41:29 is 2460110 Julian day

@asfernandes
Copy link
Member

This functionality is in the SQL:2016 standard, so it would be nice to add it to Firebird. The only thing that seems to me it would be nice to expand it with named time zones.

Can you quote relevant SQL:2016 section?

Also, are you aware of other DBMS engine implementing this?

@dyemanov
Copy link
Member

Adriano: #2388

@AlexPeshkoff
Copy link
Member

It would be good if it could be controlled in the case of the month and day so that it appears with a leading zero.

+1
No matter that it's not covered by SQL standard.

@TreeHunter9
Copy link
Contributor Author

I see 2 variants how to implement this:

  1. Remove case-insensitive and add lower case versions of already existing flags (MM -> mm, DD -> dd, etc).
  2. Add a prefix, that adds a zero at the beginning if number < 10 (0MM, 0DD, etc).

Which would be better?

@aafemt
Copy link
Contributor

aafemt commented Jun 16, 2023

AFAIR "MM" is used for zero-padded value and "M" for non-padded. The rest the same.

@TreeHunter9
Copy link
Contributor Author

AFAIR "MM" is used for zero-padded value and "M" for non-padded. The rest the same.

I thought about it, but it won't work with "DD", because "D" is already taken for Day of the Week. But maybe we can remove Day of the Week, if its not in standard, and use "D" for non-padded "DD".

@sim1984
Copy link

sim1984 commented Jun 16, 2023

the problem concerns not only days and months. Minutes and seconds can also be with or without leading zero.

src/common/classes/NoThrowTimeStamp.cpp Outdated Show resolved Hide resolved
src/common/classes/NoThrowTimeStamp.cpp Outdated Show resolved Hide resolved
src/common/common.h Show resolved Hide resolved
src/common/common.h Outdated Show resolved Hide resolved
src/common/cvt.cpp Outdated Show resolved Hide resolved
src/common/cvt.cpp Outdated Show resolved Hide resolved
src/common/cvt.cpp Outdated Show resolved Hide resolved
src/common/cvt.cpp Outdated Show resolved Hide resolved
src/common/cvt.cpp Outdated Show resolved Hide resolved
src/common/cvt.cpp Outdated Show resolved Hide resolved
@dyemanov
Copy link
Member

The standard does not seem to mention whether DD etc are zero-padded or not, but usually (in other languages) two-letter format abbreviation means exactly that.. So I'd suggest this being our default behaviour for all elements that are affected. This would also be consistent with our default date->char formatting:

select cast(cast('1.1.2020 1:1:1' as timestamp) as varchar(24)) from rdb$database;

CAST                     
======================== 
2020-01-01 01:01:01.0000 

note the leading zeroes.

@dyemanov
Copy link
Member

This functionality is in the SQL:2016 standard, so it would be nice to add it to Firebird. The only thing that seems to me it would be nice to expand it with named time zones.

Oracle uses TZD for time zone abbreviations and TZR for time zone names. They're non-standard, but could be followed.

@dyemanov
Copy link
Member

I'm wondering where the non-standard format specifiers originate from? Oracle, I guess?

BTW, this is the default Oracle behaviour for padding:

Oracle uses trailing blank characters and leading zeroes to fill format elements to a constant width. The width is equal to the display width of the largest element for the relevant format model:

Numeric elements are padded with leading zeros to the width of the maximum value allowed for the element. For example, the YYYY element is padded to four digits (the length of '9999'), HH24 to two digits (the length of '23'), and DDD to three digits (the length of '366').

The character elements MONTH, MON, DAY, and DY are padded with trailing blanks to the width of the longest full month name, the longest abbreviated month name, the longest full date name, or the longest abbreviated day name, respectively, among valid names determined by the values of NLS_DATE_LANGUAGE and NLS_CALENDAR parameters. For example, when NLS_DATE_LANGUAGE is AMERICAN and NLS_CALENDAR is GREGORIAN (the default), the largest element for MONTH is SEPTEMBER, so all values of the MONTH format element are padded to nine display characters. The values of the NLS_DATE_LANGUAGE and NLS_CALENDAR parameters are specified in the third argument to TO_CHAR and TO_* datetime functions or they are retrieved from the NLS environment of the current session.

So DD is 01, MM is 01, etc

@TreeHunter9
Copy link
Contributor Author

TreeHunter9 commented Jun 19, 2023

This functionality is in the SQL:2016 standard, so it would be nice to add it to Firebird. The only thing that seems to me it would be nice to expand it with named time zones.

Oracle uses TZD for time zone abbreviations and TZR for time zone names. They're non-standard, but could be followed.

If im correct, we cannot get timezone abbreviation specifically, its stored in same place as timezone name, so i only add "TZR". But i can not add unit tests for this flag because fb_get_master_interface is not available at this moment.

@TreeHunter9
Copy link
Contributor Author

I also have a question about truncating string, for example if we do CAST(CURRENT_TIMESTAMP AS VARCHAR(20)) we get a truncation error (-expected length 20, actual 38). But what I do is that if the result > expected length, we truncate the result and assign what's left. Maybe I should throw the same truncation error?

@asfernandes
Copy link
Member

I also have a question about truncating string, for example if we do CAST(CURRENT_TIMESTAMP AS VARCHAR(20)) we get a truncation error (-expected length 20, actual 38). But what I do is that if the result > expected length, we truncate the result and assign what's left. Maybe I should throw the same truncation error?

I think so.

@asfernandes
Copy link
Member

It is possible to insert raw text into a format string with "":

I do not have access to the SQL 2016 standard, and in #2388 I see nothing about quotes.

Is this standard?

Isn't there a way to put literal double quotes in the output?

@asfernandes
Copy link
Member

This looks incorrectly in relation to time zone:

select
    current_timestamp a,
    cast(current_timestamp as varchar(50) format 'YEAR-MM-DD HH24:MI:SS.FF4') b,
    cast(cast(current_timestamp as varchar(50) format 'YEAR-MM-DD HH24:MI:SS.FF4') as timestamp with time zone format 'YEAR-MM-DD HH24:MI:SS.FF4') c
  from rdb$database;

A 2023-06-28 08:38:20.2790 America/Sao_Paulo
B 2023-06-28 08:38:20.2790
C 2023-06-28 08:38:20.2790 +00:00

In the case of C, it should use session time zone.

@asfernandes
Copy link
Member

Is it correct that tokens left in the trailing part is not matched with the format and do not generate error?

SELECT CAST('2000.12.08 12:35:30.5000 XXX' AS TIMESTAMP FORMAT 'YEAR.MM.DD HH24:MI:SS.FF4') FROM RDB$DATABASE;

@asfernandes
Copy link
Member

Please put the documentation (currently in the PR description) in a README formatted as markdown.

@TreeHunter9
Copy link
Contributor Author

Is this standard?

It's not in standard. I saw it in PostgreSQL and I thought it would be a good addition.

Isn't there a way to put literal double quotes in the output?

No... I will fix it.

@TreeHunter9
Copy link
Contributor Author

Is it correct that tokens left in the trailing part is not matched with the format and do not generate error?

SELECT CAST('2000.12.08 12:35:30.5000 XXX' AS TIMESTAMP FORMAT 'YEAR.MM.DD HH24:MI:SS.FF4') FROM RDB$DATABASE;

From code perspective - it's correct, because we filled out our FORMAT, and there is no reason to continue parsing the input string.
But, perhaps, from the user's point of view there should be an error.

if (format.hasData())
{
dsqlScratch->appendUChar(blr_cast_format);
dsqlScratch->appendString(0, format);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are ignoring the (implicit or explicit) connection charset (not _WIN1252 in the example):

SELECT CAST(localTIMESTAMP AS VARCHAR(45) FORMAT _WIN1252 'DD.MM.YEAR HH24:MI:SS "is" J "Julian day"') FROM RDB$DATABASE;

if (format.isEmpty())
cb->err(Arg::Gds(isc_sysf_invalid_null_empty) << Arg::Str(STRINGIZE(format)));

auto invalidPatternException = [](std::string_view pattern, Callbacks* cb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string_view is C++17 feature, not allowed to use in current Firebird code.
But since this should not go to Firebird 5, I will propose adoption of C++17 for Firebird 6 code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see other people agree to move to C++ 17, so I switch back static to inline variables.

Artyom Ivanov and others added 7 commits July 5, 2023 12:41
@TreeHunter9 TreeHunter9 force-pushed the master_cast_format_datetime_to_string_and_vice_versa branch from 69daaa3 to 9b4f135 Compare August 7, 2023 14:23
@aafemt
Copy link
Contributor

aafemt commented Aug 7, 2023

I wonder why this code is so much complicated?

Template that is instantiated with only one type and only one callback parameter can be a plain function. Single-line callback for callback is mind-screwing.

to_upper() is locale-aware and is overkill in this case. If SQL standard doesn't require case insensitive format, I would make patterns case-sensitive.

Because you are not going to use locale-specific separators (do you?) everything that doesn't match pattern can be treated as plain text and appear in output as-is. In this case the parser is simplified to comparison with patterns starting from the longest and plain char* does the job.

@TreeHunter9
Copy link
Contributor Author

Template that is instantiated with only one type and only one callback parameter can be a plain function. Single-line callback for callback is mind-screwing.

Yea, it would be better.

to_upper() is locale-aware and is overkill in this case. If SQL standard doesn't require case insensitive format, I would make patterns case-sensitive.

It's not mentioned in standard, but I think it gives more freedom when using format patterns.
If others think its unnecessary, I can make it case-sensitive.

@asfernandes
Copy link
Member

I can make it case-sensitive.

I think you shouldn't.

| SSSSS | Seconds after midnight (0 - 86399) |
| FF1 - FF4 | Fractional seconds with the specified accuracy |
| TZH | Time zone in Hours (-14 - 14) |
| TZM | Time zone in Minutes (0 - 59) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why TZR is not supported here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about this... But this is very tricky to implement, because time zone names contains separators, for example Canada/East-Saskatchewan - it contains '/' and '-', and it's not easy to parse. I've written an implementation of this, but maybe it's too overwhelming for this kind of task. If you think it's not a good solution, I'll try to think of something else.

src/common/cvt.cpp Outdated Show resolved Hide resolved
src/common/cvt.cpp Outdated Show resolved Hide resolved
src/dsql/ExprNodes.cpp Outdated Show resolved Hide resolved
src/dsql/ExprNodes.h Outdated Show resolved Hide resolved
src/dsql/parse.y Outdated Show resolved Hide resolved
@asfernandes asfernandes merged commit 897ac0c into FirebirdSQL:master Oct 24, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement SQL standard FORMAT clause for CAST between string types and datetime types [CORE6507]
7 participants