Skip to content
This repository was archived by the owner on Dec 25, 2019. It is now read-only.

Dates and intervals#8

Merged
quodlibetor merged 6 commits intomasterfrom
dates-and-intervals
Aug 27, 2019
Merged

Dates and intervals#8
quodlibetor merged 6 commits intomasterfrom
dates-and-intervals

Conversation

@quodlibetor
Copy link
Copy Markdown
Contributor

This isn't ready yet but I think all the types are in place.

There are a couple breaking changes here, which I could fix by adding a ParsedDateTime::from_value(str, leading_field) that computes lazily, but that feels like it breaks the overall style of sqlparser of having everything fully parsed as is.

Still obviously left to do:

  • more tests
  • non-interval Date, Time, and DateTime parsing, which should just be a matter of using the parsing code in this commit.

@quodlibetor quodlibetor requested a review from benesch August 14, 2019 21:18
Copy link
Copy Markdown
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked through this in great detail, but it all seems quite sensible! Feel free to merge whenever you think it's ready.

@quodlibetor
Copy link
Copy Markdown
Contributor Author

How durationlike INTERVALs are actually handled

I'm going to walk through the 2016 spec, my understanding of it, and what other DBs are doing.

An interval is defined as INTERVAL '<datelike string>' <interval qualifier>.

An <interval qualifier> is basically:

<interval qualifier> ::=
    <field> TO <field>
  | <field>

where <field> is usually just e.g. YEAR/MONTH/SECOND but can be more complex.

The spec on page 41 states:

The actual subset of fields that comprise a value of either type of interval
is defined by an <interval qualifier> and this subset is known as the
precision of the value.

and on page 746 it says:

  1. An item qualified by an <interval qualifier> contains the datetime fields identified by the <interval qualifier>.
  2. Case:
    a. If the <interval qualifier> specifies a <single datetime field>, then the <interval qualifier> identifies a single <primary datetime field>. Any reference to the most significant or least significant <primary datetime field> of the item refers to that <primary datetime field>.
    b. Otherwise, the <interval qualifier> identifies those datetime fields from <start field> to <end field>, inclusive.

And 747 says:

iii. Otherwise, let participating datetime fields mean the datetime fields that are less significant than the <start field> and more significant than the <end field> of the <interval qualifier>.

There might be other semantic definitions of intervals in SQL '16, and '92 is just worded slightly differently, but I can't find them..

I misinterpreted the fact that there is a leading/trailing field to mean that if only one field is present it is supposed to be the most significant field present, but other databases (mysql/pg/cockroach) treat it as the least significant field.

Additionally, the words in the spec inline above seem to say that if you use the full <leading field> TO <end field> syntax you are only supposed to include the fields in between leading/end, but that is not what any DB does.

Instead, postgres just uses the smallest precision as the smallest value, and shifts things around to make that work:

-- a second, nice
SELECT INTERVAL '1' SECOND;
 00:00:01

-- huh, this is 1 hour 2 minutes, I would expect it to be 1 minute 2 seconds
SELECT INTERVAL '1:2' SECOND;
 01:02:00

-- okay if there is a decimal it must be a second
SELECT INTERVAL '2:3.0004' MINUTE;
 00:02:00
SELECT INTERVAL '2:3.0004' SECOND;
 00:02:03.0004

-- this is more expected
SELECT INTERVAL '1:2:3' SECOND;
 01:02:03

-- this is also reasonable with the new semantics
SELECT INTERVAL '1:2:3' MINUTE;
 01:02:00

-- I would expect this to be 00:02:03
SELECT INTERVAL '1:2:3' MINUTE TO SECOND;
 01:02:03

--  also pretty reasonable
SELECT INTERVAL '9 1:2' SECOND;
 9 days 01:02:00

MySQL on the other hand just has really bad handling of things:

  • you can't just SELECT an interval

  • If you specify something different in any way to what is provided in the
    input string you just get NULL back because why not:

    -- at least you get a warning I guess
    mysql> SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '9:9:9' HOUR_MINUTE) AS result;
    +--------+
    | result |
    +--------+
    | NULL   |
    +--------+
    mysql> show warnings;
    +---------+------+-----------------------------------------------------+
    | Level   | Code | Message                                             |
    +---------+------+-----------------------------------------------------+
    | Warning | 1441 | Datetime function: date_add_interval field overflow |
    +---------+------+-----------------------------------------------------+
    
    -- you are allowed to add numbers
    mysql> mysql> SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '9:9' HOUR_MINUTE) AS result;
    +---------------------+
    | result              |
    +---------------------+
    | 2000-01-01 09:09:00 |
    +---------------------+

Almost none of that is the way that it works in this PR.

Monthlike intervals

Okay I think that's all the things about duration-like datetimes, let's look at months, which the spec has this to say about them:

There are two classes of intervals. One class, called year-month intervals,
has an express or implied datetime precision that includes no fields other
than YEAR and MONTH, though not both are required. The other class, called
day-time intervals, has an express or implied interval precision that can
include any fields other than YEAR or MONTH.

Seems clear enough, let's look at an obvious failure:

SELECT INTERVAL '8-9 1:2:3.0004' SECOND;
 8 years 9 mons 01:02:03.0004

oh come on.

SELECT INTERVAL '7-8-9 1:2:3.0004' SECOND;
ERROR:  invalid input syntax for type interval: "7-8-9 1:2:3.0004"
LINE 1: SELECT INTERVAL '7-8-9 1:2:3.0004' SECOND;

SELECT INTERVAL '7-8 9 1:2:3.0004' SECOND;
 7 years 8 mons 9 days 01:02:03.0004

Postgres supports combinations of durationlike and timelike intervals 🤷‍♀️

MySQL returns null, similar to what it does for things that should really be legal.

I'm going to spend the weekend thinking about what the right thing to do here is.

Plan

My current inclination is to make the parsing and computed code more like postgres because despite the fact that it does not appear to be to spec it is kind of sane and less likely to cause people to lose data, which the spec seems like it is designed to cause people to lose data.

I do not plan to support combining year-month and durationlike intervals.

The postgres algorithm appears to be:

  • Always only use the most precise field provided -- in particular, ignore the
    leading field except to check that it is bigger than the end field.

  • If the interval string consists of a single number, then that number
    corresponds to the field.

    SELECT INTERVAL '1' DAY;
      1 day
    
    SELECT INTERVAL '1' DAY TO SECOND;
      00:00:01
  • If there's a colon then the first colon specifies hours, no matter what
    fields are provided.

  • For some reason this doesn't work:

    SELECT INTERVAL '1 2' DAY to SECOND;
    ERROR:  invalid input syntax for type interval: "1 2"
    LINE 1: SELECT INTERVAL '1 2' DAY to SECOND;
                            ^

    but these do:

    SELECT INTERVAL '1 2' DAY TO HOUR;
     1 day 02:00:00
    SELECT INTERVAL '1 2:3' DAY to SECOND;
     1 day 02:03:00
  • The only dash that's allowed is between year and month:

    -- okay
    SELECT INTERVAL '7-8 9 1:2:3.0004' SECOND;
     7 years 8 mons 9 days 01:02:03.0004
    -- invalid for some reason
    SELECT INTERVAL '7-8-9 1:2:3.0004' DAY;
    ERROR:  invalid input syntax for type interval: "7-8-9 1:2:3.0004"
    LINE 1: SELECT INTERVAL '7-8-9 1:2:3.0004' DAY;

@quodlibetor
Copy link
Copy Markdown
Contributor Author

quodlibetor commented Aug 19, 2019

Exciting followup, MySQL edition.

MySQL is a lot closer to my interpretation of the spec:

SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:2:3' SECOND);
+---------------------+
| 2000-01-01 00:00:01 |
+---------------------+

SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:2:3' HOUR_SECOND);
+---------------------+
| 2000-01-01 01:02:03 |
+---------------------+

SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '9 1:2:3' DAY_SECOND) as result;
+---------------------+
| 2000-01-10 01:02:03 |
+---------------------+

SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:2:3' MINUTE_SECOND);
+--------+
| NULL   |
+--------+

SHOW WARNINGS;
+---------+------+-----------------------------------------------------+
| Level   | Code | Message                                             |
+---------+------+-----------------------------------------------------+
| Warning | 1441 | Datetime function: date_add_interval field overflow |
+---------+------+-----------------------------------------------------+

--
-- MySQL cannot handle fractional seconds unless they are the only thing in the interval!
SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:3.004' MINUTE_SECOND);
---------+
-- NULL  |
---------+

SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '3.004' SECOND);
-------------------------------+
-- 2000-01-01 00:00:03.004000  |
-------------------------------+

And when you specify an interval with no field specifier in postgres you get
"something":

SELECT INTERVAL '1:2:3';
|------------|
| 1:02:03    |
+------------+

SELECT INTERVAL '2:3';
|------------|
| 2:03:00    |
+------------+

SELECT INTERVAL '1';
|------------|
| 0:00:01    |
+------------+

For most uses the most important takeaway is that:

  • For a single field (e.g. INTERVAL '1' DAY), MySQL and postgres have the same interpretation.
  • For correctly-specified multiple fields (e.g. INTERVAL '9 1:2:3' DAY TO SECOND [or DAY_SECOND for MySQL]) both return the same result (although mysql can't handle fractional seconds combined with other durations).
  • For incorrectly-specified multiple fields (e.g. INTERVAL '1:2:3' MINUTE TO SECOND) MySQL returns a NULL, and postgres just uses the precision, ignoring the significance (i.e. ignores the MINUTE, just parses optimistically to SECOND).

I think my plan has changed to maintaining the current parsing rules that I've implemented, and also to expose a convenience method that will allow SQL engines to get a "were more fields present in the interval string than the field specifier?" method to provide nice errors.

quodlibetor added a commit that referenced this pull request Aug 20, 2019
This allows parser-users to be more restrictive in the values that they accept
if they want to use the `IntervalValue::computed` method, because afaict no
databases in existence actually match any SQL spec for their interpretation of
interval literal values[1].

[1]: #8 (comment)
Copy link
Copy Markdown
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic!

Comment thread tests/sqlparser_common.rs
expected_duration_str: Option<&str>,
expected_field_match_erro: Option<&str>,
) {
println!("testing: {}", sql);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it easier to tell what sql statement caused test failures, since many of the tests test many statements. It is silent if the tests pass, I added a comment explaining that it's intentional.

Comment thread tests/sqlparser_common.rs Outdated
value: IntervalValue,
expected_computed: Interval,
expected_duration_str: Option<&str>,
expected_field_match_erro: Option<&str>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "erro" a standard abbreviation "err option?" or a typo?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo!

quodlibetor added a commit that referenced this pull request Aug 26, 2019
This allows parser-users to be more restrictive in the values that they accept
if they want to use the `IntervalValue::computed` method, because afaict no
databases in existence actually match any SQL spec for their interpretation of
interval literal values[1].

[1]: #8 (comment)
quodlibetor added a commit that referenced this pull request Aug 26, 2019
This allows parser-users to be more restrictive in the values that they accept
if they want to use the `IntervalValue::computed` method, because afaict no
databases in existence actually match any SQL spec for their interpretation of
interval literal values[1].

[1]: #8 (comment)
@quodlibetor quodlibetor changed the title [WIP] Dates and intervals Dates and intervals Aug 27, 2019
ParsedDateTimes just represent all the values that were present in a datetime
string. For a regular `DateTime` object they will all be present, but for
`INTERVAL` they depend on the following token.
This will allow us to put some methods on it.
Knowing that we've got some optional number of years/seconds/minutes in a
`ParsedDateTime` isn't nearly as useful as being able to state that you have 13
months or 250,000 milliseconds. Now we can get either/or of those.
This allows parser-users to be more restrictive in the values that they accept
if they want to use the `IntervalValue::computed` method, because afaict no
databases in existence actually match any SQL spec for their interpretation of
interval literal values[1].

[1]: #8 (comment)
This uses some of the infrastructure added for Intervals to support extracting
the fields of `DATE 'yyyy-mm-dd'`.

It has a similar philosophical position -- it just extracts the fields into a
struct that has numeric fields for the dates, and does very little verification
-- the only thing that it does is verify that month and day are not 0 and fit
into a u8, since the exact number of days in any given month are variable.
@quodlibetor quodlibetor merged commit e4bf262 into master Aug 27, 2019
@quodlibetor quodlibetor deleted the dates-and-intervals branch August 27, 2019 19:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants