Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: the conversion rules for numeric types to interval are not the same as string to interval #57876

Open
knz opened this issue Dec 13, 2020 · 3 comments
Labels
A-sql-builtins SQL built-in functions and semantics thereof. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-starter Might be suitable for a starter project for new employees or team members. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@knz
Copy link
Contributor

knz commented Dec 13, 2020

Found by @petermattis:

> SELECT 2592000::interval, '2592000'::interval;
  interval | interval
-----------+------------
  1 mon    | 720:00:00

What is happening here:

  • '2592000'::interval uses the postgres string to interval conversion rules, which state that a number of seconds converted to interval should not expand to days / months and keep the interval to the hour unit.

  • 2592000::interval uses int -> interval cast, which is a crdb-specific extension, not recognized by postgres. In Peter's words “so we could fix it”

Or can we?


As the person who originally implemented this, I (@knz) 100% agree this was an oversight and should be corrected. The reason why this misdesign exists is that the casts appeared “easy” to implement: take the numeric value, convert it to a duration.Duration, then stick that in an interval.

The numeric to duration.Duration conversion functions were already there.

I had not appreciated that these conversion functions were expanding the seconds into days/months. Was that intentional? I don't know. (However, see below.)


How to fix this?

Before we go into details, we need to establish some clear goals:

  • the conversion behavior must be homogeneous between all numeric types (int, decimal, float). For example, the same integer value like 2592000 should not yield different intervals depending on whether it's converted to decimal or float first.

  • both conversion directions must remain consistent (they are today). So for example, converting 2592000 to interval and then back to int should yield the same value.

    (note however that we're OK if a starting value in interval form, converted to a numeric type, then back to interval, yields a different interval. That is possible and acceptable because the interval type contains more information than a numeric type, so we can expect data loss.)

Can we achieve this?

  • duration.FromInt64 / AsInt64 are only used by this cast. So we can change that to achieve alignment without fear of side effects.

  • duration.FromBigInt64 / AsBigInt64 is similar.

  • however, duration.FromFloat64 / AsFloat64 are odd.

These are used in multiple places:

  1. in the casts that are under consideration in this issue.
  2. AsFloat64 is used when converting from interval with extract(epoch from <interval>) (this is a postgres standard feature)
  3. both are used by percentile_cont, an aggregate function that reports fractions between input values in an aggregation bucket (also a standard pg feature)
  4. FromFloat64 is used to generate the interval values for SHOW LAST QUERY STATISTICS

I think case (4) will benefit from the proposed adjustment, because arguably SHOW LAST QUERY STATISTICS wants an interval expressed in hours/minutes/seconds and doesn't care about days/months.

However, I will recommend making sure we have ample testing for cases (2) and (3), with values that exceed a month, and ensure we don't diverge from postgres for those. (Maybe they are already incompatible with pg, I don't know? If they are, then they will also benefit from the fix.)

Of course, we can't let duration.FromFloat64 do "something different" from duration.FromInt64 because then the rules are not homogeneous any more.

cc @otan for triage.

@blathers-crl
Copy link

blathers-crl bot commented Dec 13, 2020

Hi @knz, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Dec 13, 2020
@knz knz added A-sql-builtins SQL built-in functions and semantics thereof. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Dec 13, 2020
@knz knz added this to Triage in SQL Sessions - Deprecated via automation Dec 13, 2020
@rafiss rafiss moved this from Triage to Longer term backlog in SQL Sessions - Deprecated Jan 12, 2021
@rafiss rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021
@rafiss rafiss added the E-starter Might be suitable for a starter project for new employees or team members. label May 21, 2021
@rafiss rafiss moved this from Longer term backlog to Smaller fixes/improvements in SQL Sessions - Deprecated Nov 1, 2021
@arctica
Copy link

arctica commented May 22, 2022

This bug just bit me. I was wondering why my timestamp values were way way off and nailed it down to some decimal -> interval casts giving completely wrong results. Not every month has the same amount of seconds in it so a numerical conversion to an interval does not make sense unless one for example keeps the highest time unit to hours like postgres does. Even that could be slightly off due to the fact that leap seconds being a thing but for me personally that's not an issue.

To illustrate what mayham this can cause:

SELECT
    1640995123::interval,
    '1640995123'::interval,
    0::timestamp + 1640995123::interval,
    0::timestamp + '1640995123'::interval
{"years":52,"months":9,"days":2,"hours":23,"minutes":58,"seconds":43}
{"hours":455831,"minutes":58,"seconds":43}
2022-10-03 23:58:43
2021-12-31 23:58:43

It would be great if someone could revisit the interval type and maybe expand test coverage. Dates and times are a surprisingly tricky subject and yet so important to most programs.

@otan
Copy link
Contributor

otan commented May 23, 2022

thanks for letting us know arctica!

my 2c is that we should deprecate int+float/interval casts, since i actually think they don't quite map to each other as concepts. intervals in SQL standard are some Month+Day+Hour construct with confusing rules about mapping days -> months and the like (e.g. years can be treated as 365.25 days or 365 days depending on context - we have some complex logic and fallout from that a while ago, see #56667 as an example). PG indeed doesn't support this, i'm guessing for that reason and more.

cc @vy-ton do you think we should support this? if so, how do we think ints/floats map to intervals - unix seconds, or like the age builtin which has complex rules?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-builtins SQL built-in functions and semantics thereof. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-starter Might be suitable for a starter project for new employees or team members. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
No open projects
SQL Sessions - Deprecated
Smaller fixes/improvements
Development

No branches or pull requests

4 participants