Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: refactor dates to be fully PG compatible #36938

Merged
merged 2 commits into from May 2, 2019

Conversation

Projects
None yet
5 participants
@mjibson
Copy link
Member

commented Apr 18, 2019

Previously dates were an int64 number of days from 1970-01-01. Functions
and operators needing to act on a date could manipulate the value of
the date without any overflow checks, causing multiple bugs:

  • The postgres binary wire protocol uses an int32 number of days since
    2000-01-01, thus it was possible to have dates in cockroach outside
    of the expressible range over the wire. We were blindly converting the
    int64 to an int32, possibly discarding data.
  • Adding dates to ints was done by normal addition. However this could
    overflow. Our normal int + int path uses overflow-checked addition,
    but dates were not subject to such checks.
  • Converting dates to a text representation (for sending over pgwire or
    our distsql serialization) converted the number of days to a time.Time
    using the standard library time.Date method, which (more-or-less) stores
    time as an int64 number of nanos. A very high year could overflow the
    nanosecond part causing it to silently by incorrect.
  • The infinity dates were not treated as infinity, but converted to an
    actual date.

This change adds a new pgdate.Date type. It internally represented as
an int32 number of days since 2000-01-01, because that's exactly what
is needed by the postgres wire protocol. It doesn't export any internal
properties, and instead forces users of it to use its API, which is
guaranteed to be safe. The API does all overflow and bounds checking,
and correctly handles the infinity dates. This fixes the above bugs
since it's now not possible to mistakenly misuse Dates.

When reading from any existing on-disk data, dates outside of the postgres
bounds are converted to +/- infinity. This presents a problem since
datums need to roundtrip to the same on-disk encoding so that old index
entries can be deleted. In order to round trip correctly, the original
on-disk integer is saved and used during serialization if it exists.

Fixes #36557

Release note (sql change): Dates are now fully Postgres-compatible,
including support for sentinel values (+/- infinity) and the Postgres
date range (4714-11-24 BC to 5874897-12-31). Existing dates outside of
this range will be converted to the +/- infinity dates.

@mjibson mjibson requested review from knz, bobvawter and yuzefovich Apr 18, 2019

@mjibson mjibson requested review from cockroachdb/cli-prs as code owners Apr 18, 2019

@cockroach-teamcity

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

This change is Reviewable

@yuzefovich
Copy link
Contributor

left a comment

This is great stuff! The API is very clean!

And I also like the compatibility approach that you and Bram came up with. Just wondering: do we plan to ever remove orig fields to reduce the space usage? I understand that it'll break the compatibility with already existing indices, but maybe like in 2 or 3 major releases?

Reviewed 29 of 29 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @knz, and @mjibson)


pkg/sql/logictest/testdata/logic_test/datetime, line 143 at r1 (raw file):


query TTTT
SELECT 'infinity'::date + 1, 'infinity'::date - 1, '-infinity'::date + 1, '-infinity'::date - 1

[nit]: this query overlaps with the one a few above, possibly combine them.


pkg/sql/logictest/testdata/logic_test/datetime, line 1281 at r1 (raw file):

subtest regression_36146

statement error out of range

Why does it have a different error message?


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 853 at r1 (raw file):

1042  bpchar        1307062959    NULL      -1      false     b
1043  varchar       1307062959    NULL      -1      false     b
1082  date          1307062959    NULL      16      true      b

[question]: just to confirm my understanding, days take up 4 bytes, orig another 8, and hasOrig yet another 4 bytes, correct?


pkg/util/timeutil/pgdate/pgdate.go, line 18 at r1 (raw file):

// and times in a manner that is compatible with PostgreSQL.
//
// The implementation here is inspired by the following

[nit]: possibly this comment needs adjusting.


pkg/util/timeutil/pgdate/pgdate.go, line 69 at r1 (raw file):

	// instead of days when encoding for on-disk. This is required
	// so that we can roundtrip to the same on-disk value, which is
	// necessary deleting old index entries. This value should never

[nit]: necessary for deleting.


pkg/util/timeutil/pgdate/pgdate.go, line 138 at r1 (raw file):

}

func makeDateFromYMD(year, month, day int) Date {

I can't seem to find where this method is used. Also, is panicking here unavoidable? Maybe return an error instead?


pkg/util/timeutil/pgdate/pgdate.go, line 164 at r1 (raw file):

	secondsPerHour  = 60 * 60
	secondsPerDay   = 24 * secondsPerHour
	daysPer400Years = 365*400 + 97

Also I can't seem to find where these daysPer... are used.


pkg/util/timeutil/pgdate/pgdate.go, line 176 at r1 (raw file):

	case math.MaxInt32:
		buf.WriteString("infinity")
	default:

Do we need a special case for epoch here?


pkg/util/timeutil/pgdate/pgdate.go, line 262 at r1 (raw file):

	if !ok {
		return Date{}, pgerror.NewErrorf(pgerror.CodeDatetimeFieldOverflowError,
			"%s + %d is out of range", d, days)

[nit]: %s - %d.

@mjibson mjibson force-pushed the mjibson:year3 branch from 64cf7b3 to 0b90876 Apr 19, 2019

@mjibson
Copy link
Member Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @knz, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/datetime, line 143 at r1 (raw file):

Previously, yuzefovich wrote…

[nit]: this query overlaps with the one a few above, possibly combine them.

Removed, it was redundant.


pkg/sql/logictest/testdata/logic_test/datetime, line 1281 at r1 (raw file):

Previously, yuzefovich wrote…

Why does it have a different error message?

The original problem happened because of overflow during distsql serialization. But now it's not possible to even create that date.


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 853 at r1 (raw file):

Previously, yuzefovich wrote…

[question]: just to confirm my understanding, days take up 4 bytes, orig another 8, and hasOrig yet another 4 bytes, correct?

Go aligns things at word boundaries for adjacent properties. So days is 4, hasOrig is 1, and there's 3 unused bytes, but 8 total. Then orig is another 8.


pkg/util/timeutil/pgdate/pgdate.go, line 18 at r1 (raw file):

Previously, yuzefovich wrote…

[nit]: possibly this comment needs adjusting.

Done.


pkg/util/timeutil/pgdate/pgdate.go, line 69 at r1 (raw file):

Previously, yuzefovich wrote…

[nit]: necessary for deleting.

Done.


pkg/util/timeutil/pgdate/pgdate.go, line 138 at r1 (raw file):

Previously, yuzefovich wrote…

I can't seem to find where this method is used. Also, is panicking here unavoidable? Maybe return an error instead?

Removed.


pkg/util/timeutil/pgdate/pgdate.go, line 164 at r1 (raw file):

Previously, yuzefovich wrote…

Also I can't seem to find where these daysPer... are used.

Removed.


pkg/util/timeutil/pgdate/pgdate.go, line 176 at r1 (raw file):

Previously, yuzefovich wrote…

Do we need a special case for epoch here?

No. Epoch is an alias for 1970-01-01, it's not preserved as the word 'epoch'.


pkg/util/timeutil/pgdate/pgdate.go, line 262 at r1 (raw file):

Previously, yuzefovich wrote…

[nit]: %s - %d.

Done.

@knz
Copy link
Member

left a comment

This is very good work. I love that I was feeling like I was learning something while reading the code of the new date package. Eventually I have a dream we give dedicated packages with proper semantics to all the scalar types.

Only a few comments below.

Reviewed 24 of 29 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @mjibson, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/datetime, line 124 at r2 (raw file):

'0001-12-13 BC'::date

psql gives me " 0001-12-13 BC" on this
your test displays "0000-12-13"

is that intended?

(I suspect it is -- we're merely seeing Go's conversion of a time.Time to string, which can display year 0. However in that case I would invite you to also test/display the result of converting the date back to ::string. Thank you.)


pkg/sql/sem/tree/eval.go, line 701 at r2 (raw file):

			Fn: func(_ *EvalContext, left Datum, right Datum) (Datum, error) {
				a, b := MustBeDInt(left), MustBeDInt(right)
				r, ok := arith.SubWithOverflow(int64(a), int64(b))

This change does not belong to this commit. Please extract it to a separate commit in the sequence (or a subsequent PR) for clarity.


pkg/sql/sem/tree/eval.go, line 3111 at r2 (raw file):

			res = NewDInt(DInt(v.Unix()))
		case *DDate:
			// TODO(mjibson): This cast is unsupported by postgres. Should we remove ours?

Huh, why remove it - the semantics are super well defined.

I agree on the error, but out of curiosity: what is the SQL way to test whether a date is infinite?


pkg/sql/sem/tree/eval.go, line 3158 at r2 (raw file):

		case *DDate:
			// TODO(mjibson): This cast is unsupported by postgres. Should we remove ours?
			if !v.IsFinite() {

ditto


pkg/sql/sem/tree/eval.go, line 3180 at r2 (raw file):

			// TODO(mjibson): This cast is unsupported by postgres. Should we remove ours?
			if !v.IsFinite() {
				return nil, errDecOutOfRange

ditto


pkg/sql/sem/tree/eval.go, line 3322 at r2 (raw file):

			return d, nil
		case *DInt:
			// TODO(mjibson): This cast is unsupported by postgres. Should we remove ours?

Same comment as above - the semantics are well defined. I don't see a reason to remove.

@bobvawter
Copy link
Contributor

left a comment

Reviewed 24 of 29 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @mjibson, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/datetime, line 124 at r2 (raw file):

Previously, knz (kena) wrote…
'0001-12-13 BC'::date

psql gives me " 0001-12-13 BC" on this
your test displays "0000-12-13"

is that intended?

(I suspect it is -- we're merely seeing Go's conversion of a time.Time to string, which can display year 0. However in that case I would invite you to also test/display the result of converting the date back to ::string. Thank you.)

This a weird formatting case where year 0 is what we'd normally write as 1 BC. Year -1 is 2 BC, etc. We should probably avoid using go's stringification as a follow-on patch to make the output consistent with pg.


pkg/util/arith/arith.go, line 32 at r2 (raw file):

// Add32to64WithOverflow returns a+b. If ok is false, a+b overflowed.
func Add32to64WithOverflow(a int32, b int64) (r int32, ok bool) {
	if b > math.MaxInt32 || b < math.MinInt32 {

How are these functions supposed to behave? Are they guaranteeing that the operation can be performed and that the resulting value fits within the return type? Consider the case where a := MaxInt32 and b := MinInt32-1. I would expect this function to return -1 since the result is in range, even though it may have required performing the math using more bits.


pkg/util/timeutil/pgdate/pgdate.go, line 58 at r2 (raw file):

// Date is a postgres-compatible date implementation. It stores the number
// of seconds from the postgres epoch (2000-01-01). Its properties are

If it's storing seconds, why is the field called days?


pkg/util/timeutil/pgdate/pgdate.go, line 210 at r2 (raw file):

// Compare compares two dates.
func (d Date) Compare(other Date) int {

Is it valid to just return d.days - other.days?

@mjibson
Copy link
Member Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @knz, @mjibson, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/datetime, line 124 at r2 (raw file):

Previously, bobvawter (Bob Vawter) wrote…

This a weird formatting case where year 0 is what we'd normally write as 1 BC. Year -1 is 2 BC, etc. We should probably avoid using go's stringification as a follow-on patch to make the output consistent with pg.

Added a test for the string conversion.


pkg/sql/sem/tree/eval.go, line 701 at r2 (raw file):

Previously, knz (kena) wrote…

This change does not belong to this commit. Please extract it to a separate commit in the sequence (or a subsequent PR) for clarity.

Moved to second commit.


pkg/sql/sem/tree/eval.go, line 3111 at r2 (raw file):

Previously, knz (kena) wrote…

Huh, why remove it - the semantics are super well defined.

I agree on the error, but out of curiosity: what is the SQL way to test whether a date is infinite?

The semantics used to be well defined: return the integer number of days since the unix epoch. Now that we support infinity dates, they are no longer well defined. What should we do in the infinity case? Should we return infinity for float and decimals since they support it? Should we not check for infinity at all, but instead just return whatever's there (max/min int32 in the inf cases)? Since postgres doesn't do this we can't check for compatibility, but it definitely adds some questions without clear answers. Furthermore, we chose the unix epoch fairly arbitrarily (especially considering that postgres uses a different epoch underneath). A user just has to know that these conversions use the unix epoch number of days, which isn't obvious at all.

I believe postgres made the better design choice here which is to support only the date - date operator which returns the number of days between those two dates, and errors if either are infinity. This absolves the database of having to choose an arbitrary epoch, is conceptually more obvious than raw number conversion, and the infinity problem is a bit simpler also.


pkg/util/arith/arith.go, line 32 at r2 (raw file):

Previously, bobvawter (Bob Vawter) wrote…

How are these functions supposed to behave? Are they guaranteeing that the operation can be performed and that the resulting value fits within the return type? Consider the case where a := MaxInt32 and b := MinInt32-1. I would expect this function to return -1 since the result is in range, even though it may have required performing the math using more bits.

For now they are fine. I've added a comment describing the int32 size requirement of b so it's operation is clearer. But postgres doesn't support date - int8, only date - int4, so we are within the spec for now.


pkg/util/timeutil/pgdate/pgdate.go, line 58 at r2 (raw file):

Previously, bobvawter (Bob Vawter) wrote…

If it's storing seconds, why is the field called days?

seconds was a typo. Fixed the comment to say days.


pkg/util/timeutil/pgdate/pgdate.go, line 210 at r2 (raw file):

Previously, bobvawter (Bob Vawter) wrote…

Is it valid to just return d.days - other.days?

No, the API docs for tree.Datum.Compare specify +/- 1, not positive or negative.

@mjibson mjibson force-pushed the mjibson:year3 branch from 0b90876 to 89b76af Apr 19, 2019

@yuzefovich
Copy link
Contributor

left a comment

:lgtm:

Reviewed 3 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @bobvawter, @knz, @mjibson, and @yuzefovich)

@knz

knz approved these changes Apr 21, 2019

Copy link
Member

left a comment

:lgtm: but I still don't like a "TODO: remove this" for valid and well-defined features :)

Reviewed 3 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @bobvawter, @mjibson, and @yuzefovich)


pkg/sql/sem/tree/eval.go, line 3111 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

The semantics used to be well defined: return the integer number of days since the unix epoch. Now that we support infinity dates, they are no longer well defined. What should we do in the infinity case? Should we return infinity for float and decimals since they support it? Should we not check for infinity at all, but instead just return whatever's there (max/min int32 in the inf cases)? Since postgres doesn't do this we can't check for compatibility, but it definitely adds some questions without clear answers. Furthermore, we chose the unix epoch fairly arbitrarily (especially considering that postgres uses a different epoch underneath). A user just has to know that these conversions use the unix epoch number of days, which isn't obvious at all.

I believe postgres made the better design choice here which is to support only the date - date operator which returns the number of days between those two dates, and errors if either are infinity. This absolves the database of having to choose an arbitrary epoch, is conceptually more obvious than raw number conversion, and the infinity problem is a bit simpler also.

  • the infinity support is new, so it's just normal we have to ask the question and a good answer. If we don't want to answer this now, you can refuse the conversion from an infinity date with an error, like you did here. If we want an answer, I agree with you that likely conversion to/from floats and decimals should use infinities in that case. I don't have a clear opinion about integers (should remain an error until then).

  • conversions with numbers using the epoch as reference is a common choice for server software. It's also consistent with the other date/time/timestamp conversions. It's a matter of documentation (for people who don't expect epoch-based conversions already).

@bobvawter
Copy link
Contributor

left a comment

:lgtm:

Reviewed 4 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @mjibson and @yuzefovich)

@mjibson mjibson force-pushed the mjibson:year3 branch from 89b76af to 519e930 Apr 23, 2019

mjibson added some commits Apr 11, 2019

sql: refactor dates to be fully PG compatible
Previously dates were an int64 number of days from 1970-01-01. Functions
and operators needing to act on a date could manipulate the value of
the date without any overflow checks, causing multiple bugs:

- The postgres binary wire protocol uses an int32 number of days since
2000-01-01, thus it was possible to have dates in cockroach outside
of the expressible range over the wire. We were blindly converting the
int64 to an int32, possibly discarding data.
- Adding dates to ints was done by normal addition. However this could
overflow. Our normal int + int path uses overflow-checked addition,
but dates were not subject to such checks.
- Converting dates to a text representation (for sending over pgwire or
our distsql serialization) converted the number of days to a time.Time
using the standard library time.Date method, which (more-or-less) stores
time as an int64 number of nanos. A very high year could overflow the
nanosecond part causing it to silently by incorrect.
- The infinity dates were not treated as infinity, but converted to an
actual date.

This change adds a new pgdate.Date type. It internally represented as
an int32 number of days since 2000-01-01, because that's exactly what
is needed by the postgres wire protocol. It doesn't export any internal
properties, and instead forces users of it to use its API, which is
guaranteed to be safe. The API does all overflow and bounds checking,
and correctly handles the infinity dates. This fixes the above bugs
since it's now not possible to mistakenly misuse Dates.

When reading from any existing on-disk data, dates outside of the postgres
bounds are converted to +/- infinity. This presents a problem since
datums need to roundtrip to the same on-disk encoding so that old index
entries can be deleted. In order to round trip correctly, the original
on-disk integer is saved and used during serialization if it exists.

Fixes #36557

Release note (sql change): Dates are now fully Postgres-compatible,
including support for sentinel values (+/- infinity) and the Postgres
date range (4714-11-24 BC to 5874897-12-31). Existing dates outside of
this range will be converted to the +/- infinity dates.
sql/sem/tree: use arith package for sub overflow checking
This puts all the overflow logic into a single package.

@mjibson mjibson force-pushed the mjibson:year3 branch from 519e930 to 7a0b7b6 May 2, 2019

@mjibson

This comment has been minimized.

Copy link
Member Author

commented May 2, 2019

bors r+

craig bot pushed a commit that referenced this pull request May 2, 2019

Merge #36938
36938: sql: refactor dates to be fully PG compatible r=mjibson a=mjibson

Previously dates were an int64 number of days from 1970-01-01. Functions
and operators needing to act on a date could manipulate the value of
the date without any overflow checks, causing multiple bugs:

- The postgres binary wire protocol uses an int32 number of days since
2000-01-01, thus it was possible to have dates in cockroach outside
of the expressible range over the wire. We were blindly converting the
int64 to an int32, possibly discarding data.
- Adding dates to ints was done by normal addition. However this could
overflow. Our normal int + int path uses overflow-checked addition,
but dates were not subject to such checks.
- Converting dates to a text representation (for sending over pgwire or
our distsql serialization) converted the number of days to a time.Time
using the standard library time.Date method, which (more-or-less) stores
time as an int64 number of nanos. A very high year could overflow the
nanosecond part causing it to silently by incorrect.
- The infinity dates were not treated as infinity, but converted to an
actual date.

This change adds a new pgdate.Date type. It internally represented as
an int32 number of days since 2000-01-01, because that's exactly what
is needed by the postgres wire protocol. It doesn't export any internal
properties, and instead forces users of it to use its API, which is
guaranteed to be safe. The API does all overflow and bounds checking,
and correctly handles the infinity dates. This fixes the above bugs
since it's now not possible to mistakenly misuse Dates.

When reading from any existing on-disk data, dates outside of the postgres
bounds are converted to +/- infinity. This presents a problem since
datums need to roundtrip to the same on-disk encoding so that old index
entries can be deleted. In order to round trip correctly, the original
on-disk integer is saved and used during serialization if it exists.

Fixes #36557

Release note (sql change): Dates are now fully Postgres-compatible,
including support for sentinel values (+/- infinity) and the Postgres
date range (4714-11-24 BC to 5874897-12-31). Existing dates outside of
this range will be converted to the +/- infinity dates.

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
@craig

This comment has been minimized.

Copy link

commented May 2, 2019

Build succeeded

@craig craig bot merged commit 7a0b7b6 into cockroachdb:master May 2, 2019

4 checks passed

GitHub CI (Cockroach) TeamCity build finished
Details
bors Build succeeded
Details
license/cla Contributor License Agreement is signed.
Details
security/snyk - pkg/ui/package.json (Cockroach Labs) No manifest changes detected

@mjibson mjibson deleted the mjibson:year3 branch May 2, 2019

danhhz pushed a commit to danhhz/cockroach that referenced this pull request May 28, 2019

craig[bot]
Merge cockroachdb#36938
36938: sql: refactor dates to be fully PG compatible r=mjibson a=mjibson

Previously dates were an int64 number of days from 1970-01-01. Functions
and operators needing to act on a date could manipulate the value of
the date without any overflow checks, causing multiple bugs:

- The postgres binary wire protocol uses an int32 number of days since
2000-01-01, thus it was possible to have dates in cockroach outside
of the expressible range over the wire. We were blindly converting the
int64 to an int32, possibly discarding data.
- Adding dates to ints was done by normal addition. However this could
overflow. Our normal int + int path uses overflow-checked addition,
but dates were not subject to such checks.
- Converting dates to a text representation (for sending over pgwire or
our distsql serialization) converted the number of days to a time.Time
using the standard library time.Date method, which (more-or-less) stores
time as an int64 number of nanos. A very high year could overflow the
nanosecond part causing it to silently by incorrect.
- The infinity dates were not treated as infinity, but converted to an
actual date.

This change adds a new pgdate.Date type. It internally represented as
an int32 number of days since 2000-01-01, because that's exactly what
is needed by the postgres wire protocol. It doesn't export any internal
properties, and instead forces users of it to use its API, which is
guaranteed to be safe. The API does all overflow and bounds checking,
and correctly handles the infinity dates. This fixes the above bugs
since it's now not possible to mistakenly misuse Dates.

When reading from any existing on-disk data, dates outside of the postgres
bounds are converted to +/- infinity. This presents a problem since
datums need to roundtrip to the same on-disk encoding so that old index
entries can be deleted. In order to round trip correctly, the original
on-disk integer is saved and used during serialization if it exists.

Fixes cockroachdb#36557

Release note (sql change): Dates are now fully Postgres-compatible,
including support for sentinel values (+/- infinity) and the Postgres
date range (4714-11-24 BC to 5874897-12-31). Existing dates outside of
this range will be converted to the +/- infinity dates.

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>

(sapling split of efac5a4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.