Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: remove nanoseconds #8864

Merged
merged 1 commit into from Aug 29, 2016
Merged

sql: remove nanoseconds #8864

merged 1 commit into from Aug 29, 2016

Conversation

maddyblue
Copy link
Contributor

@maddyblue maddyblue commented Aug 27, 2016

Some months ago we discovered that the nanoseconds we were sending over
pgwire in the text format were not spec compliant. We changed cockroach
to round to micros by default and added 3 functions that could create
or display those nanoseconds. Those functions carried caveats that use
of timestamps without using those functions would result in problems.
#8804 is the first of those problems that we observed. The cockroach dump

subcommand did not know about nanoseconds and so would mutate
nanosecond data (well, round them to nanos), equating fields with equal
microseconds, and possibly duplicating rows if they were at a page
boundary. Teaching dump about nanoseconds is possible, but quite messy.

Instead, and after some discussion in #8804, we have decided it's best
to remove nanoseconds altogether mostly because the pg wire protocol
doesn't support them, and they were creating many issues. If we decide
at some future time we want this feature, we can create a TIMESTAMPNS
type with a custom OID that marshals to a string. We will wait on user
requests for this.

Relatedly, remove the nanosecond extract functions because they now
won't do what users expect (since those nanoseconds would always be zero
and a user would expect non-zero). If users want those they can multiply
the microsecond versions by 1000.

There are two places in the system code where nanos are still used in
SQL. First is the lease system, which has been changed to use micros
since the timestamps appear in the system.lease table. The second is in
AS OF SYSTEM TIME queries, which will retain nanosecond support because
those are converted into hlc.Timestamps (i.e., not a DTimestamp), to
be used in transactions, and at some future time they will support the
decimal physical + logical ticks format, which preserves nanoseconds.

Fixes #8804


This change is Reviewable

@maddyblue
Copy link
Contributor Author

This should not be merged until at least one co-founder reviews it.

@tbg
Copy link
Member

tbg commented Aug 27, 2016

LGTM (though you'll want to wait for @dt).

Some months ago we discovered that the nanoseconds we were sending over
pgwire in the text format were not spec compliant. We changed cockroach
to round to micros by default and added 3 functions that could create
or display those nanoseconds. Those functions carried caveats that use
of timestamps without using those functions would result in problems.

#8804 is the first of those problems that we observed. The `cockroach
dump` subcommand did not know about nanoseconds and so would mutate
nanosecond data (well, round them to nanos), equating fields with equal
microseconds, and possibly duplicating rows if they were at a page
boundary. Teaching dump about nanoseconds is possible, but quite messy.

Instead, and after some discussion in #8804, we have decided it's best
to remove nanoseconds altogether mostly because the pg wire protocol
doesn't support them, and they were creating many issues. If we decide
at some future time we want this feature, we can create a TIMESTAMPNS
type with a custom OID that marshals to a string. We will wait on user
requests for this.

Relatedly, remove the nanosecond `extract` functions because they now
won't do what users expect (since those nanoseconds would always be zero
and a user would expect non-zero). If users want those they can multiply
the microsecond versions by 1000.

There are two places in the system code where nanos are still used in
SQL. First is the lease system, which has been changed to use micros
since the timestamps appear in the system.lease table. The second is in
AS OF SYSTEM TIME queries, which will retain nanosecond support because
those are converted into hlc.Timestamps (i.e., not a DTimestamp), to
be used in transactions, and at some future time they will support the
decimal physical + logical ticks format, which preserves nanoseconds.

Fixes #8804
@bdarnell
Copy link
Member

LGTM


Review status: 0 of 5 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@dt
Copy link
Member

dt commented Aug 29, 2016

:lgtm:

We should also think about how to truncate existing on-disk nanos to avoid nasty surprises down the road.... would suggesting a UPDATE table SET ts = ts in release notes cut it?


Reviewed 5 of 5 files at r1.
Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@maddyblue
Copy link
Contributor Author

UPDATE t SET s = s + '0s' works. SET ts = ts doesn't do anything. Yes, this should be in the release notes.


Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@maddyblue maddyblue merged commit 6028355 into cockroachdb:develop Aug 29, 2016
@maddyblue maddyblue deleted the nanos branch August 29, 2016 17:36
@sploiselle
Copy link
Contributor

@mjibson Does this change also impact INTERVAL?

@maddyblue
Copy link
Contributor Author

This change didn't directly impact INTERVAL. INTERVAL has its own issue about if/how it should care about nanosconds, but I'm not able to find it.

@maddyblue maddyblue mentioned this pull request Oct 7, 2016
20 tasks
craig bot pushed a commit that referenced this pull request Jan 31, 2019
34202: sql: remove nanoseconds from INTERVAL r=mjibson a=mjibson

Nanoseconds were not representable over pgwire binary mode and were being
truncated. We previously encountered this problem with timestamps (#6597)
and removed nanoseconds from timestamps at that time. We should have
done the same for intervals, since they have the same kind of problem,
but did not.

It is no longer possible to create intervals with nanosecond
precision. Parsing from string or converting from float or decimal will
round to the nearest microsecond. Similarly any arithmetic operation
(add, sub, mul, div) on intervals will also round to nearest micro. We
round instead of truncate because that's what Postgres does.

Existing on-disk intervals that contain nanoseconds will retain their
underlying value when doing encode/decode operations (so that indexes can
be correctly maintained. However there is no longer any way to retrieve
the nanosecond part. Converting to string, float, or decimal will first
round and then convert.

The reasoning for this restriction on existing on-disk nanoseconds is
related to the original bug, where we were truncating nanos to micros over
binary pgwire. The problem there was that depending on how you queried
the data (text or binary mode), you would get a different result, and
one of them was wrong. Similarly, it would be wrong to have the results
of an interval -> string conversion return a different result than just
querying the interval.

It is unfortunate that upgrading from 2.1 -> 2.2 will completely remove
the ability for users to continue accessing their nanoseconds. Due to
that, we must describe in the major release notes this change. Users who
require nanoseconds to be present will have to modify their application
to use a different data type before upgrading. Further, applications
that do comparisons on intervals may have some edge cases errors due to
rounding and seeming equality. That is, some intervals with nanos will
be rounded up to the next microsecond, possibly changing the results of
an existing query. Also, it is not possible to compare equality to any
existing interval with on-disk nanos. We believe the number of users
affected by this will be very small, and that it is still a necessary
change because of the unavoidable pgwire binary mode bug above, which
may already have been unknowningly affecting them.

Other implementations were worked on, like one where the user could
specify the desired precision of each operation (similary to how
timestamps work). This ended up being very tedious since there
are many operations and they all required the same microsecond
precision. Timestamps are different since there are some operations
that actually do need nanosecond precision, but intervals have no such
need. Thus, it was better to remove the precision argument and hard
code rounding. Another attempt was made to replace Nanos with Micros,
with an additional nanos field to hold on-disk nanoseconds. This had
difficult problems since all of our encoding infra uses nanoseconds
on disk. Converting the Micros field to nanos increased the possibilty
of overflow due multiplying by 1000. Handling the possibility of this
overflow in all possibly locations would require many large and risky
changes.

The implementation changes here are a bit odd and surprising at
first. This change leaves the duration.Nanos field, but (excepting the
Decode func) automatically rounds Nanos to nearist micro. This does leave
open the possible misuse of the Nanos field, since durations are created
directly instead of via a constructor. However, I think this problem is
less of a risk as the other attempts listed above.

See #6604 and #8864 for previous PRs and discussion about this problem
when we fixed it for timestamps.

Fixes #32143

Release note (sql change): INTERVAL values are now stored with microsecond
precision instead of nanoseconds. Existing intervals with nanoseconds
are no longer able to return their nanosecond part. An existing table t
with nanoseconds in intervals of column s can round them to the nearest
microsecond with `UPDATE t SET s = s + '0s'`. Note that this could
potentially cause uniqueness problems if the interval is a primary key.

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants