
Create FAQs for numbering problems. #3104

Merged: 1 commit, merged May 10, 2018

knz commented May 5, 2018

As discussed on cockroachdb/cockroach#9227 and other related issues.

Jesse, please take this over.

knz requested review from bdarnell and jseldess May 5, 2018 01:04
@cockroach-teamcity

This change is Reviewable

bdarnell commented May 6, 2018

Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


_includes/faq/sequential-numbers.md, line 5 at r1 (raw file):

{{site.data.alerts.callout_info}}Sequences produce <emph>unique</emph> values, however not all values are guaranteed to be produced (e.g., when a transaction is canceled after it consumes a value) and the values may be slightly reordered (e.g., when a transaction that consumes a lower sequence number commits after a transaction that consumes a higher number).{{site.data.alerts.end}}

{{site.data.alerts.callout_info}}For maximum performance, avoid using sequences to generate row IDs or indexed columns. This is because sequence values are logically close to each other and can cause contention on few data ranges during inserts. Instead, prefer <code>UUID</code>  identifiers or integer identifiers generated with <code>unique_rowid()</code>.{{site.data.alerts.end}}

unique_rowid() is still mostly sequential: it will perform more like sequences than like UUID with respect to contention. The main difference between unique_rowid() and sequences is that sequences produce smaller numbers (but generating them is much slower).

So we should recommend that if you need a roughly-ordered id, use unique_rowid() unless you need the numbers to be small. And unless you specifically need a roughly-ordered id, you're probably better off using a UUID (it's important to emphasize this point to wean people off the habits of sequential IDs from other databases).
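Ben's point can be illustrated with a toy Python model. It assumes a hypothetical `unique_rowid()`-style layout with a coarse timestamp in the high bits and a node ID in the low 15 bits. The bit width and timestamp units are illustrative assumptions, not CockroachDB's actual constants. A sequence is modeled as a plain counter.

```python
import itertools

NODE_ID_BITS = 15  # hypothetical: low bits reserved for the node ID


def rowid_like(timestamp_units: int, node_id: int) -> int:
    """Toy unique_rowid()-style value: timestamp in the high bits, node ID in the low bits."""
    assert 0 <= node_id < (1 << NODE_ID_BITS)
    return (timestamp_units << NODE_ID_BITS) | node_id


def sequence(start: int = 1):
    """Toy SQL sequence: a dense counter starting at `start`."""
    return itertools.count(start)


# Three inserts at increasing (hypothetical) timestamps, alternating between nodes 1 and 2.
events = [(1_000_000, 1), (1_000_001, 2), (1_000_002, 1)]
rowids = [rowid_like(ts, node) for ts, node in events]

seq = sequence()
seq_ids = [next(seq) for _ in events]

print(rowids)   # large values that increase with time, regardless of node
print(seq_ids)  # [1, 2, 3]: small and dense
```

In this model both schemes increase with time, so both direct new rows to the end of the key space. The sequence differs only in producing smaller numbers.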


v2.0/sql-faqs.md, line 21 at r1 (raw file):

{% include faq/sequential-numbers.md %}

## How do I totally order writes to a table over time in CockroachDB?

I don't think "totally order" is the right term to use here. All rows in CRDB are totally ordered in the mathematical sense. The question here is how to make that total order correspond with insertion order.
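A small Python illustration of the distinction follows. Rows are always stored in primary-key order, but that order matches insertion order only when the key increases with insertion time. Seeded random 128-bit values stand in for UUIDs, and the insertion position stands in for a time-ordered key. Both are assumptions made for illustration.

```python
import random
import uuid

rng = random.Random(42)  # seeded so the example is repeatable

# Insertion order: row 0 is inserted first, row 19 last.
insert_order = list(range(20))

# Key choice 1: a time-ordered key (here simply the insertion position).
time_keys = {row: row for row in insert_order}

# Key choice 2: random UUID-like keys.
uuid_keys = {row: uuid.UUID(int=rng.getrandbits(128)) for row in insert_order}

# Scan order is primary-key order, whatever the key happens to be.
scan_by_time_key = sorted(insert_order, key=time_keys.__getitem__)
scan_by_uuid_key = sorted(insert_order, key=uuid_keys.__getitem__)

print(scan_by_time_key == insert_order)  # True: key order matches insertion order
print(scan_by_uuid_key == insert_order)  # almost certainly False: total order, unrelated to time
```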


Comments from Reviewable

knz commented May 8, 2018

Amended based on Ben's suggestion. Also added a comparison table. RFAL.


Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions.


_includes/faq/sequential-numbers.md, line 5 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

unique_rowid() is still mostly sequential: it will perform more like sequences than like UUID with respect to contention. The main difference between unique_rowid() and sequences is that sequences produce smaller numbers (but generating them is much slower).

So we should recommend that if you need a roughly-ordered id, use unique_rowid() unless you need the numbers to be small. And unless you specifically need a roughly-ordered id, you're probably better off using a UUID (it's important to emphasize this point to wean people off the habits of sequential IDs from other databases).

Thanks for reminding me. Amended.


v2.0/sql-faqs.md, line 21 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't think "totally order" is the right term to use here. All rows in CRDB are totally ordered in the mathematical sense. The question here is how to make that total order correspond with insertion order.

Yes, you're right. Rephrased.



bdarnell commented May 8, 2018

LGTM


Reviewed 5 of 5 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

| Value distribution             | Uniformly distributed (128 bits)        | Contains time and space (node ID) components | Dense, small values            |
| Performance cost at generation | Small, scalable                         | Small, scalable                              | Variable, can cause contention |
| Locality                       | Maximally distributed, least contention | Somewhat local, may cause INSERT contention  | Highly local, most INSERT contention |

The locality issue is not about insert contention: unique_rowid values are extremely unlikely to directly contend with each other. The difference between unique_rowid and UUID is parallelism: looking up values by a UUID can make use of many nodes because the values will be spread across many ranges. Typical queries by a more time-ordered id will create more hotspots.
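The parallelism argument can be sketched as a rough Python model. For illustration only, it assumes the table is already split into 10 ranges of equal row count (real range splits depend on size and load). It then counts how many ranges a batch of 100 new keys touches. Random 64-bit integers stand in for UUIDs, and a counter stands in for time-ordered IDs such as unique_rowid() or a sequence.

```python
import bisect
import random

rng = random.Random(7)  # seeded so the example is repeatable
NUM_RANGES = 10          # hypothetical number of ranges
EXISTING_ROWS = 10_000
BATCH = 100


def split_points(existing_keys):
    """Hypothetical splits: a boundary every len/NUM_RANGES keys in sorted order."""
    keys = sorted(existing_keys)
    step = len(keys) // NUM_RANGES
    return [keys[i * step] for i in range(1, NUM_RANGES)]


def ranges_touched(splits, new_keys):
    """Number of distinct ranges that receive at least one of the new keys."""
    return len({bisect.bisect_right(splits, k) for k in new_keys})


# UUID-like table: existing and new keys are uniformly random.
uuid_existing = [rng.getrandbits(64) for _ in range(EXISTING_ROWS)]
uuid_new = [rng.getrandbits(64) for _ in range(BATCH)]

# Time-ordered table: every new key is larger than every existing key.
ordered_existing = list(range(EXISTING_ROWS))
ordered_new = list(range(EXISTING_ROWS, EXISTING_ROWS + BATCH))

print(ranges_touched(split_points(uuid_existing), uuid_new))        # typically all 10
print(ranges_touched(split_points(ordered_existing), ordered_new))  # 1: the last range
```

The same concentration applies to queries that target recent rows, which is the hotspot described above.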


v2.0/sql-faqs.md, line 37 at r2 (raw file):

- [On the Way to Better SQL Joins](https://www.cockroachlabs.com/blog/better-sql-joins-in-cockroachdb/)

## How do I get the last ID/SERIAL value inserted into a table?

Add this to the 2.1 docs too.



knz commented May 9, 2018

Review status: 3 of 5 files reviewed at latest revision, 2 unresolved discussions.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The locality issue is not about insert contention: unique_rowid values are extremely unlikely to directly contend with each other. The difference between unique_rowid and UUID is parallelism: looking up values by a UUID can make use of many nodes because the values will be spread across many ranges. Typical queries by a more time-ordered id will create more hotspots.

I reworded, can you have another look?


v2.0/sql-faqs.md, line 37 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Add this to the 2.1 docs too.

It is there already.



bdarnell commented May 9, 2018

Review status: 0 of 5 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

Previously, knz (kena) wrote…

I reworded, can you have another look?

For "data locality" and "read performance", sequences and unique_rowid are equivalent. We don't need to make a distinction between them except for insert performance.


_includes/faq/differences-between-numberings.md, line 9 at r3 (raw file):

| Value distribution                  | Uniformly distributed (128 bits)        | Contains time and space (node ID) components  | Dense, small values            |
| Data locality                       | Maximally distributed                   | Values generated close in time are co-located | Highly local                   |
| INSERT performance when used as key | Highest                                 | Lower for values generated close in time      | Slowest                        |

Break "insert performance" into latency and throughput buckets. For latency, both UUID and unique_rowid have good latency, while sequences are slower. For throughput, UUID has the best throughput, while unique_rowid and sequences are limited.



knz commented May 9, 2018

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

For "data locality" and "read performance", sequences and unique_rowid are equivalent. We don't need to make a distinction between them except for insert performance.

Well, I don't agree with that. Two rowids generated a day apart will have values far apart, whereas sequence values are guaranteed to be close to each other. Or are you saying that the "value distance" due to time distance for rowids does not matter?


_includes/faq/differences-between-numberings.md, line 9 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Break "insert performance" into latency and throughput buckets. For latency, both UUID and unique_rowid have good latency, while sequences are slower. For throughput, UUID has the best throughput, while unique_rowid and sequences are limited.

The latency increases under contention, doesn't it?



bdarnell commented May 9, 2018

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

Previously, knz (kena) wrote…

Well, I don't agree with that. Two rowids generated a day apart will have values far apart, whereas sequence values are guaranteed to be close to each other. Or are you saying that the "value distance" due to time distance for rowids does not matter?

The "value distance" only matters if there are other keys between them. Each key will (usually) be adjacent to the one generated before it, whether the difference between the keys is 1 or 1000.
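A minimal Python check of that point treats a range's keys as a sorted list, which is an assumption made for illustration. Whether consecutive IDs differ by 1 (sequence-like) or by 1000 (rowid-like), the new key lands immediately after the previous one, as long as no other key falls between them.

```python
import bisect


def insert_position(sorted_keys, new_key):
    """Index at which new_key would be stored in key order."""
    return bisect.bisect_left(sorted_keys, new_key)


dense = [1, 2, 3, 4]               # sequence-like: gap of 1
sparse = [1000, 2000, 3000, 4000]  # rowid-like: gap of 1000

# The next generated value in each scheme lands at the same position.
print(insert_position(dense, 5))      # 4: right after the previous key
print(insert_position(sparse, 5000))  # 4: also right after the previous key
```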


_includes/faq/differences-between-numberings.md, line 9 at r3 (raw file):

Previously, knz (kena) wrote…

The latency increases under contention, doesn't it?

Yes. Under low traffic, UUID and unique_rowid will have the same latency. As traffic increases, unique_rowid latency will degrade while UUID insertion latency will stay the same.
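This behavior can be pictured as a textbook M/M/1 queue per range, where the mean response time is 1/(μ − λ) for service rate μ and arrival rate λ. The queueing model, the rates, and the range count below are assumptions for illustration, not measurements. UUID inserts split the load across many ranges, while time-ordered inserts send all of it to the range holding the newest keys.

```python
import math


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue; infinite once the queue saturates."""
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 1000.0  # hypothetical writes/sec one range can absorb
NUM_RANGES = 10        # hypothetical ranges receiving UUID inserts

for load in (10.0, 500.0, 900.0, 2000.0):  # total inserts/sec
    uuid_latency = mm1_response_time(load / NUM_RANGES, SERVICE_RATE)
    ordered_latency = mm1_response_time(load, SERVICE_RATE)  # all load on one range
    print(f"{load:>6.0f}/s  uuid={uuid_latency * 1000:.2f}ms  ordered={ordered_latency * 1000:.2f}ms")
```

At low load the two latencies are nearly identical. As load approaches a single range's capacity, the time-ordered case degrades sharply while the spread-out case barely changes.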



knz commented May 9, 2018

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The "value distance" only matters if there are other keys between them. Each key will (usually) be adjacent to the one generated before it, whether the difference between the keys is 1 or 1000.

Check. Now I get it (I think).


_includes/faq/differences-between-numberings.md, line 9 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Yes. Under low traffic, UUID and unique_rowid will have the same latency. As traffic increases, unique_rowid latency will degrade while UUID insertion latency will stay the same.

OK, updated.



bdarnell commented May 9, 2018

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

Previously, knz (kena) wrote…

Check. Now I get it (I think).

For the data locality line, I think both unique_rowid and sequences are "highly local". There's not a meaningful difference between the two.

Similarly, what difference is "somewhat time-ordered" vs "highly time-ordered" supposed to indicate? I'd say that both are equally time-ordered. I think the amount of concurrency required to have out-of-order insertions is similar for both.
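A small Python simulation of that claim runs under these stated assumptions:

- There is a single node with a monotonic clock, so clock skew is ignored.
- Each transaction obtains its ID when it starts (a counter for the sequence, the start timestamp for a rowid-like value).
- Each transaction commits after its own delay.

Because both schemes hand out IDs in start order, they produce the same number of commit-order inversions. Out-of-order commits appear only once transactions overlap.

```python
def inversions(ids_in_commit_order):
    """Count pairs committed in the opposite order of their IDs."""
    n = len(ids_in_commit_order)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if ids_in_commit_order[i] > ids_in_commit_order[j]
    )


def commit_order_ids(starts, durations, scheme):
    """IDs listed in commit order; scheme is 'sequence' or 'rowid'."""
    txns = []
    for n, (start, duration) in enumerate(zip(starts, durations), start=1):
        txn_id = n if scheme == "sequence" else start  # counter vs. start timestamp
        txns.append((start + duration, txn_id))
    return [txn_id for _, txn_id in sorted(txns)]


starts = [0, 10, 20, 30, 40]     # transactions begin in this order
overlapping = [35, 5, 30, 2, 1]  # long transactions overlap later ones
serial = [1, 1, 1, 1, 1]         # each finishes before the next starts

for label, durations in (("overlapping", overlapping), ("serial", serial)):
    seq = inversions(commit_order_ids(starts, durations, "sequence"))
    rowid = inversions(commit_order_ids(starts, durations, "rowid"))
    print(label, seq, rowid)  # overlapping 4 4, then serial 0 0
```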



knz commented May 9, 2018

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions.


_includes/faq/differences-between-numberings.md, line 7 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

For the data locality line, I think both unique_rowid and sequences are "highly local". There's not a meaningful difference between the two.

Similarly, what difference is "somewhat time-ordered" vs "highly time-ordered" supposed to indicate? I'd say that both are equally time-ordered. I think the amount of concurrency required to have out-of-order insertions is similar for both.

Yes, OK, I agree with this too.



@bdarnell

LGTM


Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.




jseldess left a comment


This is awesome, @knz. LGTM, with some nits and minor copyedits. Nothing glaring, though, so I'll merge and make edits in a follow-up PR.


{% include faq/sequential-numbers.md %}

~~~

Remove this extra empty code block.


{% include faq/sequential-numbers.md %}

~~~

Remove this extra empty code block.

Sequential numbers can be generated in CockroachDB using the built-in
function `unique_rowid()` or using [SQL sequences](create-sequence.html).

{{site.data.alerts.callout_info}}Unless you need roughly-ordered

Need to use <code></code> instead of backticks within a callout.

write ordering can be solved with other, more distribution-friendly
solutions instead. For example, CockroachDB's [time travel queries
(`AS OF SYSTEM
TIME`)](https://www.cockroachlabs.com/blog/time-travel-queries-select-witty_subtitle-the_future/)

Once https://github.com/cockroachdb/docs/pull/3018/files lands, we'll have a specific doc page to link to here, which I think is preferable to the blog post.

- initially: `CREATE TABLE cnt(val INT PRIMARY KEY); INSERT INTO cnt(val) VALUES(1);`
- in each transaction: `INSERT INTO cnt(val) SELECT max(val)+1 FROM cnt RETURNING val;`

This will cause all your INSERT transactions to conflict with each

We should code-format INSERT here.
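The counter-table approach in the excerpt above (`INSERT INTO cnt(val) SELECT max(val)+1 FROM cnt RETURNING val;`) can be modeled in Python to show why it serializes writers. The model is a deliberate simplification and an assumption, not a description of CockroachDB's concurrency control. In each round, every pending transaction reads the same `max(val)`. Only one of them can commit `max(val)+1`, because `val` is the primary key, and the rest must retry in the next round.

```python
def simulate_counter(num_concurrent_txns: int):
    """Simulate concurrent 'max(val)+1' inserts against a table seeded with val=1."""
    table = {1}                    # initially: INSERT INTO cnt(val) VALUES(1)
    pending = list(range(num_concurrent_txns))
    attempts = 0
    assigned = {}
    while pending:
        snapshot_max = max(table)  # every pending txn reads the same max(val)
        attempts += len(pending)
        winner, *losers = pending  # only one insert of snapshot_max + 1 can commit
        table.add(snapshot_max + 1)
        assigned[winner] = snapshot_max + 1
        pending = losers           # the others conflict and retry next round
    return assigned, attempts


assigned, attempts = simulate_counter(5)
print(sorted(assigned.values()))  # gap-free: [2, 3, 4, 5, 6]
print(attempts)                   # 5 + 4 + 3 + 2 + 1 = 15 attempts for 5 inserts
```

The values come out gap-free, but in this model the number of attempts grows quadratically with the number of concurrent writers. That growth reflects the conflicts between `INSERT` transactions that the excerpt describes.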

jseldess merged commit 4c6eb7e into cockroachdb:master May 10, 2018