@knz knz commented May 17, 2018

I've seen this question so many times (on the forum, on Gitter, from sales, etc.) that I am surprised we didn't add an FAQ for it earlier. Here it is.

knz commented May 17, 2018

Once the copy-editing is done, I'll propagate the change to the other version directories.

@knz knz force-pushed the 20180517-txn-contention branch from bf002dc to 4a8ee43 Compare May 17, 2018 11:37
@knz knz force-pushed the 20180517-txn-contention branch from 4a8ee43 to a903219 Compare May 17, 2018 11:49
@knz knz force-pushed the 20180517-txn-contention branch from a903219 to 8036276 Compare May 17, 2018 11:53
@knz knz force-pushed the 20180517-txn-contention branch from 8036276 to 8915bc8 Compare May 17, 2018 12:11
v2.1/sql-faqs.md Outdated
over table rows with the same index key values (either on [primary
keys](primary-key.html) or secondary [indexes](indexes.html)) or using
index key values that are close to each other, and thus place the
indexed data on the same [data ranges](architecture/overview.html).
Contributor

table rows with the same index key values within the same column family, no?

Contributor Author

That's what I wrote first. But for identifying contention, column families don't matter: we don't split rows with the same PK across different ranges, so even accesses to different column families land on the same range.

The distinction matters for single-key serialization, which I detail further below.
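
To illustrate the point about column families, here is a toy Python model (not CockroachDB's actual implementation). The split points, the `range_for` helper, and the key values are all made up for the example; the only assumption carried over from the discussion is that ranges split at row boundaries, so the column family never affects which range an access lands on.

```python
from bisect import bisect_right

# Hypothetical split points. Range 0 holds pk < 100, range 1 holds
# 100 <= pk < 200, range 2 holds 200 <= pk < 300, range 3 holds pk >= 300.
SPLIT_POINTS = [100, 200, 300]


def range_for(pk, column_family):
    """Return the index of the range holding (pk, column_family).

    The column family is deliberately ignored: in this toy model all
    families of a row live in the same range, because splits only
    happen between rows.
    """
    return bisect_right(SPLIT_POINTS, pk)


def ranges_touched(accesses):
    """Return the set of ranges touched by (pk, column_family) accesses."""
    return {range_for(pk, family) for pk, family in accesses}


# Same row, three different column families: still one range.
same_row = ranges_touched([(150, 0), (150, 1), (150, 2)])

# Neighbouring primary keys: also one range.
nearby_rows = ranges_touched([(150, 0), (151, 0), (152, 0)])

# Keys spread over the key space: several ranges.
spread_rows = ranges_touched([(50, 0), (150, 0), (250, 0), (350, 0)])
```

In this model only the third access pattern spreads work over more than one range, which is why the contention definition talks about index key values rather than column families.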

@knz knz force-pushed the 20180517-txn-contention branch 2 times, most recently from 76395ea to 9fd8a3f Compare May 17, 2018 12:17
@bdarnell
Contributor

:lgtm:

Does this belong in FAQs or the performance best practices page? It's definitely a frequently-encountered issue, but "what is transaction contention" is not the way that people phrase the question. Is there another title we can give this that would be a better match for the users' language? (other than something vague like "why is it so slow")


Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


v2.1/sql-faqs.md, line 58 at r2 (raw file):

- At least some of the transactions write or modify the data.

Transaction contention will cause contended transactions to be

"A set of transactions that all contend on the same keys will be limited in performance to the maximum processing speed of a single node..."

It is possible for a set of transactions that all contend with each other to contend on different keys and therefore still get some parallelism, but we don't need to go into that much detail here.


v2.1/sql-faqs.md, line 69 at r2 (raw file):

  node (the range lease holder). Performance can increase if CockroachDB
  can utilize multiple hardware processors, i.e. some horizontal
  scalability via multi-core parallelism is possible.

I would remove the discussion of multi-core parallelism from these two bullets, and instead add a section at the end, something like:

"It is always best to avoid contention as much as possible via the design of the schema and application. However, sometimes contention is unavoidable. To maximize performance in the presence of contention, you'll need to maximize the performance of a single range.

  • Minimize the network distance between the replicas of a range, possibly using zone configs and partitioning.
  • Use the fastest storage devices available.
  • If the contending transactions operate on different keys within the same range, add more CPU power (more cores) per node (this is less likely to provide an improvement if the transactions all operate on the same key)."
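
As a rough illustration of why replica distance matters for a single range, the sketch below estimates the replication latency of one write under a majority-quorum assumption: the leaseholder is assumed to also lead replication, counts as one vote, and waits for enough follower acknowledgements to reach a majority. The function name and the round-trip numbers are hypothetical, and disk and CPU time are ignored.

```python
def quorum_commit_latency_ms(follower_rtts_ms):
    """Estimate the replication latency of one write, in milliseconds.

    follower_rtts_ms holds the round-trip time from the leader to each
    follower replica. The leader's own vote is free, so the write waits
    for the k-th fastest follower, where k is the number of follower
    acknowledgements needed to reach a strict majority of all replicas.
    """
    replicas = len(follower_rtts_ms) + 1
    majority = replicas // 2 + 1
    acks_needed = majority - 1  # the leader already counts as one vote
    if acks_needed == 0:
        return 0.0  # a single replica needs no network round trip
    return float(sorted(follower_rtts_ms)[acks_needed - 1])


# Three replicas close together versus spread across distant sites.
nearby = quorum_commit_latency_ms([2, 3])
distant = quorum_commit_latency_ms([70, 150])

# Five replicas: two nearby followers are enough for a majority,
# so the two distant ones do not slow the write down.
mixed = quorum_commit_latency_ms([2, 3, 70, 150])
```

Under these assumptions, contended transactions on one range cannot commit faster than this per-write latency allows, so keeping a majority of replicas close to the leaseholder shortens each step.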

v2.1/sql-faqs.md, line 90 at r2 (raw file):

  suggestions.

- Increase

I'm not sure about this advice. Have we seen many instances where this was part of the solution? Denormalization of immutable (or nearly-immutable) data can also increase scalability by creating multiple copies of often-referenced data.


v2.1/sql-faqs.md, line 106 at r2 (raw file):

  [`SELECT`](select-clause.html) and
  [`INSERT`](insert.html)/[`UPDATE`](update.html)/[`DELETE`](delete.html)/[`UPSERT`](upsert.html)
  clauses together in a single SQL statement.

Also mention that using UPSERT and specifying values for all columns will usually have the best performance under contention (compared to INSERT or UPDATE).


Comments from Reviewable

knz commented May 17, 2018

The title must stay because we often talk about contention in our explanations to users (and in blog posts, and other things) and it would be good for people to have a definition/explanation to search for if they want to understand our answers better.

However, I agree we must provide additional entry points to make this information more discoverable. I'll make an attempt and let you know.


Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


@knz knz force-pushed the 20180517-txn-contention branch from 9fd8a3f to 85f87a9 Compare May 17, 2018 18:23
knz commented May 17, 2018

I have moved the bulk of the explanation to the perf best practices page, and added two entry points in the FAQ.

RFAL (ready for another look)


Review status: 0 of 2 files reviewed at latest revision, 5 unresolved discussions.


v2.1/sql-faqs.md, line 58 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

"A set of transactions that all contend on the same keys will be limited in performance to the maximum processing speed of a single node..."

It is possible for a set of transactions that all contend with each other to contend on different keys and therefore still get some parallelism, but we don't need to go into that much detail here.

Done.


v2.1/sql-faqs.md, line 69 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I would remove the discussion of multi-core parallelism from these two bullets, and instead add a section at the end, something like:

"It is always best to avoid contention as much as possible via the design of the schema and application. However, sometimes contention is unavoidable. To maximize performance in the presence of contention, you'll need to maximize the performance of a single range.

  • Minimize the network distance between the replicas of a range, possibly using zone configs and partitioning.
  • Use the fastest storage devices available.
  • If the contending transactions operate on different keys within the same range, add more CPU power (more cores) per node (this is less likely to provide an improvement if the transactions all operate on the same key)."

Done.


v2.1/sql-faqs.md, line 90 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm not sure about this advice. Have we seen many instances where this was part of the solution? Denormalization of immutable (or nearly-immutable) data can also increase scalability by creating multiple copies of often-referenced data.

Added a note with your highlight.


v2.1/sql-faqs.md, line 106 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Also mention that using UPSERT and specifying values for all columns will usually have the best performance under contention (compared to INSERT or UPDATE).

Done.


@knz knz force-pushed the 20180517-txn-contention branch from 85f87a9 to 1f112e5 Compare May 17, 2018 18:26
@bdarnell
Contributor

:lgtm:


Review status: 0 of 2 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


@jseldess jseldess left a comment

This is awesome, @knz! As usual. LGTM, with some nits.

Once this lands, I'll look into other ways to cross-reference this new material.


## Understanding and Avoiding Transaction Contention

Transaction contention occurs when the following three conditions hold

nit: hold together > are met.


To avoid contention, multiple strategies can be applied:

- Use index key values with a more random distribution of values, so

Probably best to link directly to the specific FAQ you have in mind.

specify values for all columns in the inserted rows. This will
usually have the best performance under contention, compared to
combinations of [`SELECT`](select-clause.html),
[`INSERT`](insert.html) and [`UPDATE`](update.html).

nit: place comma before the and (the "oxford comma").

It is always best to avoid contention as much as possible via the
design of the schema and application. However, sometimes contention is
unavoidable. To maximize performance in the presence of contention,
you'll need to maximize the performance of a single range.

Is this paragraph introducing the following bulleted list? If so, use a semicolon instead of a period at the end of the last sentence.

v2.1/sql-faqs.md Outdated
Transaction contention occurs when transactions issued from multiple
clients at the same time operate on the same data.

This can cause transactions to wait on each other and decrease

This doesn't need to be its own paragraph. Please make it the second sentence of the previous paragraph.

v2.1/sql-faqs.md Outdated
Transaction
Contention](performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention).

## Why can I not get more operations per second by increasing the number of nodes?

I'd rephrase this a bit: Why would increasing the number of nodes not result in more operations per second?

@knz knz force-pushed the 20180517-txn-contention branch from 1f112e5 to 5bf139f Compare May 22, 2018 14:11
knz commented May 22, 2018

Review status: 0 of 2 files reviewed at latest revision, 11 unresolved discussions.


v2.1/performance-best-practices-overview.md, line 232 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

nit: hold together > are met.

Done.


v2.1/performance-best-practices-overview.md, line 270 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Probably best to link directly to the specific FAQ you have in mind.

There are multiple. How do you suggest to do that?


v2.1/performance-best-practices-overview.md, line 287 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

nit: place comma before the and (the "oxford comma").

Done.


v2.1/performance-best-practices-overview.md, line 305 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Is this paragraph introducing the following bulleted list? If so, use a semicolon instead of a period at the end of the last sentence.

Done.


v2.1/sql-faqs.md, line 47 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

This doesn't need to be its own paragraph. Please make it the second sentence of the previous paragraph.

Done.


v2.1/sql-faqs.md, line 55 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

I'd rephrase this a bit: Why would increasing the number of nodes not result in more operations per second?

Done.


@jseldess
Contributor

v2.1/performance-best-practices-overview.md, line 270 at r3 (raw file):

Previously, knz (kena) wrote…

There are multiple. How do you suggest to do that?

You're right. You can just leave it as-is. Users will see the toc and jump to the FAQ that interests them.


knz commented May 22, 2018

Want me to propagate the change to the other version dirs?

@jseldess
Contributor

Yes, please.

@knz knz force-pushed the 20180517-txn-contention branch from 5bf139f to a4541ad Compare May 22, 2018 15:29
knz commented May 22, 2018

Yes, please.

Done

@jseldess jseldess merged commit eb801ee into cockroachdb:master May 22, 2018