Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contention: record 40001 errors as contention events #111685

Merged
merged 2 commits into from
Oct 9, 2023

Conversation

xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Oct 3, 2023

Commit 1 contention: add contention_type field to ExtendedContentionEvent

This commit adds the contention_type field to ExtendedContentionEvent and
also adds it as a new column in crdb_internal.transaction_contention_events.

Note that this field does not impact the following surfaces:

  • crdb_internal.cluster_contention_events
  • crdb_internal.cluster_contended_tables
  • crdb_internal.cluster_contended_indexes
  • crdb_internal.cluster_contended_keys

Release note (sql change): New column contention_type in
crdb_internal.transaction_contention_events table indicating
the type of txn contention encountered. Current values are

  • LOCK_WAIT

Part of: #111649

Commit 2 sql, contention: record 40001 errors as contention events

This commit writes contention events to the contention registry's
event store when a 40001 error is returned at the end of a transaction.
The criteria for writing the event to the store is:

  • If a TransactionRetryWithProtoRefreshError error was encountered
  • If there exists conflicting txn information on the error.

Enabling this feature is controlled by the cluster setting
sql.contention.record_serialization_conflicts.enabled which is off by
default.

SERIALIZATION_CONFLICT is also added to the list of contention types
in the contention protobufs. These contention events will have their
txn ids mapped to txn fingerprints and will be surfaced in the table
crdb_internal.transaction_contention_events.
Note that all SERIALIZATION_CONFLICT type contention events will
have a contentino_duration value of 0.

Release note (cli change): New contention_type in table
crdb_internal.transaction_contention_events, SERIALIZATION_CONFLICT.
These contention events specify that contention occurred between
2 transactions due to serialization conflicts.

Closes: #111649

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xinhaoz xinhaoz changed the title contention: add contention_type field to ExtendedContentionEvent contention: record 40001 errors as contention events Oct 3, 2023
@xinhaoz xinhaoz marked this pull request as ready for review October 4, 2023 14:34
@xinhaoz xinhaoz requested a review from a team October 4, 2023 14:34
@xinhaoz xinhaoz requested review from a team as code owners October 4, 2023 14:34
@xinhaoz xinhaoz requested a review from a team October 4, 2023 14:34
@xinhaoz xinhaoz requested a review from a team as a code owner October 4, 2023 14:34
@xinhaoz xinhaoz requested review from abarganier and cucaroach and removed request for a team, cucaroach and abarganier October 4, 2023 14:34
Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @xinhaoz)


pkg/sql/conn_executor_exec.go line 3189 at r2 (raw file):

	// If we have a known conflicting txn meta for serialization conflict, we
	// should add it to our contention registry so we can resolve the event.
	var retryErr *kvpb.TransactionRetryWithProtoRefreshError

recordTransactionFinish is already a pretty large. Can this be refactored into it's own function to make it easier to read?


pkg/sql/crdb_internal.go line 7454 at r1 (raw file):

    table_name          				 STRING NOT NULL,
    index_name          				 STRING
    contention_type              STRING              

nit: align and remove spaces after string


pkg/sql/crdb_internal.go line 7545 at r1 (raw file):

					tree.NewDString(tableName),  // table_name
					tree.NewDString(indexName),  // index_name
					tree.NewDString(resp.Events[i].ContentionType.String()),

During upgrade scenario could this be nil?


pkg/sql/contention/registry.go line 266 at r1 (raw file):

		// non lock wait related contention to index contention surfaces.
		c := event.BlockingEvent
		r.globalLock.Lock()

Do we need the global lock to add to the event store?

Copy link
Member Author

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w)


pkg/sql/crdb_internal.go line 7454 at r1 (raw file):

Previously, j82w (Jake) wrote…

nit: align and remove spaces after string

Hmm, it seems it's actually the above 4 lines that aren't aligned with the rest of the fields due to them using tab spsacing. I'll convert the spacing there (and also remove the extra spacing after the string on this line).


pkg/sql/crdb_internal.go line 7545 at r1 (raw file):

Previously, j82w (Jake) wrote…

During upgrade scenario could this be nil?

It will default to LOCK_WAIT for the events from nodes that haven't upgraded yet, which is accurate since there won't be any serialization conflict events there.


pkg/sql/contention/registry.go line 266 at r1 (raw file):

Previously, j82w (Jake) wrote…

Do we need the global lock to add to the event store?

I can double check what this lock is doing. I'd assume so since we weren't unlocking before the event store before.

@j82w
Copy link
Contributor

j82w commented Oct 4, 2023

pkg/sql/crdb_internal.go line 7454 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Hmm, it seems it's actually the above 4 lines that aren't aligned with the rest of the fields due to them using tab spsacing. I'll convert the spacing there (and also remove the extra spacing after the string on this line).

nvm, I left the comment on an old revision. It's fixed in the latest.

Copy link
Member Author

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I have also now put the new behaviour behind a cluster setting. This is reflected in the commit message and testing. It's undocumented right now but I'm asssuming we want to make it public so users can turn it on themselves?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w)


pkg/sql/conn_executor_exec.go line 3189 at r2 (raw file):

Previously, j82w (Jake) wrote…

recordTransactionFinish is already a pretty large. Can this be refactored into it's own function to make it easier to read?

Done.


pkg/sql/crdb_internal.go line 7454 at r1 (raw file):

Previously, j82w (Jake) wrote…

nvm, I left the comment on an old revision. It's fixed in the latest.

Yeah, I converted the 4 lines above it to use spaces instead of tabs 👍

Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @xinhaoz)


pkg/sql/contention/event_store.go line 248 at r7 (raw file):

	// duration exceeds the threshold.
	threshold := DurationThreshold.Get(&s.st.SV)
	if e.ContentionType != contentionpb.ContentionType_SERIALIZATION_CONFLICT &&

Why have the threshold be ignored for serialization conflict?


pkg/sql/contention/registry.go line 266 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

I can double check what this lock is doing. I'd assume so since we weren't unlocking before the event store before.

Any update on this?

Copy link
Member Author

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w)


pkg/sql/contention/event_store.go line 248 at r7 (raw file):

Previously, j82w (Jake) wrote…

Why have the threshold be ignored for serialization conflict?

We don't record how long the stmt 'waited' for a serialiation conflict since that doesn't always apply in this scenario. So all serialization_conflict events will have a 0 duration currently and we record them all. For now, I wouldn't expect there to be too many of these events since we only have enoug info to record the event in the scenarios where the conflicting txn hadn't commited at the time of conflict.


pkg/sql/contention/registry.go line 266 at r1 (raw file):

Previously, j82w (Jake) wrote…

Any update on this?

Done.

Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

This commit adds the `contention_type` field to `ExtendedContentionEvent` and
also adds it as a new column in `crdb_internal.transaction_contention_events`.

Note that this field does not impact the following surfaces:
- crdb_internal.cluster_contention_events
- crdb_internal.cluster_contended_tables
- crdb_internal.cluster_contended_indexes
- crdb_internal.cluster_contended_keys

Release note (sql change): New column `contention_type` in
crdb_internal.transaction_contention_events table indicating
the type of txn contention encountered. Current values are
- LOCK_WAIT

Part of: cockroachdb#111649
This commit writes contention events to the contention registry's
event store when a 40001 error is returned at the end of a transaction.
The criteria for writing the event to the store is:
- If a `TransactionRetryWithProtoRefreshError` error was encountered
- If there exists conflicting txn information on the error

Enabling this feature is controlled by the cluster setting
`sql.contention.record_serialization_conflicts.enabled` which is off by
default.

`SERIALIZATION_CONFLICT` is also added to the list of contention types
in the contention protobufs. These contention events will have their
txn ids mapped to txn fingerprints and will be surfaced in the table
`crdb_internal.transaction_contention_events`.
Note that all `SERIALIZATION_CONFLICT` type contention events will
have a `contentino_duration` value of 0.

Release note (cli change): New contention_type in table `SERIALIZATION_CONFLICT`
`crdb_internal.transaction_contention_events`.  These contention events specify
that contention occurred between 2 transactions due to serialization conflicts.

Closes: cockroachdb#111649
@xinhaoz
Copy link
Member Author

xinhaoz commented Oct 9, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 9, 2023

Build succeeded:

@craig craig bot merged commit f840e7f into cockroachdb:master Oct 9, 2023
8 checks passed
@xinhaoz xinhaoz deleted the contention-registry-40001 branch April 1, 2024 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

insights: resolve txn id - > fingerprint id for failed insights with 40001 errors
3 participants