Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: activity update job fails with CONFLICT command cannot affect row a second time #111224

Closed
j82w opened this issue Sep 25, 2023 · 1 comment · Fixed by #112350
Closed

sql: activity update job fails with CONFLICT command cannot affect row a second time #111224

j82w opened this issue Sep 25, 2023 · 1 comment · Fixed by #112350
Assignees
Labels
A-cluster-upgrades C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@j82w
Copy link
Contributor

j82w commented Sep 25, 2023

The failure is caused by not aggregating all the columns except the primary key values. The index_recommendation column needs to be aggregated.

Error:
W230922 10:23:57.856299 221828524 sql/sql_activity_update_job.go:107 ⋮ [T1,n1,job=‹AUTO UPDATE SQL ACTIVITY id=103›] 2928376 error running sql activity updater job: ‹activity-flush-stmt-transfer-tops›: ‹UPSERT or INSERT...ON CONFLICT command cannot affect row a second time›

The query is also causing high memory usage, because of the array_agg in the update query. This can be optimized with a new function.

Jira issue: CRDB-31827

@j82w j82w added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cluster-upgrades T-cluster-observability labels Sep 25, 2023
@j82w j82w self-assigned this Sep 25, 2023
@blathers-crl blathers-crl bot added this to Triage in Cluster Observability Sep 25, 2023
@j82w j82w moved this from Triage to Active Issues in Cluster Observability Sep 26, 2023
@j82w
Copy link
Contributor Author

j82w commented Oct 5, 2023

Related: 111866

maryliag added a commit to maryliag/cockroach that referenced this issue Oct 13, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the last executed
time to chose which one to use.
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 17, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the last executed
time to chose which one to use.
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 18, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the last executed
time to chose which one to use.
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 18, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 19, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 19, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
@maryliag maryliag assigned maryliag and unassigned j82w Oct 19, 2023
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 20, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
craig bot pushed a commit that referenced this issue Oct 21, 2023
112350: sql: fix sql activity cache for large cardinality r=maryliag a=maryliag

Note to reviewers: first commit being reviewed on #112609

---
Recreated from #112071

tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats.

1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column.
4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value.

Fixes: #111780
Fixes: #111224

Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.

Co-authored-by: maryliag <marylia@cockroachlabs.com>
@craig craig bot closed this as completed in 37ed721 Oct 21, 2023
Cluster Observability automation moved this from Active Issues to Done Oct 21, 2023
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 23, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 23, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 24, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 24, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
maryliag added a commit to maryliag/cockroach that referenced this issue Oct 30, 2023
Recreated from cockroachdb#112071

tldr: fix activity update job failing from conflict, caching to much
info when their is high cardinatlity in the stats, and fix cache to
store the correct top stats.

1. Fix activity update failure from upserting same row multiple times.
This was caused by 2 nodes having the same statement, but one only
executed it 2 or 3 time and the other node executed it enough to
generate an index recommendation. The different index recommendation
caused duplicate rows based on primary key to be inserted. It now uses
index recommendation from the merged stats which uses the latest existing
recommendation (we don't update if there is no recommendation).
2. Customers can have transactions that have over 1k unique statement
fingerprint ids. This makes it unreasonable to cache all of the
statement information for all the top transactions. This commit removes
the logic to cache all the statement information for the top
transactions which could cause the table to grow to over 100k for a
single hour. This requires the call to go the statistics table to get
all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json
column. It was using the computed index column name instead of the json
field name. The values shown on the UI were correct because the endpoint
still aggregated several columns using the aggregated stats rather than
the separate column.
4. The activity cache logic was using the execution count instead of the
statistics count to select the top queries. In most cases this still
generates the correct top queries, and the UI uses the aggregated stats
which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (bug fix): Fix the SQL activity update job to avoid
conflicts on update, reduces the amount of data cached to just what the
overview page requires, and fix the correctess of the top queries.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cluster-upgrades C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants