-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: activity update job fails with CONFLICT command cannot affect row a second time
#111224
Labels
A-cluster-upgrades
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
Comments
j82w
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-cluster-upgrades
T-cluster-observability
labels
Sep 25, 2023
Related: 111866 |
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 13, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the last executed time to chose which one to use. 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 17, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the last executed time to chose which one to use. 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 18, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the last executed time to chose which one to use. 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 18, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 19, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 19, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 20, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
craig bot
pushed a commit
that referenced
this issue
Oct 21, 2023
112350: sql: fix sql activity cache for large cardinality r=maryliag a=maryliag Note to reviewers: first commit being reviewed on #112609 --- Recreated from #112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: #111780 Fixes: #111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries. Co-authored-by: maryliag <marylia@cockroachlabs.com>
This was referenced Oct 23, 2023
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 23, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 23, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 24, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 24, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
maryliag
added a commit
to maryliag/cockroach
that referenced
this issue
Oct 30, 2023
Recreated from cockroachdb#112071 tldr: fix activity update job failing from conflict, caching to much info when their is high cardinatlity in the stats, and fix cache to store the correct top stats. 1. Fix activity update failure from upserting same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 time and the other node executed it enough to generate an index recommendation. The different index recommendation caused duplicate rows based on primary key to be inserted. It now uses index recommendation from the merged stats which uses the latest existing recommendation (we don't update if there is no recommendation). 2. Customers can have transactions that have over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions which could cause the table to grow to over 100k for a single hour. This requires the call to go the statistics table to get all the necessary statement information for the txn details. 3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column. 4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value. Fixes: cockroachdb#111780 Fixes: cockroachdb#111224 Release note (bug fix): Fix the SQL activity update job to avoid conflicts on update, reduces the amount of data cached to just what the overview page requires, and fix the correctess of the top queries.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-cluster-upgrades
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
The failure is caused by not aggregating all the columns except the primary key values. The
index_recommendation
column needs to be aggregated.Error:
W230922 10:23:57.856299 221828524 sql/sql_activity_update_job.go:107 ⋮ [T1,n1,job=‹AUTO UPDATE SQL ACTIVITY id=103›] 2928376 error running sql activity updater job: ‹activity-flush-stmt-transfer-tops›: ‹UPSERT or INSERT...ON CONFLICT command cannot affect row a second time›
The query is also causing high memory usage, because of the
array_agg
in the update query. This can be optimized with a new function.Jira issue: CRDB-31827
The text was updated successfully, but these errors were encountered: