
Conversation

@FooBarrior
Contributor

  • The Jira issue number for this PR is: MDEV-15990

Description

This PR attempts to handle several issues related to system-versioned history collision handling.

First, the subject of MDEV-15990: REPLACE shouldn't end up with duplicate key error.

Secondly, the user should not experience lost history, when possible, if several updates on the row happen to apply at the same timestamp. This is contrary to trx_id, where a collision can only happen within a single transaction and shouldn't be preserved by design, see MDEV-15427.

In addition, don't save history with row_start > row_end. This conforms to the current historical row insertion behavior.
However, in production this may happen, for example after an NTP time correction. In that case we might rather update row_start = row_end and save the row. I'm open to discussion.

Release Notes

Fix REPLACE behavior on a TRX-ID versioned table when a single row is rewritten more than once. This no longer causes a duplicate key error.
Improve timestamp-based versioned history collision handling when several changes of a single row happen at the same timestamp.

How can this PR be tested?

For 1 and 2 the test case is added to versioning.replace. For 3 the test is in versioning.misc.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@FooBarrior FooBarrior requested a review from midenok April 28, 2024 00:15
@FooBarrior FooBarrior force-pushed the 10.5-nikita-MDEV-15990 branch from 2419fd0 to ed7e9bd Compare April 28, 2024 13:12
Contributor

@midenok midenok left a comment


This is a partial review. I should evaluate TABLE::delete_row() further, but I'm posting this now for you @FooBarrior to evaluate if you want to.

  ut_ad(trx_id_len == DATA_TRX_ID_LEN);
  if (trx->id == trx_read_trx_id(trx_id)) {
-   err = DB_FOREIGN_DUPLICATE_KEY;
+   err = DB_DUPLICATE_KEY;
Contributor


In the commit message you should refer to your previous commit "MDEV-15990 REPLACE on a precise-versioned table returns ER_DUP_ENTRY", as this commit depends on it. You did not remove DB_FOREIGN_DUPLICATE_KEY from TABLE::delete_row(). Was that intended? There is no point in this assignment since err was DB_DUPLICATE_KEY anyway (line 2432); you could revert the whole hunk from MDEV-23644.

Contributor Author


Yes, intended. DB_DUPLICATE_KEY is converted to DB_FOREIGN_DUPLICATE_KEY when returned from a cascade function call.

Contributor


I don't fully understand the case. Is it when a foreign cascade generates history and that history hits a duplicate error? That doesn't come from a single command. Is there any test case? I doubt it should skip history in that case.

@montywi
Contributor

montywi commented May 25, 2024 via email

@FooBarrior
Contributor Author

FooBarrior commented May 27, 2024

@montywi This approach of accessing the records is already present in a number of Field methods:

int cmp(const uchar *a,const uchar *b) const;
int cmp_binary(const uchar *a,const uchar *b, uint32 max_length) const;
int cmp_prefix(const uchar *a, const uchar *b, size_t prefix_char_len) const;
int key_cmp(const uchar *,const uchar*) const override;
uchar *pack(uchar *to, const uchar *from, uint max_length);
const uchar *unpack(uchar* to, const uchar *from, const uchar *from_end, uint param_data);
void val_str_from_ptr(String *val, const uchar *ptr) const;

So that is not something new for the Field interface, but an organic continuation.

We should not change interfaces for how records are accessed just for one
case and in an old MariaDB version.
When we do that, we should do it systematically and fix all interfaces at
the same time.

There are plans of how to do this, and the suggested change is not the way
to do that.

Contributor

@midenok midenok left a comment


Mostly OK, except the refactoring requires more arguments across the code.

@FooBarrior FooBarrior changed the base branch from bb-10.5-nikita-MDEV-30046 to 10.6 January 7, 2025 18:33
@FooBarrior FooBarrior force-pushed the 10.5-nikita-MDEV-15990 branch 2 times, most recently from 7355ebb to 2b9e2ec Compare January 9, 2025 18:03
@FooBarrior FooBarrior changed the base branch from 10.6 to 10.11 July 31, 2025 18:27
@FooBarrior
Contributor Author

Approved by @midenok [jira]

@FooBarrior FooBarrior force-pushed the 10.5-nikita-MDEV-15990 branch 4 times, most recently from e74d481 to 6202709 Compare August 4, 2025 12:39
See also MDEV-30046.

Idempotent write_row works the same as REPLACE: if there is a duplicate
record in the table, it is deleted and re-inserted, with the
same update optimization.

The code in Rows_log_event::write_row was basically copy-pasted from
write_record.

What's done:
The REPLACE operation was unified across replication and SQL. It is now
represented as a Write_record class that holds the whole state and allows
re-using some resources between the row writes.

The REPLACE, IODKU and single-insert implementations are split across different
methods, resulting in much cleaner code.

The entry point is preserved as a single Write_record::write_record() call.
The implementation to call is chosen at construction time.

This allowed several optimizations to be done:
1. The table key list is not iterated for every row. We find the last unique key
in checking order once and preserve it across the rows. See last_uniq_key().
2. ib_handler::referenced_by_foreign_key acquires a global lock. This call was
also done per row. Now all the table config that allows optimized replace is
folded into a single boolean field, can_optimize. All the fields to check even
fit in a single register on a 64-bit platform.
3. DUP_REPLACE and DUP_UPDATE cases now have one less level of indirection.
4. modified_non_trans_tables is checked and set only when it's really needed.
5. Obsolete bitmap manipulations are removed.

Also:
* Unify replace initialization step across implementations:
  add prepare_for_replace and finalize_replace
* alloca is removed in favor of mem_root allocation. This memory is reused
  across the rows.
* An rpl-related callback is added to the replace branch, meaning that an extra
check is made per replaced row even in the common case. It can be avoided with
templates if that is considered a problem.
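A minimal sketch of the caching idea in optimization 1, assuming a simplified key list; Key_sketch and the free-standing last_uniq_key below are illustrative placeholders, not the actual server types:

```cpp
#include <vector>

// Hypothetical stand-in for a table key: only the uniqueness flag matters here.
struct Key_sketch { bool is_unique; };

// Sketch mirroring last_uniq_key(): return the index of the last unique key
// in checking order, or -1 if the table has none. The point is that this is
// computed once per statement and reused for every written row, instead of
// re-scanning the key list on each row.
int last_uniq_key(const std::vector<Key_sketch> &keys) {
  for (int i = (int)keys.size() - 1; i >= 0; i--)
    if (keys[i].is_unique)
      return i;
  return -1;
}
```

Scanning from the end yields the last unique key directly, so the cached result stays valid for all rows of the statement.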
We had a protection against it, by allowing versioned delete if:
trx->id != table->vers_start_id()

For replace this check fails: replace calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is irrelevant.

The same problem hits Field::is_max, which may have checked the wrong record.

Fix:
* Refactor Field::is_max to optionally accept a pointer to the value as an argument.
* Refactor vers_start_id and vers_end_id to always accept a pointer to the
record. The difference from is_max is that is_max accepts a pointer to the
field data, rather than to the record.

Refactoring val_int() to accept the argument would take too much effort, so
instead the value is fetched directly from the record, as is done in
Field_longlong.
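To illustrate the refactoring, here is a minimal sketch of the optional-pointer pattern; Field_sketch is a hypothetical stand-in for the real Field class, and the all-ones "max" encoding is an assumption made for the example:

```cpp
#include <cstddef>

// Hypothetical simplified Field: ptr points into record[0] by default.
struct Field_sketch {
  const unsigned char *ptr;  // field data inside record[0]
  size_t len;

  // After the refactoring: callers that hold a different record buffer
  // (e.g. record[2] during REPLACE) can pass the field data pointer
  // explicitly; omitting it keeps the old record[0] behavior.
  bool is_max(const unsigned char *data = nullptr) const {
    if (!data)
      data = ptr;
    for (size_t i = 0; i < len; i++)
      if (data[i] != 0xFF)  // "max" value is all-ones in this sketch
        return false;
    return true;
  }
};
```

The default argument keeps every existing call site compiling unchanged while fixing the callers that were checking the wrong record.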
Timestamp-versioned row deletion was exposed to a collision problem: if the
current timestamp hasn't changed, a sequence of row delete+insert could
get a duplicate key error. The row delete would find another conflicting history
row and return an error.

This is true for both REPLACE and DELETE statements; however, in REPLACE the
"optimized" path is usually taken, especially in the tests. There, delete+insert
is substituted with a single versioned row update. In the end, both paths end up
as ha_update_row + ha_write_row.

The solution is to handle the history collision explicitly.

From the design perspective, the user shouldn't experience history rows loss,
unless there's a technical limitation.

By contrast, trx_id-based changes should never generate history within the same
transaction, see MDEV-15427.

If two operations on the same row happen so quickly that they get the same
timestamp, the history row shouldn't be lost. We can still write a
history row, though it will have row_start == row_end.

We cannot store more than one such historical row, as that would violate the
unique constraint on row_end. So we have to physically delete the row if
such a history row is already present.

In this commit:
1. Improve TABLE::delete_row to handle the history collision: if the update
   results in a duplicate error, delete the row for real.
2. Use TABLE::delete_row in the non-optimistic path of REPLACE, where the
   system-versioned case now belongs entirely.
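The two-step logic above can be sketched as follows; Versioned_table_sketch and its members are hypothetical placeholders for the TABLE/handler state, not the real API:

```cpp
// Outcome of a versioned delete in this sketch.
enum delete_result { HISTORY_WRITTEN, PHYSICALLY_DELETED };

// Hypothetical simplified table: history_collision models whether updating
// row_end to the current timestamp would hit the unique index on row_end.
struct Versioned_table_sketch {
  bool history_collision;

  delete_result delete_row() {
    // Normal versioned delete: update row_end to now(), turning the
    // current row into a history row.
    if (!history_collision)
      return HISTORY_WRITTEN;
    // Duplicate key on row_end: a history row at this exact timestamp
    // already exists and cannot coexist with another one, so keep the
    // existing history row and remove the current row physically.
    return PHYSICALLY_DELETED;
  }
};
```

The fallback trades one history row for a consistent unique index, which is the technical limitation mentioned in the design note above.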
during row insert

DB_FOREIGN_DUPLICATE_KEY in row_ins_duplicate_error_in_clust was
motivated by handling cascade changes during versioned operations.

It was found, though, that certain row_update_for_mysql calls could
return DB_FOREIGN_DUPLICATE_KEY even when there are no foreign relations.

Change DB_FOREIGN_DUPLICATE_KEY to DB_DUPLICATE_KEY in
row_ins_duplicate_error_in_clust.

It will be later converted to DB_FOREIGN_DUPLICATE_KEY in
row_ins_check_foreign_constraint if needed.

Additionally, ha_delete_row should return neither error code. This is now
ensured by an assertion.
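A minimal sketch of the resulting error flow; the enum and the two functions below are simplified stand-ins for the InnoDB routines named above, not their real signatures:

```cpp
// Simplified stand-in for the InnoDB error codes involved.
enum db_err { DB_SUCCESS, DB_DUPLICATE_KEY, DB_FOREIGN_DUPLICATE_KEY };

// Sketch of row_ins_duplicate_error_in_clust after the change: it reports
// a plain duplicate error and never the FOREIGN variant directly.
db_err duplicate_error_in_clust(bool duplicate_found) {
  return duplicate_found ? DB_DUPLICATE_KEY : DB_SUCCESS;
}

// Sketch of the conversion in row_ins_check_foreign_constraint: only when
// the insert was triggered by a foreign-key cascade is the plain duplicate
// error upgraded to DB_FOREIGN_DUPLICATE_KEY.
db_err check_foreign_constraint(db_err err, bool is_cascade) {
  if (err == DB_DUPLICATE_KEY && is_cascade)
    return DB_FOREIGN_DUPLICATE_KEY;
  return err;
}
```

Centralizing the upgrade in the foreign-constraint path means non-cascade updates can no longer leak the FOREIGN variant.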
@FooBarrior FooBarrior force-pushed the 10.5-nikita-MDEV-15990 branch from 6202709 to 93b3dcb Compare August 4, 2025 12:42
@FooBarrior FooBarrior merged commit c4b76b9 into 10.11 Aug 4, 2025
17 checks passed