Skip to content

Commit

Permalink
memtx: fix and refactor memtx_tx_history_rollback_stmt
Browse files Browse the repository at this point in the history
Rollback is rather complicated part if MVCC implementation that
is meant to handle two kinds of rollback:
* rollback from in-progress state, if box.rollback() is called.
* rollback from prepared state, when WAL fails.
Unfortunately the last one was not properly tested and surely
has at least one flaw. When an inserting transaction becomes
prepared its stories could be linked as deleted (via del_stmt
pointer) by other in-progress transactions in order to maintain
correct visibility. The problem is that in case of rollback of
such prepared transaction those links remained. Broken links
breaks the chain structure, and some older changes (that were
linked as deleted before that hapless preparation) can become
visible to other transaction.

Refactor, simplify a bit that part of code and fix the issue
described above; cover with tests.

Closes tarantool#8648

NO_DOC=bugfix
  • Loading branch information
alyapunov committed Jun 21, 2023
1 parent e901507 commit 85569d9
Show file tree
Hide file tree
Showing 3 changed files with 269 additions and 21 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
## bugfix/core

* Fixed a bug when MVCC rollback of prepared statement could break internal
invariants (gh-8648).
129 changes: 108 additions & 21 deletions src/box/memtx_tx.c
Original file line number Diff line number Diff line change
Expand Up @@ -1625,16 +1625,13 @@ memtx_tx_story_delete_is_visible(struct memtx_story *story, struct txn *txn,
{
*is_own_change = false;

bool was_deleted_by_prepared = false;
struct txn_stmt *dels = story->del_stmt;
while (dels != NULL) {
if (dels->txn == txn) {
/* Tuple is deleted by us (@txn). */
*is_own_change = true;
return true;
}
if (story->del_psn != 0 && dels->txn->psn == story->del_psn)
was_deleted_by_prepared = true;
dels = dels->next_in_del_list;
}

Expand All @@ -1645,7 +1642,7 @@ memtx_tx_story_delete_is_visible(struct memtx_story *story, struct txn *txn,
if (is_prepared_ok && story->del_psn != 0 && story->del_psn < rv_psn)
return true; /* Tuple is deleted by prepared TX. */

if (story->del_psn != 0 && !was_deleted_by_prepared &&
if (story->del_psn != 0 && story->del_stmt == NULL &&
story->del_psn < rv_psn)
return true; /* Tuple is deleted by committed TX. */

Expand Down Expand Up @@ -2183,30 +2180,74 @@ memtx_tx_history_remove_story_del_stmts(struct memtx_story *story)
static void
memtx_tx_history_rollback_added_story(struct txn_stmt *stmt)
{
assert(stmt->add_story != NULL);
assert(stmt->add_story->tuple == stmt->rollback_info.new_tuple);
struct memtx_story *story = stmt->add_story;
struct memtx_story *add_story = stmt->add_story;
struct memtx_story *del_story = stmt->del_story;

memtx_tx_history_remove_story_del_stmts(story);
story->add_stmt = NULL;
stmt->add_story = NULL;
/*
* In case of rollback of prepared statement we need to rollback
* preparation actions.
*/
if (stmt->txn->psn != 0) {
/*
* During preparation of this statement there were two cases:
* * del_story != NULL: all in-progress transactions that were
* to delete del_story were relinked to delete add_story.
* * del_story == NULL: all in-progress transactions that were
* to delete same nothing were relinked to delete add_story.
* See memtx_tx_history_prepare_insert_stmt for details.
* Note that by design of rollback, all statements are rolled
* back in reverse order, and thus at this point there can be no
* statements of the same transaction that deletes add_story.
* So we must scan delete statements and relink them to delete
* del_story if it's not NULL or to delete nothing otherwise.
*/
struct txn_stmt **from = &add_story->del_stmt;
while (*from != NULL) {
struct txn_stmt *test_stmt = *from;
assert(test_stmt->del_story == add_story);
assert(test_stmt->txn != stmt->txn);
assert(!test_stmt->is_own_change);
assert(test_stmt->txn->psn == 0);

/* Unlink from add_story list. */
*from = test_stmt->next_in_del_list;
test_stmt->next_in_del_list = NULL;
test_stmt->del_story = NULL;

if (del_story != NULL) {
/* Link to del_story's list. */
memtx_tx_story_link_deleted_by(del_story,
test_stmt);
}
}

/* Revert psn assignment. */
add_story->add_psn = 0;
if (del_story != NULL)
del_story->del_psn = 0;
}

/* Unlink stories from the statement. */
memtx_tx_story_unlink_added_by(add_story, stmt);
if (del_story != NULL)
memtx_tx_story_unlink_deleted_by(del_story, stmt);

/*
* Sink the story to the end of chain and mark is as deleted long
* time ago (with some very low del_psn). After that the story will
* be invisible to any reader (that's what is needed) and still be
* able to store read set, if necessary.
*/
for (uint32_t i = 0; i < story->index_count; ) {
struct memtx_story *old_story = story->link[i].older_story;
for (uint32_t i = 0; i < add_story->index_count; ) {
struct memtx_story *old_story = add_story->link[i].older_story;
if (old_story == NULL) {
/* Old story is absent. */
i++; /* Go to the next index. */
continue;
}
memtx_tx_story_reorder(story, old_story, i);
memtx_tx_story_reorder(add_story, old_story, i);
}
story->del_psn = MEMTX_TX_ROLLBACKED_PSN;
add_story->del_psn = MEMTX_TX_ROLLBACKED_PSN;
}

/*
Expand All @@ -2215,22 +2256,67 @@ memtx_tx_history_rollback_added_story(struct txn_stmt *stmt)
static void
memtx_tx_history_rollback_deleted_story(struct txn_stmt *stmt)
{
struct memtx_story *story = stmt->del_story;
struct memtx_story *del_story = stmt->del_story;

/*
* There can be no more than one prepared statement deleting a story at
* any point in time.
* In case of rollback of prepared statement we need to rollback
* preparation actions.
*/
assert(story->del_psn == 0 || story->del_psn == stmt->txn->psn);
story->del_psn = 0;
memtx_tx_story_unlink_deleted_by(story, stmt);
if (stmt->txn->psn != 0) {
/*
* During preparation of deletion we could unlink other
* transactions that want to overwrite this story. Now we have
* to restore the link. Replace-like statements can be found in
* the story chain of primary index. Unfortunately DELETE
* statements cannot be found since after unlink they are not
* present in chains. The good news is that by design all their
* transactions are surely conflicted because of read-write
* conflict and thus does not matter anymore.
*/
struct memtx_story *test_story;
for (test_story = del_story->link[0].newer_story;
test_story != NULL;
test_story = test_story->link[0].newer_story) {
struct txn_stmt *test_stmt = test_story->add_stmt;
if (test_stmt->is_own_change)
continue;
assert(test_stmt->txn != stmt->txn);
assert(test_stmt->del_story == NULL);
assert(test_stmt->txn->psn == 0);
memtx_tx_story_link_deleted_by(del_story, test_stmt);
}

/* Revert psn assignment. */
del_story->del_psn = 0;
}

/* Unlink the story from the statement. */
memtx_tx_story_unlink_deleted_by(del_story, stmt);
}

void
memtx_tx_history_rollback_stmt(struct txn_stmt *stmt)
{
/* Consistency asserts. */
if (stmt->add_story != NULL) {
assert(stmt->add_story->tuple == stmt->rollback_info.new_tuple);
assert(stmt->add_story->add_psn == stmt->txn->psn);
}
if (stmt->del_story != NULL)
assert(stmt->del_story->del_psn == stmt->txn->psn);
/*
* There can be no more than one prepared statement deleting a story at
* any point in time.
*/
assert(stmt->txn->psn == 0 || stmt->next_in_del_list == NULL);

/*
* Note that both add_story and del_story can be NULL,
* see comment in memtx_tx_history_prepare_stmt.
*/
if (stmt->add_story != NULL)
memtx_tx_history_rollback_added_story(stmt);
if (stmt->del_story != NULL)
else if (stmt->del_story != NULL)
memtx_tx_history_rollback_deleted_story(stmt);
assert(stmt->add_story == NULL && stmt->del_story == NULL);
}
Expand Down Expand Up @@ -2581,6 +2667,7 @@ memtx_tx_history_commit_stmt(struct txn_stmt *stmt, size_t *bsize)
memtx_tx_story_unlink_added_by(stmt->add_story, stmt);
}
if (stmt->del_story != NULL) {
assert(stmt->del_story->del_stmt == stmt);
*bsize -= tuple_bsize(stmt->del_story->tuple);
memtx_tx_story_unlink_deleted_by(stmt->del_story, stmt);
}
Expand Down
157 changes: 157 additions & 0 deletions test/box-luatest/gh_8648_memtx_mvcc_rollback_prepared_test.lua
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
local server = require('luatest.server')
local t = require('luatest')

local g = t.group()

g.before_all(function(cg)
cg.server = server:new{
alias = 'default',
box_cfg = {memtx_use_mvcc_engine = true}
}
cg.server:start()
end)

g.after_all(function(cg)
cg.server:drop()
end)

g.before_each(function(cg)
cg.server:exec(function()
box.schema.create_space('test'):create_index('pk')
end)
end)

g.after_each(function(cg)
cg.server:exec(function()
box.space.test:drop()
end)
end)

-- Test that ensures that rollback of prepared works correctly.
-- Case when there was a tuple before transactions.
g.test_rollback_prepared_simple_with_tuple = function(cg)
t.tarantool.skip_if_not_debug()
cg.server:exec(function()
box.space.test:create_index('sk', {parts={{2}}})
local txn_proxy = require("test.box.lua.txn_proxy")

box.space.test:replace{1, 0, 0}

local tx1 = txn_proxy.new()
tx1:begin()
tx1('box.space.test:replace{1, 1, 1}')
-- {1, 0, 0} is invisible since it is replaced by {1, 1, 1}
t.assert_equals(tx1('box.space.test.index.sk:select{0}'), {{}})

local tx2 = txn_proxy.new()
tx2:begin()
tx2('box.space.test:replace{1, 2, 2}')

-- Prepare and rollback tx2.
box.error.injection.set('ERRINJ_WAL_WRITE', true)
tx2:commit()
box.error.injection.set('ERRINJ_WAL_WRITE', false)

-- {1, 0, 0} must remain invisible since it is replaced by {1, 1, 1}
t.assert_equals(tx1('box.space.test.index.sk:select{0}'), {{}})
-- Must be successful
t.assert_equals(tx1:commit(), "")
end)
end

-- Test that ensures that rollback of prepared works correctly.
-- Case when there were no tuple before transactions.
g.test_rollback_prepared_simple_without_tuple = function(cg)
t.tarantool.skip_if_not_debug()
cg.server:exec(function()
box.space.test:create_index('sk', {parts={{2}}})
local txn_proxy = require("test.box.lua.txn_proxy")

local tx1 = txn_proxy.new()
tx1:begin()
tx1('box.space.test:replace{11, 21}')

local tx2 = txn_proxy.new()
tx2:begin()
tx2('box.space.test:replace{11, 22}')

-- Prepare and rollback tx2.
box.error.injection.set('ERRINJ_WAL_WRITE', true)
tx2:commit()
box.error.injection.set('ERRINJ_WAL_WRITE', false)

tx1:rollback()

-- Nothing of the above must be visible since everything is rollbacked.
t.assert_equals(box.space.test.index.pk:select{11}, {})
t.assert_equals(box.space.test.index.sk:select{21}, {})
t.assert_equals(box.space.test.index.sk:select{22}, {})
end)
end

-- Test that ensures that rollback of prepared works correctly.
-- Case with rollback of prepared DELETE statement.
g.test_rollback_prepared_simple_delete = function(cg)
t.tarantool.skip_if_not_debug()
cg.server:exec(function()
box.space.test:create_index('sk', {parts={{2}}})
local txn_proxy = require("test.box.lua.txn_proxy")

box.space.test:replace{21, 30}
local tx1 = txn_proxy.new()
tx1:begin()
tx1('box.space.test:replace{21, 31}')
-- tx1 must overwrite {21, 30}.
t.assert_equals(tx1('box.space.test.index.sk:select{30}'), {{}})

local tx2 = txn_proxy.new()
tx2:begin()
tx2('box.space.test:delete{21}')

-- Prepare and rollback tx2.
box.error.injection.set('ERRINJ_WAL_WRITE', true)
tx2:commit()
box.error.injection.set('ERRINJ_WAL_WRITE', false)

-- tx1 must continue to overwrite {21, 30}.
t.assert_equals(tx1('box.space.test.index.sk:select{30}'), {{}})

tx1:rollback()
end)
end

-- The first test from issue.
g.test_rollback_prepared_complicated = function(cg)
t.tarantool.skip_if_not_debug()
cg.server:exec(function()
local txn_proxy = require("test.box.lua.txn_proxy")

box.space.test:replace{1, 1, 1}

-- tx1 is made just for preventing further GC of {1, 1, 1} story.
local tx1 = txn_proxy.new()
tx1:begin()
tx1('box.space.test:select{1}') -- {1, 1, 1}

box.space.test:delete{1} -- tx1 goes to read view and owns {1, 1, 1}

local tx2 = txn_proxy.new()
tx2:begin()
tx2('box.space.test:replace{1, 2, 2}')

local tx3 = txn_proxy.new()
tx3:begin()
tx3('box.space.test:replace{1, 3, 3}')

-- Prepare and rollback tx2.
box.error.injection.set('ERRINJ_WAL_WRITE', true)
t.assert_equals(tx2:commit(), {{error = "Failed to write to disk"}})
box.error.injection.set('ERRINJ_WAL_WRITE', false)

-- Must be OK
t.assert_equals(tx3:rollback(), "")

-- Must return {}
t.assert_equals(box.space.test:select{1}, {})
end)
end

0 comments on commit 85569d9

Please sign in to comment.