Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs which could not create unique index "gp_fastsequence_objid_objmod_index" #146

Merged
merged 4 commits into from
Sep 11, 2023

Conversation

lss602726449
Copy link
Contributor

@lss602726449 lss602726449 commented Aug 14, 2023

closes: #132


Change logs

As indicated by the test in the issue. The same key exists in gp_fastsequence. The root cause is that a crash occurred when a tuple was inserted in gp_fastsequence in the test. Because it inserted a frozen tuple, and the tuple insertion and index update were not atomic. We can reproduce this problem by killing the server when the index is not inserted after inserting the tuple. To solve this problem, we can solve this problem by inserting a simple tuple and index (that is, the serevr crash triggers transaction rollback at this time), and after inserting the index, update it to a frozen tuple.

Why are the changes needed?

Fix the inconsistency of index and tuple in gp_fastsequence.

Does this PR introduce any user-facing change?

No

How was this patch tested?

cherrypick from GPDB

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to @cloudberrydb/dev team for review and approval when your PR is ready🥳

@tuhaihe
Copy link
Member

tuhaihe commented Aug 14, 2023

Hi @lss602726449, Thank you for your efforts! Please revise both your pull request and commit message's title and body to ensure they are thorough and easy to understand. You can follow our commit conventions as a guide.

@lss602726449 lss602726449 changed the title reindex [regression reindex_gpfastsequence] could not create unique index "gp_fastsequence_objid_objmod_index" Aug 15, 2023
@lss602726449 lss602726449 changed the title [regression reindex_gpfastsequence] could not create unique index "gp_fastsequence_objid_objmod_index" Fix bugs which could not create unique index "gp_fastsequence_objid_objmod_index" Aug 15, 2023
@yjhjstz
Copy link
Collaborator

yjhjstz commented Aug 15, 2023

root cause is ?

@lss602726449
Copy link
Contributor Author

root cause is ?

Maybe the root cause is geting the wrong snapshot in the reindex. So that it gets two version of the same tuple to build the unique index. So error ocuurs. And due to cannot reproduct in the lcoal computer. We try to reproduct it in the CI pipeline.

@CLAassistant
Copy link

CLAassistant commented Aug 17, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ lss602726449
❌ ashwinstar
❌ huansong
You have signed the CLA already but the status is still pending? Let us recheck it.

ashwinstar and others added 4 commits August 23, 2023 10:54
gp_fastsequence is populated and kept updated during inserts. No need
exist to write/update gp_fastsequence during index builds if block
directory needs to be created. Seems this code was added way back in
history when gp_fastsequence was introduced and to cover the upgrades
required to add/update entry in gp_fastsequence. No more such need
exist hence deleting this code.

Also, deleting code to initialize gp_fastsequence with 0 as
last_sequence during InsertInitialSegnoEntry(). No having tuple in
gp_fastsequence can already provide that behavior so no need to add
entry with 0 and then update the same with next value later during
insert. Helps to complete delete InsertFastSequenceEntry() interface.
When we insert into gp_fastsequence, we insert a frozen row
(xmin=FrozenTransactionXid) and then update the index. That creates a
possibility of inconsistency in presence of server crash: if the WAL
for the frozen insert is flushed to persistent storage but the index
is not, after crash recovery we would be able to see the frozen row
in seqscan, but not in indexscan.

Now fixing this problem by doing regular heap_insert (normal MVCC) and
update the index first. Only after that we mark the row to be frozen.
That way we remove the inconsistency: if we crashed before we mark the
row frozen, the row follows regular MVCC and we won't be able to see it;
or if we crashed after we marked it frozen (and also it made to the disk),
we would replay both table row and index, so we would see the row in both
seqscan and indexscan.

Note that, now we use the upstream-aligned way to mark the tuple frozen
which is infomask |= HEAP_XMIN_FROZEN (commit 37484ad), instead of
setting xmin = FrozenTransactionId. The freeze effect should be the same
though, but it helps leave behind a trail as to which transaction did the
freezing.

In addition, we use the same WAL type XLOG_HEAP2_FREEZE_PAGE as vacuum,
so we also align with upstream in how to replay the frozen tuple.
The alternative was to use the WAL type for heap_inplace_update, but then
we would need to change it to update the header portion as well as data
portion, which would be an unpleasant difference in WAL semantics.

The only other thing that has been changed from the previous way of doing
frozen_heap_insert is that now we don't set XLogRecord.xl_xid to
FrozenTransactionId anymore. However, this should be OK because:
(1) we are not setting xim=FrozenTransactionId anymore, the WAL record
should remain consistent.
(2) the change where we were setting this field in Greenplum seems to be
outdated itself. Upstream doesn't have even the capability to do that: it
never passes an arbitrary XID to that field, it's always
GetCurrentTransactionIdIfAny() (see XLogRecordAssemble()), even if you
insert a tuple w/ xmin=FrozenTransactionId (e.g. for PG sequence, see
fill_seq_with_data()). GPDB doesn't have a reason to differ.

Finally, need to make a fewn adjustment in the regress test output where
xmin is queried for gp_fastsequence.

Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Relations such as temporary tables or unlogged tables should not have
WALs created.
Due to the different definition in PG12 and PG14 for struct RelFileNode,
There is different length for XLogRecord. See function XLogRecordAssemble
in detail.
@my-ship-it my-ship-it merged commit ae1821e into cloudberrydb:main Sep 11, 2023
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[regression reindex_gpfastsequence] could not create unique index "gp_fastsequence_objid_objmod_index"
9 participants