-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use correct snapshot when scanning columnar tables #5154
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5154 +/- ##
==========================================
- Coverage 94.20% 94.20% -0.01%
==========================================
Files 214 214
Lines 42121 42133 +12
==========================================
+ Hits 39682 39693 +11
- Misses 2439 2440 +1 |
45da562
to
73058d3
Compare
5b1ba38
to
fba4998
Compare
Doing some manual testing to see how the changes in this pr would affect the things and found a bug that happens on master but not on this branch: drop table test;
create table test (a int unique) using columnar;
-- s1
begin;
insert into test select s from generate_series(1,2) s;
select * from test; -- force flush write state
-- s2
insert into test select s from generate_series(1,4) s;
ERROR: unexpected chunk group count This is because, on master, we use the snapshot provided to |
7a27802
to
bd8771a
Compare
9969605
to
c4fcb01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review my comment and see if there's a way to clarify the code a bit. Other than that, it looks good.
PushCopiedSnapshot(snapshot); | ||
|
||
/* now our snapshot is the active one */ | ||
UpdateActiveSnapshotCommandId(); | ||
snapshot = GetActiveSnapshot(); | ||
|
||
snapshotCreatedByUs = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little hard for me to follow what's happening here. If we aren't using GetActiveSnapshot() anywhere, why not just pop it unconditionally after updating the command ID? And I don't think EndRead is guaranteed to be called, so that means we can end up leaving a snapshot on the stack for no reason.
And if using the active stack is useful, why not always use it, so we don't have to keep track of whether to pop it or not? That would make one code path rather than two, which is more likely to be exercised thoroughly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hy not just pop it unconditionally after updating the command ID?
Agree, this makes a lot of sense, thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I guess pushing our snapshot to stack and not keeping it in readState seem to be the better one at first glance. That way, we can use GetActiveSnapshot()
during read operation.
In that case, we would need to call PopActiveSnapshot
in EndRead but I'm not sure what happens if EndRead is not guaranteed to be called.
Moreover, if snapshot is not an MVCC snapshot, then we don't push a new snapshot as the active one.
In that case, I think we would still need to keep snapshot in readState to use during the read operation.
If we go with the other option, i.e.: keeping snapshot in readState and immediately calling PopActiveSnapshot in BeginRead, I think this time we should make sure to Register / Unregister snapshot in BeginRead / EndRead. But then what happens if EndRead doesn't get called ?
For this reason, I think we should choose the second option, but not sure how to make sure we unregister the snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like ColumnarEndRead will be callled even if the executor node doesn't run to completion (e.g. limit or exists query). I thought I remembered a case where it might not be called, but I'm not seeing it now.
So the Push/Update/Pop/Register approach seems best. Let me know if you run into a problem case.
Also, I was just thinking about the |
PushCopiedSnapshot(snapshot); | ||
|
||
/* now our snapshot is the active one */ | ||
UpdateActiveSnapshotCommandId(); | ||
snapshot = GetActiveSnapshot(); | ||
|
||
snapshotCreatedByUs = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like ColumnarEndRead will be callled even if the executor node doesn't run to completion (e.g. limit or exists query). I thought I remembered a case where it might not be called, but I'm not seeing it now.
So the Push/Update/Pop/Register approach seems best. Let me know if you run into a problem case.
This comment made me do some testing with cursors, looks like changes in this pr doesn't help with the cases where Even if we've called I think I should open issue without attempting to solve this in this pr, what do you think @jeff-davis ? DROP TABLE columnar_table;
CREATE TABLE columnar_table(q1 int8, q2 int8) USING columnar;
INSERT INTO columnar_table VALUES (1, 2), (3, 4);
BEGIN;
DECLARE columnar_cursor CURSOR WITH HOLD FOR
SELECT * FROM columnar_table ORDER BY 1,2;
INSERT INTO columnar_table VALUES (5, 6), (7, 8);
-- bad, shows (5, 6) and (7, 8) as well :/
-- heapAM would only show (1, 2) & (3, 4)
FETCH ALL FROM columnar_cursor;
┌────┬────┐
│ q1 │ q2 │
├────┼────┤
│ 1 │ 2 │
│ 3 │ 4 │
│ 5 │ 6 │
│ 7 │ 8 │
└────┴────┘
(4 rows)
COMMIT; |
/* now our snapshot is the active one */ | ||
UpdateActiveSnapshotCommandId(); | ||
snapshot = GetActiveSnapshot(); | ||
RegisterSnapshot(snapshot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment that this doesn't fix the issue with cursors, open an issue as well
Instead of using xact snapshot, use the snapshot provided to columnarAM when scanning table.
Seems that we always increment the command counter right after finishing metadata table modification. For this reason, it makes sense to call CommandCounterIncrement within FinishModifyRelation.
In next commit, we will adjust curcid of the snapshot being used when scanning the columnar table. However, for index scan, snapshot is provided not when beginning scan but within fetch-tuple call. For this reason, start flushing pending writes in init_columnar_read_state since this seem to be a prerequisite step that needs to be done before scanning a columnar table regardless of the scan method being used.
Before starting to scan a columnar table, we always flush the pending writes to disk. However, we increment command counter after modifying metadata tables. On the other hand, now that we _don't always use_ xact snapshot to scan a columnar table, writes that we just flushed might not be visible to the query that just flushed pending writes to disk since curcid of provided snapshot would become smaller than the command id being used when modifying metadata tables. To give an example, before this change, below was a possible scenario due to the changes that we made to use the correct snapshot. ```sql CREATE TABLE t(a int, b int) USING columnar; BEGIN; INSERT INTO t VALUES (5, 10); SELECT * FROM t; ┌───┬───┐ │ a │ b │ ├───┼───┤ └───┴───┘ (0 rows) SELECT * FROM t; ┌───┬────┐ │ a │ b │ ├───┼────┤ │ 5 │ 10 │ └───┴────┘ (1 row) ```
bffcdda
to
bf4dfad
Compare
Fixes #5180.
i) Instead of using xact snapshot, use the snapshot provided to columnarAM
when scanning table. This is the main purpose of this pr, also see #5154 (comment).
ii) Let me also explain the tricky part of this pr (bf4dfad):
To avoid visibility issues around pending writes that are force-flushed
to disk when scanning the table in the same xact, update curcid of given
snapshot before starting to scan the columnar table.
The reason behind such visibility issues is that we increment command counter
after modifying columnar metadata tables when flushing pending writes.
Now that we don't always use xact snapshot to scan a columnar table,
writes that we just flushed might not be visible to the query that made
pending writes flushed to disk since curcid of provided snapshot would
become smaller than the command id being used when modifying columnar
metadata tables.
To give an example, before bf4dfad, below was a possible scenario
after the changes done in i):