Use correct snapshot when scanning columnar tables #5154

onurctirtir · 2021-07-30T15:15:13Z

i) Instead of using xact snapshot, use the snapshot provided to columnarAM
when scanning table. This is the main purpose of this pr, also see #5154 (comment).

ii) Let me also explain the tricky part of this pr (bf4dfad):

To avoid visibility issues around pending writes that are force-flushed
to disk when scanning the table in the same xact, update curcid of given
snapshot before starting to scan the columnar table.

The reason behind such visibility issues is that we increment command counter
after modifying columnar metadata tables when flushing pending writes.

Now that we don't always use xact snapshot to scan a columnar table,
writes that we just flushed might not be visible to the query that made
pending writes flushed to disk since curcid of provided snapshot would
become smaller than the command id being used when modifying columnar
metadata tables.

To give an example, before bf4dfad, below was a possible scenario
after the changes done in i):

CREATE TABLE t(a int, b int) USING columnar;
BEGIN;
  INSERT INTO t VALUES (5, 10);

  SELECT * FROM t;
  ┌───┬───┐
  │ a │ b │
  ├───┼───┤
  └───┴───┘
  (0 rows)

  SELECT * FROM t;
  ┌───┬────┐
  │ a │ b  │
  ├───┼────┤
  │ 5 │ 10 │
  └───┴────┘
  (1 row)

codecov · 2021-07-30T15:15:31Z

Codecov Report

Merging #5154 (1a9cb20) into master (262f893) will decrease coverage by 0.00%.
The diff coverage is 85.71%.

❗ Current head 1a9cb20 differs from pull request most recent head 21235c1. Consider uploading reports for the commit 21235c1 to get more accurate results

@@            Coverage Diff             @@
##           master    #5154      +/-   ##
==========================================
- Coverage   94.20%   94.20%   -0.01%     
==========================================
  Files         214      214              
  Lines       42121    42133      +12     
==========================================
+ Hits        39682    39693      +11     
- Misses       2439     2440       +1

onurctirtir · 2021-08-03T14:43:39Z

Doing some manual testing to see how the changes in this pr would affect the things and found a bug that happens on master but not on this branch:

drop table test;
create table test (a int unique) using columnar;

-- s1
begin;
insert into test select s from generate_series(1,2) s;
select * from test; -- force flush write state

-- s2
insert into test select s from generate_series(1,4) s;
ERROR:  unexpected chunk group count

This is because, on master, we use the snapshot provided to columnar_index_fetch_tuple to determine number of chunk groups that stripe has. However, again on master, ReadChunkGroupRowCounts passes snapshot as NULL (probably means using catalog snapshot) when scanning chunk group metadata and above error is thrown.

src/backend/columnar/columnar_tableam.c

jeff-davis

Please review my comment and see if there's a way to clarify the code a bit. Other than that, it looks good.

jeff-davis · 2021-08-24T07:36:43Z

src/backend/columnar/columnar_tableam.c

+		PushCopiedSnapshot(snapshot);
+
+		/* now our snapshot is the active one */
+		UpdateActiveSnapshotCommandId();
+		snapshot = GetActiveSnapshot();
+
+		snapshotCreatedByUs = true;


It's a little hard for me to follow what's happening here. If we aren't using GetActiveSnapshot() anywhere, why not just pop it unconditionally after updating the command ID? And I don't think EndRead is guaranteed to be called, so that means we can end up leaving a snapshot on the stack for no reason.

And if using the active stack is useful, why not always use it, so we don't have to keep track of whether to pop it or not? That would make one code path rather than two, which is more likely to be exercised thoroughly.

hy not just pop it unconditionally after updating the command ID?

Agree, this makes a lot of sense, thank you.

Then I guess pushing our snapshot to stack and not keeping it in readState seem to be the better one at first glance. That way, we can use GetActiveSnapshot() during read operation.
In that case, we would need to call PopActiveSnapshot in EndRead but I'm not sure what happens if EndRead is not guaranteed to be called.
Moreover, if snapshot is not an MVCC snapshot, then we don't push a new snapshot as the active one.
In that case, I think we would still need to keep snapshot in readState to use during the read operation.

If we go with the other option, i.e.: keeping snapshot in readState and immediately calling PopActiveSnapshot in BeginRead, I think this time we should make sure to Register / Unregister snapshot in BeginRead / EndRead. But then what happens if EndRead doesn't get called ?

For this reason, I think we should choose the second option, but not sure how to make sure we unregister the snapshot.

It looks like ColumnarEndRead will be callled even if the executor node doesn't run to completion (e.g. limit or exists query). I thought I remembered a case where it might not be called, but I'm not seeing it now.

So the Push/Update/Pop/Register approach seems best. Let me know if you run into a problem case.

jeff-davis · 2021-08-31T00:50:21Z

Also, I was just thinking about the WITH HOLD cursor case. I don't think we even have tests for that, but it seems to basically work now. We might need to be careful about what resource owner owns any snapshot, etc.

jeff-davis · 2021-08-31T00:48:01Z

src/backend/columnar/columnar_tableam.c

+		PushCopiedSnapshot(snapshot);
+
+		/* now our snapshot is the active one */
+		UpdateActiveSnapshotCommandId();
+		snapshot = GetActiveSnapshot();
+
+		snapshotCreatedByUs = true;


It looks like ColumnarEndRead will be callled even if the executor node doesn't run to completion (e.g. limit or exists query). I thought I remembered a case where it might not be called, but I'm not seeing it now.

So the Push/Update/Pop/Register approach seems best. Let me know if you run into a problem case.

onurctirtir · 2021-08-31T09:59:02Z

Also, I was just thinking about the WITH HOLD cursor case. I don't think we even have tests for that, but it seems to basically work now. We might need to be careful about what resource owner owns any snapshot, etc.

This comment made me do some testing with cursors, looks like changes in this pr doesn't help with the cases where FETCH ALL force-flushes pending writes. Note that below scenario is also reproduceable on older versions (at least on v10.1.2).
The root cause is that we build the readState when reading the first row, so we don't use the snaphot provided when running DECLARE command.

Even if we've called init_columnar_read_state in begin_read, we would still encounter with this issue when doing index-scan. This is because, index_begin_read doesn't provide a snapshot. For this reason, we have to use the snapshot passed to index_fetch_tuple there, which first calls init_columnar_read_state.

I think I should open issue without attempting to solve this in this pr, what do you think @jeff-davis ?

DROP TABLE columnar_table;
CREATE TABLE columnar_table(q1 int8, q2 int8) USING columnar;
INSERT INTO columnar_table VALUES (1, 2), (3, 4);

BEGIN;
  DECLARE columnar_cursor CURSOR WITH HOLD FOR
  SELECT * FROM columnar_table ORDER BY 1,2;

  INSERT INTO columnar_table VALUES (5, 6), (7, 8);

  -- bad, shows (5, 6) and (7, 8) as well :/
  -- heapAM would only show (1, 2) & (3, 4)  
  FETCH ALL FROM columnar_cursor;
┌────┬────┐
│ q1 │ q2 │
├────┼────┤
│  1 │  2 │
│  3 │  4 │
│  5 │  6 │
│  7 │  8 │
└────┴────┘
(4 rows)
COMMIT;

onurctirtir · 2021-09-01T18:15:16Z

src/backend/columnar/columnar_tableam.c

+		/* now our snapshot is the active one */
+		UpdateActiveSnapshotCommandId();
+		snapshot = GetActiveSnapshot();
+		RegisterSnapshot(snapshot);


comment that this doesn't fix the issue with cursors, open an issue as well

Instead of using xact snapshot, use the snapshot provided to columnarAM when scanning table.

Seems that we always increment the command counter right after finishing metadata table modification. For this reason, it makes sense to call CommandCounterIncrement within FinishModifyRelation.

In next commit, we will adjust curcid of the snapshot being used when scanning the columnar table. However, for index scan, snapshot is provided not when beginning scan but within fetch-tuple call. For this reason, start flushing pending writes in init_columnar_read_state since this seem to be a prerequisite step that needs to be done before scanning a columnar table regardless of the scan method being used.

Before starting to scan a columnar table, we always flush the pending writes to disk. However, we increment command counter after modifying metadata tables. On the other hand, now that we _don't always use_ xact snapshot to scan a columnar table, writes that we just flushed might not be visible to the query that just flushed pending writes to disk since curcid of provided snapshot would become smaller than the command id being used when modifying metadata tables. To give an example, before this change, below was a possible scenario due to the changes that we made to use the correct snapshot. ```sql CREATE TABLE t(a int, b int) USING columnar; BEGIN; INSERT INTO t VALUES (5, 10); SELECT * FROM t; ┌───┬───┐ │ a │ b │ ├───┼───┤ └───┴───┘ (0 rows) SELECT * FROM t; ┌───┬────┐ │ a │ b │ ├───┼────┤ │ 5 │ 10 │ └───┴────┘ (1 row) ```

onurctirtir added the columnar label Jul 30, 2021

onurctirtir requested review from jeff-davis and thanodnl July 30, 2021 21:47

onurctirtir force-pushed the col/optimize-index-read branch from 45da562 to 73058d3 Compare August 2, 2021 08:02

Base automatically changed from col/optimize-index-read to master August 2, 2021 08:06

onurctirtir force-pushed the col/use-correct-snapshot branch from 5b1ba38 to fba4998 Compare August 2, 2021 08:54

onurctirtir added the bug label Aug 3, 2021

onurctirtir commented Aug 10, 2021

View reviewed changes

src/backend/columnar/columnar_tableam.c Outdated Show resolved Hide resolved

onurctirtir force-pushed the col/use-correct-snapshot branch from 7a27802 to bd8771a Compare August 16, 2021 12:43

This was referenced Aug 17, 2021

Split columnar stripe reservation into two phases #5188

Merged

Columnar self-insert sometimes runs forever #5180

Closed

onurctirtir force-pushed the col/use-correct-snapshot branch 8 times, most recently from 9969605 to c4fcb01 Compare August 19, 2021 10:08

jeff-davis approved these changes Aug 24, 2021

View reviewed changes

jeff-davis approved these changes Aug 31, 2021

View reviewed changes

onurctirtir commented Sep 1, 2021

View reviewed changes

onurctirtir mentioned this pull request Sep 2, 2021

Subsequent writes become visible to cursor when reading columnar tables #5231

Open

onurctirtir added 3 commits September 2, 2021 11:10

Use correct snapshot when reading a columnar table

0b4ed07

Instead of using xact snapshot, use the snapshot provided to columnarAM when scanning table.

Increment command counter in FinishModifyRelation instead

db0e4ce

Seems that we always increment the command counter right after finishing metadata table modification. For this reason, it makes sense to call CommandCounterIncrement within FinishModifyRelation.

onurctirtir force-pushed the col/use-correct-snapshot branch from bffcdda to bf4dfad Compare September 2, 2021 08:12

onurctirtir merged commit 0bf2920 into master Sep 2, 2021

onurctirtir deleted the col/use-correct-snapshot branch September 2, 2021 08:20

onurctirtir added this to the 10.2 Release milestone Sep 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use correct snapshot when scanning columnar tables #5154

Use correct snapshot when scanning columnar tables #5154

onurctirtir commented Jul 30, 2021 •

edited

Loading

codecov bot commented Jul 30, 2021 •

edited

Loading

onurctirtir commented Aug 3, 2021 •

edited

Loading

jeff-davis left a comment

jeff-davis Aug 24, 2021

onurctirtir Aug 24, 2021

onurctirtir Aug 24, 2021 •

edited

Loading

jeff-davis Aug 31, 2021

jeff-davis commented Aug 31, 2021

jeff-davis Aug 31, 2021

onurctirtir commented Aug 31, 2021 •

edited

Loading

onurctirtir Sep 1, 2021

Use correct snapshot when scanning columnar tables #5154

Use correct snapshot when scanning columnar tables #5154

Conversation

onurctirtir commented Jul 30, 2021 • edited Loading

codecov bot commented Jul 30, 2021 • edited Loading

Codecov Report

onurctirtir commented Aug 3, 2021 • edited Loading

jeff-davis left a comment

Choose a reason for hiding this comment

jeff-davis Aug 24, 2021

Choose a reason for hiding this comment

onurctirtir Aug 24, 2021

Choose a reason for hiding this comment

onurctirtir Aug 24, 2021 • edited Loading

Choose a reason for hiding this comment

jeff-davis Aug 31, 2021

Choose a reason for hiding this comment

jeff-davis commented Aug 31, 2021

jeff-davis Aug 31, 2021

Choose a reason for hiding this comment

onurctirtir commented Aug 31, 2021 • edited Loading

onurctirtir Sep 1, 2021

Choose a reason for hiding this comment

onurctirtir commented Jul 30, 2021 •

edited

Loading

codecov bot commented Jul 30, 2021 •

edited

Loading

onurctirtir commented Aug 3, 2021 •

edited

Loading

onurctirtir Aug 24, 2021 •

edited

Loading

onurctirtir commented Aug 31, 2021 •

edited

Loading