Handling Unchanged TOAST Columns as a part MIRROR for CDC #111
nexus/sqlparser-rs
revert this file
@@ -388,3 +367,1097 @@ func (s *E2EPeerFlowTestSuite) Test_Complete_Simple_Flow() {

env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_BQ() {
🚫 [golangci] reported by reviewdog 🐶
371-440 lines are duplicate of e2e/peer_flow_test.go:442-505
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Nochanges_BQ() {
🚫 [golangci] reported by reviewdog 🐶
442-505 lines are duplicate of e2e/peer_flow_test.go:507-581
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_1_BQ() {
🚫 [golangci] reported by reviewdog 🐶
507-581 lines are duplicate of e2e/peer_flow_test.go:610-678
(dupl)
}
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_2_BQ() {
🚫 [golangci] reported by reviewdog 🐶
610-678 lines are duplicate of e2e/peer_flow_test.go:680-748
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_3_BQ() {
🚫 [golangci] reported by reviewdog 🐶
680-748 lines are duplicate of e2e/peer_flow_test.go:371-440
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_1_SF() {
🚫 [golangci] reported by reviewdog 🐶
1100-1175 lines are duplicate of e2e/peer_flow_test.go:1177-1246
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_2_SF() {
🚫 [golangci] reported by reviewdog 🐶
1177-1246 lines are duplicate of e2e/peer_flow_test.go:1248-1317
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_3_SF() {
🚫 [golangci] reported by reviewdog 🐶
1248-1317 lines are duplicate of e2e/peer_flow_test.go:962-1032
(dupl)
@@ -388,3 +370,1097 @@ func (s *E2EPeerFlowTestSuite) Test_Complete_Simple_Flow() {

env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_BQ() {
🚫 [golangci] reported by reviewdog 🐶
374-443 lines are duplicate of e2e/peer_flow_test.go:445-508
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Nochanges_BQ() {
🚫 [golangci] reported by reviewdog 🐶
445-508 lines are duplicate of e2e/peer_flow_test.go:510-584
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_1_BQ() {
🚫 [golangci] reported by reviewdog 🐶
510-584 lines are duplicate of e2e/peer_flow_test.go:613-681
(dupl)
}
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_2_BQ() {
🚫 [golangci] reported by reviewdog 🐶
613-681 lines are duplicate of e2e/peer_flow_test.go:683-751
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_3_BQ() {
🚫 [golangci] reported by reviewdog 🐶
683-751 lines are duplicate of e2e/peer_flow_test.go:374-443
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Nochanges_SF() {
🚫 [golangci] reported by reviewdog 🐶
1037-1320 lines are duplicate of e2e/peer_flow_test.go:965-1249
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_1_SF() {
🚫 [golangci] reported by reviewdog 🐶
1103-1178 lines are duplicate of e2e/peer_flow_test.go:1180-1249
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_2_SF() {
🚫 [golangci] reported by reviewdog 🐶
1180-1249 lines are duplicate of e2e/peer_flow_test.go:1251-1320
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_3_SF() {
🚫 [golangci] reported by reviewdog 🐶
1251-1320 lines are duplicate of e2e/peer_flow_test.go:965-1035
(dupl)
s.sfHelper = sfHelper

// for every test, drop the _PEERDB_INTERNAL schema
s.sfHelper.client.DropSchema("_PEERDB_INTERNAL")
🚫 [golangci] reported by reviewdog 🐶
Error return value of s.sfHelper.client.DropSchema is not checked (errcheck)
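A minimal sketch of how the errcheck finding above could be addressed, assuming `DropSchema` returns an `error`. The `schemaDropper` interface, `fakeClient`, and `resetInternalSchema` are hypothetical names used only for illustration; the real client type in the PR may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// schemaDropper is a hypothetical stand-in for the Snowflake test client.
// Assumption: the real DropSchema method returns an error.
type schemaDropper interface {
	DropSchema(name string) error
}

// fakeClient is an illustrative implementation used to exercise both paths.
type fakeClient struct{ fail bool }

func (c fakeClient) DropSchema(name string) error {
	if c.fail {
		return errors.New("drop failed")
	}
	return nil
}

// resetInternalSchema drops the internal schema and returns any error to the
// caller instead of silently discarding it.
func resetInternalSchema(c schemaDropper) error {
	if err := c.DropSchema("_PEERDB_INTERNAL"); err != nil {
		return fmt.Errorf("failed to drop _PEERDB_INTERNAL schema: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(resetInternalSchema(fakeClient{fail: false}))
	fmt.Println(resetInternalSchema(fakeClient{fail: true}))
}
```

In a test suite's setup method, the returned error would typically be passed to the suite's failure helper so a failed cleanup stops the test early.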
matchData: "",
batchID: syncBatchID,
stagingBatchID: stagingBatchID,
unchangedToastColumns: utils.KeysToString(r.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString
matchData: string(oldItemsJSON),
batchID: syncBatchID,
stagingBatchID: stagingBatchID,
unchangedToastColumns: utils.KeysToString(r.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString
matchData: string(itemsJSON),
batchID: syncBatchID,
stagingBatchID: stagingBatchID,
unchangedToastColumns: utils.KeysToString(r.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString) (typecheck)
matchData: "",
batchID: syncBatchID,
items: typedRecord.Items,
unchangedToastColumns: utils.KeysToString(typedRecord.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString
matchData: string(oldItemsJSON),
batchID: syncBatchID,
items: typedRecord.NewItems,
unchangedToastColumns: utils.KeysToString(typedRecord.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString
matchData: string(oldItemsJSON),
batchID: syncBatchID,
stagingBatchID: stagingBatchID,
unchangedToastColumns: utils.KeysToString(r.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString
matchData: string(itemsJSON),
batchID: syncBatchID,
stagingBatchID: stagingBatchID,
unchangedToastColumns: utils.KeysToString(r.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString) (typecheck)
matchData: "",
batchID: syncBatchID,
items: typedRecord.Items,
unchangedToastColumns: utils.KeysToString(typedRecord.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString
matchData: string(oldItemsJSON),
batchID: syncBatchID,
items: typedRecord.NewItems,
unchangedToastColumns: utils.KeysToString(typedRecord.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString
matchData: string(itemsJSON),
batchID: syncBatchID,
items: typedRecord.Items,
unchangedToastColumns: utils.KeysToString(typedRecord.UnchangedToastColumns),
🚫 [golangci] reported by reviewdog 🐶
undefined: utils.KeysToString) (typecheck)
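The `undefined: utils.KeysToString` findings above indicate that the helper was missing from the `utils` package at that commit. A minimal sketch of what such a helper might look like, assuming `UnchangedToastColumns` is a `map[string]bool` used as a set and that a comma-separated string is wanted. Both the parameter type and the separator are assumptions, not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KeysToString is a hypothetical stand-in for the missing utils.KeysToString.
// Assumption: the input is a set of column names stored as map keys.
// Keys are sorted so the output does not depend on map iteration order.
func KeysToString(m map[string]bool) string {
	if len(m) == 0 {
		return ""
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return strings.Join(keys, ",")
}

func main() {
	fmt.Println(KeysToString(map[string]bool{"t2": true, "t1": true})) // prints "t1,t2"
}
```

Sorting matters here if the resulting string is later used to group records that share the same set of unchanged columns, since Go map iteration order is not stable.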
This reverts commit e9856d5.
@@ -388,3 +381,1097 @@ func (s *E2EPeerFlowTestSuite) Test_Complete_Simple_Flow() {

env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_BQ() {
🚫 [golangci] reported by reviewdog 🐶
385-454 lines are duplicate of e2e/peer_flow_test.go:456-519
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Nochanges_BQ() {
🚫 [golangci] reported by reviewdog 🐶
456-519 lines are duplicate of e2e/peer_flow_test.go:521-595
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_1_BQ() {
🚫 [golangci] reported by reviewdog 🐶
521-595 lines are duplicate of e2e/peer_flow_test.go:624-692
(dupl)
}
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_2_BQ() {
🚫 [golangci] reported by reviewdog 🐶
624-692 lines are duplicate of e2e/peer_flow_test.go:694-762
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_3_BQ() {
🚫 [golangci] reported by reviewdog 🐶
694-762 lines are duplicate of e2e/peer_flow_test.go:385-454
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_SF() {
🚫 [golangci] reported by reviewdog 🐶
976-1260 lines are duplicate of e2e/peer_flow_test.go:1048-1331
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Nochanges_SF() {
🚫 [golangci] reported by reviewdog 🐶
1048-1331 lines are duplicate of e2e/peer_flow_test.go:976-1260
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_1_SF() {
🚫 [golangci] reported by reviewdog 🐶
1114-1189 lines are duplicate of e2e/peer_flow_test.go:1191-1260
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_2_SF() {
🚫 [golangci] reported by reviewdog 🐶
1191-1260 lines are duplicate of e2e/peer_flow_test.go:1262-1331
(dupl)
env.AssertExpectations(s.T())
}

func (s *E2EPeerFlowTestSuite) Test_Toast_Advance_3_SF() {
🚫 [golangci] reported by reviewdog 🐶
1262-1331 lines are duplicate of e2e/peer_flow_test.go:976-1046
(dupl)
@iskakaushik @saisrirampur Hi folks. Sorry for reviving an old thread, but I'm curious about something. Unfortunately I'm not familiar with Go, so it's hard for me to parse 15 changed files. So let's say the old value of a TOAST column is not available in the current batch and you're parsing a WAL stream that looks like this
Let's say you're processing
Do I understand it correctly?
TOAST Storage:
Postgres uses TOAST storage for large column values instead of storing them directly in the table's heap (data pages). By default it is triggered once a row grows beyond roughly 2 kB, well below the 8 kB page size. TOAST breaks the value into multiple chunks and stores them separately, compressing it by default as an optimization. The end user is not aware of this internal storage mechanism. Additional information about TOAST can be found here.
Problem:
In Postgres, the logical decoding/replication feature is used by MIRROR for CDC. Logical decoding provides the operation type (INSERT, DELETE, UPDATE) and all column values for each row/tuple. However, if a column value is a TOAST and remains unchanged during a data manipulation language (DML) operation, Postgres does not provide the actual column value. Instead, it offers a pointer that refers to the pg_catalog.* table/chunk. It becomes the responsibility of the client to fetch the value based on this pointer. This poses a challenge when normalizing raw changes to the final (normalized) table, as TOAST columns can be nullified.
Solution:
Step 1 can be a significant optimization for workloads that frequently insert/update large values of the same row. This is particularly applicable in IoT/NoSQL-like workloads. PeerDB controls the population of unchanged column values per table based on the largest value seen in that batch.
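To make the normalization concern in the Problem section concrete, here is a minimal sketch of applying an UPDATE record to a destination row while skipping columns flagged as unchanged TOAST, so the existing value is kept rather than overwritten with a null. The `Record` type and `ApplyUpdate` function are hypothetical simplifications for illustration and are not the PR's actual code.

```go
package main

import "fmt"

// Record is a hypothetical, simplified CDC update record.
// Items holds the column values delivered by logical decoding;
// UnchangedToastColumns names the columns whose values were not sent.
type Record struct {
	Items                 map[string]interface{}
	UnchangedToastColumns map[string]bool
}

// ApplyUpdate merges rec into dst. Columns listed in UnchangedToastColumns
// are skipped, so the value already present in dst is preserved.
func ApplyUpdate(dst map[string]interface{}, rec Record) {
	for col, val := range rec.Items {
		if rec.UnchangedToastColumns[col] {
			continue // value was not sent; keep what the destination has
		}
		dst[col] = val
	}
}

func main() {
	row := map[string]interface{}{"id": 1, "payload": "large value", "status": "new"}
	update := Record{
		Items:                 map[string]interface{}{"id": 1, "payload": nil, "status": "done"},
		UnchangedToastColumns: map[string]bool{"payload": true},
	}
	ApplyUpdate(row, update)
	fmt.Println(row["payload"], row["status"]) // prints "large value done"
}
```

Without the skip, `payload` would be set to nil by the update, which is the nullification problem described above.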