Implement cluster shrink (2nd phase) #2247
After an f2f discussion, we need to evaluate the CTAS approach for mat views, because the current approach can hit a race condition if one mat view depends on another mat view. Created GG-225.
Why? If the database is still available to users during shrink, what will be the state of their temporary tables after that?
The current approach won't be cut off for now?
Problem description:
Before this patch, rebalancing a materialized view took two steps: the rebalance itself, which updated the distribution policy, and a refresh, which updated the data in the materialized view. This caused two problems for the 'ggrebalance' tool during cluster shrink:
1. If the view was not up to date, the refresh changed the data stored in the materialized view, so its contents differed before and after the shrink. The shrink is meant to leave the logical data in the cluster unaltered.
2. If a materialized view depends on another materialized view, the refresh could race: a view could be refreshed from another view that had not been refreshed yet.

Fix:
Use the CTAS approach from EXPAND TABLE when rebalancing a materialized view. It creates a temp table with the correct distribution policy, copies all data from the materialized view into it, and then swaps the relfilenode of the materialized view with that of the temp table. The data stays exactly as it was before the rebalance, even if it was not up to date (so the user is not surprised by unexpected view contents), and the rebalance no longer depends on any object other than the materialized view itself.

(cherry picked from commit 37dc7e7)
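To make the difference between the two approaches concrete, here is a minimal toy model in Python. It is not the code from this patch: `MatView`, `rebalance_with_refresh`, `rebalance_with_ctas`, and the policy labels are hypothetical names used only for illustration, and a plain list stands in for the stored rows. It only shows why a refresh can change stale view contents while a copy-and-swap keeps them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MatView:
    """Toy stand-in for a materialized view (hypothetical, not a real catalog entry)."""
    name: str
    rows: List[Tuple]   # data physically stored in the view
    policy: str         # illustrative label for the distribution policy
    relfilenode: int    # stands in for the on-disk storage identifier


def rebalance_with_refresh(view: MatView, new_policy: str,
                           current_source_rows: List[Tuple]) -> None:
    """Old two-step approach: change the policy, then re-run the defining query."""
    view.policy = new_policy
    # The refresh re-evaluates the source, so a stale view changes its contents.
    view.rows = list(current_source_rows)


def rebalance_with_ctas(view: MatView, new_policy: str,
                        temp_relfilenode: int) -> MatView:
    """CTAS-style approach: copy stored rows into a temp table, then swap storage."""
    # 1. Temp table with the target policy, filled from the view's stored rows.
    temp = MatView(f"tmp_{view.name}", list(view.rows), new_policy, temp_relfilenode)
    # 2. Swap storage between the view and the temp table.
    view.rows, temp.rows = temp.rows, view.rows
    view.relfilenode, temp.relfilenode = temp.relfilenode, view.relfilenode
    view.policy = new_policy
    return temp  # the temp table now holds the old storage and can be dropped


stale = [(1, "old")]
fresh = [(1, "new"), (2, "new")]

a = MatView("mv", list(stale), "4 segments", 100)
rebalance_with_refresh(a, "2 segments", fresh)

b = MatView("mv", list(stale), "4 segments", 100)
rebalance_with_ctas(b, "2 segments", 200)
```

In this model `a` ends up with the fresh rows, while `b` keeps the stale rows under the new policy and never reads any other object, which is the property the fix relies on.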
I've updated the handling of matviews. They no longer require a REFRESH step. Please note that there are changes in
According to the requirements, we need to rebalance tables with "relpersistence = 'p' | relpersistence = 'u'".
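As a rough sketch of that requirement, the filter could look like the Python below. This is an assumption for illustration, not the query ggrebalance actually runs: `should_rebalance` and `build_table_list_query` are hypothetical helpers. The only catalog facts relied on are that `pg_class.relpersistence` is 'p' for permanent, 'u' for unlogged, and 't' for temporary relations.

```python
# Persistence kinds that get rebalanced: permanent and unlogged.
# Temporary relations ('t') are skipped.
REBALANCED_PERSISTENCE = ("p", "u")


def should_rebalance(relpersistence: str) -> bool:
    """Return True if a relation with this persistence kind is rebalanced."""
    return relpersistence in REBALANCED_PERSISTENCE


def build_table_list_query(persistence=REBALANCED_PERSISTENCE) -> str:
    """Build a catalog query that lists candidate relations (illustrative only)."""
    quoted = ", ".join(f"'{p}'" for p in persistence)
    return (
        "SELECT c.oid::regclass AS table_name "
        "FROM pg_catalog.pg_class c "
        f"WHERE c.relpersistence IN ({quoted})"
    )


query = build_table_list_query()
```

A real table list would also need to filter on relation kind and on the segments being removed; those details are left out here because the thread does not spell them out.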
Implement cluster shrink (2nd phase)
List of changes:
tables, partitioned tables, unlogged tables. Skip processing of temp tables.
It is done to comply with the requirements.
to rebalance the table. It is needed as one could drop it in parallel after we
have created the rebalance table list.
other session opens a transaction after we have created the rebalance table
list, drops the table before we started to rebalance it, and commits the
transaction when we started to rebalance the table (and are hanging on the
table's locks).
stopped strictly after primaries in order to avoid hanging replication
processes.
segments. Now we only emit a warning. It is done to comply with the
requirements.
we will not stop in case of an exception inside the 'SegmentStopAfterShrink'
worker. So now, when a fault is injected, send SIGINT to the ggrebalance
process to halt its work.
but didn't use it. Instead, they tried to use the connection from the context,
which was not properly configured.
materialized views and unlogged tables.
crashing it.