Lazy Schema Change #342
# Non-Blocking Alter Table Support (SqlTable)

## Overview
The goal of this project is to implement lazy, non-blocking schema changes. The current storage layer does not support schema changes at all. The first milestone is to support schema change operations including add column, drop column, and default value changes. The broader goal is to carry these out in a non-blocking fashion, for which we chose a lazy evaluation approach: a schema change does not migrate existing tuples to the new schema until information that exists only in the new schema is modified.

## Scope
Almost all of the work will be localized to the SqlTable object, as that is the access point from the execution layer to the storage layer. Our design will not affect the underlying structures of the tuples or the DataTables; after this change they should still be able to provide their functionality without going through the SqlTable. Within the SqlTable we will be changing the modules below. Throughout, "expected version" refers to the version of the tuple the user expects to see, and "actual version" refers to the version the tuple is currently stored in. These two versions differ when a user has updated the schema but, due to lazy evaluation, the tuple has not yet been transformed into the latest version.

### SqlTable::Insert
- Insert will now take in a schema version number that indicates the version of the tuple being inserted.
- In this case the actual and expected versions will be the same, as an insert always puts a tuple in the version that is passed in.

### SqlTable::Update
- Update will be required to take in a schema version number that indicates the expected schema version of the tuple being updated.
- In this case the actual version will differ from the expected version when the tuple being updated has not been shifted to the expected version.
- If the update modifies columns that are not in the actual version, the tuple will be shifted to the expected version before the update is applied.

### SqlTable::Scan
- Scan will be required to take in a schema version number that indicates the schema version the current transaction sees.

### SqlTable::Delete
- Delete will be required to take in the schema version number that indicates the expected schema version of the tuple being deleted.

### Catalog
- The catalog will keep track of the visible schema version for each transaction based on its timestamp. This schema version is then passed on to the SqlTable layer.

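As a rough illustration of this visibility rule, the sketch below models a catalog that records the commit timestamp of each schema change and resolves the newest version at or before a transaction's start timestamp. All names here (`SchemaVersionCatalog`, `RecordSchemaChange`, `VisibleVersion`) are hypothetical, not the actual catalog API.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch of the catalog's version-visibility bookkeeping.
// A transaction sees the newest schema version whose change committed at or
// before the transaction's start timestamp.
class SchemaVersionCatalog {
 public:
  // Record that `version` became visible at commit timestamp `commit_ts`.
  void RecordSchemaChange(uint64_t commit_ts, uint32_t version) {
    change_log_[commit_ts] = version;
  }

  // Returns the schema version visible to a transaction started at txn_ts.
  uint32_t VisibleVersion(uint64_t txn_ts) const {
    auto it = change_log_.upper_bound(txn_ts);  // first change strictly after txn_ts
    return std::prev(it)->second;               // the change just before it is visible
  }

 private:
  std::map<uint64_t, uint32_t> change_log_{{0, 0}};  // version 0 exists from the start
};
```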
### DataTable/ProjectedRow
- DataTable::SelectIntoBuffer iterates across a ProjectedRow/ProjectedColumn's column ids and fills in the data for each column.
- For this operation we use the column id VERSION_POINTER_COLUMN_ID as a sentinel id to represent a column that the DataTable should skip over and not fill in. We go into detail on why this happens in the architectural design below.

## Architectural Design
The design of this project centers on the modifications to SqlTable; the design of the other components in the storage layer remains unchanged. On a schema change, the SqlTable creates a new DataTable that stores all tuples inserted from that point on in the new version. To be lazy, it does not modify already existing tuples to transform them into the latest version. Supporting this lazy schema change requires two capabilities: maintaining tuples in multiple different schema versions, and transforming them into the desired version.

### Multi-versioning
To address multi-versioning, the SqlTable maintains a map from schema version number to DataTable, with one DataTable per schema version. The functionality for accessing tuples is already present in DataTable, so the SqlTable only needs to manage the two capabilities described above; the rest is handled by the existing DataTable implementation. Furthermore, each block maintains metadata recording which version the tuples within the block belong to. Since each DataTable holds only a single version, a block cannot contain tuples from multiple versions. Below is a description of the multi-version design for each of the SqlTable operations we are modifying; as above, "expected version" is the version of the tuple the user expects to see and "actual version" is the version the tuple is currently in.
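The mapping described above can be sketched as follows. This is a simplified model: `DataTable` is a placeholder type, and the real implementation uses a latch-free ConcurrentMap rather than `std::unordered_map` (see the Design Rationale section).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Placeholder for the real DataTable; only the per-version ownership matters here.
struct DataTable {
  std::vector<std::string> tuples;
};

// Sketch of the SqlTable versioning scheme: one DataTable per schema version,
// keyed by version number. Names are illustrative, not the actual API.
class SqlTableSketch {
 public:
  // A schema change creates a fresh DataTable for the new version; tuples in
  // older tables are left untouched (the "lazy" part).
  void UpdateSchema(uint32_t new_version) { tables_.emplace(new_version, DataTable{}); }

  // Insert routes the tuple to the DataTable of the version passed in.
  void Insert(uint32_t version, const std::string &tuple) {
    tables_.at(version).tuples.push_back(tuple);
  }

  std::size_t TupleCount(uint32_t version) const { return tables_.at(version).tuples.size(); }

 private:
  std::unordered_map<uint32_t, DataTable> tables_{{0, DataTable{}}};  // version 0 at start
};
```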

#### SqlTable::Insert
- Insert always inserts the passed-in tuple into the DataTable of the schema version number passed in.

#### SqlTable::Update
- Update has three cases:
  1. The expected schema version matches the actual schema version
     - The update happens in place on the DataTable of the actual schema version
  2. The expected schema version doesn't match the actual schema version, but the update doesn't touch any columns missing from the actual schema version
     - The update happens in place on the DataTable of the actual schema version
  3. The expected schema version doesn't match the actual schema version, and the update touches columns not in the actual schema version. The following steps occur:
     - Retrieve the tuple from the actual-version DataTable
     - Transform the tuple to the expected version
     - Delete the tuple from the actual-version DataTable
     - Insert the tuple into the expected-version DataTable
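The case-3 steps above can be sketched as a delete-and-insert between version tables. This is an illustrative model only: tuples are plain vectors of ints, the "transform" simply appends a default value for one newly added column, and none of these names come from the actual code.

```cpp
#include <cstddef>
#include <vector>

using Tuple = std::vector<int>;  // placeholder tuple representation

// Pad an old-version tuple out to the new schema; in this toy model the new
// schema has exactly one extra integer column filled with its default value.
Tuple TransformToNewVersion(const Tuple &old_tuple, int default_value) {
  Tuple t = old_tuple;
  t.push_back(default_value);
  return t;
}

// Case-3 update path: read from the actual-version table, transform to the
// expected version, delete from the old table, insert into the new table.
void MigrateOnUpdate(std::vector<Tuple> &old_table, std::vector<Tuple> &new_table,
                     std::size_t slot, int default_value) {
  Tuple migrated = TransformToNewVersion(old_table[slot], default_value);
  old_table.erase(old_table.begin() + static_cast<std::ptrdiff_t>(slot));  // "delete"
  new_table.push_back(migrated);                                           // "insert"
}
```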

#### SqlTable::Scan
- SqlTable will maintain its own slot iterator, which is used to iterate across all the schema versions. The iterator interface exposed to the user does not change.
- The iterator for Scan must always begin on the latest version.

> **Review discussion:**
> - I doubt this method really solves every problem. See the trace in https://github.com/cmu-db/terrier/pull/342/files#r275173692
> - As in the discussion in https://github.com/cmu-db/terrier/pull/342/files#r275175240, this does not seem to be the actual solution used to solve the scan/update conflict; you rely on MVCC to solve the conflict. Maybe the document also needs to change accordingly.
> - This requirement solves a different problem than the one posed there. MVCC solves the race across transactions during the insert/delete process, but this addresses the risk of processing the same logical tuple twice. Specifically, if we use a Hyper-style execution model that processes a group of tuples as far as it can, and the pipeline involves an update that forces migration to the latest version, then we must process the latest version first. For example, if we add a column and then systematically set its value (via updates, not a default value at the time we changed the schema), we will migrate every tuple to the current version. If we iterate over older versions first, then when we reach the current version those tuples will be visible because a transaction can see its own writes. While we could handle this by tracking migrated tuples explicitly, it is cleaner to iterate over the current version first so that when we trigger migrations they are inserted behind the …
> - I see. So the user must guarantee that update/scan calls do not interleave with each other. But they can call scan, then update, then scan, etc., which is solved by this method.
> - Does it mean we are only allowed to update already-scanned tuples in the same transaction? For example, if we updated an unscanned tuple and it was migrated to an already-scanned DataTable, we would never see it in the scan.
> - If the update occurred in a different transaction, then the scan will see the old "deleted" tuple because the other transaction's writes are not visible to the scan (snapshot isolation).

- In cases where the user is interleaving scan and update calls, if the scan iterator were to start on an older version it would be possible for a tuple retrieved from the scan to be updated, which could move it into the latest schema version.
- The iterator would then read all the tuples in the latest version's DataTable when it reaches it, causing that tuple to be read twice within the same scan.
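A minimal sketch of the ordering requirement: iterate the per-version tables from the latest version down to the oldest. The map of ints below is a hypothetical stand-in for the real version-to-DataTable structure; it only illustrates the iteration order.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Produce the order in which Scan visits schema versions: latest first, so
// tuples migrated into the latest DataTable by interleaved updates are not
// visited a second time.
std::vector<uint32_t> ScanVersionOrder(const std::map<uint32_t, int> &version_tables) {
  std::vector<uint32_t> order;
  for (auto it = version_tables.rbegin(); it != version_tables.rend(); ++it) {
    order.push_back(it->first);  // highest (latest) schema version first
  }
  return order;
}
```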

> **Review comment:** According to the code, under the case "Expected schema version differs from the actual schema version", Scan() also returns the temporarily transformed tuples, similar to Select(), right? Maybe the doc can make this clearer here.

#### SqlTable::Select
- There are two cases:
  1. The expected schema version matches the actual schema version
     - The tuple is directly selected from the DataTable for that schema version
  2. The expected schema version differs from the actual schema version
     - The tuple is selected from the DataTable of its actual version and, during this process, transformed to the expected schema version
     - This transformed tuple is returned

> **Review discussion:**
> - Will the transformation also happen in the underlying storage, or is it only a temporary result that is then thrown away? I think it is better to clarify that in the document.
> - The transformation here is temporary. The only time we persist the transformation is in case 3 of Update, where the tuple is moved between DataTables via a delete and an insert.
> - In the document you said your benchmark showed that reading a tuple whose version doesn't match carries a 5x penalty. Consider a hot tuple stored in an older version: each read pays the 5x penalty, and if the tuple is hot we will transform it again and again, so the overhead could be large. You may need a policy to migrate the tuple in such cases, even when only reading it, to reduce the overhead.

#### SqlTable::Delete
- Delete directly deletes the tuple from the DataTable of the actual version of that tuple.

#### SqlTable::UpdateSchema
- This function is the access point through which users update the schema by passing in a new schema object.
- The SqlTable constructs a new DataTable to hold all tuples inserted under this version.
- To be lazy, no already existing tuples are modified in this call.

### Transformation
In the current interface the user can only retrieve data from a SqlTable through a ProjectedRow or a ProjectedColumn; we refer to both as a Projection in this section for simplicity. The user passes in a Projection, which is filled by the storage layer. The Projection passed in by the user is in the expected version, but the actual version of the data can be different, so we need a way of transforming between versions. To do this we modify the header of the Projection.

The header contains metadata about column ids and column offsets, which the DataTable uses to populate the Projection, and column ids can differ between schema versions. Before passing the Projection to the DataTable of the actual version, we translate the column ids of the expected version into the column ids of the actual version. For any column that is not present in the actual version, we set the column id to a sentinel value (VERSION_POINTER_COLUMN_ID, since no column in a Projection may have that id). We pass this modified Projection to the DataTable, which populates it, skipping over any column whose id is the sentinel value. Then we reset the Projection header to the original header and fill in the default values for columns that were not present in the actual version. This way we avoid copying data from one version to another, and each value is filled in only once.
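The header translation can be sketched as follows, with a plain vector of column ids standing in for the Projection header and `SENTINEL_COLUMN_ID` playing the role of VERSION_POINTER_COLUMN_ID. This is an illustrative model, not the actual header layout.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in for VERSION_POINTER_COLUMN_ID: no real column may use this id.
constexpr uint32_t SENTINEL_COLUMN_ID = UINT32_MAX;

// Rewrite an expected-version header (modeled as a list of column ids) into
// the actual version's ids. Columns missing from the actual version become
// the sentinel so the DataTable skips them; their defaults are filled in
// after the header is restored.
std::vector<uint32_t> TranslateHeader(
    const std::vector<uint32_t> &expected_ids,
    const std::unordered_map<uint32_t, uint32_t> &expected_to_actual) {
  std::vector<uint32_t> actual_ids;
  actual_ids.reserve(expected_ids.size());
  for (uint32_t id : expected_ids) {
    auto it = expected_to_actual.find(id);
    actual_ids.push_back(it != expected_to_actual.end() ? it->second : SENTINEL_COLUMN_ID);
  }
  return actual_ids;
}
```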

## Design Rationale
In order to support lazy schema changes we need to maintain tuples in multiple different schema versions and provide methods of transforming them into the desired version.

### Backend for different schema versions
The initial decision for storing schema versions within a SqlTable was whether to back it with a vector or a map. While we appreciated the simplicity and probable performance benefits of a vector, we ultimately decided to go with a map because it does not force our versioning to start at 0. While this constraint does not seem significant at first, we realized that if the database is restored from a checkpoint we should restore the schema version number (since it may be exposed to the user or tracked by a hot backup) rather than reinitialize the versions to 0.

The second decision for our backend was whether to protect the underlying data structure with latches or use a latch-free structure. We ultimately decided on a latch-free data structure because it would be difficult to reason about every possible point where a latch would be needed (essentially any version check) without wrapping the map in another abstraction level. Additionally, we had serious concerns about introducing 4 to 10 latch operations on the path of every single SQL action in every single transaction, which would guarantee large numbers of cache invalidations. We therefore went with the ConcurrentMap implementation in the database (a wrapper of tbb::concurrent_hash_map), which supports the common case of concurrent insertions and lookups. However, this creates future difficulties for supporting compaction/deletion of obsolete schema versions because erasures are unsafe. Unfortunately, we are not aware of any candidate hash map implementation that supports safe insertions, deletions, and lookups without utilizing latches.

### Transforming old versions into the expected version
We recognized two possible ways to transform data stored under old schemas into the expected schema for a transaction: (1) attribute-wise copying of the data from an old ProjectedRow into a new one, and (2) rewriting the header on the new ProjectedRow (provided by the caller) so it is readable by an older DataTable. We initially implemented (1) because the logic was far simpler. However, when we benchmarked the implementation for cross-version selects we observed a significant performance penalty (a factor of 10x). We have therefore switched to (2), which reduced the penalty to a factor of 5x, and we are still working to improve this further.
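For contrast, approach (1) can be sketched as an attribute-wise copy into a fresh projection. The `Projection` alias and all names here are illustrative; the point is the extra per-tuple copy (and temporary buffer) that approach (2) avoids by rewriting the caller's header in place.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model of a materialized projection: column id -> integer value.
using Projection = std::unordered_map<uint32_t, int>;

// Approach (1): build the expected-version row by pre-filling defaults and
// then copying each attribute from the old-version row, remapping column ids.
Projection CopyAttributes(const Projection &old_row,
                          const std::unordered_map<uint32_t, uint32_t> &old_to_new,
                          int default_value, const std::vector<uint32_t> &new_columns) {
  Projection new_row;
  for (uint32_t col : new_columns) new_row[col] = default_value;  // defaults first
  for (const auto &entry : old_row) {
    auto it = old_to_new.find(entry.first);
    if (it != old_to_new.end()) new_row[it->second] = entry.second;  // attribute-wise copy
  }
  return new_row;
}
```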

## Testing Plan
Our testing plan is to implement two new test suites that cover both the sequential correctness and the concurrent robustness of the implementation. The sequential suite focuses on ensuring that known edge cases are handled correctly; we use sequential tests for these situations because we can tightly control the ordering of actions. The concurrent suite performs a mini stress test that aims to make rare, but possible, race conditions likely to be detected. For this we focus on straining access to the versioning scheme by performing concurrent inserts and reads on the hash map.

In addition to formal tests, we are also benchmarking our implementation as we go to ensure we measure and understand the performance impacts of our changes. Specifically, we focus on performance across a range of simulated workloads (selects, inserts, updates, and a mix) and in two general situations: a single schema version and multiple versions. The goal is to quantify our impact against the current single-versioned implementation as well as the performance degradation of on-the-fly schema manipulation of old data that has not migrated to the new schema version.

## Trade-offs and Potential Problems
**Trade-off:** TBB concurrent unordered map for storing DataTableVersion objects. This decision gives a simple, easy-to-integrate solution for supporting concurrent insertions (new schemas) and lookups (all DML transactions) on the data tables. However, it will limit our options when we start to implement compaction of obsolete schema versions because the data structure does not support concurrent erasures. This likely means we will have to take a heavyweight approach to compaction, such as duplicating the structure without the data to be erased and then using an unlink-delete staged method similar to how the GC already works on undo records.

**Trade-off:** Our decision to manually mangle the headers for ProjectedRow and ProjectedColumn greatly increases code complexity (manual recreation of the headers) in order to significantly improve performance for reading data across schema versions. Specifically, we avoid an unnecessary allocation and deallocation for temporary projections by allowing old schema versions to write directly into the final projection.

## Future Work
### Pending tasks
#### Default values

> **Review comment:** Outdated? You have already done some work on default values.

- Populate the default values into the ProjectedRow during Select and Scan operations.
- Handle changes to default values. Should they be considered a schema change, or should the Catalog maintain the default values so that they can be queried by the SqlTable?

### Stretch goals
- Rolling back schema change transactions.
- Implementing a compactor to remove DataTables of older versions that no longer contain any tuples.
- Serializing transactions with unsafe/conflicting schema changes by using a central commit latch, allowing only one such transaction to commit at a time, and rolling back if validation checks fail.

> **Review comment:** You also modified InitializerForProjectedColumns, InitializerForProjectedRows, etc. You may also need to document them here.