Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interval GC #340

Conversation

@thepulkitagarwal
Copy link
Contributor

thepulkitagarwal commented Apr 8, 2019

We mostly made changes in the garbage_collector.h and garbage_collector.cpp files, specifically the ProcessUnlinkQueue function.

We implemented interval compaction on top of the earlier garbage collection (called "regular gc"). While performing the interval compaction, we unlink the undo records that are being compacted, and then push it into the deallocate queue. The deallocation is basically the same as the regular gc (with a minor change for handling varlens).

Slides from Project Update Presentation: Google Docs Link
Design Docs: https://github.com/utkarsh39/terrier/wiki

ProcessTupleVersionChainHead(table, slot, active_txns);
}

return undo_record->txnptr_.IsNull();

This comment has been minimized.

Copy link
@wenxuanqiu

wenxuanqiu May 11, 2019

Contributor

Question: Is this line just going to be false, or ProcessTupleVersionChain and ProcessTupleVersionChainHead can modified undo_record to change that fact? If it is the former, might as well just return false.

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 11, 2019

undo records have an extra field containing the pointer to the corresponding txn. This txn pointer is set to NULL when the undo record is unlinked. So this line is not necessarily false.

// If there are no active transactions, or if the version pointer is older than the oldest active transaction,
// Collect the head of the chain using compare and swap
// Note that active_txns is sorted in descending order, so its tail should have the oldest txn's timestamp
if (version_ptr_timestamp < active_txns->back()) {

This comment has been minimized.

Copy link
@wenxuanqiu

wenxuanqiu May 11, 2019

Contributor

Calling back on empty vector can cause undefined behaviour, need to check with active_txns->empty() first

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 11, 2019

Transaction Manager ensures that if the active_txns is not empty, it inserts a timestamp.

This comment has been minimized.

Copy link
@qdHe

qdHe May 11, 2019

I see in transaction manager that you will return the current timestamp if there are no active transactions. But I still think that returning an empty vector/checking whether it is empty before using it is a more natural way. It will be more understandable and in case someone else calls the function.

This comment has been minimized.

Copy link
@thepulkitagarwal

thepulkitagarwal May 12, 2019

Author Contributor

GetActiveTxns is based on the function OldestTransactionStartTime, which was used by the original GC. OldestTransactionStartTime returns the oldest active transaction's start time. If there are no active transactions, it returns the current time (that is, it returns time_, the internal counter). Similar to OldestTransactionStartTime, GetActiveTxns returns the active transactions' timestamps. And if there are none, it returns the current time.

// Link the compacted record to the version chain.
LinkCompactedUndoRecord(start_record, &curr, next, compacted_undo_record);
} else if (interval_length == 1) {
if (transaction::TransactionUtil::NewerThan(active_txns->back(), curr->Timestamp().load())) {

This comment has been minimized.

Copy link
@wenxuanqiu

wenxuanqiu May 11, 2019

Contributor

active_txns->back() has undefined behaviour if active_txns is empty, you might want to check for that.

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 11, 2019

Transaction Manager ensures that if the active_txns is not empty, it inserts a timestamp.

src/include/storage/garbage_collector.h Outdated Show resolved Hide resolved
// OLAP Transaction starts here
auto *olap_txn = tested.GetTxnManager()->BeginTransaction();
for (uint32_t batch = 0; batch * batch_size < num_txns; batch++) {
auto result = tested.SimulateOltp(batch_size, num_concurrent_txns);

This comment has been minimized.

Copy link
@qdHe

qdHe May 11, 2019

Are there any special functions to simulate the OLAP workload? I do not think a long-running transaction with multiple OLTP queries is equivalent to OLAP workload. But I guess it is enough for garbage collection testing since OLAP workload is less likely to change data and need no GC. It is just unnecessary to name it OLAP?

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 15, 2019

The transaction is replicating the behaviour of OLAP in the sense that it's long-running. That's why we named it OLAP. You are right it doesn't need to do analytical queries, it just needs to sit there for Interval GC to kick in as opposed to regular GC which won't start until the OLAP txn is dead.

// NOLINTNEXTLINE
TEST_F(LargeGCTests, OLAPAndTPCCishWithGC) {
const uint32_t txn_length = 5;
const std::vector<double> update_select_ratio = {0.4, 0.6};

This comment has been minimized.

Copy link
@qdHe

qdHe May 11, 2019

Does the ratio number have any meaning (such as it is a common select ratio and is representative) or you just copy it from the above tests?

This comment has been minimized.

Copy link
@thepulkitagarwal

thepulkitagarwal May 12, 2019

Author Contributor

It is a common select ratio. Since the update to select ratio is low, the expectation is that the number of aborts will be low, but there will still be a considerable number of updates happening thus increasing the version chains' lengths.

// If there are no active transactions, or if the version pointer is older than the oldest active transaction,
// Collect the head of the chain using compare and swap
// Note that active_txns is sorted in descending order, so its tail should have the oldest txn's timestamp
if (version_ptr_timestamp < active_txns->back()) {

This comment has been minimized.

Copy link
@qdHe

qdHe May 11, 2019

I see in transaction manager that you will return the current timestamp if there are no active transactions. But I still think that returning an empty vector/checking whether it is empty before using it is a more natural way. It will be more understandable and in case someone else calls the function.

test/storage/garbage_collector_test.cpp Outdated Show resolved Hide resolved
test/storage/garbage_collector_test.cpp Show resolved Hide resolved

ProjectedRow::CopyProjectedRowLayout(result->varlen_contents_, redo);

return result;
}

private:
friend class GarbageCollector;

This comment has been minimized.

Copy link
@qdHe

qdHe May 11, 2019

Is it necessary to add GarbageCollector as friend class? Add a public function to provide txnptr_ may be better?

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

^^^seconded

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 16, 2019

We would need getters and setters for all 7 fields of undo record class. Moreover, adding get and set would allow everyone to modify the class.

thepulkitagarwal and others added 6 commits May 12, 2019
Co-Authored-By: utkarsh39 <utkarsh39@hotmail.com>
Address review comments
Merge with master
@@ -150,6 +150,33 @@ BENCHMARK_DEFINE_F(GarbageCollectorBenchmark, HighContention)(benchmark::State &
state.SetItemsProcessed(state.iterations() * num_txns - lag_count);
}

/**

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

nit: use // for these comments. We don't doxygen these anyway

/**
* Struct used for safely accessing the Transaction pointer stored in the Undo Record.
*/
struct TransactionPtr {

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor
  1. This should probably just be a class
  2. I think the name TransactionPtr is giving away the implementation, which defeats the purpose of writing this wrapper class. Supposedly this object represents the owned of an undo record, right? So maybe pick a name along that line...
* Returns true if the Undo Record is compacted, false otherwise
* @return if the Undo Record is compacted
*/
bool IsCompacted() { return txn_ == reinterpret_cast<transaction::TransactionContext *>(-1); }

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

OwnedByGC? If you do the earlier renaming...

* @return pointer to the initialized UndoRecord
*/
static UndoRecord *InitializeInsert(byte *const head, const transaction::timestamp_t timestamp, const TupleSlot slot,
DataTable *const table) {
DataTable *const table, transaction::TransactionContext *const txn) {

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

Do you still need the timestamp parameter now that you have the transaction?

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 16, 2019

To handle this, we would need a .cpp file for undo_record as there is a circular dependency. Forward declaration of TransactionContext in undo_record.h won't work.

This comment has been minimized.

Copy link
@tli2

tli2 Jun 1, 2019

Contributor

If you need a cpp file that's fine


ProjectedRow::CopyProjectedRowLayout(result->varlen_contents_, redo);

return result;
}

private:
friend class GarbageCollector;

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

^^^seconded

// Create the UndoBuffer for this GC run
delta_record_compaction_buffer_ = new UndoBuffer(txn_manager_->buffer_pool_);
// The compaction buffer is empty
compaction_buffer_empty_ = true;

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

I don't think you need a separate boolean to track this, right? You can directly ask the buffer if it is empty.

// All of the undo records in my buffers were unlinked before the oldest running txn in the system.
// We are now safe to deallocate these buffers because no running transaction should hold a reference to them
// anymore
while (!buffers_to_deallocate_.empty()) {

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

Not a problem with your PR, but we will eventually want to rewrite these using the deferred action framework introduced in #358

} else {
// This is a txn that may or may not be visible to any running txns. Proceed with unlinking its UndoRecords
// with an Interval GC approach
bool all_unlinked = true;

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

all_unlinked looks suspiciously like a legacy from our previous implementations. Is there a real failure scenario where you need this?

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 16, 2019

We still need this logic to know when all of the transaction's undo records have been unlinked. #325 eliminates this by only processing a transaction for unlinking when it is older than all the active txns. Here we do not and can not make this assumption (Interval GC).

@tli2 tli2 dismissed yangdsh’s stale review May 15, 2019

stale

txns_to_deallocate_.push_front(txn);
txns_processed++;
} else {
// We didn't unlink all of the UndoRecords (UnlinkUndoRecord returned false due to a write-write conflict),

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

same as before, #325 eliminated this failure scenario. Is there something you are doing that brings it back?

STORAGE_LOG_TRACE("GarbageCollector::PerformGarbageCollection(): last_unlinked_: {}",
static_cast<uint64_t>(last_unlinked_));

// Handover compacted buffer for GC

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

nit: Describe the state you expect these buffers to be in. This comment is somewhat ambiguous.

@@ -39,38 +43,177 @@ class GarbageCollector {
std::pair<uint32_t, uint32_t> PerformGarbageCollection();

private:
/**

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

nit: use // instead of doxygen. These documents are not generated anyways, and the checks will not keep the annotations in sync with the code.

This comment has been minimized.

Copy link
@utkarsh39

utkarsh39 May 16, 2019

In other files as well, doxygen is used for private functions. Should all the private functions use //?

void ProcessDeferredActions();

/**
* Given a version chain, perform interval gc on all versions except the head of the chain

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

nit: This needs to be more detailed. What happens after the method is done? What should the state of the version chain be before?

void ProcessTupleVersionChain(DataTable *table, TupleSlot slot, std::vector<transaction::timestamp_t> *active_txns);

/**
* Given the data table and the tuple slot, try unlinking the version chain head for that tuple slot

This comment has been minimized.

Copy link
@tli2

tli2 May 15, 2019

Contributor

nit: Same as above

@tli2

This comment has been minimized.

Copy link
Contributor

tli2 commented May 15, 2019

Overall this PR looks pretty good. One issue I have with this is that this makes interval GC mandatory for everything. Two clear candidates for generalization here is a threshold number of records to compact (i.e., only compact groups of more than n delta records), and a threshold window for transactions to be considered for compaction (i.e., only compact for transactions that are more than 100 timestamps in the past).

What parameters are optimal needs to be determined with further benchmarking, but this implementation makes me somewhat worried about degenerate cases, where the timings of the transactions are such that the GC always does maximal wasteful work.

huzaifaabbasi and others added 4 commits May 16, 2019
* Refactored Transaction Pointer Struct

* Fixed Comments
Interval GC: Addressing review comments
@utkarsh39

This comment has been minimized.

Copy link

utkarsh39 commented May 16, 2019

Overall this PR looks pretty good. One issue I have with this is that this makes interval GC mandatory for everything. Two clear candidates for generalization here is a threshold number of records to compact (i.e., only compact groups of more than n delta records), and a threshold window for transactions to be considered for compaction (i.e., only compact for transactions that are more than 100 timestamps in the past).

What parameters are optimal needs to be determined with further benchmarking, but this implementation makes me somewhat worried about degenerate cases, where the timings of the transactions are such that the GC always does maximal wasteful work.

We discussed this and we think this could be done as a separate PR later.

@tli2

This comment has been minimized.

Copy link
Contributor

tli2 commented Jun 6, 2019

This is good stuff. But due to changes to the version chain in #404, and ongoing effort to rewrite the GC with the generalized deferred event framework, this can no longer be cleanly merged. We will selectively migrate stuff over to the new GC after we are finished with it.

@tli2 tli2 closed this Jun 6, 2019
@mbutrovich mbutrovich removed the in-progress label Jun 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.