New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A variety of bug fixes and performance improvements #673

Merged
merged 10 commits into from Aug 7, 2018

Conversation

Projects
None yet
2 participants
@etschannen
Copy link
Contributor

etschannen commented Aug 6, 2018

The most notable change is to lower the priority of reading log router tags. Without these changes, when a remote datacenter has been down for a long time in a fearless configuration it will push the cluster into saturation when it comes back online as it copies all of the old data across the WAN.

etschannen added some commits Aug 3, 2018

tlogs serve reads to log routers at a low priority, to prevent them f…
…rom using all their resources catching up a remote dc that has been down for a long time

increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget

@etschannen etschannen requested a review from ajbeamon Aug 6, 2018

@@ -23,6 +23,7 @@ Performance
* Clients optimistically assume the first leader reply from a coordinator is correct. `(PR #425) <https://github.com/apple/foundationdb/pull/425>`_
* Network connections are now closed after no interface needs the connection. [6.0.1] `(Issue #375) <https://github.com/apple/foundationdb/issues/375>`_
* Significantly improved the CPU efficiency of copy mutations to transaction logs during recovery. [6.0.2] `(PR #595) <https://github.com/apple/foundationdb/pull/595>`_
* A cluster configured with usable_regions=2 did not limit the rate at which it could copying data from the primary DC to the remote DC. This caused poor performance when recovering from a DC outage. [6.0.5] `(PR #673) <https://github.com/apple/foundationdb/pull/673>`_

This comment has been minimized.

@ajbeamon

ajbeamon Aug 6, 2018

Contributor

"could copying" -> "could copy"

@@ -36,6 +37,8 @@ Fixes
* A client could fail to connect to a cluster when the cluster was upgraded to a version compatible with the client. This affected upgrades that were using the multi-version client to maintain compatibility with both versions of the cluster. [6.0.4] `(PR #637) <https://github.com/apple/foundationdb/pull/637>`_
* A large number of concurrent read attempts could bring the database down after a cluster reboot. [6.0.4] `(PR #650) <https://github.com/apple/foundationdb/pull/650>`_
* Automatic suppression of trace events which occur too frequently was happening before trace events were suppressed by other mechanisms. [6.0.4] `(PR #656) <https://github.com/apple/foundationdb/pull/656>`_
* After a recovery, the rate at which transactions logs made mutations durable to disk was around 5 times slower than normal. [6.0.5] `(PR #666) <https://github.com/apple/foundationdb/pull/666>`_

This comment has been minimized.

@ajbeamon

ajbeamon Aug 6, 2018

Contributor

"transactions logs" -> "transaction logs"

while( totalSize < SERVER_KNOBS->UPDATE_STORAGE_BYTE_LIMIT && sizeItr != logData->version_sizes.end()
&& (logData->bytesInput.getValue() - logData->bytesDurable.getValue() - totalSize >= SERVER_KNOBS->TLOG_SPILL_THRESHOLD || sizeItr->value.first == 0) )
{
Void _ = wait( yield(TaskUpdateStorage) );

This comment has been minimized.

@ajbeamon

ajbeamon Aug 6, 2018

Contributor

We added the yields here because the work done in updateStorage was sometimes starving higher priority work. It does look like this loop has gotten cheaper, but is there still a possibility we could spend a long time here?

This comment has been minimized.

@ajbeamon

ajbeamon Aug 7, 2018

Contributor

After talking about this, I think we're ok without the yield. The case that could be problematic is if we had a lot of empty versions to process, but I think the risk of that is lower with this spill policy which has us regularly making progress.

@@ -991,6 +955,12 @@ ACTOR Future<Void> tLogPeekMessages( TLogData* self, TLogPeekRequest req, Refere
Void _ = wait( delay(SERVER_KNOBS->TLOG_PEEK_DELAY, g_network->getCurrentTask()) );
}

if( req.tag.locality == tagLocalityLogRouter ) {
Void _ = wait( self->concurrentLogRouterReads.take() );
state FlowLock::Releaser globalReleaser(self->concurrentLogRouterReads);

This comment has been minimized.

@ajbeamon

ajbeamon Aug 6, 2018

Contributor

This confused me for a minute, but I guess it works because the releaser is a state variable? Is there a way to write this that doesn't give the appearance that the lock gets released at the end of the if scope?

This comment has been minimized.

@ajbeamon

ajbeamon Aug 7, 2018

Contributor

We use this pattern elsewhere too, so we can leave this for now and not block this PR.

@@ -854,7 +818,7 @@ void commitMessages( Reference<LogData> self, Version version, const std::vector
self->messageBlocks.push_back( std::make_pair(version, block) );
addedBytes += int64_t(block.size()) * SERVER_KNOBS->TLOG_MESSAGE_BLOCK_OVERHEAD_FACTOR;

self->version_sizes[version] = make_pair(expectedBytes, expectedBytes);
self->version_sizes[version] = std::make_pair(expectedBytes, txsBytes);

This comment has been minimized.

@ajbeamon

ajbeamon Aug 6, 2018

Contributor

I did a quick look and didn't see where we were using the different version sizes distinctly (txsBytes vs. non-txsBytes). Can this just be combined to one value?

This comment has been minimized.

@ajbeamon

ajbeamon Aug 7, 2018

Contributor

I missed the one spot where we did use the first value only. We can disregard this comment.

++it.first;
}
}
Map<Version, std::pair<int,int>>::iterator sizeItr = logData->version_sizes.begin();

This comment has been minimized.

@ajbeamon

ajbeamon Aug 6, 2018

Contributor

I think I get the gist of this, but let's talk it over to make sure I understand it.

This comment has been minimized.

@ajbeamon

ajbeamon Aug 7, 2018

Contributor

We talked, and I think I'm satisfied with this.

@ajbeamon ajbeamon merged commit 24dec15 into apple:release-6.0 Aug 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment