Commit

Merge pull request #892 from etschannen/release-6.0
A wide variety of bug fixes and performance improvements related to multi region configurations
etschannen committed Nov 10, 2018
2 parents 6175c12 + b8381b3 commit 4bfb05f
Showing 38 changed files with 1,059 additions and 519 deletions.
2 changes: 1 addition & 1 deletion FDBLibTLS/FDBLibTLSSession.cpp
@@ -345,7 +345,7 @@ bool FDBLibTLSSession::verify_peer() {
if (!rc) {
// log the various failure reasons
for (std::string reason : verify_failure_reasons) {
- TraceEvent(reason.c_str(), uid);
+ TraceEvent(reason.c_str(), uid).suppressFor(1.0);
}
}
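
The `suppressFor(1.0)` call added above appears to rate-limit repeated events of the same type, so a burst of identical TLS verification failures does not flood the trace log. The following is only a minimal sketch of that idea, not FoundationDB's implementation; `EventSuppressor` and `shouldLog` are hypothetical names introduced for illustration.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper, not FoundationDB's TraceEvent: remembers when each
// event type was last emitted and reports whether a new one may be logged.
class EventSuppressor {
public:
    using Clock = std::chrono::steady_clock;

    // Returns true if `type` has not been logged within the last `seconds`.
    bool shouldLog(const std::string& type, double seconds,
                   Clock::time_point now = Clock::now()) {
        auto it = lastLogged_.find(type);
        if (it != lastLogged_.end() &&
            std::chrono::duration<double>(now - it->second).count() < seconds) {
            return false;  // same event type seen too recently: suppress it
        }
        lastLogged_[type] = now;  // record only events that are actually logged
        return true;
    }

private:
    std::map<std::string, Clock::time_point> lastLogged_;
};
```

Keying the suppression on the event type means distinct failure reasons are still logged once each per window.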

9 changes: 8 additions & 1 deletion documentation/sphinx/source/release-notes.rst
@@ -2,7 +2,7 @@
Release Notes
#############

- 6.0.14
+ 6.0.15
======

Features
@@ -30,6 +30,7 @@ Performance
* Significantly reduced master recovery times for clusters with large amounts of data. [6.0.14] `(PR #836) <https://github.com/apple/foundationdb/pull/836>`_
* Reduced read and commit latencies for clusters which are processing transactions larger than 1MB. [6.0.14] `(PR #851) <https://github.com/apple/foundationdb/pull/851>`_
* Significantly reduced recovery times when executing rollbacks on the memory storage engine. [6.0.14] `(PR #821) <https://github.com/apple/foundationdb/pull/821>`_
+ * Clients update their key location cache much more efficiently after storage server reboots. [6.0.15] `(PR #892) <https://github.com/apple/foundationdb/pull/892>`_

Fixes
-----
@@ -59,6 +60,8 @@ Fixes
* Excluding a process that was both the cluster controller and something else would cause two recoveries instead of one. [6.0.12] `(PR #784) <https://github.com/apple/foundationdb/pull/784>`_
* Configuring from ``three_datacenter`` to ``three_datacenter_fallback`` would cause a lot of unnecessary data movement. [6.0.12] `(PR #782) <https://github.com/apple/foundationdb/pull/782>`_
* Very rarely, backup snapshots would stop making progress. [6.0.14] `(PR #837) <https://github.com/apple/foundationdb/pull/837>`_
+ * Sometimes data distribution calculated the size of a shard incorrectly. [6.0.15] `(PR #892) <https://github.com/apple/foundationdb/pull/892>`_
+ * Changing the storage engine configuration would not affect which storage engine was used by the transaction logs. [6.0.15] `(PR #892) <https://github.com/apple/foundationdb/pull/892>`_

Fixes only impacting 6.0.0+
---------------------------
@@ -74,6 +77,10 @@ Fixes only impacting 6.0.0+
* The transaction logs were doing a lot of unnecessary disk writes. [6.0.12] `(PR #784) <https://github.com/apple/foundationdb/pull/784>`_
* The master will recover the transaction state store from local transaction logs if possible. [6.0.12] `(PR #801) <https://github.com/apple/foundationdb/pull/801>`_
* A bug in status collection led to various workload metrics being missing and the cluster reporting unhealthy. [6.0.13] `(PR #834) <https://github.com/apple/foundationdb/pull/834>`_
+ * Data distribution did not stop tracking certain unhealthy teams, leading to incorrect status reporting. [6.0.15] `(PR #892) <https://github.com/apple/foundationdb/pull/892>`_
+ * Fixed a variety of problems related to changing between different region configurations. [6.0.15] `(PR #892) <https://github.com/apple/foundationdb/pull/892>`_
+ * fdbcli protects against configuration changes which could cause irreversible damage to a cluster. [6.0.15] `(PR #892) <https://github.com/apple/foundationdb/pull/892>`_
+ * Significantly reduced both client and server memory usage in clusters with large amounts of data and usable_regions=2. [6.0.15] `(PR #892) <https://github.com/apple/foundationdb/pull/892>`_

Status
------
61 changes: 54 additions & 7 deletions fdbcli/fdbcli.actor.cpp
@@ -1501,11 +1501,18 @@ ACTOR Future<Void> commitTransaction( Reference<ReadYourWritesTransaction> tr )

ACTOR Future<bool> configure( Database db, std::vector<StringRef> tokens, Reference<ClusterConnectionFile> ccf, LineNoise* linenoise, Future<Void> warn ) {
state ConfigurationResult::Type result;
+ state int startToken = 1;
+ state bool force = false;
if (tokens.size() < 2)
result = ConfigurationResult::NO_OPTIONS_PROVIDED;
else {
+ if(tokens[startToken] == LiteralStringRef("FORCE")) {
+ force = true;
+ startToken = 2;
+ }

state Optional<ConfigureAutoResult> conf;
- if( tokens[1] == LiteralStringRef("auto") ) {
+ if( tokens[startToken] == LiteralStringRef("auto") ) {
StatusObject s = wait( makeInterruptable(StatusClient::statusFetcher( ccf )) );
if(warn.isValid())
warn.cancel();
@@ -1565,7 +1572,7 @@ ACTOR Future<bool> configure( Database db, std::vector<StringRef> tokens, Refere
}
}

- ConfigurationResult::Type r = wait( makeInterruptable( changeConfig( db, std::vector<StringRef>(tokens.begin()+1,tokens.end()), conf) ) );
+ ConfigurationResult::Type r = wait( makeInterruptable( changeConfig( db, std::vector<StringRef>(tokens.begin()+startToken,tokens.end()), conf, force) ) );
result = r;
}

@@ -1577,7 +1584,7 @@ ACTOR Future<bool> configure( Database db, std::vector<StringRef> tokens, Refere
case ConfigurationResult::CONFLICTING_OPTIONS:
case ConfigurationResult::UNKNOWN_OPTION:
case ConfigurationResult::INCOMPLETE_CONFIGURATION:
- printUsage(tokens[0]);
+ printUsage(LiteralStringRef("configure"));
ret=true;
break;
case ConfigurationResult::INVALID_CONFIGURATION:
@@ -1592,6 +1599,26 @@ ACTOR Future<bool> configure( Database db, std::vector<StringRef> tokens, Refere
printf("Database created\n");
ret=false;
break;
+ case ConfigurationResult::DATABASE_UNAVAILABLE:
+ printf("ERROR: The database is unavailable\n");
+ printf("Type `configure FORCE <TOKEN>*' to configure without this check\n");
+ ret=false;
+ break;
+ case ConfigurationResult::STORAGE_IN_UNKNOWN_DCID:
+ printf("ERROR: All storage servers must be in one of the known regions\n");
+ printf("Type `configure FORCE <TOKEN>*' to configure without this check\n");
+ ret=false;
+ break;
+ case ConfigurationResult::REGION_NOT_FULLY_REPLICATED:
+ printf("ERROR: When usable_regions=2, all regions with priority >= 0 must be fully replicated before changing the configuration\n");
+ printf("Type `configure FORCE <TOKEN>*' to configure without this check\n");
+ ret=false;
+ break;
+ case ConfigurationResult::MULTIPLE_ACTIVE_REGIONS:
+ printf("ERROR: When changing from usable_regions=1 to usable_regions=2, only one region can have priority >= 0\n");
+ printf("Type `configure FORCE <TOKEN>*' to configure without this check\n");
+ ret=false;
+ break;
case ConfigurationResult::SUCCESS:
printf("Configuration changed\n");
ret=false;
@@ -1603,7 +1630,7 @@ ACTOR Future<bool> configure( Database db, std::vector<StringRef> tokens, Refere
return ret;
}

- ACTOR Future<bool> fileConfigure(Database db, std::string filePath, bool isNewDatabase) {
+ ACTOR Future<bool> fileConfigure(Database db, std::string filePath, bool isNewDatabase, bool force) {
std::string contents(readFileBytes(filePath, 100000));
json_spirit::mValue config;
if(!json_spirit::read_string( contents, config )) {
@@ -1643,7 +1670,7 @@ ACTOR Future<bool> fileConfigure(Database db, std::string filePath, bool isNewDa
return true;
}
}
- ConfigurationResult::Type result = wait( makeInterruptable( changeConfig(db, configString) ) );
+ ConfigurationResult::Type result = wait( makeInterruptable( changeConfig(db, configString, force) ) );
// Real errors get thrown from makeInterruptable and printed by the catch block in cli(), but
// there are various results specific to changeConfig() that we need to report:
bool ret;
@@ -1676,6 +1703,26 @@ ACTOR Future<bool> fileConfigure(Database db, std::string filePath, bool isNewDa
printf("Database created\n");
ret=false;
break;
+ case ConfigurationResult::DATABASE_UNAVAILABLE:
+ printf("ERROR: The database is unavailable\n");
+ printf("Type `fileconfigure FORCE <FILENAME>' to configure without this check\n");
+ ret=false;
+ break;
+ case ConfigurationResult::STORAGE_IN_UNKNOWN_DCID:
+ printf("ERROR: All storage servers must be in one of the known regions\n");
+ printf("Type `fileconfigure FORCE <FILENAME>' to configure without this check\n");
+ ret=false;
+ break;
+ case ConfigurationResult::REGION_NOT_FULLY_REPLICATED:
+ printf("ERROR: When usable_regions=2, all regions with priority >= 0 must be fully replicated before changing the configuration\n");
+ printf("Type `fileconfigure FORCE <FILENAME>' to configure without this check\n");
+ ret=false;
+ break;
+ case ConfigurationResult::MULTIPLE_ACTIVE_REGIONS:
+ printf("ERROR: When changing from usable_regions=1 to usable_regions=2, only one region can have priority >= 0\n");
+ printf("Type `fileconfigure FORCE <FILENAME>' to configure without this check\n");
+ ret=false;
+ break;
case ConfigurationResult::SUCCESS:
printf("Configuration changed\n");
ret=false;
@@ -2550,8 +2597,8 @@ ACTOR Future<int> cli(CLIOptions opt, LineNoise* plinenoise) {
}

if (tokencmp(tokens[0], "fileconfigure")) {
- if (tokens.size() == 2 || (tokens.size() == 3 && tokens[1] == LiteralStringRef("new"))) {
- bool err = wait( fileConfigure( db, tokens.back().toString(), tokens.size() == 3 ) );
+ if (tokens.size() == 2 || (tokens.size() == 3 && (tokens[1] == LiteralStringRef("new") || tokens[1] == LiteralStringRef("FORCE")) )) {
+ bool err = wait( fileConfigure( db, tokens.back().toString(), tokens[1] == LiteralStringRef("new"), tokens[1] == LiteralStringRef("FORCE") ) );
if (err) is_error = true;
} else {
printUsage(tokens[0]);
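
The `configure` changes in this file treat an optional `FORCE` token right after the command name as a flag and forward only the remaining tokens to `changeConfig`. A rough, self-contained sketch of that parsing step follows. `ParsedConfigure` and `parseConfigure` are hypothetical names used only here, and `std::string` stands in for `StringRef`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; fdbcli keeps `force` and
// `startToken` as separate state variables instead.
struct ParsedConfigure {
    bool force = false;
    std::vector<std::string> options;  // tokens that would go to changeConfig
};

// tokens[0] is the command name ("configure"); an optional FORCE follows it.
ParsedConfigure parseConfigure(const std::vector<std::string>& tokens) {
    ParsedConfigure parsed;
    std::size_t startToken = 1;
    if (tokens.size() > startToken && tokens[startToken] == "FORCE") {
        parsed.force = true;
        startToken = 2;  // skip the FORCE token itself
    }
    if (tokens.size() > startToken) {
        parsed.options.assign(tokens.begin() + startToken, tokens.end());
    }
    return parsed;
}
```

Unlike the hunk as shown, the sketch also checks the token count after consuming `FORCE`, so a bare `configure FORCE` yields an empty option list instead of indexing past the end of the vector.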
5 changes: 4 additions & 1 deletion fdbclient/DatabaseConfiguration.cpp
@@ -54,8 +54,11 @@ void parseReplicationPolicy(IRepPolicyRef* policy, ValueRef const& v) {
void parse( std::vector<RegionInfo>* regions, ValueRef const& v ) {
try {
StatusObject statusObj = BinaryReader::fromStringRef<StatusObject>(v, IncludeVersion());
- StatusArray regionArray = statusObj["regions"].get_array();
regions->clear();
+ if(statusObj["regions"].type() != json_spirit::array_type) {
+ return;
+ }
+ StatusArray regionArray = statusObj["regions"].get_array();
for (StatusObjectReader dc : regionArray) {
RegionInfo info;
json_spirit::mArray datacenters;
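
In the patched `parse`, the output vector is cleared first and the function returns early unless `"regions"` holds an array, so a malformed value leaves an empty region list. The sketch below reproduces that order of operations with standard-library types only. `Value` and `parseRegions` are hypothetical stand-ins, not json_spirit or FoundationDB APIs.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

// Stand-in for a JSON value: either a string or an array of region names.
using Value = std::variant<std::string, std::vector<std::string>>;

// Mirrors the patched order of operations: clear the output first, then
// return early unless "regions" exists and actually holds an array.
void parseRegions(std::vector<std::string>* regions,
                  const std::map<std::string, Value>& obj) {
    regions->clear();
    auto it = obj.find("regions");
    if (it == obj.end()) return;
    const auto* arr = std::get_if<std::vector<std::string>>(&it->second);
    if (arr == nullptr) return;  // wrong type: leave the region list empty
    *regions = *arr;
}
```

Checking the type before extracting the array avoids relying on an exception path for input that is merely absent or malformed.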
17 changes: 7 additions & 10 deletions fdbclient/DatabaseContext.h
@@ -32,22 +32,19 @@
#include "EventTypes.actor.h"
#include "fdbrpc/ContinuousSample.h"

- class LocationInfo : public MultiInterface<StorageServerInterface> {
+ class StorageServerInfo : public ReferencedInterface<StorageServerInterface> {
public:
- static Reference<LocationInfo> getInterface( DatabaseContext *cx, std::vector<StorageServerInterface> const& alternatives, LocalityData const& clientLocality );
+ static Reference<StorageServerInfo> getInterface( DatabaseContext *cx, StorageServerInterface const& interf, LocalityData const& locality );
void notifyContextDestroyed();

- virtual ~LocationInfo();
-
+ virtual ~StorageServerInfo();
private:
DatabaseContext *cx;
- LocationInfo( DatabaseContext* cx, vector<StorageServerInterface> const& shards, LocalityData const& clientLocality ) : cx(cx), MultiInterface( shards, clientLocality ) {}
+ StorageServerInfo( DatabaseContext *cx, StorageServerInterface const& interf, LocalityData const& locality ) : cx(cx), ReferencedInterface<StorageServerInterface>(interf, locality) {}
};

- class ProxyInfo : public MultiInterface<MasterProxyInterface> {
- public:
- ProxyInfo( vector<MasterProxyInterface> const& proxies, LocalityData const& clientLocality ) : MultiInterface( proxies, clientLocality, ALWAYS_FRESH ) {}
- };
+ typedef MultiInterface<ReferencedInterface<StorageServerInterface>> LocationInfo;
+ typedef MultiInterface<MasterProxyInterface> ProxyInfo;

class DatabaseContext : public ReferenceCounted<DatabaseContext>, NonCopyable {
public:
@@ -125,7 +122,7 @@ class DatabaseContext : public ReferenceCounted<DatabaseContext>, NonCopyable {
int locationCacheSize;
CoalescedKeyRangeMap< Reference<LocationInfo> > locationCache;

- std::map< std::vector<UID>, LocationInfo* > ssid_locationInfo;
+ std::map< UID, StorageServerInfo* > server_interf;

// for logging/debugging (relic of multi-db support)
Standalone<StringRef> dbName;
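
This header replaces a map keyed by a whole team of server IDs (`ssid_locationInfo`) with one keyed by a single server ID (`server_interf`), and a `LocationInfo` now holds references to per-server objects. One plausible reading is that a rebooted storage server then only needs one shared record refreshed, which would match the release note about faster location cache updates. The sketch below illustrates that sharing pattern under simplifying assumptions: `ServerInfo`, `Location`, and `ServerRegistry` are hypothetical, IDs are plain `int`s, and the registry owns its records through `std::shared_ptr`, whereas the real map stores raw pointers.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins: a server record and a "location" listing the
// servers responsible for a key range.
struct ServerInfo {
    std::string address;
};
using Location = std::vector<std::shared_ptr<ServerInfo>>;

class ServerRegistry {
public:
    // Returns the single shared record for `id`, creating it or refreshing
    // its address so that every Location holding it sees the new value.
    std::shared_ptr<ServerInfo> getInterface(int id, const std::string& address) {
        auto& slot = servers_[id];
        if (!slot) slot = std::make_shared<ServerInfo>();
        slot->address = address;
        return slot;
    }

private:
    std::map<int, std::shared_ptr<ServerInfo>> servers_;
};
```

With per-team keys, the same server would be copied into every team entry it belongs to; with per-server keys, one update is visible through every location that references the server.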
5 changes: 5 additions & 0 deletions fdbclient/FDBTypes.h
@@ -91,6 +91,11 @@ static std::string describe( const int item ) {
return format("%d", item);
}

+ template <class T>
+ static std::string describe( Reference<T> const& item ) {
+ return item->toString();
+ }

template <class T>
static std::string describe( T const& item ) {
return item.toString();
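
The new `describe( Reference<T> const& )` overload sits next to the generic `describe( T const& )`. Because the pointer overload is more specialized, overload resolution picks it for reference-counted handles and dereferences with `->` instead of calling `.toString()` on the handle. The sketch below shows the same pattern with `std::shared_ptr` standing in for `Reference`; `Team` is a hypothetical type.

```cpp
#include <memory>
#include <string>

// Hypothetical type with a toString() member.
struct Team {
    std::string toString() const { return "team"; }
};

// Overload for smart pointers: dereference, then ask the pointee.
template <class T>
std::string describe(const std::shared_ptr<T>& item) {
    return item->toString();
}

// Generic fallback for anything with a toString() member.
template <class T>
std::string describe(const T& item) {
    return item.toString();
}
```

Without the pointer overload, `describe(std::make_shared<Team>())` would fail to compile, since `std::shared_ptr` has no `toString()` member.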
3 changes: 2 additions & 1 deletion fdbclient/Knobs.cpp
@@ -51,6 +51,7 @@ ClientKnobs::ClientKnobs(bool randomize) {
init( DEFAULT_MAX_BACKOFF, 1.0 );
init( BACKOFF_GROWTH_RATE, 2.0 );
init( RESOURCE_CONSTRAINED_MAX_BACKOFF, 30.0 );
+ init( PROXY_COMMIT_OVERHEAD_BYTES, 23 ); //The size of serializing 7 tags (3 primary, 3 remote, 1 log router) + 2 for the tag length

init( TRANSACTION_SIZE_LIMIT, 1e7 );
init( KEY_SIZE_LIMIT, 1e4 );
Expand All @@ -61,7 +62,7 @@ ClientKnobs::ClientKnobs(bool randomize) {
init( MAX_BATCH_SIZE, 20 ); if( randomize && BUGGIFY ) MAX_BATCH_SIZE = 1; // Note that SERVER_KNOBS->START_TRANSACTION_MAX_BUDGET_SIZE is set to match this value
init( GRV_BATCH_TIMEOUT, 0.005 ); if( randomize && BUGGIFY ) GRV_BATCH_TIMEOUT = 0.1;

- init( LOCATION_CACHE_EVICTION_SIZE, 100000 );
+ init( LOCATION_CACHE_EVICTION_SIZE, 300000 );
init( LOCATION_CACHE_EVICTION_SIZE_SIM, 10 ); if( randomize && BUGGIFY ) LOCATION_CACHE_EVICTION_SIZE_SIM = 3;

init( GET_RANGE_SHARD_LIMIT, 2 );
1 change: 1 addition & 0 deletions fdbclient/Knobs.h
@@ -49,6 +49,7 @@ class ClientKnobs : public Knobs {
double DEFAULT_MAX_BACKOFF;
double BACKOFF_GROWTH_RATE;
double RESOURCE_CONSTRAINED_MAX_BACKOFF;
+ int PROXY_COMMIT_OVERHEAD_BYTES;

int64_t TRANSACTION_SIZE_LIMIT;
int64_t KEY_SIZE_LIMIT;
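
The value 23 for `PROXY_COMMIT_OVERHEAD_BYTES` follows from the comment in `Knobs.cpp` (7 tags plus 2 bytes for the tag length) if each serialized tag occupies 3 bytes. That per-tag size is an assumption made here to reproduce the arithmetic, with a 1-byte locality and a 2-byte id as one layout that gives 3 bytes; the diff itself does not state it. The constant names below are hypothetical.

```cpp
#include <cstdint>

// Assumption for this sketch: a tag serializes to 3 bytes
// (a 1-byte locality plus a 2-byte id). The diff does not state the tag size.
constexpr int kTagBytes = sizeof(std::int8_t) + sizeof(std::uint16_t);
constexpr int kTagCount = 3 + 3 + 1;   // primary + remote + log router
constexpr int kLengthPrefixBytes = 2;  // "+ 2 for the tag length"
constexpr int kOverheadBytes = kTagCount * kTagBytes + kLengthPrefixBytes;

// 7 * 3 + 2 = 23, the value passed to init() for the knob.
static_assert(kOverheadBytes == 23, "matches PROXY_COMMIT_OVERHEAD_BYTES");
```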