
RocksDB 7.10.2 version upgrade #9829

Merged
merged 1 commit into apple:release-7.1 from rocksupgrade-7.1 on Apr 3, 2023

Conversation

neethuhaneesha (Contributor)

RocksDB 7.10.2 version upgrade.
Cherry-pick of #9828

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 88208c3
  • Duration 0:16:05
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 88208c3
  • Duration 0:19:46
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@fdb-windows-ci (Collaborator)

Doxense CI Report for Windows 10

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 88208c3
  • Duration 0:46:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 88208c3
  • Duration 0:48:51
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 88208c3
  • Duration 2:35:47
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 00f4281
  • Duration 0:16:54
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@fdb-windows-ci (Collaborator)

Doxense CI Report for Windows 10

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 00f4281
  • Duration 0:19:23
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 00f4281
  • Duration 0:41:24
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 00f4281
  • Duration 1:13:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 00f4281
  • Duration 2:35:40
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@fdb-windows-ci (Collaborator)

Doxense CI Report for Windows 10

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 4a3af02
  • Duration 0:34:28
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 4a3af02
  • Duration 0:39:48
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 4a3af02
  • Duration 2:35:33
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@fdb-windows-ci (Collaborator)

Doxense CI Report for Windows 10

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 1c5ee77
  • Duration 0:58:05
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 1c5ee77
  • Duration 1:16:30
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 1c5ee77
  • Duration 2:37:09
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: None
  • Duration 0:08:19
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version pr/9829
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

liquid-helium previously approved these changes Mar 31, 2023
@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 10d9025
  • Duration 0:19:53
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@fdb-windows-ci (Collaborator)

Doxense CI Report for Windows 10

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 10d9025
  • Duration 0:49:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 10d9025
  • Duration 0:57:06
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 10d9025
  • Duration 1:00:47
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 3097840
  • Duration 0:20:47
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 3097840
  • Duration 0:40:07
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 3097840
  • Duration 1:16:53
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@fdb-windows-ci (Collaborator)

Doxense CI Report for Windows 10

@saintstack (Contributor) left a comment

LGTM

@jzhou77 merged commit 4808747 into apple:release-7.1 on Apr 3, 2023
2 of 4 checks passed
@foundationdb-ci (Contributor)

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 3097840
  • Duration 2:36:48
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@neethuhaneesha deleted the rocksupgrade-7.1 branch on April 3, 2023 at 20:48
foxyholic pushed a commit to owtech/foundationdb that referenced this pull request Apr 10, 2023
* Fix transaction_too_old error when version vector is enabled

When VV is enabled, the comparison of the storage server version and the read version
should use the original read version; otherwise, the client may get a wrong
transaction_too_old error.

* Fix assertions w.r.t. VV

* Avoid using oldest version as read version for VV

* Disable a debugging trace event

* Cherry pick 8630

* Address review comments

* enable AVX and update version for 7.1.25 release

* skip proxy when fetching kubectl

* add generated.go

* update version after 7.1.25 release

* Add changes to generated.go from PR8761, and remove change to ConfigureCompiler.cmake

* Update generated.go

* Update generated.go with 8761

* Rocksdb stats level knob. (apple#8713)

* Adding counters for singlekey clear requests (apple#8792)

* add bytelimit for prefetch

This is a patch to release-7.1, created after resolving conflicts with the commit on the
main branch, in order to enable byteLimit in release-7.1.

A fraction of byteLimit is used as the limit for fetching the index.
For the indexes fetched, records are then fetched for them in batch.

byteLimit always counts the index size; it also counts the record size if a record exists.
At least one index-record entry is returned, and the last entry is always included
even if adding it exceeds the limit.

There is a knob, STRICTLY_ENFORCE_BYTE_LIMIT: when it is set, records
are discarded once the byteLimit is hit, even though they have already been fetched.
Otherwise, the whole fetched batch is returned.
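
As an illustration only (not the actual FDB implementation), here is a minimal C++ sketch of how such a byte limit might be applied to a fetched batch of index/record pairs; the Entry type and applyByteLimit function are hypothetical, and the strictEnforce flag stands in for the STRICTLY_ENFORCE_BYTE_LIMIT knob:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical index/record pair produced by the prefetch.
struct Entry {
	std::string index;
	std::string record; // empty if no record exists for this index
};

// Sketch: the index size always counts toward the limit, the record size
// counts only when a record exists, and at least one entry is always kept.
std::vector<Entry> applyByteLimit(const std::vector<Entry>& fetched,
                                  int64_t byteLimit,
                                  bool strictEnforce /* STRICTLY_ENFORCE_BYTE_LIMIT */) {
	if (!strictEnforce) {
		return fetched; // non-strict: return the whole fetched batch as-is
	}
	std::vector<Entry> result;
	int64_t bytes = 0;
	for (const Entry& e : fetched) {
		bytes += static_cast<int64_t>(e.index.size() + e.record.size());
		result.push_back(e); // the entry that crosses the limit is still included
		if (bytes >= byteLimit) {
			break; // strict: discard the remaining already-fetched entries
		}
	}
	return result;
}
```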

* debug seg fault

* Revert "debug seg fault"

This reverts commit fadcb08.

* [release-7.1] Add SS read range bytes metrics. (apple#8697) (apple#8724)

* Add SS read range bytes metrics. (apple#8697)

* Fix build failure

* clang-fmt

* fmt

* Rocksdb suggest compact range checks

* RocksDB 7.7.3 version upgrade

* Fix backup worker assertion failure

The number of released bytes exceeds the number of acquired bytes in the locks.
This is because the byte count used for the release is calculated after a "wait",
by which time more bytes could have been allocated.
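
A small hypothetical sketch of the pattern described above, assuming a simple byte-counting lock; the point is that the release amount must be the value captured at acquisition time, not one recomputed after the wait:

```cpp
#include <cstdint>

// Hypothetical lock that tracks bytes currently held by backup work.
struct ByteLock {
	int64_t held = 0;
	void acquire(int64_t bytes) { held += bytes; }
	void release(int64_t bytes) { held -= bytes; }
};

// Sketch of the fixed pattern: capture the acquired amount and release exactly
// that, so bytes allocated during the asynchronous wait are not over-released.
void processBatch(ByteLock& lock, int64_t acquiredBytes) {
	lock.acquire(acquiredBytes);
	// ... an asynchronous wait happens here; the batch may keep growing ...
	lock.release(acquiredBytes); // release the captured value, not a recomputed one
}
```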

* Increase buggified lock bytes for backup workers

To fix simulation failures where the knob value is too small.

* Send error when LogRouterPeekPopped happens

Otherwise, the remote tlog won't get a response and the parallel peek requests
will never be cleared, blocking subsequent peeks. As a result, the remote tlog will
no longer be able to pop the log router, which in turn can no longer peek tlogs.
The whole remote side becomes blocked.

* Add more debug events

* Add DebugTrace.h to 7.1 branch

Cherry-picking PR#8856 requires DebugTrace.h
due to its use of the DebugLogTraceEvent function

* Fix int32 variable overflow bugs.
1. The content length from the HTTP response, converted using 'atoi', could overflow int32.
2. Aligning the offset could overflow int32.
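
A hedged sketch of the general fix idea (the actual FDB code may differ): parse and align in 64-bit arithmetic so values over ~2 GiB do not wrap:

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Parse an HTTP Content-Length header value into 64 bits instead of int via atoi.
int64_t parseContentLength(const std::string& value) {
	return strtoll(value.c_str(), nullptr, 10);
}

// Align an offset down to a block boundary using 64-bit arithmetic.
int64_t alignDown(int64_t offset, int64_t blockSize) {
	return (offset / blockSize) * blockSize;
}
```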

* Fix -Wformat warning

* Add determinism to gray failure degraded server selection

* format source code after switch to clang 15

* Fix clang 15 compiling errors

* Fix gcc 11 compiling errors

* Fix more warnings

* Moving rocksdb read iterator destruction from commit path to actor. (apple#8971)

* Release 7.1: Cherry pick pull request apple#9033 (apple#9037)

* Merge pull request apple#9033 from sbodagala/main

* - Code formatting

Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>

* Fix:Exclusion stuck because DD cannot build new teams

Bug behavior:
When DD has zero healthy machine teams but more unhealthy machine teams
than the maximum number of machine teams DD plans to build, DD stops building
new machine teams. With zero healthy machine teams (and zero healthy
server teams), DD cannot find a healthy destination team to relocate data.
When data relocation stops, exclusion stops progressing and gets stuck.

The bug happens when we *shrink* a k-host cluster by
first adding k/2 new hosts,
then quickly excluding all old hosts.

Fix:
Let DD build temporary extra teams to relocate data.
The extra teams are cleaned up later by DD's remove-extra-teams logic.

Simulation test:
There is no simulation test covering the cluster expansion scenario.
To simulate this behavior as closely as possible, we intentionally overbuild all possible
machine teams to trigger the condition where the number of unhealthy teams is larger than
the maximum number of teams DD wants to build later.

* Resolve review comment: No functional change

* Add back samples for (non)empty peeks stats [release-7.1] (apple#9074)

* Add back samples for (non)empty peeks stats

These were lost, likely due to refactoring. Now TLogMetrics have meaningful
data like:

TLogMetrics ID=59ec9c67b4d07433 Elapsed=5 BytesInput=0 -1 17048 BytesDurable=47.4 225.405 17048 BlockingPeeks=0 -1 0 BlockingPeekTimeouts=0 -1 0 EmptyPeeks=1.6 2.79237 236 NonEmptyPeeks=0 -1 32 ...

* Use LATENCY_SAMPLE_SIZE

* fix health monitor last logged time

* Backport RocksDB cmake file to 7.1 (apple#9093)

* Fix the RocksDB compile issue with clang

By default, RocksDB uses its own compile/link flags regardless of what
FDB's flags are. This led to the issue that if FDB decides to use
clang/lld/libc++, RocksDB picks up the compiler/linker but still uses
libstdc++, which is incompatible with libc++, causing missing-symbol errors
during the link stage.

With this patch, if FDB uses libc++, that information is stored in
CMAKE_CXX_FLAGS and forwarded to RocksDB. RocksDB then uses
libc++ and is compatible with FDB.

* fixup! Fix the clang error in bindings/c

* add some rocksdb compile options that can be passed in at build time

* Disconnection from a satellite TLog should trigger recovery in gray failure detection

* Upgrade sphinx and document test harness and code probes

* Apply suggestions from code review

Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
Co-authored-by: Bharadwaj V.R <bharadwaj.vr@snowflake.com>

* clarify how code probes are reported

* clarify statistics of TestHarness

* Bump setuptools from 65.3.0 to 65.5.1 in /documentation/sphinx

Bumps [setuptools](https://github.com/pypa/setuptools) from 65.3.0 to 65.5.1.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/CHANGES.rst)
- [Commits](pypa/setuptools@v65.3.0...v65.5.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Add event for txn server initialization and a warning for TLog slow catching up

* Change TLog pull async data warning timeout

* Adding rocksDB control compaction on deletion knobs. (apple#9165)

* Add 7.1.26, 7.1.27 release notes (apple#9186)

* Added metrics for read range operations.

* Log PingLatency when there are no ping latency samples but there are ping attempts

* Changing histogram type. (apple#9227)

* Release 7.1: Cherry pick pull request apple#9225 (apple#9252)

* - Do not add fdbserver processes to the client list. (apple#9225)

Note: Server processes started getting reported as clients since 7.1.0
(not sure if this change in behavior was intentional or not), and this
breaks the operator upgrade logic.

* - Address a compilation error

* - Update release-notes.

* - Address a review comment/CI failure.

* - Address CI, related to release notes, failure.

* disable AVX for 7.1.26 release

* enable AVX and update version for 7.1.27 release

* update version after 7.1.27 release

* Increase buggified lock bytes for backup workers to at least 256 MB.

We still encounter simulation failures where the backup worker
is waiting on the lock and an assertion fails.

* Reduce logging level for verbose events

From one nightly failure caused by too many log lines, these are the top 3:

  60100 FastRestoreLoaderDispatchRequests
  79655 FastRestoreGetVersionSize
  93888 FastRestoreSplitMutation

* Fix typo in fdb.options

* update  bindings/go/src/fdb/generated.go

* Fix getMappedRange metrics(release-7.1) (apple#9331)

* Fix getMappedRange metrics

Metrics related to the getMappedRange API were counted twice;
having a set of new metrics specifically for getMappedRange solves
the issue.
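
Roughly, the idea is to give getMappedRange its own counters instead of reusing the plain range counters; this is only an illustrative sketch, not the real CounterCollection-based metrics:

```cpp
#include <cstdint>

// Illustrative stand-in for the storage server metrics.
struct StorageMetricsSketch {
	int64_t getRangeQueries = 0;
	int64_t getMappedRangeQueries = 0; // dedicated counter for mapped ranges
};

// With a dedicated counter, a mapped-range request is counted exactly once
// rather than being folded twice into the shared getRange metrics.
void onMappedRangeRequest(StorageMetricsSketch& m) {
	++m.getMappedRangeQueries;
}
```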

* Fix clang init order issue

* Enable rocksdb in simulation in 7.1. Exclude FuzzApi and HighContention tests temporarily for rocksdb. (apple#9374)

* Fix IDE build and warnings

* Rocksdb knob changes. (apple#9393)

* Fix compiler warnings

* Add exclude to fdbcli's configure command

Right now this only allows one server address to be excluded. This is useful
when the database is unavailable but we want recruitment to skip some
particular processes.

Manually tested that the concept works with a loopback cluster.

* Allow a comma separated list of excluded addresses

* Add ClogTlog workload

* Update clogTlog workload to be single region

* Exclude failed tlog if recovery is stuck for more than 30s

Because the tlog is clogged, recovery can get stuck in initializing_transaction_servers.
This exclusion allows the recovery to complete.

* Change to only clog once for a particular tlog

If we repeat clogging, different tlogs may be excluded, which can cause the
recovery to get stuck.

* Move ClogTlog.toml to rare

* Fix rare test failures

Unclog after DB is recovered, otherwise another recovery may become stuck again.

* Address review comments

* Allow fdbdecode to read filters from a file

* Fix filter delimiter and print sub versions

* Use KeyRangeMap for better matching performance

* fdbdecode: read backup range files

* add filtering

* Allow fdbdecode to read filters from a file

* Fix filter delimiter and print sub versions

* Use KeyRangeMap for better matching performance

* Disable filter validate by default

* Use RangeMap for backup agent filtering

This is more efficient than going through ranges one by one.
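
For illustration, a simplified stand-in for that idea (the real code uses FDB's KeyRangeMap, whose interface may differ): store non-overlapping ranges in an ordered map keyed by begin key, so membership is an O(log n) lookup instead of a scan over every range.

```cpp
#include <map>
#include <string>

// Simplified range filter: maps each range's begin key to its (exclusive) end key.
// Ranges are assumed non-overlapping; keys use lexicographic order.
class RangeFilter {
	std::map<std::string, std::string> ranges;

public:
	void addRange(const std::string& begin, const std::string& end) { ranges[begin] = end; }

	// O(log n) membership test instead of checking every range one by one.
	bool contains(const std::string& key) const {
		auto it = ranges.upper_bound(key); // first range beginning strictly after key
		if (it == ranges.begin())
			return false;
		--it; // candidate range with begin <= key
		return key < it->second; // inside if key precedes the range's end
	}
};
```

For example, after addRange("a", "c"), contains("b") is true and contains("c") is false under the exclusive-end convention assumed here.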

* Refactor code

* Allow fdbbackup, fdbrestore to read keyranges from a file

* Use the RangeMapFilters

* add command line option

* Clang-format

* Fix -t flag bug for fdbdecode (apple#9489)

* Fix fdbbackup query returning earliest version

* Query backup size from a specific snapshot

* clean format

* Explicitly use the min and max restorable versions from the backup description in the query command instead of going through snapshots

* fix clang build error

* Add more comments in fdbbackup query command, and address comments

* Change PTreeImpl::insert to overwrite existing entries (apple#9138)

* Change PTreeImpl::insert to overwrite existing entries

Maintaining partial persistence of course.

We can theoretically also avoid creating a new node if the insert version of the node that compares equal to `x` is the latestVersion. There isn't a generic way to tell this from the ptree, though, since insertAt is a concept that only exists within VersionedMap. Either way, avoiding the `contains` call and the tree rotations is already a big improvement.

The old node should only be reachable from old roots, and so it should get cleaned up as part of forgetVersions in the storage server.
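
To illustrate the overwrite-on-insert idea (this is a simplified path-copying binary search tree, not the real randomized PTreeImpl): the search path is copied, an equal key is overwritten in the copy, and the old node stays reachable only from old roots.

```cpp
#include <memory>
#include <string>

// Simplified persistent BST node; real PTree nodes also carry version information.
struct Node {
	std::string key, value;
	std::shared_ptr<const Node> left, right;
};
using NodePtr = std::shared_ptr<const Node>;

// Path-copying insert that overwrites an existing key instead of erasing it
// first, avoiding the extra lookup and any rebalancing for the erase.
NodePtr insert(const NodePtr& root, const std::string& key, const std::string& value) {
	if (!root)
		return std::make_shared<Node>(Node{ key, value, nullptr, nullptr });
	if (key < root->key)
		return std::make_shared<Node>(Node{ root->key, root->value, insert(root->left, key, value), root->right });
	if (root->key < key)
		return std::make_shared<Node>(Node{ root->key, root->value, root->left, insert(root->right, key, value) });
	// Equal key: overwrite in the new version; old roots still see the old node.
	return std::make_shared<Node>(Node{ key, value, root->left, root->right });
}
```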

* Update fdbclient/include/fdbclient/VersionedMap.h

* Avoid repeated search in VersionedMap::erase(iterator) (apple#9143)

* Use KeyspaceSnapshotFile to filter range files

* Change mutation and KV logging to SevInfo

Set max length as well to avoid TraceEventOverflow.

* Output in HEX format for easy regex matching

* Refactor decoder to read file as a whole once

To reduce the number of network requests.

* Add more trace events

* Allow log router to detect slow peeks and to switch DC for peeking [release-7.1] (apple#9640)

* Add DcLag tests and workload

* Add disableSimSpeedup to clog network longer

* Ignore the DcLag test

* Refactor LogRouter's pullAsyncData

* Switch DC if log router peek becomes stuck

Try a different DC if this happens.

* Enable DcLag test

* Require at least 2 regions and having satellites

* Simplify DcLag code

* Limit connection failures to be within tests

In particular, disable connection failures when initializing the database
during the startup phase, i.e., before running with test specs.

* Revert disableSimSpeedup

* Fix conflicts after cherrypick

* More fixes after cherrypick

* Refactor to address comments

* Use a constant for connectionFailuresDisableDuration

* Fix ClogTlog workload valgrind error

* Address comments

* Reduce running time for DcLag

The switch can happen more quickly than the workload's detection time, so the
detection time needs to be set lower than LOG_ROUTER_PEEK_SWITCH_DC_TIME.

* Fix issue where the versions on seed storage servers decreased

Seed storage servers are recruited as the initial set of storage servers
when a database is first created. They function a little differently
than normal, and do not set an initial version like storage servers normally do
when they get recruited (typically equal to the recovery version).

Version correction is a feature where versions advance in sync with the
clock, and are equal across FDB clusters. To allow different FDB
clusters to have matching versions, they must share the same base
version. This defaults to the Unix epoch, and clusters with the version
epoch enabled will have a current version equal to the number of
microseconds since the Unix epoch.
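
As a rough arithmetic illustration of that relationship (assuming the version epoch is set to the Unix epoch), the expected current version is simply the elapsed microseconds:

```cpp
#include <chrono>
#include <cstdint>

// Expected current version when the version epoch equals the Unix epoch:
// the number of microseconds elapsed since 1970-01-01T00:00:00Z.
int64_t expectedVersionNow() {
	using namespace std::chrono;
	return duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();
}
```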

When the version epoch is enabled on a cluster, it causes a one time
jump from the clusters current version to the version based on the
epoch. After a recovery, the recovery version sent to storages should
have advanced by a significant amount.

The recovery path contained a `BUGGIFY` to randomly advance the recovery
version in simulation, testing the version epoch being enabled.
However, it was also advancing the version during an initial recovery,
when the seed storage servers are recruited. If a set of storage
servers were recruited as seed servers, but another recovery occurred
before the bootstrap process was complete, the randomly selected version
increase could be smaller during the second recovery than during the
first. This could cause the initial set of seed servers to think they
should be at a version larger than what the cluster was actually at.

The fix contained in this commit is to only cause a random version jump
when the recovery is occurring on an existing database, and not when it
is recruiting seed storages.

This commit fixes an issue found in simulation, reproducible with:

Commit: 93dc4bf
Test: fast/DataLossRecovery.toml
Seed: 3101495991
Buggify: on
Compiler: clang

* Added 7.1.28 and 7.1.29 release notes

* Reduce running time for ClogTlog

When ClogTlog is running, we may already have passed the 450s mark (SIM_SPEEDUP_AFTER_SECONDS),
so clogging is no longer effective. If that's the case, we want to finish the test quickly.

* Remove profile code from SpecialKeySpace workload

This part of the code has problems with GlobalConfig and is buggy.

* disable AVX for 7.1.28 release

* enable AVX and update version for 7.1.29 release

* update version after 7.1.29 release

* Update info trigger new DB info update immediately

* Backport exclusion fix apple#9468 (apple#9789)

* Don't block the exclusion of stateless processes by the free capacity check

* Fix syntax

* Make use of precomputed exclude check

* Format code

* Only consider newly excluded processes

* Format code and update comment

* Fix finishedQueries metric, add metrics reporting in GetMappedRange test [release-7.1] (apple#9785)

* Fix finishedQueries metric, add metrics reporting in GetMappedRange test

* refactor to make format work

* resolve comments

* Fix more comments

* Fix bugs and change running time of test

* Adding rocksdb bloom filter knobs. (apple#9770)

* [Release 7.1] Do not update exclude/failed system metadata in excludeServers if the input list is already excluded/failed (apple#9809)

* Add a check in the excludeServer function: if the exclusion list already exists, there is no need to issue new writes.

* Update documentation

* Parameterized queue length in GetMappedRange test (apple#9808)

Also retry when operation_cancelled happens

* Add 7.1.30, 7.1.31 release notes (apple#9822)

* Don't stop iterating over all storage processes in exclusion check (apple#9869)

* checkSafeExclusion should always create new ExclusionSafetyCheckRequest (apple#9871)

* RocksDB 7.10.2 version upgrade (apple#9829)

* Changing single key deletions to delete based on number of deletes instead of bytelimit.

* Implement check if locality is already excluded in exclude locality command (apple#9878)

* Merge pull request apple#9814 from sbodagala/main (apple#9883)

FdbServer not able to join cluster

Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>

* Update 7.1.30 release notes

* Remove printable() from TSS trace events

* Fix release notes

* Fixed stuck data movement when a server is removed [release-7.1] (apple#9904)

* Fixed stuck data movement when a server is removed

When a server is removed, dataDistributionRelocator doesn't remove the work for
the destination storage workers. As a result, it can no longer move the shard into
any of the healthy workers in the destination team.

* Avoid double-completing the work

* disable AVX for 7.1.30 release

* enable AVX and update version for 7.1.31 release

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>
Co-authored-by: Dan Lambright <dlambrig@gmail.com>
Co-authored-by: FoundationDB CI <foundationdb_ci@apple.com>
Co-authored-by: neethuhaneesha <nbingi@apple.com>
Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>
Co-authored-by: Hao Fu <77984096+hfu94@users.noreply.github.com>
Co-authored-by: hao fu <hfu5@apple.com>
Co-authored-by: Yao Xiao <87789492+yao-xiao-github@users.noreply.github.com>
Co-authored-by: Meng Xu <meng_xu@apple.com>
Co-authored-by: Huiyoung <bryant507@foxmail.com>
Co-authored-by: sfc-gh-tclinkenbeard <trevor.clinkenbeard@snowflake.com>
Co-authored-by: Zhe Wu <halfprice@users.noreply.github.com>
Co-authored-by: Sreenath Bodagala <82616783+sbodagala@users.noreply.github.com>
Co-authored-by: Meng Xu <42559636+xumengpanda@users.noreply.github.com>
Co-authored-by: Xiaoge Su <magichp@gmail.com>
Co-authored-by: Aaron Molitor <amolitor@apple.com>
Co-authored-by: Markus Pilman <markus.pilman@snowflake.com>
Co-authored-by: Bharadwaj V.R <bharadwaj.vr@snowflake.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dan Adkins <dan.adkins@snowflake.com>
Co-authored-by: Vishesh Yadav <vishesh_yadav@apple.com>
Co-authored-by: Andrew Noyes <andrew.noyes@snowflake.com>
Co-authored-by: Lukas Joswiak <lukas.joswiak@snowflake.com>
Co-authored-by: Johannes Scheuermann <johscheuer@users.noreply.github.com>
Co-authored-by: Oleg Samarin <osamarin@openintegration.inc>