@sbodagala
Contributor

@sbodagala sbodagala commented Aug 28, 2024

Address an issue related to computing the maximum possible size of a bitmap (that was resulting in an assertion failure): the bitmap size should equal the number of log servers across all logSets and should not depend on which log servers are available during recovery. This is a side effect of #11593; we probably should have caught it while reviewing that PR.
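A minimal C++ sketch of the sizing rule, for illustration only: `LogSet`, `maxBitmapSize`, and `availableOnlyBitmapSize` are hypothetical names, not FoundationDB's actual types or functions. Each log set is modeled as a list of per-server availability flags; the correct size counts every server, while the buggy variant counts only the servers available during recovery and can therefore come out too small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one log set: one flag per log server, true if that
// server was reachable during recovery.
struct LogSet {
    std::vector<bool> available;
};

// Correct rule: the bitmap must be able to hold every log server id in every
// logSet, so its size is the total server count, independent of availability.
std::size_t maxBitmapSize(const std::vector<LogSet>& logSets) {
    std::size_t total = 0;
    for (const auto& ls : logSets) {
        total += ls.available.size();
    }
    return total;
}

// Buggy variant, for contrast: counting only servers that were available
// during recovery can make the bitmap too small for a valid server id.
std::size_t availableOnlyBitmapSize(const std::vector<LogSet>& logSets) {
    std::size_t total = 0;
    for (const auto& ls : logSets) {
        for (bool up : ls.available) {
            if (up) {
                ++total;
            }
        }
    }
    return total;
}
```

With two log sets of three and two servers, one of them unavailable, the correct size is 5 while the availability-based count is 4, so a valid id of 4 would not fit.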

Testing:

With version vector and unicast enabled: 20240828-145101-sre-046abae5a60d2a81 (has failures that we will need to look into).

With version vector disabled: 20240828-145548-sre-657bc8eec0513986 (shows a failure in the "FuzzApiCorrectnessClean.toml" test, but it is unlikely to have been caused by this change).

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@sbodagala sbodagala requested review from dlambrig and jzhou77 August 28, 2024 14:56
@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 06de541
  • Duration 0:07:18
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 06de541
  • Duration 0:11:33
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: 06de541
  • Duration 0:21:44
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Contributor

@dlambrig dlambrig left a comment

can you please add a note in the PR header describing the failure, e.g. "bitmask size should equal the number of tLogs over all logsets, we miscalculated the size", and the failed simulation test?

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 06de541
  • Duration 0:52:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 06de541
  • Duration 0:55:13
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

jzhou77
jzhou77 previously approved these changes Aug 28, 2024
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 06de541
  • Duration 3:00:30
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 06de541
  • Duration 3:00:35
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@sbodagala
Contributor Author

Updated the PR header.

> and the failed simulation test?

"FuzzApiCorrectnessClean.toml" (mentioned in the header).

@sbodagala
Contributor Author

> we should assert if id > bsSize in populateBitset

Good point, done. Please note that the corruption/serialization issue that we have been looking into will likely cause this assertion to fail.
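A hypothetical sketch of the requested check; the real `populateBitset` signature is not shown in this thread, so the one below is assumed. Assuming zero-based server ids, a bitset of size `bsSize` can hold ids `0` through `bsSize - 1`, so the condition to enforce is `id < bsSize`. Here a `std::out_of_range` exception stands in for an assertion so the failure is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical populateBitset-style helper: sets one bit per server id.
// Assumes zero-based ids, so every id must satisfy id < bsSize.
std::vector<bool> populateBitset(std::size_t bsSize, const std::vector<std::size_t>& ids) {
    std::vector<bool> bits(bsSize, false);
    for (std::size_t id : ids) {
        // Fail loudly instead of writing past the end of the bitset.
        if (id >= bsSize) {
            throw std::out_of_range("server id outside bitset");
        }
        bits[id] = true;
    }
    return bits;
}
```

An undersized bitset (for example one computed from only the available log servers) would trip this check as soon as a higher id shows up.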

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: e08aaa7
  • Duration 0:06:56
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: e08aaa7
  • Duration 0:12:16
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: e08aaa7
  • Duration 0:21:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@dlambrig dlambrig self-requested a review August 28, 2024 20:28
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: e08aaa7
  • Duration 0:51:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: e08aaa7
  • Duration 0:56:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: e08aaa7
  • Duration 3:00:35
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: e08aaa7
  • Duration 3:00:35
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@dlambrig dlambrig requested a review from jzhou77 August 29, 2024 02:25
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: e08aaa7
  • Duration 3:00:46
  • Result: ❌ FAILED
  • Error: Build has timed out.
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

sbodagala added a commit to sbodagala/foundationdb that referenced this pull request Aug 29, 2024
jzhou77 pushed a commit that referenced this pull request Aug 30, 2024
gm42 pushed a commit to gm42/foundationdb that referenced this pull request Sep 19, 2024
MarkSh1 added a commit to owtech/foundationdb that referenced this pull request May 19, 2025
* Fix calculation of empty messages for vv case (apple#11224) (apple#11227)

Co-authored-by: Dan Lambright <hlambright@apple.com>

* Parse version_stamp and print if necessary

* [Release-7.3] Cherrypick Consistency Check Urgent (apple#11228)

* Consistency Check Urgent (Cherrypick from Release-7.1) (apple#11217)

* cherry-pick-distributed-consistency-checker

* code cleanup

* refactor code, decouple consistencyCheckerUrgent and consistency checker

* fix workload for consistencycheckurgent

* add new consistencycheckurgent role type

* fix CI

* address comments

* fix-consistencycheckurgent-large-read (apple#11229)

* Update release-notes-730.rst

* Update release-notes-730.rst

* Update release-notes-730.rst

* IGNORE vv upgrade tests

* Respond to review comments

* Disable compaction for newly added shard. (apple#11238)

* Disable compaction for newly added shard.

* block cache usage

* Fix documentation build that breaks CI (apple#11253)

* Fixing setting perpetual_storage_wiggle_engine is considered as wrongly configured (apple#11252)

* Added max range deletions before flush knob and some knob changes

* disable AVX for 7.3.34 release

* enable AVX and update version for 7.3.35 release

* Fix a DR corruption bug (apple#11246)

The clear in the status call can cause corruption.

To reproduce with clang build:
seed: -f ./tests/slow/ApiCorrectnessSwitchover.toml -s 223736449 -b on
commit: f8eca6a
Apply the ./tests/slow/ApiCorrectnessSwitchover.toml file in this commit.

* update version after 7.3.35 release

* cherry-pick-pr-11209

* Remove assertion equality in tcpvmap size on resolver return.

* Add TraceEvent when FDBServer receives the client sampling information

* Add knob for the TraceEvent that reports client latency txns

* align script with main branch

* Add rocksdb direct_io knobs.

* disable AVX for 7.3.36 release

* enable AVX and update version for 7.3.37 release

* Added perpetualStorageWiggleSpeed check to pick perpetualStoreType (apple#11274)

* update version after 7.3.37 release

* Suppress ChosenMachine to fix simulation error (apple#11277)

* cherry pick pr 11268 - Fix detection of private mutations in version vector (apple#11279)

Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>

* [Release-7.3] Cherry-pick Enable Accumulative Checksum in MutationRef (apple#11225) (apple#11281)

* Enable Accumulative Checksum in MutationRef (apple#11225)

* code clean up and add accumulative checksum bits to mutation ref

* address comments and fix issues

* address comments

* propagate acs index from commit proxy to storage server

* address comments

* address comments

* address comments

* address comments

* remove unsafe code in SS

* make setAccumulativeChecksumIndex safer

* clean up code

* Add 7.1.32 - 7.1.37 release notes (apple#11286)

* add acs mutation support (apple#11289)

* cherry-pick-dcc-doc (apple#11290)

* add tenanting.

* fix tenantMapKey.

* update fdbkubernetesmonitor to match main, and  add fdb-kubernetes-monitor to the standard build flow

* Revert "Added perpetualStorageWiggleSpeed check to pick perpetualStoreType (#…" (apple#11310)

This reverts commit f5684e6.

* Rocksdb caching knob options. (apple#11312)

* Rocksdb metrics in status json (apple#11320)

* Don't remove team when total team count is within threshold (apple#11295)

* Sharded RocksDB knob changes. (apple#11291)

* async io

* fix dcc assert false

* update go bindings build to play nice with golang 1.22

* disable AVX for 7.3.38 release

* enable AVX and update version for 7.3.39 release

* Disable physical shard move test on 7.3 (apple#11338)

* update version after 7.3.39 release

* version upgrade

* disable AVX for 7.3.40 release

* enable AVX and update version for 7.3.41 release

* Increase CommitProxyTerminated severity for failed_to_progress errors. (apple#11315)

For better visibility.

* update version after 7.3.41 release

* Cherry pick 11128 to 7.3 ensure synthetic data is written to existing shards (apple#11324)

Co-authored-by: Dan Lambright <hlambright@apple.com>
Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>

* Replace the wrong usage of g_simulator in the snapshot code (apple#10984) (apple#11341)

Co-authored-by: Chaoguang Lin <chaoguang.lin@snowflake.com>

* Hex decode add and remove prefixes for fdbrestore.

* Review feedback.

* Fix tab.

* Fix file headers.

* Improve distributed consistency checker (apple#11346) (apple#11349)

* ConsistencyCheckerUrgent repeated run

* address comments

* avoid trace SevError for TesterRecruitmentTimeout unless it keeps failure for over 1 day

* address comments

* address comments

* Throw errors in getConsistentReadVersion

In the current code, errors are retried in getConsistentReadVersion, so it's
possible that the client has cancelled the GRV request, but readVersionBatcher
continues retrying. This can lead to many clients DDoSing the GRV proxies, especially
when the database has been unavailable for a while and clients are issuing
many GRV requests.
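A hedged C++ sketch of the change in shape; `getReadVersionOnce`, `clientGetReadVersion`, `fetch`, and `stillWaiting` are hypothetical stand-ins for the GRV request path, not FoundationDB APIs, and the real code is actor-based. The lower layer no longer retries on its own; it rethrows, and only a caller that is still waiting decides to try again.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical: `fetch` stands in for one GRV request to a proxy and throws
// std::runtime_error on failure. The lower layer no longer loops on errors;
// it simply lets them propagate.
long getReadVersionOnce(const std::function<long()>& fetch) {
    return fetch();
}

// Caller-side loop: retries only while the caller still wants the answer, and
// rethrows once it has given up, so abandoned requests stop hitting the proxies.
long clientGetReadVersion(const std::function<long()>& fetch,
                          const std::function<bool()>& stillWaiting) {
    while (true) {
        try {
            return getReadVersionOnce(fetch);
        } catch (const std::runtime_error&) {
            if (!stillWaiting()) {
                throw;
            }
        }
    }
}
```

Putting the retry decision with the caller means a cancelled client stops generating load immediately instead of leaving a batcher retrying on its behalf.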

* Adjusting block cache size knob. (apple#11357)

* Add documentation about experimental features

* Add logging to LogRouter's waitForVersion function (apple#11359)

* Add logging to waitForVersion

* Respond to review comments.

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>

* Make it possible to run the ycsb container standalone, without a k8s
context (useful for running ycsb against a bare-metal cluster).

* packaging/docker/Dockerfile
* packaging/docker/Dockerfile.eks
 Make the ycsb target inherit from foundationdb-base so we pick up
 libfdb_c.so. Add a version of run_ycsb.sh that doesn't presume
 k8s. Use 'entrypoint' rather than 'cmd' so it can be overridden on
 'docker run'.

* packaging/docker/run_ycsb_standalone.sh
 A version of run_ycsb.sh that doesn't presume k8s.

* fix ss queue rebalance (apple#11375) (apple#11378)

* [Release-7.3] Disable fdb_test.go:TestOpenNotExistTenant (apple#11381)

* Disable fdb_test.go:TestOpenNotExistTenant

This is not supported yet.

* fixup! Fix the comment

* Improvements (apple#11362)

* RocksDB memtable max range deletions knob update. (apple#11387)

* Disabling WRITE_CLIENT_LATENCY_TRACEEVENT knob (apple#11388)

* shard size log

* Adjust knob.

* Cherry pick PR 11385

* Fix an assertion failure when waiting for recovery

CC's checkBetterSingletons() calls getUsedIds(), which asserts that proxy interfaces
are present. However, when a GRV/commit proxy fails, the proxy's processId becomes
empty before CC starts a new recovery, thus triggering the failure.

The fix is to cancel the caller while waiting for recovery.

To reproduce with a clang build at 7.1 commit 725a08a:

./fdbserver.6.0.15 -r simulation -f ./tests/restarting/from_5.2.0_until_6.3.0/ClientTransactionProfilingCorrectness-1.txt -s 900000399 -b on
-f ./tests/restarting/from_5.2.0_until_6.3.0/ClientTransactionProfilingCorrectness-2.txt --restarting -s 900000400 -b on

* Respond to review comments

* Add 7.1.58 to 7.1.61 release notes

* Fix globalconfig refresh hang issue (apple#11400)

* Fix globalconfig refresh hang issue

CC sets a version to int_max in ClientDBInfo to indicate a refresh; however,
the proxy server rejects this version with a future_version error.

This change fixes the issue by not sending int_max, instead maintaining a
lastKnown version in memory and sending it to the GRV proxy to get the latest globalconfig.

This change also fixes some Java tests that were used to test the fix.
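A minimal C++ model of the two behaviors, under stated assumptions: `serveGlobalConfig`, `FutureVersionError`, and `GlobalConfigClient` are hypothetical, and the proxy is assumed to reject any requested version above its own committed version. A sentinel such as `INT64_MAX` is then always rejected, while a remembered `lastKnown` version stays servable as long as the proxy's version does not go backwards.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the proxy's future_version error.
struct FutureVersionError : std::runtime_error {
    FutureVersionError() : std::runtime_error("future_version") {}
};

// Assumed proxy-side rule: a request for a version newer than the proxy's own
// committed version is rejected. On success the proxy's version is returned.
std::int64_t serveGlobalConfig(std::int64_t requestedVersion, std::int64_t proxyCommittedVersion) {
    if (requestedVersion > proxyCommittedVersion) {
        throw FutureVersionError();
    }
    return proxyCommittedVersion;
}

// Client side after the fix: remember the last version actually observed and
// send that, instead of a sentinel such as INT64_MAX.
struct GlobalConfigClient {
    std::int64_t lastKnown = 0;

    std::int64_t refresh(std::int64_t proxyCommittedVersion) {
        lastKnown = serveGlobalConfig(lastKnown, proxyCommittedVersion);
        return lastKnown;
    }
};
```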

* Resolve comments

* Fix a segfault when tlog encounters platform_error

During destruction, the rejoinClusterController actor should be cancelled to avoid
accessing the TLogData object.

* disable AVX for 7.3.42 release

* enable AVX and update version for 7.3.43 release

* update version after 7.3.43 release

* RKUpdate metrics changes. (apple#11413)

* Added 7.3.38 to 7.3.43 release notes (apple#11417)

* fix API_VERSION. (apple#11301)

* Excluding some sharded rocksdb tests in simulation (apple#11436)

* Correct the path where the fdb-kubernetes-monitor copies the binary into when running in sidecar mode

* Revert variable renaming

* disable AVX for 7.3.44 release

* enable AVX and update version for 7.3.45 release

* update version after 7.3.45 release

* Add exponential backoff mechanism for restarting fdbserver processes in the monitor (apple#11453)

* disable AVX for 7.3.46 release

* enable AVX and update version for 7.3.47 release

* update version after 7.3.47 release

* [RELEASE-7.3] Back port fdbkubernetesmonitor feature (apple#11456)

* Remove some steps that are only needed once

* fdbkubernetesmonitor: Update the pod annotation if the cluster file changes

* Initial add of custom prometheus metrics

* Add support in the fdb-kubernetes-monitor to read node labels

* Restructure approach to be compatible with operator setup

* Correct name of the flag

* Add unit tests for PodClient

* only set up node informer if node watch feature is enabled

* Add additional tests for PodClient

* Update format

* Improve checkall tool (apple#11440) (apple#11476) (apple#11477)

* improve checkall

* fmt

* simplify

* nit

* simplify

* nit

* Add a mechanism to allow specifying the command line flags via env variables (apple#11462)

* add TLS ability to fdb kubernetes monitor

* review fixes

* move certloader to internal folder

* fdbkubernetesmonitor: Retry update pod annotations in case of an error (apple#11458)

* Add support for the isolate process group annotation to shutdown fdbserver processes (apple#11464)

* Add support for the isolate process group annotation to shutdown fdbserver processes

* fdb-kubernetes-monitor: Ensure that the annotations are updated when the correct configuration is already loaded

* [Release-7.3] Cherrypick Distributed Consistency Checker Updates (apple#11496)

* add consistency-check-urgent-mode to tester process class (apple#11484)

* update dcc doc (apple#11495)

* patch image for building after EOL

* Release notes for 7.3.20 thru 7.3.23 [release-7.3] (apple#11502)

* Add release notes for 7.3.20 thru 7.3.23

* Update documentation/sphinx/source/release-notes/release-notes-730.rst

Co-authored-by: Giuseppe <16498973+gm42@users.noreply.github.com>

* fixup!

* fixup!

* fixup!

* fixup!

* Update documentation/sphinx/source/release-notes/release-notes-730.rst

Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>

* Update documentation/sphinx/source/release-notes/release-notes-730.rst

Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>

* Update documentation/sphinx/source/release-notes/release-notes-730.rst

Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>

* fixup!

---------

Co-authored-by: Giuseppe <16498973+gm42@users.noreply.github.com>
Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>
Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>

* Add flag for custom jemalloc build

* fixup!

* Compute known committed version correctly when version vector unicast is enabled (apple#11511) (apple#11520)

* - Compute known committed version correctly when version vector unicast
is enabled.

* - Set ProxyCommitData::minKnownCommittedVersion only if the commit
version is above ProxyCommitData::committedVersion.

* Add client metric test(release-7.3) (apple#11513)

* [release-7.3] Wait for TSS during finishMoveShards. (apple#11485)

* Fix wait

* Fix wait

* Add PeerAddress to all PeerAddr/Peer TraceEvent [release-7.3] (apple#11521)

* Add PeerAddress to all PeerAddr/Peer TraceEvent

This is to address apple#4846

* fixup!

* Decorate TLS handshake errors with peerAddr (apple#10090)

* Use connection debug ID in N2_AcceptHandshakeError

* Decorate TLS handshake errors with peerIP

* only write one value to ostream

* Add PeerAddress to all PeerAddr/Peer TraceEvent

This is to address apple#4846

---------

Co-authored-by: Sam Gwydir <sam.gwydir@snowflake.com>

* Cherry pick Add dynamic knob to disable gray failure recoveries. (apple#11509) (apple#11526)

Co-authored-by: Dan Lambright <hlambright@apple.com>

* [RELEASE-7.3] Change the default base image to RockyLinux9 (apple#11549)

* Change the default base image to RockyLinux9

* Update the sidecar install commands

* Update the python version used for the sidecar

* Disable a few blob tests (apple#11548)

They can run a long time and cause ExternalTimeout errors.

* Add more trace events for exclude command

Use PRIORITY_SYSTEM_IMMEDIATE for excludeServersAndLocalities() call.

* Fix a cluster controller crash in status server

The logSystemConfig could change after the yield(), so the iterator can
point to invalid memory.

To reproduce with gcc build at commit 3e90644:
   -f tests/slow/DDBalanceAndRemoveStatus.toml -s 3134372275 -b off
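One common way to avoid holding an iterator across a suspension point is to snapshot the data first. The C++ sketch below is illustrative only: `Config`, `collectAfterYield`, and the `yieldPoint` callback are hypothetical, and the commit does not say which approach the actual fix used.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model: `config` is shared state that other tasks may replace
// while this task is suspended; `yieldPoint` stands in for a yield()/wait().
using Config = std::vector<std::string>;

// Unsafe shape (not shown as code): take config.begin(), call yieldPoint(),
// then keep using the iterator. If yieldPoint() reassigned or resized
// `config`, the iterator may point to freed memory.

// Safer shape: copy what is needed before the suspension point and iterate
// the copy afterwards.
std::vector<std::string> collectAfterYield(Config& config, const std::function<void()>& yieldPoint) {
    const Config snapshot = config; // taken before yielding
    yieldPoint();                   // may change `config` arbitrarily
    std::vector<std::string> processed;
    for (const auto& entry : snapshot) {
        processed.push_back(entry); // stands in for real per-entry work
    }
    return processed;
}
```

The copy costs an allocation, but it makes the code after the yield independent of whatever happened to the shared config in the meantime; re-reading the config after the yield is an alternative when the newest data is required.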

* disable AVX for 7.3.48 release

* enable AVX and update version for 7.3.49 release

* Add 7.3.44-7.3.49 release notes (apple#11583)

* cherry-pick rocksdb external timeout fix (apple#11585)

* cherrypick fix-storage-engine-selection (apple#11587)

* update version after 7.3.49 release

* Cherry-pick PR apple#11565 (Recovery version computation when version vector unicast is enabled) (apple#11590)

* Recovery version computation when version vector unicast is enabled (apple#11565)

* - Recovery version computation when version vector unicast is enabled

* - Address a review comment

* - Modify code to not use "max(DV)" as the starting recovery version

* - Remove references to "max(DV)"

* - Address a review comment

* - Address PR review comments

* - Address a review comment

* - Code formatting

* cluster level RV when tLogs advance at different rates (apple#11557) (apple#11599)

* simulate more than one tlog

* Draft use cluster RV for tlogs in version vector

* add TestTLogRecovery2

* Respond to review comments

* Add assert

* Send clusterRV to all locked tlogs

* Fix typo on rebase

* add memory managed IdToInterf structure

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>

* check existence of optional interface (apple#11593) (apple#11605)

Co-authored-by: Dan Lambright <hlambright@apple.com>

* Stash correct id for tlog identification in recovery (apple#11604) (apple#11608)

Co-authored-by: Dan Lambright <hlambright@apple.com>

* Fix overflow (apple#11610)

uint64_t should be used instead of int
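The commit does not say which value overflowed. As a generic illustration, summing two byte counts of 2,000,000,000 gives 4,000,000,000, which exceeds `INT_MAX` (2,147,483,647 on the usual platforms where `int` is 32 bits), whereas a `uint64_t` accumulator holds it. `totalBytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative only: accumulate large counters in a 64-bit unsigned type.
// With a 32-bit int accumulator the sum below would overflow, which is
// undefined behavior for signed integers.
std::uint64_t totalBytes(const std::vector<std::uint64_t>& sizes) {
    std::uint64_t total = 0;
    for (std::uint64_t s : sizes) {
        total += s;
    }
    return total;
}
```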

* Address an assertion failure (apple#11617) (apple#11620)

* refactor to use struct rather than tuple for "uncommitedVersions" (apple#11619) (apple#11626)

* refactor to use struct rather than tuple for uncommitedVersions

* Document UnknownCommittedVersions struct

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>

* Convert ASSERT to error logs. (apple#11622)

* - Address a bug related to computing the ids of log servers (apple#11623) (apple#11631)

* Retry with dryrun in the presence of s3 token error(release-7.3) (apple#11602)

* Retry with dryrun in the presence of s3 token error

The s3 token is read from local disk and might be expired or invalid.
Before this change, backup retried uploading data to s3 indefinitely,
which wastes network bandwidth.

Now, on an s3 token error, retry with a GET request that lists all buckets,
and only retry the upload once the token error disappears.

* Finish testing, set default to false

* Check bucket exist or not, rather than listBucket

* address comments
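A hedged C++ sketch of the retry policy described above: on a token error, stop resending the payload and poll with a cheap probe (the commit first used a list-buckets request, then switched to a bucket-exists check) until it succeeds, then retry the upload. `UploadResult`, `RetryStats`, `uploadWithDryRun`, and the `maxTries` bound are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of one upload attempt.
enum class UploadResult { Ok, TokenError, OtherError };

struct RetryStats {
    bool ok = false;
    int uploads = 0; // full upload attempts (expensive)
    int probes = 0;  // cheap dry-run requests
};

// On a token error, switch to cheap probes until one succeeds, then retry the
// upload. `maxTries` caps the total number of requests for this illustration.
RetryStats uploadWithDryRun(const std::function<UploadResult()>& upload,
                            const std::function<bool()>& probe,
                            int maxTries) {
    RetryStats st;
    while (st.uploads + st.probes < maxTries) {
        ++st.uploads;
        const UploadResult r = upload();
        if (r == UploadResult::Ok) {
            st.ok = true;
            return st;
        }
        if (r == UploadResult::TokenError) {
            while (st.uploads + st.probes < maxTries) {
                ++st.probes;
                if (probe()) {
                    break; // token works again: go back to uploading
                }
            }
        }
        // OtherError: loop around and retry the upload directly.
    }
    return st;
}
```

While the token is bad, only small probe requests go out, so the expensive payload is resent once per recovery of the token rather than on every attempt.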

* [Release-7.3] Fix finishMoveShards partially update location metadata (apple#11640)

* fix complete flag in finishMoveShards

* remove complete flag in finishMoveShards

* remove complete flag in cleanUpDataMoveCore

* Disable physical shard move in 7.3 (apple#11641)

* Enabling rocksdb direct_io and wiggle knobs (apple#11636)

* [Release-7.3] Fix invalid shard id generated by seed shard servers (apple#11647)

* add dataMoveType check when decoding serverKeys metadata

* fix invalid dataMoveId generator in seedShardServers

* fix simulation tests

* add split point and add exclude sharded rocksdb

* add split point in restarting tests at release-7.3.51

* remove unnecessary exclusion of redwood in 7.3 restarting tests

* Reduce retry count when SS interface is not found

To speed up the ConsistencyCheckUrgent workload, which can take a very long time in
simulation.

* change split point to 7.3.50 (apple#11658)

* Drop upgrade tests before 6.3

This is not supported or recommended.

* Remove storage engine type from DataLossRecovery test. (apple#11656)

Co-authored-by: He Liu <heliu@Hes-MacBook-Pro-2.local>

* jzhou77

Co-authored-by: Zhe Wang <zhe.wang@wustl.edu>

* jzhou77

* [Release-7.3] TeamRedundant and TeamUnhealthy data moves choose best destination with probability (apple#11668)

* team redundant and unhealthy data moves can choose best dest with probability

* nits

* nits

* enable wantTrueBestIfMoveout

* fix getteam stuck

* [Release-7.3] Delay team remover when space pivot is low (apple#11665)

* add-usable-region-check-per-shard-for-encode-shard-location-metadata (apple#11671)

* reduce workload of MutationLogReaderCorrectness (apple#11679)

* jzhou77

* [Release-7.3] Validate ServerTeam count per server in simulation (apple#11678)

* validate server team count in simulation

* change naming (not relevant to the PR title)

* address comments and add a new trace event BuildTeamsLastBuildTeamsFailed triggered when buildTeam failed

* Increasing minimum age to wiggle to avoid re-wiggling migrated rocksdb storage servers (apple#11684)

* disable AVX for 7.3.50 release

* enable AVX and update version for 7.3.51 release

* update version after 7.3.51 release

* Add 7.3.50, 7.3.51 release notes (apple#11691)

* - Address a serialization issue that was causing a log server crash during a patch upgrade (apple#11696) (apple#11697)

* [gray-failure] Remove CC_PAUSE_HEALTH_MONITOR (apple#11701)

* - Address another log server interface incompatibility issue (between patch releases 7.3.43 and 7.3.53) (apple#11705)

* Adjust sharded rocksdb knobs (apple#11706)

* Use a single iterator pool for all physical shards (apple#11694)

* disable AVX for 7.3.52 release

* enable AVX and update version for 7.3.53 release

* update version after 7.3.53 release

* Add 7.3.52 and 7.3.53 release notes (apple#11711)

Add 7.3.52 and 7.3.53 release notes

Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>

* Ignore data move conflict on TSS in simulation. (apple#11715)

Co-authored-by: He Liu <heliu@Hes-MacBook-Pro-2.local>

* [release-7.3] Log all incoming connections (apple#11713)

* Log all incoming connections

* Address review comments

* Update FlowTransport.actor.cpp

* Update FlowTransport.actor.cpp

* Refactor

* Format

* initialize for simulation

* - Remove an assertion (that validates prevVersion in the proxy to
log server commit message) that is causing version incompatibility
between 7.3.43 and 7.3.53 patch releases. The assertion can fail,
even on a valid commit message, when the proxy and the log server
are running on different patch release versions.

* Urgent consistency checker fixes (cherrypick 7.3) (apple#11736)

* [fdbserver] Drop duplicate or conflicted requests from urgent consistency checker clients

* Fix edge case in urgent consistency check causing infinite loop

* fixup! Fix edge case in urgent consistency check causing infinite loop

self review

* disable AVX for 7.3.54 release

* enable AVX and update version for 7.3.55 release

* update version after 7.3.55 release

* Add 7.3.54 and 7.3.55 release notes (apple#11740) (apple#11744)

* - Add 7.3.54 and 7.3.55 release notes

* - Address a review comment

* [release-7.3] Enable backward read in consistency checker.

* Do backward reads in consistency checker.

* Add knob for read options in consistency checker.

* Max range deletions knob update to prevent OOMs. (apple#11739)

* [RELEASE-7.3] Add debug id trace event 7.3

* Fix backup dryrun bug

* Fix backup dryrun bug

Currently there is an out-of-scope issue; this change also adds
a knob to control whether to allow dryrun of backup.

* fix another bug that misses a wait statement

---------

Co-authored-by: Hao <fdbflowguru@gmail.com>

* Add knobs and metrics for bloom filter. (apple#11785)

* [Release-7.3] TLS should accept same key with different values (apple#11763)

* fix tls

* address comment

* Histogram sample rate updated to 1. (apple#11794)

* Manual flush if the rocksdb flush does not happen within a time interval. (apple#11792)

* disable AVX for 7.3.56 release

* enable AVX and update version for 7.3.57 release

* update version after 7.3.57 release

* Add 7.3.56, 7.3.57 release notes(release-7.3) (apple#11796)

* Add 7.3.56, 7.3.57 release notes

* Address comments

---------

Co-authored-by: Hao <fdbflowguru@gmail.com>

* Fix RPM build error caused by a path length restriction

```
/usr/local/share/cmake-3.26/Modules/Internal/CPack/CPackRPM.cmake(666):  message(FATAL_ERROR CPackRPM: source dir path '/codebuild/output/src1356049000/src/foundationdb' is  shorter than debuginfo sources dir path '/usr/src/debug/foundationdb-7.3.56-Linux/clients-el7/src_0'!  Source dir path must be longer than debuginfo sources dir path.  Set CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX variable to a shorter value  or make source dir path longer.  Required for debuginfo packaging. See documentation of  CPACK_RPM_DEBUGINFO_PACKAGE variable for details. )
CMake Error at /usr/local/share/cmake-3.26/Modules/Internal/CPack/CPackRPM.cmake:666 (message):
  CPackRPM: source dir path
  '/codebuild/output/src1356049000/src/foundationdb' is shorter than
  debuginfo sources dir path
  '/usr/src/debug/foundationdb-7.3.56-Linux/clients-el7/src_0'! Source dir
  path must be longer than debuginfo sources dir path.  Set
  CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX variable to a shorter value or make
  source dir path longer.  Required for debuginfo packaging.  See
  documentation of CPACK_RPM_DEBUGINFO_PACKAGE variable for details.
Call Stack (most recent call first):
  /usr/local/share/cmake-3.26/Modules/Internal/CPack/CPackRPM.cmake:1466 (cpack_rpm_debugsymbol_check)
  /usr/local/share/cmake-3.26/Modules/Internal/CPack/CPackRPM.cmake:1968 (cpack_rpm_generate_package)
CPack Error: Error while execution CPackRPM.cmake
```

* Fix ZLIB not found error

```
CMake Error at /usr/local/share/cmake-3.31/Modules/FindOpenSSL.cmake:186 (set_property):
  The link interface of target "OpenSSL::Crypto" contains:

    ZLIB::ZLIB

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /usr/local/share/cmake-3.31/Modules/FindOpenSSL.cmake:733 (_OpenSSL_target_add_dependencies)
  cmake/FDBComponents.cmake:35 (find_package)
  CMakeLists.txt:72 (include)
```

* Only delay the restart of fdbserver if the process exited with an exit code other than 0 (apple#11812)

* [release-7.3] Pause perpetual storage wiggle when TSS count target is met. (apple#11824)

* TSS pause

* Add condition

* [RELEASE-7.3] Refactor locality-based exclusion checks to reduce additional overhead (apple#11848)

* Refactor locality-based exclusion checks to reduce additional overhead

* Update exclusion logic to prevent copies

* Backport "Invalidate gray failure complaints from excluded processes" to 7.3 (apple#11749) (apple#11850)

* Invalidate gray failure complaints from excluded processes (apple#11749)

* Remove INTRA_DC_LATENCY knob logic in CC test

* Lower bound version of CC_DEGRADED_PEER_DEGREE_TO_EXCLUDE (apple#11840) (apple#11852)

* Fix boost URL (apple#11863)

* Extend gray failure recentHealthTriggeredRecoveryTime state to reflect any recovery, including non-gray failure triggered ones

* Update knob documentation

* Add log

* Add knob for direct IO

* Explicitly type 1e3 as int (apple#11894)

Cherry-pick apple#11695

* Add custom compaction policy based on number of range deletions in file

* compaction policy

* fix build error

* RocksDB manual flush code changes

* disable AVX for 7.3.58 release

* enable AVX and update version for 7.3.59 release

* update version after 7.3.59 release

* Improve AuditLocationMetadataPostCheck coverage (apple#11888) (apple#11897)

* improve-auditLocationMetadataPostCheck-coverage

* address comments

* nit

* Release notes for 7.3.58 and 7.3.59

* Address feedback

* [release-7.3] Fix cycle test valgrind issue (apple#11907)

* Pause storage wiggle if all SSes do not have the minimum available space. (apple#11911)

* Fix startMoveShards() caused corruption (apple#11934)

At commit fff5439, built with clang, the run `-f ./tests/slow/SharedDefaultBackupCorrectness.toml -s 2189316179 -b on`
exposed a corruption in which the destination storage server could receive incorrect
serverKeys mutations. This happens only when shard_encode_location_metadata is enabled.

The cause: one of the actors from the previous iteration hit a transaction_too_old
error, and the transaction was restarted. Because those actors were not cancelled,
they could still modify the retried transaction.

* [release-7.3] Update shared rocksdb knobs. apple#11936

* Migration to consider wiggling based on perpetualStorageEngine and then on configureStorageEngine. apple#11940

* Add 7.3.60, 7.3.61 release notes (apple#11942)

Documentation changes only.

* disable AVX for 7.3.60 release

* enable AVX and update version for 7.3.61 release

* update version after 7.3.61 release

* revert backup change since 7.3.49 (apple#11965)

* disable AVX for 7.3.62 release

* enable AVX and update version for 7.3.63 release

---------

Co-authored-by: Dan Lambright <dlambrig@gmail.com>
Co-authored-by: Dan Lambright <hlambright@apple.com>
Co-authored-by: hao fu <hfu5@apple.com>
Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com>
Co-authored-by: Zhe Wang <zhe.wang@wustl.edu>
Co-authored-by: Yao Xiao <87789492+yao-xiao-github@users.noreply.github.com>
Co-authored-by: Yao Xiao <yxiao6@apple.com>
Co-authored-by: neethuhaneesha <nbingi@apple.com>
Co-authored-by: FoundationDB CI <foundationdb_ci@apple.com>
Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>
Co-authored-by: Xiaoge Su <magichp@gmail.com>
Co-authored-by: Aaron Molitor <amolitor@apple.com>
Co-authored-by: Hao Fu <77984096+hfu94@users.noreply.github.com>
Co-authored-by: Matthew Newhook <matthew@customer.io>
Co-authored-by: Matthew Newhook <matthew.newhook@gmail.com>
Co-authored-by: Chaoguang Lin <chaoguang.lin@snowflake.com>
Co-authored-by: Pierre Zemb <contact@pierrezemb.fr>
Co-authored-by: stack <stack@apache.org>
Co-authored-by: Johannes M. Scheuermann <jscheuermann@apple.com>
Co-authored-by: Johannes Scheuermann <johscheuer@users.noreply.github.com>
Co-authored-by: Nicole Morales <nicole_morales@apple.com>
Co-authored-by: Giuseppe <16498973+gm42@users.noreply.github.com>
Co-authored-by: Sreenath Bodagala <82616783+sbodagala@users.noreply.github.com>
Co-authored-by: Sam Gwydir <sam.gwydir@snowflake.com>
Co-authored-by: He Liu <86634338+liquid-helium@users.noreply.github.com>
Co-authored-by: He Liu <heliu@Hes-MacBook-Pro-2.local>
Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>
Co-authored-by: Vishesh Yadav <vishesh3y@gmail.com>
Co-authored-by: Sreenath Bodagala <sbodagala@apple.com>
Co-authored-by: flowguru <77984096+flowguru@users.noreply.github.com>
Co-authored-by: Hao <fdbflowguru@gmail.com>