Forward-port to 7.4: Add DD init and team collection logging for diagnosing slow startups #13002

Merged
saintstack merged 5 commits into apple:release-7.4 from saintstack:7.4_DD_init_logging
Apr 23, 2026

@saintstack
Contributor

When SHARD_ENCODE_LOCATION_METADATA=true, we take new code paths that are
often opaque. Add logging to make them visible.

For example, DD init hung for 14-16 minutes with zero visibility into
what was stuck. The only clue was a gap between DDInitUpdatedReplicaKeys
and DDInitGotInitialDD trace events. Diagnosing the root cause required
extensive log splunking of SS metrics to determine that a single
getRange(dataMoveKeys) read was queued on an overloaded storage server.

DDTxnProcessor.actor.cpp:

  • Log elapsed time for the server list + data move read transaction
    (DDInitServerListAndDataMoveReadComplete) with NumDataMoves, NumServers
  • Log elapsed time for the keyServer scan (DDInitKeyServerScanComplete)
    with NumShards
  • Warn when getRange(dataMoveKeys) takes >5 seconds
    (DDInitSlowDataMoveRead)
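
A minimal sketch of the timing pattern behind the bullets above, written as plain C++ rather than Flow actor code. `TraceRecord`, `timedDataMoveRead`, the injected clock, and the `Elapsed` detail name are assumptions made for illustration; only the event names, `NumDataMoves`, and the 5-second threshold come from this description. In the actual change the completion event covers the whole server list + data move read transaction, while the warning covers only getRange(dataMoveKeys); the sketch folds both onto a single read for brevity.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a trace event; this is NOT FoundationDB's TraceEvent API.
struct TraceRecord {
    std::string severity; // "Info" or "Warn"
    std::string type;     // event name
    std::map<std::string, std::string> details;
};

// Threshold taken from the PR description (warn when the read takes >5 seconds).
constexpr double kSlowDataMoveReadSeconds = 5.0;

// Times `readDataMoves` with `clock` (seconds) and appends the resulting
// events to `out`. Returns the number of data moves the read produced.
inline int timedDataMoveRead(const std::function<double()>& clock,
                             const std::function<int()>& readDataMoves,
                             std::vector<TraceRecord>& out) {
    const double start = clock();
    const int numDataMoves = readDataMoves();
    const double elapsed = clock() - start;

    if (elapsed > kSlowDataMoveReadSeconds) {
        // "Elapsed" is an assumed detail name, not confirmed by the PR text.
        out.push_back({ "Warn", "DDInitSlowDataMoveRead",
                        { { "Elapsed", std::to_string(elapsed) } } });
    }
    // The completion event is always emitted, slow or not.
    out.push_back({ "Info", "DDInitServerListAndDataMoveReadComplete",
                    { { "Elapsed", std::to_string(elapsed) },
                      { "NumDataMoves", std::to_string(numDataMoves) } } });
    return numDataMoves;
}
```

Passing the clock in as a callable is a choice made for this sketch only: it lets a fake time source drive the slow and fast paths without sleeping.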

DataDistribution.actor.cpp:

  • Add NumShards and NumServers to DDInitGotInitialDD
  • Promote DDInitFoundDataMove from SevDebug to SevInfo so individual
    data moves are visible in production logs
  • Add DDInitResumedDataMoves summary event with ValidMoves,
    CancelledMoves, EmptyMoves counts and elapsed time
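
The DDInitResumedDataMoves summary can be pictured as a single pass that buckets each data move once. The sketch below is an illustration under assumptions: `DataMoveSketch`, `ResumeSummary`, `summarizeResumedDataMoves`, and the precedence between buckets are hypothetical and not taken from DataDistribution.actor.cpp; only the three counter names come from this description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of one persisted data move.
struct DataMoveSketch {
    bool cancelled;        // the move was marked for cancellation
    std::size_t numRanges; // number of key ranges the move covers
};

// Counters mirroring the ValidMoves / CancelledMoves / EmptyMoves details.
struct ResumeSummary {
    int validMoves = 0;
    int cancelledMoves = 0;
    int emptyMoves = 0;
};

// Buckets each move exactly once. The precedence (empty first, then
// cancelled, otherwise valid) is an assumption made for this sketch.
inline ResumeSummary summarizeResumedDataMoves(const std::vector<DataMoveSketch>& moves) {
    ResumeSummary summary;
    for (const DataMoveSketch& move : moves) {
        if (move.numRanges == 0) {
            ++summary.emptyMoves;
        } else if (move.cancelled) {
            ++summary.cancelledMoves;
        } else {
            ++summary.validMoves;
        }
    }
    return summary;
}
```

Because every move lands in exactly one bucket, the three counts sum to the number of moves scanned, which makes the summary event easy to cross-check against the per-move DDInitFoundDataMove lines.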

DDTeamCollection.actor.cpp:

  • Add Reason and Address details to UndesiredStorageServer trace events
    to distinguish version lag, same-address, wrong-class, and exclusion
    causes without needing to correlate with other log lines
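
To show why a Reason detail removes the need to correlate log lines, here is a small sketch that maps the four causes named above to a single label carried alongside the address. The enum, `ServerStatusSketch`, the function names, the check order, and the output format are all hypothetical; the real logic lives in DDTeamCollection.actor.cpp and may differ.

```cpp
#include <string>

// Hypothetical reason codes paraphrasing the four causes listed above.
enum class UndesiredReason { None, VersionLag, SameAddress, WrongClass, Excluded };

// Hypothetical snapshot of the checks made for one storage server.
struct ServerStatusSketch {
    std::string address;
    bool versionTooFarBehind = false;
    bool sharesAddressWithOtherServer = false;
    bool wrongProcessClass = false;
    bool excluded = false;
};

// Returns the first matching cause; the check order is an assumption.
inline UndesiredReason undesiredReason(const ServerStatusSketch& s) {
    if (s.versionTooFarBehind) return UndesiredReason::VersionLag;
    if (s.sharesAddressWithOtherServer) return UndesiredReason::SameAddress;
    if (s.wrongProcessClass) return UndesiredReason::WrongClass;
    if (s.excluded) return UndesiredReason::Excluded;
    return UndesiredReason::None;
}

inline std::string reasonToString(UndesiredReason r) {
    switch (r) {
    case UndesiredReason::VersionLag: return "VersionLag";
    case UndesiredReason::SameAddress: return "SameAddress";
    case UndesiredReason::WrongClass: return "WrongClass";
    case UndesiredReason::Excluded: return "Excluded";
    case UndesiredReason::None: return "None";
    }
    return "Unknown"; // unreachable for valid enum values
}

// Formats the two details a reader would look for on the event.
inline std::string describeUndesired(const ServerStatusSketch& s) {
    return "Reason=" + reasonToString(undesiredReason(s)) + " Address=" + s.address;
}
```

With both details on the same event, a single search for UndesiredStorageServer is enough to group servers by cause.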

Forward-ported from 7.3

@saintstack saintstack requested a review from spraza as a code owner April 15, 2026 23:04
@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 4834659
  • Duration 0:38:23
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 4834659
  • Duration 0:46:09
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 4834659
  • Duration 0:47:10
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 4834659
  • Duration 0:51:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 4834659
  • Duration 0:57:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 4834659
  • Duration 1:09:29
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack saintstack closed this Apr 16, 2026
@saintstack saintstack reopened this Apr 16, 2026
@gxglass gxglass self-requested a review April 16, 2026 00:42
gxglass
gxglass previously approved these changes Apr 16, 2026
@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 4834659
  • Duration 0:39:04
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 4834659
  • Duration 0:48:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 4834659
  • Duration 0:48:21
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 4834659
  • Duration 0:55:33
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 4834659
  • Duration 0:59:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 4834659
  • Duration 2:28:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@saintstack
Contributor Author

20260417-234837-stack_forward_port-6f0304ae6ff25c4b compressed=True data_size=41357776 duration=4976249 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:58 sanity=False started=100000 stopped=20260418-004835 submitted=20260417-234837 timeout=5400 username=stack_forward_port

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 0f321c0
  • Duration 0:46:44
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 0f321c0
  • Duration 0:49:52
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 0f321c0
  • Duration 1:11:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 0f321c0
  • Duration 2:18:58
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

spraza
spraza previously approved these changes Apr 20, 2026
@spraza
Collaborator

spraza commented Apr 20, 2026

@saintstack assuming you ran 100K against 7.4 branch with this change?

@saintstack
Contributor Author

@spraza Yes. The following run was this PR, against the 7.4 branch: 20260417-234837-stack_forward_port-6f0304ae6ff25c4b compressed=True data_size=41357776 duration=4976249 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:58 sanity=False started=100000 stopped=20260418-004835 submitted=20260417-234837 timeout=5400 username=stack_forward_port

gxglass
gxglass previously approved these changes Apr 21, 2026
Optional<Reference<TransactionState>> trState) {
state Span span("NAPI:WaitStorageMetrics"_loc, generateSpanID(cx->transactionTracingSample));
state double startTime = now();
state double lastLogTime = 0;
Collaborator


Does this make any sense to you?

Image

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 0846c4f
  • Duration 0:41:33
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 0846c4f
  • Duration 0:44:57
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 0846c4f
  • Duration 0:50:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 04b64bc
  • Duration 0:46:52
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 1:02:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 2:24:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@saintstack
Contributor Author

The following tests FAILED:
70 - java-integration-external-client (Failed)
Errors while running CTest

@saintstack saintstack closed this Apr 21, 2026
@saintstack saintstack reopened this Apr 21, 2026
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 04b64bc
  • Duration 0:47:34
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 0:49:53
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 1:01:37
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 2:41:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@saintstack
Contributor Author

Reopening. I don't see a macOS build running.

@saintstack saintstack closed this Apr 22, 2026
@saintstack saintstack reopened this Apr 22, 2026
@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 04b64bc
  • Duration 0:41:31
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 04b64bc
  • Duration 0:47:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 2:13:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 2:24:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 04b64bc
  • Duration 2:42:33
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 04b64bc
  • Duration 0:55:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack saintstack requested a review from spraza April 22, 2026 22:11
@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 669e999
  • Duration 0:39:32
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 669e999
  • Duration 0:47:48
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 669e999
  • Duration 0:55:15
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 669e999
  • Duration 1:01:22
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 669e999
  • Duration 1:54:21
  • Result: ❌ FAILED
  • Error: Error while executing command: TEST_USERNAME=fdb-pr-${CODEBUILD_BUILD_NUMBER} make -C e2e foundationdb-pr-tests. Reason: exit status 2
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@saintstack
Contributor Author

Says pending on the macOS build. I ran it manually:


...
SSH_CLIENT=100.80.133.23 37978 22
--
SSH_CONNECTION=100.80.133.23 37978 100.80.23.254 22
TERM=xterm-256color
.....
[Container] 2026/04/22 06:16:01.103666 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2026/04/22 06:16:01.103687 Phase context status code:  Message:
[Container] 2026/04/22 06:16:01.192815 Expanding base directory path: .
[Container] 2026/04/22 06:16:01.196679 Assembling file list
[Container] 2026/04/22 06:16:01.196692 Expanding .
[Container] 2026/04/22 06:16:36.194770 Set report auto-discover timeout to 5 seconds
[Container] 2026/04/22 06:16:36.195965 Expanding base directory path:  .
[Container] 2026/04/22 06:16:36.201953 Assembling file list
[Container] 2026/04/22 06:16:36.201967 Expanding .
[Container] 2026/04/22 06:16:36.205183 Expanding file paths for base directory .
[Container] 2026/04/22 06:16:36.205195 Assembling file list
[Container] 2026/04/22 06:16:36.205198 Expanding **/*
[Container] 2026/04/22 06:16:37.152008 Found 551 file(s)
[Container] 2026/04/22 06:16:37.152182 Report auto-discover file discovery took 0.957412 seconds
[Container] 2026/04/22 06:16:37.156270 Phase complete: UPLOAD_ARTIFACTS State: SUCCEEDED
[Container] 2026/04/22 06:16:37.156288 Phase context status code:  Message:

Now I'll manually mark the check as successful. In the meantime, I'll try to figure out what's up with the macOS build on release-7.4.

@saintstack saintstack merged commit 02be074 into apple:release-7.4 Apr 23, 2026
5 of 6 checks passed
4 participants