[Release-7.1] Fix tester stale interface issue at CC #11149

Merged
merged 1 commit into from Jan 30, 2024

kakaiu
Member

@kakaiu kakaiu commented Jan 26, 2024

workerAvailabilityWatch is an actor spawned when a new worker registers with the cluster controller (CC). However, workerAvailabilityWatch ignores testers (ProcessClass::TesterClass). As a result, a tester is registered with the CC, but no workerAvailabilityWatch monitors its availability. When the tester is physically removed, the CC is not notified and keeps the stale worker interface. In clusterGetStatus(), the CC still tries to contact the removed tester and fails, so the stale worker interface is added to mergeUnreachable. Finally, an "unreachable_processes" message is produced in the status JSON.

500K Correctness test:
20240126-235133-zhewang-f5c68b34cb706909 compressed=True data_size=24309527 duration=21796697 ended=500000 fail=1 fail_fast=10 max_runs=500000 pass=499999 priority=100 remaining=0 runtime=2:05:59 sanity=False started=500000 stopped=20240127-015732 submitted=20240126-235133 timeout=5400 username=zhewang

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@kakaiu kakaiu changed the title Fix tester stale interface issue at CC [Release-7.1] Fix tester stale interface issue at CC Jan 26, 2024
@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: d2e26d9
  • Duration 0:08:00
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: d2e26d9
  • Duration 0:08:10
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Contributor

@liquid-helium liquid-helium left a comment

Nice catch

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: d2e26d9
  • Duration 0:15:58
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: d2e26d9
  • Duration 0:19:13
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: d2e26d9
  • Duration 0:40:35
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jzhou77 jzhou77 merged commit 05ab6da into apple:release-7.1 Jan 30, 2024
1 of 5 checks passed
Contributor

@johscheuer johscheuer left a comment

👍 Can we make sure those changes are included in 7.3 + main?


5 participants