Potential macOS CI setup issue #1273

Open
keith opened this issue Oct 29, 2021 · 20 comments

@keith
Member

keith commented Oct 29, 2021

rules_apple tests started failing, without any changes on the rules_apple side, between

this successful build https://buildkite.com/bazel/rules-apple-darwin/builds/4659 on 10/20
and this failing build https://buildkite.com/bazel/rules-apple-darwin/builds/4664 on 10/22

This type of test failure:

		The test runner encountered an error (Failed to establish communication with the test runner. If you believe this error represents a bug, please attach the result bundle at /Users/buildkite/Library/Developer/Xcode/DerivedData/temporary-hgmhctgbjdblljdswqfgutoosvlw/Logs/Test/Test-Transient Testing-2021.10.29_17-57-24-+0000.xcresult. (Underlying Error: Couldn’t communicate with a helper application. Try your operation again. If that fails, quit and relaunch the application and try again. The connection to service on pid 0 named com.apple.testmanagerd.control was invalidated.))

This type of failure is often related to the machine setup, but it's hard to say specifically what the issue would be here if xcodebuild -runFirstLaunch was run after any changes.

@thii
Member

thii commented Nov 16, 2021

@philwo Can you take a look?

@philwo
Member

philwo commented Nov 24, 2021

Hi @keith!

I checked the machines again and found a small issue: three of them were missing the $HOME/Library/Developer directory, which has caused problems in the past. I fixed that, but the test still fails with the same error.
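For anyone checking their own machines for the missing directory, here is a minimal sketch. The function name ensure_developer_dir is hypothetical and is not part of the Bazel CI setup scripts; it only creates the directory if it is absent and does not touch anything Xcode-specific.

```shell
#!/bin/sh
# Hypothetical helper (not from the Bazel CI scripts): make sure that
# <home>/Library/Developer exists for the given home directory.
ensure_developer_dir() {
    home_dir="$1"
    dev_dir="$home_dir/Library/Developer"
    if [ -d "$dev_dir" ]; then
        echo "ok: $dev_dir already exists"
    else
        # mkdir -p also creates the intermediate Library directory if needed.
        mkdir -p "$dev_dir" && echo "created: $dev_dir"
    fi
}
```

It could be called as ensure_developer_dir "$HOME" from a machine setup script, run as the user whose home directory is being checked.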

The machines were upgraded to macOS Big Sur and Xcode 13.0 during that window, and then I regenerated the buildkite user's home directory from scratch. It's hard to say what that might have caused :/

Do you have any idea how to debug this? Does the test work fine on a similarly set up developer machine and only fail on Bazel CI?

@thii
Member

thii commented Dec 10, 2021

@philwo The test works locally for me with the same Xcode version.

The error looks pretty similar to this one: https://stackoverflow.com/a/67699037/2780476.

You can reproduce it by running ButtonsMacLogicTests, which was disabled in rules_apple: https://github.com/bazelbuild/rules_apple/pull/1264/files

@keith
Member Author

keith commented Dec 10, 2021

If it is that, is this setup potentially not auto-logging in the user? We do that on our CI and then immediately lock the screen.

@thii
Member

thii commented Mar 3, 2022

@meteorcloudy Pinging you since it looks like you are the owner here now. This issue has been blocking us from using Bazel CI for Tulsi for a while now.

This is an infra issue that started happening after an Xcode upgrade, so it would be nice if it could be revisited during the next Xcode upgrade.

@meteorcloudy
Member

OK, I'll look into this. Can you rebase bazelbuild/rules_apple#1264 and check if the error persists? I reran the CI presubmit job, and it's now failing with a different error.

@meteorcloudy meteorcloudy self-assigned this Mar 3, 2022
@thii
Member

thii commented Mar 3, 2022

Rebased!

@meteorcloudy
Member

OK, looks like it's still an issue.

If it is that, is this setup potentially not auto-logging in the user? We do that on our CI and then immediately lock the screen.

@keith How did you do that on your CI? Maybe we can check if we can/should do the same on Bazel CI.
/cc @fweikert

@philwo
Member

philwo commented Mar 3, 2022

We are auto-logging in a user after boot (it wouldn't hurt to verify that this still works, of course), but the user who is logged in (ci) is not the same user that the Buildkite jobs run as (buildkite). Maybe that's the issue?
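A minimal sketch of how that mismatch could be checked on a worker, assuming a macOS machine where the owner of /dev/console is the user holding the GUI session. The function names console_user and check_gui_session_user are hypothetical, and nothing in this thread confirms that a mismatch is actually the cause of the test failure.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the GUI session user with the job user.

# Prints the owner of /dev/console. This uses the BSD stat format flag,
# so it is only expected to work on macOS.
console_user() {
    stat -f '%Su' /dev/console
}

# Compares two user names and reports whether they match.
# Returns 0 on a match and 1 on a mismatch.
check_gui_session_user() {
    console="$1"
    job="$2"
    if [ "$console" = "$job" ]; then
        echo "match: $console owns the console session"
        return 0
    fi
    echo "mismatch: console user is $console, job user is $job"
    return 1
}
```

On a worker this might be run as check_gui_session_user "$(console_user)" "$(id -un)" at the start of a job. With the users described above (ci and buildkite), it would report a mismatch.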

@keith
Member Author

keith commented Mar 4, 2022

worth a try

@meteorcloudy
Member

@philwo Are you going to try it? Or can you point me to the script that does the auto-login?

@philwo
Member

philwo commented Mar 4, 2022

I don't think there's anything we can try here. It's not possible to log in as the buildkite user, because the entire user is wiped and all of its processes are killed after each job. :/

The auto-login is just a system setting that can be modified via System Preferences in "Users & Groups" and then "Login Options".

You can also try to do it on the CLI via this third-party tool (https://github.com/xfreebird/kcpassword), but it hasn't been updated in 7 years, so I'm not sure if it still works:

sudo -H -u ci brew install xfreebird/utils/kcpassword
enable_autologin "$USERNAME" "$PASSWORD"
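To verify that auto-login is still configured without relying on the third-party tool, something like the sketch below might work. It assumes the setting is stored under the autoLoginUser key of /Library/Preferences/com.apple.loginwindow, which is where macOS normally keeps it. The function name verify_autologin and the DEFAULTS_CMD override are hypothetical, added only so the check can be exercised without a real macOS machine.

```shell
#!/bin/sh
# Hypothetical check: is auto-login configured for the expected user?
# DEFAULTS_CMD may be set to replace the macOS `defaults` tool (for testing).
# Returns 0 on a match, 1 on a different user, 2 if nothing is configured
# (or if the `defaults` tool is unavailable).
verify_autologin() {
    expected="$1"
    if ! actual="$("${DEFAULTS_CMD:-defaults}" read \
            /Library/Preferences/com.apple.loginwindow autoLoginUser 2>/dev/null)"; then
        echo "auto-login is not configured"
        return 2
    fi
    if [ "$actual" = "$expected" ]; then
        echo "auto-login user is $actual"
        return 0
    fi
    echo "auto-login user is $actual, expected $expected"
    return 1
}
```

On the Bazel CI machines described above, the call would presumably be verify_autologin ci. This only reads the setting; it does not prove that the login actually succeeded after boot.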

@meteorcloudy
Member

Oh, I see. We are running the CI jobs as a one-time disposable user, so it must be different from the user for auto-logging in.

@meteorcloudy
Member

I guess there is little we can do except upgrade Xcode and hope this issue goes away.

@keith
Member Author

keith commented Mar 23, 2022

I rebased the linked PR and do still see the issue. I also found this related post https://stackoverflow.com/questions/67688130/run-macos-test-cases-on-the-jenkins-pipeline but I doubt the buildkite setup has this same issue. It does imply logins might be the issue.

If you're hoping Xcode updates fix things, 13.3 is available now. If you could update the machines, that would be great in general.

@meteorcloudy
Member

If you're hoping Xcode updates fix things, 13.3 is available now. If you could update the machines, that would be great in general.

We need some time to prepare the upgrade, as Xcode 13.3 removed python2 support and some things are going to break.

@keith
Member Author

keith commented Mar 24, 2022

If the machines are already on macOS 12.x, you don't have to update to 12.3 (and drop python2) for Xcode 13.3. But if they are still on 11.x, you might have to go straight to the newest release (unless you're using MDM and can block that point release).

@meteorcloudy
Member

Thanks for the advice. I just checked: we are still on macOS 11.6.1, so it could be a bit tricky.

@keith
Member Author

keith commented May 22, 2022

@meteorcloudy any plans to address this soon?

@meteorcloudy
Member

Sorry, this is still on our list, but we currently don't have capacity to look into it.
