redisTestHook,memcachedTestHook: init; prevent hanging Darwin build after test failures #357879

Merged
mweinelt merged 11 commits into NixOS:staging from ofalvai:darwin-redis-test-timeout on Mar 19, 2025

ofalvai (Contributor) commented Nov 21, 2024

Looking at Hydra's "timed out jobs" list, I noticed many Darwin timeouts where the last log lines were all about Redis. It turns out the background Redis job is not terminated correctly on macOS when checkPhase fails (and postCheck therefore never runs).

I reviewed all Redis test usage in pkgs and added the same workaround, as well as connection retry where it was missing.
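The failure mode described above can be reproduced in plain shell. The following is a minimal sketch, assuming an EXIT trap is an acceptable way to stop the background server; it is not necessarily the workaround used in this PR. `sleep 300` stands in for a background redis-server, and all function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: `sleep 300` stands in for a background redis-server.
# All function names here are hypothetical, not part of nixpkgs.

start_server() {
  sleep 300 >/dev/null 2>&1 &
  server_pid=$!
}

stop_server() {
  # Tolerate a server that has already exited.
  kill "$server_pid" 2>/dev/null || true
  wait "$server_pid" 2>/dev/null || true
}

# Runs a failing "check phase" in a subshell. The EXIT trap fires when the
# subshell exits, so the server is stopped even though the cleanup code after
# the failing command (the equivalent of postCheck) is never reached.
run_failing_check() {
  pid_file=$1
  (
    start_server
    echo "$server_pid" > "$pid_file"
    trap stop_server EXIT
    false || exit 1   # simulated test failure
    stop_server       # never reached, like postCheck
  )
}
```

Calling `run_failing_check /some/pid/file` returns a non-zero status, and the process whose PID was written to the file should no longer be running afterwards.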

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


github-actions bot added the "6.topic: python" label on Nov 21, 2024
ofborg bot added the "6.topic: darwin" label on Nov 22, 2024
ofborg bot requested a review from MrMebelMan on November 22, 2024 08:26
ofborg bot added the "10.rebuild-darwin: 11-100" and "10.rebuild-linux: 11-100" labels on Nov 22, 2024
ofalvai force-pushed the darwin-redis-test-timeout branch from 269df0a to 1444c42 on November 30, 2024 19:17
github-actions bot added the "10.rebuild-darwin: 101-500" and "10.rebuild-linux: 2501-5000" labels and removed the "10.rebuild-darwin: 11-100" and "10.rebuild-linux: 11-100" labels on Nov 30, 2024
ofalvai changed the base branch from master to staging on November 30, 2024 19:31
ofalvai marked this pull request as ready for review on November 30, 2024 20:49
ofalvai requested a review from a team on November 30, 2024 20:50
afh (Member) commented Dec 1, 2024

It seems this PR adds almost the same code to the preCheck phase of various packages. Would it be possible and feasible to refactor that code into a script that can be re-used by these packages?

@ofborg ofborg bot added 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. and removed 10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. labels Dec 1, 2024
ofalvai (Contributor, Author) commented Dec 1, 2024

I was thinking the same, but I'm not familiar enough with nixpkgs to know how to approach this. Do you have a specific idea in mind? Maybe a hook?

ofalvai changed the title from "treewide: prevent hanging Darwin build after test failures" to "various: prevent hanging Darwin build after test failures" on Dec 5, 2024
toonn (Contributor) commented Dec 28, 2024

I asked in the NixOS dev room, and emily did indeed suggest a hook. In addition, she suggested marking the failing packages as broken or listing the affected platforms in badPlatforms.

mweinelt (Member) commented

Create a hook to integrate redis into the check phase.

Check out pkgs/by-name/po/postgresqlTestHook which has a similar use case.

toonn (Contributor) commented Dec 28, 2024

Emily also mentioned the following:

> I'm also not sure that MAX_RETRIES stuff is a good idea since Hydra builders are often under heavy load and test timeouts usually just break things unnecessarily
>
> I sort of suspect fixing the underlying issue is just a matter of putting a nohup in front of the server call

I'm not sure nohup would prevent the process being assigned to launchd but worth looking into I suppose.
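For reference, the nohup idea would amount to something like the sketch below. Whether it changes how the orphaned process is handled on macOS is exactly the open question above, so treat this as an illustration of the suggestion rather than a fix. `sleep 300` stands in for the server call.

```shell
#!/usr/bin/env bash
# Sketch only: `sleep 300` stands in for the redis-server invocation.
# nohup makes the child ignore SIGHUP; output is redirected explicitly so
# that nohup does not create a nohup.out file.
nohup sleep 300 >/dev/null 2>&1 &
server_pid=$!

# ... the tests would run here ...

# The server still has to be stopped explicitly afterwards.
kill "$server_pid"
wait "$server_pid" 2>/dev/null || true
```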

ofalvai force-pushed the darwin-redis-test-timeout branch from 1444c42 to 52fcb42 on January 23, 2025 20:27
github-actions bot added the "8.has: documentation" label and removed the "10.rebuild-linux: 501+" label on Jan 23, 2025
ofalvai (Contributor, Author) commented Feb 4, 2025

Thank you for the review @mweinelt. The easier ones are fixed now. Next, I'll figure out the unix domain socket thing and then resolve the merge conflict.

ofalvai force-pushed the darwin-redis-test-timeout branch 2 times, most recently from b9c82ff to 6d78b3f, on February 10, 2025 18:26
ofborg bot removed the "2.status: merge conflict" label on Feb 10, 2025
ofalvai (Contributor, Author) commented Feb 10, 2025

@mweinelt Redis socket support is now ready. However, Memcached works with either a network connection or a socket; it doesn't accept connections on both: https://docs.memcached.org/serverguide/configuring/#unix-sockets

Do you think it would still be useful to add a socket-only mode to memcachedTestHook?

github-actions bot added the "10.rebuild-linux: 501+" label on Feb 10, 2025
mweinelt (Member) commented

We can probably revisit UDS later.

mweinelt (Member) left a comment

Looks good. Just some minor things left.

Comment on lines 10 to 11
Member

Suggested change
- cli = "${redis}/bin/redis-cli";
- server = "${redis}/bin/redis-server";
+ cli = lib.getExe' redis "redis-cli";
+ server = lib.getExe' redis "redis-server";

Comment on lines 12 to 13
Member

Suggested change
- memcached = "${lib.getExe memcached}";
- nc = "${lib.getExe netcat}";
+ memcached = lib.getExe memcached;
+ nc = lib.getExe netcat;

Comment on lines +24 to +28
Member

redis-cli seems to be a bit noisy, printing connection errors while waiting for redis-server to start:

django-cacheops> starting redis
django-cacheops> waiting for redis to be ready
django-cacheops> Could not connect to Redis at /build/run/redis.sock: No such file or directory

Contributor, Author

Do you have a better idea than 2>/dev/null? I'm afraid it would make eventual debugging a nightmare. Maybe we should sleep 1 first once?

Member

We could check whether the socket exists (test -s "$REDIS_SOCKET") or the listener has bound to the port (nc -z localhost "$redisTestPort") before interrogating it with redis-cli. 🤔
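A readiness probe along these lines could look like the sketch below. It polls an arbitrary command once per second with no retry cap, in line with the earlier concern about MAX_RETRIES on loaded builders, and leaves any overall time limit to the build itself. `wait_until` is a hypothetical helper, not an existing nixpkgs function. Note that `test -S` is the check for a socket file, whereas `test -s` only checks that a file exists and is non-empty.

```shell
#!/usr/bin/env bash
# Sketch only: `wait_until` is a hypothetical helper.
# It re-runs the given probe command once per second until it succeeds.
wait_until() {
  while ! "$@"; do
    sleep 1
  done
}

# Intended use, with the variable names from the comment above:
#   wait_until test -S "$REDIS_SOCKET"
#   wait_until nc -z localhost "$redisTestPort"

# Self-contained demo: a marker file that appears after one second stands in
# for the server's socket.
demo_dir=$(mktemp -d)
( sleep 1; : > "$demo_dir/ready" ) &
wait_until test -e "$demo_dir/ready"
wait
echo "probe succeeded"
```

Because the probe runs before redis-cli is ever invoked, the "Could not connect" messages would not appear while the server is still starting, and stderr stays available for real errors.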

mweinelt (Member) commented

And please check if you have removed unused redis-server arguments from these packages. Found one in rq for example.

ofalvai force-pushed the darwin-redis-test-timeout branch from 6d78b3f to b9f900e on March 12, 2025 17:29
ofalvai force-pushed the darwin-redis-test-timeout branch 2 times, most recently from 398c99d to 718cf5d, on March 12, 2025 18:53
mweinelt (Member) commented

Eval failure in python3.pkgs.rq.

ofalvai force-pushed the darwin-redis-test-timeout branch from 718cf5d to 44ba8fd on March 18, 2025 19:36
github-actions bot added the "10.rebuild-darwin: 101-500" label and removed the "10.rebuild-darwin: 11-100" label on Mar 18, 2025
mweinelt merged commit 3252fa2 into NixOS:staging on Mar 19, 2025
23 checks passed
mweinelt (Member) commented

Thank you!

Labels

  • 6.topic: darwin (Running or building packages on Darwin)
  • 6.topic: python (Python is a high-level, general-purpose programming language.)
  • 8.has: documentation (This PR adds or changes documentation)
  • 10.rebuild-darwin: 101-500 (This PR causes between 101 and 500 packages to rebuild on Darwin.)
  • 10.rebuild-linux: 501+ (This PR causes many rebuilds on Linux and should normally target the staging branches.)
  • 10.rebuild-linux: 2501-5000 (This PR causes many rebuilds on Linux and should target the staging branches.)

4 participants