Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unscd: init at 0.53 #124019

Merged
merged 2 commits into from Apr 18, 2022
Merged

unscd: init at 0.53 #124019

merged 2 commits into from Apr 18, 2022

Conversation

ghost
Copy link

@ghost ghost commented May 22, 2021

Motivation for this change

When using NixOS as a router in a DFZ use case, the glibc nscd causes severe performance issues for the whole system.

This includes a change to the nscd module, so that services.nscd.package = pkgs.unscd; can be used to replace the glibc nscd with unscd.

I have been running this on my systems for a few hours and haven't noticed any problems. It prints these warnings at startup though:

May 22 11:17:20 styx nscd[927]: /etc/nscd.conf:24: ignoring unknown service name 'netgroup'
May 22 11:17:20 styx nscd[927]: /etc/nscd.conf:31: ignoring unknown service name 'services'

Those calls might just be uncached when using unscd.

Detailed explanation from the source file:
nscd problems are not exactly unheard of. Over the years, there were
quite a bit of bugs in it. This leads people to invent babysitters
which restart crashed/hung nscd. This is ugly.

After looking at nscd source in glibc I arrived to the conclusion
that its design is contributing to this significantly. Even if nscd's
code is 100.00% perfect and bug-free, it can still suffer from bugs
in libraries it calls.

As designed, it's a multithreaded program which calls NSS libraries.
These libraries are not part of libc, they may be provided
by third-party projects (samba, ldap, you name it).

Thus nscd cannot be sure that libraries it calls do not have memory
or file descriptor leaks and other bugs.

Since nscd is multithreaded program with single shared cache,
any resource leak in any NSS library has cumulative effect.
Even if a NSS library leaks a file descriptor 0.01% of the time,
this will make nscd crash or hang after some time.

Of course bugs in NSS .so modules should be fixed, but meanwhile
I do want nscd which does not crash or lock up.

So I went ahead and wrote a replacement.

It is a single-threaded server process which offloads all NSS
lookups to worker children (not threads, but fully independent
processes). Cache hits are handled by parent. Only cache misses
start worker children. This design is immune against
resource leaks and hangs in NSS libraries.

It is also many times smaller.
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Added a release notes entry if the change is major or breaking
  • Fits CONTRIBUTING.md.

@r-rmcgibbo
Copy link

r-rmcgibbo commented May 22, 2021

Result of nixpkgs-review pr 124019 at c8578d23 run on aarch64-linux 1

1 package failed to build:
  • nixos-install-tools
1 package built successfully:
  • unscd

Note that build failures may predate this PR, and could be nondeterministic or hardware dependent.
Please exercise your independent judgement. Does something look off? Please file an issue or reach out on IRC.


Result of nixpkgs-review pr 124019 at c8578d23 run on x86_64-linux 1

1 package failed to build:
  • nixos-install-tools
2 packages built successfully:
  • tests.trivial
  • unscd
7 suggestions:
  • warning: missing-patch-comment

    Consider adding a comment explaining the purpose of this patch on the line preceeding.
    Near pkgs/os-specific/linux/unscd/default.nix:19:5:

       |
    19 |     (fetchpatch {
       |     ^
    
  • warning: missing-phase-hooks

    installPhase should probably contain runHook preInstall and runHook postInstall.

    Near pkgs/os-specific/linux/unscd/default.nix:48:3:

       |
    48 |   installPhase = ''
       |   ^
    
  • warning: missing-phase-hooks

    buildPhase should probably contain runHook preBuild and runHook postBuild.

    Near pkgs/os-specific/linux/unscd/default.nix:39:3:

       |
    39 |   buildPhase = ''
       |   ^
    
  • warning: missing-patch-comment

    Consider adding a comment explaining the purpose of this patch on the line preceeding.
    Near pkgs/os-specific/linux/unscd/default.nix:23:5:

       |
    23 |     (fetchpatch {
       |     ^
    
  • warning: missing-patch-comment

    Consider adding a comment explaining the purpose of this patch on the line preceeding.
    Near pkgs/os-specific/linux/unscd/default.nix:27:5:

       |
    27 |     (fetchpatch {
       |     ^
    
  • warning: missing-patch-comment

    Consider adding a comment explaining the purpose of this patch on the line preceeding.
    Near pkgs/os-specific/linux/unscd/default.nix:31:5:

       |
    31 |     (fetchpatch {
       |     ^
    
  • warning: missing-patch-comment

    Consider adding a comment explaining the purpose of this patch on the line preceeding.
    Near pkgs/os-specific/linux/unscd/default.nix:18:5:

       |
    18 |     ./remove-old-socket.diff
       |     ^
    

Note that build failures may predate this PR, and could be nondeterministic or hardware dependent.
Please exercise your independent judgement. Does something look off? Please file an issue or reach out on IRC.

@ghost ghost requested a review from flokli June 1, 2021 15:06
@flokli
Copy link
Contributor

flokli commented Jun 1, 2021

Hold that.

@petabyteboy if you're down for a rabbithole, if I set services.nscd.package = pkgs.unscd; in nixos/tests/google-oslogin/server.nix, the test fails. It seems unscd isn't entirely transparent - you're sure it also still picks up custom NSS modules?

@ghost
Copy link
Author

ghost commented Jun 1, 2021

I think it does pick up custom nss modules, but it currently doesn't proxy all the request types that glibc nscd supports, so that might be one thing. Anyways, I'll look into it, thanks!

@ghost
Copy link
Author

ghost commented Jun 2, 2021

An alternative solution may be using sssd for nss requests, and a seperate service like systemd-resolved for DNS requests. Fedora seems to do that and got rid of nscd entirely.

@ghost ghost closed this Jun 7, 2021
@ghost
Copy link
Author

ghost commented Jun 7, 2021

If unscd doesn't load external nss modules properly, there's no point in using it. I can just disable nscd entirely to have a similar result.

@ghost
Copy link
Author

ghost commented Jun 8, 2021

It does load nss modules, but it doesn't implement the GETFD* calls, which expose the internal cache as a file descriptor.

@flokli
Copy link
Contributor

flokli commented Oct 7, 2021

It does load nss modules, but it doesn't implement the GETFD* calls, which expose the internal cache as a file descriptor.

I poked inside unscd code base, https://github.com/bytedance/unscd/blob/master/nscd.c#L615-L616 lists some things as "won't do.

Can you elaborate a bit more on this? Is this why some nss lookups fall back to the (broken) local lookup? Would this not also be super unsafe to do for 32bit binaries on a 64bit nscd?

@Ericson2314
Copy link
Member

Can we merge this? Even if unscd isn't such a good option in practice for many things, it is good to un-hard-code the option.

@Ericson2314 Ericson2314 reopened this Apr 16, 2022
@Ericson2314 Ericson2314 merged commit b6f5bf2 into NixOS:master Apr 18, 2022
@fabianhjr
Copy link
Member

Seems like this merge caused breakage on the main branch: #169157

@fabianhjr
Copy link
Member

@dasJ
Copy link
Member

dasJ commented Apr 18, 2022

How did ofborg pass on this PR? Or was the result ignored?

Edit: Please don't ignore ofborg failures in the future.

@dasJ
Copy link
Member

dasJ commented Apr 18, 2022

I created a revert: #169190

@Ericson2314
Copy link
Member

Sorry!!

I saw the Cachix failure and missed the manual failure before it which was collapsed --- making me think the cachix failure was spurious.

Will not happen aagin.

@flokli
Copy link
Contributor

flokli commented Sep 29, 2022

unscd as nscd however still doesn't seem to work: #193535

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants