Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ca-specific-schema.sql: add index on RealisationsRefs(referrer) #5366

Merged
merged 1 commit into from Dec 14, 2021
Merged

ca-specific-schema.sql: add index on RealisationsRefs(referrer) #5366

merged 1 commit into from Dec 14, 2021

Conversation

trofi
Copy link
Contributor

@trofi trofi commented Oct 9, 2021

For a typical desktop system (~2K packages) we can easily get 100K
entries in RealisationsRefs. Without indices query for RealisationsRefs
requires linear scan.

Inefficiency is seen as a 100% CPU load of nix-daemon for the following
scenario:

$ nix edit -f . bash # add unused environment variable, like FOO="1"
# populate RealisationsRefs, build fresh system
$ nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
$ nix edit -f . bash # add unused environment variable, like FOO="2"
$ time nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'

In this case bash will be rebuilt a few times and then rest of CPU
time is spent on scanning RealisationsRefs table (about 5 CPU-minutes
on my machine).

Before the change:

$ time nix build -f nixos system ... # step 4 above
real    34m3,613s
user    0m5,232s
sys     0m0,758s

Of all this time about 29.5 minutes are taken by nix-daemon's CPU time.

After the change:

$ time nix build -f nixos system ... # step 4 above
real    4m50,061s
user    0m5,038s
sys     0m0,677s

Of all this time about 1 minute is taken by nix-daemon's CPU time.
Most of the time is spent polling for non-existent realisations on
cache-nixos.org.

@trofi
Copy link
Contributor Author

trofi commented Oct 9, 2021

CC @regnat

@edolstra edolstra added the ca-derivations Derivations with content addressed outputs label Oct 13, 2021
@trofi
Copy link
Contributor Author

trofi commented Oct 21, 2021

Noticed similar performance problem on ValidPath deletion as part of nix-collect-garbage. Added index to Realisations(outputPath) as well. At a glance gives ~10x speedup for nix-collect-garbage.

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot. That’s awfully sensible indeed.

Can you just bundle the two schemas upgrades into one to avoid cluttering the migrateCASchema function to quickly?

@trofi
Copy link
Contributor Author

trofi commented Oct 21, 2021

Sounds good! Lumped two indices together into one schema update.

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I’ll just wait for the CI to finish, and merge it this evening or tomorrow.

(/me really wishes Github had a “Merge when CI is green” button like GitLab has)

@thufschmitt
Copy link
Member

The CI timeout is strange… I’ve restarted it this morning, but it seems to be sticky 🤔

@thufschmitt
Copy link
Member

I didn’t try to understand what was going on, but it looks like tests/ca/post-hook.sh.test is hanging when using nixUnstable as the daemon:

$ nix develop .#checks.x86_64-linux.installTests.againstCurrentUnstable --run './bootstrap.sh'
[...]
$ nix develop .#checks.x86_64-linux.installTests.againstCurrentUnstable --configure
[...]
$ nix develop .#checks.x86_64-linux.installTests.againstCurrentUnstable --run "make tests/ca/post-hook.sh.test"
*hangs*

@trofi
Copy link
Contributor Author

trofi commented Oct 30, 2021

Rebased past 19148f1 Apply a 60-minute timeout to the 'tests' workflow in hopes that timeouts disappear.

@trofi
Copy link
Contributor Author

trofi commented Oct 30, 2021

I think it hangs for me as well in nix copy and waiting for (itself?) to unlock the derivation:

$ pstree -p 1086501
-+= 00001 root systemd
 \-+= 01725 root nix-daemon --daemon
   \-+= 1085059 root nix-daemon 1085042
     \-+= 1086501 nixbld10 bash -e /nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh
       \-+- 1088707 nixbld10 make installcheck -j16 -l16
         \-+- 1117459 nixbld10 /nix/store/wv35g5lff84rray15zlzarcqi9fxzz84-bash-4.4-p23/bin/bash -c env TEST_NAME=tests/ca/post-hook TESTS_ENVIRONMENT="NIX_REMOTE= /nix/store/wv35g5lff84rray15zlzarcqi9fxzz84-bash-4.4-p2
           \-+- 1117467 nixbld10 /bin/sh mk/run_test.sh tests/ca/post-hook.sh
             \-+- 1121537 nixbld10 /nix/store/wv35g5lff84rray15zlzarcqi9fxzz84-bash-4.4-p23/bin/bash -e post-hook.sh
               |--- 1125917 nixbld10 nix-build -o /build/nix-test/tests/ca/post-hook/result dependencies.nix --post-build-hook /build/source/tests/push-to-store.sh
               \-+- 1124998 nixbld10 /nix/store/7mbasdw38q0q83pkgaizapanix22a4k7-nix-2.4pre20210922_bcd73eb/bin/nix-daemon
                 \-+= 1125942 nixbld10 /nix/store/7mbasdw38q0q83pkgaizapanix22a4k7-nix-2.4pre20210922_bcd73eb/bin/nix-daemon
                   \-+- 1126027 nixbld10 /bin/sh /build/source/tests/push-to-store.sh
                     \-+- 1126030 nixbld10 xargs nix copy --to file:/build/nix-test/tests/ca/post-hook/remote_store --no-require-sigs
                       \--- 1126032 nixbld10 nix copy --to file:/build/nix-test/tests/ca/post-hook/remote_store --no-require-sigs /build/nix-test/tests/ca/post-hook/store/mbxx0ilc2skad5mxc9bsnfdn88iq3bfv-dependencies-in

@trofi
Copy link
Contributor Author

trofi commented Oct 30, 2021

$ sudo strace -p 1126032 -y
strace: Process 1126032 attached
flock(3</build/nix-test/tests/ca/post-hook/var/nix/db/big-lock>, LOCK_EX

$ sudo lslocks | fgrep /build/nix-test/tests/ca/post-hook/var/nix/db/big-lock
nix             1126032 FLOCK        WRITE* 0          0          0 /build/nix-test/tests/ca/post-hook/var/nix/db/big-lock
nix-daemon      1125942 FLOCK        READ   0          0          0 /build/nix-test/tests/ca/post-hook/var/nix/db/big-lock

@trofi
Copy link
Contributor Author

trofi commented Oct 30, 2021

Created separate bug: #5450

@trofi
Copy link
Contributor Author

trofi commented Oct 30, 2021

Having poked a bit in gdb I think it's caused by this patch. Schema migration wants and exclusive write. It seems to be triggered by a first unfinished copy that holds reader lock.

@trofi
Copy link
Contributor Author

trofi commented Nov 9, 2021

Rebased past merged PR#5506. That should fix the CI.

…outputPath)

For a typical desktop system (~2K packages) we can easily get 100K
entries in RealisationsRefs. Without indices query for RealisationsRefs
requires linear scan.

RealisationsRefs(referrer)
--------------------------

Inefficiency is seen as a 100% CPU load of nix-daemon for the following
scenario:

    $ nix edit -f . bash # add unused environment variable, like FOO="1"
    # populate RealisationsRefs, build fresh system
    $ nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
    $ nix edit -f . bash # add unused environment variable, like FOO="2"
    $ time nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'

In this case `bash `will be rebuilt a few times and then rest of CPU
time is spent on scanning RealisationsRefs table (about 5 CPU-minutes
on my machine).

Before the change:

    $ time nix build -f nixos system ... # step 4 above
    real    34m3,613s
    user    0m5,232s
    sys     0m0,758s

Of all this time about 29.5 minutes are taken by nix-daemon's CPU time.

After the change:

    $ time nix build -f nixos system ... # step 4 above
    real    4m50,061s
    user    0m5,038s
    sys     0m0,677s

Of all this time about 1 minute is taken by nix-daemon's CPU time.
Most of the time is spent polling for non-existent realisations on
cache-nixos.org.

Realisations(outputPath)
------------------------

After running CA system for two weeks I got ~1M entries in Realisations
table. `nix-collect-garbage` became very slow (seemingly 100 path deletions
per second). It happens due to a slow cascading delete from Realisations
triggered by deletion from ValidPaths.

The fix is to add an index on primary key from ValidPaths(id) that
triggers cascading deletions.

Before the change:
    $ time nix-collect-garbage -d --max-freed 100G
    <interrupted before finish, took too long>
    real    23m32.411s
    user    17m49.679s
    sys     4m50.609s

Most of time was spent in re-scanning Realisations table on each path deletion.

After the change:
    $ time nix-collect-garbage -d --max-freed 100G

    real    8m43.226s
    user    6m16.317s
    sys     1m40.188s

Time is spent scanning sqlite indices and in kernel when unlinking directories.
@trofi
Copy link
Contributor Author

trofi commented Nov 10, 2021

Rebase helped.

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for letting this rot. It’s all fine now the the test is fixed. Unless @edolstra doesn’t agree I think we can merge this one (and I’ll update #5479 afterwards).

@thufschmitt thufschmitt merged commit 3fc8042 into NixOS:master Dec 14, 2021
@trofi trofi deleted the speedup-ca-query branch December 14, 2021 13:01
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881/162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ca-derivations Derivations with content addressed outputs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants