Improving counting performance #170

Merged
merged 10 commits into from
Aug 23, 2021

@Byron Byron commented Aug 23, 2021

Currently, single-threaded object counting during pack generation is about 3x slower than in git: counting the Linux kernel pack by tree traversal takes git about 52s, while gitoxide needs 160s on a single thread. This is despite gitoxide's pack access performance being on par with, or even faster than, git's.

Something here must be causing this behaviour. Here are a few ideas:

  • Tree traversal doesn't store the information of decoded trees, even though it could and should. Doing so brings the time down to ~140s.
  • Every traversed object is currently looked up during counting, which causes its entry to be decoded. This should be postponed to the 'entry' stage, which is multi-threaded. Doing so decreases the time from 160s to 108s; counting now ends after 100s.
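
The second idea can be sketched as follows. This is a minimal illustration under stated assumptions, not gitoxide's actual code: `ObjectId`, `Count`, `Entry`, `lookup_entry`, `count` and `resolve` are all hypothetical names, and the lookup is a cheap stand-in for a real pack access. Counting only records deduplicated ids on one thread, while the expensive lookup happens later in a stage that is split across threads.

```rust
use std::collections::HashSet;
use std::thread;

/// Hypothetical object id; real ids are hashes, a small integer keeps the sketch short.
type ObjectId = u32;

/// What the counting phase records: just the id, without any pack lookup.
struct Count {
    id: ObjectId,
}

/// What the entry stage produces after the (expensive) lookup.
struct Entry {
    id: ObjectId,
    size: usize,
}

/// Stand-in for an expensive pack lookup and decode (an assumption for illustration).
fn lookup_entry(id: ObjectId) -> Entry {
    Entry { id, size: (id as usize % 7) + 1 }
}

/// Single-threaded counting: deduplicate ids and do nothing else.
fn count(ids: impl IntoIterator<Item = ObjectId>) -> Vec<Count> {
    let mut seen = HashSet::new();
    ids.into_iter()
        .filter(|id| seen.insert(*id))
        .map(|id| Count { id })
        .collect()
}

/// Multi-threaded entry stage: each thread resolves one chunk of counts.
fn resolve(counts: &[Count], threads: usize) -> Vec<Entry> {
    let threads = threads.max(1);
    // Round up so that at most `threads` chunks exist; never use a chunk size of zero.
    let chunk_size = ((counts.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        // Spawn all workers first, then join them in order to keep the input order.
        let handles: Vec<_> = counts
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|c| lookup_entry(c.id)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With this split, the single-threaded part only touches a hash set, and the cost of decoding entries scales with the number of threads given to `resolve`.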

It's worth noting that gitoxide seemingly does a little more than counting, as it records some pack information that can later be used for faster entry lookups. Thus I have trouble imagining that it's going to get any faster than this in single-threaded mode. Overall, writing a pack of 3.6GB takes around 104 seconds on an M1 MacBook Air. It took 97s with a 1GB pack lookup cache instead of 512MB (with 58.12% efficiency instead of 52%).

➜  gitoxide git:(counting-performance) ✗ cargo build --release --no-default-features --features lean,cache-efficiency-debug --bin gixp && /usr/bin/time -lp ./target/release/gixp -v pack-create -r ../../torvalds/linux HEAD --statistics
   Compiling git-pack v0.9.0 (/Users/byron/dev/github.com/Byron/gitoxide/git-pack)
   Compiling git-odb v0.20.2 (/Users/byron/dev/github.com/Byron/gitoxide/git-odb)
   Compiling git-repository v0.7.2 (/Users/byron/dev/github.com/Byron/gitoxide/git-repository)
   Compiling gitoxide-core v0.10.2 (/Users/byron/dev/github.com/Byron/gitoxide/gitoxide-core)
   Compiling gitoxide v0.8.2 (/Users/byron/dev/github.com/Byron/gitoxide)
    Finished release [optimized] target(s) in 37.26s
MemoryCappedHashmap(536870912B)[600002824108]: 3561558 / 6849380 (hits/misses) = 52.00%, puts = 5868531
 12:03:44 counting done 8.1M objects in 91.66s (88.6k objects/s)
 12:03:50 resolving done 8.1M counts in 5.33s (1.5M counts/s)
 12:03:51   sorting done 8.1M counts in 0.89s (9.2M counts/s)
7765ddef2a3a30fc81eff9a19c228f21efbfb249.pack
counting phase
        input objects                  1015172
        expanded objects               7118452
        decoded objects                5864317
        total objects                  0
generation phase
        decoded and recompressed       26507
        pack-to-pack copies            8097502
        missing objects                0
 12:03:56   writing done 3.6GB in 5.51s (651.0MB/s)
 12:03:56 consuming done 8.1M entries in 5.51s (1.5M entries/s)
real 104.05
user 101.44
sys 14.62
          2634465280  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              437169  page reclaims
              560051  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                  65  voluntary context switches
              671311  involuntary context switches
        701160684628  instructions retired
        347450406272  cycles elapsed
          1686633344  peak memory footprint
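
A note on reading the `MemoryCappedHashmap` line in the output above: the printed 52.00% matches hits divided by misses, as the `(hits/misses)` label suggests. Assuming the two figures really are hit and miss counts as labelled, the conventional hit rate (hits over all lookups) would be lower, around 34.2%. The small sketch below, with hypothetical function names, computes both so the figures can be compared.

```rust
/// Ratio the debug line appears to print: hits divided by misses, in percent.
fn hits_per_miss_percent(hits: u64, misses: u64) -> f64 {
    hits as f64 / misses as f64 * 100.0
}

/// Conventional hit rate: hits divided by all lookups, in percent.
/// Assumes `hits` and `misses` are disjoint counts, as the log label suggests.
fn hit_rate_percent(hits: u64, misses: u64) -> f64 {
    hits as f64 / (hits + misses) as f64 * 100.0
}
```

Calling both with the logged values `3561558` and `6849380` reproduces the 52.00% from the log for the first function, while the second gives the lower figure.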

For the same task, here is git. Note that git builds a better pack, so this is comparing apples and oranges, but it's all we've got: a 3.19GiB pack in 137s.

➜  gitoxide git:(counting-performance) echo HEAD | /usr/bin/time -lp  git -C ../../torvalds/linux/.git   pack-objects --all-progress --stdout --revs >/dev/null
Enumerating objects: 8124009, done.
Counting objects: 100% (8124009/8124009), done.
Delta compression using up to 8 threads
Compressing objects: 100% (1321393/1321393), done.
Writing objects: 100% (8124009/8124009), 3.19 GiB | 54.35 MiB/s, done.
Total 8124009 (delta 6755648), reused 8120773 (delta 6752412), pack-reused 0
real 136.75
user 60.84
sys 12.45
          2413199360  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
             1365706  page reclaims
              859771  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                 135  signals received
                  87  voluntary context switches
              722739  involuntary context switches
        525791099498  instructions retired
        225718777412  cycles elapsed
          2012098816  peak memory footprint

Now there is one avenue left to explore:

  • Memory-capped cache access performance isn't optimal: even though the cache hit rate is at about 52%, it does a lot of cache thrashing. There should be a free list to reduce allocator pressure.
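
To illustrate the free-list idea, here is a minimal sketch under stated assumptions. `CappedCache` is a hypothetical type, not gitoxide's `MemoryCappedHashmap`, and the eviction policy (drop everything once the cap would be exceeded) is chosen only to keep the example short. The point is that evicted buffers are parked in a free list and reused by later `put` calls instead of being returned to the allocator.

```rust
use std::collections::HashMap;

/// Hypothetical memory-capped cache keyed by pack offset.
struct CappedCache {
    entries: HashMap<u64, Vec<u8>>,
    /// Buffers of evicted entries, kept around so their allocations can be reused.
    free_list: Vec<Vec<u8>>,
    used_bytes: usize,
    cap_bytes: usize,
}

impl CappedCache {
    fn new(cap_bytes: usize) -> Self {
        CappedCache {
            entries: HashMap::new(),
            free_list: Vec::new(),
            used_bytes: 0,
            cap_bytes,
        }
    }

    /// Store a copy of `data`, reusing a recycled buffer when one is available.
    fn put(&mut self, key: u64, data: &[u8]) {
        if data.len() > self.cap_bytes {
            return; // can never fit, so don't cache it at all
        }
        // Replacing an existing entry recycles its buffer as well.
        if let Some(old) = self.entries.remove(&key) {
            self.used_bytes -= old.len();
            self.free_list.push(old);
        }
        if self.used_bytes + data.len() > self.cap_bytes {
            // Crude eviction (an assumption): drop all entries but keep their allocations.
            for (_, buf) in self.entries.drain() {
                self.free_list.push(buf);
            }
            self.used_bytes = 0;
        }
        // Take a recycled buffer if there is one, otherwise start with an empty Vec.
        let mut buf = self.free_list.pop().unwrap_or_default();
        buf.clear();
        buf.extend_from_slice(data);
        self.used_bytes += buf.len();
        self.entries.insert(key, buf);
    }

    fn get(&self, key: u64) -> Option<&[u8]> {
        self.entries.get(&key).map(Vec::as_slice)
    }
}
```

Note that this sketch only accounts for the bytes of live entries: the capacity held by buffers in the free list is not counted against the cap, so a real implementation would likely bound the free list too.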

Shortest time is now 86s for counting and 98s for everything. With multi-threaded counting it's 44s and 50s, making it about 2.7x faster than git.

Now we are talking!

This could be a real avenue. In single-threaded mode this probably
shouldn't happen, and the work should rather be postponed to
entry generation, copying, which right now is ridiculously fast
as it doesn't have to do much work anymore.

> counting done 8.1M objects in 107.57s (75.5k objects/s)
…which will already be worth quite a bit.
Next is delaying the pack access for another win.

Ultimately it won't be enough to be as fast as git though.
@Byron Byron mentioned this pull request Aug 23, 2021
43 tasks
@Byron Byron merged commit dce4f97 into main Aug 23, 2021
@Byron Byron deleted the counting-performance branch August 23, 2021 13:22