Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish migrating gitoxide from nom 7 to winnow 0.5 #956

Merged
merged 21 commits into from
Aug 19, 2023
Merged

Conversation

epage
Copy link
Contributor

@epage epage commented Aug 1, 2023

Follow up to #948 and #951

gix-actor, gix-object, gix-ref, and gix-testtools are interrelated, so this does it all at once

Things left out

  • Removing winnow from the API of the above packages
  • Switching from_bytes functions to using parse instead of (the implied) parse_next

BREAKING CHANGE: nom, and now winnow, is exposed in the public API of some of these packages

@epage epage marked this pull request as draft August 1, 2023 22:03
@epage epage changed the title Finish migrating gitoxide from nom 7 to winnow 0.5 WIP: Finish migrating gitoxide from nom 7 to winnow 0.5 Aug 1, 2023
@epage
Copy link
Contributor Author

epage commented Aug 1, 2023

Since I'm seeing local test failures, I'm rely on CI for the cases I'm not easily hitting due to other errors

@epage
Copy link
Contributor Author

epage commented Aug 2, 2023

Huh, mac-fast is failing in CI differently than local (in gix-worktree) but all other test jobs passed.

@Byron
Copy link
Owner

Byron commented Aug 2, 2023

So good to see this happening, and it's basically done already and working in the first stage :)!

Since I'm seeing local test failures, I'm rely on CI for the cases I'm not easily hitting due to other errors

I'd love to treat this as a bug, knowing that there were problems with git-lfs initially, but maybe the root cause is another one. One could try to checkout a fresh worktree and try just test there, just to be sure it's not some strange state that accumulated locally. If the issue is reproducible, I invite you to open an issue with the details so this can be fixed.

Huh, mac-fast is failing in CI differently than local (in gix-worktree) but all other test jobs passed.

It looks like everything passed now except for lint, whose failure is expected I presume due to the early stage of the PR.

gix-object/src/commit/decode.rs Outdated Show resolved Hide resolved
gix-object/src/lib.rs Outdated Show resolved Hide resolved
gix-object/src/lib.rs Outdated Show resolved Hide resolved
tests/tools/src/lib.rs Outdated Show resolved Hide resolved
@Byron
Copy link
Owner

Byron commented Aug 2, 2023

Just a quick note before I forget or it becomes a surprise later: d3f65d8 will be conflicting, and I recommend rebasing onto main and keep only the test. It should be easy from there to properly implement a fix for it. My TODO's are hopefully worth a chuckle, given how winnow should solve all of them.

@epage
Copy link
Contributor Author

epage commented Aug 2, 2023

Already rebased and working on cleaning it up right now

gix-object/src/commit/decode.rs Outdated Show resolved Hide resolved
@Byron
Copy link
Owner

Byron commented Aug 3, 2023

I am cheering you on from the side-lines, the goal is already in sight :)!

@epage epage force-pushed the winnow branch 2 times, most recently from 9abdc47 to 1b82390 Compare August 4, 2023 17:08
@epage epage force-pushed the winnow branch 8 times, most recently from 89053a2 to f1088a8 Compare August 9, 2023 15:34
@epage
Copy link
Contributor Author

epage commented Aug 17, 2023

With winnow-rs/winnow#316 and winnow-rs/winnow#317 merged into 0.5 and backported to 0.4 and 0.3

TagRefIter(sig)
510192e (main): 165.88 ns
ee75de1: 159.62 ns
86d7fd1 (w1): 157.66 ns
ac0e81c (w2): 140.30 ns
b37a909 (w3): 147.65 ns
fee441d (w4): 148.87 ns
b3f0418 (w5): 157.37 ns
86ea47f (winnow04): 157.88 ns
9ed7df0 (w6): 157.88 ns
f0cbf81 (w7): 158.08 ns
2fb4a54: 158.66 ns
12f03db (w8): 175.76 ns
3f8c91f (w9): 155.50 ns
a07590c: 155.83 ns
266864f: 155.77 ns
df226dd: 159.27 ns
93fc441: 158.16 ns
62f7bc6: 160.28 ns
be1f4b5 (winnow05): 155.82 ns
80d7991: 155.48 ns
b7ee9a1 (winnow): 162.52 ns

And for across-the-board

$ cargo bench -p gix-object@0.34.0 --bench decode-objects
     Running benches/decode_objects.rs (target/release/deps/decode_objects-2030aacefe52a773)
Gnuplot not found, using plotters backend
CommitRef(sig)          time:   [536.77 ns 537.97 ns 539.26 ns]
                        change: [-34.148% -33.943% -33.752%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

CommitRefIter(sig)      time:   [648.26 ns 662.31 ns 684.63 ns]
                        change: [-30.898% -29.723% -28.166%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe

TagRef(sig)             time:   [170.18 ns 170.50 ns 170.86 ns]
                        change: [+4.3542% +4.7672% +5.1841%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

TagRefIter(sig)         time:   [174.34 ns 174.55 ns 174.77 ns]
                        change: [+1.5350% +2.1648% +2.7832%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

TreeRef(sig)            time:   [101.54 ns 101.70 ns 101.88 ns]
                        change: [+7.2389% +7.5092% +7.7863%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

TreeRefIter(sig)        time:   [46.948 ns 46.997 ns 47.054 ns]
                        change: [-6.4018% -6.2109% -5.9962%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

Seeing some of those lows, I feel like we could do better

@Byron
Copy link
Owner

Byron commented Aug 18, 2023

It's strange that these benchmark improvements seem to have no bearing on the real-world test.

❯ /Users/byron/dev/github.com/Byron/gitoxide/target/release/ein t h
 08:25:29 traverse commit graph done 1.2M commits in 10.35s (113.0k commits/s)
 08:25:29        estimate-hours Extracted and organized data from 1169974 commits in 15.651417ms (74751952 commits/s)
total hours: 1119736.25
total 8h days: 139967.03
total commits = 1169974
total authors: 31645
total unique authors: 24103 (23.83% duplication)

linux ( master) +369 -819 [!] took 10s
❯ ein t h
 08:25:54 traverse commit graph done 1.2M commits in 8.50s (137.6k commits/s)
 08:25:54        estimate-hours Extracted and organized data from 1169974 commits in 15.961291ms (73300712 commits/s)
total hours: 1119736.25
total 8h days: 139967.03
total commits = 1169974
total authors: 31645
total unique authors: 24103 (23.83% duplication)

(The cache was hot for both runs and I made sure the the …/target/release/ein binary is truly the one from b7ee9a1)

I am absolutely puzzled by that. Is this something you can reproduce?

@epage
Copy link
Contributor Author

epage commented Aug 18, 2023

$ git switch main
$ cd ..
$ cp -r gitoxide gitoxide1
$ cd gitoxide1
$ git switch winnow
$ cd ../linux
$
$ cargo run --release --manifest-path ../gitoxide/Cargo.toml --bin ein -- t h
$ cargo run --release --manifest-path ../gitoxide/Cargo.toml --bin ein -- t h
$ cargo run --release --manifest-path ../gitoxide/Cargo.toml --bin ein -- t h
$
$ cargo run --release --manifest-path ../gitoxide1/Cargo.toml --bin ein -- t h
$ cargo run --release --manifest-path ../gitoxide1/Cargo.toml --bin ein -- t h
$ cargo run --release --manifest-path ../gitoxide1/Cargo.toml --bin ein -- t h

I found that both main and winnow had the same range of completion times, from ~13.5ms to 15ms.

@Byron
Copy link
Owner

Byron commented Aug 18, 2023

Thank you, this probably means that it's an issue with the way it is compiled. On x86 there seem to be optimizations that don't kick in anymore on ARM64.

What's more surprising is that after a cargo clean, the main/nom version at 510192e delivers just 112.6k commits/s, whereas the winnow version at b7ee9a1 gets up to 115k commits/s on a fresh build as well.

But here it comes. The version I produced with cargo install that is from 9ef69e1 delivers 139.1k commits/s. After cargo installing the nom version at 510192e I am getting the same ~139 commit/s.

Now, after cargo installing the winnow version at b7ee9a1 I also see numbers around 139k commits/s. Running ./target/release/ein after installing it directly also yields the good values. However, rebuilding it with cargo build --release --bin ein shows the bad values once again.

The binary sizes are similar, so there must be something about cargo install that makes all the difference - maybe it builds something like target-cpu=native?

@epage
Copy link
Contributor Author

epage commented Aug 18, 2023

The binary sizes are similar, so there must be something about cargo install that makes all the difference - maybe it builds something like target-cpu=native?

Looking at the code, I'm not seeing --target-cpu being used anywhere. The only difference I see with cargo build is that it hard codes --release. Might be interesting to look at verbose output in case I missed anything.

@Byron
Copy link
Owner

Byron commented Aug 18, 2023

I didn't see it that with -v either, but tried to build with -C target-cpu=native - the resulting binary was just as fast as the bad case. In desperation, I wanted to see if ~/.cargo/bin is a magic place, and fortunately after copying the 'bad' binary onto the 'good' one it was still slow when executing it through the PATH.

But then it struck me: cargo install runs on the latest available dependencies, whereas cargo build runs on Cargo.lock. After running cargo update it actually got as fast as it should be even after using cargo build --release.

I think this case is solved now 🎉.

@Byron
Copy link
Owner

Byron commented Aug 18, 2023

Performance-wise, it seems like the benchmarks were a great indicator after all and by the looks of it (comparing winnow to nom), all values are better or have regressed by less than 5% (TagRef).

Gnuplot not found, using plotters backend
CommitRef(sig)          time:   [1.1752 µs 1.1888 µs 1.2056 µs]
                        change: [+24.581% +25.593% +26.850%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

CommitRefIter(sig)      time:   [1.2681 µs 1.2690 µs 1.2699 µs]
                        change: [+20.260% +20.443% +20.635%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

TagRef(sig)             time:   [245.54 ns 245.73 ns 245.95 ns]
                        change: [-5.2706% -4.9124% -4.6490%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  7 (7.00%) high mild
  1 (1.00%) high severe

TagRefIter(sig)         time:   [243.32 ns 243.75 ns 244.42 ns]
                        change: [-6.7406% -6.4258% -6.0412%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

TreeRef(sig)            time:   [138.67 ns 139.40 ns 140.11 ns]
                        change: [+4.5696% +5.1061% +5.6137%] (p = 0.00 < 0.05)
                        Performance has regressed.

TreeRefIter(sig)        time:   [61.047 ns 61.104 ns 61.186 ns]
                        change: [-0.9602% -0.8084% -0.6524%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

Thus I think performance won't be preventing a merge anymore, quite the opposite :).

@epage epage marked this pull request as ready for review August 18, 2023 16:50
@epage
Copy link
Contributor Author

epage commented Aug 18, 2023

I leave it to you. For me, there are more things I want to look at but that can be done independent of this being merged.

@epage
Copy link
Contributor Author

epage commented Aug 18, 2023

I dropped b7ee9a1 because it just seemed to slow things down (oddly enough)

@Byron
Copy link
Owner

Byron commented Aug 18, 2023

I leave it to you. For me, there are more things I want to look at but that can be done independent of this being merged.

Great, then I will work on merging this PR now as it seems ready in every way, while looking forward to seeing those ideas of yours to come to fruition.

Before I go: I have a feeling your CPU might be twice as fast as mine. With that said, what's the commits/s you get when running ein t h on the linux kernel? The best number I have seen is ~139k commits/s.

@epage
Copy link
Contributor Author

epage commented Aug 18, 2023

$ cargo run --release --manifest-path ../gitoxide/Cargo.toml --bin ein -- t h
    Finished release [optimized] target(s) in 0.09s
     Running `/home/epage/src/personal/gitoxide/target/release/ein t h`
 14:45:32 traverse commit graph done 1.1M commits in 8.30s (131.6k commits/s)
 14:45:32        estimate-hours Extracted and organized data from 1092424 commits in 13.815333ms (79073304 commits/s)
total hours: 1050880.75
total 8h days: 131360.09
total commits = 1092424
total authors: 29848
total unique authors: 22742 (23.81% duplication)

@Byron
Copy link
Owner

Byron commented Aug 18, 2023

Aw, snap, I hoped to see something in the 200k range. Of course, the workload is bound by zlib performance and parsing isn't holding anything back here, so improvements are probably out of range unless object decompression gets faster over time. Oh well, I keep an eye on that number with a new baseline of ~140k commits/s expecting it to go up over time.

I guess the real benchmark is against git shortlog:

✦ ❯ /usr/bin/time -lp git shortlog >/dev/null
real 11.27
user 10.95
sys 0.31
          2335965184  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              143036  page reclaims
                   7  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 229  involuntary context switches
         75705099109  instructions retired
         36068289312  cycles elapsed
          1516510016  peak memory footprint

linux ( master) +369 -819 [!] took 11s
✦ ❯ /usr/bin/time -lp ein t h
 22:00:38 traverse commit graph done 1.2M commits in 8.49s (137.8k commits/s)
 22:00:38        estimate-hours Extracted and organized data from 1169974 commits in 14.844667ms (78814432 commits/s)
total hours: 1119736.25
total 8h days: 139967.03
total commits = 1169974
total authors: 31645
total unique authors: 24103 (23.83% duplication)
real 8.84
user 10.03
sys 0.56
          1024720896  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               66234  page reclaims
                 240  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   5  voluntary context switches
              228344  involuntary context switches
         67037989003  instructions retired
         32757758823  cycles elapsed
           209817216  peak memory footprint

Interestingly, besides memory usage, ein is just a little bit faster if it was single-threaded, but can truly get ahead thanks to trivial multi-threading (trivial because Rust makes it so).

Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a million for working on this topic! It's a great outcome and I couldn't have wished for more.

@Byron Byron merged commit ef54aab into Byron:main Aug 19, 2023
17 checks passed
@epage epage deleted the winnow branch August 19, 2023 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants