Check modification time when loading stored modules #2698

Closed
wants to merge 1 commit into from

Conversation

lukaszcz
Collaborator

@lukaszcz lukaszcz added this to the 0.6.1 milestone Mar 21, 2024
@lukaszcz lukaszcz self-assigned this Mar 21, 2024
@janmasrovira
Collaborator

janmasrovira commented Mar 21, 2024

I ran a rudimentary benchmark to check whether reading the last-modified time is cheaper than computing the hash. It turns out that computing the hash is much faster.

The benchmark replicates the Juvix stdlib 50 times and then compares the time it takes to hash every .juvix file with the time it takes to read each file's last modification time.

#!/usr/bin/env bash

ORIGINAL_DIR="juvix-stdlib"

rm -rf benchtmp
mkdir benchtmp

for i in {1..50}; do
    NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
    cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done

hyperfine --warmup 2 \
    --command-name "hash-256" \
    'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 sha256sum' \
    --command-name "last-modified-time" \
    'find benchtmp -type f -name "*.juvix" -exec stat --format="%y %n" {} \;'

The result is:

Benchmark 1: hash-256
  Time (mean ± σ):      56.4 ms ±   0.5 ms    [User: 17.2 ms, System: 40.5 ms]
  Range (min … max):    55.4 ms …  57.6 ms    51 runs

Benchmark 2: last-modified-time
  Time (mean ± σ):      1.636 s ±  0.013 s    [User: 1.216 s, System: 0.419 s]
  Range (min … max):    1.614 s …  1.656 s    10 runs

Summary
  hash-256 ran
   29.01 ± 0.33 times faster than last-modified-time

I expect that a similar benchmark written in Haskell would give similar results, so I'm not sure merging this PR would be an improvement.

@lukaszcz
Collaborator Author

Yeah, maybe it's not worth merging this PR. I guess it depends a lot on the OS: how long reading the modification time takes compared with computing SHA-256. There's probably some disk caching involved.
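To make the trade-off concrete, here is a minimal sketch of how a modification-time check could act as a fast path in front of a hash comparison. It is an illustration only, not the code in this PR: the helper names `file_mtime`, `file_sha256` and `needs_recompile` are hypothetical, and it assumes GNU coreutils (`stat --format`, `sha256sum`), as in the benchmark scripts in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers for illustration; requires GNU coreutils.

# Last modification time as seconds since the epoch.
file_mtime() { stat --format="%Y" "$1"; }

# SHA-256 digest of the file contents (hex string only).
file_sha256() { sha256sum "$1" | cut -d ' ' -f 1; }

# needs_recompile FILE STORED_MTIME STORED_HASH
# Returns 0 (true) if the file content differs from what was recorded.
needs_recompile() {
    local file="$1" stored_mtime="$2" stored_hash="$3"

    # Fast path: an unchanged mtime is taken to mean unchanged content,
    # so the file is never read.
    if [ "$(file_mtime "$file")" = "$stored_mtime" ]; then
        return 1
    fi

    # The mtime differs: fall back to the hash to see whether the
    # content really changed (e.g. the file may only have been touched).
    [ "$(file_sha256 "$file")" != "$stored_hash" ]
}
```

A caller would record both values after a successful compilation and later run something like `if needs_recompile "$f" "$mtime" "$hash"; then ...; fi`. Note that `%Y` has one-second resolution, so an edit made within the same second as the recorded timestamp would slip through the fast path in this sketch.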

@lukaszcz
Collaborator Author

lukaszcz commented Mar 22, 2024

This has nothing to do with the difference between checking last modified time and computing hash. It's because of some weird properties of the find program.

With the benchmark:

#!/usr/bin/env bash

ORIGINAL_DIR="juvix-stdlib"

rm -rf benchtmp
mkdir benchtmp

for i in {1..50}; do
    NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
    cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done

hyperfine --warmup 2 \
    --command-name "hash-256" \
    'find benchtmp -type f -name "*.juvix" -exec sha256sum {} \;' \
    --command-name "last-modified-time" \
    'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 stat --format="%y %n"'

I get the results:

Benchmark 1: hash-256
  Time (mean ± σ):      3.116 s ±  0.024 s    [User: 2.052 s, System: 1.018 s]
  Range (min … max):    3.086 s …  3.177 s    10 runs
 
Benchmark 2: last-modified-time
  Time (mean ± σ):      47.4 ms ±   0.7 ms    [User: 10.2 ms, System: 38.5 ms]
  Range (min … max):    45.8 ms …  49.2 ms    61 runs
 
Summary
  'last-modified-time' ran
   65.78 ± 1.14 times faster than 'hash-256'

Apparently, find's -exec ... \; form is just much slower than -print0 plus xargs: it starts a new process for every file, whereas xargs passes many files to a single invocation.
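A likely reason for the gap is process start-up cost: `-exec ... \;` runs the command once per matched file, while `xargs` (and find's `-exec ... +` form) batch many paths into one invocation. The sketch below counts invocations on three dummy files; the scratch directory and file names are made up for the illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory with three dummy files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.juvix" "$demo_dir/b.juvix" "$demo_dir/c.juvix"

# Each invocation of the child shell prints one line, so counting
# lines counts how many processes were started.

# `-exec ... \;` starts the command once per file.
per_file_runs=$(find "$demo_dir" -type f -name "*.juvix" \
    -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# `-exec ... +` passes as many paths as fit to one invocation.
batched_runs=$(find "$demo_dir" -type f -name "*.juvix" \
    -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

# `-print0 | xargs -0` batches in the same way.
xargs_runs=$(find "$demo_dir" -type f -name "*.juvix" -print0 \
    | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

echo "per-file: $per_file_runs, batched: $batched_runs, xargs: $xargs_runs"
# prints "per-file: 3, batched: 1, xargs: 1"

rm -rf "$demo_dir"
```

With thousands of files the per-file form pays the fork/exec cost thousands of times, which would explain why whichever command used `-exec ... \;` looked far slower in the first two benchmarks.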

@lukaszcz
Collaborator Author

Actually, when run in a comparable way:

#!/usr/bin/env bash

ORIGINAL_DIR="juvix-stdlib"

rm -rf benchtmp
mkdir benchtmp

for i in {1..50}; do
    NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
    cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done

hyperfine --warmup 2 \
    --command-name "hash-256" \
    'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 sha256sum' \
    --command-name "last-modified-time" \
    'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 stat --format="%y %n"'

checking modification time is ~1.4 times faster, as one would expect:

Benchmark 1: hash-256
  Time (mean ± σ):      67.4 ms ±   0.9 ms    [User: 19.1 ms, System: 49.6 ms]
  Range (min … max):    65.5 ms …  70.8 ms    41 runs
 
Benchmark 2: last-modified-time
  Time (mean ± σ):      47.3 ms ±   0.6 ms    [User: 8.8 ms, System: 39.9 ms]
  Range (min … max):    46.4 ms …  50.0 ms    61 runs
 
Summary
  'last-modified-time' ran
    1.42 ± 0.03 times faster than 'hash-256'

The difference is probably much bigger for typical usage, because when you run the benchmark several times the file contents will be in the OS disk cache after the first run, i.e., in RAM, which hyperfine has no control over.

@lukaszcz
Collaborator Author

Okay, here is a version which drops filesystem caches on every run (Linux only). That makes more sense. The difference is 1.86 times faster in favour of checking modification time.

#!/usr/bin/env bash

ORIGINAL_DIR="juvix-stdlib"

rm -rf benchtmp
mkdir benchtmp

for i in {1..50}; do
    NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
    cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done

hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
    --command-name "hash-256" \
    'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 sha256sum' \
    --command-name "last-modified-time" \
    'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 stat --format="%y %n"'

Results:

Benchmark 1: hash-256
  Time (mean ± σ):     471.3 ms ±  36.3 ms    [User: 31.9 ms, System: 134.8 ms]
  Range (min … max):   426.8 ms … 527.5 ms    10 runs
 
Benchmark 2: last-modified-time
  Time (mean ± σ):     253.4 ms ±   2.1 ms    [User: 15.9 ms, System: 96.2 ms]
  Range (min … max):   249.6 ms … 256.2 ms    10 runs
 
Summary
  'last-modified-time' ran
    1.86 ± 0.14 times faster than 'hash-256'

@janmasrovira
Collaborator

I'd say the benchmark with cache is more relevant since that will be the most common.
For reference, my results are:

Benchmark 1: hash-256
  Time (mean ± σ):      59.8 ms ±   1.3 ms    [User: 17.5 ms, System: 43.8 ms]
  Range (min … max):    57.7 ms …  64.4 ms    48 runs

Benchmark 2: last-modified-time
  Time (mean ± σ):      47.4 ms ±   1.3 ms    [User: 12.9 ms, System: 36.1 ms]
  Range (min … max):    45.5 ms …  53.0 ms    60 runs

Summary
  last-modified-time ran
    1.26 ± 0.04 times faster than hash-256
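The summary ratios in the like-for-like runs can be rechecked directly from the reported means. The helper below is a small sketch (the function name `speedup` is made up here) that divides the slower mean by the faster one. It uses the means from the three runs where both commands go through `xargs`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# speedup SLOWER_MEAN FASTER_MEAN
# Prints how many times faster the second command ran, to two decimals.
speedup() {
    LC_ALL=C awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 67.4 47.3     # warm cache, lukaszcz's run:      prints 1.42
speedup 471.3 253.4   # caches dropped before each run:  prints 1.86
speedup 59.8 47.4     # warm cache, janmasrovira's run:  prints 1.26
```

Both means must be in the same unit (milliseconds here). hyperfine computes its summary from unrounded timings, so a ratio recomputed from the printed means can differ in the last digit for other runs.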

@lukaszcz
Collaborator Author

lukaszcz commented Mar 22, 2024

I'd say the benchmark with cache is more relevant since that will be the most common.

Yes, actually, you're right. The first time, we recompile anyway.

@paulcadman paulcadman modified the milestones: 0.6.1, 0.6.2 Mar 25, 2024
@lukaszcz
Collaborator Author

Since this doesn't make a significant impact on performance, we decided not to merge it.


Successfully merging this pull request may close these issues.

Improve per-module compilation by checking modification time
3 participants