How Fast Can You Hash
--------------------

This notebook runs some basic performance tests to determine how fast we can run data through the SHA-512 or MD5 algorithms.

This test uses fairly large files in an attempt to minimise the effect of the operating system caching disk data in RAM.

First we set up some functions that perform basic operations to compare:

* `zero_to_null` just passes zeros through the CPU, and so is as fast as anything could possibly go.
* `zero_to_hash` runs the zeros through SHA-512, so is the fastest we can possibly hash (no disk I/O).
* `zero_to_hash_md5` does the same with MD5 instead of SHA-512.
* `zero_to_file` streams the zeros to a file - disk write I/O only, no hash.
* `file_to_hash` hashes a file (created with `zero_to_file`), so this is disk read I/O and hashing.

In [1]:
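import subprocess

# These commands assume BSD/macOS dd: `bs=1m` and the "(N bytes/sec)" summary line.
# GNU dd expects `bs=1M` and prints a differently formatted summary.
# Each function takes an argument x so it can be mapped over a range by a process pool;
# only the file-based functions use it (as a filename suffix).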

def zero_to_null(x):
    return subprocess.check_output("dd if=/dev/zero of=/dev/null bs=1m count=1000",stderr=subprocess.STDOUT,shell=True)

def zero_to_hash(x):
    return subprocess.check_output("dd if=/dev/zero bs=1m count=1000 | openssl dgst -sha512",stderr=subprocess.STDOUT,shell=True)

def zero_to_hash_md5(x):
    return subprocess.check_output("dd if=/dev/zero bs=1m count=1000 | openssl dgst -md5",stderr=subprocess.STDOUT,shell=True)

def zero_to_file(x):
    return subprocess.check_output("dd if=/dev/zero of=test.zero.%s bs=1m count=1000" % x,stderr=subprocess.STDOUT,shell=True)

def file_to_hash(x):
    return subprocess.check_output("dd if=test.zero.%s bs=1m count=1000 | openssl dgst -sha512" % x,stderr=subprocess.STDOUT,shell=True)

# And as an example, here's how we can run it:
print(zero_to_hash_md5(1))

b'1000+0 records in\n1000+0 records out\n1048576000 bytes transferred in 2.143335 secs (489226400 bytes/sec)\ne5c834fbdaa6bfd8eac5eb9404eefdd4\n'
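The throughput figure we care about is the number in parentheses on dd's summary line. As a minimal sketch of how it can be pulled out of the output above (`parse_dd_rate` is a helper name made up for this illustration, and the pattern assumes the BSD/macOS dd format shown here; GNU dd words its summary differently, so the pattern would need adjusting there):

```python
import re

# Matches the throughput figure BSD/macOS dd prints, e.g. "(489226400 bytes/sec)".
DD_RATE = re.compile(r"\((\d+) bytes/sec\)")

def parse_dd_rate(output: bytes) -> int:
    """Return the bytes/sec figure from dd's summary line."""
    match = DD_RATE.search(output.decode("utf-8"))
    if match is None:
        raise ValueError("no '(N bytes/sec)' figure found in dd output")
    return int(match.group(1))

# The output captured from zero_to_hash_md5(1) above
sample = (b'1000+0 records in\n1000+0 records out\n'
          b'1048576000 bytes transferred in 2.143335 secs (489226400 bytes/sec)\n'
          b'e5c834fbdaa6bfd8eac5eb9404eefdd4\n')

print(parse_dd_rate(sample))  # 489226400
```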


Now we can run them in parallel, ramping up from one process to eight to see how the total throughput scales.

In [3]:
from multiprocessing import Pool
import time
import re

def ramp(work_function):
    # Extract the "(N bytes/sec)" figure from dd's summary line
    matcher = re.compile(r"\((\d+) bytes/sec\)")
    # Ramp from 1 to 8 worker processes
    for size in range(1,9):
        with Pool(processes=size) as pool:
            start = time.time()

            # Run one job per worker (in arbitrary order) and sum the throughput each dd reports
            total_bytes_per_sec = 0
            for i in pool.imap_unordered(work_function, range(size)):
                total_bytes_per_sec += int(matcher.search(i.decode("utf-8")).group(1))

            end = time.time()
            # Columns: number of processes, elapsed seconds, total bytes/sec
            print(size, end - start, total_bytes_per_sec)

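Ramping `zero_to_null` tops out at around 4 parallel processes, which makes sense because this laptop has four cores. (The total peaks at 5 in this run, but each run lasts only about a tenth of a second, so the figures are noisy.)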

In [32]:
ramp(zero_to_null)

1 0.07068204879760742 18059938450
2 0.05475807189941406 43874315555
3 0.05520486831665039 63935700615
4 0.06678485870361328 74882499427
5 0.07371282577514648 82503736802
6 0.09348011016845703 76472343318
7 0.1224210262298584 66292609954
8 0.12909793853759766 71127003960


Similarly, ramping the hash function (no I/O) tops out at 4 processes, at about 950MB/s. A single process manages about 320MB/s, so there seems to be some overhead or contention that drops the per-core rate slightly when running on all four cores.

In [5]:
ramp(zero_to_hash)

1 3.257678985595703 322724946
2 3.259582042694092 649649948
3 3.3683509826660156 937987893
4 4.3554089069366455 977290258
5 5.568627119064331 952358469
6 6.776917934417725 941826178
7 7.708115100860596 961465940
8 8.757246017456055 965819873
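To put a number on that overhead, the totals from this run can be divided by n times the single-process figure. This is only a small sketch over the results printed above; `scaling_efficiency` is a name made up here and is not part of the notebook:

```python
# Total bytes/sec reported by ramp(zero_to_hash), keyed by process count
# (copied from the run above).
sha512_totals = {
    1: 322724946, 2: 649649948, 3: 937987893, 4: 977290258,
    5: 952358469, 6: 941826178, 7: 961465940, 8: 965819873,
}

def scaling_efficiency(totals):
    """Total throughput divided by (n x single-process throughput)."""
    base = totals[1]
    return {n: total / (n * base) for n, total in totals.items()}

for n, eff in scaling_efficiency(sha512_totals).items():
    print(f"{n} processes: {sha512_totals[n] / 1e6:7.1f} MB/s total, efficiency {eff:.2f}")
```

An efficiency close to 1.0 means each extra process adds a full core's worth of hashing; once the value falls well below 1.0, extra processes are mostly waiting for CPU time.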


Remarkably, on this laptop, we can stream data into a file at about 1,200MB/s (!), and that total is shared across all processes rather than growing with each one we add. Further testing outside of this notebook indicated that this was real I/O speed and not due to files being cached in RAM.

In [34]:
ramp(zero_to_file)

1 1.0067451000213623 1159417574
2 1.9087018966674805 1200274272
3 3.00907301902771 1152990121
4 4.346774101257324 1006375847
5 5.0535972118377686 1102269952
6 5.700016021728516 1176617828
7 7.790185213088989 1005546787
8 9.874356985092163 924100630


Consequently, as the I/O is so fast and the CPU has only four cores, we cannot saturate the I/O bandwidth of this machine when hashing files; the CPU is the limit:

In [38]:
ramp(file_to_hash)

1 3.3825089931488037 311277138
2 3.341557025909424 632465965
3 3.7369539737701416 875616590
4 4.151122093200684 1053215608
5 5.213958024978638 1023341286
6 6.675849914550781 954463818
7 7.648998022079468 974894618
8 9.033785820007324 933483048


In [39]:
ramp(file_to_hash)

1 3.5086770057678223 301952787
2 3.683816909790039 589347858
3 4.415766000747681 741013990
4 5.2036590576171875 821603443
5 5.980767011642456 903950593
6 9.35149097442627 676495276
7 8.825182914733887 843763533
8 10.978984117507935 771277916


Switching to MD5, the maximum speed is about 2,800MB/s. Interestingly, performance now tops out at 6 or more processes rather than 4, which implies some low-level parallelism is allowing this to run even faster.

In [4]:
ramp(zero_to_hash_md5)

1 2.138369083404541 492098811
2 2.11928391456604 992937563
3 2.10658597946167 1499559027
4 2.177093029022217 1971763487
5 2.237515926361084 2362044839
6 2.2916860580444336 2763366883
7 2.730095148086548 2892153697
8 3.0149781703948975 2852377164


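So, in general, if we're hashing lots of files, we'll tend to run out of I/O before we run out of CPU. However, it depends on the storage, the CPU and the hash algorithm (on this laptop, SHA-512 hashing was limited by the CPU rather than the disk), so it's probably worth benchmarking your own kit.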
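If dd and openssl are not available on the machine being benchmarked, a portable starting point is to hash a file in fixed-size chunks with `hashlib` and time it. This is a sketch under assumptions, not the method used for the results above: `hash_file`, the 1 MiB chunk size and the small demo file are all choices made for this illustration.

```python
import hashlib
import os
import tempfile
import time

def hash_file(path, algorithm="sha512", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks; return (hex digest, bytes/sec)."""
    hasher = hashlib.new(algorithm)
    size = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
            size += len(chunk)
    elapsed = time.perf_counter() - start
    return hasher.hexdigest(), size / elapsed

# Demo on a small temporary file of zeros. A file this small, written a moment
# ago, will almost certainly be served from the OS cache, so use a much larger
# file for a real benchmark.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(8 * 1024 * 1024))
    demo_path = tmp.name
try:
    digest, rate = hash_file(demo_path)
    print(f"sha512: {rate / 1e6:.0f} MB/s")
finally:
    os.remove(demo_path)
```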

Note that, if the data you are hashing is fairly small, your operating system will likely cache it all in RAM rather than re-reading it from disk. In that case you'll get much higher speeds when tests are re-run.

Also, there are some subtle issues not investigated here. For example, if you have a lot of small files, then your read speeds can be very low on HDD-based systems, because the disk spends more time seeking to the start of files than it does reading data, and seeking is generally slower than reading.

Secondly, on some systems, particularly smaller HDD arrays, I/O speed can drop when you run multiple threads, because the different read requests start to compete with each other. More heavily RAID-ed systems can compensate for this, but you only have so many HDD read heads you can position at one time, and the precise balance will depend on file sizes and how they are distributed across the drives.

Generally, with SSDs, these issues are less severe.

