# devlog 2024-04-25

_Author: Tyler Coles_

Testing cache utilities. This script:
1. tests that writing to and reading from archives works,
2. checks that we can choose to gzip or not, and
3. measures the impact of gzipping on read/write time and file size.
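
For readers without epymorph installed, the same three checks can be sketched with only the standard library. This is not how `StaticGeoFileOps` is implemented. It assumes the choice to gzip follows the file extension, as the `.tar` and `.tgz` names in the script suggest. The helpers `tar_mode`, `save_bytes`, and `load_bytes`, the member name `data.bin`, and the payload are all hypothetical stand-ins.

```python
import io
import os
import tarfile
import tempfile


def tar_mode(path: str, op: str) -> str:
    """Return a tarfile mode for `op` ('r' or 'w'), gzipped if the extension says so."""
    gzipped = path.endswith((".tgz", ".tar.gz"))
    return f"{op}:gz" if gzipped else f"{op}:"


def save_bytes(payload: bytes, path: str) -> None:
    """Write the payload as a single archive member (hypothetical layout)."""
    info = tarfile.TarInfo(name="data.bin")
    info.size = len(payload)
    with tarfile.open(path, tar_mode(path, "w")) as tar:
        tar.addfile(info, io.BytesIO(payload))


def load_bytes(path: str) -> bytes:
    """Read the single member back out of the archive."""
    with tarfile.open(path, tar_mode(path, "r")) as tar:
        member = tar.extractfile("data.bin")
        if member is None:
            raise ValueError("data.bin is not a regular file")
        return member.read()


# Stand-in data; a real geo would hold arrays rather than repeated text.
payload = b"epymorph " * 20_000

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in ("geo.tar", "geo.tgz"):
        path = os.path.join(tmp, name)
        save_bytes(payload, path)
        # Round-trip check: what we read must equal what we wrote.
        assert load_bytes(path) == payload
        sizes[name] = os.path.getsize(path)

print(f"Bytes as a tar: {sizes['geo.tar']:>9,}")
print(f"Bytes as a tgz: {sizes['geo.tgz']:>9,}")
```

The repeated payload compresses far better than a real geo would, so the sizes printed here say nothing about the savings measured below.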

In [2]:
import os
import shutil
import tempfile

from epymorph import geo_library
from epymorph.geo.static import StaticGeoFileOps as F

# Our subject geo can be anything, but this one is a useful demo because it's sizeable.
geo = geo_library['maricopa_cbg_2019']()

tempdir = tempfile.mkdtemp()

print("Save a geo without compression:")
%timeit F.save_as_archive(geo, f"{tempdir}/geo.tar")
print("Read a geo without compression:")
%timeit F.load_from_archive(f"{tempdir}/geo.tar")

print()

print("Save a geo compressed:")
%timeit F.save_as_archive(geo, f"{tempdir}/geo.tgz")
print("Read a geo with compression:")
%timeit F.load_from_archive(f"{tempdir}/geo.tgz")

print()

size_tar = os.path.getsize(f"{tempdir}/geo.tar")
size_tgz = os.path.getsize(f"{tempdir}/geo.tgz")

print(f"Bytes as a tar: {size_tar:>9,}")
print(f"Bytes as a tgz: {size_tgz:>9,}")
print(f"Compression ratio: {(size_tgz / size_tar):.1%}")

shutil.rmtree(tempdir)

# NOTE: the %timeit magics break isort and autopep8, so you're on your own for formatting

Save a geo without compression:
17.8 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Read a geo without compression:
3.74 ms ± 73.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Save a geo compressed:
20.6 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Read a geo with compression:
4.87 ms ± 57.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Bytes as a tar:   153,600
Bytes as a tgz:   134,722
Compression ratio: 87.7%
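
The note at the end of the cell points out that the `%timeit` magics break isort and autopep8. One possible workaround, sketched here, is the standard-library `timeit` module, which is plain Python and stays formattable. The helper `time_ms` is hypothetical, and `gzip.compress` / `gzip.decompress` on made-up data stand in for the epymorph save and load calls, so the numbers it prints are not comparable to the ones above.

```python
import gzip
import statistics
import timeit


def time_ms(fn, repeat: int = 7, number: int = 10) -> tuple[float, float]:
    """Return (mean, std. dev.) of the per-loop time in milliseconds."""
    runs = timeit.repeat(fn, repeat=repeat, number=number)
    # Each entry in `runs` is the total time for `number` loops.
    per_loop = [1000 * total / number for total in runs]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Stand-in workload: compress and decompress an in-memory buffer.
data = bytes(range(256)) * 400
blob = gzip.compress(data)

write_mean, write_std = time_ms(lambda: gzip.compress(data))
read_mean, read_std = time_ms(lambda: gzip.decompress(blob))

print(f"compress:   {write_mean:.3f} ms ± {write_std:.3f} ms per loop")
print(f"decompress: {read_mean:.3f} ms ± {read_std:.3f} ms per loop")
```

Unlike `%timeit`, this does not pick the loop count automatically, so `number` has to be chosen by hand for very fast or very slow calls.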


## Conclusion

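We get decent savings in bytes by storing geos gzipped: the archive shrinks from 153,600 to 134,722 bytes, about 12% smaller. It doesn't take much longer to read and write: saving goes from 17.8 ms to 20.6 ms and reading from 3.74 ms to 4.87 ms, roughly 3 ms and 1 ms extra per call. ✓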