# I/O Performance for Versioned-HDF5 Files

For these tests, we generated `.h5` data files using the `generate_data_deterministic.py` script from the [VersionedHDF5 repository](https://github.com/Quansight/versioned-hdf5), with the standard options ([see details here](#standard)).

We performed the following tests:
1. [Test Large Fraction Changes Sparse](#test1)
2. [Test Mostly Appends Sparse](#test2)
3. [Test Small Fraction Changes Sparse](#test3)
4. [Test Mostly Appends Dense](#test4)
5. [Test Large Fraction Changes (Constant Array Size) Sparse](#test5)

**These tests were last run on**

In [None]:
from datetime import datetime, timezone
# datetime.utcnow() is deprecated since Python 3.12; use a timezone-aware timestamp
print(datetime.now(timezone.utc))

# Setup

The path to the generated test files is

In [None]:
path = "/home/melissa/projects/versioned-hdf5/analysis" # change this as necessary
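
The cells below pass bare file names such as `f"{testname}_50_14_None.h5"`, so they only resolve when the notebook runs from the directory that holds the data. A minimal sketch of a hypothetical helper, `data_file` (not part of versioned-hdf5 or `performance_tests`), that joins a base directory to those names, assuming the `{testname}_{n}_14_None.h5` naming used throughout:

```python
from pathlib import Path


def data_file(base_dir, testname, num_transactions, suffix=""):
    """Build the full path of a generated test file (hypothetical helper).

    suffix is "" for versioned files and "_no_versions" for the
    unversioned counterparts used later in this notebook.
    """
    name = f"{testname}_{num_transactions}_14_None{suffix}.h5"
    return str(Path(base_dir) / name)


# Example with a placeholder directory
print(data_file("/tmp/analysis", "test_large_fraction_changes_sparse", 50))
print(data_file("/tmp/analysis", "test_large_fraction_changes_sparse", 50, "_no_versions"))
```

With this in place, a call such as `read_times(data_file(path, testname, 50))` would work from any working directory.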

In [None]:
import h5py
import json
import time
import numpy as np
import performance_tests
import matplotlib.pyplot as plt
from versioned_hdf5 import VersionedHDF5File

<a id='test1'></a>

# Test 1: Large Fraction Changes (Sparse)

In [None]:
testname = "test_large_fraction_changes_sparse"

## Reading in sequential mode

For this test, we'll read data from all versions in a file, sequentially. 

In [None]:
def read_times(filename):
    h5pyfile = h5py.File(filename, 'r+')
    vfile = VersionedHDF5File(h5pyfile)
    t0 = time.time()
    for vname in vfile._versions:
        if vname != '__first_version__':
            version = vfile[vname]
            group_key = list(version.keys())[0]
            val = version[group_key]['val']
    t = time.time()-t0
    h5pyfile.close()
    return t

In [None]:
%timeit read_times(f"{testname}_50_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_100_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_500_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_1000_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_5000_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_10000_14_None.h5")

In [None]:
rtimes = []
num_transactions = [50, 100, 500, 1000, 5000, 10000]
for n in num_transactions:
    filename = f"{testname}_{n}_14_None.h5"
    rtimes.append(read_times(filename))
    
plt.plot(num_transactions, rtimes, 'o-')
selected = [0, 3, 4, 5]
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Sequential read time (in sec) for {testname}")
plt.xlabel("Number of transactions")
plt.show()

As expected, read times increase for files with a larger number of versions, but the growth is close to linear.
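
One way to check a "close to linear" claim numerically is to fit a power law $t \approx c\,n^p$ in log-log space and look at the exponent $p$. The sketch below is only an illustration: `growth_exponent` is a hypothetical helper, and the timings are synthetic values, not measurements from these tests.

```python
import numpy as np


def growth_exponent(sizes, times):
    """Estimate p in times ~ c * sizes**p with a least-squares fit in log-log space."""
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_t = np.log(np.asarray(times, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_t, 1)
    return slope


# Synthetic timings for illustration only (not measured results)
sizes = [50, 100, 500, 1000, 5000, 10000]
linear_times = [2e-4 * n for n in sizes]
quadratic_times = [3e-7 * n**2 for n in sizes]

print(growth_exponent(sizes, linear_times))     # close to 1 for linear growth
print(growth_exponent(sizes, quadratic_times))  # close to 2 for quadratic growth
```

In this notebook, `growth_exponent(num_transactions, rtimes)` would give the estimate for the plotted curve; a value near 1 is consistent with linear growth.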

## Reading specific version

For this test, we'll compute the times required to read a specific version from the versioned-hdf5 file. 

**Note**. Although possible, it is not recommended to read versions using integer indexing, as reading versions by name performs far better.

In [None]:
def read_version(filename, n):
    # Open file to read version
    h5pyfile = h5py.File(filename, 'r+')
    vfile = VersionedHDF5File(h5pyfile)
    # If you want to choose a version at random,
    # N = len(vfile._versions.keys())
    # index = np.random.randint(0, N)
    index = n // 2
    vname = list(vfile._versions.keys())[index]
    t0 = time.time()
    version = vfile[vname]
    # Do not use the syntax below for performance reasons:
    #version = vfile[-index]
    group_key = list(version.keys())[0]
    val = version[group_key]['val']
    t = time.time()-t0
    h5pyfile.close()
    return t

In [None]:
%timeit read_version(f"{testname}_10000_14_None.h5", 10000)

In [None]:
tests = [f"{testname}_50_14_None.h5",
         f"{testname}_100_14_None.h5",
         f"{testname}_500_14_None.h5",
         f"{testname}_1000_14_None.h5",
         f"{testname}_5000_14_None.h5",
         f"{testname}_10000_14_None.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]

for _ in range(50):
    vtimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        vtimes.append(read_version(filename, n))
    plt.plot(num_transactions, vtimes, '*-')

plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read random version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In these runs, the time needed to read a given version by name is not noticeably affected by the number of versions in the file.
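
To back a statement like this with numbers rather than overlaid curves, the repeated timings can be collected into an array and summarized per file. This is a minimal sketch: `collect_timings` and `median_and_iqr` are hypothetical helpers, and `fake_timer` is a stand-in for a function such as `read_version` that returns an elapsed time in seconds.

```python
import numpy as np


def collect_timings(timer, cases, repeats=50):
    """Call timer(*case) `repeats` times per case; timer must return seconds.

    Returns an array of shape (repeats, len(cases)).
    """
    return np.array([[timer(*case) for case in cases] for _ in range(repeats)])


def median_and_iqr(timings):
    """Return the per-case median and interquartile range."""
    q25, q50, q75 = np.percentile(timings, [25, 50, 75], axis=0)
    return q50, q75 - q25


# Stand-in timer that always reports 1 ms (illustration only)
def fake_timer(filename, n):
    return 1.0e-3


cases = [("a.h5", 50), ("b.h5", 100)]
timings = collect_timings(fake_timer, cases, repeats=5)
medians, iqrs = median_and_iqr(timings)
print(medians, iqrs)
```

With the notebook's own names, the call would look like `collect_timings(read_version, list(zip(tests, num_transactions)))`. Medians that stay flat across file sizes, with overlapping interquartile ranges, would support the observation.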

## Reading latest version vs. reading first version

In [None]:
def read_first(filename):
    # Open file to read version
    h5pyfile = h5py.File(filename, 'r+')
    vfile = VersionedHDF5File(h5pyfile)
    t0 = time.time()
    version = vfile['initial_version']
    group_key = list(version.keys())[0]
    val = version[group_key]['val']
    t = time.time()-t0
    h5pyfile.close()
    return t

In [None]:
print(read_first(f"{testname}_10000_14_None.h5"))

In [None]:
result = %timeit -o read_first(f"{testname}_10000_14_None.h5")

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]
for _ in range(50):
    ftimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        ftimes.append(read_first(filename))
    
    plt.plot(num_transactions, ftimes, '*-')
    
plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read first version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In [None]:
def read_last(filename):
    # Open file to read version
    h5pyfile = h5py.File(filename, 'r+')
    vfile = VersionedHDF5File(h5pyfile)
    t0 = time.time()
    #
    # Current version is 0
    # This is the same as 
    # version = vfile[vfile._versions.attrs['current_version']]
    #
    version = vfile[0]
    group_key = list(version.keys())[0]
    val = version[group_key]['val']
    t = time.time()-t0
    h5pyfile.close()
    return t

In [None]:
print(read_last(f"{testname}_10000_14_None.h5"))

In [None]:
result = %timeit -o read_last(f"{testname}_10000_14_None.h5")

In [None]:
def read_no_versions(filename):
    # Open file to read version
    h5pyfile = h5py.File(filename, 'r+')
    t0 = time.time()
    val = h5pyfile[list(h5pyfile.keys())[0]]['val']
    t = time.time()-t0
    h5pyfile.close()
    return t

In [None]:
result = %timeit -o read_no_versions(f"{testname}_10000_14_None_no_versions.h5")

In [None]:
tests_no_versions = [f"{testname}_50_14_None_no_versions.h5",
                     f"{testname}_100_14_None_no_versions.h5",
                     f"{testname}_500_14_None_no_versions.h5",
                     f"{testname}_1000_14_None_no_versions.h5",
                     f"{testname}_5000_14_None_no_versions.h5",
                     f"{testname}_10000_14_None_no_versions.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]

for _ in range(50):
    ltimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        ltimes.append(read_last(filename))
    plt.plot(num_transactions, ltimes, '*-')    
    
notimes = []
for i in range(6):
    filename = tests_no_versions[i]
    n = num_transactions[i]
    notimes.append(read_no_versions(filename))
    
plt.plot(num_transactions, notimes, 'ko-', ms=6)
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Time (in sec) to read latest version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In this case, we can see that:
- reading the latest version is not as performant as reading an unversioned file;
- the time required to read the latest version from a Versioned HDF5 file increases modestly with the number of versions stored in the file.
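
The gap between the versioned and unversioned curves can be expressed as a ratio per file size. The sketch below is an assumption-laden illustration: `relative_overhead` is a hypothetical helper, and the input numbers are placeholders, not results from these tests.

```python
import numpy as np


def relative_overhead(versioned_times, unversioned_times):
    """Return versioned / unversioned time ratios, one per file size.

    versioned_times: shape (repeats, n_files), e.g. stacked `ltimes` runs.
    unversioned_times: shape (n_files,), e.g. `notimes`.
    """
    versioned = np.asarray(versioned_times, dtype=float)
    unversioned = np.asarray(unversioned_times, dtype=float)
    # The median over repeats is less sensitive to outliers than the mean
    return np.median(versioned, axis=0) / unversioned


# Placeholder numbers for illustration only (not measured results)
example_versioned = [[2.0e-3, 2.2e-3], [2.4e-3, 2.6e-3], [2.2e-3, 2.4e-3]]
example_unversioned = [1.0e-3, 1.2e-3]
overhead = relative_overhead(example_versioned, example_unversioned)
print(overhead)
```

A ratio that rises slowly with the number of transactions would match the modest increase described above.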

<a id='test2'></a>

# Test 2: Mostly appends (Sparse)

In [None]:
testname = "test_mostly_appends_sparse"

## Reading in sequential mode

If we read data from each version of the file, sequentially, we obtain the following:

In [None]:
%timeit read_times(f"{testname}_50_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_100_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_500_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_1000_14_None.h5")

Again, as expected, read times increase with the number of versions, this time quadratically.

In [None]:
rtimes = []
num_transactions = [50, 100, 500, 1000]
for n in num_transactions:
    filename = f"{testname}_{n}_14_None.h5"
    rtimes.append(read_times(filename))
    
plt.plot(num_transactions, rtimes, 'o-')
selected = [0, 1, 2, 3]
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Sequential read time (in sec) for {testname}")
plt.xlabel("Number of transactions")
plt.show()

## Reading specific version

Now, let's see the times required to read a specific version from each file.

In [None]:
%timeit read_version(f"{testname}_1000_14_None.h5", 500)

Similarly to what we observed in the first example, the number of versions in the file does not affect the time needed to read a specific version.

In [None]:
tests = [f"{testname}_50_14_None.h5",
         f"{testname}_100_14_None.h5",
         f"{testname}_500_14_None.h5",
         f"{testname}_1000_14_None.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000]

for _ in range(50):
    vtimes = []
    for i in range(4):
        filename = tests[i]
        n = num_transactions[i]
        vtimes.append(read_version(filename, n))
    plt.plot(num_transactions, vtimes, '*-')

plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read random version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

## Reading latest version vs. reading first version

In [None]:
print(read_first(f"{testname}_1000_14_None.h5"))

In [None]:
result = %timeit -o read_first(f"{testname}_1000_14_None.h5")

In [None]:
num_transactions = [50, 100, 500, 1000]
for _ in range(50):
    ftimes = []
    for i in range(4):
        filename = tests[i]
        n = num_transactions[i]
        ftimes.append(read_first(filename))
    
    plt.plot(num_transactions, ftimes, '*-')
    
plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read first version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In [None]:
print(read_last(f"{testname}_1000_14_None.h5"))

In [None]:
result = %timeit -o read_no_versions(f"{testname}_1000_14_None_no_versions.h5")

In [None]:
tests_no_versions = [f"{testname}_50_14_None_no_versions.h5",
                     f"{testname}_100_14_None_no_versions.h5",
                     f"{testname}_500_14_None_no_versions.h5",
                     f"{testname}_1000_14_None_no_versions.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000]

for _ in range(50):
    ltimes = []
    for i in range(4):
        filename = tests[i]
        n = num_transactions[i]
        ltimes.append(read_last(filename))
    plt.plot(num_transactions, ltimes, '*-')    
    
notimes = []
for i in range(4):
    filename = tests_no_versions[i]
    n = num_transactions[i]
    notimes.append(read_no_versions(filename))
    
plt.plot(num_transactions, notimes, 'ko-', ms=6)
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Time (in sec) to read latest version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

<a id='test3'></a>

# Test 3: Small Fraction Changes (Sparse)

In [None]:
testname = "test_small_fraction_changes_sparse"

## Reading in sequential mode

In [None]:
%timeit read_times(f"{testname}_50_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_100_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_500_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_1000_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_5000_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_10000_14_None.h5")

In [None]:
rtimes = []
num_transactions = [50, 100, 500, 1000, 5000, 10000]
for n in num_transactions:
    filename = f"{testname}_{n}_14_None.h5"
    rtimes.append(read_times(filename))
    
plt.plot(num_transactions, rtimes, 'o-')
selected = [0, 3, 4, 5]
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Sequential read time (in sec) for {testname}")
plt.xlabel("Number of transactions")
plt.show()

## Reading specific version

The times required to read a specific version from each file are similarly unaffected by the number of existing versions in the file.

In [None]:
%timeit read_version(f"{testname}_10000_14_None.h5", 10000)  

In [None]:
tests = [f"{testname}_50_14_None.h5",
         f"{testname}_100_14_None.h5",
         f"{testname}_500_14_None.h5",
         f"{testname}_1000_14_None.h5",
         f"{testname}_5000_14_None.h5",
         f"{testname}_10000_14_None.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]

for _ in range(50):
    vtimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        vtimes.append(read_version(filename, n))
    plt.plot(num_transactions, vtimes, '*-')

plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read random version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In these runs, the time needed to read a given version by name is not noticeably affected by the number of versions in the file.

## Reading latest version vs. reading first version

In [None]:
print(read_first(f"{testname}_10000_14_None.h5"))

In [None]:
result = %timeit -o read_first(f"{testname}_10000_14_None.h5")

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]
for _ in range(50):
    ftimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        ftimes.append(read_first(filename))
    
    plt.plot(num_transactions, ftimes, '*-')
    
plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read first version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In [None]:
print(read_last(f"{testname}_10000_14_None.h5"))

In [None]:
result = %timeit -o read_last(f"{testname}_10000_14_None.h5")

In [None]:
result = %timeit -o read_no_versions(f"{testname}_10000_14_None_no_versions.h5")

In [None]:
tests_no_versions = [f"{testname}_50_14_None_no_versions.h5",
                     f"{testname}_100_14_None_no_versions.h5",
                     f"{testname}_500_14_None_no_versions.h5",
                     f"{testname}_1000_14_None_no_versions.h5",
                     f"{testname}_5000_14_None_no_versions.h5",
                     f"{testname}_10000_14_None_no_versions.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]

for _ in range(50):
    ltimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        ltimes.append(read_last(filename))
    plt.plot(num_transactions, ltimes, '*-')    
    
notimes = []
for i in range(6):
    filename = tests_no_versions[i]
    n = num_transactions[i]
    notimes.append(read_no_versions(filename))
    
plt.plot(num_transactions, notimes, 'ko-', ms=6)
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Time (in sec) to read latest version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In this case, we can see that:
- reading the latest version is not as performant as reading an unversioned file;
- the time required to read the latest version from a Versioned HDF5 file increases modestly with the number of versions stored in the file.

<a id='test4'></a>

# Test 4: Mostly appends (Dense)

Next, we test the dense mostly-appends case.

In [None]:
testname = "test_mostly_appends_dense"

## Reading in sequential mode

Once again, we can see a quadratic behaviour on the graph, which is expected from the file sizes and the size of the arrays on each file.

In [None]:
%timeit read_times(f"{testname}_50_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_100_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_500_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_1000_14_None.h5")

As in Test 2, read times grow quadratically with the number of versions.

In [None]:
rtimes = []
num_transactions = [50, 100, 500, 1000]
for n in num_transactions:
    filename = f"{testname}_{n}_14_None.h5"
    rtimes.append(read_times(filename))
    
plt.plot(num_transactions, rtimes, 'o-')
selected = [0, 1, 2, 3]
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Sequential read time (in sec) for {testname}")
plt.xlabel("Number of transactions")
plt.show()

## Reading specific version

Now, let's see the times required to read a specific version from each file.

In [None]:
%timeit read_version(f"{testname}_1000_14_None.h5", 500)

Similarly to what we observed in the first example, the number of versions in the file does not affect the time needed to read a specific version.

In [None]:
tests = [f"{testname}_50_14_None.h5",
         f"{testname}_100_14_None.h5",
         f"{testname}_500_14_None.h5",
         f"{testname}_1000_14_None.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000]

for _ in range(50):
    vtimes = []
    for i in range(4):
        filename = tests[i]
        n = num_transactions[i]
        vtimes.append(read_version(filename, n))
    plt.plot(num_transactions, vtimes, '*-')

plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read random version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

## Reading latest version vs. reading first version

In [None]:
print(read_first(f"{testname}_1000_14_None.h5"))

In [None]:
result = %timeit -o read_first(f"{testname}_1000_14_None.h5")

In [None]:
num_transactions = [50, 100, 500, 1000]
for _ in range(50):
    ftimes = []
    for i in range(4):
        filename = tests[i]
        n = num_transactions[i]
        ftimes.append(read_first(filename))
    
    plt.plot(num_transactions, ftimes, '*-')
    
plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read first version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In [None]:
print(read_last(f"{testname}_1000_14_None.h5"))

In [None]:
result = %timeit -o read_no_versions(f"{testname}_1000_14_None_no_versions.h5")

In [None]:
tests_no_versions = [f"{testname}_50_14_None_no_versions.h5",
                     f"{testname}_100_14_None_no_versions.h5",
                     f"{testname}_500_14_None_no_versions.h5",
                     f"{testname}_1000_14_None_no_versions.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000]

for _ in range(50):
    ltimes = []
    for i in range(4):
        filename = tests[i]
        n = num_transactions[i]
        ltimes.append(read_last(filename))
    plt.plot(num_transactions, ltimes, '*-')    
    
notimes = []
for i in range(4):
    filename = tests_no_versions[i]
    n = num_transactions[i]
    notimes.append(read_no_versions(filename))
    
plt.plot(num_transactions, notimes, 'ko-', ms=6)
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Time (in sec) to read latest version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

<a id='test5'></a>

# Test 5: Large Fraction Changes (Sparse) - Constant Size

In [None]:
testname = "test_large_fraction_constant_sparse"

## Reading in sequential mode

In [None]:
%timeit read_times(f"{testname}_50_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_100_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_500_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_1000_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_5000_14_None.h5")

In [None]:
%timeit read_times(f"{testname}_10000_14_None.h5")

In [None]:
rtimes = []
num_transactions = [50, 100, 500, 1000, 5000, 10000]
for n in num_transactions:
    filename = f"{testname}_{n}_14_None.h5"
    rtimes.append(read_times(filename))
    
plt.plot(num_transactions, rtimes, 'o-')
selected = [0, 3, 4, 5]
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Sequential read time (in sec) for {testname}")
plt.xlabel("Number of transactions")
plt.show()

## Reading specific version

The times required to read a specific version from each file are similarly unaffected by the number of existing versions in the file.

In [None]:
%timeit read_version(f"{testname}_10000_14_None.h5", 10000)  

In [None]:
tests = [f"{testname}_50_14_None.h5",
         f"{testname}_100_14_None.h5",
         f"{testname}_500_14_None.h5",
         f"{testname}_1000_14_None.h5",
         f"{testname}_5000_14_None.h5",
         f"{testname}_10000_14_None.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]

for _ in range(50):
    vtimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        vtimes.append(read_version(filename, n))
    plt.plot(num_transactions, vtimes, '*-')

plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read random version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In these runs, the time needed to read a given version by name is not noticeably affected by the number of versions in the file.

## Reading latest version vs. reading first version

In [None]:
print(read_first(f"{testname}_10000_14_None.h5"))

In [None]:
result = %timeit -o read_first(f"{testname}_10000_14_None.h5")

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]
for _ in range(50):
    ftimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        ftimes.append(read_first(filename))
    
    plt.plot(num_transactions, ftimes, '*-')
    
plt.xticks(num_transactions)
plt.title(f"Time (in sec) to read first version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In [None]:
print(read_last(f"{testname}_10000_14_None.h5"))

In [None]:
result = %timeit -o read_last(f"{testname}_10000_14_None.h5")

In [None]:
result = %timeit -o read_no_versions(f"{testname}_10000_14_None_no_versions.h5")

In [None]:
tests_no_versions = [f"{testname}_50_14_None_no_versions.h5",
                     f"{testname}_100_14_None_no_versions.h5",
                     f"{testname}_500_14_None_no_versions.h5",
                     f"{testname}_1000_14_None_no_versions.h5",
                     f"{testname}_5000_14_None_no_versions.h5",
                     f"{testname}_10000_14_None_no_versions.h5"]

In [None]:
num_transactions = [50, 100, 500, 1000, 5000, 10000]

for _ in range(50):
    ltimes = []
    for i in range(6):
        filename = tests[i]
        n = num_transactions[i]
        ltimes.append(read_last(filename))
    plt.plot(num_transactions, ltimes, '*-')    
    
notimes = []
for i in range(6):
    filename = tests_no_versions[i]
    n = num_transactions[i]
    notimes.append(read_no_versions(filename))
    
plt.plot(num_transactions, notimes, 'ko-', ms=6)
plt.xticks(np.array(num_transactions)[selected])
plt.title(f"Time (in sec) to read latest version for {testname}")
plt.xlabel("Number of transactions")
plt.show()

In this case, we can see that:
- reading the latest version is not as performant as reading an unversioned file;
- the time required to read the latest version from a Versioned HDF5 file increases modestly with the number of versions stored in the file.

# Summary

- `test_mostly_appends_sparse` and `test_mostly_appends_dense` show a quadratic behaviour with respect to file creation and sequential read times, while `test_large_fraction_changes_sparse` and `test_small_fraction_changes_sparse` show a linear behaviour in those same tests. This reflects what we observe in file sizes and can be partially explained by the increase in the dimension of the arrays which are stored at each version.
- Adding new versions and reading specific versions (by version name) from an existing file are almost unaffected by the number of existing versions in each file. However, more tests are needed for a more robust conclusion.
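
The quadratic behaviour of the mostly-appends tests can be illustrated with simple arithmetic. The sketch below assumes that each transaction changes the row count by `num_rows_per_append + num_inserts - num_deletes` and that a sequential read touches every row of every version. Both are simplifying assumptions, not a description of how `generate_data_deterministic.py` or versioned-hdf5 actually work. The function names are hypothetical, and the defaults are the `test_mostly_appends_sparse` standard parameters listed below.

```python
def rows_in_version(k, num_rows_initial=1000, num_rows_per_append=1000,
                    num_inserts=10, num_deletes=10):
    """Rows after k transactions, assuming a constant net change per transaction."""
    net_change = num_rows_per_append + num_inserts - num_deletes
    return num_rows_initial + k * net_change


def cumulative_rows(num_versions, **params):
    """Total rows touched when versions 0..num_versions-1 are each read once."""
    return sum(rows_in_version(k, **params) for k in range(num_versions))


for n in (50, 100, 500, 1000):
    print(n, cumulative_rows(n))
```

Under these assumptions, doubling the number of versions from 50 to 100 roughly quadruples the total (1,275,000 vs. 5,050,000 rows), which is the signature of quadratic growth.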

<a id='standard'></a>

## Standard parameters

- `test_large_fraction_changes_sparse`: 
    - `num_rows_initial = 5000`
    - `num_rows_per_append = 10`
    - `num_inserts = 10`
    - `num_deletes = 10`
    - `num_changes = 1000`
- `test_small_fraction_changes_sparse`:
    - `num_rows_initial = 5000`
    - `num_rows_per_append = 10`
    - `num_inserts = 10`
    - `num_deletes = 10`
    - `num_changes = 10`
- `test_mostly_appends_sparse`:
    - `num_rows_initial = 1000`
    - `num_rows_per_append = 1000`
    - `num_inserts = 10`
    - `num_deletes = 10`
    - `num_changes = 10`  
- `test_mostly_appends_dense`:
    - `num_rows_initial_0 = 30`
    - `num_rows_initial_1 = 30`
    - `num_rows_per_append_0 = 1`
    - `num_inserts_0 = 1`
    - `num_inserts_1 = 10`
    - `num_deletes_0 = 1`
    - `num_deletes_1 = 1`
    - `num_changes = 10`