# Threading

**Source:** *Python and HDF5* by Andrew Collette, O'Reilly 2013.

<img src="./img/MT.png" width=600/>

Currently, the HDF5 library does not use multiple threads internally, and most binary distributions are not built with thread-safety enabled. You have **two options**:

1. Compile a version of the HDF5 library with thread-safety enabled.
2. Use the non-thread-safe version and carefully schedule (synchronize) the thread access to HDF5 structures.

In [11]:
import numpy as np, h5py, threading, random, time, timeit

In [12]:
f = h5py.File("thread_demo.hdf5", "w")

In [13]:
dset = f.create_dataset("data", (2, 1024), dtype='f')

We will use two threads to update this dataset. One thread writes the first row and another thread writes the second row.

We use a *lock* to ensure that only one thread at a time writes to the dataset (i.e., uses the HDF5 library).
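The locking idea can be tried out without HDF5. The following is a minimal sketch using only the standard library, assuming a plain nested list (`fake_dset`) as a stand-in for the dataset; the helper names `write_value` and `write_row` are hypothetical. It also illustrates what a reentrant lock (`threading.RLock`) adds over a plain `threading.Lock`: the thread that holds the lock can acquire it again, so one locked helper can safely call another.

```python
import threading

lock = threading.RLock()   # reentrant: the owning thread may acquire it again
fake_dset = [[0.0] * 8 for _ in range(2)]   # hypothetical stand-in for a (2, 8) dataset

def write_value(row, col, value):
    """Write one element while holding the lock."""
    with lock:
        fake_dset[row][col] = value

def write_row(row, values):
    """Write a whole row; holds the lock and calls write_value, which locks again."""
    with lock:                              # outer acquisition
        for col, value in enumerate(values):
            write_value(row, col, value)    # inner acquisition by the same thread

# One thread per row, mirroring the two-thread setup in this notebook.
threads = [
    threading.Thread(target=write_row, args=(row, [float(row + 1)] * 8))
    for row in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(fake_dset[0][:3], fake_dset[1][:3])
```

With a plain `threading.Lock`, the nested `with lock:` inside `write_value` would block forever, because the thread would be waiting on a lock it already holds.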

In [14]:
lock = threading.RLock()

In [15]:
class ComputeThread(threading.Thread):
    """Worker thread that fills one row of the shared dataset."""

    def __init__(self, axis):
        super().__init__()
        self.axis = axis   # Row index: one thread does dset[0, :], the other dset[1, :].

    def run(self):
        """ Perform a series of (simulated) computations and save to dataset.
        """
        for idx in range(1024):
            random_number = random.random() * 0.01
            time.sleep(random_number)               # Simulated computation, done outside the lock
            with lock:                              # Only one thread at a time calls into HDF5
                dset[self.axis, idx] = random_number     # Save to dataset

In [18]:
thread1 = ComputeThread(0)

In [19]:
thread2 = ComputeThread(1)

In [20]:
thread1.start()
thread2.start()

Wait until both threads have finished.

In [23]:
thread1.join()
thread2.join()
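A lock is not the only way to serialize access. An alternative, not used in this notebook, is to let a single thread own all writes and have the compute threads hand results to it through a `queue.Queue`. The following is a minimal sketch of that pattern under stated assumptions: a plain nested list (`fake_dset`) stands in for the HDF5 dataset, the sizes are reduced so it runs quickly, and the names `compute`, `writer`, and `DONE` are hypothetical.

```python
import queue
import random
import threading
import time

N_ROWS, N_COLS = 2, 16                       # small sizes so the sketch runs quickly
fake_dset = [[None] * N_COLS for _ in range(N_ROWS)]   # hypothetical stand-in for dset
work = queue.Queue()                         # thread-safe hand-off between threads
DONE = object()                              # sentinel telling the writer to stop

def compute(row):
    """Simulate computations and pass each result to the writer thread."""
    for col in range(N_COLS):
        value = random.random() * 0.001
        time.sleep(value)                    # simulated computation
        work.put((row, col, value))

def writer():
    """The only thread that touches the (stand-in) dataset."""
    while True:
        item = work.get()
        if item is DONE:
            break
        row, col, value = item
        fake_dset[row][col] = value

writer_thread = threading.Thread(target=writer)
workers = [threading.Thread(target=compute, args=(row,)) for row in range(N_ROWS)]

writer_thread.start()
for t in workers:
    t.start()
for t in workers:
    t.join()                                 # every result has been queued by now
work.put(DONE)                               # queued after all results, so none are skipped
writer_thread.join()
```

In a real program the writer would presumably assign to `dset[row, col]` instead of the list. Because only that one thread would then call into h5py, the write itself needs no lock; the queue does the synchronization.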