A toy class that illustrates some of the difficulties in yt's existing I/O handling (here, reading from a Gadget HDF5 file):

In [44]:
import contextlib
import h5py
import os
from typing import List, Tuple
import numpy as np

class FileHandler:
    
    def __init__(self, filename):
        self.filename = os.path.expanduser(filename)
        
    def read_field(self, hdf_keys: List[str]):        
        with h5py.File(self.filename, "r") as f:
            contents = np.array(f["/".join(hdf_keys)])
        return contents
    
    def read_fields_bad(self, particle_type: str, field_list: List[str]) -> dict:
        fields = {}
        for field in field_list:
            # this opens and closes the file on every read!
            fields[field] = self.read_field([particle_type, field])

        return fields
    
    def read_fields_good(self, particle_type: str, field_list: List[str]) -> dict:
        fields = {}
        # opens the file once, but duplicates code in read_field
        with h5py.File(self.filename, "r") as f:
            for field in field_list:                
                fields[field] = np.array(f[f"{particle_type}/{field}"])

        return fields    
    
    def read_smoothing_length(self, particle_type):
        with h5py.File(self.filename, "r") as f:
            hsml = np.array(f[f"{particle_type}/SmoothingLength"])
        return hsml         
    
    def read_coordinates(self, particle_type: str):
        with h5py.File(self.filename, "r") as f:
            xyz = np.array(f[f"{particle_type}/Coordinates"])
            hsml = 0.
            if particle_type == "PartType0":
                # this re-opens a file that is already open!
                hsml = self.read_smoothing_length(particle_type)
                
        return xyz, hsml
    
       

The above class nicely encapsulates how to read a single field from an HDF5 file in the `read_field` method. But using that method to read multiple fields (as in `read_fields_bad`) opens and closes the file once per field. To avoid this, we could write a new method, `read_fields_good`, that opens the file once and reads every field inside a single `with` block. The cost is that it duplicates the read logic from `read_field`, and if we are not careful, that kind of duplication leads to some less-than-ideal situations.

In yt, we have many convenience functions for pulling certain data columns from disk. In the above class, the `read_smoothing_length` method mimics one of yt's commonly implemented I/O functions. But in yt, we also return the smoothing length whenever we read the coordinates, so it's tempting to write (as in the above):

```python
    def read_coordinates(self, particle_type: str):
        with h5py.File(self.filename, "r") as f:
            xyz = np.array(f[f"{particle_type}/Coordinates"])
            hsml = 0.
            if particle_type == "PartType0":
                # this re-opens a file that is already open!
                hsml = self.read_smoothing_length(particle_type)

        return xyz, hsml
```

This, however, re-opens a file that is already open! The above example is fairly trivial to fix: we could just copy the internals of `read_smoothing_length` into `read_coordinates`. But that results in yet more code duplication, and in the real yt code it's not so easy to rewrite things this way because the real functions are more complex.
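
To make the cost concrete, here is a minimal sketch that counts how often a file is opened under the naive design. It is an illustration only: `CountingHandler`, `open_count`, and the plain-text `name=value` file are hypothetical stand-ins for the toy `FileHandler` and a real HDF5 file, chosen so the example runs with the standard library alone.

```python
import os
import tempfile


class CountingHandler:
    """Hypothetical stand-in for the toy FileHandler.

    Reads a plain-text file with one `name=value` pair per line and
    counts every time the file is opened.
    """

    def __init__(self, filename):
        self.filename = filename
        self.open_count = 0

    def _open(self):
        self.open_count += 1
        return open(self.filename, "r")

    @staticmethod
    def _scan(f, name):
        # linear scan of an already open file for one named value
        for line in f:
            key, _, value = line.strip().partition("=")
            if key == name:
                return float(value)
        raise KeyError(name)

    def read_field(self, name):
        with self._open() as f:
            return self._scan(f, name)

    def read_fields_bad(self, names):
        # one open/close cycle per requested field
        return {name: self.read_field(name) for name in names}

    def read_coordinates(self):
        with self._open() as f:
            xyz = self._scan(f, "Coordinates")
            # nested call opens the same file again while `f` is still open
            hsml = self.read_field("SmoothingLength")
        return xyz, hsml


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.txt")
    with open(path, "w") as out:
        out.write("Coordinates=1.0\nDensity=2.0\nSmoothingLength=0.5\n")

    handler = CountingHandler(path)

    fields = handler.read_fields_bad(["Density", "SmoothingLength"])
    opens_for_fields = handler.open_count

    handler.open_count = 0
    xyz, hsml = handler.read_coordinates()
    opens_for_coords = handler.open_count

print(opens_for_fields, opens_for_coords)
```

Both counters come out at 2: two fields cost two opens, and reading coordinates opens the file a second time while the first handle is still live.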

So ideally, we want to be able to:

* encapsulate and re-use the most basic file operation (fetching data from disk)
* avoid unnecessarily opening/closing files
* avoid opening already opened files
* minimize code duplication

One way to do this is with a pair of nested context managers built with `@contextlib.contextmanager`:

In [51]:
import contextlib
import h5py
import os
from typing import List, Tuple
import numpy as np

class FileHandler:
    
    def __init__(self, filename):
        self.filename = os.path.expanduser(filename)
        
    @contextlib.contextmanager
    def transaction(self, handle = None):
        if handle is None:
            with self.open_handle() as handle:
                yield handle
        else:            
            yield handle
        
    @contextlib.contextmanager
    def open_handle(self):
        f = h5py.File(self.filename, "r")
        try:
            yield f
        finally:
            # close the file even if the caller's block raises
            f.close()
        
    def read_field(self, hdf_keys: List[str], handle=None):        
        with self.transaction(handle) as f:
            contents = np.array(f["/".join(hdf_keys)])
        return contents
    
    def read_fields(self, particle_type: str, field_list: List[str], handle=None) -> dict:
        fields = {}
        with self.transaction(handle) as f:
            for field in field_list:
                fields[field] = self.read_field([particle_type, field], handle=f)
        return fields
    
    def read_smoothing_length(self, particle_type, handle=None):
        with self.transaction(handle) as f:
            hsml = self.read_field([particle_type, "SmoothingLength"], handle=f)
        return hsml         
    
    def read_coordinates(self, particle_type: str, handle=None):
        with self.transaction(handle) as f:
            xyz = self.read_field([particle_type, "Coordinates"], handle=f)
            hsml = 0.
            if particle_type == "PartType0":
                # this reuses the open handle instead of re-opening the file!
                hsml = self.read_smoothing_length(particle_type, handle=f)
                
        return xyz, hsml
    
    def read_fields_with_coords(self, particle_type: str, fields: List[str]):
        
        with self.transaction() as f:
            field_data = self.read_fields(particle_type, fields, handle=f)
            coords, hsml = self.read_coordinates(particle_type, handle=f)
            
        return field_data, coords, hsml

The above construction creates a re-entrant `transaction` context manager. If you call it with no arguments, it opens the file and yields a new handle that you can use in an ordinary `with` statement, closing the file when the block exits. But if you pass in an existing handle, it simply yields that handle back and leaves closing it to whoever opened it. The benefit of this construction is that you can build very flexible methods on top of one tightly constrained read operation without ever re-opening the file. Consider `read_fields_with_coords`:

In [53]:
file_handler = FileHandler("~/hdd/data/yt_data/yt_sample_sets/snapshot_033/snap_033.0.hdf5")

field_data, xyz, hsml = file_handler.read_fields_with_coords('PartType0', ['Density'])
field_data, xyz, hsml

({'Density': array([ 6577205. , 15850306. ,  6765328.5, ...,  6816981. , 22548702. ,
         25834210. ], dtype=float32)},
 array([[ 7.6320577 , 11.81454   ,  0.5112596 ],
        [ 7.630863  , 11.814384  ,  0.51114064],
        [ 7.633304  , 11.81966   ,  0.51152855],
        ...,
        [ 9.948605  ,  8.47677   , 14.566635  ],
        [ 9.948661  ,  8.478258  , 14.567051  ],
        [ 9.94791   ,  8.478077  , 14.566901  ]], dtype=float32),
 array([0.00320586, 0.00230037, 0.00324003, ..., 0.00309351, 0.00218892,
        0.00213684], dtype=float32))

In `read_fields_with_coords`, we open a file handle once and pass it down through every call until it reaches the base `read_field` method, which is the only place that defines how data is actually read from disk. Every other method is a directive for reading data in a different way, built on top of that single file-specific operation. This lets us construct new methods by combining existing ones without worrying about whether the file is already open. It also makes the class easy to extend to other file types: simply swap out how the file is opened in `open_handle` (and abstract away some of the Gadget-specific conventions above, like the hard-coded `'Coordinates'` key)!
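
As a rough sketch of that extension idea, the example below moves `transaction` into a format-agnostic base class and adds a subclass for a different file type. Everything here is an assumption made for illustration: `BaseHandler`, `JsonHandler`, and the `open_count` counter are hypothetical names, JSON stands in for "some other format" so the example needs only the standard library, and the "handle" is simply the parsed document. The subclass also overrides `read_field`, since how a key is looked up depends on the format.

```python
import contextlib
import json
import os
import tempfile


class BaseHandler:
    """Format-agnostic core; subclasses supply open_handle and read_field."""

    def __init__(self, filename):
        self.filename = filename
        self.open_count = 0  # bookkeeping for this demo only

    @contextlib.contextmanager
    def transaction(self, handle=None):
        if handle is None:
            # nobody has opened the file yet: open it and own its lifetime
            with self.open_handle() as new_handle:
                yield new_handle
        else:
            # reuse the caller's handle; the caller is responsible for closing
            yield handle

    def open_handle(self):
        raise NotImplementedError

    def read_field(self, keys, handle=None):
        raise NotImplementedError

    def read_fields(self, particle_type, field_list, handle=None):
        with self.transaction(handle) as f:
            return {
                field: self.read_field([particle_type, field], handle=f)
                for field in field_list
            }


class JsonHandler(BaseHandler):
    """Hypothetical handler for a JSON file laid out like the HDF5 groups."""

    @contextlib.contextmanager
    def open_handle(self):
        self.open_count += 1
        # the inner `with` closes the file even if the caller's block raises
        with open(self.filename, "r") as f:
            yield json.load(f)

    def read_field(self, keys, handle=None):
        with self.transaction(handle) as data:
            for key in keys:
                data = data[key]
            return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.json")
    with open(path, "w") as out:
        json.dump({"PartType0": {"Density": [1.0, 2.0],
                                 "SmoothingLength": [0.1, 0.2]}}, out)

    handler = JsonHandler(path)
    fields = handler.read_fields("PartType0", ["Density", "SmoothingLength"])
    opens_after_read_fields = handler.open_count

    # a standalone read_field call opens the file on its own
    density = handler.read_field(["PartType0", "Density"])
    opens_after_single_read = handler.open_count

print(opens_after_read_fields, opens_after_single_read)
```

Reading two fields through `read_fields` opens the file once, while the later standalone `read_field` call opens it once more, so the counters read 1 and then 2.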
