Distributed Interactive Visualization and Exploration of large datasets.
What is pyDive?
Use pyDive to work with homogeneous, n-dimensional arrays that are too big to fit into your local machine's memory. pyDive provides containers whose elements are distributed across a cluster, or stored in a large HDF5/ADIOS file if even the cluster's memory is too small. All computation and data access is then carried out in parallel by the cluster nodes in the background. If it feels just like working with numpy arrays, pyDive has reached its goal!
- Since all cluster management is delegated to IPython.parallel, you can reuse your existing IPython profiles for pyDive. No further cluster configuration is needed.
- Save bandwidth by slicing an array in parallel on disk before loading it into main memory!
- A GPU-cluster array is available thanks to pycuda, with additional support for non-contiguous memory.
- Since all of pyDive's distributed array types are auto-generated from local array classes (numpy, hdf5, pycuda, etc.), you can easily make your own local array class distributed, too.
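The bandwidth-saving idea of slicing on disk before loading can be illustrated locally with plain numpy: a memory-mapped array only materializes the elements a slice actually touches. This is a single-machine sketch using `numpy.memmap`, not pyDive's API; pyDive extends the same principle to parallel file I/O across cluster nodes:

```python
import os
import tempfile
import numpy as np

# Create an array on disk (small here, imagine many gigabytes)
path = os.path.join(tempfile.mkdtemp(), "field.dat")
big = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 1000))
big[:] = 1.0
big.flush()

# Re-open read-only and slice: only every 10th row is read into memory
field = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 1000))
subset = np.array(field[::10, :])  # materialize just the slice
print(subset.shape)  # (100, 1000)
```

Reading the strided slice touches one tenth of the file, which is exactly the saving pyDive exploits before any data crosses the network.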
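The auto-generation idea behind the last point can be sketched in plain Python: given any local array class that supports slicing and arithmetic, a "distributed" wrapper can be generated that holds chunks and forwards element-wise operations to them. This is a simplified, single-process illustration; the factory and class names are hypothetical, not pyDive's actual API, and in pyDive the chunks would live on IPython.parallel engines:

```python
import numpy as np

def make_distributed(local_cls, name):
    """Hypothetical factory: generate a chunked 'distributed' container
    for a given local array class (the class parameter marks where a
    framework like pyDive would plug in numpy, hdf5, pycuda, ...)."""
    class Distributed:
        def __init__(self, chunks):
            self.chunks = chunks  # would reside on cluster nodes in pyDive

        @classmethod
        def from_array(cls, arr, nchunks):
            # split along axis 0, mimicking a distributed axis
            return cls(np.array_split(arr, nchunks, axis=0))

        def __add__(self, other):
            # element-wise ops are forwarded chunk by chunk
            return Distributed([a + b for a, b in zip(self.chunks, other.chunks)])

        def gather(self):
            # collect all chunks back into one local array
            return np.concatenate(self.chunks, axis=0)

    Distributed.__name__ = name
    return Distributed

DistNumpy = make_distributed(np.ndarray, "DistNumpy")
a = DistNumpy.from_array(np.ones((8, 4)), nchunks=4)
b = DistNumpy.from_array(np.full((8, 4), 2.0), nchunks=4)
result = (a + b).gather()
print(result.sum())  # 96.0
```

Because the wrapper is generated rather than hand-written per type, the same machinery covers any local array class with compatible semantics.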
```python
import pyDive
pyDive.init(profile='mpi')

h5field = pyDive.h5.open("myData.h5", "myDataset", distaxes=(0, 1))
ones = pyDive.ones_like(h5field)

# Distribute file I/O and computation across the cluster
h5field[::10, :] = h5field[::10, :].load() + 5.0 * ones[::10, :]
```