VELOC: VEry-Low Overhead Checkpointing System
VeloC is a multi-level checkpoint/restart runtime that delivers high performance and scalability for complex heterogeneous storage hierarchies without sacrificing ease of use and flexibility.
It is primarily used as a fault-tolerance tool for tightly coupled HPC applications running on supercomputing infrastructure, but it is also essential in many other use cases, such as suspend-resume, migration, and debugging.
VeloC is a collaboration between Argonne National Laboratory and Lawrence Livermore National Laboratory as part of the Exascale Computing Project.
The documentation of VeloC is available here: http://veloc.rtfd.io
It also includes a quick start guide that covers the basics needed to use VeloC on your system.
For questions, comments, or help, please contact the VeloC team at firstname.lastname@example.org
Copyright (c) 2018, UChicago Argonne LLC, operator of Argonne National Laboratory
Copyright (c) 2018, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory.