Cheetah - An Experiment Harness and Campaign Management System

Overview

Cheetah is an experiment harness for running codesign experiments that study the effects of online data analysis at exascale. It runs large campaigns of experiments to evaluate the advantages and tradeoffs of different compression and reduction algorithms under different orchestration mechanisms. Data can be analyzed offline, in situ (via a function that is part of the application), or online (in a separate, stand-alone application). The workflow can be composed so that different executables run on separate nodes or share compute nodes, and the number of processes per node can be fine-tuned.
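Choosing between separate and shared nodes changes how many nodes a run needs. The sketch below shows that arithmetic for a simulation plus an analysis executable. It is a hypothetical helper, not part of Cheetah or Savanna. It assumes that when nodes are shared, every node hosts a fixed number of processes of each executable, and it uses an illustrative limit of 40 cores per node.

```python
import math


def nodes_needed(procs, ppn, cores_per_node, share_nodes):
    """Count nodes for a workflow of several executables (hypothetical helper).

    procs: total processes per executable, e.g. {"sim": 64}
    ppn:   processes per node per executable, e.g. {"sim": 32}
    """
    # Nodes each executable needs on its own, rounding partial nodes up.
    per_app = {app: math.ceil(procs[app] / ppn[app]) for app in procs}
    if not share_nodes:
        # Separate nodes: every executable gets a disjoint set of nodes.
        return sum(per_app.values())
    # Shared nodes: each node hosts ppn[app] processes of every executable,
    # so the combined per-node count must fit on one node.
    if sum(ppn.values()) > cores_per_node:
        raise ValueError("co-located processes exceed cores per node")
    # The executable that spans the most nodes sets the allocation size.
    return max(per_app.values())


procs = {"sim": 64, "analysis": 8}
ppn = {"sim": 32, "analysis": 4}
print(nodes_needed(procs, ppn, cores_per_node=40, share_nodes=False))  # 4
print(nodes_needed(procs, ppn, cores_per_node=40, share_nodes=True))   # 2
```

Sharing nodes halves the allocation in this example. The tradeoff is that the analysis processes now compete with the simulation for memory and network bandwidth on each node, which is the kind of effect a campaign is designed to measure.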

Users create a campaign specification file in Python that describes the applications that form the workflow, and the parameters that they are interested in exploring. Cheetah creates the campaign endpoint on the target machine, and users can then launch experiments using the generated submission script.
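A campaign typically expands a few parameter lists into every combination of values, and each combination becomes one experiment. The sketch below shows that expansion with plain Python. It does not use Cheetah's campaign API; the real specification file uses Cheetah's own classes, described in the documentation and tutorial. The parameter names and values are hypothetical.

```python
from itertools import product


def expand_sweep(params):
    """Expand {name: [values]} into one dict per experiment (Cartesian product)."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*(params[n] for n in names))]


# Hypothetical parameters for a simulation + compression workflow.
sweep = {
    "sim_procs": [32, 64],
    "compressor": ["zfp", "sz"],
    "error_bound": [1e-3, 1e-4],
}
runs = expand_sweep(sweep)
print(len(runs))  # 8
print(runs[0])    # {'sim_procs': 32, 'compressor': 'zfp', 'error_bound': 0.001}
```

The campaign size is the product of the list lengths (2 × 2 × 2 = 8 here), so adding one more value to a parameter can grow a campaign quickly.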

Cheetah's runtime framework, Savanna, translates experiment metadata into scheduler calls for the underlying system and manages the allocated resources for running experiments. Savanna contains definitions for different supercomputers; based upon this information about the target machine, Savanna uses the appropriate scheduler interface (aprun, jsrun, slurm) and the corresponding scheduler options to launch experiments.
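The idea of per-machine definitions can be illustrated with a small lookup table that turns a process count and processes-per-node value into a launcher command. This is a simplified, hypothetical sketch and not Savanna's implementation. The machine-to-launcher mapping and the small set of flags used (aprun `-n`/`-N`, jsrun `-n`/`-a`/`-r`, srun `-N`/`-n`/`--ntasks-per-node`) are assumptions chosen for illustration; real machine definitions carry many more options.

```python
# Hypothetical machine-to-launcher table; Savanna's real machine definitions
# are part of Cheetah's source and carry many more options.
MACHINE_LAUNCHER = {
    "summit": "jsrun",
    "andes": "srun",
    "theta": "aprun",
    "cori": "srun",
    "local": "mpirun",
}


def launch_command(machine, exe, nprocs, ppn):
    """Build an argv list that starts nprocs copies of exe, ppn per node."""
    launcher = MACHINE_LAUNCHER[machine]
    if launcher == "aprun":
        # aprun: -n total processing elements, -N processing elements per node
        return ["aprun", "-n", str(nprocs), "-N", str(ppn), exe]
    if launcher == "jsrun":
        # jsrun: nprocs resource sets (-n) of one task each (-a 1),
        # with ppn resource sets per host (-r)
        return ["jsrun", "-n", str(nprocs), "-a", "1", "-r", str(ppn), exe]
    if launcher == "srun":
        nodes = -(-nprocs // ppn)  # ceiling division
        return ["srun", "-N", str(nodes), "-n", str(nprocs),
                "--ntasks-per-node=" + str(ppn), exe]
    # Plain MPI launcher for a standalone Linux machine.
    return ["mpirun", "-np", str(nprocs), exe]


print(" ".join(launch_command("summit", "./simulation", 64, 32)))
# jsrun -n 64 -a 1 -r 32 ./simulation
```

Returning an argument list rather than a single string lets the command be passed directly to `subprocess.run` without shell quoting issues.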

Cheetah is centered around ADIOS, a middleware library that provides an I/O framework along with a publish-subscribe API for exchanging data in memory. Typically, all ADIOS-specific settings are specified in an XML file that the application reads at runtime. Cheetah provides an interface for editing ADIOS XML files to tune I/O options.
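Because the I/O settings live in XML, switching an experiment from file output to in-memory streaming can be a matter of rewriting one element. The sketch below does this with Python's standard `xml.etree.ElementTree`. It is not Cheetah's XML-editing interface. The IO name `SimulationOutput` is hypothetical, and `RendezvousReaderCount` is used as an example SST engine parameter; check the ADIOS2 documentation for the parameters your engine supports.

```python
import xml.etree.ElementTree as ET

# A minimal ADIOS2-style configuration; "SimulationOutput" is a made-up IO name.
ADIOS_XML = """<adios-config>
  <io name="SimulationOutput">
    <engine type="BP4"/>
  </io>
</adios-config>"""


def set_engine(xml_text, io_name, engine_type, params=None):
    """Return xml_text with the engine of <io name=io_name> replaced."""
    root = ET.fromstring(xml_text)
    io = root.find("io[@name='%s']" % io_name)
    if io is None:
        raise KeyError("no <io> named " + io_name)
    engine = io.find("engine")
    if engine is None:
        engine = ET.SubElement(io, "engine")
    engine.set("type", engine_type)
    # Drop old engine parameters, then add the requested ones.
    for old in engine.findall("parameter"):
        engine.remove(old)
    for key, value in sorted((params or {}).items()):
        ET.SubElement(engine, "parameter", key=key, value=str(value))
    return ET.tostring(root, encoding="unicode")


# Switch from file-based output to in-memory streaming for online analysis.
print(set_engine(ADIOS_XML, "SimulationOutput", "SST",
                 {"RendezvousReaderCount": 1}))
```

Since the application reads the XML at startup, each experiment in a campaign can receive a differently edited copy without recompiling the application.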

Installation

  • Dependencies: Linux, Python 3.5+, and psutil

  • On supercomputers, Cheetah should be installed in a location accessible from the parallel file system.

  • Cheetah can be installed with the Spack package manager:

    spack install codar-cheetah@develop

  • Alternatively, clone the repository and install Cheetah into a Python virtual environment (activating the environment adds its bin directory to the PATH):

    git clone git@github.com:CODARcode/cheetah.git
    cd cheetah
    python3 -m venv venv-cheetah
    source venv-cheetah/bin/activate
    pip install --editable .
  • Cheetah has been tested on Summit (ORNL), Andes (ORNL), Theta (ANL), Cori (LBNL), and standalone Linux machines.
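Before installing, it can help to confirm that the listed prerequisites are present. The script below is a hypothetical convenience check, not part of Cheetah; it only inspects the operating system, the Python version, and whether the psutil module can be found.

```python
import importlib.util
import platform
import sys


def check_prerequisites():
    """Report whether the stated prerequisites appear to be satisfied."""
    return {
        "linux": platform.system() == "Linux",
        "python>=3.5": sys.version_info >= (3, 5),
        # find_spec returns None when the module is not importable.
        "psutil": importlib.util.find_spec("psutil") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("{:<12} {}".format(name, "ok" if ok else "MISSING"))
```

If psutil is reported missing, `pip install psutil` inside the virtual environment installs it.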

Setting up a Cheetah environment
Activate the virtual environment before using Cheetah:

    source <cheetah dir>/venv-cheetah/bin/activate

Documentation

The recommended starting point is the Cheetah tutorial in docs/Tutorials.
The Cheetah documentation can be found at https://codarcode.github.io/cheetah.

Releases

The current release is 1.1.1.

Supported Systems

System Name Cheetah Support System supports Node-Sharing Cheetah Node-Sharing Support
Local Linux machines N/A N/A
Summit (ORNL)
Andes (ORNL)
Spock (ORNL) -- --
Theta (ANL) N/A
Cori (LBNL) In progress

Authors

The primary authors of Cheetah are Kshitij Mehta (ORNL) and Bryce Allen (University of Chicago). All contributors are listed here.

Citing Cheetah

To refer to Cheetah in a publication, please cite the following paper:

  • Mehta, Kshitij, Allen, Bryce, Wolf, Matthew, Logan, Jeremy, Suchyta, Eric, Singhal, Swati, Choi, Jong Youl, Takahashi, Keichi, Huck, Kevin, Yakushin, Igor, Sussman, Alan, Munson, Todd, Foster, Ian, and Klasky, Scott. "A codesign framework for online data analysis and reduction." Concurrency and Computation: Practice and Experience. https://doi.org/10.1002/cpe.6519

Other paper:

Examples

For more examples of using Cheetah, see the examples directory.

Contributing

Cheetah is open source, and we invite the community to collaborate. To contribute, open a pull request against the dev branch.

Reporting Bugs

Please report bugs by opening an issue on the GitHub issues page.

License

Cheetah is licensed under the Apache License v2.0. See the accompanying Copyright.txt for more details.