Debugging Tool based on Statistical Analysis

AutomaDeD

1. Description

AutomaDeD (Automata-based Debugging for Dissimilar parallel tasks) is a tool for automatic diagnosis of performance and correctness problems in MPI applications. It creates a control-flow model of each MPI process and, when a failure occurs, uses these models to find the origin of the problem automatically. MPI calls are intercepted (using wrappers) to create the models. When an MPI application hangs, AutomaDeD creates a progress-dependence graph that helps find the process (or group of processes) that caused the hang. Please refer to [1, 2] for more details.
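To illustrate the general idea of a per-process control-flow model built from intercepted MPI calls, here is a minimal C++ sketch. It is hypothetical and much simpler than the models described in the referenced papers: the class `ControlFlowModel` and its methods are not part of AutomaDeD, states are just MPI call names, and edges count how often one call directly followed another, from which transition probabilities are estimated.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch (not AutomaDeD's API): a per-process control-flow
// model whose states are MPI call names and whose edges count how often
// one intercepted call was directly followed by another.
class ControlFlowModel {
public:
    // Would be called by an MPI wrapper each time a call is intercepted.
    void record(const std::string& call) {
        if (!last_.empty()) {
            ++transitions_[std::make_pair(last_, call)];
        }
        last_ = call;
    }

    // Number of times `to` directly followed `from`.
    int count(const std::string& from, const std::string& to) const {
        auto it = transitions_.find(std::make_pair(from, to));
        return it == transitions_.end() ? 0 : it->second;
    }

    // Estimated probability of moving from `from` to `to`, computed as the
    // edge count divided by all observed transitions leaving `from`.
    double probability(const std::string& from, const std::string& to) const {
        int total = 0;
        for (const auto& entry : transitions_) {
            if (entry.first.first == from) {
                total += entry.second;
            }
        }
        return total == 0 ? 0.0 : static_cast<double>(count(from, to)) / total;
    }

    // The most recently recorded call, i.e. the current state.
    const std::string& current() const { return last_; }

private:
    std::string last_;
    std::map<std::pair<std::string, std::string>, int> transitions_;
};
```

In a typical PMPI-based wrapper, the wrapper for MPI_Send would call `record("MPI_Send")` and then forward to `PMPI_Send`; the model's current state then shows where each process is when it stops making progress.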

Prodometer

This version of AutomaDeD implements the diagnosis algorithm of the Prodometer technique, which performs loop-aware progress-dependence analysis. For more information, please refer to the Prodometer paper [1].
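The final step of hang diagnosis, reading candidate root-cause tasks off a progress-dependence graph, can be sketched as follows. This is not the Prodometer algorithm, whose loop-aware analysis for building the graph is described in [1]; here the graph is taken as given, and both the convention that an edge (a, b) means "task a waits for task b" and the function `leastProgressedTasks` are assumptions for illustration. Tasks that others wait on but that themselves wait on no one are reported as least-progressed (LP).

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical edge convention: (a, b) means task a cannot make progress
// until task b does. Task ids are assumed to lie in [0, numTasks).
using Edge = std::pair<int, int>;

// Returns the tasks that at least one other task waits on but that do not
// wait on any task themselves. In this simplified view these are the
// least-progressed (LP) tasks, the first candidates to inspect.
std::set<int> leastProgressedTasks(int numTasks, const std::vector<Edge>& edges) {
    std::vector<bool> waitsOnOther(numTasks, false);
    std::vector<bool> waitedOn(numTasks, false);
    for (const Edge& e : edges) {
        waitsOnOther.at(e.first) = true;  // at() throws on out-of-range ids
        waitedOn.at(e.second) = true;
    }
    std::set<int> lp;
    for (int t = 0; t < numTasks; ++t) {
        if (waitedOn[t] && !waitsOnOther[t]) {
            lp.insert(t);
        }
    }
    return lp;
}
```

For example, if tasks 1, 2 and 3 are blocked in a collective waiting for task 0, the edges are (1, 0), (2, 0) and (3, 0), and only task 0 is reported. If the dependencies form a cycle, this simplified check reports no LP task at all.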

2. Building

On Unix-based machines (with CMake), simply execute:

$ cmake -DCMAKE_INSTALL_PREFIX=<install_path>

To use the callpath library (to normalize the library loading order), configure with:

$ cmake -DCMAKE_INSTALL_PREFIX=<install_path> -DSTATE_TRACKER_WITH_CALLPATH=ON

This requires two additional libraries, callpath and adept_utils, which are available at:

https://github.com/scalability-llnl

Then:

$ make
$ make install

AutomaDeD requires a C++ MPI compiler wrapper (such as mpic++). The configure step should automatically detect your MPI compiler installation. To use a particular compiler, use standard CMake techniques (for example, setting CMAKE_CXX_COMPILER).

Boost must be installed on your system; CMake will try to detect it automatically. To tell CMake where to find Boost, use: -D BOOST_ROOT=<boost_path>

3. Running

You have to link your MPI application against AutomaDeD's library, using either the static or the shared library. Once this is done, you can run your buggy application. For example, to run a test application on 16 processes with the shared library preloaded:

$ LD_PRELOAD=/lib/libstracker.so srun -n 16 -ppdebug ./test

Take a look at the 'examples' directory to see some use cases.

To run with the callpath library, set the environment variable: export AUT_USE_CALL_PATH=TRUE

To prevent the tool from writing its output file, use: export AUT_DO_NOT_DUMP=TRUE

If you want to attach another debugger to the least-progressed (LP) process identified by the tool, use export AUT_DO_NOT_EXIT=TRUE so that the tool does not exit.
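The AUT_* options above are boolean environment variables set to TRUE. The following sketch shows one way a tool could read them at startup; the names `envFlagIsTrue`, `ToolOptions` and `readToolOptions` are hypothetical, and treating only the exact string TRUE as enabled is an assumption, not something taken from AutomaDeD's source.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: a flag counts as enabled only when the variable is
// set to exactly "TRUE" (an assumption about the parsing rule).
bool envFlagIsTrue(const char* name) {
    const char* value = std::getenv(name);
    return value != nullptr && std::strcmp(value, "TRUE") == 0;
}

// Hypothetical container for the three options described above.
struct ToolOptions {
    bool useCallpath;  // AUT_USE_CALL_PATH
    bool doNotDump;    // AUT_DO_NOT_DUMP
    bool doNotExit;    // AUT_DO_NOT_EXIT
};

// Reads all three flags from the current process environment.
ToolOptions readToolOptions() {
    return ToolOptions{envFlagIsTrue("AUT_USE_CALL_PATH"),
                       envFlagIsTrue("AUT_DO_NOT_DUMP"),
                       envFlagIsTrue("AUT_DO_NOT_EXIT")};
}
```

Because the flags are read from the environment of each MPI process, they must be exported before launching the job so that the launcher propagates them to every rank.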

4. About BG/Q systems

On BG/Q systems, you need to specify the toolchain file for CMake: -D CMAKE_TOOLCHAIN_FILE=cmakemodules/Toolchain/BlueGeneQ-gnu.cmake

5. Using the GUI

AutomaDeD comes with a GUI that can read the AUT* files generated by the tool. The GUI includes a documentation file that explains how to use it.

6. Known issues

When callpath is used, the output file currently does not contain full file name and line number information, so the GUI cannot be used. Support for this will be added in a future version.

7. References

[1] Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, Todd Gamblin, Accurate application progress analysis for large-scale parallel debugging, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014.

[2] Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin, Probabilistic Diagnosis of Performance Faults in Large-Scale Parallel Applications, International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012.

[3] Ignacio Laguna, Todd Gamblin, Bronis R. de Supinski, Saurabh Bagchi, Greg Bronevetsky, Dong H. Ahn, Martin Schulz, Barry Rountree, Large Scale Debugging of Parallel Tasks with AutomaDeD, ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC), Seattle, WA, Nov. 2011.

[4] Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz, AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Chicago, IL, Jun.-Jul. 2010.

[5] Supercomputing Tools Speed Simulations, Science & Technology Review, July 2014.

Authors

The main code infrastructure of AutomaDeD was written by: Ignacio Laguna (ilaguna@llnl.gov), LLNL

The code that implements the Prodometer algorithm was written by: Subrata Mitra (mitra4@purdue.edu), Purdue University

Project contributors:
Dong H. Ahn (LLNL)
Saurabh Bagchi (Purdue University)
Bronis R. de Supinski (LLNL)
Todd Gamblin (LLNL)
Martin Schulz (LLNL)
Greg Bronevetsky
