AutomaDeD

Description

AutomaDeD (Automata-based Debugging for Dissimilar parallel tasks) is a tool for automatic diagnosis of performance and correctness problems in MPI applications. It builds a control-flow model of each MPI process by intercepting MPI calls with wrappers and, when a failure occurs, uses these models to find the origin of the problem automatically. When an MPI application hangs, AutomaDeD creates a progress-dependence graph that helps find the process (or group of processes) that caused the hang. Please refer to [1, 2] for more details.

Prodometer

This version of AutomaDeD implements the diagnosis algorithm of the Prodometer technique, which performs loop-aware progress-dependence analysis. For more information, please refer to Prodometer.

Building

For Unix-based machines (with CMake), simply execute:

$ cmake -DCMAKE_INSTALL_PREFIX=<install_path>

To use the callpath library (to normalize the library loading order):

$ cmake -DCMAKE_INSTALL_PREFIX=<install_path> -DSTATE_TRACKER_WITH_CALLPATH=ON

This requires two additional libraries, callpath and adept_utils. You can get them from:

https://github.com/scalability-llnl

Then:

$ make
$ make install

Building requires a C++ MPI compiler wrapper (such as mpic++). The configure step should automatically detect your MPI compiler installation. If you want to specify a particular compiler, you can do so using standard CMake techniques.

Boost must be installed on your system. CMake will try to detect it automatically. To tell CMake where to find Boost, use the -D BOOST_ROOT= option.
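
As a rough sketch of the two options above, a configure line that names the compiler and the Boost location explicitly might be assembled as follows. CMAKE_CXX_COMPILER is a standard CMake variable rather than anything specific to AutomaDeD, and the compiler name and both paths are placeholder assumptions to replace with your own. The snippet only prints the command so it can be reviewed first.

```shell
# Assumed values: replace with the wrapper and paths on your system.
MPI_CXX="${MPI_CXX:-mpic++}"                      # C++ MPI compiler wrapper
INSTALL_PATH="${INSTALL_PATH:-$HOME/automaded}"   # hypothetical install prefix
BOOST_PATH="${BOOST_PATH:-/opt/boost}"            # hypothetical Boost prefix

# Assemble the configure command from the options described above.
CONFIGURE_CMD="cmake -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH"
CONFIGURE_CMD="$CONFIGURE_CMD -DCMAKE_CXX_COMPILER=$MPI_CXX"
CONFIGURE_CMD="$CONFIGURE_CMD -DBOOST_ROOT=$BOOST_PATH"

# Print it for review; run the printed line from the source directory.
echo "$CONFIGURE_CMD"
```

Setting MPI_CXX, INSTALL_PATH, or BOOST_PATH in the environment beforehand overrides the defaults used here.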

Running

You have to link your MPI application against AutomaDeD's library, using either the static or the shared library. Once this is done, you can run your buggy application. To run a test application with the shared library preloaded, you can use: LD_PRELOAD=/lib/libstracker.so srun -n 16 -ppdebug ./test
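
The launch line above can be wrapped in a small check so that a wrong library path is reported before the job is submitted. This is only a sketch: check_preload_lib is a hypothetical helper, and the default path is copied from the command above and will likely differ on your system.

```shell
# Hypothetical helper: succeed only if the given shared library is readable.
check_preload_lib() {
    [ -r "$1" ]
}

# Path copied from the command above; override LIBSTRACKER if yours differs.
LIBSTRACKER="${LIBSTRACKER:-/lib/libstracker.so}"

if check_preload_lib "$LIBSTRACKER"; then
    # Same launch line as above: 16 tasks on the pdebug partition.
    LD_PRELOAD="$LIBSTRACKER" srun -n 16 -ppdebug ./test
else
    echo "cannot read $LIBSTRACKER; check the install path" >&2
fi
```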

Take a look at the './examples' directory to see some use cases.

To run with the callpath library, set the environment variable: AUT_USE_CALL_PATH=TRUE

To stop the tool from dumping its output file, use: export AUT_DO_NOT_DUMP=TRUE

If you want to attach another debugger to the LP (least-progressed) process identified by the tool, use: export AUT_DO_NOT_EXIT=TRUE to make sure the tool does not exit.
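
The runtime switches above can be collected in one place before launching, for example in a job script. The sketch below sets all three for illustration; in practice, export only the ones you need. The note that AUT_USE_CALL_PATH goes together with a callpath-enabled build is an assumption based on the build section.

```shell
# All three variable names come from this README; set only those you need.
export AUT_USE_CALL_PATH=TRUE   # assumed to need a build with STATE_TRACKER_WITH_CALLPATH=ON
export AUT_DO_NOT_DUMP=TRUE     # do not dump the tool output file
export AUT_DO_NOT_EXIT=TRUE     # keep the tool from exiting so a debugger can attach

# Show the AUT_ settings that the launched application will inherit.
env | grep '^AUT_'
```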

About BG/Q systems

For BG/Q systems, you need to specify a toolchain file for CMake: -D CMAKE_TOOLCHAIN_FILE=cmakemodules/Toolchain/BlueGeneQ-gnu.cmake
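
Put together with the install prefix from the build section, a BG/Q configure line might look like the sketch below. The install prefix is a placeholder assumption, and the snippet assumes it is run from the top of the source tree, since the toolchain path is relative. It prints the command rather than running it.

```shell
# Toolchain path from this README, relative to the source directory.
TOOLCHAIN="cmakemodules/Toolchain/BlueGeneQ-gnu.cmake"
INSTALL_PATH="${INSTALL_PATH:-$HOME/automaded}"   # hypothetical install prefix

BGQ_CMD="cmake -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_TOOLCHAIN_FILE=$TOOLCHAIN"

# Warn early if the snippet is not run from the top of the source tree.
if [ ! -f "$TOOLCHAIN" ]; then
    echo "warning: $TOOLCHAIN not found relative to $(pwd)" >&2
fi

# Print the command for review before running it.
echo "$BGQ_CMD"
```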

Using the GUI

AutomaDeD comes with a GUI that can read the AUT* file generated by the tool. The GUI has a documentation file that explains how to use it.

Known issues

If callpath is used, the output file currently does not contain the full file name and line number information, so the GUI cannot be used. This support will be added soon.

References

[1] Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, Todd Gamblin, Accurate application progress analysis for large-scale parallel debugging, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014.

[2] Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin, Probabilistic Diagnosis of Performance Faults in Large-Scale Parallel Applications, International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012.

[3] Ignacio Laguna, Todd Gamblin, Bronis R. de Supinski, Saurabh Bagchi, Greg Bronevetsky, Dong H. Ahn, Martin Schulz, Barry Rountree, Large Scale Debugging of Parallel Tasks with AutomaDeD, ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC), Seattle, WA, Nov 2011.

[4] Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz, AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Chicago, Illinois, Jun-Jul 2010.

[5] Science & Technology Review, Supercomputing Tools Speed Simulations, July, 2014.

Authors

The main code infrastructure of AutomaDeD was written by: Ignacio Laguna (ilaguna@llnl.gov), LLNL

The code that implements the Prodometer algorithm was written by: Subrata Mitra (mitra4@purdue.edu), Purdue University

Project contributors:
Dong H. Ahn (LLNL)
Saurabh Bagchi (Purdue University)
Bronis R. de Supinski (LLNL)
Todd Gamblin (LLNL)
Martin Schulz (LLNL)
Greg Bronevetsky
