
Nanos6 Runtime

Nanos6 is a runtime that implements the OmpSs-2 parallel programming model, developed by the Programming Models group at the Barcelona Supercomputing Center.

Installation

Build requirements

To install Nanos6 the following tools and libraries must be installed:

  1. automake, autoconf, libtool, make and a C and C++ compiler
  2. boost >= 1.59
  3. hwloc
  4. numactl
  5. Finally, it is highly recommended to have an installation of Mercurium with OmpSs-2 support enabled. When installing OmpSs-2 for the first time, the circular dependency between Nanos6 and Mercurium can be broken from either side: Nanos6 can be installed without specifying a valid installation of Mercurium, and Mercurium can be installed without a valid installation of Nanos6 by using the --enable-nanos6-bootstrap configuration flag.

Optional libraries and tools

In addition to the build requirements, the following libraries and tools enable additional features:

  1. extrae to generate execution traces for offline performance analysis with paraver
  2. elfutils and libunwind to generate sample-based profiling
  3. graphviz and pdfjam or pdfjoin from TeX to generate graphical representations of the dependency graph
  4. parallel to generate the graph representation in parallel
  5. PAPI to generate statistics that include hardware counters
  6. CUDA to enable CUDA tasks

Build procedure

Nanos6 uses the standard GNU automake and libtool toolchain. When cloning from a repository, the build environment must first be prepared with the following command:

$ autoreconf -f -i -v

When the code is distributed through a tarball, it usually does not need that command.

Then execute the following commands:

$ ./configure --prefix=INSTALLATION_PREFIX ...other options...
$ make all check
$ make install

where INSTALLATION_PREFIX is the directory into which to install Nanos6.

The configure script accepts the following options:

  1. --with-nanos6-mercurium=prefix to specify the prefix of the Mercurium installation
  2. --with-boost=prefix to specify the prefix of the Boost installation
  3. --with-libunwind=prefix to specify the prefix of the libunwind installation
  4. --with-papi=prefix to specify the prefix of the PAPI installation
  5. --with-libnuma=prefix to specify the prefix of the numactl installation
  6. --with-extrae=prefix to specify the prefix of the extrae installation
  7. --enable-cuda to enable support for CUDA tasks
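
As an illustration, several of these options can be combined in a single configure invocation. The following is only a sketch: the three installation prefixes are hypothetical and must be replaced with the real locations on your system.

```shell
# Hypothetical prefixes; replace them with the real locations on your system.
NANOS6_PREFIX="$HOME/opt/nanos6"
MERCURIUM_PREFIX="$HOME/opt/mercurium"
EXTRAE_PREFIX="$HOME/opt/extrae"

# Collect the options as positional parameters so that each one stays a
# single word even if a prefix contains spaces.
set -- --prefix="$NANOS6_PREFIX" \
       --with-nanos6-mercurium="$MERCURIUM_PREFIX" \
       --with-extrae="$EXTRAE_PREFIX"

if [ -x ./configure ]; then
    ./configure "$@"
else
    echo "configure not found; would run: ./configure $*"
fi
```

Run it from the top of the Nanos6 source tree, after autoreconf if the code comes from a repository.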

The location of elfutils and hwloc is always retrieved through pkg-config. The location of PAPI can also be retrieved through pkg-config if it is not specified through the --with-papi parameter. If they are installed in non-standard locations, pkg-config can be told where to find them through the PKG_CONFIG_PATH environment variable. For instance:

$ export PKG_CONFIG_PATH=$HOME/installations-mn4/elfutils-0.169/lib/pkgconfig:/apps/HWLOC/2.0.0/INTEL/lib/pkgconfig:$PKG_CONFIG_PATH
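
Before running configure, it can help to check that pkg-config actually finds the libraries. The sketch below uses a hypothetical helper, check_module, and assumes that hwloc and elfutils are registered under the module names hwloc and libdw; adjust the names if your installation differs.

```shell
# Report whether pkg-config can locate a module with the current PKG_CONFIG_PATH.
check_module() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config is not installed"
        return 2
    fi
    if pkg-config --exists "$1"; then
        echo "$1: found, version $(pkg-config --modversion "$1")"
    else
        echo "$1: not found, check PKG_CONFIG_PATH"
        return 1
    fi
}

# Module names are assumptions; "|| true" keeps going after a missing module.
for module in hwloc libdw; do
    check_module "$module" || true
done
```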

After Nanos6 has been installed, it can be used by compiling your C, C++ and Fortran codes with Mercurium using the --ompss-2 flag. Example:

$ mcc -c --ompss-2 a_part_in_c.c
$ mcxx -c --ompss-2 a_part_in_c_plus_plus.cxx
$ mcxx --ompss-2 a_part_in_c.o a_part_in_c_plus_plus.o -o app

Execution

Nanos6 applications can be executed as is. The number of cores that are used is controlled by running the application through the taskset command. For instance:

$ taskset -c 0-2,4 ./app

would run app on cores 0, 1, 2 and 4.
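
To make the list syntax concrete, the following sketch expands a core list such as 0-2,4 into the individual core numbers. The expand_cpu_list helper is hypothetical and handles only plain numbers and ranges, not the stride syntax that taskset also accepts.

```shell
# Expand a comma-separated list of cores and ranges into individual cores.
# The body runs in a subshell, so the IFS change does not leak to the caller.
expand_cpu_list() (
    IFS=,
    result=""
    for range in $1; do
        first=${range%-*}   # text before the dash, or the whole item
        last=${range#*-}    # text after the dash, or the whole item
        core=$first
        while [ "$core" -le "$last" ]; do
            result="$result $core"
            core=$((core + 1))
        done
    done
    echo "${result# }"
)

expand_cpu_list "0-2,4"   # prints: 0 1 2 4
```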

Tracing, debugging and other options

Nanos6 applications, unlike Nanos++ applications, do not need to be recompiled to generate extrae traces or other additional information. This is instead controlled at run time through environment variables, called envars from now on.

Generating extrae traces

To generate an extrae trace, run the application with the NANOS6 envar set to extrae.
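
The envar can be set for a single run instead of being exported for the whole session. A minimal sketch follows, using a hypothetical helper named run_with_nanos6; the same pattern applies to the other values of NANOS6 described in the following sections.

```shell
# Run a command with the NANOS6 envar set only for that command.
# $1 is the runtime variant (for instance extrae); the rest is the command.
run_with_nanos6() {
    variant=$1
    shift
    NANOS6="$variant" "$@"
}

# Demonstration that does not need a Nanos6 application:
run_with_nanos6 extrae sh -c 'echo "NANOS6=$NANOS6"'   # prints: NANOS6=extrae
```

With a real application, the call would be run_with_nanos6 extrae ./app.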

Currently there is an incompatibility when generating traces with PAPI. To work around it, define the following envar:

$ export NANOS6_EXTRAE_AS_THREADS=1

The resulting trace will show the activity of the actual threads instead of the activity at each CPU. In the future, this problem will be fixed.

Generating a graphical representation of the dependency graph

To generate the graph, run the application with the NANOS6 envar set to graph.

By default, the graph nodes include the full path of the source code. To remove the directories, set the NANOS6_GRAPH_SHORTEN_FILENAMES envar to 1.

The resulting file is a PDF that contains several pages. Each page represents the graph at a given point in time. Setting the NANOS6_GRAPH_SHOW_DEAD_DEPENDENCIES envar to 1 forces future and previous dependencies to be shown with different graphical attributes.

If the NANOS6_GRAPH_DISPLAY envar is set to 1, the resulting PDF is opened automatically. The default viewer is xdg-open, but it can be overridden through the NANOS6_GRAPH_DISPLAY_COMMAND envar.

For best results, we suggest displaying the PDF in "single page" view, showing a full page at a time, and advancing page by page.
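
The graph-related envars can be combined as in the following sketch. The variable names come from the text above; the choice of evince as viewer is hypothetical, and passing a bare command name to NANOS6_GRAPH_DISPLAY_COMMAND is an assumption.

```shell
# Variable names come from the text above; the viewer is a hypothetical choice.
export NANOS6=graph
export NANOS6_GRAPH_SHORTEN_FILENAMES=1
export NANOS6_GRAPH_SHOW_DEAD_DEPENDENCIES=1
export NANOS6_GRAPH_DISPLAY=1
export NANOS6_GRAPH_DISPLAY_COMMAND=evince

if [ -x ./app ]; then
    ./app
else
    echo "./app not found; build it first"
fi
```

Since the variables are exported, they stay set for later runs in the same shell; unset NANOS6 restores the default behaviour.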

Verbose logging

To enable verbose logging, run the application with the NANOS6 envar set to verbose.

By default, verbose logging generates a lot of information. The output can be restricted through the NANOS6_VERBOSE envar, which can contain a comma-separated list of areas. The areas are the following:

Area                       Description
AddTask                    Task creation
DependenciesByAccess       Dependencies by their accesses
DependenciesByAccessLinks  Dependencies by the links between the accesses to the same data
DependenciesByGroup        Dependencies by groups of tasks that determine common predecessors and common successors
LeaderThread
TaskExecution              Task execution
TaskStatus                 Task status transitions
TaskWait                   Entering and exiting taskwaits
ThreadManagement           Thread creation, activation and suspension
UserMutex                  User-side mutexes (critical)

Area names are case-insensitive, and the all keyword enables all of them. Additionally, an area can be prefixed with ! to disable it. For instance, NANOS6_VERBOSE=AddTask,TaskExecution,TaskWait is a good starting point.

By default, the output is emitted to standard error, but it can be sent to a file by specifying it through the NANOS6_VERBOSE_FILE envar. In addition, the NANOS6_VERBOSE_DUMP_ONLY_ON_EXIT envar can be set to 1 to delay the output until the end of the program, so that it does not get mixed with the output of the program.
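
As a sketch of how these envars fit together, the following enables every area except one and sends the log to a file. The file name is hypothetical, and combining all with a negated area in this way is an assumption based on the description above.

```shell
export NANOS6=verbose
# Single quotes keep an interactive bash from treating "!" as history expansion.
export NANOS6_VERBOSE='all,!DependenciesByAccessLinks'
export NANOS6_VERBOSE_FILE=nanos6-verbose.log   # hypothetical file name
export NANOS6_VERBOSE_DUMP_ONLY_ON_EXIT=1

if [ -x ./app ]; then
    ./app
fi
```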

Sample-based profiling

To enable sample-based profiling, run the application with the NANOS6 envar set to profile.

In this mode, the runtime records backtraces of the threads up to a given depth and with a given frequency. These parameters can be set through the following envars:

Name                            Default value  Description
NANOS6_PROFILE_NS_RESOLUTION    1000           Sampling interval in nanoseconds
NANOS6_PROFILE_BACKTRACE_DEPTH  4              Number of stack frames to collect (excluding inlines) in each sample.
NANOS6_PROFILE_BUFFER_SIZE      1000000000     Number of sampling events to preallocate together in a chunk. The default value corresponds to 1 second of samples.
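
Since the resolution is an interval in nanoseconds, the nominal sampling rate is one second (1000000000 ns) divided by that interval. The sketch below picks a hypothetical coarser interval and a deeper backtrace, and prints the resulting rate; the chosen values are illustrative, not recommendations.

```shell
interval_ns=10000   # hypothetical: one sample every 10 microseconds
depth=8             # hypothetical: deeper backtraces than the default of 4

# Nominal rate implied by the interval.
echo "samples per second: $((1000000000 / interval_ns))"   # prints: samples per second: 100000

export NANOS6=profile
export NANOS6_PROFILE_NS_RESOLUTION=$interval_ns
export NANOS6_PROFILE_BACKTRACE_DEPTH=$depth
```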

At the end of the execution, the runtime generates the following files, whose entries are sorted by decreasing frequency. The first column of each file contains the sample count, and the remaining columns contain the actual entry values. Their contents are the following:

line-profile-PID.txt: Source code lines

function-profile-PID.txt: Function names

inline-profile-PID.txt: Function names and source code lines including inlines

Since the sampling is performed over the return addresses in the stack, if the compiler performs inlining, a given address can correspond to several functions. This file shows the number of samples that have the same associated source code lines.

backtrace-profile-by-line-PID.txt: Function names and source code lines including inlines of a full backtrace

Shows the number of samples that have a full backtrace that corresponds to the same exact source code lines.

backtrace-profile-by-address-PID.txt: Function names and source code lines including inlines of a full backtrace

Shows the number of samples that have a full backtrace with the same exact return addresses.
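
Because the first column of each file is a sample count, a short script can turn the counts into percentages. The sketch below uses a hypothetical helper, profile_share, and assumes that the columns are separated by whitespace.

```shell
# Print the first $2 entries of profile file $1, each prefixed with its
# share of the total number of samples in the file.
profile_share() {
    awk -v limit="$2" '
        { count[NR] = $1; entry[NR] = $0; total += $1 }
        END {
            if (total > 0)
                for (i = 1; i <= NR && i <= limit; i++)
                    printf "%5.1f%%  %s\n", 100 * count[i] / total, entry[i]
        }' "$1"
}

# Show the ten most frequent source lines of every profile in this directory.
for file in line-profile-*.txt; do
    if [ -f "$file" ]; then
        profile_share "$file" 10
    fi
done
```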

When compiling, Mercurium performs transformations to the original source code. At this time, Mercurium cannot preserve the original source code lines and function names. Hence, the outputs of the profiler are based on the transformed code. However, the transformed source code can be preserved by passing the -keep parameter to Mercurium.

Mercurium generates additional functions that wrap the task code. These appear in the backtraces and their names begin with nanos6_ol_ and nanos6_unpack_ and are followed by a number.

Obtaining statistics

To enable collecting statistics, run the application with the NANOS6 envar set to either stats or stats-papi. The first collects timing statistics and the second also records hardware counters.

By default, the statistics are emitted to standard error when the program ends. The output can be sent to a file through the NANOS6_STATS_FILE envar.

The output contains the average for each task type, and the overall average across all tasks, of the following metrics:

  • Number of instances
  • Mean instantiation time
  • Mean pending time (not ready due to dependencies)
  • Mean ready time
  • Mean execution time
  • Mean blocked time (due to a critical or a taskwait)
  • Mean zombie time (finished but not yet destroyed)
  • Mean lifetime (time between creation and destruction)

The output also contains information about:

  • Number of CPUs
  • Total number of threads
  • Mean threads per CPU
  • Mean tasks per thread
  • Mean thread lifetime
  • Mean thread running time

Most codes consist of an initialization phase, a calculation phase and a final phase for verification or writing the results. Usually these phases are separated by a taskwait. The runtime uses the taskwaits at the outermost level to identify phases and will emit individual metrics for each phase.

Debugging

By default, the runtime is optimized for speed and will assume that the application code is correct. Hence, it will not perform most validity checks. To enable validity checks, run the application with the NANOS6 envar set to debug. This will enable many internal validity checks that may be violated when the application code is incorrect. In the future we may include a validation mode that will perform extensive application code validation.

To debug an application with a regular debugger, please compile its code with the regular debugging flags and also the -keep flag. This flag will force Mercurium to dump the transformed code in the local file system, so that it will be available for the debugger.

To debug dependencies, it is advised to reduce the problem size so that very few tasks trigger the problem, and then let the runtime generate a graphical representation of the dependency graph as shown previously.

Processing the NANOS6 envar involves selecting at run time a runtime compiled for the corresponding instrumentation. This part of the bootstrap is performed by a component of the runtime called the "loader". To debug problems due to the installation, run the application with the NANOS6_LOADER_VERBOSE environment variable set to any value.

Runtime information

Information about the runtime may be obtained by running the application with the NANOS6_REPORT_PREFIX envar set, or by invoking the following command:

$ nanos6-info --runtime-details
Runtime path /opt/nanos6/lib/libnanos6-optimized.so.0.0.0
Runtime Version 2017-11-07 09:26:03 +0100 5cb1900
Runtime Branch master
Runtime Compiler Version g++ (Debian 7.2.0-12) 7.2.1 20171025
Runtime Compiler Flags -DNDEBUG -Wall -Wextra -Wdisabled-optimization -Wshadow -fvisibility=hidden -O3 -flto
Initial CPU List 0-3
NUMA Node 0 CPU List 0-3
Scheduler priority
Dependency Implementation linear-regions-fragmented
Threading Model pthreads

The NANOS6_REPORT_PREFIX envar may contain a string that will be prepended to each line. For instance, it can contain a sequence that starts a comment in the output of the program. Example:

$ NANOS6_REPORT_PREFIX="#" ./app
Some application output ...
#	string	version	2017-11-07 09:26:03 +0100 5cb1900		Runtime Version
#	string	branch	master		Runtime Branch
#	string	compiler_version	g++ (Debian 7.2.0-12) 7.2.1 20171025		Runtime Compiler Version
#	string	compiler_flags	-DNDEBUG -Wall -Wextra -Wdisabled-optimization -Wshadow -fvisibility=hidden -O3 -flto		Runtime Compiler Flags
#	string	initial_cpu_list	0-3		Initial CPU List
#	string	numa_node_0_cpu_list	0-3		NUMA Node 0 CPU List
#	string	scheduler	priority		Scheduler
#	string	dependency_implementation	linear-regions-fragmented		Dependency Implementation
#	string	threading_model	pthreads		Threading Model
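
The prefixed lines are easy to process with standard tools. The sketch below uses a hypothetical helper, report_value, and assumes that the fields are separated by tabs as in the example above, with the key in the third field and the value in the fourth.

```shell
# Read report lines from standard input and print the value of one key.
report_value() {
    awk -F '\t' -v key="$1" '$3 == key { print $4 }'
}

# Demonstration with one line copied from the example output above:
printf '#\tstring\tscheduler\tpriority\t\tScheduler\n' | report_value scheduler   # prints: priority
```

With a real application, the pipeline would be NANOS6_REPORT_PREFIX="#" ./app | report_value scheduler, assuming that the report is written to standard output.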