rename powmon to var_monitor throughout (#523)
slabasan committed Mar 12, 2024
1 parent 732adeb commit 7a006bf
Showing 24 changed files with 167 additions and 166 deletions.
6 changes: 3 additions & 3 deletions scripts/license.py
@@ -33,9 +33,9 @@
r"^src/examples/using-with-cmake/c/.*\.c$",
r"^src/examples/using-with-cmake/c\+\+/.*\.c$",
r"^src/examples/using-with-make/c/.*\.c$",
# variorum powmon
r"^src/powmon/.*CMakeLists.txt$",
r"^src/powmon/.*\.[ch]$",
# variorum monitoring utility
r"^src/var_monitor/.*CMakeLists.txt$",
r"^src/var_monitor/.*\.[ch]$",
# variorum tests
r"^src/tests/.*CMakeLists.txt$",
r"^src/tests/.*\.cpp$",
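As a quick sanity check of the renamed patterns, the sketch below tests a few repository-relative paths against the two `var_monitor` expressions from this hunk. The helper name `is_covered` and the use of `re.search` are assumptions made for illustration; `scripts/license.py` may apply its patterns differently.

```python
import re

# Patterns copied from the updated scripts/license.py hunk.
VAR_MONITOR_PATTERNS = [
    r"^src/var_monitor/.*CMakeLists.txt$",
    r"^src/var_monitor/.*\.[ch]$",
]


def is_covered(path, patterns=VAR_MONITOR_PATTERNS):
    """Return True if any pattern matches the repository-relative path.

    Hypothetical helper: it assumes the patterns are applied with
    re.search, which may not be how license.py uses them.
    """
    return any(re.search(pattern, path) for pattern in patterns)


if __name__ == "__main__":
    for path in (
        "src/var_monitor/var_monitor.c",
        "src/var_monitor/CMakeLists.txt",
        "src/powmon/powmon.c",
    ):
        print(path, is_covered(path))
```

Paths under the old `src/powmon/` directory no longer match, which is the intended effect of the rename.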
4 changes: 2 additions & 2 deletions src/CMakeLists.txt
@@ -89,8 +89,8 @@ endif()
### Add our examples
add_subdirectory(examples)

### Add powmon sampler
add_subdirectory(powmon)
### Add var_monitor sampler
add_subdirectory(var_monitor)

### Add config helpers
add_subdirectory(config)
4 changes: 2 additions & 2 deletions src/docs/sphinx/BuildingVariorum.rst
@@ -9,8 +9,8 @@
###################

Variorum can be built from source with CMake or with ``spack``. Building
Variorum creates the ``libvariorum`` library, the ``powmon`` monitoring tool,
and Variorum examples.
Variorum creates the ``libvariorum`` library, the ``var_monitor`` monitoring
tool, and Variorum examples.

********************
Build Dependencies
83 changes: 0 additions & 83 deletions src/docs/sphinx/Powmon.rst

This file was deleted.

83 changes: 83 additions & 0 deletions src/docs/sphinx/VarMonitor.rst
@@ -0,0 +1,83 @@
..
   # Copyright 2019-2023 Lawrence Livermore National Security, LLC and other
   # Variorum Project Developers. See the top-level LICENSE file for details.
   #
   # SPDX-License-Identifier: MIT

###################################
 Monitoring Binaries with Variorum
###################################

While the Variorum API allows for detailed critical path analysis of the power
profile of user applications as well as for integration with system software
such as Kokkos, Caliper, and Flux through code annotations, there are scenarios
where such annotations are not possible. In order to support such scenarios, we
provide the ``var_monitor`` tool, which can monitor a binary externally with
Variorum in a vendor-neutral manner. This tool can monitor an application
externally without requiring any code changes or annotations.

The ``variorum/src/var_monitor`` directory contains this tool, which is built
along with the regular Variorum build. While a target executable is running,
``var_monitor`` collects time samples of power usage, power limits, energy,
thermals, and other performance counters for all sockets in a node at a regular
interval. By default, it collects basic node-level power information, such as
CPU, memory, and GPU power, at 50ms intervals, which it reports in a CSV format.
It also supports a verbose (``-v``) mode, where additional registers and sensors
are sampled for the advanced user. The sampling rate is configurable with the
``-i`` option. As an example, the command below will sample the power usage
while executing a sleep for 10 seconds in a vendor neutral manner:

.. code:: bash

    $ var_monitor -a "sleep 10"

The resulting data is written to two files:

.. code:: bash

    hostname.var_monitor.dat
    hostname.var_monitor.summary

Here, ``hostname`` will change based on the node where the monitoring is
occurring. The ``summary`` file contains global information such as execution
time. The ``dat`` file contains the time sampled data, such as power, thermals,
and performance counters in a column-delimited format. The output differs on
each platform based on available counters.
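The file naming and default sampling interval described above can be sketched as follows. This is only an illustration: the function names are hypothetical, the sample count is a rough estimate that ignores startup and teardown time, and Python's `socket.gethostname()` is assumed to return the same name the tool uses when it builds its file names.

```python
import socket


def var_monitor_output_files(hostname=None):
    """Return the (dat, summary) file names expected for a node.

    Hypothetical helper mirroring the "<hostname>.var_monitor.*" naming
    described in the documentation.
    """
    if hostname is None:
        # Assumption: this matches the host name the tool itself sees.
        hostname = socket.gethostname()
    return (f"{hostname}.var_monitor.dat", f"{hostname}.var_monitor.summary")


def approx_sample_count(duration_s, interval_ms=50):
    """Estimate how many samples a run of duration_s seconds produces.

    Rough estimate only: assumes one sample per interval and ignores
    any startup or shutdown overhead.
    """
    return round(duration_s * 1000) // interval_ms


if __name__ == "__main__":
    print(var_monitor_output_files("node1"))
    print(approx_sample_count(10))
```

Under these assumptions, the ``sleep 10`` example at the default 50 ms interval would yield on the order of 200 rows in the ``dat`` file.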

``var_monitor`` also supports profiling across multiple nodes with the help of
resource manager commands (such as ``srun`` or ``jsrun``) or MPI commands (such
as ``mpirun``). As shown in the example below, the user can specify the number
of nodes through ``mpirun`` and utilize ``var_monitor`` with their application.

.. code:: bash

    $ mpirun -np <num-nodes> ./var_monitor -a ./application

We also provide a set of simple plotting scripts for ``var_monitor``, which are
located in the ``src/var_monitor/scripts`` folder. The ``var_monitor-plot.py``
script can generate per-node as well as aggregated (across multiple nodes)
graphs for the default version of ``var_monitor`` that provides node-level and
CPU, GPU and memory data. This script works across all architectures that
support Variorum's JSON API for collecting power. Additionally, for IBM sensors
data, which can be obtained with the ``var_monitor -v`` (verbose) option, we
provide a post processing and R script for plots.

In addition to ``var_monitor`` that is vendor-neutral, for Intel systems only,
we provide two other power capping tools, ``power_wrapper_static``, and
``power_wrapper_dynamic`` that allow users to set a static (or dynamic) power
cap and then monitor their binary application.

The example below will set a package-level power limit of 100W on each socket,
and then sample the power usage while executing a sleep for 10 seconds:

.. code:: bash

    $ power_wrapper_static -w 100 -a "sleep 10"

Similarly, the example below will set an initial package-level power limit of
100W on each socket, sample the power usage, and then dynamically adjust the
power cap step-wise every 500ms while executing a sleep for 10 seconds:

.. code:: bash

    $ power_wrapper_dynamic -w 100 -a "sleep 10"

13 changes: 7 additions & 6 deletions src/docs/sphinx/VariorumAPI.rst
@@ -122,11 +122,12 @@ The API to obtain node utilization has the following format. It takes a string
(``char**``) by reference as input, and populates this string with a JSON object
with total CPU, system CPU, user CPU, total memory, and GPU (when available)
utilizations. It reports the utilization of each available GPU. GPU utilization
is obtained using the NVML and RSMI APIs. The total memory utilization is computed
using ``/proc/meminfo``, and CPU utilizations is computed using ``/proc/stat``.
is obtained using the NVML and RSMI APIs. The total memory utilization is
computed using ``/proc/meminfo``, and CPU utilizations is computed using
``/proc/stat``.
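For readers unfamiliar with ``/proc/stat``, the sketch below shows one common way to derive total CPU utilization from two snapshots of its aggregate ``cpu`` line. It is a generic illustration with hypothetical function names, not necessarily the formula Variorum itself uses.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (idle, total) ticks."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # Guest time is already accounted for in user/nice, so sum only the
    # first eight fields to avoid double counting.
    total = sum(values[:8])
    return idle, total


def cpu_utilization(before, after):
    """Return total CPU utilization in percent between two snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


if __name__ == "__main__":
    # Synthetic snapshots; on Linux, read the first line of /proc/stat twice.
    first = "cpu 100 0 100 800 0 0 0 0 0 0"
    second = "cpu 150 0 150 900 0 0 0 0 0 0"
    print(cpu_utilization(first, second))
```

Here the busy fraction is one minus the share of elapsed ticks spent idle or waiting on I/O; other tools may treat ``iowait`` differently.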

The ``variorum_get_utilization_json(char **get_util_obj_str)`` function
returns a string type nested JSON object. An example is provided below:
The ``variorum_get_utilization_json(char **get_util_obj_str)`` function returns
a string type nested JSON object. An example is provided below:

.. code::
@@ -149,8 +150,8 @@ returns a string type nested JSON object. An example is provided below:
The ``*`` here refers to socket ID, and the ``#`` refers to GPU ID.

The ``variorum_get_utilization_json(char **get_util_obj_str)`` function
returns a string type nested JSON object. An example is provided below:
The ``variorum_get_utilization_json(char **get_util_obj_str)`` function returns
a string type nested JSON object. An example is provided below:

.. code::
2 changes: 1 addition & 1 deletion src/docs/sphinx/index.rst
@@ -84,7 +84,7 @@ the nation's exascale computing imperative.
VariorumAPI
Examples
HWArchitectures
Powmon
VarMonitor
Utilities

.. toctree::
2 changes: 1 addition & 1 deletion src/examples/variorum-monitoring-to-file-example.c
@@ -59,7 +59,7 @@ int main(int argc, char **argv)
}

gethostname(hostname, 1024);
ret = asprintf(&fname, "%s.powmon.dat", hostname);
ret = asprintf(&fname, "%s.var_monitor.dat", hostname);
if (ret < 0)
{
printf("Fatal Error: Cannot allocate memory for fname.\n");
2 changes: 1 addition & 1 deletion src/examples/variorum-poll-power-to-file-example.c
@@ -59,7 +59,7 @@ int main(int argc, char **argv)
}

gethostname(hostname, 1024);
ret = asprintf(&fname, "%s.powmon.dat", hostname);
ret = asprintf(&fname, "%s.var_monitor.dat", hostname);
if (ret < 0)
{
printf("Fatal Error: Cannot allocate memory for fname.\n");
20 changes: 0 additions & 20 deletions src/powmon/scripts/powmon-ibm-post-process.sh

This file was deleted.

12 changes: 6 additions & 6 deletions src/powmon/CMakeLists.txt → src/var_monitor/CMakeLists.txt
@@ -10,13 +10,13 @@ set(CMAKE_CXX_FLAGS "-pthread")

message(STATUS "Adding variorum demoapps")

set(powmon_sources
set(var_monitor_sources
highlander.c
powmon.c
var_monitor.c
)
message(STATUS " [*] Adding demoapp: powmon")
add_executable(powmon ${powmon_sources})
target_link_libraries(powmon variorum ${variorum_deps})
message(STATUS " [*] Adding demoapp: var_monitor")
add_executable(var_monitor ${var_monitor_sources})
target_link_libraries(var_monitor variorum ${variorum_deps})

set(power_wrapper_static_sources
highlander.c
@@ -37,7 +37,7 @@ target_link_libraries(power_wrapper_dynamic variorum ${variorum_deps})
include_directories(${CMAKE_SOURCE_DIR}/variorum
${CMAKE_SOURCE_DIR}/variorum/Intel)

install(TARGETS powmon power_wrapper_static power_wrapper_dynamic
install(TARGETS var_monitor power_wrapper_static power_wrapper_dynamic
DESTINATION bin)

# quick hack
20 changes: 10 additions & 10 deletions src/powmon/README.md → src/var_monitor/README.md
@@ -1,9 +1,9 @@
POWMON
======
VAR_MONITOR
===========
This directory contains three Variorum-based power monitors. The resulting
data is written to two files:
* hostname.powmon.dat
* hostname.powmon.summary
* hostname.var_monitor.dat
* hostname.var_monitor.summary

`hostname` will change based on the node where the monitoring is occurring. The
`summary` file contains global information such as execution time. The `dat`
@@ -16,20 +16,20 @@ All three monitors are wrappers around some other process that will be
executing on the node and includes logic so that only one power monitor is run
per node.

powmon
------
var_monitor
-----------
While a target executable is running, sample power usage and power limits (and
other performance counters) for all sockets in a node at a regular interval.

The example below will sample the power usage while executing a sleep for 10
seconds:

$ powmon -a "sleep 10"
$ var_monitor -a "sleep 10"

Powmon also allows sampling of utilization. The example below will sample
The var_monitor also allows sampling of utilization. The example below will sample
utilization metrics as well as power while executing a sleep for 10 seconds:

$ powmon -u -a "sleep 10"
$ var_monitor -u -a "sleep 10"

power_wrapper_static
--------------------
@@ -61,6 +61,6 @@ If you launch one of the power monitors and it appears to finish successfully,
but does not produce the result files, launch the power monitoring with the
`-c` flag. This will remove any semaphores leftover in shared memory.

$ powmon -c
$ var_monitor -c
$ power_wrapper_static -c
$ power_wrapper_dynamic -c
2 changes: 1 addition & 1 deletion src/powmon/common.c → src/var_monitor/common.c
@@ -81,7 +81,7 @@ void parse_json_power_obj(char *s, int num_sockets)
}

// If we're on a CPU-only build, we don't have num_gpus_per_socket.
// Powmon doesn't need to print this, but needs to know this value.
// var_monitor doesn't need to print this, but needs to know this value.
if (json_object_get(node_obj, "num_gpus_per_socket") != NULL)
{
num_gpus_per_socket = json_integer_value(json_object_get(node_obj,
File renamed without changes.
File renamed without changes.
@@ -199,7 +199,7 @@ int main(int argc, char **argv)
char hostname[64];
gethostname(hostname, 64);

rc = asprintf(&fname_dat, "%s.powmon.dat", hostname);
rc = asprintf(&fname_dat, "%s.var_monitor.dat", hostname);
if (rc == -1)
{
fprintf(stderr,
@@ -154,7 +154,7 @@ int main(int argc, char **argv)
char hostname[64];
gethostname(hostname, 64);

rc = asprintf(&fname_dat, "%s.powmon.dat", hostname);
rc = asprintf(&fname_dat, "%s.var_monitor.dat", hostname);
if (rc == -1)
{
fprintf(stderr,