MLPerf Common - a collection of common MLPerf tools

MLPerf Logging

MLPerf Common can be installed with pip by adding the following line to your requirements.txt file:

git+https://github.com/NVIDIA/mlperf-common.git
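After installing the requirements, it can help to confirm that the package is visible to the interpreter before wiring it into a benchmark. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical, and the import name mlperf_common is taken from the import statements shown in this README.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located by the import system."""
    # find_spec returns None when no top-level package with that name is found.
    return importlib.util.find_spec(package_name) is not None


# Reports whether the package is importable in the current environment.
print("mlperf_common installed:", is_installed("mlperf_common"))
```

The check only locates the package; it does not import it, so it has no side effects on distributed initialization.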

Integration using torch.distributed (pytorch)

In a mlperf_logger.py module, define:

from mlperf_common.logging import MLLoggerWrapper
from mlperf_common.frameworks.pyt import PyTCommunicationHandler

mllogger = MLLoggerWrapper(PyTCommunicationHandler(), value=None)

Then use mllogger in other modules by importing it: from mlperf_logger import mllogger.
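Defining mllogger once at module level works because Python caches imported modules, so every importer receives the same object. The sketch below illustrates that behavior with an in-memory stand-in module; the name mlperf_logger_demo and the placeholder object are assumptions made for illustration and are not part of mlperf_common.

```python
import sys
import types

# Build a stand-in for mlperf_logger.py in memory. In a real project this
# would be an ordinary file on the import path.
demo_module = types.ModuleType("mlperf_logger_demo")
# Placeholder for: mllogger = MLLoggerWrapper(PyTCommunicationHandler(), value=None)
demo_module.mllogger = object()
sys.modules["mlperf_logger_demo"] = demo_module


def import_logger():
    """Mirror what `from mlperf_logger import mllogger` does in another module."""
    from mlperf_logger_demo import mllogger
    return mllogger


# Both imports resolve to the single cached module, hence the same object.
first = import_logger()
second = import_logger()
print(first is second)
```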

Integration using MPI (horovod/hugectr/mxnet/tensorflow)

In a global mlperf_logger.py module, define:

from mlperf_common.logging import MLLoggerWrapper
from mlperf_common.frameworks.mxnet import MPICommunicationHandler

mllogger = MLLoggerWrapper(MPICommunicationHandler(), value=None)

Then use mllogger by importing from mlperf_logger import mllogger in other modules.

Optionally, you can pass an MPI communicator when initializing MPICommunicationHandler():

from mpi4py import MPI  # assumes the communicator comes from mpi4py

comm = MPI.COMM_WORLD
mllogger = MLLoggerWrapper(MPICommunicationHandler(comm), value=None)

By default, MPICommunicationHandler() creates a global communicator.

Logging additional metrics

The MLPerf logger can also track additional, non-required metrics such as throughput. The recommended way is to add a line such as:

mllogger.event(key='tracked_stats', metadata={'step': epoch}, value={'throughput': throughput, 'metric_a': metric_a, 'metric_b': metric_b})

Here throughput should preferably be reported in samples per second and logged every epoch, or as often as is reasonable for a given benchmark. The additional metrics metric_a and metric_b can be any numerical values that need logging. The key tracked_stats and an increasing value for step are required.
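The call above can be wrapped in a small helper that computes samples per second and assembles the value dictionary. This is a sketch under stated assumptions: RecordingLogger is a hypothetical stand-in that only records calls so the example runs without mlperf_common, and log_tracked_stats is an illustrative helper, not a library function. Only the event(key=..., metadata=..., value=...) call shape comes from this README.

```python
class RecordingLogger:
    """Hypothetical stand-in for mllogger; it only records event() calls."""

    def __init__(self):
        self.events = []

    def event(self, key, value=None, metadata=None):
        self.events.append({"key": key, "value": value, "metadata": metadata})


def log_tracked_stats(logger, step, samples, elapsed_seconds, **extra_metrics):
    """Log throughput in samples per second plus any extra numeric metrics."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    stats = {"throughput": samples / elapsed_seconds}
    stats.update(extra_metrics)
    # The key 'tracked_stats' and the 'step' metadata entry are required.
    logger.event(key="tracked_stats", metadata={"step": step}, value=stats)
    return stats


logger = RecordingLogger()
# Fixed numbers keep the example deterministic; a real loop would time each epoch.
for epoch, (samples, seconds) in enumerate([(1000, 2.0), (1000, 2.5)]):
    log_tracked_stats(logger, step=epoch, samples=samples,
                      elapsed_seconds=seconds, metric_a=0.5)

print(logger.events[-1])
```

Passing the epoch counter as step keeps the step value increasing across calls, as required.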

Scaleout Bridge

init_bridge

Instead of the previous sbridge = init_bridge(rank), initialize sbridge as follows:

from mlperf_common.frameworks.pyt import PyTNVTXHandler, PyTCommunicationHandler

sbridge = init_bridge(PyTNVTXHandler(), PyTCommunicationHandler(), mllogger)

or, for horovod/tf/mxnet:

from mlperf_common.frameworks.mxnet import MXNetNVTXHandler, MPICommunicationHandler

sbridge = init_bridge(MXNetNVTXHandler(), MPICommunicationHandler(), mllogger)

Then start and stop profiling as usual:

sbridge.start_prof()
sbridge.stop_prof()
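If the profiled region can raise, one option is to pair the two calls in a context manager so that stop_prof() always runs. The sketch below is an illustration only: FakeBridge is a hypothetical stand-in exposing the start_prof() and stop_prof() methods named above, and profiled is a helper introduced here, not part of mlperf_common.

```python
from contextlib import contextmanager


class FakeBridge:
    """Hypothetical stand-in for sbridge; it records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start_prof(self):
        self.calls.append("start_prof")

    def stop_prof(self):
        self.calls.append("stop_prof")


@contextmanager
def profiled(bridge):
    """Start profiling on entry and stop it on exit, even if the body raises."""
    bridge.start_prof()
    try:
        yield bridge
    finally:
        bridge.stop_prof()


bridge = FakeBridge()
try:
    with profiled(bridge):
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

print(bridge.calls)  # ['start_prof', 'stop_prof']
```

With a real bridge, the same helper would be used as: with profiled(sbridge): followed by the training step to profile.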

EmptyObject

The current ScaleoutBridgeBase class replaces the previous EmptyObject class, so simply replace EmptyObject() with ScaleoutBridgeBase().
