# NRM Python upstream client library tutorial

This tutorial covers the use of NRM's Python upstream client library, in the context of running an external resource management strategy. The cell outputs are deterministic, and the executed version vendored in the source tree is checked by the project's CI, so its behavior should always be up to date with the latest version of the software. No cell should throw an exception.

This notebook uses `nrm`'s Python library bindings and needs the `nrmd` daemon on the `$PATH`. Assuming the project is cloned and the code unmodified, run the following from the root of the project before running the notebook:

```bash
./shake.sh build
./shake.sh pyclient 
```

The next cell sets the working directory of the notebook to the root of the project.

In [1]:
%%capture
cd ..

In [2]:
%load_ext nb_black


The next cell imports the upstream client library, including the `Local` and `Remote` host classes.

In [3]:
from nrm.tooling import Local, Remote, lib


This notebook will start `nrmd` on the same machine as the notebook, but the same interface should be available for remote execution:

In [4]:
host = Local()
# host=Remote( target="cc@129.114.108.201")


Note that the two classes (`Local` and `Remote`) offer the same methods to start `nrmd` and interact with it via blocking message passing primitives. The supported methods are the following:

In [5]:
import inspect

for a, x in inspect.getmembers(host, predicate=inspect.ismethod):
    print("%s: %s" % (a, x.__doc__))

__init__: None
check_daemon:  checks if nrmd is alive 
run_workload:  Runs a workload via NRM. The `nrmd` daemon must be running. 
start_daemon:  start nrmd 
stop_daemon:  stops nrmd 
workload_exit_status:  Check the workload's exit status. 
workload_finished:  Checks NRM to see whether all tasks are finished. 
workload_recv:  Receive a message from NRM's upstream API. 
workload_send:  Send a message to NRM's upstream API. 


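Since both classes are meant to expose the same methods, a strategy can be written against the method names alone. One quick way to check that kind of assumption is to compare public method names with `inspect`. The sketch below uses two hypothetical stand-in classes (`FakeLocal` and `FakeRemote`), not the real `nrm.tooling` classes, and `public_methods` is a helper introduced only for this illustration.

```python
import inspect


class FakeLocal:
    """Hypothetical stand-in for a local host (not nrm.tooling.Local)."""

    def start_daemon(self, cfg):
        """start nrmd"""

    def stop_daemon(self):
        """stops nrmd"""


class FakeRemote:
    """Hypothetical stand-in for a remote host (not nrm.tooling.Remote)."""

    def start_daemon(self, cfg):
        """start nrmd"""

    def stop_daemon(self):
        """stops nrmd"""


def public_methods(cls):
    """Return the set of public method names defined on a class."""
    return {
        name
        for name, _ in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_")
    }


# True when both classes expose exactly the same public method names.
same_interface = public_methods(FakeLocal) == public_methods(FakeRemote)
print(same_interface)
```

The same helper could be pointed at the real `Local` and `Remote` classes once the library is importable.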

The next cell defines some node (`nrmd`) daemon configurations as dictionaries:

In [6]:
daemonCfgs = {
    "redirected_log": {"logfile": "/tmp/logfile_experiment1"},
    # "other":'todo'
}


`nrmd`'s configuration can be defined in JSON, YAML, or [Dhall](https://dhall-lang.org/). Admissible values are defined in the file [resources/configurationSchema.json](./resources/configurationSchema.json), and are alternatively available as a Dhall type in [resources/types/Cfg.dhall](resources/types/Configuration.dhall). Schema files get large, so the next cell resolves the Dhall defaults file, which gives a more readable view of the configuration fields.

In [7]:
%%script dhall resolve
./resources/defaults/Cfg.dhall

{ argo_nodeos_config =
    "argo_nodeos_config"
, argo_perf_wrapper =
    "nrm-perfwrapper"
, controlCfg =
    None
    { learnCfg :
        < KnapsackConstraints :
            { _1 : { _1 : Double } }
        | LagrangeConstraints :
            { _1 : { _1 : Double } }
        >
    , minimumControlInterval :
        { fromuS : Double }
    , referenceMeasurementRoundInterval :
        Integer
    , speedThreshold :
        Double
    }
, downstreamCfg =
    { downstreamBindAddress = "ipc:///tmp/nrm-downstream-event" }
, dummy =
    True
, hwloc =
    "hwloc"
, hwmonCfg =
    { hwmonEnabled = True, hwmonPath = "/sys/class/hwmon" }
, libnrmPath =
    None Text
, logfile =
    "/tmp/nrm.log"
, nodeos =
    False
, perf =
    "perf"
, pmpi_lib =
    "pmpi_lib"
, raplCfg =
    None { raplFrequency : { fromHz : Double }, raplPath : Text }
, singularity =
    False
, slice_runtime =
    < Dummy | Nodeos | Singularity >.Dummy
, upstreamCfg =
    { pubPort = +2345, rpcPort = +3456, upstreamBi


Optional values are filled using defaults that can be found in [resources/defaults/Cfg.json](resources/defaults/Configuration.json) (also available in the Dhall format):

In [8]:
%%bash
cat ./resources/defaults/Cfg.json | jq

{
  "pmpi_lib": "pmpi_lib",
  "verbose": "error",
  "logfile": "/tmp/nrm.log",
  "singularity": false,
  "argo_nodeos_config": "argo_nodeos_config",
  "upstreamCfg": {
    "upstreamBindAddress": "*",
    "rpcPort": 3456,
    "pubPort": 2345
  },
  "perf": "perf",
  "argo_perf_wrapper": "nrm-perfwrapper",
  "downstreamCfg": {
    "downstreamBindAddress": "ipc:///tmp/nrm-downstream-event"
  },
  "nodeos": false,
  "hwloc": "hwloc",
  "dummy": true,
  "slice_runtime": "dummy",
  "hwmonCfg": {
    "hwmonPath": "/sys/class/hwmon",
    "hwmonEnabled": true
  }
}



A workload needs a command, some arguments, and a manifest, and is also represented as a Python dictionary.

In [9]:
workloads = {
    "dummy": [
        {
            "cmd": "sleep",
            "args": ["5"],
            "sliceID": "toto",
            "manifest": {"app": {"slice": {"cpus": 1, "mems": 1}}, "name": "default"},
        }
    ],
    # "other":todo
}

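Writing workload entries by hand gets repetitive once there are several of them. The sketch below builds one entry from a shell-style command line. `make_workload` is a hypothetical helper (it is not part of `nrm.tooling`) and it only assumes the dictionary layout used in the cell above.

```python
import shlex


def make_workload(command_line, slice_id, manifest):
    """Build one workload entry (hypothetical helper, mirrors the layout above)."""
    # Split the command line the way a POSIX shell would.
    cmd, *args = shlex.split(command_line)
    return {"cmd": cmd, "args": args, "sliceID": slice_id, "manifest": manifest}


manifest = {"app": {"slice": {"cpus": 1, "mems": 1}}, "name": "default"}
entry = make_workload("sleep 5", "toto", manifest)
print(entry["cmd"], entry["args"])
```

Under these assumptions, `[entry]` has the same shape as `workloads["dummy"]`.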

Example manifest files are in [resources/examples](../resources/examples) in JSON/YAML/Dhall format. For instance, the manifest file [resources/examples/perfwrap.json](../resources/examples/perfwrap.json) enables performance monitoring:

In [10]:
%%bash
cat resources/examples/perfwrap.json | jq

{
  "hwbind": false,
  "app": {
    "scheduler": {
      "fIFO": {}
    },
    "power": {
      "slowdown": 1,
      "profile": false,
      "policy": "noPowerPolicy"
    },
    "perfwrapper": {
      "perfwrapper": {
        "perfLimit": 100000,
        "perfFreq": 1
      }
    },
    "slice": {
      "cpus": 1,
      "mems": 1
    }
  },
  "name": "default"
}



Manifest options are documented in schema file [resources/manifestSchema.json](../resources/manifestSchema.json). The next cell shows the corresponding [Dhall](https://dhall-lang.org/) type.

In [11]:
%%script dhall resolve
./resources/types/Manifest.dhall

{ app :
    { instrumentation :
        Optional { ratelimit : { fromHz : Double } }
    , perfwrapper :
        < Perfwrapper :
            { _1 :
                { perfFreq :
                    { fromHz : Double }
                , perfLimit :
                    { fromOps : Integer }
                }
            }
        | PerfwrapperDisabled
        >
    , power :
        { policy :
            < Combined | DDCM | DVFS | NoPowerPolicy >
        , profile :
            Bool
        , slowdown :
            Integer
        }
    , scheduler :
        < FIFO | HPC | Other : { _1 : Integer } >
    , slice :
        { cpus : Integer, mems : Integer }
    }
, hwbind :
    Bool
, image :
    Optional
    { binds : Optional (List Text), magetype : < Docker | Sif >, path : Text }
, name :
    Text
}



Under-specified manifests like the one in our `workloads` above (which omits optional fields from the schema) have their missing values filled with defaults, located in the file [resources/defaults/Manifest.json](../../resources/examples/default.json):

In [12]:
%%bash
cat resources/defaults/Manifest.json | jq

{
  "hwbind": false,
  "app": {
    "scheduler": {
      "fIFO": {}
    },
    "power": {
      "slowdown": 1,
      "profile": false,
      "policy": "noPowerPolicy"
    },
    "perfwrapper": {
      "perfwrapperDisabled": {}
    },
    "slice": {
      "cpus": 1,
      "mems": 1
    }
  },
  "name": "default"
}


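One way to picture this default filling is a recursive dictionary merge, where values from the under-specified manifest override the defaults. The sketch below is an illustration in plain Python, not NRM's actual implementation. `fill_defaults` is a hypothetical helper, the defaults are copied from the output above, and the `cpus` value of 2 is chosen only to make the override visible.

```python
import copy


def fill_defaults(partial, defaults):
    """Recursively overlay `partial` on a deep copy of `defaults` (illustrative only)."""
    merged = copy.deepcopy(defaults)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them field by field.
            merged[key] = fill_defaults(value, merged[key])
        else:
            # Otherwise the value from `partial` wins.
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "hwbind": False,
    "app": {
        "scheduler": {"fIFO": {}},
        "power": {"slowdown": 1, "profile": False, "policy": "noPowerPolicy"},
        "perfwrapper": {"perfwrapperDisabled": {}},
        "slice": {"cpus": 1, "mems": 1},
    },
    "name": "default",
}
partial = {"app": {"slice": {"cpus": 2, "mems": 1}}, "name": "default"}

full = fill_defaults(partial, defaults)
print(full["app"]["slice"], full["app"]["power"]["policy"])
```

A plain merge like this would not handle union-typed fields such as `perfwrapper` correctly if the partial manifest picked a different variant, so it is best treated as a mental model only.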

The `dhall` and `dhall-to-json` utilities are available as a convenience in this environment should you need them. Dhall is also useful as a configuration language in its own right.

In [32]:
%%script dhall-to-json
let Manifest = ./resources/types/Manifest.dhall 
let appendName = \(m: Manifest) -> m // {name = m.name ++ "-appended" }
in appendName ./resources/defaults/Manifest.dhall

{"image":null,"hwbind":false,"app":{"scheduler":"FIFO","instrumentation":null,"power":{"slowdown":1,"profile":false,"policy":"NoPowerPolicy"},"perfwrapper":"PerfwrapperDisabled","slice":{"cpus":1,"mems":1}},"name":"default-appended"}



Remember that any JSON document is one small step away from being a Python dictionary:

In [33]:
import json

with open("resources/defaults/Manifest.json") as f:
    print(json.load(f))

{'hwbind': False, 'app': {'scheduler': {'fIFO': {}}, 'power': {'slowdown': 1, 'profile': False, 'policy': 'noPowerPolicy'}, 'perfwrapper': {'perfwrapperDisabled': {}}, 'slice': {'cpus': 1, 'mems': 1}}, 'name': 'default'}


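The reverse direction is just as short: a manifest built as a Python dictionary can be written back to JSON with the standard `json` module. The sketch below round-trips the manifest from the `workloads` cell through a temporary file. The file name `manifest.json` is arbitrary.

```python
import json
import os
import tempfile

manifest = {"app": {"slice": {"cpus": 1, "mems": 1}}, "name": "default"}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "manifest.json")
    # Serialize the dictionary to a JSON file.
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
    # Read it back into a Python dictionary.
    with open(path) as f:
        reloaded = json.load(f)

print(reloaded == manifest)
```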

The next cell defines some experiments:

In [34]:
experiments = {
    "example": (daemonCfgs["redirected_log"], workloads["dummy"]),
    # "other": todo
}

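With more than one daemon configuration or workload, the experiment table can be generated instead of written by hand. The sketch below pairs every configuration with every workload using `itertools.product`. The variable names and the `<config>/<workload>` key format are arbitrary choices for this illustration, and the values are copied from the cells above.

```python
import itertools

# Values copied from the cells above.
daemon_cfgs = {"redirected_log": {"logfile": "/tmp/logfile_experiment1"}}
workload_sets = {
    "dummy": [
        {
            "cmd": "sleep",
            "args": ["5"],
            "sliceID": "toto",
            "manifest": {"app": {"slice": {"cpus": 1, "mems": 1}}, "name": "default"},
        }
    ]
}

# Pair every daemon configuration with every workload.
all_experiments = {
    f"{cfg_name}/{wl_name}": (cfg, wl)
    for (cfg_name, cfg), (wl_name, wl) in itertools.product(
        daemon_cfgs.items(), workload_sets.items()
    )
}

print(list(all_experiments))
```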

The next two cells show how to start and stop the daemon. A failure in either of them indicates a problem with NRM's setup.

In [35]:
host.start_daemon(daemonCfgs["redirected_log"])
assert host.check_daemon()


In [36]:
host.stop_daemon()
assert not host.check_daemon()


We are now ready to run an external resource management strategy using the low-level message passing interface:

In [29]:
for name, (daemonCfg, workload) in experiments.items():
    # start the daemon (acts as a silent restart)
    host.start_daemon(daemonCfg)
    # run the workload
    host.run_workload(workload)
    # message passing exchange:
    while not host.workload_finished():
        measurement_message = host.workload_recv()
        command_message = "insert your code here"
        host.workload_send(command_message)
    print(host.workload_exit_status())
    print(host.check_daemon())

host.stop_daemon()

KeyboardInterrupt: 

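To develop the strategy itself without a running daemon, the same loop can be exercised against a stub. In the sketch below, `StubHost` is a hypothetical stand-in that only mimics the method names listed earlier. The message contents (plain dictionaries with a `power` field and a `cap` command) and the threshold of 100 are invented for illustration and do not reflect NRM's actual upstream message format.

```python
class StubHost:
    """Hypothetical stand-in for Local/Remote; mimics method names only."""

    def __init__(self, measurements):
        self._pending = list(measurements)
        self.sent = []

    def workload_finished(self):
        # The stub workload is finished once all measurements are consumed.
        return not self._pending

    def workload_recv(self):
        return self._pending.pop(0)

    def workload_send(self, message):
        self.sent.append(message)


def strategy(measurement):
    """Toy policy (assumption): lower the cap when power exceeds 100."""
    return {"cap": 90} if measurement["power"] > 100 else {"cap": 110}


host_stub = StubHost([{"power": 95}, {"power": 120}])
while not host_stub.workload_finished():
    measurement_message = host_stub.workload_recv()
    host_stub.workload_send(strategy(measurement_message))

print(host_stub.sent)
```

Keeping the decision logic in a separate function such as `strategy` means it can be tested this way and then dropped into the real loop in place of the `"insert your code here"` placeholder.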