Allows safer access to model specific registers (MSRs)

MSR-SAFE
========

The msr-safe.ko module is built from the following source files:

    Makefile
    msr_entry.c         Original MSR driver with added calls to batch and
                        whitelist implementations.
    msr_batch.[ch]      MSR batching implementation
    msr_whitelist.[ch]  MSR whitelist implementation
    whitelists          Sample text whitelists that may be loaded into msr_safe

Kernel Build & Load
-------------------

The msr-safe.ko module can be built and loaded with the commands below. When
no command line arguments are specified, the kernel dynamically assigns a
major number to each device. After a successful load, `/dev/cpu` contains
`msr_batch` and `msr_whitelist`, and each CPU directory under `/dev/cpu/*`
contains an `msr_safe` device.

    $ git clone https://github.com/LLNL/msr-safe
    $ cd msr-safe
    $ make
    $ insmod msr-safe.ko
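
The device layout described above can be checked with a small script. The
following is a minimal sketch, not part of msr-safe: the function name
`check_msr_safe_devices` and its optional `ROOT` argument are hypothetical and
exist only so the check can also be tried against a scratch directory.

```shell
# check_msr_safe_devices [ROOT]
# Hypothetical helper: verify that ROOT (default /dev/cpu) contains msr_batch
# and msr_whitelist, and that every numbered CPU directory has an msr_safe.
check_msr_safe_devices() {
    root=${1:-/dev/cpu}
    status=0

    # Module-wide devices.
    for node in "$root/msr_batch" "$root/msr_whitelist"; do
        if [ ! -e "$node" ]; then
            echo "missing: $node" >&2
            status=1
        fi
    done

    # Per-CPU devices, one under each numbered directory.
    for cpu_dir in "$root"/[0-9]*; do
        [ -d "$cpu_dir" ] || continue
        if [ ! -e "$cpu_dir/msr_safe" ]; then
            echo "missing: $cpu_dir/msr_safe" >&2
            status=1
        fi
    done

    return "$status"
}
```

Running `check_msr_safe_devices` with no arguments after `insmod` returns
zero when every expected node is present and lists any missing node on stderr.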

Kernel Load with Command Line Arguments
---------------------------------------

Alternatively, this module can be loaded with command line arguments. Each
argument specifies the major device number to associate with a particular
device. When loading the module, you can specify one or all three of the msr
devices.

    $ insmod msr-safe.ko mdev_msr_safe=<#> \
                         mdev_msr_whitelist=<#> \
                         mdev_msr_batch=<#>
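
To confirm that a device received the requested major number, the number can
be read back from the device node. This is a sketch that assumes GNU coreutils
`stat`, whose `%t` format prints the major number in hexadecimal; the helper
name `dev_major` is hypothetical.

```shell
# dev_major NODE
# Hypothetical helper: print the major number of a device node in decimal.
# Assumes GNU stat, where %t is the major device number in hex.
dev_major() {
    printf '%d\n' "0x$(stat -c '%t' "$1")"
}
```

For example, `dev_major /dev/cpu/msr_batch` should print the value that was
passed as `mdev_msr_batch`.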

Configuration Notes After Install
---------------------------------

Set up permissions and group ownership for `/dev/cpu/msr_batch`,
`/dev/cpu/msr_whitelist`, and `/dev/cpu/*/msr_safe` as you see fit; the
whitelist limits which MSRs can be accessed and which bits can be written.
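
One possible policy is sketched below. It is an illustration only: the helper
name `set_msr_safe_perms`, the choice of modes (group read/write on
`msr_batch` and `msr_safe`, group read-only on `msr_whitelist`), and the
optional `ROOT` argument are all assumptions, not recommendations from the
msr-safe authors.

```shell
# set_msr_safe_perms GROUP [ROOT]
# Hypothetical helper: hand the msr-safe devices under ROOT (default /dev/cpu)
# to GROUP. Modes are an assumed policy; adjust them to local requirements.
set_msr_safe_perms() {
    group=$1
    root=${2:-/dev/cpu}

    for node in "$root/msr_batch" "$root/msr_whitelist" "$root"/[0-9]*/msr_safe; do
        [ -e "$node" ] || continue
        case $node in
            */msr_whitelist) mode=640 ;;  # group may read the whitelist only
            *)               mode=660 ;;  # group may read and write MSRs
        esac
        chgrp "$group" "$node" && chmod "$mode" "$node" || return 1
    done
}
```

On a real system this would be run as root, for example
`set_msr_safe_perms GROUP` with the group of users allowed to access MSRs.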

Sample whitelists for specific architectures are provided in the `whitelists/`
directory. These are meant to be a starting point and should be used with
caution. Each site may add entries to, remove entries from, or modify the
write masks in the whitelist depending on its specific needs.
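
As an illustration of modifying write masks, the sketch below derives a
read-only variant of a whitelist by zeroing every mask. It assumes that each
entry is a line of the form `0x<MSR address> 0x<write mask> # comment` and
that lines starting with `#` are comments; check this against the files in
`whitelists/` before relying on it. The helper name `readonly_whitelist` is
hypothetical.

```shell
# readonly_whitelist
# Hypothetical helper: copy a whitelist from stdin to stdout with every write
# mask replaced by zero. Assumes "0xADDR 0xMASK # comment" entry lines.
readonly_whitelist() {
    awk '
        /^0[xX]/ && NF >= 2 { $2 = "0x0000000000000000" }  # entry line: clear mask
        { print }                                          # comments pass through
    '
}
```

Usage would look like `readonly_whitelist < whitelists/wl_file > wl_readonly`,
where `wl_readonly` is an arbitrary output name.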

To load a whitelist:

    cat whitelists/wl_file > /dev/cpu/msr_whitelist

Where `wl_file` can be determined as follows:

    printf 'wl_%.2x%x\n' $(lscpu | grep "CPU family:" | awk -F: '{print $2}') $(lscpu | grep "Model:" | awk -F: '{print $2}')
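
The same name derivation can be written as a function that reads `lscpu`
output on stdin, which makes the formatting easy to check by hand: family 6
and model 63 give `wl_063f`. This is a sketch; the name `wl_name_from_lscpu`
is hypothetical, and the leading-whitespace handling is there because some
`lscpu` versions indent these lines.

```shell
# wl_name_from_lscpu
# Hypothetical helper: read lscpu-style "Key: value" lines on stdin and print
# the whitelist file name, using the same format string as the one-liner above.
wl_name_from_lscpu() {
    awk -F: '
        { key = $1; sub(/^[ \t]+/, "", key) }     # tolerate indented keys
        key == "CPU family" { family = $2 + 0 }   # decimal family number
        key == "Model"      { model  = $2 + 0 }   # decimal model number
        END { print family + 0, model + 0 }
    ' |
    {
        read -r family model
        # Two hex digits for the family, then the model in hex.
        printf 'wl_%.2x%x\n' "$family" "$model"
    }
}
```

Typical use is `lscpu | wl_name_from_lscpu`.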

To confirm that the whitelist was configured successfully:

    cat /dev/cpu/msr_whitelist

To enumerate the current whitelist (any output implies that the whitelist was
loaded successfully):

    cat < /dev/cpu/msr_whitelist

To remove the whitelist (as root):

    echo > /dev/cpu/msr_whitelist
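
The load and confirm steps above can be combined into one function. This is a
minimal sketch: the name `load_whitelist` and its optional device argument are
hypothetical, and a non-empty read back is simply taken to mean that the
whitelist was accepted.

```shell
# load_whitelist WL_FILE [DEV]
# Hypothetical helper: write WL_FILE to the whitelist device (default
# /dev/cpu/msr_whitelist), then read the device back to confirm the load.
load_whitelist() {
    wl_file=$1
    dev=${2:-/dev/cpu/msr_whitelist}

    cat "$wl_file" > "$dev" || return 1

    # Succeeds only if enumerating the whitelist produces some output.
    [ -n "$(cat "$dev")" ]
}
```

For example, `load_whitelist whitelists/wl_file && echo loaded`.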

msrsave
-------

The msrsave utility provides a mechanism for saving and restoring MSR values
based on entries in the whitelist. To restore MSR values, the register must
have an appropriate writemask.

Modifying MSRs that are marked as safe in the whitelist may affect subsequent
users on a shared HPC system. It is important that the resource manager on
such a system use the msrsave utility to save and restore MSR values between
allocations of compute nodes to users. An example of this has been implemented
for the SLURM resource manager as a SPANK plugin. This plugin can be built with
the "make spank" target and installed with the "make install-spank" target. It
uses the SLURM SPANK infrastructure to make a popen(3) call to the msrsave
command line utility in the job prologue and epilogue.
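
For resource managers other than SLURM, the same save and restore pattern can
be sketched with two shell hooks. This is an illustration, not the SPANK
plugin: the hook names `msr_prolog` and `msr_epilog` and the `MSRSAVE`
variable are hypothetical, and the bare `msrsave FILE` save form is an
assumption that should be checked against the usage text of your msrsave
build. Only the `msrsave -r FILE` restore form is taken from this repository.

```shell
# Command used to invoke msrsave; override when it is not on PATH.
MSRSAVE=${MSRSAVE:-msrsave}

# msr_prolog FILE
# Hypothetical prologue hook: save current MSR values to FILE before the job.
# Assumes that "msrsave FILE" is the save form.
msr_prolog() {
    "$MSRSAVE" "$1"
}

# msr_epilog FILE
# Hypothetical epilogue hook: restore the MSR values saved in FILE.
msr_epilog() {
    "$MSRSAVE" -r "$1"
}
```

A scheduler would call `msr_prolog` with a per-node file path before handing
the node to a user and `msr_epilog` with the same path after the job ends.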

Release
-------

msr-safe is released under the GPLv3 license. For more details, please see the
[LICENSE](https://github.com/LLNL/msr-safe/blob/master/LICENSE) file.