-
Notifications
You must be signed in to change notification settings - Fork 24
/
README
92 lines (66 loc) · 3.3 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
MSR-SAFE
========
The msr-safe.ko module is comprised of the following source files:
Makefile
msr_entry.c Original MSR driver with added calls to batch and
whitelist implementations.
msr_batch.[ch] MSR batching implementation
msr_whitelist.[ch] MSR whitelist implementation
whitelists Sample text whitelist that may be input to msr_safe
Kernel Build & Load
-------------------
Building and loading the msr-safe.ko module can be done with the commands
below. When no command line arguments are specified, the kernel will
dynamically assign major numbers to each device. A successful load of the
msr-safe kernel module will have `msr_batch` and `msr_whitelist` in `/dev/cpu`,
and will have an `msr_safe` present under each CPU directory in `/dev/cpu/*`.
$ git clone https://github.com/LLNL/msr-safe
$ cd msr-safe
$ make
$ insmod msr-safe.ko
Kernel Load with Command Line Arguments
---------------------------------------
Alternatively, this module can be loaded with command line arguments. The
arguments specify the major device number you want to associate with a
particular device. When loading the kernel, you can specify 1 or all 3 of the
msr devices.
$ insmod msr-safe.ko mdev_msr_safe=<#> \
mdev_msr_whitelist=<#> \
mdev_msr_batch=<#>
Configuration Notes After Install
---------------------------------
Setup permissions and group ownership for `/dev/cpu/msr_batch`,
`/dev/cpu/msr_whitelist`, and `/dev/cpu/*/msr_safe` as you like since the
whitelist will protect you from harm.
Sample whitelists for specific architectures are provided in `whitelists/`
directory. These are meant to be a starting point, and should be used with
caution. Each site may add to, remove from, or modify the write masks in the
whitelist depending on specific needs.
To configure whitelist:
cat whitelist/wl_file > /dev/cpu/msr_whitelist
Where `wl_file` can be determined as follows:
printf 'wl_%.2x%x\n' $(lscpu | grep "CPU family:" | awk -F: '{print $2}') $(lscpu | grep "Model:" | awk -F: '{print $2}')
To confirm successful whitelist configured:
cat /dev/cpu/msr_whitelist
To enumerate the current whitelist (i.e., implies whitelist was loaded
successfully):
cat < /dev/cpu/msr_whitelist
To remove whitelist (as root):
echo > /dev/cpu/msr_whitelist
msrsave
-------
The msrsave utility provides a mechanism for saving and restoring MSR values
based on entries in the whitelist. To restore MSR values, the register must
have an appropriate writemask.
Modification of MSRs that are marked as safe in the whitelist may impact
subsequent users on a shared HPC system. It is important the resource manager
on such a system use the msrsave utility to save and restore MSR values between
allocating compute nodes to users. An example of this has been implemented for
the SLURM resource manager as a SPANK plugin. This plugin can be built with the
"make spank" target and installed with the "make install-spank" target. This
uses the SLURM SPANK infrastructure to make a popen(3) call to the msrsave
command line utility in the job epilogue and prologue.
Release
-------
msr-safe is released under the GPLv3 license. For more details, please see the
[LICENSE](https://github.com/LLNL/msr-safe/blob/master/LICENSE) file.