@@ -14,7 +14,7 @@ DESCRIPTION
This module attempts to determine the job which originated this connection.
The module is configurable; these are the default steps:

- 1) Check the local stepds for a count of jobs owned by the non-root user
+ 1) Check the local stepd for a count of jobs owned by the non-root user
      a) If none, deny (option action_no_jobs)
      b) If only one, adopt the process into that job
      c) If multiple, continue
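The default steps above can be sketched schematically. This is an illustrative outline only, not the module's actual implementation; `count_user_jobs` is a hypothetical stand-in for the module's query of the local stepd:

```shell
# Schematic of the default decision steps; count_user_jobs is a hypothetical
# stand-in for the module's query of the local stepd for the user's job count.
count_user_jobs() { echo 2; }      # pretend the user owns 2 jobs on this node
njobs=$(count_user_jobs)
if [ "$njobs" -eq 0 ]; then
    msg="deny (action_no_jobs)"
elif [ "$njobs" -eq 1 ]; then
    msg="adopt the process into that job"
else
    msg="multiple jobs: continue"
fi
echo "$msg"
```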
@@ -38,36 +38,72 @@ This module has the following options (* = default):
      a service or similar, it will be tracked and killed by Slurm
      when the job exits. This sounds bad because it is bad.

-     1* = let the connection through without adoption
+     1* = Let the connection through without adoption
      0  = I am crazy. I want random services to die when root jobs exit. I
           also like it when RPC calls block for a while then time out.


action_no_jobs - The action to perform if the user has no jobs on the node

-     ignore = let the connection through without adoption
-     deny*  = deny the connection
+     ignore = Do nothing. Fall through to the next pam module
+     deny*  = Deny the connection

- action_unknown - The action to perform when the RPC call does not locate the
-                  source job and the user has multiple jobs on the node to
-                  choose from
+ action_unknown - The action to perform when the user has multiple jobs on
+                  the node *and* the RPC call does not locate the source job.
+                  If the RPC mechanism works properly in your environment,
+                  this option will likely be relevant *only* when connecting
+                  from a login node.

-     any*   = pick a job in a (somewhat) random fashion
-     ignore = let the connection through without adoption
-     deny   = deny the connection
+     newest* = Pick the newest job on the node. The "newest" job is chosen
+               based on the mtime of the job's step_extern cgroup; asking
+               Slurm would require an RPC to the controller. The user can ssh
+               in but may be adopted into a job that exits earlier than the
+               job they intended to check on. The ssh connection will at
+               least be subject to appropriate limits and the user can be
+               informed of better ways to accomplish their objectives if this
+               becomes a problem.
+     user    = Use the /slurm/uid_$UID cgroups. Not all cgroups set
+               appropriate limits at this level so this may not be very
+               effective. Additionally, job accounting at this level is
+               impossible, as is automatic cleanup of stray processes when
+               the job exits. This setting is not recommended.
+     allow   = Let the connection through without adoption
+     deny    = Deny the connection

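The newest-job heuristic can be illustrated in plain shell. This is a sketch only: the directories below are simulated under a temporary path, and real step_extern cgroup paths depend on your cgroup hierarchy (for example, somewhere under /sys/fs/cgroup). GNU touch/ls behavior is assumed.

```shell
# Simulate two jobs' step_extern cgroup directories and pick the "newest"
# by directory mtime, as the newest* action does.
root=$(mktemp -d)
mkdir -p "$root/job_1001/step_extern" "$root/job_1002/step_extern"
touch -d '2 hours ago' "$root/job_1001/step_extern"   # backdate the older job
# newest-first sort by mtime; the first entry is the job newest* would pick
newest=$(ls -td "$root"/job_*/step_extern | head -n 1)
echo "$newest"
rm -rf "$root"
```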

- action_adopt_failure - The action to perform if the job is unable to be
-                        adopted into a job for whatever reason
+ action_adopt_failure - The action to perform if the process is unable to be
+                        adopted into an identified job for whatever reason

-     ignore = let the connection through without adoption
-     deny*  = deny the connection
+     allow* = Let the connection through without adoption
+     deny   = Deny the connection

+ action_generic_failure - The action to perform if there are certain
+                          failures, such as an inability to talk to the
+                          local slurmd or if the kernel doesn't offer the
+                          correct facilities
+
+     ignore* = Do nothing. Fall through to the next pam module
+     allow   = Let the connection through without adoption
+     deny    = Deny the connection

log_level - See SlurmdDebug in slurm.conf(5) for available options. The
            default log_level is info.

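Putting the options together, a pam.d entry might look like the following. This is an illustrative fragment, not a recommendation: the option values shown are choices for this example, not defaults, and should be adapted to your site.

```
# Example /etc/pam.d/sshd (or system-auth) entry; option values are
# illustrative, not defaults.
account    sufficient    pam_slurm_adopt.so action_no_jobs=deny action_unknown=newest action_generic_failure=ignore log_level=info
```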
+ SLURM.CONF CONFIGURATION
+ For best results, all relevant cgroups plugins (e.g. proctrack/cgroup) should
+ be enabled in slurm.conf. At least one must be enabled for this module to be
+ even somewhat useful.
+
+ PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
+ into which ssh-launched processes will be adopted.
+
+ **** IMPORTANT ****
+ PrologFlags=contain must be in place *before* using this module.
+ The module bases its checks on local steps that have already been launched. If
+ the user has no steps on the node, such as the extern step, the module will
+ assume that the user has no jobs allocated to the node. Depending on your
+ configuration of the pam module, you might deny *all* user ssh attempts.
+

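As a sketch, the relevant slurm.conf settings might look like this. PrologFlags=contain is the requirement stated above; the cgroup plugin choices are illustrative examples of "relevant cgroups plugins", not the only valid ones.

```
# slurm.conf fragment (illustrative)
PrologFlags=contain            # creates the "extern" step adoptees join
ProctrackType=proctrack/cgroup # example cgroup plugin choice
TaskPlugin=task/cgroup         # example cgroup plugin choice
```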
NOTES
This module and the related RPC call currently support Linux systems which
have network connection information available through /proc/net/tcp{,6}. A
@@ -79,31 +115,34 @@ NOTES
Slurm is tracking.

IPv6 is supported by the RPC data structure itself and the code which sends it
- or receives it. Sending the RPC call to an IPv6 address is not currently
+ and receives it. Sending the RPC call to an IPv6 address is not currently
supported by Slurm. Once support is added, remove the relevant check in
- slurm_network_callerid ().
+ slurm_network_callerid().

- proctrack/cgroup is recommended on Linux.
+ One future action_unknown idea is an option to pick the job with the longest
+ time remaining. This is not yet implemented.

FIREWALLS, IP ADDRESSES, ETC.
- slurmd should be accessible on any IP address that a user might launch ssh.
- The RPC call to determine the source job must be able to reach the slurmd port
- on that particular IP address.
+ slurmd should be accessible on any IP address from which a user might launch
+ ssh. The RPC call to determine the source job must be able to reach the slurmd
+ port on that particular IP address.

- If there is no slurmd on the source node, it is better to have the RPC call be
- rejected rather than silently dropped. This will allow better responsiveness
- to the RPC initiator.
+ If there is no slurmd on the source node, such as on a login node, it is
+ better to have the RPC call be rejected rather than silently dropped. This
+ will allow better responsiveness to the RPC initiator.

EXAMPLES / SUGGESTED USAGE
Use of this module is recommended on any compute node.

Add the following line to the appropriate file in /etc/pam.d, such as
system-auth or sshd:

- account required pam_slurm_adopt.so
+ account sufficient pam_slurm_adopt.so

- If you always want to allow access for an administrative group (eg, wheel),
- stack the pam_access module ahead of pam_slurm:
+ If you always want to allow access for an administrative group (e.g. wheel),
+ stack the pam_access module after pam_slurm_adopt. A success with
+ pam_slurm_adopt is sufficient to allow access, but the pam_access module can
+ allow others, such as staff, access even without jobs.

account sufficient pam_slurm_adopt.so
account required pam_access.so