Commit ca68297

ryanbcox authored and dannyauble committed
Modifications to pam_slurm_adopt to work correctly for the "extern" step.
1 parent 05f5369 commit ca68297

3 files changed: +372 -149 lines changed

NEWS

+1
@@ -41,6 +41,7 @@ documents those changes that are of interest to users and administrators.
 -- MYSQL - Fix rollups for multiple jobs running by the same association
 in an hour counting multiple times.
 -- Burstbuffer/Cray plugin - Fix for persistent burst buffer use.
+-- Modifications to pam_slurm_adopt to work correctly for the "extern" step.

 * Changes in Slurm 15.08.2
 ==========================

contribs/pam_slurm_adopt/README

+65 -26
@@ -14,7 +14,7 @@ DESCRIPTION
 This module attempts to determine the job which originated this connection.
 The module is configurable; these are the default steps:

-1) Check the local stepds for a count of jobs owned by the non-root user
+1) Check the local stepd for a count of jobs owned by the non-root user
 a) If none, deny (option action_no_jobs)
 b) If only one, adopt the process into that job
 c) If multiple, continue
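
Reviewer note: the first default step in the hunk above can be sketched as a small decision function. This is only an illustration in Python, not the module's C code; the function name `check_local_jobs` and the returned action strings are hypothetical, and the list of job IDs stands in for whatever the local stepd reports.

```python
from typing import List, Optional, Tuple


def check_local_jobs(job_ids: List[int],
                     action_no_jobs: str = "deny") -> Tuple[str, Optional[int]]:
    """Return (action, job_id) for step 1 of the README's default steps.

    job_ids is assumed to be the user's jobs found on this node.
    action_no_jobs mirrors the README option ("ignore" or "deny", default deny).
    """
    if action_no_jobs not in ("ignore", "deny"):
        raise ValueError("action_no_jobs must be 'ignore' or 'deny'")
    if not job_ids:
        # a) no jobs on the node: apply action_no_jobs
        return (action_no_jobs, None)
    if len(job_ids) == 1:
        # b) exactly one job: adopt the process into it
        return ("adopt", job_ids[0])
    # c) several candidate jobs: later steps must choose
    return ("continue", None)
```

With the default setting, an empty job list yields a denial, a single job yields adoption into that job, and several jobs defer to the later steps.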
@@ -38,36 +38,72 @@ This module has the following options (* = default):
 a service or similar, it will be tracked and killed by Slurm
 when the job exits. This sounds bad because it is bad.

-1* = let the connection through without adoption
+1* = Let the connection through without adoption
 0 = I am crazy. I want random services to die when root jobs exit. I
 also like it when RPC calls block for a while then time out.


 action_no_jobs - The action to perform if the user has no jobs on the node

-ignore = let the connection through without adoption
-deny* = deny the connection
+ignore = Do nothing. Fall through to the next pam module
+deny* = Deny the connection


-action_unknown - The action to perform when the RPC call does not locate the
-source job and the user has multiple jobs on the node to
-choose from
+action_unknown - The action to perform when the user has multiple jobs on
+the node *and* the RPC call does not locate the source job.
+If the RPC mechanism works properly in your environment,
+this option will likely be relevant *only* when connecting
+from a login node.

-any* = pick a job in a (somewhat) random fashion
-ignore = let the connection through without adoption
-deny = deny the connection
+newest* = Pick the newest job on the node. The "newest" job is chosen
+based on the mtime of the job's step_extern cgroup; asking
+Slurm would require an RPC to the controller. The user can ssh
+in but may be adopted into a job that exits earlier than the
+job they intended to check on. The ssh connection will at
+least be subject to appropriate limits and the user can be
+informed of better ways to accomplish their objectives if this
+becomes a problem
+user = Use the /slurm/uid_$UID cgroups. Not all cgroups set
+appropriate limits at this level so this may not be very
+effective. Additionally, job accounting at this level is
+impossible as is automatic cleanup of stray processes when the
+job exits. This settings is not recommended.
+allow = Let the connection through without adoption
+deny = Deny the connection


-action_adopt_failure - The action to perform if the job is unable to be
-adopted into a job for whatever reason
+action_adopt_failure - The action to perform if the process is unable to be
+adopted into an identified job for whatever reason

-ignore = let the connection through without adoption
-deny* = deny the connection
+allow* = Let the connection through without adoption
+deny = Deny the connection

+action_generic_failure - The action to perform it there certain failures
+such as inability to talk to the local slurmd or
+if the kernel doesn't offer the correct facilities
+
+ignore* = Do nothing. Fall through to the next pam module
+allow = Let the connection through without adoption
+deny = Deny the connection

 log_level - See SlurmdDebug in slurm.conf(5) for available options. The
 default log_level is info.

+SLURM.CONF CONFIGURATION
+For best results, all relevant cgroups plugins (e.g. proctrack/cgroup) should
+be enabled in slurm.conf. At least one must be enabled for this module to be
+even somewhat useful.
+
+PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
+into which ssh-launched processes will be adopted.
+
+**** IMPORTANT ****
+PrologFlags=contain must be in place *before* using this module.
+The module bases its checks on local steps that have already been launched. If
+the user has no steps on the node, such as the extern step, the module will
+assume that the user has no jobs allocated to the node. Depending on your
+configuration of the pam module, you might deny *all* user ssh attempts.
+
 NOTES
 This module and the related RPC call currently support Linux systems which
 have network connection information available through /proc/net/tcp{,6}. A
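
Reviewer note: the "newest" choice for action_unknown in the hunk above is described as picking the job whose step_extern cgroup has the latest mtime. A minimal Python sketch of that idea follows. The directory layout (`job_<id>/step_extern` under a per-user cgroup directory) and the function name `newest_job` are assumptions made for illustration; the real module is written in C and its exact paths may differ.

```python
from pathlib import Path
from typing import Optional


def newest_job(uid_cgroup_dir: str) -> Optional[int]:
    """Return the job ID whose step_extern directory has the latest mtime.

    Assumes (hypothetically) that each job appears as
    <uid_cgroup_dir>/job_<id>/step_extern. Returns None if no job qualifies.
    """
    newest_id: Optional[int] = None
    newest_mtime = float("-inf")
    for job_dir in Path(uid_cgroup_dir).glob("job_*"):
        extern = job_dir / "step_extern"
        if not extern.is_dir():
            continue  # job has no extern step; nothing to adopt into
        try:
            job_id = int(job_dir.name[len("job_"):])
        except ValueError:
            continue  # not a job_<number> directory
        mtime = extern.stat().st_mtime
        if mtime > newest_mtime:
            newest_id, newest_mtime = job_id, mtime
    return newest_id
```

Because only local file metadata is read, no RPC to the controller is needed, which matches the trade-off the README text describes.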
@@ -79,31 +115,34 @@ NOTES
 Slurm is tracking.

 IPv6 is supported by the RPC data structure itself and the code which sends it
-or receives it. Sending the RPC call to an IPv6 address is not currently
+and receives it. Sending the RPC call to an IPv6 address is not currently
 supported by Slurm. Once support is added, remove the relevant check in
-slurm_network_callerid ().
+slurm_network_callerid().

-proctrack/cgroup is recommended on Linux.
+One future action_unknown idea is an option to pick the job with the longest
+time remaining. This is not yet implemented.

 FIREWALLS, IP ADDRESSES, ETC.
-slurmd should be accessible on any IP address that a user might launch ssh.
-The RPC call to determine the source job must be able to reach the slurmd port
-on that particular IP address.
+slurmd should be accessible on any IP address from which a user might launch
+ssh. The RPC call to determine the source job must be able to reach the slurmd
+port on that particular IP address.

-If there is no slurmd on the source node, it is better to have the RPC call be
-rejected rather than silently dropped. This will allow better responsiveness
-to the RPC initiator.
+If there is no slurmd on the source node, such as on a login node, it is
+better to have the RPC call be rejected rather than silently dropped. This
+will allow better responsiveness to the RPC initiator.

 EXAMPLES / SUGGESTED USAGE
 Use of this module is recommended on any compute node.

 Add the following line to the appropriate file in /etc/pam.d, such as
 system-auth or sshd:

-account required pam_slurm_adopt.so
+account sufficient pam_slurm_adopt.so

-If you always want to allow access for an administrative group (eg, wheel),
-stack the pam_access module ahead of pam_slurm:
+If you always want to allow access for an administrative group (e.g. wheel),
+stack the pam_access module after pam_slurm_adopt. A success with
+pam_slurm_adopt is sufficient to allow access but the pam_access module can
+allow others, such as staff, access even without jobs.

 account sufficient pam_slurm_adopt.so
 account required pam_access.so
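
Reviewer note: the NOTES section says the module and its RPC rely on connection information from /proc/net/tcp{,6}. To show what that data looks like, here is a minimal Python sketch that decodes IPv4 entries in the /proc/net/tcp text format. It is an illustration only, not the module's implementation: the names `TcpEntry` and `parse_proc_net_tcp` are hypothetical, the sample line is made up, and the address decoding assumes a little-endian host.

```python
import socket
import struct
from typing import List, NamedTuple, Tuple


class TcpEntry(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int
    uid: int
    inode: int


def _decode_ipv4(hex_addr_port: str) -> Tuple[str, int]:
    """Decode 'AABBCCDD:PPPP' as printed in /proc/net/tcp."""
    hex_addr, hex_port = hex_addr_port.split(":")
    # The address is a 32-bit value in host byte order; little-endian assumed.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))
    return ip, int(hex_port, 16)


def parse_proc_net_tcp(text: str) -> List[TcpEntry]:
    """Parse the text of /proc/net/tcp (IPv4 only), skipping the header line."""
    entries = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 10:
            continue  # blank or malformed line
        local_ip, local_port = _decode_ipv4(fields[1])
        remote_ip, remote_port = _decode_ipv4(fields[2])
        # fields[7] is the owning uid, fields[9] the socket inode
        entries.append(TcpEntry(local_ip, local_port, remote_ip, remote_port,
                                int(fields[7]), int(fields[9])))
    return entries


# Made-up sample in the /proc/net/tcp layout (header plus one connection).
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when "
    "retrnsmt   uid  timeout inode\n"
    "   0: 0100007F:0016 0200007F:C350 01 00000000:00000000 00:00000000 "
    "00000000  1000        0 12345 1 0000000000000000 20 4 30 10 -1\n"
)

if __name__ == "__main__":
    for entry in parse_proc_net_tcp(SAMPLE):
        print(entry)
```

Matching an incoming ssh connection's address and port pair against entries like these is one plausible way to tie a connection to a socket inode and owning uid; how the module and slurmd actually perform the lookup is not shown in this diff.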
