Modifications to pam_slurm_adopt to work correctly for the "extern" step.
ryanbcox authored and dannyauble committed Nov 3, 2015
1 parent 05f5369 commit ca68297
Showing 3 changed files with 372 additions and 149 deletions.
1 change: 1 addition & 0 deletions NEWS
@@ -41,6 +41,7 @@ documents those changes that are of interest to users and administrators.
-- MYSQL - Fix rollups for multiple jobs running by the same association
in an hour counting multiple times.
-- Burstbuffer/Cray plugin - Fix for persistent burst buffer use.
-- Modifications to pam_slurm_adopt to work correctly for the "extern" step.

* Changes in Slurm 15.08.2
==========================
91 changes: 65 additions & 26 deletions contribs/pam_slurm_adopt/README
@@ -14,7 +14,7 @@ DESCRIPTION
This module attempts to determine the job which originated this connection.
The module is configurable; these are the default steps:

1) Check the local stepds for a count of jobs owned by the non-root user
1) Check the local stepd for a count of jobs owned by the non-root user
a) If none, deny (option action_no_jobs)
b) If only one, adopt the process into that job
c) If multiple, continue
@@ -38,36 +38,72 @@ This module has the following options (* = default):
a service or similar, it will be tracked and killed by Slurm
when the job exits. This sounds bad because it is bad.

1* = let the connection through without adoption
1* = Let the connection through without adoption
0 = I am crazy. I want random services to die when root jobs exit. I
also like it when RPC calls block for a while then time out.


action_no_jobs - The action to perform if the user has no jobs on the node

ignore = let the connection through without adoption
deny* = deny the connection
ignore = Do nothing. Fall through to the next pam module
deny* = Deny the connection


action_unknown - The action to perform when the RPC call does not locate the
source job and the user has multiple jobs on the node to
choose from
action_unknown - The action to perform when the user has multiple jobs on
the node *and* the RPC call does not locate the source job.
If the RPC mechanism works properly in your environment,
this option will likely be relevant *only* when connecting
from a login node.

any* = pick a job in a (somewhat) random fashion
ignore = let the connection through without adoption
deny = deny the connection
newest* = Pick the newest job on the node. The "newest" job is chosen
based on the mtime of the job's step_extern cgroup; asking
Slurm would require an RPC to the controller. The user can ssh
in but may be adopted into a job that exits earlier than the
job they intended to check on. The ssh connection will at
least be subject to appropriate limits and the user can be
informed of better ways to accomplish their objectives if this
becomes a problem
user = Use the /slurm/uid_$UID cgroups. Not all cgroups set
appropriate limits at this level so this may not be very
effective. Additionally, job accounting at this level is
impossible as is automatic cleanup of stray processes when the
job exits. This setting is not recommended.
allow = Let the connection through without adoption
deny = Deny the connection
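
The "newest" selection described above can be illustrated with a short Python sketch. It is not the module's code (the module itself is written in C); it only shows the idea of comparing the mtime of each job's step_extern cgroup directory. The directory layout (a per-user directory containing job_<id>/step_extern) and the function name newest_job_id are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Optional


def newest_job_id(uid_cgroup_dir: str) -> Optional[int]:
    """Return the job id whose step_extern directory has the latest mtime.

    uid_cgroup_dir is assumed to look like <cgroup mount>/slurm/uid_<UID>
    and to contain job_<id>/step_extern directories (hypothetical layout).
    """
    newest_id = None
    newest_mtime = -1.0
    for entry in Path(uid_cgroup_dir).glob("job_*"):
        match = re.fullmatch(r"job_(\d+)", entry.name)
        step_extern = entry / "step_extern"
        # Skip anything that is not a job directory with an extern step.
        if match is None or not step_extern.is_dir():
            continue
        mtime = step_extern.stat().st_mtime
        if mtime > newest_mtime:
            newest_id, newest_mtime = int(match.group(1)), mtime
    return newest_id
```

Because only local file metadata is read, no RPC to the controller is needed, which matches the trade-off described for this option. A job without an extern step is simply not considered.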


action_adopt_failure - The action to perform if the job is unable to be
adopted into a job for whatever reason
action_adopt_failure - The action to perform if the process is unable to be
adopted into an identified job for whatever reason

ignore = let the connection through without adoption
deny* = deny the connection
allow* = Let the connection through without adoption
deny = Deny the connection

action_generic_failure - The action to perform if there are certain failures
such as inability to talk to the local slurmd or
if the kernel doesn't offer the correct facilities

ignore* = Do nothing. Fall through to the next pam module
allow = Let the connection through without adoption
deny = Deny the connection

log_level - See SlurmdDebug in slurm.conf(5) for available options. The
default log_level is info.
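
As a rough illustration of how the options above interact, the following Python sketch models only the parts of the decision flow that the text describes: no jobs, exactly one job, and multiple jobs with or without a successful RPC lookup. The function name resolve and the returned strings are hypothetical; the defaults are the starred values listed above. Root-user handling and the failure options are left out.

```python
from typing import Dict, List, Optional

# Starred defaults from the option list above.
DEFAULTS: Dict[str, str] = {
    "action_no_jobs": "deny",
    "action_unknown": "newest",
    "action_adopt_failure": "allow",
    "action_generic_failure": "ignore",
}


def resolve(job_ids: List[int], rpc_job_id: Optional[int],
            opts: Optional[Dict[str, str]] = None) -> str:
    """Return the action taken for a non-root user's incoming connection.

    job_ids    -- jobs the user has on this node
    rpc_job_id -- source job reported by the RPC call, or None if not found
    """
    merged = {**DEFAULTS, **(opts or {})}
    if not job_ids:
        return merged["action_no_jobs"]
    if len(job_ids) == 1:
        return f"adopt {job_ids[0]}"
    if rpc_job_id in job_ids:
        return f"adopt {rpc_job_id}"
    # Multiple jobs and the RPC did not identify the source job.
    return merged["action_unknown"]
```

For example, resolve([7, 8], None) falls through to action_unknown, which is the case the README expects mostly when connecting from a login node.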

SLURM.CONF CONFIGURATION
For best results, all relevant cgroups plugins (e.g. proctrack/cgroup) should
be enabled in slurm.conf. At least one must be enabled for this module to be
even somewhat useful.

PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
into which ssh-launched processes will be adopted.

**** IMPORTANT ****
PrologFlags=contain must be in place *before* using this module.
The module bases its checks on local steps that have already been launched. If
the user has no steps on the node, such as the extern step, the module will
assume that the user has no jobs allocated to the node. Depending on your
configuration of the pam module, you might deny *all* user ssh attempts.

NOTES
This module and the related RPC call currently support Linux systems which
have network connection information available through /proc/net/tcp{,6}. A
@@ -79,31 +115,34 @@ NOTES
Slurm is tracking.

IPv6 is supported by the RPC data structure itself and the code which sends it
or receives it. Sending the RPC call to an IPv6 address is not currently
and receives it. Sending the RPC call to an IPv6 address is not currently
supported by Slurm. Once support is added, remove the relevant check in
slurm_network_callerid ().
slurm_network_callerid().

proctrack/cgroup is recommended on Linux.
One future action_unknown idea is an option to pick the job with the longest
time remaining. This is not yet implemented.
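
The NOTES mention that connection information comes from /proc/net/tcp{,6}. As a minimal sketch of what that file provides (not the module's own parser), the Python below decodes one IPv4 row of /proc/net/tcp into local and remote endpoints, the owning uid, and the socket inode. It assumes a little-endian host, where the kernel prints the address bytes reversed; the function names are hypothetical.

```python
import socket
import struct
from typing import Dict, Tuple


def decode_ipv4_endpoint(field: str) -> Tuple[str, int]:
    """Decode an 'ADDR:PORT' hex field such as '0100007F:0016'."""
    addr_hex, port_hex = field.split(":")
    # Assumption: little-endian host, so the 32-bit value is byte-swapped.
    packed = struct.pack("<I", int(addr_hex, 16))
    return socket.inet_ntoa(packed), int(port_hex, 16)


def parse_tcp_row(line: str) -> Dict[str, object]:
    """Parse one data row (not the header line) of /proc/net/tcp."""
    fields = line.split()
    return {
        "local": decode_ipv4_endpoint(fields[1]),
        "remote": decode_ipv4_endpoint(fields[2]),
        "uid": int(fields[7]),
        "inode": int(fields[9]),
    }
```

The remote endpoint of an incoming ssh connection is the kind of information that lets the source node be asked which job owns the originating socket.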

FIREWALLS, IP ADDRESSES, ETC.
slurmd should be accessible on any IP address that a user might launch ssh.
The RPC call to determine the source job must be able to reach the slurmd port
on that particular IP address.
slurmd should be accessible on any IP address from which a user might launch
ssh. The RPC call to determine the source job must be able to reach the slurmd
port on that particular IP address.

If there is no slurmd on the source node, it is better to have the RPC call be
rejected rather than silently dropped. This will allow better responsiveness
to the RPC initiator.
If there is no slurmd on the source node, such as on a login node, it is
better to have the RPC call be rejected rather than silently dropped. This
will allow better responsiveness to the RPC initiator.

EXAMPLES / SUGGESTED USAGE
Use of this module is recommended on any compute node.

Add the following line to the appropriate file in /etc/pam.d, such as
system-auth or sshd:

account required pam_slurm_adopt.so
account sufficient pam_slurm_adopt.so

If you always want to allow access for an administrative group (eg, wheel),
stack the pam_access module ahead of pam_slurm:
If you always want to allow access for an administrative group (e.g. wheel),
stack the pam_access module after pam_slurm_adopt. A success with
pam_slurm_adopt is sufficient to allow access but the pam_access module can
allow others, such as staff, access even without jobs.

account sufficient pam_slurm_adopt.so
account required pam_access.so