NAME
pam_slurm_adopt.so
SYNOPSIS
Adopt incoming connections into jobs
AUTHOR
Ryan Cox <ryan_cox@byu.edu>
MODULE TYPES PROVIDED
account
DESCRIPTION
This module attempts to determine the job which originated this connection.
The module is configurable; these are the default steps:
1) Check the local stepd for a count of jobs owned by the non-root user
a) If none, deny (option action_no_jobs)
b) If only one, adopt the process into that job
c) If multiple, continue
2) Determine src/dst IP/port of socket
3) Issue callerid RPC to slurmd at IP address of source
a) If the remote slurmd can identify the source job, adopt into that job
b) If not, continue
4) Pick a local job from the user to adopt into, as determined by the
   action_unknown option (default: newest)
Processes are adopted into the selected job's allocation ("extern") step.
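To verify that a connection was adopted, you can inspect the cgroup of the new
shell from inside the ssh session. The path layout below is only an
illustrative sketch and assumes the cgroup-based proctrack/task plugins; the
uid and job id shown are placeholders:
grep slurm /proc/self/cgroup
...:memory:/slurm/uid_1000/job_12345/step_extern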
MODULE OPTIONS
This module has the following options (* = default):
ignore_root - By default, all root connections are ignored. If the RPC
is sent to a node which drops packets to the slurmd port, the
RPC will block for some time before failing. This is
unlikely to be desirable. Likewise, root may be trying to
administer the system and not do work that should be in a job.
The job may trigger oom-killer or just exit. If root restarts
a service or similar, it will be tracked and killed by Slurm
when the job exits. This sounds bad because it is bad.
1* = Let the connection through without adoption
0 = I am crazy. I want random services to die when root jobs exit. I
also like it when RPCs block for a while then time out.
action_no_jobs - The action to perform if the user has no jobs on the node
ignore = Do nothing. Fall through to the next pam module
deny* = Deny the connection
action_unknown - The action to perform when the user has multiple jobs on
the node *and* the RPC does not locate the source job.
If the RPC mechanism works properly in your environment,
this option will likely be relevant *only* when connecting
from a login node.
newest* = Pick the newest job on the node. The "newest" job is chosen
based on the mtime of the job's step_extern cgroup; asking
Slurm would require an RPC to the controller. The user can ssh
in but may be adopted into a job that exits earlier than the
job they intended to check on. The ssh connection will at
least be subject to appropriate limits and the user can be
informed of better ways to accomplish their objectives if this
becomes a problem.
allow = Let the connection through without adoption
deny = Deny the connection
action_adopt_failure - The action to perform if the process cannot be
                       adopted into any job for whatever reason. If the
                       process cannot be adopted into the job identified by
                       the callerid RPC, it will fall through to the
                       action_unknown code and try to adopt there. A failure
                       at that point, or if there is only one job, will
                       result in this action being taken.
allow* = Let the connection through without adoption
deny = Deny the connection
action_generic_failure - The action to perform if there are certain failures
such as the inability to talk to the local slurmd
or if the kernel doesn't offer the correct
facilities.
ignore* = Do nothing. Fall through to the next pam module
allow = Let the connection through without adoption
deny = Deny the connection
log_level - See SlurmdDebug in slurm.conf(5) for available options. The
default log_level is info.
disable_x11 - Turn off Slurm's built-in X11 forwarding support.
1 = Do not check for Slurm's X11 forwarding support, and do not
alter the DISPLAY variable.
0* = If the step the job is adopted into has X11 enabled, set
the DISPLAY variable in the process's environment accordingly.
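These options are passed as arguments on the pam_slurm_adopt.so line in the
pam configuration. As an illustrative sketch (the values shown are examples
drawn from the list above, not recommendations), a stricter-than-default
setup might look like:
account sufficient pam_slurm_adopt.so action_unknown=deny action_adopt_failure=deny log_level=debug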
SLURM.CONF CONFIGURATION
PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
into which ssh-launched processes will be adopted.
**** IMPORTANT ****
PrologFlags=contain must be in place *before* using this module.
The module bases its checks on local steps that have already been launched. If
the user has no steps on the node, such as the extern step, the module will
assume that the user has no jobs allocated to the node. Depending on your
configuration of the pam module, you might deny *all* user ssh attempts.
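For reference, the corresponding slurm.conf line is simply the following;
other PrologFlags values your site already uses can be combined with it as a
comma-separated list:
PrologFlags=contain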
NOTES
This module and the related RPC currently support Linux systems which
have network connection information available through /proc/net/tcp{,6}. A
process's sockets must exist as symlinks in its /proc/self/fd directory.
The RPC data structure itself is OS-agnostic. If support is desired for a
different OS, relevant code must be added to find the local socket information
and then match that information on the remote end to a particular process which
Slurm is tracking.
IPv6 is supported by the RPC data structure itself and by the code which sends
and receives it. Sending the RPC to an IPv6 address is not currently
supported by Slurm. Once support is added, remove the relevant check in
slurm_network_callerid().
For the action_unknown=newest setting to work, the memory cgroup must be in
use so that the code can check mtimes of cgroup directories. If you would
prefer to use a different subsystem, modify the _indeterminate_multiple
function.
FIREWALLS, IP ADDRESSES, ETC.
slurmd should be accessible on any IP address from which a user might launch
ssh. The RPC to determine the source job must be able to reach the slurmd
port on that particular IP address.
If there is no slurmd on the source node, such as on a login node, it is
better to have the RPC be rejected rather than silently dropped. This
will allow better responsiveness to the RPC initiator.
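One way to get an active rejection on a login node that runs no slurmd is to
reject, rather than drop, traffic to the slurmd port. A rough sketch with
iptables, assuming the default SlurmdPort of 6818 (adjust to match your
slurm.conf):
iptables -A INPUT -p tcp --dport 6818 -j REJECT --reject-with tcp-reset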
EXAMPLES / SUGGESTED USAGE
Use of this module is recommended on any compute node.
Add the following line to the appropriate file in /etc/pam.d, such as
system-auth or sshd:
account sufficient pam_slurm_adopt.so
If you always want to allow access for an administrative group (e.g. wheel),
stack the pam_access module after pam_slurm_adopt. A success with
pam_slurm_adopt is sufficient to allow access but the pam_access module can
allow others, such as staff, access even without jobs.
account sufficient pam_slurm_adopt.so
account required pam_access.so
Then edit the pam_access configuration file (/etc/security/access.conf):
+:wheel:ALL
-:ALL:ALL
When access is denied, the user will receive a relevant error message.
pam_systemd.so is known to not play nice with Slurm's usage of cgroups. It is
recommended that you disable it or possibly add pam_slurm_adopt.so after
pam_systemd.so.
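If you do disable pam_systemd.so, one common approach is to comment out its
line in the relevant pam session stack; the exact line and file vary by
distribution, so the following is only illustrative:
#session optional pam_systemd.so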