Modifications to pam_slurm_adopt to work correctly for the "extern" step.
1 parent 05f5369 commit ca682973d11bed39811e8865462d38094150224a @ryanbcox ryanbcox committed with dannyauble Nov 3, 2015
Showing with 372 additions and 149 deletions.
  1. +1 −0 NEWS
  2. +65 −26 contribs/pam_slurm_adopt/README
  3. +306 −123 contribs/pam_slurm_adopt/pam_slurm_adopt.c
@@ -41,6 +41,7 @@ documents those changes that are of interest to users and administrators.
-- MYSQL - Fix rollups for multiple jobs running by the same association
in an hour counting multiple times.
-- Burstbuffer/Cray plugin - Fix for persistent burst buffer use.
+ -- Modifications to pam_slurm_adopt to work correctly for the "extern" step.
* Changes in Slurm 15.08.2
==========================
@@ -14,7 +14,7 @@ DESCRIPTION
This module attempts to determine the job which originated this connection.
The module is configurable; these are the default steps:
- 1) Check the local stepds for a count of jobs owned by the non-root user
+ 1) Check the local stepd for a count of jobs owned by the non-root user
a) If none, deny (option action_no_jobs)
b) If only one, adopt the process into that job
c) If multiple, continue
@@ -38,36 +38,72 @@ This module has the following options (* = default):
a service or similar, it will be tracked and killed by Slurm
when the job exits. This sounds bad because it is bad.
- 1* = let the connection through without adoption
+ 1* = Let the connection through without adoption
0 = I am crazy. I want random services to die when root jobs exit. I
also like it when RPC calls block for a while then time out.
action_no_jobs - The action to perform if the user has no jobs on the node
- ignore = let the connection through without adoption
- deny* = deny the connection
+ ignore = Do nothing. Fall through to the next pam module
+ deny* = Deny the connection
- action_unknown - The action to perform when the RPC call does not locate the
- source job and the user has multiple jobs on the node to
- choose from
+ action_unknown - The action to perform when the user has multiple jobs on
+ the node *and* the RPC call does not locate the source job.
+ If the RPC mechanism works properly in your environment,
+ this option will likely be relevant *only* when connecting
+ from a login node.
- any* = pick a job in a (somewhat) random fashion
- ignore = let the connection through without adoption
- deny = deny the connection
+ newest* = Pick the newest job on the node. The "newest" job is chosen
+ based on the mtime of the job's step_extern cgroup; asking
+ Slurm would require an RPC to the controller. The user can ssh
+ in but may be adopted into a job that exits earlier than the
+ job they intended to check on. The ssh connection will at
+ least be subject to appropriate limits and the user can be
+ informed of better ways to accomplish their objectives if this
+ becomes a problem
+ user = Use the /slurm/uid_$UID cgroups. Not all cgroups set
+ appropriate limits at this level so this may not be very
+ effective. Additionally, job accounting at this level is
+ impossible as is automatic cleanup of stray processes when the
+ job exits. This setting is not recommended.
+ allow = Let the connection through without adoption
+ deny = Deny the connection
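
The "newest" choice for action_unknown described above compares the mtime of each candidate job's step_extern cgroup. The following is a minimal sketch of that comparison, not the module's actual code. The struct, both function names, and the idea of passing the cgroup directory path in from the caller are assumptions made for illustration; the README does not give the cgroup path layout, so none is hard-coded here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record: one candidate job and the mtime of its
 * step_extern cgroup directory. */
struct job_candidate {
    uint32_t job_id;
    time_t   cgroup_mtime;
};

/* Fill a candidate from a cgroup directory path supplied by the caller.
 * Returns 0 on success, -1 if stat() fails (e.g. the path is missing). */
int candidate_from_path(uint32_t job_id, const char *cgroup_path,
                        struct job_candidate *out)
{
    struct stat st;

    if (stat(cgroup_path, &st) != 0)
        return -1;
    out->job_id = job_id;
    out->cgroup_mtime = st.st_mtime;
    return 0;
}

/* Return the index of the candidate with the largest mtime,
 * or -1 if the list is empty. Ties keep the earliest index. */
int pick_newest(const struct job_candidate *c, size_t n)
{
    size_t best = 0;

    assert(c != NULL || n == 0);
    if (n == 0)
        return -1;
    for (size_t i = 1; i < n; i++) {
        if (c[i].cgroup_mtime > c[best].cgroup_mtime)
            best = i;
    }
    return (int)best;
}
```

Reading the mtime locally avoids the RPC to the controller that the README mentions, at the cost that the "newest" job is only approximated by the age of its cgroup directory.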
- action_adopt_failure - The action to perform if the job is unable to be
- adopted into a job for whatever reason
+ action_adopt_failure - The action to perform if the process is unable to be
+ adopted into an identified job for whatever reason
- ignore = let the connection through without adoption
- deny* = deny the connection
+ allow* = Let the connection through without adoption
+ deny = Deny the connection
+ action_generic_failure - The action to perform if there are certain failures
+ such as inability to talk to the local slurmd or
+ if the kernel doesn't offer the correct facilities
+
+ ignore* = Do nothing. Fall through to the next pam module
+ allow = Let the connection through without adoption
+ deny = Deny the connection
log_level - See SlurmdDebug in slurm.conf(5) for available options. The
default log_level is info.
+SLURM.CONF CONFIGURATION
+ For best results, all relevant cgroups plugins (e.g. proctrack/cgroup) should
+ be enabled in slurm.conf. At least one must be enabled for this module to be
+ even somewhat useful.
+
+ PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
+ into which ssh-launched processes will be adopted.
+
+ **** IMPORTANT ****
+ PrologFlags=contain must be in place *before* using this module.
+ The module bases its checks on local steps that have already been launched. If
+ the user has no steps on the node, such as the extern step, the module will
+ assume that the user has no jobs allocated to the node. Depending on your
+ configuration of the pam module, you might deny *all* user ssh attempts.
+
NOTES
This module and the related RPC call currently support Linux systems which
have network connection information available through /proc/net/tcp{,6}. A
@@ -79,31 +115,34 @@ NOTES
Slurm is tracking.
IPv6 is supported by the RPC data structure itself and the code which sends it
- or receives it. Sending the RPC call to an IPv6 address is not currently
+ and receives it. Sending the RPC call to an IPv6 address is not currently
supported by Slurm. Once support is added, remove the relevant check in
- slurm_network_callerid ().
+ slurm_network_callerid().
- proctrack/cgroup is recommended on Linux.
+ One future action_unknown idea is an option to pick the job with the longest
+ time remaining. This is not yet implemented.
FIREWALLS, IP ADDRESSES, ETC.
- slurmd should be accessible on any IP address that a user might launch ssh.
- The RPC call to determine the source job must be able to reach the slurmd port
- on that particular IP address.
+ slurmd should be accessible on any IP address from which a user might launch
+ ssh. The RPC call to determine the source job must be able to reach the slurmd
+ port on that particular IP address.
- If there is no slurmd on the source node, it is better to have the RPC call be
- rejected rather than silently dropped. This will allow better responsiveness
- to the RPC initiator.
+ If there is no slurmd on the source node, such as on a login node, it is
+ better to have the RPC call be rejected rather than silently dropped. This
+ will allow better responsiveness to the RPC initiator.
EXAMPLES / SUGGESTED USAGE
Use of this module is recommended on any compute node.
Add the following line to the appropriate file in /etc/pam.d, such as
system-auth or sshd:
- account required pam_slurm_adopt.so
+ account sufficient pam_slurm_adopt.so
- If you always want to allow access for an administrative group (eg, wheel),
- stack the pam_access module ahead of pam_slurm:
+ If you always want to allow access for an administrative group (e.g. wheel),
+ stack the pam_access module after pam_slurm_adopt. A success with
+ pam_slurm_adopt is sufficient to allow access but the pam_access module can
+ allow others, such as staff, access even without jobs.
account sufficient pam_slurm_adopt.so
account required pam_access.so
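
The option descriptions in this README diff imply a decision flow: no jobs on the node triggers action_no_jobs, exactly one job is adopted directly, and multiple jobs fall back to action_unknown only when the RPC call does not locate the source job. The sketch below models just that flow under stated assumptions. All enum, struct, and function names are hypothetical, the outcomes are illustrative labels rather than the PAM return codes the real module uses, and the root-user, action_adopt_failure, and action_generic_failure paths are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels for this sketch. */
enum outcome {
    OUT_ADOPT,        /* adopt into the single or RPC-identified job */
    OUT_ADOPT_NEWEST, /* adopt into the newest job on the node */
    OUT_USER_CGROUP,  /* use the per-user cgroup instead of a job */
    OUT_ALLOW,        /* let the connection through without adoption */
    OUT_IGNORE,       /* do nothing; fall through to the next pam module */
    OUT_DENY          /* deny the connection */
};

enum no_jobs_action { NO_JOBS_IGNORE, NO_JOBS_DENY };
enum unknown_action { UNKNOWN_NEWEST, UNKNOWN_USER, UNKNOWN_ALLOW, UNKNOWN_DENY };

/* Hypothetical container for the two options modeled here. */
struct adopt_opts {
    enum no_jobs_action action_no_jobs;
    enum unknown_action action_unknown;
};

/* njobs: number of the user's jobs found on this node.
 * rpc_found_job: whether the RPC call located the source job. */
enum outcome decide(const struct adopt_opts *o, int njobs, bool rpc_found_job)
{
    assert(o != NULL && njobs >= 0);

    if (njobs == 0)
        return o->action_no_jobs == NO_JOBS_IGNORE ? OUT_IGNORE : OUT_DENY;
    if (njobs == 1 || rpc_found_job)
        return OUT_ADOPT;

    /* Multiple jobs and the source job is unknown. */
    switch (o->action_unknown) {
    case UNKNOWN_NEWEST: return OUT_ADOPT_NEWEST;
    case UNKNOWN_USER:   return OUT_USER_CGROUP;
    case UNKNOWN_ALLOW:  return OUT_ALLOW;
    case UNKNOWN_DENY:   return OUT_DENY;
    }
    return OUT_DENY; /* defensive default for an out-of-range value */
}
```

With the defaults marked by `*` in the README (action_no_jobs=deny, action_unknown=newest), a user with no jobs is denied and a user with several jobs whose source job cannot be found is adopted into the newest one. Because the suggested stack marks pam_slurm_adopt as "sufficient", a deny from this module still lets the later pam_access entry grant access to staff.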