DESCRIPTION

This module attempts to determine the job which originated this connection.

The module is configurable; these are the default steps:

1) Check the local stepd for a count of jobs owned by the non-root user
   a) If none, deny (option action_no_jobs)
   b) If only one, adopt the process into that job
   c) If multiple, continue
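The count in step 1 can be approximated by hand with squeue. This is only an
illustration of the check, not how the module performs it (the module asks
the local stepds directly rather than the controller):

   squeue --noheader --user=$USER --nodelist=$(hostname) | wc -l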
This module has the following options (* = default):

               a service or similar, it will be tracked and killed by Slurm
               when the job exits. This sounds bad because it is bad.

      1* = Let the connection through without adoption
      0  = I am crazy. I want random services to die when root jobs exit. I
           also like it when RPC calls block for a while then time out.

   action_no_jobs - The action to perform if the user has no jobs on the
                    node

      ignore = Do nothing. Fall through to the next pam module
      deny*  = Deny the connection

   action_unknown - The action to perform when the user has multiple jobs on
                    the node *and* the RPC call does not locate the source
                    job. If the RPC mechanism works properly in your
                    environment, this option will likely be relevant *only*
                    when connecting from a login node.

      newest* = Pick the newest job on the node. The "newest" job is chosen
                based on the mtime of the job's step_extern cgroup; asking
                Slurm would require an RPC to the controller. The user can
                ssh in but may be adopted into a job that exits earlier
                than the job they intended to check on. The ssh connection
                will at least be subject to appropriate limits and the
                user can be informed of better ways to accomplish their
                objectives if this becomes a problem.
      user    = Use the /slurm/uid_$UID cgroups. Not all cgroups set
                appropriate limits at this level so this may not be very
                effective. Additionally, job accounting at this level is
                impossible, as is automatic cleanup of stray processes
                when the job exits. This setting is not recommended.
      allow   = Let the connection through without adoption
      deny    = Deny the connection

   action_adopt_failure - The action to perform if the process is unable to
                          be adopted into an identified job for whatever
                          reason

      allow* = Let the connection through without adoption
      deny   = Deny the connection

   action_generic_failure - The action to perform if there are certain
                            failures, such as the inability to talk to the
                            local slurmd or if the kernel doesn't offer the
                            correct facilities

      ignore* = Do nothing. Fall through to the next pam module
      allow   = Let the connection through without adoption
      deny    = Deny the connection

   log_level - See SlurmdDebug in slurm.conf(5) for available options. The
               default log_level is info.

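Options are passed as arguments on the module's line in pam.d. For example,
to deny connections whose source job cannot be identified and to raise
logging while debugging (a sketch; pick values from the tables above):

   account    sufficient    pam_slurm_adopt.so action_unknown=deny log_level=debug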
SLURM.CONF CONFIGURATION

For best results, all relevant cgroups plugins (e.g. proctrack/cgroup)
should be enabled in slurm.conf. At least one must be enabled for this
module to be even somewhat useful.

PrologFlags=contain must be set in slurm.conf. This sets up the "extern"
step into which ssh-launched processes will be adopted.

**** IMPORTANT ****
PrologFlags=contain must be in place *before* using this module. The module
bases its checks on local steps that have already been launched. If the
user has no steps on the node, such as the extern step, the module will
assume that the user has no jobs allocated to the node. Depending on your
configuration of the pam module, you might deny *all* user ssh attempts.

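A minimal slurm.conf fragment that satisfies the above might look like the
following (PrologFlags=contain is required; the cgroup plugin choices are
illustrative):

   PrologFlags=contain
   ProctrackType=proctrack/cgroup
   TaskPlugin=task/cgroup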
NOTES

This module and the related RPC call currently support Linux systems which
have network connection information available through /proc/net/tcp{,6}. A

Slurm is tracking.

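For reference, /proc/net/tcp encodes each socket's addresses as hex in the
kernel's native byte order (little-endian on x86). A short Python sketch of
decoding one ADDR:PORT field (illustrative only; the module itself is C):

```python
import socket

def decode_proc_net_tcp_addr(entry):
    """Decode an ADDR:PORT field from an IPv4 /proc/net/tcp row.

    The address is 8 hex digits in host (little-endian on x86) byte
    order; the port is 4 big-endian hex digits.
    """
    addr_hex, port_hex = entry.split(":")
    # Reverse the four address bytes to get network byte order.
    addr = socket.inet_ntoa(bytes.fromhex(addr_hex)[::-1])
    return addr, int(port_hex, 16)

# "0100007F:0016" decodes to ("127.0.0.1", 22) on little-endian hosts.
print(decode_proc_net_tcp_addr("0100007F:0016"))
```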
IPv6 is supported by the RPC data structure itself and the code which sends
and receives it. Sending the RPC call to an IPv6 address is not currently
supported by Slurm. Once support is added, remove the relevant check in
slurm_network_callerid().

proctrack/cgroup is recommended on Linux.

One future action_unknown idea is an option to pick the job with the
longest time remaining. This is not yet implemented.

FIREWALLS, IP ADDRESSES, ETC.

slurmd should be accessible on any IP address from which a user might
launch ssh. The RPC call to determine the source job must be able to reach
the slurmd port on that particular IP address.

If there is no slurmd on the source node, such as on a login node, it is
better to have the RPC call be rejected rather than silently dropped. This
will allow better responsiveness to the RPC initiator.

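On a node without slurmd, an actively rejecting firewall rule accomplishes
this. Assuming the default SlurmdPort of 6818 (check your slurm.conf), an
iptables sketch:

   iptables -A INPUT -p tcp --dport 6818 -j REJECT --reject-with tcp-reset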
EXAMPLES / SUGGESTED USAGE

Use of this module is recommended on any compute node.

Add the following line to the appropriate file in /etc/pam.d, such as
system-auth or sshd:

   account    sufficient    pam_slurm_adopt.so

If you always want to allow access for an administrative group (e.g.
wheel), stack the pam_access module after pam_slurm_adopt. A success with
pam_slurm_adopt is sufficient to allow access but the pam_access module can
allow others, such as staff, access even without jobs.

   account    sufficient    pam_slurm_adopt.so
   account    required      pam_access.so
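pam_access reads its rules from /etc/security/access.conf. A sketch that
admits the wheel group and denies everyone else (group names and ordering
are illustrative; see access.conf(5)):

   +:wheel:ALL
   -:ALL:ALL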