slurm_resume creating files in /tmp of head_node and not cleaning up.... #6572

@gwolski

Description

Seen on ParallelCluster 3.9.1 and 3.11.1.
Since the head node started, many files of the following form have accumulated in its /tmp:

-rw-r----- 1 slurm pcluster-slurm-share 1088 Nov 18 09:14 tmp.VBWqTAz4SS
-rw-r----- 1 slurm pcluster-slurm-share 262 Nov 18 09:05 tmp.zMJAymfADb
-rw-r----- 1 slurm pcluster-slurm-share 276 Nov 18 09:01 tmp.4FNDMLk4rC
-rw-r----- 1 slurm pcluster-slurm-share 282 Nov 18 08:59 tmp.xVx3sST9n3
-rw-r----- 1 slurm pcluster-slurm-share 282 Nov 18 08:55 tmp.a23WPhksqt
-rw-r----- 1 slurm pcluster-slurm-share 473 Nov 18 08:41 tmp.D3MvXoHL1g
-rw-r----- 1 slurm pcluster-slurm-share 1691 Nov 18 08:40 tmp.zYS9m1CU3j
-rw-r----- 1 slurm pcluster-slurm-share 1488 Nov 18 08:40 tmp.mCOXXmqOHt
-rw-r----- 1 slurm pcluster-slurm-share 1488 Nov 18 08:39 tmp.di3dHuc923
-rw-r----- 1 slurm pcluster-slurm-share 265 Nov 18 08:37 tmp.AXTQoAX1hL
-rw-r----- 1 slurm pcluster-slurm-share 265 Nov 18 08:37 tmp.45sPSvBlrB

I would expect Slurm (slurmctld, since we're on the head node?) to clean up after itself and not leave these files behind.
The contents of the files seem to relate to starting jobs and are of the form:

{"jobs":[{"extra":null,"job_id":17836,"features":null,"nodes_alloc":"sp-r7a-m-dy-sp-8-gb-1-cores-40","nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-40","oversubscribe":"NO","partition":"sp-8-gb","reservation":null},{"extra":null,"job_id":17837,"features":null,"nodes_alloc":"sp-r7a-m-dy-sp-8-gb-1-cores-41","nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-41","oversubscribe":"NO","partition":"sp-8-gb","reservation":null}],"all_nodes_resume":"sp-r7a-m-dy-sp-8-gb-1-cores-[40-41]"}
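To tell these files apart from unrelated `tmp.*` files, one option is to check for the two top-level keys visible in the sample above. This is only a minimal sketch: the function names are hypothetical, and it assumes every leftover file has the same `jobs` / `all_nodes_resume` layout as the one shown, which has not been confirmed.

```python
import json


def looks_like_resume_payload(text):
    """Return True if text is a JSON object with the keys seen in the sample file."""
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(data, dict) and "jobs" in data and "all_nodes_resume" in data


def summarize_jobs(text):
    """List (job_id, nodes_resume) pairs from a payload shaped like the sample."""
    data = json.loads(text)
    return [(job["job_id"], job["nodes_resume"]) for job in data["jobs"]]
```

Running `summarize_jobs` over a file's contents gives a quick view of which job IDs and nodes each leftover file refers to.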

Is this a bug, a feature, or a known issue? Should I be cleaning up files in the head node's /tmp that are older than N days?
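For reference, the kind of age-based cleanup being asked about could look roughly like the sketch below. It is a workaround idea, not a confirmed fix. The function names, the 7-day default standing in for N, and the `tmp.*` pattern are all assumptions, and whether Slurm still needs any of these files after a resume is unknown. Since `tmp.*` can also match files created by other tools, the sketch filters on the `slurm` owner and defaults to a dry run.

```python
import time
from pathlib import Path


def find_stale_tmp_files(directory="/tmp", owner="slurm", max_age_days=7, now=None):
    """Return regular files named tmp.* whose mtime is older than max_age_days.

    owner=None disables the ownership check (an assumption made for testing).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in Path(directory).glob("tmp.*"):
        try:
            # Skip directories and symlinks; only plain files are candidates.
            if path.is_symlink() or not path.is_file():
                continue
            if owner is not None and path.owner() != owner:
                continue
            if path.stat().st_mtime < cutoff:
                stale.append(path)
        except (OSError, KeyError):
            # File vanished, is unreadable, or its uid has no passwd entry.
            continue
    return sorted(stale)


def remove_files(paths, dry_run=True):
    """Delete the given files, or only print them when dry_run is True."""
    for path in paths:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink()


if __name__ == "__main__":
    remove_files(find_stale_tmp_files(), dry_run=True)
```

Running it with `dry_run=True` first shows what would be deleted before anything is actually removed.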
