Queue Manager Example YAML Files

The primary way to set up a Manager is through a YAML config file. This page provides example config files which can mostly be copied and used in place, filling in details such as the **username** and **password** as needed.

The full documentation of every option and how it can be used can be found in the Queue Manager's API documentation.

For these examples, the username will always be "Foo" and the password will always be "b4R" (both are placeholders and not valid credentials). The manager_name variable can be any string, and these examples provide some descriptive samples. The more distinct the name, the easier it is to identify the Manager's status on the Server.
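Each example below is a complete config file. Once saved (for instance as manager_config.yaml; the filename is arbitrary), the Manager is typically started by pointing the command-line tool at it, e.g. qcfractal-manager --config-file=manager_config.yaml (run qcfractal-manager --help to confirm the flag on your installed version). All of the examples share the same overall layout; the annotated sketch below summarizes what each top-level block is for, using placeholder values, and should be checked against the Queue Manager's API documentation for the authoritative option list.

common:             # Adapter choice and the hardware each Worker consumes
 adapter: dask      # or parsl
 tasks_per_worker: 1
 cores_per_worker: 1
 memory_per_worker: 8   # in GB

server:             # How to reach the Fractal Server (can be omitted in test mode)
 fractal_uri: "localhost:7777"
 username: Foo
 password: b4R

manager:            # Identification and testing options
 manager_name: "SomeDescriptiveName"

cluster:            # Scheduler-level settings (slurm, sge, pbs, ...)
 scheduler: slurm
 walltime: "72:00:00"

dask:               # Adapter-specific block; use parsl: for the Parsl Adapter
 queue: default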

SLURM Cluster, Dask Adapter with additional options

This example is similar to the one on the Managers start page, but with some additional options, such as connecting back to a central Fractal instance and setting more cluster-specific options. As before, this starts a Manager with a Dask Adapter on a SLURM cluster, consuming 1 CPU and 8 GB of RAM, targeting a Fractal Server running on that cluster, and using the SLURM partition named default. To do so, save the following YAML config file:

common:
 adapter: dask
 tasks_per_worker: 1
 cores_per_worker: 1
 memory_per_worker: 8

server:
 fractal_uri: "localhost:7777"
 username: Foo
 password: b4R

manager:
 manager_name: "SlurmCluster_OneDaskTask"

cluster:
 scheduler: slurm
 walltime: "72:00:00"

dask:
 queue: default

Multiple Tasks, 1 Cluster Job

This example starts at most 1 cluster Job, but multiple Tasks. The hardware will be consumed uniformly by the Worker: with 8 cores, 20 GB of memory, and 4 tasks, the Worker will provide 2 cores and 5 GB of memory to compute each Task. We set common.max_workers to 1 to limit the number of Workers and Jobs which can be started. Since this is SLURM, squeue will show that this user has submitted 1 sbatch job requesting 8 cores and 20 GB of memory.

common:
 adapter: dask
 tasks_per_worker: 4
 cores_per_worker: 8
 memory_per_worker: 20
 max_workers: 1

server:
 fractal_uri: "localhost:7777"
 username: Foo
 password: b4R

manager:
 manager_name: "SlurmCluster_MultiDask"

cluster:
 scheduler: slurm
 walltime: "72:00:00"

dask:
 queue: default

Testing the Manager Setup

This will test the Manager to make sure it is set up correctly. It does not need to connect to the Server, and therefore does not need a server block. It will, however, still submit Jobs to the scheduler.

common:
 adapter: dask
 tasks_per_worker: 2
 cores_per_worker: 4
 memory_per_worker: 10

manager:
 manager_name: "TestBox_NeverSeen_OnServer"
 test: True
 ntests: 5

cluster:
 scheduler: slurm
 walltime: "01:00:00"

dask:
 queue: default

Running commands before work

Suppose there are some commands you want to run before starting the Worker, such as activating a Conda environment or setting some environment variables; task_startup_commands lets you specify these. For this example, we will run on a Sun Grid Engine (SGE) cluster, activate a Conda environment, and load a module.

An important note about this example: we have now set max_workers to something larger than 1. Each Job will still request 16 cores and 256 GB of memory to be split evenly between 4 Tasks; however, the Adapter will attempt to start 5 independent Jobs, for a total of 80 cores and 1.28 TB of memory, distributed over 5 Workers collectively running 20 concurrent Tasks. If the Scheduler does not allow all of those Jobs to start, whether due to lack of resources or user limits, the Adapter can still start fewer Jobs, each with 16 cores and 256 GB of memory, but Task concurrency will then change in blocks of 4, since the Worker in each Job is configured to handle 4 Tasks.

common:
 adapter: dask
 tasks_per_worker: 4
 cores_per_worker: 16
 memory_per_worker: 256
 max_workers: 5

server:
 fractal_uri: localhost:7777
 username: Foo
 password: b4R

manager:
 manager_name: "GridEngine_OpenMPI_DaskWorker"
 test: False

cluster:
 scheduler: sge
 task_startup_commands:
     - module load mpi/gcc/openmpi-1.6.4
     - conda activate qcfmanager
 walltime: "71:00:00"

dask:
 queue: free64

Additional Scheduler Flags

A Scheduler may require you to set additional flags (or you may want to) when submitting a Job. Maybe it is a rule enforced by your system administrators, maybe you want to charge a specific account, or maybe you need to set something which is not interpreted for you by the Manager or Adapter (do tell us if that is the case). This example sets additional flags on a PBS cluster such that the final Job launch file will contain lines of the form #PBS {my headers}.

This example also uses Parsl and sets a scratch directory.

common:
 adapter: parsl
 tasks_per_worker: 1
 cores_per_worker: 6
 memory_per_worker: 64
 max_workers: 5
 scratch_directory: "$TMPDIR"

server:
 fractal_uri: localhost:7777
 username: Foo
 password: b4R
 verify: False

manager:
 manager_name: "PBS_Parsl_MyPIGroupAccount_Manger"

cluster:
 node_exclusivity: True
 scheduler: pbs
 scheduler_options:
     - "-A MyPIsGroupAccount"
 task_startup_commands:
     - conda activate qca
     - cd $WORK
 walltime: "06:00:00"

parsl:
 provider:
  partition: normal_q
  cmd_timeout: 30
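
With this configuration, each entry of cluster.scheduler_options appears as its own scheduler directive, so the header of the generated launch file should include a line roughly of the form:

    #PBS -A MyPIsGroupAccount

The remaining directives (walltime, node and core requests, and so on) are generated for you by the Adapter and will vary with the Parsl version in use.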