The primary way to set up a Manager
is to setup a YAML config file. This page provides helpful config files which mostly can be just copied and used in place (filling in things like **username** and **password** <managers_server>
as needed.)
The full documentation of every option and how it can be used can be found in the Queue Manager's API <managers_config_api>
.
For these examples, the username
will always be "Foo" and the password
will always be "b4R" (which are just placeholders and not valid). The manager_name
variable can be any string and these examples provide some descriptive samples. The more distinct the name, the better it is to see its status on the Server
.
This example is similar to the example on the start page for Managers<manager_starter_example>
, but with some additional options such as connecting back to a central Fractal instance and setting more cluster-specific options. Again, this starts a manager with a dask Adapter
, on a SLURM cluster, consuming 1 CPU and 8 GB of ram, targeting a Fractal Server running on that cluster, and using the SLURM partition default
, save the following YAML config file:
common:
adapter: dask
tasks_per_worker: 1
cores_per_worker: 1
memory_per_worker: 8
server:
fractal_uri: "localhost:7777"
username: Foo
password: b4R
manager:
manager_name: "SlurmCluster_OneDaskTask"
cluster:
scheduler: slurm
walltime: "72:00:00"
dask:
queue: default
This example starts a max of 1 cluster Job
, but multiple tasks<Task>
. The hardware will be consumed uniformly by the Worker
. With 8 cores, 20 GB of memory, and 4 tasks; the Worker
will provide 2 cores and 5 GB of memory to compute each Task
. We set common.max_workers
to 1 to limit the number of Workers<Worker>
and Jobs <Job>
which can be started. Since this is SLURM, the squeue
information will show this user has run 1 sbatch
jobs which requested 4 cores and 20 GB of memory.
common:
adapter: dask
tasks_per_worker: 4
cores_per_worker: 8
memory_per_worker: 20
max_workers: 1
server:
fractal_uri: "localhost:7777"
username: Foo
password: b4R
manager:
manager_name: "SlurmCluster_MultiDask"
cluster:
scheduler: slurm
walltime: "72:00:00"
dask:
queue: default
This will test the Manager
to make sure it's setup correctly, and does not need to connect to the Server
, and therefore does not need a server
block. It will still however submit jobs <Job>
.
common:
adapter: dask
tasks_per_worker: 2
cores_per_worker: 4
memory_per_worker: 10
manager:
manager_name: "TestBox_NeverSeen_OnServer"
test: True
ntests: 5
cluster:
scheduler: slurm
walltime: "01:00:00"
dask:
queue: default
Suppose there are some commands you want to run before starting the Worker
, such as starting a Conda environment, or setting some environment variables. This lets you specify that. For this, we will run on a Sun Grid Engine (SGE) cluster, start a conda environment, and load a module.
An important note about this one, we have now set max_workers
to something larger than 1. Each Job
will still request 16 cores and 256 GB of memory to be evenly distributed between the 4 tasks<Task>
, however, the Adapter
will attempt to start 5 independent jobs<Job>
, for a total of 80 cores, 1.280 TB of memory, distributed over 5 Workers<Worker>
collectively running 20 concurrent tasks<Task>
. If the Scheduler
does not allow all of those jobs<Job>
to start, whether due to lack of resources or user limits, the Adapter
can still start fewer jobs<Job>
, each with 16 cores and 256 GB of memory, but Task
concurrency will change by blocks of 4 since the Worker
in each Job
is configured to handle 4 tasks<Task>
each.
common:
adapter: dask
tasks_per_worker: 4
cores_per_worker: 16
memory_per_worker: 256
max_workers: 5
server:
fractal_uri: localhost:7777
username: Foo
password: b4R
manager:
manager_name: "GridEngine_OpenMPI_DaskWorker"
test: False
cluster:
scheduler: sge
task_startup_commands:
- module load mpi/gcc/openmpi-1.6.4
- conda activate qcfmanager
walltime: "71:00:00"
dask:
queue: free64
A Scheduler
may ask you to set additional flags (or you might want to) when submitting a Job
. Maybe it's a Sys. Admin enforced rule, maybe you want to pull from a specific account, or set something not interpreted for you in the Manager
or Adapter
(do tell us though if this is the case). This example sets additional flags on a PBS cluster such that the final Job
launch file will have #PBS {my headers}
.
This example also uses Parsl and sets a scratch directory.
common:
adapter: parsl
tasks_per_worker: 1
cores_per_worker: 6
memory_per_worker: 64
max_workers: 5
scratch_directory: "$TMPDIR"
server:
fractal_uri: localhost:7777
username: Foo
password: b4R
verify: False
manager:
manager_name: "PBS_Parsl_MyPIGroupAccount_Manger"
cluster:
node_exclusivity: True
scheduler: pbs
scheduler_options:
- "-A MyPIsGroupAccount"
task_startup_commands:
- conda activate qca
- cd $WORK
walltime: "06:00:00"
parsl:
provider:
partition: normal_q
cmd_timeout: 30