# Jobflow-remote introduction

Jobflow-remote is a free, open-source library serving as a manager for the execution of [jobflow](https://materialsproject.github.io/jobflow/) workflows. Most of the information about how to set up and use jobflow-remote can be found in the [official documentation](https://matgenix.github.io/jobflow-remote/index.html).

## Setting up jobflow-remote

While the first step to use jobflow-remote would be to [set up the configuration file for a project](https://matgenix.github.io/jobflow-remote/user/install.html), in this tutorial all the configuration has already been done. Nonetheless, it may be helpful to explore the content of the generated YAML configuration file. Its default location is the `~/.jfremote` folder.
A few important sections that you might want to check are:
* `name`: the name of the project (note that while it is advisable that the configuration file has the same name as the project, this is not a constraint)
* `workers`: the workers available for this project. In this case two workers are available:
  * `local_slurm`: submits jobs to the SLURM queue of the `slurm` container
  * `local_shell`: runs jobs directly in the `jupyter` container, using the shell as scheduler.
* `queue`: the connection details of the MongoDB database containing the details about the execution of the Jobs
* `jobstore`: the connection details of jobflow's `JobStore`, containing the Jobs' outputs
* `exec_config`: common configurations required to execute Jobs on the workers.
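To see how these sections fit together, the sketch below writes a hand-copied subset of the configuration printed in the next cell as a plain Python dictionary and summarizes the workers. It is only an illustration. `config` and `summarize_workers` are hypothetical names. In practice the YAML file would be parsed with a YAML library (for example PyYAML's `yaml.safe_load`) rather than written by hand.

```python
# Hypothetical, hand-copied subset of test_project.yaml (not loaded from disk)
config = {
    "name": "test_project",
    "workers": {
        "local_slurm": {
            "scheduler_type": "slurm",
            "type": "remote",
            "host": "slurm",
            "work_dir": "/home/atomate/jobs",
        },
        "local_shell": {
            "scheduler_type": "shell",
            "type": "local",
            "work_dir": "/home/jovyan/jobs",
            "max_jobs": 2,
        },
    },
}


def summarize_workers(cfg: dict) -> dict:
    """Map each worker name to a (connection type, scheduler type) pair."""
    return {
        name: (worker["type"], worker["scheduler_type"])
        for name, worker in cfg["workers"].items()
    }


for name, (conn, scheduler) in summarize_workers(config).items():
    print(f"{name}: {conn} worker using the {scheduler} scheduler")
```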

```
!cat /home/jovyan/.jfremote/test_project.yaml  # replace test_project with the PROJECTNAME value if you changed it
```

```yaml
name: test_project
base_dir: /home/jovyan/.jfremote/test_project
log_level: debug
runner:
  delay_checkout: 10
  delay_check_run_status: 10
  delay_advance_status: 10
  delay_update_batch: 10
  max_step_attempts: 3
  delta_retry:
  - 10
  - 20
  - 30

workers:
  local_slurm:
    scheduler_type: slurm
    work_dir: /home/atomate/jobs
    resources:
    pre_run: "source /home/atomate/.venv/bin/activate"
    post_run:
    timeout_execute: 60
    type: remote
    host: slurm
    user: atomate
    port:
    password: atomate
  local_shell:
    scheduler_type: shell
    work_dir: /home/jovyan/jobs
    pre_run: "export PYTHONPATH=/home/jovyan/work/develop:$PYTHONPATH"
    type: local
    max_jobs: 2

queue:
  store:
    type: MongoStore
    database: jobflow_remote
    collection_name: jobs
    host: mongodb
    port:
    username: ''
    password: ''
    ssh_tunnel:
    safe_update: false
    auth_source: jobflow_remote
    mongoclient_kwargs: {}
    default_sort:

exec_config: {}

jobstore
```

## The `jf` CLI

Most interactions with jobflow-remote can be performed using the `jf` command line interface. It is usually executed from the shell, but it can also be run inside this notebook by prepending an exclamation mark `!` to the command.

A first step is to verify that the connections have been properly set up, using the `jf project check` command. This attempts to connect to the workers and the databases and performs a few basic operations. Passing the checks does not guarantee that everything will work, but if the checks fail, the connection details need to be revised.
The output should look like:
```
✓ Worker local_slurm
✓ Worker local_shell
✓ Jobstore
✓ Queue store
```
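If the check is run from a script, its output can be inspected programmatically. This is a minimal sketch, assuming the output format shown above: one line per component, starting with a status symbol. `parse_check_output` is a hypothetical helper. The failure symbol `✗` is an assumption, not something guaranteed by `jf`.

```python
PASS_SYMBOL = "✓"
FAIL_SYMBOL = "✗"  # assumption: the symbol printed for a failed check


def parse_check_output(text: str) -> dict:
    """Map each checked component to True (passed) or False (failed).

    Hypothetical helper: only lines starting with one of the two status
    symbols are considered; all other lines (e.g. the project banner) are skipped.
    """
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if line[:1] in (PASS_SYMBOL, FAIL_SYMBOL):
            results[line[1:].strip()] = line[0] == PASS_SYMBOL
    return results


expected = """
✓ Worker local_slurm
✓ Worker local_shell
✓ Jobstore
✓ Queue store
"""

status = parse_check_output(expected)
print(status)
print("All checks passed:", all(status.values()))
```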

```
!jf project check
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
✓ Worker local_slurm
✓ Worker local_shell
✓ Jobstore
✓ Queue store
```

Once the connections are properly set up, the queue database needs to be prepared with the `jf admin reset` command. This adds a few required documents to the DB and **removes all the Jobs and Flows** present in the database (here the `-f` option is used to skip the confirmation prompt).

```
!jf admin reset -f
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
The database was reset
```


It is now possible to check that no Jobs or Flows are present in the DB:

```
!jf job list
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
                               Jobs info
┏━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ DB id ┃ Name ┃ State ┃ Job id (Index) ┃ Worker ┃ Last updated [UTC] ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
└───────┴──────┴───────┴────────────────┴────────┴────────────────────┘
```


```
!jf flow list
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
                            Flows info
┏━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ DB id ┃ Name ┃ State ┃ Flow id ┃ Num Jobs ┃ Last updated [UTC] ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
└───────┴──────┴───────┴─────────┴──────────┴────────────────────┘
```


<div class="alert alert-block alert-info">
<b>Tip</b>: All the commands have a <code>-h</code>/<code>--help</code> option that shows all the options for that command
</div>

```
!jf job list -h
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml

 Usage: jf job list [OPTIONS]

 Get the list of Jobs in the database.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --job-id           -jid        TEXT                   One or more pair of    │
│                                                       job ids (i.e. uuids)   │
│
```

## The first Flow

To start using jobflow-remote, a Flow needs to be created and added to the DB. Several example Jobs can be found in the `jobflow_remote.testing` module (see the [source code](https://github.com/Matgenix/jobflow-remote/blob/develop/src/jobflow_remote/testing/__init__.py)) and can be used to compose a simple Flow.

<div class="alert alert-block alert-warning"><b>Warning:</b> To be executed by jobflow-remote, the Job source code must be <b>available in the worker as well</b>. Thus, a simple Job whose source is defined in the notebook will not be available in the worker, and jobflow-remote will not be able to run it. For this reason, these examples use simple Jobs included in jobflow-remote.</div>


```python
from jobflow import Flow
from jobflow_remote.testing import add

j1 = add(1, 2)
j2 = add(j1.output, 4)

flow1 = Flow([j1, j2])
```

At this point the Flow has been created, but it has not been inserted into the jobflow-remote database. To do so, use the `submit_flow` function and choose a worker that will execute the Jobs. For this first example we will use the local worker `local_shell`. The function returns a list of the ids that identify the Jobs in the database (note that these are strings, even though they usually represent integers).

```python
from jobflow_remote import submit_flow

job_ids = submit_flow(flow1, worker="local_shell")
job_ids
```

```
['1', '2']
```

It is now possible to check that the Jobs and the Flow are present in the database, again using the `jf` command line:

```
!jf job list
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
                                   Jobs info
┏━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃       ┃      ┃         ┃                   ┃             ┃ Last updated      ┃
┃ DB id ┃ Name ┃ State   ┃ Job id (Index)    ┃ Worker      ┃ [UTC]             ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ 2     │ add  │ WAITING │ 10600b10-f8b1-45… │ local_shell │ 2025-04-11 15:30  │
│       │      │         │ (1)               │             │                   │
│ 1     │ add  │ READY   │ 4a2d2fc0-dd3d-45… │ local_shell │ 2025-
```
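In the listing above, Job 2 is `WAITING` while Job 1 is `READY`. `j2` takes `j1.output` as input, so it cannot start before `j1` has completed. The toy model below mimics this reference resolution in plain Python. It is not how jobflow implements references. `OutputRef`, `ToyJob`, `job_states` and `run` are hypothetical names, and the variables are prefixed with `toy_` to avoid overwriting the real `j1`/`j2` in the notebook.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class OutputRef:
    """Toy stand-in for a reference to another Job's output (like `j1.output`)."""
    job_name: str


@dataclass
class ToyJob:
    """Hypothetical, minimal Job: a name, a function and its positional arguments."""
    name: str
    func: Callable[..., Any]
    args: tuple = ()

    def dependencies(self) -> set:
        # Names of the Jobs whose outputs this Job needs as inputs
        return {a.job_name for a in self.args if isinstance(a, OutputRef)}


def job_states(jobs: List[ToyJob], outputs: Dict[str, Any]) -> Dict[str, str]:
    """COMPLETED if already run, READY if all inputs are available, WAITING otherwise."""
    states = {}
    for job in jobs:
        if job.name in outputs:
            states[job.name] = "COMPLETED"
        elif job.dependencies().issubset(outputs):
            states[job.name] = "READY"
        else:
            states[job.name] = "WAITING"
    return states


def run(jobs: List[ToyJob]) -> Dict[str, Any]:
    """Repeatedly execute READY Jobs, replacing references with actual outputs."""
    outputs: Dict[str, Any] = {}
    while len(outputs) < len(jobs):
        states = job_states(jobs, outputs)
        ready = [j for j in jobs if states[j.name] == "READY"]
        if not ready:
            raise RuntimeError("some Jobs depend on outputs that can never be produced")
        for job in ready:
            args = [outputs[a.job_name] if isinstance(a, OutputRef) else a for a in job.args]
            outputs[job.name] = job.func(*args)
    return outputs


def toy_add(a, b):
    return a + b


toy_j1 = ToyJob("j1", toy_add, (1, 2))
toy_j2 = ToyJob("j2", toy_add, (OutputRef("j1"), 4))
print(job_states([toy_j1, toy_j2], {}))  # {'j1': 'READY', 'j2': 'WAITING'}
print(run([toy_j1, toy_j2]))             # {'j1': 3, 'j2': 7}
```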

```
!jf flow list
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
                                   Flows info
┏━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ DB id ┃ Name ┃ State ┃ Flow id               ┃ Num Jobs ┃ Last updated [UTC] ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ 1     │ Flow │ READY │ d40faa94-c3d8-4983-9… │ 2        │ 2025-04-11 15:30   │
└───────┴──────┴───────┴───────────────────────┴──────────┴────────────────────┘
```


More detailed information about a single Job can be obtained with the `jf job info` command. A Job can be identified by either its `uuid` or its `db_id`.
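The same command accepts two kinds of identifiers. One plausible way to tell them apart is to try parsing the string as a UUID, and otherwise accept only digits as a `db_id`. This is a sketch with hypothetical helpers (`identifier_kind`, `find_job`) over in-memory records. It is not the logic `jf` actually uses.

```python
import uuid
from typing import Optional


def identifier_kind(identifier: str) -> str:
    """Return 'uuid' or 'db_id' for a Job identifier (hypothetical helper)."""
    try:
        uuid.UUID(identifier)
        return "uuid"
    except ValueError:
        if identifier.isdigit():
            return "db_id"
        raise ValueError(f"not a uuid or a db_id: {identifier!r}") from None


def find_job(records: list, identifier: str) -> Optional[dict]:
    """Return the first record whose uuid or db_id (as appropriate) matches."""
    key = identifier_kind(identifier)
    return next((r for r in records if r[key] == identifier), None)


# One record using the values reported by `jf job info 2` for this Job
records = [
    {"db_id": "2", "uuid": "10600b10-f8b1-45f5-98ce-f1fc0998dab5", "state": "WAITING"},
]
print(find_job(records, "2")["state"])
print(find_job(records, "10600b10-f8b1-45f5-98ce-f1fc0998dab5")["db_id"])
```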

```
!jf job info 2
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
╭────────────────────────────────────────────────────────────╮
│      db_id = '2'                                           │
│       uuid = '10600b10-f8b1-45f5-98ce-f1fc0998dab5'        │
│      index = 1                                             │
│       name = 'add'                                         │
│      state = 'WAITING'                                     │
│     remote = {'step_attempts': 0, 'prerun_cleanup': False} │
│ created_on = '2025-04-11 15:30'
```

<div class="alert alert-block alert-info">
<b>Note</b>: more information can be printed if the previous commands are executed using the <code>-v</code> option to increase the verbosity. More <code>v</code>s further increase the verbosity (e.g. <code>-vvv</code>)
</div>

## The Runner

Once the Jobs are in the database, they will be executed by jobflow-remote. However, to do so, the Runner has to be activated. 
In jobflow-remote the Runner refers to one or more processes that handle the whole execution of the jobflow workflows, including the interaction with the worker and the writing of the outputs in the `JobStore`.

![Runner schema](https://matgenix.github.io/jobflow-remote/_images/daemon_schema.svg)

The Runner performs different tasks, mainly divided into:
1. checking out jobs from database to start a Flow execution
2. updating the states of the Jobs in the queue database
3. interacting with the worker hosts to upload/download files and check the job status
4. inserting the output data in the output JobStore
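The sketch below models these tasks as a simplified state machine. At each step every Job advances by at most one state, and a `WAITING` Job becomes `READY` once its parents are `COMPLETED`. The state names follow those used by jobflow-remote, but only the successful path is modelled. The mapping of each transition to a task is an assumption made for illustration; the task names are taken from the processes shown by `jf runner info` in this notebook.

```python
from typing import Dict, List, Optional

# Simplified "happy path" of Job states (error, paused and batch states omitted)
PIPELINE = [
    "READY", "CHECKED_OUT", "UPLOADED", "SUBMITTED",
    "RUNNING", "TERMINATED", "DOWNLOADED", "COMPLETED",
]

# Assumed mapping of each transition to one of the Runner tasks listed above
TASK_FOR_STATE = {
    "READY": "checkout",        # pick up the Job from the queue DB
    "CHECKED_OUT": "transfer",  # upload input files to the worker
    "UPLOADED": "queue",        # submit to the worker's scheduler
    "SUBMITTED": "queue",       # check the job status on the worker
    "RUNNING": "queue",
    "TERMINATED": "transfer",   # download output files from the worker
    "DOWNLOADED": "complete",   # insert outputs in the JobStore
}


def runner_step(
    states: Dict[str, str],
    parents: Dict[str, List[str]],
    log: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Advance every Job by at most one state.

    WAITING Jobs become READY once all their parents are COMPLETED.
    """
    new_states = dict(states)
    for job, state in states.items():
        if state == "WAITING":
            if all(states[p] == "COMPLETED" for p in parents.get(job, [])):
                new_states[job] = "READY"
        elif state != "COMPLETED":
            next_state = PIPELINE[PIPELINE.index(state) + 1]
            new_states[job] = next_state
            if log is not None:
                log.append(f"{TASK_FOR_STATE[state]}: job {job} {state} -> {next_state}")
    return new_states


parents = {"2": ["1"]}  # Job 2 uses the output of Job 1
states = {"1": "READY", "2": "WAITING"}
log: List[str] = []
steps = 0
while any(s != "COMPLETED" for s in states.values()):
    states = runner_step(states, parents, log)
    steps += 1
print(steps)   # 15
print(log[0])  # checkout: job 1 READY -> CHECKED_OUT
```

In this toy model Job 2 needs one extra step to be unlocked after Job 1 completes, which loosely reflects why dependent Jobs start only after their parents have been fully processed.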

The Runner can be activated with the `jf runner start` command. Its status can be checked with `jf runner status` and `jf runner info`, and it can be stopped with `jf runner stop` or `jf runner shutdown`. The Runner **remains active until explicitly stopped**.

```
!jf runner start
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
```

```
!jf runner status
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
Daemon status: running
```


```
!jf runner info
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┓
┃ Process                                      ┃ PID  ┃ State   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━┩
│ supervisord                                  │ 1104 │ RUNNING │
│ runner_daemon_checkout:run_jobflow_checkout  │ 1105 │ RUNNING │
│ runner_daemon_complete:run_jobflow_complete0 │ 1106 │ RUNNING │
│ runner_daemon_queue:run_jobflow_queue        │ 1107 │ RUNNING │
│ runner_daemon_transfer:run_jobflow_transfer0 │ 1108 │ RUNNING │
└──────────────────────────────────────────────┴──────┴─────────┘

Data about running runner in the DB:
╭───────────────────────────────────────────────────────────────╮
│     daemon_dir = '/home/jovyan/.jf
```

If everything is running properly, you can check the status of the Jobs in the queue. They should reach the `COMPLETED` state within a few seconds.
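Instead of re-running `jf job list` by hand, the wait can also be scripted. Below is a minimal, library-agnostic polling sketch. `get_states` is a hypothetical callable that returns the state of each Job, for example by querying the queue database. Treating `FAILED` and `REMOTE_ERROR` as final failures is an assumption of this example.

```python
import time
from typing import Callable, Dict

FAILED_STATES = {"FAILED", "REMOTE_ERROR"}  # assumption: treat these as final failures


def wait_for_completion(
    get_states: Callable[[], Dict[str, str]],
    timeout: float = 60.0,
    interval: float = 5.0,
) -> Dict[str, str]:
    """Poll get_states() until all Jobs are COMPLETED, one fails, or the timeout expires.

    get_states is a hypothetical callable returning {db_id: state}; it could,
    for instance, wrap a query to the queue database.
    """
    deadline = time.monotonic() + timeout
    while True:
        states = get_states()
        failed = {k: s for k, s in states.items() if s in FAILED_STATES}
        if failed:
            raise RuntimeError(f"some Jobs failed: {failed}")
        if all(s == "COMPLETED" for s in states.values()):
            return states
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Jobs not completed after {timeout} s: {states}")
        time.sleep(interval)


# Usage with fake snapshots standing in for successive database queries
snapshots = iter([
    {"1": "RUNNING", "2": "WAITING"},
    {"1": "COMPLETED", "2": "RUNNING"},
    {"1": "COMPLETED", "2": "COMPLETED"},
])
print(wait_for_completion(lambda: next(snapshots), timeout=5, interval=0.01))
```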

```
!jf job list
```

```
The selected project is test_project from config file
/home/jovyan/.jfremote/test_project.yaml
                                   Jobs info
┏━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃       ┃      ┃           ┃                  ┃             ┃ Last updated     ┃
┃ DB id ┃ Name ┃ State     ┃ Job id (Index)   ┃ Worker      ┃ [UTC]            ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ 2     │ add  │ COMPLETED │ 10600b10-f8b1-4… │ local_shell │ 2025-04-11 15:32 │
│       │      │           │ (1)              │             │                  │
│ 1     │ add  │ COMPLETED │ 4a2d2fc0-dd3d-4… │ local_shell │ 2025
```

## Extracting results

Once the Jobs are completed, their outputs can be extracted. Jobflow-remote does not change this aspect of jobflow, so the results can still be retrieved from the database using the `JobStore` object or direct database queries. The only difference is that a properly configured `JobStore` instance can be obtained from jobflow-remote through the helper function `get_jobstore`. Remember to **connect** the JobStore before using it.

```python
from jobflow_remote import get_jobstore

jobstore = get_jobstore()
jobstore.connect()
```

Queries are the same as in standard jobflow, and results can be obtained with generic queries or by referring to the Job `uuid`. In this case it may also be convenient to use jobflow-remote's `db_id`, which is stored in the output's `metadata`:

```python
print("Output from uuid: ", jobstore.get_output(uuid=j2.uuid))
print("Output document from db_id: ", jobstore.query_one({"metadata.db_id": "2"}))
```

```
Output from uuid:  7
Output document from db_id:  {'_id': ObjectId('67f9361534cffa51d7095430'), 'uuid': '10600b10-f8b1-45f5-98ce-f1fc0998dab5', 'index': 1, 'output': 7, 'completed_at': '2025-04-11T15:32:18.607253', 'metadata': {'db_id': '2'}, 'hosts': ['d40faa94-c3d8-4983-9982-c859ca65a74b'], 'name': 'add', '@module': 'jobflow.core.schemas', '@class': 'JobStoreDocument', '@version': '0.1.19'}
```


## Submitting to a queue based worker 

Since the previous Flow was executed on the local worker, which uses the shell to execute the Jobs, no further information had to be provided. However, to perform real simulations, Jobs need to be submitted to the queueing systems of HPC centers (e.g. Slurm, PBS, ...). To do so, it is necessary to specify the resources to be used. Let's create a new Flow and submit it to the `local_slurm` worker. Since this worker uses Slurm as scheduler, some of the standard Slurm input options can be used.

<div class="alert alert-block alert-info">
The handling of the submission to the scheduler is delegated to <a href="https://matgenix.github.io/qtoolkit/">qtoolkit</a>. The available keywords can be checked in the <a href="https://github.com/Matgenix/qtoolkit/blob/bcb445b903f3cb78295aa7641944e0bade9a3fb8/src/qtoolkit/io/slurm.py#L150">Slurm template</a> 
</div>

```python
j3 = add(1, 2)
j4 = add(j3.output, 4)

flow2 = Flow([j3, j4])

submit_flow(flow2, worker="local_slurm", resources={"nodes": 1, "ntasks": 1, "time": "00:10:00"})
```

```
['3', '4']
```
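With the Slurm scheduler, the `resources` dictionary ends up as options of the submission script. `--nodes`, `--ntasks` and `--time` are standard `sbatch` options. The toy function below renders the dictionary as `#SBATCH` directives to show the idea. qtoolkit's Slurm template defines the actual keywords and their translation, and the underscore-to-dash rule used here is an assumption of this sketch.

```python
def to_sbatch_directives(resources: dict) -> list:
    """Render a resources dict as '#SBATCH --key=value' lines (toy illustration).

    Assumption: keys map to sbatch long options by replacing '_' with '-'.
    """
    return [f"#SBATCH --{key.replace('_', '-')}={value}" for key, value in resources.items()]


print("\n".join(to_sbatch_directives({"nodes": 1, "ntasks": 1, "time": "00:10:00"})))
```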

The execution may be slightly slower than in the previous case, since files now need to be transferred to the other machine and the Job runs through a queue.

```
!jf job list
```

## Additional exercises

* Try submitting a new Flow and stopping the Runner to verify that no steps are performed while the Runner is not active
* Open the terminal in JupyterLab and try running the `jf` commands directly
* Explore the functionalities available in the CLI: use `jf --tree` for a tree representation of the commands, or use the `-h` option to get the list of subcommands and options
* Explore options for filtering Jobs and Flows (`jf job list -h`, `jf flow list -h`).
  * How do you select the job with `db_id` `1`? And with a specific `uuid`?
  * How do you list all the `COMPLETED` jobs?
  * How do you tune the maximum number of displayed Jobs and Flows?
  * Change the sorting order
  * Filter jobs and flows by date.
* Try submitting a more complex Flow, for example a composed one or one with dynamic actions (see the [test Jobs available in jobflow-remote](https://github.com/Matgenix/jobflow-remote/blob/develop/src/jobflow_remote/testing/__init__.py))
* Get the [Mermaid](https://mermaid.js.org/) representation of a Flow. Use `jf flow graph --mermaid`. How do you specify the flow? You can use the generated text in [mermaid.live](https://mermaid.live) to view the graph.
* In the JupyterLab terminal start the GUI with `jf gui` and open your browser at http://localhost:5001. Select the project and explore the functionalities.
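For reference on the Mermaid exercise, a Mermaid flowchart is plain text listing nodes and `-->` edges. The sketch below builds such text for the two `add` Jobs of the first Flow from a hypothetical `{child: [parents]}` mapping. The node ids and labels are made up for illustration, and the output of `jf flow graph --mermaid` may be formatted differently.

```python
def to_mermaid(edges: dict, labels: dict) -> str:
    """Build Mermaid flowchart text from {child: [parents]} and {node_id: label}."""
    lines = ["flowchart TD"]
    for node, label in labels.items():
        lines.append(f'    {node}["{label}"]')
    for child, parents in edges.items():
        for parent in parents:
            lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)


# Hypothetical node ids for the two `add` Jobs of the first Flow
print(to_mermaid({"job2": ["job1"]}, {"job1": "add (db_id 1)", "job2": "add (db_id 2)"}))
```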