# How to quickly create a workflow from a set of executables

To run the following Python cells, we need to make sure that we select the correct kernel `Python3.10 (AIIDA)`. If it is
not already selected, do so as follows:

<img src="../../data/change_notebook_kernel.png" width="500" style="height:auto; display:block; margin-left:auto; margin-right:auto;">

## Quickly set up a running instance

### Interacting with AiiDA and creating a profile

While AiiDA is already installed in the conda kernel of this deployment, for each project one must set up a **profile**,
which defines the connection to the data storage (SQLite or PostgreSQL database and file repository), configuration, and
other settings.

Overall, AiiDA can be controlled in two ways:

1. Using the `verdi` command line interface (CLI), or `%verdi` magic in Jupyter notebooks.
2. Using the `aiida` Python API

As of AiiDA **v2.6.1**, which was released on 2024-07-01, it is possible to create a profile that requires neither the
PostgreSQL nor the RabbitMQ service. For the sake of this tutorial, we will use this simplified setup, and we refer you
to the [installation instructions on
RTD](https://aiida.readthedocs.io/projects/aiida-core/en/stable/installation/index.html) for more information on how to
set up a fully functional high-performance profile.

To set up our profile, we just need to run the following notebook cell:

In [1]:
!/apps/share64/debian10/anaconda/anaconda-7/envs/AIIDA/bin/verdi presto --profile-name euro-scipy-2024



Now that we have created a profile, we will load the AiiDA Jupyter extension for convenience. This allows us to use
the `%verdi` Jupyter magic commands, rather than having to run them in a subshell with the full, absolute path to the
`verdi` executable, as done in the cell above.

In addition, this makes the `%aiida` Jupyter magic command available, which, when executed, automatically loads the
default profile, here the previously created `euro-scipy-2024`. Alternatively, a specific profile can be loaded as follows:
```python
from aiida import load_profile
load_profile('euro-scipy-2024')
```
which is the typical way to load a profile and what you will see in most code snippets.

In [2]:
%load_ext aiida
%aiida

Now, we set some configuration options for our profile:

In [3]:
%verdi config set warnings.development_version false
%verdi config set warnings.showdeprecations false



Finally, we verify that the profile was created successfully via:

In [5]:
%verdi status

[32m[22m ✔ [0m[22mversion:     AiiDA v2.6.1[0m
[32m[22m ✔ [0m[22mconfig:      /home/geiger_j/aiida_projects/fair-workflows-workshop/.aiida[0m
[32m[22m ✔ [0m[22mprofile:     euro-scipy-2024[0m
[32m[22m ✔ [0m[22mstorage:     SqliteDosStorage[/home/geiger_j/aiida_projects/fair-workflows-workshop/.aiida/repository/sqlite_dos_19cf4b6c9a8a4e31bd2902ba52fc3e86]: open,[0m
[32m[22m ✔ [0m[22mbroker:      RabbitMQ v3.9.13 @ amqp://guest:guest@127.0.0.1:5672?heartbeat=600[0m
[32m[22m ✔ [0m[22mdaemon:      Daemon is running with PID 2247088[0m


For a profile created with `verdi presto`, the output should look something like this:

```shell
 ✔ version:     AiiDA v2.6.2
 ✔ config:      /home/nanohub/<your-user>/.aiida
 ✔ profile:     euro-scipy-2024
 ✔ storage:     SqliteDosStorage[/home/nanohub/<your-user>/.aiida/repository/sqlite_dos_b25c3582f65647beb068a3e50636a274]: open,
 ⏺ broker:      No broker defined for this profile: certain functionality not available. See https://aiida-core.readthedocs.io/en/stable/installation/guide_quick.html#quick-install-limitations
 ⏺ daemon:      No broker defined for this profile: daemon is not available. See {URL_NO_BROKER}
```

***

## Concatenating several scripts to one workflow

### The workflow setup

Now that we have a working profile set up, assume we would like to execute a workflow that is composed of the following
steps:

1. Create a database that contains some matrices
2. Run a code that performs matrix diagonalizations and writes the eigenvalues and eigenvectors to files on disk
3. Plot the results from the previous steps
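
To make the data flow between these three steps concrete, here is a toy, in-process sketch that uses only the Python
standard library. It is not the code used in this tutorial: the table layout, the function names, and the restriction
to symmetric 2×2 matrices (whose eigenvalues have a closed form) are assumptions made purely for illustration.

```python
import math
import sqlite3


def create_database(connection):
    """Step 1: store a few symmetric 2x2 matrices [[a, b], [b, d]] in a table."""
    connection.execute("CREATE TABLE matrices (pk INTEGER PRIMARY KEY, a REAL, b REAL, d REAL)")
    connection.executemany(
        "INSERT INTO matrices (pk, a, b, d) VALUES (?, ?, ?, ?)",
        [(0, 2.0, 0.0, 3.0), (1, 2.0, 1.0, 2.0)],
    )


def diagonalize(connection, pk):
    """Step 2: fetch one matrix and return its two eigenvalues in ascending order."""
    a, b, d = connection.execute("SELECT a, b, d FROM matrices WHERE pk = ?", (pk,)).fetchone()
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean - radius, mean + radius)


def plot(eigenvalues):
    """Step 3: stand-in for plotting that renders each eigenvalue as a text bar."""
    return [f"{value:5.2f} " + "#" * round(value) for value in eigenvalues]


connection = sqlite3.connect(":memory:")
create_database(connection)
eigenvalues = diagonalize(connection, pk=1)
bars = plot(eigenvalues)
print("\n".join(bars))
```

Each step consumes only what the previous step produced. This is the structure that the workflow in this tutorial
reproduces with separate executables.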

For this tutorial, the chosen example serves mainly for demonstration purposes. However, to motivate our choice of
tasks, one could imagine the following concrete use cases:

1. Query and download atomic structures from a materials database via its API
2. Run structure optimizations using quantum-mechanical codes like Quantum ESPRESSO (QE), which, like many other
   numerical codes, run many matrix diagonalizations
3. Visualize the results for a scientific publication

Note that AiiDA was originally created for materials science applications, so we are aware that the examples reflect
that. If you think of other use cases, feel free to implement them after this tutorial and let us know about them :star: 

Each of the steps of our workflow can be of arbitrary nature, e.g. an executable on your system, a shell script, Python code, etc.

For the example workflow outlined above, we provide these as ready-to-run executables (Python scripts and a
pre-compiled binary). Their implementation does not matter for this tutorial, but if you are interested, you can find
the source code under the `data` directory.

### `Computer`s and `Code`s

Now, before we can start running stuff with AiiDA, we must first register the computational resources and executables we
want to use for that purpose.

The `verdi presto` command with which we created our profile automatically set up your local workstation as the
`localhost` computer, which will suffice for this tutorial. Additional `Computer`s, e.g. remote HPC resources, must
first be registered in AiiDA with the necessary SSH and scheduler options.
For further information, we refer to the [relevant section of the
documentation](https://aiida.readthedocs.io/projects/aiida-core/en/stable/howto/run_codes.html#how-to-set-up-a-computer).
Configuration files for some (mainly Swiss) HPC resources are available
[here](https://github.com/aiidateam/aiida-code-registry) (PRs welcome!).

We have now finally arrived at some Python code, so let's import the necessary modules, importantly the AiiDA ORM and engine:

In [6]:
from pathlib import Path

from aiida import orm, engine
from aiida.common.exceptions import NotExistent

In [12]:
codes = [
    {
        'label': 'remote_query',
        'path': str(Path('../../data/euro-scipy-2024/diag-wf/remote_query.py').resolve()),
        'description': 'Python code to query a remote resource and obtain matrix data.'
    },
    {
        'label': 'diagonalization',
        'path': str(Path('../../data/euro-scipy-2024/diag-wf/bin/default/diag').resolve()),
        'description': 'External executable that can diagonalize a matrix.'
    },
    {
        'label': 'plotting',
        'path': str(Path('../../data/euro-scipy-2024/diag-wf/plot_eigenvals.py').resolve()),
        'description': 'Python script to plot the eigenvalues of the matrix diagonalization.'
    }
]

loaded_codes = []

for code in codes:
    code_label = code['label']
    code_path = code['path']
    code_description = code['description']
    
    try:
        loaded_code = orm.load_code(f'{code_label}@localhost')
        print(f"Loaded {code_label}")
    except NotExistent:
        loaded_code = orm.InstalledCode(
            computer=orm.load_computer('localhost'),
            filepath_executable=code_path,
            label=code_label,
            description=code_description,
            default_calc_job_plugin='core.shell',
            prepend_text='export OMP_NUM_THREADS=1',
            append_text='',
            use_double_quotes=False,
            with_mpi=False
        ).store()
        print(f"Created and stored {code_label}")
    loaded_codes.append(loaded_code)

query_code = loaded_codes[0]
diag_code = loaded_codes[1]
plot_code = loaded_codes[2]

Loaded remote_query
Loaded diagonalization
Loaded plotting


To create a `Code` in AiiDA, various settings are required:

- First, the `Computer` on which the code should be executed needs to be specified
- The absolute path to the executable must be given as well; we have already added the correct paths for the nanoHUB deployment
- A label (to load the `Code` later on) and an optional description are also given
- The `default_calc_job_plugin` defines the interface through which AiiDA interacts with the given executable (here `core.shell`)
- In addition, `prepend_text` and `append_text` can be added, and will appear in the submission script before and after
  the actual call to the executable, respectively. This can be useful to load modules or set environment variables (as
  done here to restrict OpenMP to a single thread)
- Lastly, let's keep things simple and serial by disabling MPI via `with_mpi=False`
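
To illustrate where `prepend_text` and `append_text` end up, the following sketch renders a simplified submission
script. This is not AiiDA's actual template, which also contains scheduler directives and takes care of quoting; the
function `render_submit_script` and the path `/path/to/diag` are hypothetical and only serve the illustration.

```python
def render_submit_script(executable, arguments, prepend_text="", append_text=""):
    """Return an illustrative shell script: prepend_text, the executable call, then append_text."""
    lines = ["#!/bin/bash"]
    if prepend_text:
        lines.append(prepend_text)
    lines.append(" ".join([executable, *arguments]))
    if append_text:
        lines.append(append_text)
    return "\n".join(lines) + "\n"


script = render_submit_script(
    executable="/path/to/diag",  # hypothetical absolute path to the executable
    arguments=["matrix-0.npy"],
    prepend_text="export OMP_NUM_THREADS=1",
)
print(script)
```

The environment variable is exported before the executable is called, so it is already in effect when the executable starts.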

Note that AiiDA's `verdi` command-line interface (CLI) is often used to set up a `Code` instance for a profile. To this end, the command:

```shell
verdi code create core.code.installed
```

needs to be run in the terminal, where it will prompt you for all required options.

For convenience, it is also possible to provide these options via a YAML configuration file using the `--config` flag,
which can point either to a local file or to a URL (e.g. on GitHub). The YAML configuration for our `remote_query`
executable could have the following content:

```yaml
append_text: ''
computer: localhost
default_calc_job_plugin: core.shell
description: ''
filepath_executable: <absolute-path-to-remote_query>
label: remote_query
prepend_text: ''
use_double_quotes: 'False'
with_mpi: 'False'
```
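
If the options already live in Python, such a file can also be generated rather than written by hand. The sketch below
is one possible way to do so with the standard library only. It relies on the fact that JSON scalars are valid YAML;
the helper `to_flat_yaml` and the executable path are hypothetical, and only a subset of the options shown above is
included.

```python
import json


def to_flat_yaml(options):
    """Serialize a flat mapping of scalars as YAML, one `key: value` pair per line.

    JSON scalars are valid YAML, so json.dumps takes care of quoting strings.
    """
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in options.items())


options = {
    "label": "remote_query",
    "computer": "localhost",
    "default_calc_job_plugin": "core.shell",
    "filepath_executable": "/absolute/path/to/remote_query.py",  # hypothetical path
    "with_mpi": False,
}
yaml_text = to_flat_yaml(options)
print(yaml_text)
```

The resulting text can be saved to a file and that file passed to the `--config` flag.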

After creating our `Code`, we can then see if everything works fine by running:

In [11]:
%verdi code test remote_query
%verdi code test diagonalization
%verdi code test plotting

[32m[1mSuccess: [0m[22mall tests succeeded.[0m
[32m[1mSuccess: [0m[22mall tests succeeded.[0m
[32m[1mSuccess: [0m[22mall tests succeeded.[0m


Now that we have successfully registered our codes, let's see how we can execute them through `aiida-shell`. For this
purpose, we load the `launch_shell_job` function:

In [15]:
from aiida_shell import launch_shell_job

We pass the following to this function:

- The loaded `Code` that we want to execute, and
- The two required command line arguments, namely
  - The path to the mocked external database from which we want to obtain data, and
  - The matrix identifier (feel free to change that to a value between 0 and 100 to obtain different results)
- Lastly, we also specify the output filename of the file that our executable will create (note that `stdout` and
  `stderr` are automatically captured by `aiida-shell`)

In [16]:
db_path = str(Path('../../data/euro-scipy-2024/diag-wf/remote/matrices.db').resolve())
matrix_pk = 0

# 1. Query a remote database for data

query_results, query_node = launch_shell_job(
    query_code,
    arguments=f'{db_path} {matrix_pk}',
    outputs=[f'matrix-{matrix_pk}.npy']
)





That was simple, wasn't it?

Now, `aiida-shell` allows us to pass the output of one job as the input of another job, so let's do that for the next
step, and then unpack it:

In [17]:
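# 2. Diagonalize the matrix obtained in the previous step

diag_results, diag_node = launch_shell_job(
    diag_code,
    # The placeholder is replaced by the file provided through `nodes`
    arguments='{matrix_file}',
    nodes={
        # Output file of the query step, accessed via its link label
        'matrix_file': query_results[f'matrix_{matrix_pk}_npy']
    },
    outputs=[f'matrix-{matrix_pk}-eigvals.txt']
)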

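
Note that the key used to look up an output in the results dictionary is not the filename itself. Link labels in AiiDA
may only contain alphanumeric characters and underscores, so `aiida-shell` derives the key from the filename. The
following sketch approximates that conversion, as far as we understand it; the helper `filename_to_link_label` is
hypothetical and not part of the `aiida-shell` API.

```python
import re


def filename_to_link_label(filename):
    """Approximate how an output filename is turned into a valid link label.

    Every character that is not alphanumeric or an underscore is replaced by an
    underscore, and consecutive underscores are merged into a single one.
    """
    label = re.sub(r"[^0-9a-zA-Z_]", "_", filename)
    return re.sub(r"_+", "_", label)


matrix_pk = 0
query_label = filename_to_link_label(f"matrix-{matrix_pk}.npy")
diag_label = filename_to_link_label(f"matrix-{matrix_pk}-eigvals.txt")
print(query_label)  # matrix_0_npy
print(diag_label)   # matrix_0_eigvals_txt
```

If a job did not produce the expected file, the corresponding key is most likely absent from the results dictionary,
and looking it up raises a `KeyError`.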

In [None]:
# 3. Plot the eigenvalues

plot_type = 'violin'
figure_name = f'matrix-{matrix_pk}-eigvals-{plot_type}.png'

plot_results, plot_node = launch_shell_job(
    plot_code,
    arguments='-i {eigenval_txt} -p {plot_type}',
    nodes={
        'eigenval_txt': diag_results[f'matrix_{matrix_pk}_eigvals_txt'],
        'plot_type': orm.Str(plot_type)
    },
    outputs=[figure_name]
)
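
Conceptually, each `{placeholder}` in `arguments` is replaced by the string form of the node with the same key in
`nodes`: a file node contributes the name of its file in the working directory, and a node such as `orm.Str`
contributes its value. The following sketch mimics this substitution with plain strings standing in for the nodes. The
helper `expand_arguments` is hypothetical and only illustrates the idea; it is not how `aiida-shell` implements it.

```python
import shlex


def expand_arguments(arguments, nodes):
    """Split the argument string and replace each {placeholder} by the matching entry of `nodes`."""
    return [part.format(**nodes) for part in shlex.split(arguments)]


# Plain strings standing in for the actual nodes (assumption made for illustration)
nodes = {"eigenval_txt": "matrix-0-eigvals.txt", "plot_type": "violin"}
command_line = expand_arguments("-i {eigenval_txt} -p {plot_type}", nodes)
print(command_line)  # ['-i', 'matrix-0-eigvals.txt', '-p', 'violin']
```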

In [None]:
%verdi process list -ap 1

In [None]:
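from IPython.display import Image, display

# Display the figure from the working directory of the plotting job
# (directly accessible here because the job ran on `localhost`)
display(Image(filename=str(Path(plot_node.get_remote_workdir()) / figure_name)))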

In [None]:
from aiida.tools.visualization import Graph
graph = Graph()
graph.recurse_ancestors(plot_node, annotate_links="both")
graph.recurse_descendants(plot_node, annotate_links="both")
graph.graphviz

## Transitioning from running locally to submitting remote jobs

## Restart from the last checkpoint and caching functionality

## Parsing and querying your results