# How to quickly create a workflow from a set of executables

To run the following Python cells, we need to make sure that we select the correct kernel `Python3.10 (AIIDA)`. If it is
not already selected, do so as follows:

<img src="../../data/change_notebook_kernel.png" width="500" style="height:auto; display:block; margin-left:auto; margin-right:auto;">

## Quickly set up a running instance

### Interacting with AiiDA

AiiDA can be controlled in two ways:

1. Using the `verdi` command line interface (CLI), or `%verdi` magic in Jupyter notebooks.
2. Using the `aiida` Python API.

For each project in AiiDA, we set up a **profile**, which defines the connection to the data storage (SQLite or PostgreSQL database and file repository), configuration, and other settings.

### Creating a profile

As of AiiDA **v2.6.1**, which was released on 2024-07-01, it is possible to create a profile without the
PostgreSQL and RabbitMQ services mentioned in the beginning. For this tutorial, we will use this simplified
setup. We refer you to the [installation instructions on
RTD](https://aiida.readthedocs.io/projects/aiida-core/en/stable/installation/index.html) for more information on how to
set up a fully functional high-performance profile.

For setting up our profile, we just need to run the following notebook cell:

In [None]:
!/apps/share64/debian10/anaconda/anaconda-7/envs/AIIDA/bin/verdi presto --profile-name euro-scipy-2024

Now that we have created a profile, we will load the AiiDA Jupyter extension for convenience. This allows us
to use the `%verdi` Jupyter magic commands, rather than having to run them in a subshell with the full, absolute
path to the `verdi` executable, as done in the cell above.

In addition, this makes the `%aiida` jupyter magic command available that, when executed, will automatically load the
previously created `euro-scipy-2024` default profile. Alternatively, a specific profile can also be loaded as follows:
```python
from aiida import load_profile
load_profile('euro-scipy-2024')
```
which is the typical way to load a profile and what you will see in most code snippets on the internet.

In [6]:
%load_ext aiida
%aiida

The aiida extension is already loaded. To reload it, use:
  %reload_ext aiida


Now, we set some configuration options for our profile:

In [7]:
%verdi config set warnings.development_version false
%verdi config set warnings.showdeprecations false



And verify that the profile was created successfully via:

In [8]:
%verdi status

[32m[22m ✔ [0m[22mversion:     AiiDA v2.6.1[0m
[32m[22m ✔ [0m[22mconfig:      /home/geiger_j/aiida_projects/fair-workflows-workshop/.aiida[0m
[32m[22m ✔ [0m[22mprofile:     euro-scipy-2024[0m
[32m[22m ✔ [0m[22mstorage:     SqliteDosStorage[/home/geiger_j/aiida_projects/fair-workflows-workshop/.aiida/repository/sqlite_dos_19cf4b6c9a8a4e31bd2902ba52fc3e86]: open,[0m
[32m[22m ✔ [0m[22mbroker:      RabbitMQ v3.9.13 @ amqp://guest:guest@127.0.0.1:5672?heartbeat=600[0m


  "cipher": algorithms.TripleDES,
  "class": algorithms.TripleDES,


[33m[22m ⏺ [0m[22mdaemon:      The daemon is not running.[0m


The output should look something like:

```shell
 ✔ version:     AiiDA v2.6.2
 ✔ config:      /home/nanohub/<your-user>/.aiida
 ✔ profile:     euro-scipy-2024
 ✔ storage:     SqliteDosStorage[/home/nanohub/<your-user>/.aiida/repository/sqlite_dos_b25c3582f65647beb068a3e50636a274]: open,
 ⏺ broker:      No broker defined for this profile: certain functionality not available. See https://aiida-core.readthedocs.io/en/stable/installation/guide_quick.html#quick-install-limitations
 ⏺ daemon:      No broker defined for this profile: daemon is not available. See {URL_NO_BROKER}
```

### `Computer`s and `Code`s

The `verdi presto` command used to create the AiiDA profile automatically sets up your local workstation as
the `localhost` computer. This will suffice for the sake of the tutorial.

To set up additional `Computer`s in the future, e.g. remote HPC resources, they will need to be registered
in AiiDA, providing the necessary SSH and scheduler options. For further information, we refer to the [relevant section of the documentation](https://aiida.readthedocs.io/projects/aiida-core/en/stable/howto/run_codes.html#how-to-set-up-a-computer).

In a similar manner, executables must be registered in AiiDA, where they are represented as instances of `Code` classes.
We will see how we can do this in the following cells where we will set up a multi-step workflow in AiiDA.

***
## Concatenating several scripts to one workflow

### Before we get started

The two tools we will present here for the construction of multi-step workflows through AiiDA are
[`aiida-shell`](https://github.com/sphuber/aiida-shell/) and
[`aiida-workgraph`](https://github.com/aiidateam/aiida-workgraph/).

Both of these tools are external AiiDA plugins and need to be installed separately. So let's do just that:

(TODO: Pin version numbers?)

In [None]:
!/apps/share64/debian10/anaconda/anaconda-7/envs/AIIDA/bin/python -m pip install aiida-shell==0.7.3
!/apps/share64/debian10/anaconda/anaconda-7/envs/AIIDA/bin/python -m pip install aiida-workgraph
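
After the installation, it can be useful to confirm which versions ended up in the environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical, and the check is only meaningful if it runs in the same environment (kernel) that `pip` installed into.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of ``package_name``, or ``None`` if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the versions of the packages used in this tutorial.
for name in ('aiida-core', 'aiida-shell', 'aiida-workgraph'):
    print(f'{name}: {installed_version(name) or "not installed"}')
```

The reported version numbers can then be used to pin the packages in the `pip install` commands, as is already done for `aiida-shell`.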

Neither of these tools replaces `aiida-core`. Instead, they provide simplified entry points for workflow creation in
AiiDA:

<br>

<img src="../../data/aiida-core-shell-workgraph.jpg" width="800" style="height:auto; display:block; margin-left:auto;
margin-right:auto;">

For more in-depth information on how to write AiiDA workflows in the *classical* way, that is, by writing a custom
`WorkChain` class, we refer the interested reader to the [relevant documentation
section](https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/write_workflows.html), as well as material from [past AiiDA virtual tutorials](https://aiida-tutorials.readthedocs.io/en/latest/sections/writing_workflows/index.html).

Lastly, it is important to note that, while the `aiida-shell` API has been quite stable for a while,
`aiida-workgraph` is still very much under active development. So any feedback you might have during this tutorial will
be very valuable to us!

### The workflow setup

Assume we would like to execute a workflow that is composed of the following steps:

1. Create a database that contains some matrices
2. Run a code that diagonalizes the matrices and writes the eigenvalues and eigenvectors to files on disk
3. Postprocessing, e.g. clean up any intermediate files (JG: not too happy about that, possibly we could do plotting or something similar)
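
To make the data flow between the three steps concrete, here is a toy stand-in written in plain Python. It is not the code of the tutorial's executables: the in-memory SQLite table, the JSON encoding, and all function names are assumptions chosen for illustration, and the diagonalization is restricted to symmetric 2x2 matrices so that it needs no external libraries.

```python
import json
import math
import sqlite3


def create_database(matrices):
    """Step 1: store each matrix as a JSON string in an in-memory SQLite table."""
    connection = sqlite3.connect(':memory:')
    connection.execute('CREATE TABLE matrices (pk INTEGER PRIMARY KEY, data TEXT)')
    connection.executemany(
        'INSERT INTO matrices (pk, data) VALUES (?, ?)',
        [(pk, json.dumps(matrix)) for pk, matrix in enumerate(matrices)],
    )
    return connection


def query_matrix(connection, pk):
    """Fetch the matrix with primary key ``pk`` from the database."""
    row = connection.execute('SELECT data FROM matrices WHERE pk = ?', (pk,)).fetchone()
    return json.loads(row[0])


def eigenvalues_symmetric_2x2(matrix):
    """Step 2: closed-form eigenvalues of a symmetric matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = matrix
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return [mean - radius, mean + radius]


connection = create_database([[[2.0, 1.0], [1.0, 2.0]]])
eigenvalues = eigenvalues_symmetric_2x2(query_matrix(connection, 0))
connection.close()  # Step 3: clean up the intermediate resource
print(eigenvalues)
```

Each function consumes the output of the previous one. This is the same dependency structure that the AiiDA workflow expresses when it passes the output node of one job as an input to the next.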

Naturally, for this tutorial, the example serves mainly demonstration purposes. However, to motivate our choices of
tasks, one could imagine the following concrete use cases:

1. Query and download atomic structures from a materials database via its API
2. Run structure optimizations using quantum-mechanical codes (these, like many other numerical codes, perform matrix diagonalizations)
3. Raw output files from the previous steps might be too large (terabytes) to be retrieved, so postprocessing and
   cleanup might be necessary

Note that AiiDA was originally created for materials science applications, so we are aware that the examples reflect
that. If you think of other use cases, feel free to implement them after this tutorial and let us know about it :star: 

Each of these steps can be of arbitrary nature, e.g. an executable on your system, a shell script, Python code, etc.

We provide the executables for the example workflow outlined above as pre-compiled binaries. Their source code is not
essential for this tutorial, but if you are interested, you can find it under the `data` directory.

As mentioned above, to run executables through AiiDA, we must first register them. We can do that through the Python
API:

In [9]:
from pathlib import Path

from aiida import orm, engine
from aiida.common.exceptions import NotExistent

In [15]:
query_code_label = 'remote_query'
query_code_path = str(Path('../../data/euro-scipy-2024/diag-wf/remote_query.py').resolve())

try:
    query_code = orm.load_code(f'{query_code_label}@localhost')  # The computer label can also be omitted here
    print(f"Loaded {query_code_label}")
except NotExistent:
    query_code = orm.InstalledCode(
        computer=orm.load_computer('localhost'),
        filepath_executable=query_code_path,
        label=query_code_label,
        description='Python code to query a remote resource and obtain matrix data.',
        default_calc_job_plugin='core.shell',
        prepend_text='export OMP_NUM_THREADS=1',
        append_text='',
        use_double_quotes=False,
        with_mpi=False
    ).store()
    print(f"Created and stored {query_code_label}")

Loaded remote_query


In [16]:
diag_code_label = 'diagonalization'
diag_code_path = str(Path('../../data/euro-scipy-2024/diag-wf/diager-rs/x86_64-unknown-linux-gnu/diager-rs').resolve())

try:
    diag_code = orm.load_code(f'{diag_code_label}@localhost')  # The computer label can also be omitted here
    print(f"Loaded {diag_code_label}")
except NotExistent:
    diag_code = orm.InstalledCode(
        computer=orm.load_computer('localhost'),
        filepath_executable=diag_code_path,
        label=diag_code_label,
        description='',
        default_calc_job_plugin='core.shell',
        prepend_text='export OMP_NUM_THREADS=1',
        append_text='',
        use_double_quotes=False,
        with_mpi=False
    ).store()
    print(f"Created and stored {diag_code_label}")

Loaded diagonalization
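
The two cells above follow the same load-or-create pattern, so they could be folded into a single helper. The following is a sketch only: the function name `load_or_create_code` and its signature are hypothetical, and it passes only a subset of the `InstalledCode` arguments used above (for example, it omits `prepend_text`).

```python
def load_or_create_code(label, executable_path, description='', computer_label='localhost'):
    """Load the code ``label@computer_label``, creating and storing it if it does not exist yet."""
    # Imported inside the function so that defining the helper does not require AiiDA itself.
    from aiida import orm
    from aiida.common.exceptions import NotExistent

    try:
        # Reuse the code if it was already registered in this profile.
        return orm.load_code(f'{label}@{computer_label}')
    except NotExistent:
        # Otherwise register the executable on the given computer and store it.
        return orm.InstalledCode(
            computer=orm.load_computer(computer_label),
            filepath_executable=str(executable_path),
            label=label,
            description=description,
            default_calc_job_plugin='core.shell',
        ).store()
```

With a profile loaded, `query_code = load_or_create_code(query_code_label, query_code_path)` would then stand in for the first of the two cells.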


TODO: Add section on other ways to set up `Code`s from FAIR workflows notebook

In [None]:
from aiida_shell import launch_shell_job

db_path = str(Path('../../data/euro-scipy-2024/diag-wf/remote/matrices.db').resolve())

# ? 1. Query a remote database for data

matrix_pk = 0

query_results, query_node = launch_shell_job(
    query_code,
    arguments=f'{db_path} {matrix_pk}',
    outputs=[f'{matrix_pk}.npy']
)

# ? 2. Diagonalize 

diag_results, diag_node = launch_shell_job(
    diag_code,
    arguments='{matrix_file}',
    nodes={
        'matrix_file': query_results[f'aiida_shell_{matrix_pk}_npy']
    },
    outputs=[f'{matrix_pk}-eigenvals.txt']
)

# ? 3. Plotting of the script

# plot_results, plot_node = launch_shell_job(
#     plot_script,
#     arguments='{pap_csv}',
#     nodes={
#         'pap_csv': count_results['pain_and_pleasure_csv']
#     },
#     outputs = ['pain_and_pleasure.png']
# )
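
Note that the output file `0.npy` is accessed above under the key `aiida_shell_0_npy`, not under its filename. The sketch below reproduces the naming rule that this key suggests: characters that are not valid in a label become underscores, and a label that would start with a digit gets the prefix `aiida_shell_`. The function name `guess_link_label` is hypothetical and the rule is inferred from the example above, so the exact behavior of `aiida-shell` may differ; when in doubt, inspect `query_results.keys()`.

```python
import re


def guess_link_label(filename):
    """Approximate the output key that seems to be derived from ``filename``."""
    # Replace each run of characters other than letters, digits and underscores by one underscore.
    label = re.sub(r'[^0-9a-zA-Z_]+', '_', filename)
    # Merge consecutive underscores into a single one.
    label = re.sub(r'_+', '_', label)
    # A label may not start with a digit, so add a prefix in that case.
    if label[:1].isdigit():
        label = f'aiida_shell_{label}'
    return label


print(guess_link_label('0.npy'))
print(guess_link_label('0-eigenvals.txt'))
```

Under this assumed rule, the eigenvalue file of the second job would be found under `diag_results['aiida_shell_0_eigenvals_txt']`.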

## Transitioning from running locally to submitting remote jobs

## Restart from the last checkpoint and caching functionality

## Parsing and querying your results