# Building robust workflows with strong provenance

And we will do that using:
<img src="../../data/aiida-logo.png" width="500" style="height:auto; display:block; margin-left:auto; margin-right:auto;">

An open-source Python infrastructure to help researchers with:
- automating,
- managing,
- persisting,
- sharing, and
- reproducing
the complex workflows of modern computational science, together with all their associated data.

***
## Provenance: A robust solution for process management and data traceability

What is a process or a calculation, fundamentally?

Well, it's just a data transformation!

<img src="../../data/aiida-calculation-recipe.jpg" width="500" style="height:auto; display:block; margin-left:auto;
margin-right:auto;">

When a calculation is run through AiiDA, AiiDA stores:
- The data transformations or calculations
- The inputs and their metadata
- The outputs and their metadata
- Most crucially: the interconnections between them
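To make "inputs, outputs and their interconnections" concrete, here is a minimal pure-Python sketch of such a provenance record. It is a toy model, not AiiDA's API: the `ProvenanceGraph` class, its `store_data`, `run` and `trace_back` methods, and the `d0`/`c2`-style identifiers are hypothetical names chosen only for illustration.

```python
import itertools

# Toy model for illustration only; this is NOT AiiDA's API.


class ProvenanceGraph:
    """Records data nodes, calculation nodes and the links between them."""

    def __init__(self):
        self._ids = itertools.count()
        self.nodes = {}   # identifier -> ("data", value) or ("calc", name)
        self.links = []   # (source, target, label) edges of the graph

    def _new_id(self, prefix):
        return f"{prefix}{next(self._ids)}"

    def store_data(self, value):
        """Store a value as a data node and return its identifier."""
        uid = self._new_id("d")
        self.nodes[uid] = ("data", value)
        return uid

    def run(self, name, func, **inputs):
        """Run `func` on stored data nodes, recording the calculation and its links."""
        calc = self._new_id("c")
        self.nodes[calc] = ("calc", name)
        values = {}
        for label, uid in inputs.items():
            self.links.append((uid, calc, label))   # input link: data -> calculation
            values[label] = self.nodes[uid][1]
        out = self.store_data(func(**values))
        self.links.append((calc, out, "result"))     # create link: calculation -> data
        return out

    def trace_back(self, uid):
        """Return the identifiers of every node upstream of `uid`."""
        parents = {}
        for src, dst, _ in self.links:
            parents.setdefault(dst, []).append(src)
        seen, stack = set(), [uid]
        while stack:
            for parent in parents.get(stack.pop(), []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


g = ProvenanceGraph()
x = g.store_data(2)
y = g.store_data(3)
s = g.run("add", lambda a, b: a + b, a=x, b=y)
p = g.run("multiply", lambda a, b: a * b, a=s, b=y)
print(g.nodes[p])                # ('data', 15)
print(sorted(g.trace_back(p)))   # ['c2', 'c4', 'd0', 'd1', 'd3']
```

Tracing back from the final result recovers both calculations and every input that fed into them. In AiiDA itself this bookkeeping happens automatically whenever a process runs, and the resulting graph can be explored with the `QueryBuilder` or visualised with `verdi node graph generate`.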

While doing so, AiiDA builds a directed acyclic graph (DAG) of these nodes and links, which enables several important features:
- Once data is stored, it cannot be modified &rarr; **provenance**
- Data is queryable and can always be traced back &rarr; **reproducibility**
- Checkpointing allows for **continuation** (even if the computer is shut down)
- **Caching** prevents running the same calculation twice
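The immutability and caching ideas can likewise be sketched in a few lines. This is again a hypothetical toy (`CachingStore` is not an AiiDA class): each result is keyed by a SHA-256 hash of the process name plus its inputs and kept in a read-only mapping. It assumes that a given process name always corresponds to the same code and that inputs are JSON-serialisable; AiiDA's real caching compares hashes of stored process nodes and is considerably more careful than this.

```python
import hashlib
import json
from types import MappingProxyType

# Hypothetical toy, not AiiDA's caching implementation.


class CachingStore:
    """Stores read-only results keyed by a hash of (process name, inputs)."""

    def __init__(self):
        self._cache = {}
        self.executions = 0   # how often a function body actually ran

    @staticmethod
    def _hash(name, inputs):
        # Canonical JSON (sorted keys) so identical inputs always hash identically;
        # assumes the inputs are JSON-serialisable.
        payload = json.dumps({"process": name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, name, func, **inputs):
        key = self._hash(name, inputs)
        if key not in self._cache:
            self.executions += 1
            # MappingProxyType is a read-only view, so stored results cannot be edited
            self._cache[key] = MappingProxyType({"result": func(**inputs)})
        return self._cache[key]


store = CachingStore()
first = store.run("add", lambda a, b: a + b, a=1, b=2)
second = store.run("add", lambda a, b: a + b, a=1, b=2)   # cache hit, not re-run
print(first["result"], store.executions)                   # 3 1
try:
    first["result"] = 42
except TypeError:
    print("stored results are read-only")
```

The second call returns the already-stored result without executing the function again, and any attempt to overwrite a stored result fails.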

***
## Scalability, interoperability, and high-throughput performance

### Learning by example: The LUMI hero run

AiiDA was built with high-throughput computing and the upcoming exascale era in mind:

<img src="../../data/lumi-hero-run.jpg" width="500" style="height:auto; display:block; margin-left:auto;
margin-right:auto;">

The hero run:

- Utilized a full partition of LUMI-C: **1,500** nodes with **128** cores each (**192k** cores in total)
- **~15k** simulations (geometry optimizations of inorganic compounds) orchestrated with AiiDA in **13** hours runtime
- **~8k** issues handled on the fly
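As a quick sanity check on these figures (the throughput below is a simple average derived from the rounded counts above, not a number reported from the run itself):

$$
\begin{aligned}
1500~\text{nodes} \times 128~\text{cores/node} &= 192\,000~\text{cores} \\
\frac{\approx 15\,000~\text{simulations}}{13~\text{h}} &\approx 1\,150~\text{simulations/h} \approx 19~\text{simulations/min}
\end{aligned}
$$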

During all of this, **AiiDA runs on the local machine**. So no need to:

- Mirror your local environment to the HPC
- Ask the HPC admin to install software for you
- Risk getting banned from the HPC because a background process is continuously running

***
## The cogs and wheels behind AiiDA: Architecture and dependencies

To achieve such performance, AiiDA requires:
- The **RabbitMQ** message broker, which allows multiple background daemon workers to monitor processes in parallel
- The **PostgreSQL** database, where data *nodes* are stored and which supports concurrent write access by those daemon
  workers
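The division of labour between broker, workers and database can be mimicked with the Python standard library alone. In this hypothetical sketch (`run_daemon` is an illustrative name, not an AiiDA function), a `queue.Queue` plays the role of RabbitMQ, threads play the daemon workers, and a lock-protected dictionary stands in for PostgreSQL. None of AiiDA's persistence, checkpointing or multi-process machinery is modelled.

```python
import queue
import threading

# Hypothetical toy analogue of the broker / worker / database split; not AiiDA code.


def run_daemon(tasks, n_workers=4):
    """Distribute tasks to worker threads and collect their results."""
    broker = queue.Queue()       # stands in for the message broker
    database = {}                # stands in for the database
    db_lock = threading.Lock()   # serialises concurrent writes to the "database"

    def worker():
        while True:
            task = broker.get()
            if task is None:     # sentinel: shut this worker down
                broker.task_done()
                return
            name, func, args = task
            result = func(*args)
            with db_lock:        # several workers write results concurrently
                database[name] = result
            broker.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for task in tasks:
        broker.put(task)         # "submit" a process
    for _ in workers:
        broker.put(None)         # one sentinel per worker
    for w in workers:
        w.join()
    return database


tasks = [(f"square_{i}", lambda x: x * x, (i,)) for i in range(10)]
results = run_daemon(tasks)
print(results["square_3"], len(results))   # 9 10
```

In AiiDA the workers are separate background processes started with `verdi daemon start`, so they keep running after the terminal used to submit the work is closed.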

It further makes use of:
- An object-relational mapper (ORM), which maps entries in the database to the Python objects we will be working with
- A custom [disk-objectstore](https://aiida.readthedocs.io/projects/aiida-core/en/latest/internals/storage/repository.html#the-disk-object-store) file repository, where raw files are stored
  in an efficient, machine-readable manner, and can be *packed* for quick backup and export
- A custom [daemon](https://aiida.readthedocs.io/projects/aiida-core/en/latest/topics/daemon.html#daemon) that handles
  the execution and retrieval of simulations
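The core idea of a content-addressed file repository can be shown with a small toy store: each object's bytes are hashed with SHA-256, identical content is stored only once, and loose objects can later be packed into a single file. `ToyObjectStore` and its methods are hypothetical and only loosely inspired by the loose/packed distinction of the disk-objectstore; the real package persists objects on disk and is far more elaborate.

```python
import hashlib
import io

# Hypothetical toy, only loosely inspired by the disk-objectstore's
# loose/packed design; the real package stores objects on disk.


class ToyObjectStore:
    """Content-addressed store: every object is keyed by the SHA-256 of its bytes."""

    def __init__(self):
        self.loose = {}           # hash -> bytes for not-yet-packed objects
        self.pack = io.BytesIO()  # one in-memory "pack file" holding many objects
        self.index = {}           # hash -> (offset, length) inside the pack

    def add(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in self.index:        # identical content is kept only once
            self.loose.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        if key in self.loose:
            return self.loose[key]
        offset, length = self.index[key]
        return self.pack.getvalue()[offset:offset + length]

    def pack_all(self) -> None:
        """Append all loose objects to the pack, e.g. before a backup or export."""
        self.pack.seek(0, io.SEEK_END)
        for key, content in self.loose.items():
            self.index[key] = (self.pack.tell(), len(content))
            self.pack.write(content)
        self.loose.clear()


store = ToyObjectStore()
k1 = store.add(b"ATOMIC_POSITIONS crystal")
k2 = store.add(b"ATOMIC_POSITIONS crystal")   # same bytes -> same key, stored once
store.pack_all()
print(k1 == k2, len(store.index), len(store.loose))   # True 1 0
print(store.get(k1))
```

Because the key is derived from the content, duplicate files cost nothing extra, and packing turns many small files into one large file that is quicker to copy for backups.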