# Basic tutorial

The goal of this tutorial is to give you an idea of how AiiDA helps you in executing data-driven workflows.
At the end of this tutorial, you will know how to:

- Store data in the database and subsequently retrieve it.
- Decorate a Python function such that its inputs and outputs are automatically tracked.
- Run and monitor the status of processes.
- Explore and visualize the provenance graph.


In [None]:
%load_ext aiida
%aiida

## Data nodes

Before running any calculations, let's create and store a *data node*.
AiiDA implements data node types for the most common types of data (int, float, str, etc.), which you can extend with your own (composite) data node types if needed.
For this tutorial, we'll keep it very simple, and start by initializing an `Int` node and assigning it to the `x` variable:

In [None]:
from aiida import orm

x = orm.Int(2)

We can check the contents of the `x` variable like this:

In [None]:
x

Quite a bit of information on our freshly created node is returned:

- The data node is of the type `Int`
- The node has a *universally unique identifier* (**UUID**)
- The node is currently not stored in the database `(unstored)`
- The integer value of the node is `2`

Let's store the node in the database:

In [None]:
x.store()

As you can see, the data node has now been assigned a *primary key* (**PK**), a number that identifies the node in your database `(pk: 1)`.
The PK and UUID both reference the node with the only difference that the PK is unique *for your local database only*, whereas the UUID is a globally unique identifier and can therefore be used between *different* databases.
Use the PK only if you are working within a single database, i.e. in an interactive session, and use the UUID in all other cases.
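
The difference between the two identifiers can be illustrated with a small plain-Python sketch. It does not use AiiDA at all: `ToyDatabase` is a hypothetical stand-in, and the only assumption is that each database numbers its rows independently while UUIDs are drawn at random.

```python
import itertools
import uuid


class ToyDatabase:
    """Hypothetical stand-in for a database; not part of AiiDA."""

    def __init__(self):
        # Each database hands out its own sequence of primary keys.
        self._next_pk = itertools.count(1)
        self.rows = {}

    def store(self, value):
        pk = next(self._next_pk)
        # uuid4 draws a random 128-bit identifier, independent of any database.
        node_uuid = str(uuid.uuid4())
        self.rows[pk] = (node_uuid, value)
        return pk, node_uuid


db_a, db_b = ToyDatabase(), ToyDatabase()
pk_a, uuid_a = db_a.store(2)
pk_b, uuid_b = db_b.store(2)

print(pk_a == pk_b)      # the first PK is the same in both databases
print(uuid_a == uuid_b)  # the UUIDs differ
```

Both toy databases assign the same PK to their first row, which is why a PK alone cannot identify a node once data moves between databases.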

> **Note**
> 
> The PK numbers shown throughout this tutorial assume that you start from a completely empty database.
> It is possible that the nodes' PKs will be different for your database!
>
> The UUIDs are generated randomly and are, therefore, **guaranteed** to be different.


Next, let's use the `verdi` command line interface (CLI) to check the data node we have just created:

In [None]:
%verdi node show 1

Once again, we can see that the node is of type `Int`, along with its PK and UUID.
Besides this information, the `verdi node show` command also shows the (empty) `label` and `description`, as well as the time the node was created (`ctime`) and last modified (`mtime`).

> **Note**
> AiiDA already provides many standard data types, but you can also [create your own](https://aiida.readthedocs.io/projects/aiida-core/en/stable/topics/data_types.html#topics-data-types-plugin).


## Calculation functions

Once your data is stored in the database, it is ready to be used for some computational task.
For example, let's say you want to multiply two `Int` data nodes.
The following Python function:

```python
def multiply(x, y):
    return x * y
```

will give the desired result when applied to two `Int` nodes, but the calculation will not be stored in the provenance graph.
However, we can use a [Python decorator](https://docs.python.org/3/glossary.html#term-decorator) provided by AiiDA to automatically make it part of the provenance graph, as shown below:

In [None]:
from aiida import engine

@engine.calcfunction
def multiply(x, y):
    return x * y

This converts the `multiply` function into an AiiDA *calculation function*, the most basic execution unit in AiiDA.
Next, let's create a new `Int` data node and assign it to the variable `y`, and then run the `multiply` function with the `x` and `y` data nodes as inputs:

In [None]:
y = orm.Int(3)

Now it's time to multiply the two numbers!

In [None]:
multiply(x, y)

Success!
The `calcfunction`-decorated `multiply` function has multiplied the two `Int` data nodes and returned a new `Int` data node whose value is the product of the two input nodes.
Note that by executing the `multiply` function, all input and output nodes are automatically stored in the database:

In [None]:
y

We had not yet stored the data node assigned to the `y` variable, but by providing it as an input argument to the `multiply` function, it was automatically stored with PK = 2.
Similarly, the returned `Int` node with value 6 has been stored with PK = 4.
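
To get a feeling for how a decorator can record inputs and outputs, here is a minimal plain-Python sketch. It is not how AiiDA implements `calcfunction`: the `tracked` decorator, the `PROVENANCE_LOG` list and the `multiply_plain` function are all hypothetical names, and the "database" is just a list in memory.

```python
import functools

# Hypothetical in-memory log standing in for the provenance store.
PROVENANCE_LOG = []


def tracked(func):
    """Record the inputs and output of every call to ``func``."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Append one record per call: which function ran, with what, and what it returned.
        PROVENANCE_LOG.append(
            {'process': func.__name__, 'inputs': args, 'kwargs': kwargs, 'output': result}
        )
        return result

    return wrapper


@tracked
def multiply_plain(x, y):
    return x * y


product = multiply_plain(2, 3)
print(product)
print(PROVENANCE_LOG)
```

The function body stays untouched; all bookkeeping happens in the wrapper, which is the same separation of concerns that lets you add `@engine.calcfunction` to an existing function.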

Let's look for the process we have just run using the `verdi` CLI:

In [None]:
%verdi process list -a

We can see that our `multiply` calcfunction was created 1 minute ago, assigned the PK 3, and has `Finished`.

### Provenance graph
An AiiDA database does not only contain the results of your calculations, but also their inputs and each step that was executed to obtain them. All of this information is stored in the form of a directed acyclic graph (DAG).
Let's have a look at the provenance of this simple calculation.
The provenance graph can be automatically generated using the `verdi` CLI.
Let's generate the provenance graph for the `multiply` calculation function we have just run with PK = 3:

> **Note**
> Remember that the PK of the calculation function can be different for your database.

```console
$ verdi node graph generate 3
```

The command will write the provenance graph to a `.pdf` file.
Use your favorite PDF viewer to have a look.
It should look something like the graph shown below.

In [None]:
from aiida.tools.visualization import Graph
graph = Graph()
calc_node = orm.load_node(3)
graph.add_incoming(calc_node, annotate_links="both")
graph.add_outgoing(calc_node, annotate_links="both")
graph.graphviz

In the provenance graph, you can see different types of *nodes* represented by different shapes.
The green ellipses are `Data` nodes, and the rectangles represent *processes*, i.e. the calculations performed in your *workflow*.

The provenance graph allows us to not only see what data we have, but also how it was produced.
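
As a rough illustration of what "how it was produced" means in a directed acyclic graph, the following plain-Python sketch walks the links backwards from a result to everything it depends on. The `PARENTS` dictionary and the `ancestors` helper are hypothetical; the labels only mirror the PKs used in this tutorial, and the real graph lives in the AiiDA database.

```python
# Each key is a node; the value lists the nodes it was directly produced from.
# The labels mirror the PKs used in this tutorial but are only illustrative.
PARENTS = {
    'Int<1>': [],
    'Int<2>': [],
    'multiply<3>': ['Int<1>', 'Int<2>'],
    'Int<4>': ['multiply<3>'],
}


def ancestors(node, parents):
    """Return every node reachable by following links backwards from ``node``."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors('Int<4>', PARENTS)))
```

Starting from the product node, the traversal reaches the `multiply` process and both input nodes, which is the information the provenance graph displays.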

## CalcJobs
When running calculations that require an external code or run on a remote machine, a simple calculation function is no longer sufficient.
For this purpose, AiiDA provides the `CalcJob` process class.

To run a `CalcJob`, you need to set up two things: a `code` that is going to implement the desired calculation and a `computer` for the calculation to run on.

In the previous section, the `verdi presto` command automatically configured the local workstation as the `localhost` computer for you.

Now, let's set up the code we're going to use for the tutorial. The following command sets up a code with *label* `add` on the *computer* `localhost`, using the *plugin* `core.arithmetic.add`.


More details for how to [run external codes](https://aiida.readthedocs.io/projects/aiida-core/en/stable/howto/run_codes.html#how-to-run-codes) can be found in the AiiDA documentation.

In [None]:
%verdi code create core.code.installed --label add --computer=localhost --default-calc-job-plugin core.arithmetic.add --filepath-executable=/bin/bash -n

A typical real-world example of a computer is a remote supercomputing facility.
Codes can be anything from a Python script to powerful *ab initio* codes such as Quantum ESPRESSO or machine learning tools like TensorFlow.
Let's have a look at the codes that are available to us:

In [None]:
%verdi code list

You can see a single code `add@localhost`, with PK = 5, in the printed list.
This code allows us to add two integers together.
The `add@localhost` identifier indicates that the code with label `add` is run on the computer with label `localhost`.
To see more details about the computer, you can use the following `verdi` command:

In [None]:
%verdi computer show localhost

We can see that the *Work directory* has been set up as the `work` subdirectory of the current directory.
This is the directory in which the calculations running on the `localhost` computer will be executed.

> **Note**
> You may have noticed that the PK of the `localhost` computer is 1, same as the `Int` node we created at the start of this tutorial.
> This is because different entities, such as nodes, computers and groups, are stored in different tables of the database.
> So, the PKs for each entity type are unique for each database, but entities of different types can have the same PK within one database.
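
The per-table numbering can be reproduced with Python's built-in `sqlite3` module. This is only a sketch under the assumption of a two-table toy schema; the table and column names are invented here and do not match AiiDA's actual schema.

```python
import sqlite3

# In-memory database with one table per entity type (hypothetical schema).
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE node (pk INTEGER PRIMARY KEY, value INTEGER)')
connection.execute('CREATE TABLE computer (pk INTEGER PRIMARY KEY, label TEXT)')

# Each table assigns primary keys independently, starting from 1.
node_pk = connection.execute('INSERT INTO node (value) VALUES (2)').lastrowid
computer_pk = connection.execute("INSERT INTO computer (label) VALUES ('localhost')").lastrowid

print(node_pk, computer_pk)
connection.close()
```

The first row of each table receives the same key, so a PK is only meaningful together with the entity type it refers to.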

Let's now load the `add@localhost` code using its label:

In [None]:
code = orm.load_code(label='add')
code

Every code has a convenient tool for setting up the required input, called the *builder*.
It can be obtained by using the `get_builder` method:

In [None]:
builder = code.get_builder()
builder

Using the builder, you can easily set up the calculation by directly providing the input arguments.
Let's use the `Int` node that was created by our previous `calcfunction` as one of the inputs and a new node as the second input:

In [None]:
builder.x = orm.load_node(pk=4)
builder.y = orm.Int(5)
builder

If your nodes' PKs are different and you don't remember the PK of the output node from the previous calculation, check the provenance graph you generated earlier and use the UUID of the output node instead:

```ipython
In [3]: builder.x = orm.load_node(uuid='42541d38')
   ...: builder.y = orm.Int(5)
```

Note that you don't have to provide the entire UUID to load the node.
As long as the first part of the UUID is unique within your database, AiiDA will find the node you are looking for.
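
The idea behind partial UUID lookup can be sketched in plain Python as a prefix match that must be unambiguous. This is an illustration, not AiiDA's implementation: `resolve_partial_uuid` is a hypothetical helper, and the full UUIDs in `STORED_UUIDS` are invented for the example.

```python
# Invented UUIDs for illustration; in AiiDA these would come from the database.
STORED_UUIDS = [
    '42541d38-7a5c-4f1e-9b2d-3c6e8f0a1b2c',
    '42a0c9e1-1d2f-4b3a-8c4d-5e6f7a8b9c0d',
    '9f3b6c2a-0e1d-4c2b-a3f4-b5c6d7e8f9a0',
]


def resolve_partial_uuid(prefix, stored_uuids):
    """Return the single UUID starting with ``prefix`` or raise ``ValueError``."""
    matches = [candidate for candidate in stored_uuids if candidate.startswith(prefix)]
    if not matches:
        raise ValueError(f'no UUID starts with {prefix!r}')
    if len(matches) > 1:
        raise ValueError(f'{prefix!r} is ambiguous: {len(matches)} UUIDs match')
    return matches[0]


print(resolve_partial_uuid('42541d38', STORED_UUIDS))
```

A prefix such as `'42'` would match two entries in this toy list and therefore be rejected, which is why a longer prefix is the safer choice in a large database.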

> **Note**
> One nifty feature of the builder is the ability to use tab completion for the inputs.
> Try it out by typing `builder.` + `<TAB>` in the verdi shell.

To execute the `CalcJob`, we use the `run` function provided by the AiiDA engine, and wait for the process to complete:

In [None]:
engine.run(builder)

Besides the sum of the two `Int` nodes, the calculation job also returns two other outputs: one of type `RemoteData` and one of type `FolderData`.
See the [topics section on calculation jobs](https://aiida.readthedocs.io/projects/aiida-core/en/stable/topics/calculations/usage.html#calculation-jobs) for more details.
Now, once more check for *all* processes:

In [None]:
%verdi process list -a

You should now see two processes in the list.
One is the `multiply` calcfunction you ran earlier, the second is the `ArithmeticAddCalculation` CalcJob that you have just run.
Grab the PK of the `ArithmeticAddCalculation`, and generate the provenance graph.
The result should look like the graph shown below.

```console
$ verdi node graph generate 7
```

In [None]:
from aiida.tools.visualization import Graph
graph = Graph()
calc_node = orm.load_node(7)
graph.recurse_ancestors(calc_node, annotate_links="both")
graph.add_outgoing(calc_node, annotate_links="both")
graph.graphviz

You can see more details on any process, including its inputs and outputs, using the `verdi` CLI:

In [None]:
%verdi process show 7

## Workflows

So far we have executed each process manually.
AiiDA allows us to automate these steps by linking them together in a *workflow*, whose provenance is stored to ensure reproducibility.
For this tutorial we have prepared a basic `WorkChain` that is already implemented in `aiida-core`.
You can see the code below:

<details>
<summary>Click to show/hide code</summary>

```python
from aiida.engine import ToContext, WorkChain, calcfunction
from aiida.orm import AbstractCode, Int
from aiida.plugins.factories import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')


@calcfunction
def multiply(x, y):
    return x * y


class MultiplyAddWorkChain(WorkChain):
    """WorkChain to multiply two numbers and add a third, for testing and demonstration purposes."""

    @classmethod
    def define(cls, spec):
        """Specify inputs and outputs."""
        super().define(spec)
        spec.input('x', valid_type=Int)
        spec.input('y', valid_type=Int)
        spec.input('z', valid_type=Int)
        spec.input('code', valid_type=AbstractCode)
        spec.outline(
            cls.multiply,
            cls.add,
            cls.validate_result,
            cls.result,
        )
        spec.output('result', valid_type=Int)
        spec.exit_code(400, 'ERROR_NEGATIVE_NUMBER', message='The result is a negative number.')

    def multiply(self):
        """Multiply two integers."""
        self.ctx.product = multiply(self.inputs.x, self.inputs.y)

    def add(self):
        """Add two numbers using the `ArithmeticAddCalculation` calculation job plugin."""
        inputs = {'x': self.ctx.product, 'y': self.inputs.z, 'code': self.inputs.code}
        future = self.submit(ArithmeticAddCalculation, **inputs)

        return ToContext(addition=future)

    def validate_result(self):
        """Make sure the result is not negative."""
        result = self.ctx.addition.outputs.sum

        if result.value < 0:
            return self.exit_codes.ERROR_NEGATIVE_NUMBER

    def result(self):
        """Add the result to the outputs."""
        self.out('result', self.ctx.addition.outputs.sum)
```
</details>

First, we recognize the `multiply` function we have used earlier, decorated as a `calcfunction`.
The `define` class method specifies the inputs and outputs of the `WorkChain`, as well as the `outline`, which lists the steps of the workflow in the order they are executed.
These steps are provided as methods of the `MultiplyAddWorkChain` class.
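
The way an outline drives the steps can be mimicked with a few lines of plain Python. The sketch below is a simplification and not the AiiDA engine: `run_outline` and the four step functions are hypothetical, the steps run synchronously, and a `SimpleNamespace` plays the role of the context.

```python
from types import SimpleNamespace


def run_outline(steps, ctx):
    """Run ``steps`` in order; stop early if a step returns a non-zero exit status."""
    for step in steps:
        exit_status = step(ctx)
        if exit_status:
            return exit_status
    return 0


def multiply_step(ctx):
    ctx.product = ctx.x * ctx.y


def add_step(ctx):
    ctx.total = ctx.product + ctx.z


def validate_step(ctx):
    # 400 mirrors the ERROR_NEGATIVE_NUMBER exit code defined above.
    if ctx.total < 0:
        return 400


def result_step(ctx):
    ctx.result = ctx.total


OUTLINE = [multiply_step, add_step, validate_step, result_step]

ok_ctx = SimpleNamespace(x=2, y=3, z=5)
print(run_outline(OUTLINE, ok_ctx), ok_ctx.result)

bad_ctx = SimpleNamespace(x=2, y=3, z=-10)
print(run_outline(OUTLINE, bad_ctx), hasattr(bad_ctx, 'result'))
```

With a negative sum, the validation step returns the exit status and the final step never runs, so no result is produced.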


Let's import and run the `MultiplyAddWorkChain`. Similar to a `CalcJob`, the `WorkChain` input can be set up using a builder:

In [None]:
from aiida.workflows.arithmetic.multiply_add import MultiplyAddWorkChain

from aiida import orm
builder = MultiplyAddWorkChain.get_builder()
builder.code = orm.load_code(label='add')
builder.x = orm.Int(2)
builder.y = orm.Int(3)
builder.z = orm.Int(5)
builder

Once the `WorkChain` input has been set up, we run it with the AiiDA engine:

In [None]:
from aiida import engine
engine.run(builder)

Now check the process list once more:

In [None]:
%verdi process list -a

We can see that the `MultiplyAddWorkChain` and its *child processes* are in the `Finished` state.

We can now generate the full provenance graph for the `WorkChain` with:

```console
$ verdi node graph generate 14
```

In [None]:
from aiida.tools.visualization import Graph
graph = Graph()
calc_node = orm.load_node(14)
graph.recurse_ancestors(calc_node, annotate_links="both")
graph.recurse_descendants(calc_node, annotate_links="both")
graph.graphviz

## Next steps

Congratulations! You have completed the first step to becoming an AiiDA expert.

To further enhance your skills, we have compiled several how-to guides tailored for key use cases:

- **Querying and Sharing Your Data**: Once you have run multiple computations, the [Managing Data](3-managing-data.ipynb) guide will show you how to efficiently explore and share your data.

- **Real-World Example**: The [Quantum Espresso](2-qe-pw-aiida.ipynb) guide demonstrates how to run a Quantum Espresso calculation using AiiDA, providing a practical example of its application.

- **Designing a Workflow**: Learn how to encode the logic of a typical scientific workflow with the [EOS Workflow](4-eos-workflow.ipynb) guide.

These resources will help you deepen your understanding and proficiency with AiiDA. Happy exploring!
