# How to write error-resistant workflows

## Introduction
In this tutorial, we show how to implement error handling in a WorkGraph, so that a task that fails with a known exit code is automatically restarted with corrected inputs.


Load the AiiDA profile.

In [1]:
%load_ext aiida
from aiida import load_profile
load_profile()

Profile<uuid='57ccbf7d9e2b41b39edb2bfdaf725feb' name='default'>

## Normal WorkGraph
We will implement error handlers for the `ArithmeticAddCalculation`. We start by creating a normal WorkGraph without any error handler:

In [2]:
from aiida_workgraph import WorkGraph
from aiida import orm
from aiida.calculations.arithmetic.add import ArithmeticAddCalculation


wg = WorkGraph("normal_graph")
wg.tasks.new(ArithmeticAddCalculation, name="add1")

#------------------------- Submit the calculation -------------------
code = orm.load_code("add@localhost")
wg.submit(
    inputs={"add1": {"code": code, "x": orm.Int(1), "y": orm.Int(2)}},
    wait=True,
)
print("Task finished OK? ", wg.tasks["add1"].process.is_finished_ok)
print("Exit code: ", wg.tasks["add1"].process.exit_code)
print("Exit Message: ", wg.tasks["add1"].process.exit_message)


WorkGraph process created, PK: 78157
Task finished OK?  True
Exit code:  None
Exit Message:  None


## Error code

If the computed sum of the inputs `x` and `y` is negative, the `ArithmeticAddCalculation` fails with exit code 410. Let's reset the WorkGraph and modify the inputs so that the sum is negative:

In [3]:
wg.reset()
wg.submit(
    inputs={"add1": {"code": code, "x": orm.Int(1), "y": orm.Int(-6)}},
    wait=True,
)
print("Task finished OK? ", wg.tasks["add1"].process.is_finished_ok)
print("Exit code: ", wg.tasks["add1"].process.exit_code)
print("Exit Message: ", wg.tasks["add1"].process.exit_message)

WorkGraph process created, PK: 78165
Task finished OK?  False
Exit code:  ExitCode(status=410, message='The sum of the operands is a negative number.', invalidates_cache=False)
Exit Message:  The sum of the operands is a negative number.


We can confirm that the task failed by checking the process status:

In [4]:
%verdi process status 78165

WorkGraph<normal_graph><78165> Finished [302]
    └── ArithmeticAddCalculation<78167> Finished [410]


## Error handling

To register an error handler for a WorkGraph, define a function that takes `self` and `task_name` as its arguments, and attach it as one of the `error_handlers` of the WorkGraph.

You can specify the tasks and their exit codes that should trigger the error handler, as well as the maximum number of retries for a task:

```python
tasks={"add1": {"exit_codes": [410],
                "max_retries": 5}
      }
```

In [7]:
from aiida_workgraph import WorkGraph
from aiida import orm
from aiida.calculations.arithmetic.add import ArithmeticAddCalculation

def handle_negative_sum(self, task_name: str):
    """Handle the failure code 410 of the `ArithmeticAddCalculation`.
    Simply make the inputs positive by taking the absolute value.
    """
    self.report("Run error handler: handle_negative_sum.")
    # load the task from the WorkGraph engine
    task = self.get_task(task_name)
    # modify task inputs
    task.set({"x": orm.Int(abs(task.inputs["x"].value)),
              "y": orm.Int(abs(task.inputs["y"].value))})
    # update the task in the WorkGraph engine
    self.update_task(task)

wg = WorkGraph("normal_graph")
wg.tasks.new(ArithmeticAddCalculation, name="add1")
# register error handler
wg.attach_error_handler(handle_negative_sum, name="handle_negative_sum",
                           tasks={"add1": {"exit_codes": [410],
                                           "max_retries": 5}
                           })

#------------------------- Submit the calculation -------------------
wg.submit(
    inputs={"add1": {"code": code, "x": orm.Int(1), "y": orm.Int(-6)}},
    wait=True,
)
print("Task finished OK? ", wg.tasks["add1"].process.is_finished_ok)
print("Exit code: ", wg.tasks["add1"].process.exit_code)
print("Exit Message: ", wg.tasks["add1"].process.exit_message)


WorkGraph process created, PK: 78194
Task finished OK?  True
Exit code:  None
Exit Message:  None


The process status confirms that the task first fails with exit code 410. The WorkGraph then runs the error handler and restarts the task with the new inputs, and the task finishes successfully.

In [8]:
%verdi process status 78194

WorkGraph<normal_graph><78194> Finished [0]
    ├── ArithmeticAddCalculation<78195> Finished [410]
    └── ArithmeticAddCalculation<78201> Finished [0]
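
The retry behavior seen in this status tree can be modeled in plain Python, without AiiDA. The sketch below is only a toy under stated assumptions, not the WorkGraph engine: `run_add`, `make_positive`, and `run_with_retries` are hypothetical names, and the way `max_retries` is counted here (the handler is called at most `max_retries` times) is an assumption made for illustration.

```python
NEGATIVE_SUM = 410  # exit status reported for a negative sum in the tutorial


def run_add(x: int, y: int) -> int:
    """Toy stand-in for the calculation: 0 on success, 410 on a negative sum."""
    return NEGATIVE_SUM if x + y < 0 else 0


def make_positive(inputs: dict) -> dict:
    """Toy version of handle_negative_sum: take the absolute value of both inputs."""
    return {"x": abs(inputs["x"]), "y": abs(inputs["y"])}


def run_with_retries(inputs, handler, exit_codes=(NEGATIVE_SUM,), max_retries=5):
    """Run the toy task and call `handler` after each matching failure.

    Assumed semantics: stop as soon as the exit status is not in `exit_codes`
    or once the handler has been called `max_retries` times.
    """
    statuses = [run_add(**inputs)]
    retries = 0
    while statuses[-1] in exit_codes and retries < max_retries:
        inputs = handler(inputs)  # the handler returns corrected inputs
        statuses.append(run_add(**inputs))
        retries += 1
    return statuses


history = run_with_retries({"x": 1, "y": -6}, make_positive)
print(history)  # [410, 0]
```

The printed list mirrors the two `ArithmeticAddCalculation` entries above: one failed run with status 410, followed by one successful run.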


## Custom parameters for error handlers
You can also pass custom parameters to an error handler. For example, instead of taking the absolute value of the inputs, the handler below adds an increment to each input. The `increment` is a custom parameter of the error handler: the user can specify it when attaching the error handler to the WorkGraph, or update it during the execution of the WorkGraph.




In [15]:
def handle_negative_sum(self, task_name: str, increment: int = 1):
    """Handle the failure code 410 of the `ArithmeticAddCalculation`.
    Simply add an increment to the inputs.
    """
    self.report("Run error handler: handle_negative_sum.")
    # load the task from the WorkGraph engine
    task = self.get_task(task_name)
    # modify task inputs
    task.set({"x": orm.Int(task.inputs["x"].value + increment),
              "y": orm.Int(task.inputs["y"].value + increment)})
    # update the task in the WorkGraph engine
    self.update_task(task)


wg = WorkGraph("normal_graph")
wg.tasks.new(ArithmeticAddCalculation, name="add1")
# register error handler
wg.attach_error_handler(handle_negative_sum, name="handle_negative_sum",
                           tasks={"add1": {"exit_codes": [410],
                                           "max_retries": 5,
                                           "kwargs": {"increment": 1}}
                           })

#------------------------- Submit the calculation -------------------
wg.submit(
    inputs={"add1": {"code": code, "x": orm.Int(1), "y": orm.Int(-6)}},
    wait=True,
)
print("Task finished OK? ", wg.tasks["add1"].process.is_finished_ok)
print("Exit code: ", wg.tasks["add1"].process.exit_code)
print("Exit Message: ", wg.tasks["add1"].process.exit_message)

WorkGraph process created, PK: 78256
Task finished OK?  True
Exit code:  None
Exit Message:  None


Because each retry raises both inputs by `increment` (here 1), the sum grows by 2 per retry: from -5 to -3, then -1, and finally 1. The task therefore needs three retries before it finishes successfully:

In [16]:
%verdi process status 78256

WorkGraph<normal_graph><78256> Finished [0]
    ├── ArithmeticAddCalculation<78257> Finished [410]
    ├── ArithmeticAddCalculation<78263> Finished [410]
    ├── ArithmeticAddCalculation<78269> Finished [410]
    └── ArithmeticAddCalculation<78275> Finished [0]


You can increase the `increment` before submitting the WorkGraph:


In [19]:
wg.reset()
wg.error_handlers["handle_negative_sum"]["tasks"]["add1"]["kwargs"]["increment"] = 3
wg.submit(
    inputs={"add1": {"code": code, "x": orm.Int(1), "y": orm.Int(-6)}},
    wait=True,
)

WorkGraph process created, PK: 78302


<WorkChainNode: uuid: 79bf8f75-d38a-4fae-bb26-1691ccf402b8 (pk: 78302) (aiida_workgraph.engine.workgraph.WorkGraphEngine)>

In this case, the first retry already runs with `x = 4` and `y = -3`, whose sum is positive, so the task needs only one retry to finish successfully.

In [20]:
%verdi process status 78302

WorkGraph<normal_graph><78302> Finished [0]
    ├── ArithmeticAddCalculation<78303> Finished [410]
    └── ArithmeticAddCalculation<78309> Finished [0]
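
The retry counts in the two runs above follow from simple arithmetic, which can be checked without AiiDA. The helper below is a hypothetical illustration (`retries_needed` is not part of `aiida-workgraph`), and it assumes that the handler runs at most `max_retries` times.

```python
def retries_needed(x: int, y: int, increment: int, max_retries: int = 5):
    """Count how often the increment handler must run before x + y >= 0.

    Returns None if the sum is still negative after `max_retries` retries.
    """
    retries = 0
    while x + y < 0:
        if retries == max_retries:
            return None
        # Same update as the handler: add the increment to both inputs.
        x += increment
        y += increment
        retries += 1
    return retries


print(retries_needed(1, -6, increment=1))  # 3
print(retries_needed(1, -6, increment=3))  # 1
```

Each retry raises the sum by `2 * increment`, so a larger increment reaches a non-negative sum in fewer retries.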


## Compare to the `BaseRestartWorkChain`
AiiDA provides a `BaseRestartWorkChain` class that can be used to write workflows that can handle known failure modes of processes and calculations.

In [19]:
from aiida.engine import BaseRestartWorkChain
from aiida.plugins import CalculationFactory
from aiida import orm
from aiida.engine import while_
from aiida.engine import process_handler, ProcessHandlerReport

ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')

class ArithmeticAddBaseWorkChain(BaseRestartWorkChain):

    _process_class = ArithmeticAddCalculation


    @classmethod
    def define(cls, spec):
        """Define the process specification."""
        super().define(spec)
        spec.expose_inputs(ArithmeticAddCalculation, namespace='add')
        spec.expose_outputs(ArithmeticAddCalculation)
        spec.outline(
            cls.setup,
            while_(cls.should_run_process)(
                cls.run_process,
                cls.inspect_process,
            ),
            cls.results,
        )

    def setup(self):
        """Call the `setup` of the `BaseRestartWorkChain` and then create the inputs dictionary in `self.ctx.inputs`.

        This `self.ctx.inputs` dictionary will be used by the `BaseRestartWorkChain` to submit the process in the
        internal loop.
        """
        super().setup()
        self.ctx.inputs = self.exposed_inputs(ArithmeticAddCalculation, 'add')
    
    @process_handler
    def handle_negative_sum(self, node):
        """Check if the calculation failed with `ERROR_NEGATIVE_NUMBER`.

        If this is the case, simply make the inputs positive by taking the absolute value.

        :param node: the node of the subprocess that was run in the current iteration.
        :return: optional :class:`~aiida.engine.processes.workchains.utils.ProcessHandlerReport` instance to signal
            that a problem was detected and potentially handled.
        """
        if node.exit_status == ArithmeticAddCalculation.exit_codes.ERROR_NEGATIVE_NUMBER.status:
            self.ctx.inputs['x'] = orm.Int(abs(node.inputs.x.value))
            self.ctx.inputs['y'] = orm.Int(abs(node.inputs.y.value))
            return ProcessHandlerReport()


In the `BaseRestartWorkChain`, error handling is implemented for one specific process class (here `ArithmeticAddCalculation`, set via `_process_class`). In a WorkGraph, by contrast, error handling is more general: a handler can be applied to any task in the WorkGraph and can take custom parameters.
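
To illustrate what "any task, with custom parameters" means, the sketch below looks up handlers in a plain dictionary shaped like the `tasks` mapping used earlier in this tutorial. It is a toy under stated assumptions, not the engine's implementation: `find_handlers` and the second task `add2` are hypothetical, and the real lookup in `aiida-workgraph` may differ.

```python
def find_handlers(error_handlers: dict, task_name: str, exit_status: int):
    """Return (handler name, kwargs) pairs registered for this task and exit status."""
    matches = []
    for name, spec in error_handlers.items():
        task_spec = spec["tasks"].get(task_name)
        if task_spec and exit_status in task_spec["exit_codes"]:
            matches.append((name, task_spec.get("kwargs", {})))
    return matches


# One handler, registered for two tasks with different parameters.
# "add2" is a hypothetical second task, added only for illustration.
error_handlers = {
    "handle_negative_sum": {
        "tasks": {
            "add1": {"exit_codes": [410], "max_retries": 5, "kwargs": {"increment": 1}},
            "add2": {"exit_codes": [410], "max_retries": 2, "kwargs": {"increment": 3}},
        }
    }
}

print(find_handlers(error_handlers, "add1", 410))
print(find_handlers(error_handlers, "add2", 410))
print(find_handlers(error_handlers, "add1", 302))  # no handler registered for 302
```

The same handler function serves both tasks, while each task keeps its own exit codes, retry limit, and keyword arguments.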

## Summary
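Here we have shown how to implement error handling in a WorkGraph: attach a handler function to specific tasks and exit codes, limit the number of retries with `max_retries`, and pass custom parameters to the handler through `kwargs`.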