Error handling

Bonan Zhu edited this page Nov 26, 2021 · 6 revisions

The error handling implemented by @zhubonan is great. His work is summarised at https://github.com/aiida-vasp/aiida-vasp/pull/473. The key piece of information is the run_status returned by the file parsers. At present, the definition of run_status does not appear to be centralized.

How it works

VaspParser

  1. If the misc node is going to be attached to the outputs, run_status is registered for collection because it is in the list NODES['misc']['quantities'] (ParserSettings). Since the misc node must always be attached by our definition, run_status is always collected.
  2. Following quantity_names_to_parse, each particular parser (e.g., vasprun.py) collects values from its own attributes. Most raw VASP values are stored in the particular parser class instance and are accessible via its attributes. Those raw values are parsed by the corresponding parsevasp class. DEFAULT_OPTIONS['quantities_to_parse'] in the particular parser module (i.e., the .py file) does not seem to be used in general.
  3. Because of (2), each particular parser (e.g., vasprun.py) returns the run_status dict as an attribute (via get_quantity(quantity_key)).
  4. The run_status dict is put into the unstored misc Dict node (by node_composer.get_node_composer_inputs and self.out) because run_status is in the list NODES['misc']['quantities'].
  5. The misc node is put into the self._outputs dict of the Parser.
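
The steps above can be sketched in plain Python, without AiiDA. This is only an illustration under stated assumptions: FakeVasprunParser, compose_misc, the total_energies quantity, and the keys inside run_status are hypothetical stand-ins, not the aiida-vasp API. Only the NODES['misc']['quantities'] lookup and the get_quantity name are taken from the description above.

```python
# AiiDA-free sketch of how run_status travels from a file parser into misc.
# Every class, function, and dict key here is a hypothetical stand-in.

# Quantities that each output node is composed from (step 1).
NODES = {'misc': {'quantities': ['total_energies', 'run_status']}}


class FakeVasprunParser:
    """Stand-in for a particular parser such as vasprun.py."""

    def __init__(self, finished, electronic_converged):
        self._finished = finished
        self._electronic_converged = electronic_converged

    @property
    def total_energies(self):
        # Illustrative value only.
        return {'energy_extrapolated': -10.5}

    @property
    def run_status(self):
        # The keys are assumptions about what a run_status dict may hold.
        return {
            'finished': self._finished,
            'electronic_converged': self._electronic_converged,
        }

    def get_quantity(self, quantity_key):
        # Steps 2 and 3: quantities are read from the parser's own attributes.
        return getattr(self, quantity_key, None)


def compose_misc(file_parser, node_name='misc'):
    # Step 4: collect every quantity registered for the node into one dict.
    quantities = NODES[node_name]['quantities']
    return {key: file_parser.get_quantity(key) for key in quantities}


# Step 5: the composed node is kept in the parser's outputs.
outputs = {'misc': compose_misc(FakeVasprunParser(True, False))}
print(outputs['misc']['run_status'])
```

Because run_status is listed under the misc node, it is collected whenever misc is composed, which is the reason it is always available downstream.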

VaspCalculation

  1. AiiDA invokes the Parser.parse method through the CalcJob.parse method.
  2. CalcJob attaches the nodes returned from the Parser to its output ports. Therefore, misc is stored here.
  3. Along with (2), Parser.parse returns an exit code to CalcJob, and CalcJob.parse then returns an exit code to AiiDA. Note that the exit codes of the Parser are the same as those defined in the particular CalcJob, i.e., mostly VaspCalculation for us.
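
A minimal sketch of this hand-off, again without AiiDA, might look as follows. The classes FakeParser and FakeCalcJob, the local ExitCode tuple, the exit code name ERROR_NOT_FINISHED, and the number 700 are all hypothetical; they only mimic the relationship described above, in which the parser reuses the exit codes defined on the calculation class.

```python
from typing import NamedTuple, Optional


class ExitCode(NamedTuple):
    """Local stand-in for an exit code (status plus message)."""
    status: int = 0
    message: Optional[str] = None


class FakeParser:
    """Stand-in for a Parser; it borrows exit codes from its calculation."""

    def __init__(self, calc):
        self.exit_codes = calc.exit_codes
        self.outputs = {}

    def parse(self, run_status):
        # The misc output is produced whether or not the run succeeded.
        self.outputs['misc'] = {'run_status': run_status}
        if not run_status['finished']:
            return self.exit_codes['ERROR_NOT_FINISHED']
        return ExitCode(0)


class FakeCalcJob:
    """Stand-in for a CalcJob such as VaspCalculation."""

    # Hypothetical exit code; the name and number are made up for this sketch.
    exit_codes = {'ERROR_NOT_FINISHED': ExitCode(700, 'calculation did not finish')}

    def __init__(self):
        self.outputs = {}

    def parse(self, run_status):
        parser = FakeParser(self)
        exit_code = parser.parse(run_status)
        # Nodes returned by the parser are attached to the calculation.
        self.outputs.update(parser.outputs)
        # The parser's exit code is passed on to the caller.
        return exit_code


calc = FakeCalcJob()
result = calc.parse({'finished': False})
print(result.status, calc.outputs['misc'])
```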

VaspWorkChain

This is a subclass of BaseRestartWorkChain. The behaviour can be described as:

Restart the underlying calculation until it finishes with exit code [0].

A series of handlers is implemented to act upon zero and non-zero exit codes; they are run in descending order of their predefined priority.

First, the return code of the VaspCalculation is checked:

  • If the return code is zero:

    • Each handler is called in order, respecting any exit code filter. Some handlers may choose to run only for certain exit codes.
    • If no error is found, the workchain terminates with [0].
    • If any handler returns a ProcessHandlerReport, the calculation is restarted.
    • If any handler returns a ProcessHandlerReport with a non-zero ExitCode, the workchain is aborted.
  • If the return code is non-zero:

    • The rules are the same as above, except that:
    • If no error is found, e.g. all handlers pass, the workchain is restarted with identical inputs. After two successive restarts, the workchain aborts.

The table below summarises the behaviour:

| VaspCalculation result | Handler report? | Handler exit code | Action |
|---|---|---|---|
| Success | Yes | == 0 | Restart |
| Success | Yes | != 0 | Abort |
| Success | No | N/A | Terminate |
| Failed | Yes | == 0 | Restart |
| Failed | Yes | != 0 | Abort |
| Failed | No | N/A | Abort (after restarts with identical inputs) |
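
The decision logic in the table can be written down as a small, self-contained function. This is a sketch under assumptions, not the aiida-core implementation: HandlerReport is a local stand-in for a handler report, run_handlers and decide are hypothetical helpers, and the max_unhandled_restarts default of 2 is simply read off the description above (the real threshold is whatever BaseRestartWorkChain enforces).

```python
from typing import NamedTuple


class HandlerReport(NamedTuple):
    """Local stand-in for the report a handler may return."""
    do_break: bool = False  # stop calling lower-priority handlers
    exit_status: int = 0    # 0: problem handled, restart; non-zero: abort


def run_handlers(handlers, calc_exit_status):
    """Call (priority, function) pairs in descending priority.

    Returns the last report produced, or None if no handler reported.
    """
    last_report = None
    ordered = sorted(handlers, key=lambda item: item[0], reverse=True)
    for _, handler in ordered:
        report = handler(calc_exit_status)
        if report is not None:
            last_report = report
            if report.do_break:
                break
    return last_report


def decide(calc_exit_status, report, unhandled_restarts=0, max_unhandled_restarts=2):
    """Map (calculation result, handler report) to an action, as in the table."""
    if report is not None:
        return 'restart' if report.exit_status == 0 else 'abort'
    if calc_exit_status == 0:
        return 'terminate'
    # Failed and unhandled: retry with identical inputs a limited number of times.
    if unhandled_restarts < max_unhandled_restarts:
        return 'restart'
    return 'abort'


# Usage: one handler that only reacts to failures, one that never reports.
handlers = [
    (100, lambda status: None),
    (300, lambda status: HandlerReport(do_break=True) if status != 0 else None),
]
print(decide(1, run_handlers(handlers, 1)))  # failure was handled
print(decide(0, run_handlers(handlers, 0)))  # success, nothing reported
```

The last row of the table corresponds to the final branch: an unhandled failure ends in an abort, but only once the allowed retries with identical inputs are used up.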

The run_status output is critical, as it contains the diagnostic information that the workchain uses to decide what to do.

  1. run_status is obtained from the outputs.misc node of VaspCalculation.
  2. BaseRestartWorkChain.inspect_process initiates the process handlers, which analyze the results of VaspCalculation (mainly run_status) and decide whether the next iteration will be performed. See the details of the process_handler function in aiida-core.
  3. The outputs of VaspCalculation are exposed on VaspWorkChain if the calculation succeeded or handler_always_attach_outputs is enabled.
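
As a rough illustration of steps 1 and 3, the sketch below reads run_status out of a misc-like dict and decides whether outputs would be attached. The helpers diagnose and should_attach_outputs are hypothetical, and the run_status keys (finished, electronic_converged, ionic_converged) are assumptions about the dict's content rather than a documented schema.

```python
def diagnose(misc):
    """List the problems recorded in run_status (keys are assumed)."""
    run_status = misc.get('run_status', {})
    problems = []
    if not run_status.get('finished', False):
        problems.append('not finished')
    if not run_status.get('electronic_converged', True):
        problems.append('electronic convergence not reached')
    if not run_status.get('ionic_converged', True):
        problems.append('ionic convergence not reached')
    return problems


def should_attach_outputs(calc_exit_status, handler_overrides):
    """Step 3: attach on success, or always if the override is enabled."""
    always = handler_overrides.get('handler_always_attach_outputs', False)
    return calc_exit_status == 0 or always


misc = {'run_status': {'finished': True, 'electronic_converged': False}}
print(diagnose(misc))
print(should_attach_outputs(1, {'handler_always_attach_outputs': True}))
```

Keeping the diagnosis separate from the attach decision mirrors the text: handlers look at run_status to choose an action, while the attach rule depends only on the calculation result and the override.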

Handlers can be turned on or off using the handler_overrides input:

from aiida.orm import Dict

# builder is assumed to be a VaspWorkChain process builder created beforehand.
builder.handler_overrides = Dict(dict={
    "handler_always_attach_outputs": True,  # Attach the outputs even if the workchain is aborted
    "handler_electronic_conv": False,       # Do not handle electronic convergence failures
    "handler_ionic_conv_enhanced": True,    # Turn on enhanced handling - this handler is untested
})