## Using the parsing functions in the `schema_dict_util` module

The output parser uses a set of functions to perform the actual parsing. These functions are also very useful outside the output parser.

In [None]:
import masci_tools.util.schema_dict_util

This module contains many small functions that make use of the schema dictionary. For example, the `get_tag_xpath` and `get_attrib_xpath` functions from tutorial 3 are defined here. There is also `read_constants`, which determines the mathematical constants that can be used in the input file. The remaining functions use `get_tag_xpath` or `get_attrib_xpath` to give easy access to parsing small parts of the XML files.

In [None]:
help(masci_tools.util.schema_dict_util)

To use these functions, we need the root of the XML file and the matching `schema_dict`. Functions that actually convert attribute values also need the defined constants (in addition to the constants defined in the file, there is a default set).

In [None]:
from lxml import etree
from masci_tools.io.parsers.fleur.fleur_schema import load_inpschema
from masci_tools.util.schema_dict_util import read_constants
root = etree.parse('./files/Fe_Example_input.xml').getroot()
schema_dict = load_inpschema(root.xpath('//@fleurInputVersion')[0])
constants = read_constants(root, schema_dict)
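
To illustrate why the constants are needed when attribute values are converted, here is a minimal sketch in plain Python. It is not the implementation used in `masci_tools`: the helper `convert_value`, the dictionary `example_constants` and its entries are assumptions made up for this example. The idea is only that a string such as `2*Pi` can be turned into a number once the names appearing in it are known.

```python
import ast
import math
import operator

# Assumed constants for illustration only; the real default set and the
# constants defined in an input file may differ.
example_constants = {'Pi': math.pi}

# Arithmetic operations this sketch is willing to evaluate
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def convert_value(text, constants):
    """Convert an attribute string such as '2*Pi' into a float (hypothetical helper)."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            # Names are looked up in the constants; unknown names raise KeyError
            return constants[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f'Unsupported expression: {text!r}')

    return _eval(ast.parse(text.strip(), mode='eval').body)


print(convert_value('2*Pi', example_constants))
print(convert_value('0.5', example_constants))
```

Without an entry for `Pi` in the dictionary, the first call would fail with a `KeyError`, which mirrors why the parsing functions are handed the constants explicitly.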

All parsing functions have the same interface. Let's start with `tag_exists`, which tells us whether a certain tag is present in the input file.

In [None]:
from masci_tools.util.schema_dict_util import tag_exists
print(tag_exists(root, schema_dict, 'filmPos'))
print(tag_exists(root, schema_dict, 'relPos'))

These functions also take the same arguments as `get_tag_xpath` for specifying the concrete path, such as `contains`.

In [None]:
from masci_tools.util.schema_dict_util import get_number_of_nodes
print(get_number_of_nodes(root, schema_dict, 'ldaU'))

In [None]:
print(get_number_of_nodes(root, schema_dict, 'ldaU', contains='species'))
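
The effect of `contains` can be pictured with a small sketch in plain Python. The candidate paths and the helper `select_path` below are hypothetical and only mimic the idea of narrowing several possible paths for a tag name down to one; the schema dictionary stores this information in its own format, and the real selection logic may differ.

```python
# Assumed candidate paths for one tag name, made up for this illustration
CANDIDATE_PATHS = [
    '/fleurInput/atomSpecies/species/ldaU',
    '/fleurInput/atomGroups/atomGroup/ldaU',
]


def select_path(paths, contains=None):
    """Return the single path left after filtering (hypothetical helper)."""
    selected = list(paths)
    if contains is not None:
        # Keep only the paths that include the given phrase
        selected = [path for path in selected if contains in path]
    if len(selected) != 1:
        raise ValueError(f'Expected exactly one path, found {len(selected)}: {selected}')
    return selected[0]


selected = select_path(CANDIDATE_PATHS, contains='species')
print(selected)
```

Calling `select_path(CANDIDATE_PATHS)` without `contains` raises a `ValueError` in this sketch, because both paths remain possible.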

The function `evaluate_attribute` also accepts a `tag_name` argument to directly specify the tag on which the attribute should be parsed. This makes it much easier to specify attributes with common names (like `units` in the output file, for example).

In [None]:
from masci_tools.util.schema_dict_util import evaluate_attribute
print(evaluate_attribute(root, schema_dict, 'name', constants=constants, tag_name='species'))

Another way to specify which path should be parsed is to pass an element of the XML tree that is not the root. For example, if we want to parse the `mtSphere` tag for the atom species, the naive approach throws an error, since all tags in the atoms section can occur in `species` or `atomGroup`.

In [None]:
from masci_tools.util.schema_dict_util import evaluate_tag
print(evaluate_tag(root, schema_dict, 'mtSphere', constants=constants))

One easy way to circumvent this is to first get the `species` element via the `eval_simple_xpath` function and then perform the same call as before on the `species` element.
If any of these parsing functions receives an element different from the root element, it constrains the path to contain the tag of that element. The absolute path is then converted into a relative path starting at the given element.

The alternative is to add `contains='species'` to every call, but in sections with many calls for species attributes, passing the element is definitely less cumbersome.

In [None]:
from masci_tools.util.schema_dict_util import evaluate_tag, eval_simple_xpath
species_elem = eval_simple_xpath(root, schema_dict, 'species')
print(evaluate_tag(species_elem, schema_dict, 'mtSphere', constants=constants))
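
The conversion from an absolute to a relative path described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `masci_tools`: the XML snippet, its values, the candidate paths and the helper `relative_path_for` are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document; structure and values are invented for illustration
EXAMPLE_XML = """
<fleurInput>
  <atomSpecies>
    <species name="Fe-1"><mtSphere radius="2.20"/></species>
  </atomSpecies>
  <atomGroups>
    <atomGroup species="Fe-1"><mtSphere radius="2.50"/></atomGroup>
  </atomGroups>
</fleurInput>
""".strip()

# Assumed absolute paths under which the tag can occur
CANDIDATE_PATHS = [
    '/fleurInput/atomSpecies/species/mtSphere',
    '/fleurInput/atomGroups/atomGroup/mtSphere',
]


def relative_path_for(element, paths):
    """Pick the path passing through the element's tag and make it relative (hypothetical helper)."""
    segment = f'/{element.tag}/'
    matching = [path for path in paths if segment in path]
    if len(matching) != 1:
        raise ValueError(f'Expected exactly one path, found {len(matching)}: {matching}')
    # Keep only the part of the path below the given element
    return './' + matching[0].split(segment, 1)[1]


example_root = ET.fromstring(EXAMPLE_XML)
example_species = example_root.find('./atomSpecies/species')

rel_path = relative_path_for(example_species, CANDIDATE_PATHS)
radius = example_species.find(rel_path).get('radius')
print(rel_path)
print(radius)
```

Because the relative path is evaluated from the `species` element, the `mtSphere` under `atomGroup` is never considered, which is why the ambiguity from the naive call disappears.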