## Schema dictionaries

The input/output parsers are built on a set of functions that extract type, order, and other information from the `FleurInputSchema.xsd` and `FleurOutputSchema.xsd` files for different versions. The extracted information is stored in large dictionaries next to the schema files.

To load the information, we call the `load_inpschema` and `load_outschema` functions with the desired version string. Both work the same way, but two separate functions are needed because the output schema implicitly includes information from the input schema.

In [None]:
from masci_tools.io.parsers.fleur.fleur_schema import load_inpschema
load_inpschema?

schema_dict = load_inpschema('0.33')
print(schema_dict.keys())

If we also want a Python object for validating files against this schema, we pass the `schema_return` argument.

In [None]:
schema_dict, xmlschema = load_inpschema('0.33', schema_return=True)
print(type(xmlschema))

To get an explanation of the keys in the schema dictionary, we can pass the `show_help` argument.

In [None]:
schema_dict = load_inpschema('0.33', show_help=True)

As an example, take a look at `attrib_types`. Here every attribute is classified by the types its value can be converted to, starting from the string read from the XML file. If several types are possible, the conversion function starts with the first type and stops at the first successful conversion (`string` is always placed last).

In [None]:
from pprint import pprint
pprint(schema_dict['attrib_types'])
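
The fallback order described above can be sketched in a few lines. This is a minimal illustration and not the converter used by masci-tools: the function name `convert_attribute`, the type names, and the `T`/`F` switch convention are assumptions made for this example.

```python
def to_switch(text):
    """Convert 'T'/'F' to a bool (assumed convention for this sketch)."""
    mapping = {'T': True, 'F': False}
    stripped = text.strip()
    if stripped not in mapping:
        raise ValueError(f'not a switch: {text!r}')
    return mapping[stripped]


# Hypothetical converters, keyed by the type names used in this sketch
CONVERTERS = {'int': int, 'float': float, 'switch': to_switch, 'string': str}


def convert_attribute(text, possible_types):
    """Try each type in order and return the first successful conversion."""
    for type_name in possible_types:
        try:
            return CONVERTERS[type_name](text)
        except ValueError:
            continue  # this type did not fit, try the next one
    raise ValueError(f'could not convert {text!r} to any of {possible_types}')


print(convert_attribute('12', ['int', 'float', 'string']))
print(convert_attribute('1.5', ['int', 'float', 'string']))
print(convert_attribute('T', ['switch', 'string']))
print(convert_attribute('all', ['int', 'string']))
```

Since `str` accepts any input, keeping `string` last gives the more specific types a chance first.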

In `tag_paths`, every tag name is mapped to its possible simple xpaths through the input file.

In [None]:
pprint(schema_dict['tag_paths'])

There are multiple keys for attributes and text tags (`unique_attribs`, `unique_path_attribs` and `other_attribs`), which sort the attributes into three categories:

1. unique attributes can only occur once in the input file and there is only one possible path
2. unique path attributes can occur in multiple places, but each place allows only one occurrence (name clashes, for example `spinf`)
3. other attributes
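
One way to picture the three categories is a toy classifier over a mapping from attribute names to the places where they can appear. The data layout, the function name `classify_attributes` and the example paths are hypothetical and chosen only for illustration; the real dictionaries are generated from the schema files and may treat edge cases differently.

```python
def classify_attributes(occurrences):
    """Sort attribute names into the three categories.

    occurrences maps an attribute name to a list of (path, repeatable)
    tuples, where repeatable is True if the attribute can occur more
    than once at that path.
    """
    result = {'unique_attribs': {}, 'unique_path_attribs': {}, 'other_attribs': {}}
    for name, places in occurrences.items():
        paths = [path for path, _ in places]
        if any(repeatable for _, repeatable in places):
            # Simplification of this sketch: any repeatable location -> "other"
            result['other_attribs'][name] = paths
        elif len(paths) == 1:
            result['unique_attribs'][name] = paths[0]
        else:
            result['unique_path_attribs'][name] = paths
    return result


# Hypothetical example data, not taken from a real schema
example = {
    'kmax': [('/input/cutoffs/@kmax', False)],
    'spinf': [('/input/scf/@spinf', False), ('/input/relax/@spinf', False)],
    'element': [('/input/species/@element', True)],
}
categories = classify_attributes(example)
print(categories)
```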

In [None]:
pprint(schema_dict['unique_attribs'])

In [None]:
pprint(schema_dict['unique_path_attribs'])

These dictionaries are useful, but on their own they offer no way to request a path and be sure that the result is unique. For this there are the functions `get_tag_xpath` and `get_attrib_xpath`. They take the name of the tag or attribute in question, plus optional criteria for selecting the right path.

In [None]:
from masci_tools.util.schema_dict_util import get_tag_xpath
get_tag_xpath?

In [None]:
print(get_tag_xpath(schema_dict, 'bzIntegration'))

If the path is not unique, an error is raised and we have to be more specific with the selection.

In [None]:
print(get_tag_xpath(schema_dict, 'ldaU'))

In [None]:
print(get_tag_xpath(schema_dict, 'ldaU', contains='species'))

In [None]:
print(get_tag_xpath(schema_dict, 'ldaU', not_contains='atom'))

If no path fulfills the criteria, the function also raises an error.

In [None]:
print(get_tag_xpath(schema_dict, 'ldaU', contains='species', not_contains='atom'))
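
The selection behaviour shown in the cells above (filter by `contains` and `not_contains`, then insist on exactly one remaining path) can be sketched in plain Python. This illustrates the idea under simplifying assumptions and is not the implementation of `get_tag_xpath`; the helper name `select_unique_path` and the example paths are made up for this sketch.

```python
def select_unique_path(paths, contains=None, not_contains=None):
    """Return the single path matching the criteria, or raise ValueError."""
    remaining = list(paths)
    if contains is not None:
        remaining = [p for p in remaining if contains in p]
    if not_contains is not None:
        remaining = [p for p in remaining if not_contains not in p]
    if len(remaining) == 0:
        raise ValueError('no path fulfills the criteria')
    if len(remaining) > 1:
        raise ValueError(f'path is not unique: {remaining}')
    return remaining[0]


# Hypothetical paths for a tag that appears in two places
ldau_paths = ['/root/species/ldaU', '/root/group/ldaU']

print(select_unique_path(ldau_paths, contains='species'))
print(select_unique_path(ldau_paths, not_contains='group'))

# Both failure modes: too many candidates, and no candidate at all
errors = []
for kwargs in ({}, {'contains': 'species', 'not_contains': 'species'}):
    try:
        select_unique_path(ldau_paths, **kwargs)
    except ValueError as err:
        errors.append(str(err))
print(errors)
```

Raising in both the ambiguous and the empty case means the caller never silently receives a path it did not intend.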

These functions make it easy to support different file versions, as long as the tag names themselves do not change.

In [None]:
from masci_tools.util.schema_dict_util import get_attrib_xpath
schema_dict_max4 = load_inpschema('0.31')
print(get_attrib_xpath(schema_dict, 'valenceElectrons'))
print(get_attrib_xpath(schema_dict_max4, 'valenceElectrons'))
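
To make the version argument concrete, the sketch below resolves the same attribute name against two toy dictionaries standing in for two schema versions. The dictionary layout, the helpers `lookup_attrib_path` and `describe`, and the paths are simplified assumptions for illustration; real schema dictionaries hold far more information.

```python
# Toy stand-ins for the schema dictionaries of two file versions (hypothetical)
schema_old = {'attrib_paths': {'valenceElectrons': '/root/setup/bzIntegration/@valenceElectrons'}}
schema_new = {'attrib_paths': {'valenceElectrons': '/root/cell/bzIntegration/@valenceElectrons'}}


def lookup_attrib_path(schema, name):
    """Look up the path of an attribute in a toy schema dictionary."""
    try:
        return schema['attrib_paths'][name]
    except KeyError:
        raise KeyError(f'unknown attribute: {name}') from None


def describe(schema, name):
    """Split the attribute path into (tag path, attribute name).

    The caller only supplies the name; the path comes from the schema,
    so the same call works for both versions.
    """
    path = lookup_attrib_path(schema, name)
    tag_path, _, attrib = path.rpartition('/@')
    return tag_path, attrib


print(describe(schema_old, 'valenceElectrons'))
print(describe(schema_new, 'valenceElectrons'))
```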

More detailed information about the attributes and child tags that can occur on a given tag is found under the `tag_info` key. This part of the schema dictionary is indexed by simple xpath to avoid name clashes.

In [None]:
pprint(schema_dict['tag_info']['/fleurInput'])

In [None]:
pprint(schema_dict['tag_info']['/fleurInput/atomSpecies/species'])