# ConfigReader
Configuration files are an important part of many applications. One of the most common types of configuration file is the `.ini` file. In Python, the most common way of reading these files is the `configparser` module. However, it provides only very basic functionality, and a lot of work is left to the application. For instance, by default all values are read in as strings. To convert them to a native Python type, one of the methods `getint`, `getboolean`, etc. has to be called. It also doesn't allow for any sub-sections in the file or for easy constant definitions.
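To make that default behaviour concrete, here is a small standard-library-only illustration. The section and key names are made up for this example.

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string("""
[detector]
width = 2
enabled = yes
""")

section = parser["detector"]
raw_width = section["width"]             # plain access always yields a string
width = section.getint("width")          # explicit conversion to int
enabled = section.getboolean("enabled")  # explicit conversion to bool

print(repr(raw_width), width, enabled)
```

Every value needs its own conversion call, and the application has to know in advance which type to ask for.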

I started to work on an alternative in my [BnsLib](https://github.com/MarlinSchaefer/BnsLib/blob/master/src/BnsLib/utils/config.py). One of the core features of this implementation was the ability to dynamically infer the type of the variable and allow for safe function calls from the config file. However, the implementation was done from the bottom up and is thus very error-prone. Although I haven't really seen many bugs, a higher-level implementation would be beneficial. Furthermore, having a full-fledged configuration file reader that also allows for sub-sections would have been nice.

One Sunday I had a bit of time and so tried my hand at a better, standalone implementation. This blog post describes the outcome and capabilities of the package. You can find and install it from [here](https://github.com/MarlinSchaefer/configreader).

## Value Parsing
I wanted to base my `ConfigReader` on the `ConfigParser` from the `configparser` module, to save me time writing my own file-interface and parser. This meant that I had to find a way to parse the raw strings into Python types. Since I also wanted basic function support this had to be done in a safe way, such that no code injection would be possible. While I cannot guarantee that I succeeded in my goal, I have taken a relatively conservative approach.

Instead of writing the parsing from the bottom up as I had done for my [BnsLib](https://github.com/MarlinSchaefer/BnsLib/blob/master/src/BnsLib/utils/config.py), I decided to use the `ast` (abstract syntax tree) module. To get off the ground with that library (and in general), I can recommend [this video](https://www.youtube.com/watch?v=OjPT15y2EpE) from the great YouTube channel [mCoding](https://www.youtube.com/channel/UCaiL2GDNpLYH6Wokkk1VNcg). The most important part was to understand the structure of the `ast.parse` output, which produces nested objects with different attributes, from which Python objects can be constructed. For every basic Python-functionality there is a corresponding `ast`-object.
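To get a feeling for that nested structure, it helps to dump the tree of a small expression. This uses only the standard library. Parsing with `mode="eval"` is my choice for this illustration and not necessarily what the package does internally.

```python
import ast

# Parse a single expression instead of a whole module.
tree = ast.parse("sin(pi / 4)", mode="eval")

# The indent argument requires Python 3.9 or newer.
print(ast.dump(tree, indent=2))

call = tree.body
print(type(call).__name__)          # the outermost node is a Call
print(call.func.id)                 # name of the called function
print(type(call.args[0]).__name__)  # the argument is a BinOp (pi / 4)
```

The `Call` node holds a `Name` node for the function and a `BinOp` node for the argument, which in turn holds a `Name` (`pi`) and a `Constant` (`4`).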

This nested structure is called a tree, and it can be walked. The idea is to recursively convert the objects in this tree into Python objects by calling the matching parsing function on each individual piece. The central piece of the class is the function

In [1]:
def parse_node(self, node):
    cls_name = node.__class__.__name__
    parse_name = f'parse_{cls_name}'
    if hasattr(self, parse_name):
        parse_func = getattr(self, parse_name)
    else:
        raise ValueError(f'Forbidden node: {cls_name}')
    return parse_func(node)

The different `ast` objects are called nodes in the code. To handle any node, we just have to define a method called `parse_[Node-name]` on the object. If that method is not defined, an error is raised. For instance, this blocks any string containing an `import` statement, as no function `parse_Import` is defined.
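The dispatch pattern can be tried out in isolation. The following is a minimal sketch and not the package's code: the class `TinyEvaluator`, its `operators` table, and the use of `mode="eval"` are assumptions made for this illustration.

```python
import ast
import operator


class TinyEvaluator:
    """Hypothetical stand-in that only shows the parse_<Node> dispatch."""

    # Whitelisted binary operators, keyed by their ast operator class.
    operators = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
    }

    def parse(self, text):
        # mode="eval" accepts a single expression only.
        tree = ast.parse(text, mode="eval")
        return self.parse_node(tree.body)

    def parse_node(self, node):
        # Look up a handler named after the node class; reject the rest.
        handler = getattr(self, f"parse_{type(node).__name__}", None)
        if handler is None:
            raise ValueError(f"Forbidden node: {type(node).__name__}")
        return handler(node)

    def parse_Constant(self, node):
        return node.value

    def parse_BinOp(self, node):
        op = self.operators.get(type(node.op))
        if op is None:
            raise ValueError(f"Forbidden operator: {type(node.op).__name__}")
        return op(self.parse_node(node.left), self.parse_node(node.right))


evaluator = TinyEvaluator()
print(evaluator.parse("1 + 2 * 3"))

try:
    evaluator.parse("__import__('os')")
except ValueError as err:
    print(err)  # the Call node has no handler in this sketch
```

Anything without a matching `parse_...` method is rejected by default, so new syntax has to be allowed explicitly rather than forbidden explicitly.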

To allow calling pre-defined functions, the following function is used.

In [2]:
def parse_Call(self, node):
    funcname = node.func.id
    if funcname not in self.functions:
        raise ValueError(f'Unknown function {funcname}')
    args = [self.parse_node(arg) for arg in node.args]
    kwargs = {kwarg.arg: self.parse_node(kwarg.value)
              for kwarg in node.keywords}
    return self.functions[funcname](*args, **kwargs)

The attribute `node.func.id` contains the name of the function. This structure only allows calls to functions which have been added to the `self.functions` dictionary. The arguments and keyword arguments are then recursively passed to the `parse_node` function to resolve them to Python objects. All arguments and function outputs have to be reduced to a Python object at some point. Usually the base for this is a constant, i.e. `parse_Constant` is usually the base primitive that is called.

In [3]:
def parse_Constant(self, node):
    val = node.value
    if isinstance(val, str) and val in self.constants:
        val = self.constants[val]
    return val

It checks if the value of the constant is known in the `constants` dictionary. If not, it will just return the value of the constant. This may be a string, or a boolean, or an int, or ... The object all these functions are defined in is called an `ExpressionString`. To use it, one can simply do

In [5]:
from configreader import ExpressionString
es = ExpressionString()
print(es.parse('1+1'))
print(es.parse('sin(pi)'))

2
1.2246467991473532e-16


A few functions and constants are pre-defined. These include `sin`, `cos`, `tan`, and `pi`. However, if an unknown function is called, the parsing will raise an error.

In [6]:
es.parse('round(0.3)')

ValueError: Unknown function round

If your application requires this specific function, it can easily be added to the `ExpressionString`.

In [8]:
es.register_function(round)
print(es.parse('round(0.3)'))

0


The function name is automatically identified. However, if a different function name should be used when parsing, this is also possible.

In [11]:
def foo_the_bar(inputstring):
    if inputstring == 'foo':
        return 'foobar'
    return inputstring
es.register_function(foo_the_bar, name='bar')
print(es.parse('bar("myfoo")'))
print(es.parse('bar("foo")'))

myfoo
foobar


Note that the quotation marks are important. This is the only downside to this approach of parsing the strings. Constants can be registered just as easily using the `register_constant` function.
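A look at the syntax tree suggests why the quotation marks matter. With quotes the argument is a string constant, without them it is a bare name. The snippet below only illustrates this difference at the `ast` level (Python 3.8 or newer). How `ExpressionString` treats a bare name is not shown in this post.

```python
import ast

quoted = ast.parse('bar("myfoo")', mode="eval").body.args[0]
bare = ast.parse("bar(myfoo)", mode="eval").body.args[0]

print(type(quoted).__name__, quoted.value)  # a Constant holding the string
print(type(bare).__name__, bare.id)         # a Name that must be resolved
```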

## Config-File sectioning
One other crucial design goal, that I haven't tackled before, is the inclusion of sub-sections. Many implementations use multiple sets of square brackets to indicate subsections. However, these sub-sections have to be in order. I wanted to also lift this restriction. Therefore, I opted for a file-system-like structure, that uses a separator to indicate sub-sections. The files will be of the following structure:

```
[Section 1]
name1 = value1

[Section 1/Subsection 1.1]
name1.1 = value1.1

[Section 2]
name2 = value2

[/Subsection 2.1]
name2.1 = value2.1

[//Sub-Subsection 2.1.1]
name2.1.1 = value2.1.1
```

When sections appear in order in the file, the parent section can be omitted and replaced by a leading separator. The number of leading separators specifies the nesting level of the sub-section.
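One way to think about this shorthand is as a small path-resolution step that runs over the section headers in file order. The helper `resolve_section_paths` below is hypothetical and only sketches the rule described above, assuming a single-character separator. It is not taken from the package.

```python
def resolve_section_paths(headers, sep="/"):
    """Expand headers such as '/Sub' or '//SubSub' into full paths."""
    resolved = []
    current = []  # path components of the most recently seen section
    for header in headers:
        stripped = header.lstrip(sep)
        depth = len(header) - len(stripped)  # number of leading separators
        if depth == 0:
            # No leading separator: the header already is a full path.
            current = stripped.split(sep)
        else:
            if depth > len(current):
                raise ValueError(f"No parent section for {header!r}")
            # Keep `depth` components of the previous path as the parent.
            current = current[:depth] + stripped.split(sep)
        resolved.append(sep.join(current))
    return resolved


headers = [
    "Section 1",
    "Section 1/Subsection 1.1",
    "Section 2",
    "/Subsection 2.1",
    "//Sub-Subsection 2.1.1",
]
for path in resolve_section_paths(headers):
    print(path)
```

With this rule, `[/Subsection 2.1]` resolves to `Section 2/Subsection 2.1` because `Section 2` is the closest preceding section at the level above.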

## Retrieving values
There are multiple ways of retrieving a value from the configuration file. For this section consider the following configuration file:

In [20]:
from configreader import ConfigReader

config_source = """
[Constants]
c = 3 * 10 ** 8

[detectors]
width = 2
[/det1]
height = 1.5

[/det2]
height = 2

[Sampler]
sampler_name = custom
parameter1 = 2
[/parameter1]
min = 0
max = sin(pi / 4)

[/parameter2]
min = -1
max = c / 2
param2_desc = "This is the second parameter"
"""

config = ConfigReader(config_source)
print(config)

toplevel/
 ├─Constants/
 │  └─c = 300000000
 ├─detectors/
 │  ├─det1/
 │  │  └─height = 1.5
 │  ├─det2/
 │  │  └─height = 2
 │  └─width = 2
 └─Sampler/
    ├─parameter1/
    │  ├─min = 0
    │  └─max = 0.7071067811865475
    ├─parameter2/
    │  ├─min = -1
    │  ├─max = 150000000.0
    │  └─param2_desc = This is the second parameter
    ├─sampler_name = custom
    └─parameter1 = 2


The safest way is to use the full path to a value.

In [13]:
config['detectors/det1/height']

1.5

You can also try using a dictionary-style access.

In [14]:
config['detectors']['det1']['height']

1.5

However, if a key is not unique, this can lead to unexpected errors.

In [16]:
config['Sampler']['parameter1']

NonUniqueKeyError: "Found multiple values: [2, <configreader.core.Section object at 0x7f0a886d5af0>] and sections: ['toplevel/Sampler/parameter1']"

In this case using the full path still yields an expected result.

In [17]:
config['Sampler/parameter1']

2

To access the subsection instead, you can end the path with the separator.

In [18]:
print(config['Sampler/parameter1/'])

parameter1/
 ├─min = 0
 └─max = 0.7071067811865475


The final way to access a value is by a unique name. If a key is unique it can be used without specifying the sub-sections.

In [21]:
print(config['width'])
print(config['param2_desc'])

2
This is the second parameter
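The unique-name lookup can be pictured as a search through the whole tree that only succeeds if there is exactly one hit. The sketch below works on plain nested dictionaries. The function `find_unique` and the use of `LookupError` are assumptions for illustration and do not reflect how `ConfigReader` is implemented.

```python
def find_unique(tree, key):
    """Return the single value stored under `key` anywhere in a nested dict."""
    matches = []

    def walk(node, path):
        for name, value in node.items():
            child_path = f"{path}/{name}" if path else name
            if isinstance(value, dict):
                walk(value, child_path)  # descend into the sub-section
            elif name == key:
                matches.append((child_path, value))

    walk(tree, "")
    if not matches:
        raise KeyError(key)
    if len(matches) > 1:
        paths = [path for path, _ in matches]
        raise LookupError(f"{key!r} is ambiguous, found at {paths}")
    return matches[0][1]


# A hand-written copy of the [detectors] part of the example file.
config_tree = {
    "detectors": {
        "width": 2,
        "det1": {"height": 1.5},
        "det2": {"height": 2},
    },
}

print(find_unique(config_tree, "width"))
try:
    find_unique(config_tree, "height")
except LookupError as err:
    print(err)  # height exists in both det1 and det2
```

A key like `height`, which appears in both `det1` and `det2`, therefore needs the full path instead.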
