# Info
- [Parsing a file](#Parsing-a-file)
    - [Parse a line for keywords](#Parse-a-line-for-keywords)

# Parsing a file

A good example of how to parse a file is writing a parser for a config file.

Let's have a look at the file first: `config.ini`.
> Open it with a text editor of your choice or use a shell command from inside the notebook.
> * Linux : `!cat config.ini`
> * Windows : `!type config.ini`
> * Mac : `!cat config.ini` (not tested)

In [None]:
# Linux : !cat config.ini
# Windows : !type config.ini
# Mac ?? : !cat config.ini
!type config.ini

As you can see, the file has a simple structure.

1. section is given in `[` + **section name** + `]`
2. values are assigned in the format : `parameter` `:` `value(s)`
3. comments seem to start with `#`

## Open a file
**Recall**: The typical workflow is:
1. `open` a file
2. `read` its content or `write` something into it
3. `close` it

In [None]:
fp = open('config.ini')
content = fp.read()
fp.close()

In [None]:
print(content)

**There has to be a better way!**
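
One likely candidate for the better way is the `with` statement, which closes the file automatically when the block ends, even if an error occurs inside it. The sketch below is only an illustration: the file name `demo.ini` and its content are made up, and it writes into a temporary directory so that `config.ini` is not touched.

```python
import os
import tempfile

# Hypothetical file in a temporary directory (not part of the notebook).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.ini')

    # 'w' opens the file for writing; it is closed when the block ends.
    with open(path, 'w') as fp:
        fp.write('[Main]\nkey : value\n')

    # Read it back the same way; no explicit close() is needed.
    with open(path) as fp:
        content = fp.read()

print(fp.closed)  # True, the file object is already closed here
print(content)
```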

Let us check if everything worked.

In [None]:
with open('new_config.ini', 'r') as fp:
    # iterating over the file object yields one line at a time
    for line in fp:
        print(line, end='')

## Parse a line for keywords
Now, we only want to have our keywords for the given sections.
Let's try to extract them.
What should we do?

1. Throw away empty lines
2. Ignore comments
3. Find out which section we are in
4. If we have a parameter line, find the parameter name and the value
5. Store it in some convenient way

**Tools we have**
- `dict` -> store sections <br>
   `dict(dict(key=value))` -> store *keyword* + *value* in a `dict`
   and store that `dict` per *section* in another `dict`
- `continue` in a `for` loop skips the rest of the current iteration
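
As a small illustration of these two tools, the sketch below builds a nested `dict` and uses `continue` in a loop. The section and parameter names are invented for the example and are not taken from `config.ini`.

```python
# Nested dict: the outer key is the section, the inner dict holds the parameters.
# 'Main', 'Other', 'para1' and 'para2' are made-up names.
settings = {}
settings['Main'] = {}
settings['Main']['para1'] = '1'
settings['Other'] = {'para2': '2'}

print(settings['Main']['para1'])  # 1

# `continue` jumps straight to the next iteration of the loop.
kept = []
for number in range(5):
    if number % 2 == 0:
        continue  # skip even numbers, the append below is not reached
    kept.append(number)

print(kept)  # [1, 3]
```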

Before we use our file, let us create some test cases. With them we can check whether our logic works without having to use our *complex* file.

In [None]:
test_cases = [
    "", # empty
    '  ', # empty
    '#', # comment
    '# more', # comment
    ' # still a comment', # comment
    'para1 : 1', # value
    ' para2 : 2', # value
    'parastring : Test: Break the system!'
]

Loop over the test cases.

In [None]:
for line in test_cases:
    print('"{}"'.format(line))

<div class='alert alert-block alert-info'>
    How can we find the empty lines?
</div>
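
One possible approach, sketched here on a shortened copy of the test cases: strip the whitespace from the line and check whether anything is left.

```python
test_cases = ["", '  ', '#', 'para1 : 1']  # shortened copy of the test cases

empty = []
for line in test_cases:
    # strip() removes leading and trailing whitespace; an empty or
    # whitespace-only line becomes the empty string, which has length 0
    if len(line.strip()) == 0:
        empty.append(line)

print(empty)  # ['', '  ']
```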

<div class='alert alert-block alert-info'>
    How can we find comments?
</div>
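
A minimal sketch for comments, assuming a comment line is any line whose first non-blank character is `#`. It uses `str.startswith`, which also works on an empty string, whereas indexing with `[0]` would raise an `IndexError` there.

```python
test_cases = ['#', '# more', ' # still a comment', 'para1 : 1', '']

comments = []
for line in test_cases:
    line_strip = line.strip()
    # startswith() returns False for an empty string instead of failing
    if line_strip.startswith('#'):
        comments.append(line)

print(comments)  # ['#', '# more', ' # still a comment']
```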

<div class='alert alert-block alert-info'>
    Can we get the values and keywords?
</div>
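
One way to get keyword and value is to split the line at the first `:` only, so that a value which itself contains a `:` stays intact. The sketch below uses `str.partition` for this; it is an alternative to looking up the index of `:` with `find` and slicing by hand.

```python
test_cases = ['para1 : 1', ' para2 : 2', 'parastring : Test: Break the system!']

pairs = []
for line in test_cases:
    # partition() splits at the first ':' and returns (before, separator, after)
    key, sep, value = line.strip().partition(':')
    if not sep:
        continue  # no ':' found, so this is not a parameter line
    pairs.append((key.strip(), value.strip()))

for key, value in pairs:
    print('"{}" -> "{}"'.format(key, value))
```

The last test case shows why only the first `:` may be used as separator: its value is `Test: Break the system!`.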

In [None]:
# as a reminder of what our test cases look like
test_cases = [
    "", # empty
    '  ', # empty
    '#', # comment
    '# more', # comment
    ' # still a comment', # comment
    'para1 : 1', # value
    ' para2 : 2', # value
    'parastring : Test: Break the system!'
]

<div class='alert alert-block alert-info'>
    Let's use our file!
</div>

<div class='alert alert-block alert-info'>
    Let's find the sections.
</div>
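
A sketch of how a section heading could be recognised, assuming the heading is the only thing on its line. The lines below are hypothetical examples, not the real content of `config.ini`.

```python
# Made-up lines: two section headings and one parameter line.
lines = ['[Main]', '[ Other ]', 'para1 : 1']

sections = []
for line in lines:
    line_strip = line.strip()
    if line_strip.startswith('[') and line_strip.endswith(']'):
        # drop the brackets, then any spaces around the name
        sections.append(line_strip[1:-1].strip())

print(sections)  # ['Main', 'Other']
```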

<div class='alert alert-block alert-info'>
    Let's save the values
</div>

In [None]:
settings = {} 
section = '_all_' 
settings[section] = {} 

with open('config.ini', 'r') as fp:
    for line in fp:
        line_strip = line.strip()
        if len(line_strip) == 0:
            continue
        if line_strip[0] == '#':
            continue

        # section heading
        if line_strip[0] == '[':
            # find(']') searches the whole line for the closing bracket
            section = line_strip[1:line_strip.find(']')].strip() # get name
            settings[section] = {}
            continue
        
        # get index of :
        ind_sep = line_strip.find(':')

        # make sure everything is formatted correctly
        assert ind_sep != -1, "formatting of the line is off\n"+line
            
        key = line_strip[:ind_sep].strip()
        value = line_strip[ind_sep+1:].strip()

        settings[section][key] = value 
settings

Now we have a nice piece of code to scan a `config.ini` file.

To make it easier to use, let us store it in a function.

Additionally, we can add a few lines to remove the empty `dict` we created at the beginning, in case nothing was stored in it.
```python
if len(settings['_all_']) == 0:
    del settings['_all_']
```

# Put everything in a function

In [None]:
def read_settings(filename='config.ini'):
    """
    Read a config file and return its content as a nested dict:
    settings[section][parameter] = value
    """

    settings = {}
    section = '_all_'
    settings[section] = {}

    with open(filename, 'r') as fp:
        for line in fp:
            line_strip = line.strip()
            if len(line_strip) == 0: continue # skip empty lines
            if line_strip[0] == '#': continue # skip comment lines

            # section heading
            if line_strip[0] == '[':
                # find(']') searches the whole line for the closing bracket
                section = line_strip[1:line_strip.find(']')].strip() # get name
                settings[section] = {}
                continue

            # get index of :
            ind_sep = line_strip.find(':')

            # make sure everything is formatted correctly
            assert ind_sep != -1, "formatting of the line is off\n"+line

            key = line_strip[:ind_sep].strip()
            value = line_strip[ind_sep+1:].strip()
            if value == 'False':
                value = False
            if value == 'True':
                value = True
            if value == 'None':
                value = None
            settings[section][key] = value
            
        if len(settings['_all_']) == 0:
            del settings['_all_']
    return settings

In [None]:
settings = read_settings(filename = 'config.ini')
settings
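
The three `if` statements that convert `'True'`, `'False'` and `'None'` could also be written as a lookup table. The following is only a sketch of that idea; `SPECIAL_VALUES` and `convert` are hypothetical names that do not appear in the notebook.

```python
# Map the special strings to their Python objects (hypothetical helper).
SPECIAL_VALUES = {'True': True, 'False': False, 'None': None}

def convert(value):
    """Return the Python object for 'True', 'False' and 'None', else the string itself."""
    return SPECIAL_VALUES.get(value, value)

print(convert('True'))   # True
print(convert('None'))   # None
print(convert('1'))      # 1 (still a string)
```

Any other value, such as a number, stays a string; converting those would need additional rules.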

# There has to be an easier way!

Let's google: [**parse config file python**](https://www.google.ch/search?q=parse+config+file+python)

First hit: https://docs.python.org/3/library/configparser.html

Let's try this

In [None]:
import configparser

In [None]:
config = configparser.ConfigParser()

In [None]:
config.sections()

Read our config file.

In [None]:
config.read('config.ini')

Let's see its sections

In [None]:
config.sections()

Let's loop over the parameters

In [None]:
for section in config.sections():
    print(80*'#'+"\n"+section)
    for key, value in config[section].items():
        print("{} -> {}".format(key,value))
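
Two details of `configparser` are worth knowing when reading the output: option names are lower-cased by default, and all values are returned as strings. The sketch below shows this on a made-up config passed in with `read_string`, so it does not depend on `config.ini`; the section and option names are assumptions for illustration.

```python
import configparser

# Hypothetical file content, passed as a string so the sketch is self-contained.
sample = """
[Main]
Debug : True
Runs : 3
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Option names are lower-cased by default; values are always strings.
print(dict(config['Main']))  # {'debug': 'True', 'runs': '3'}

# Typed getters convert the string on request.
debug = config['Main'].getboolean('debug')
runs = config['Main'].getint('runs')
print(debug, runs)  # True 3
```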

### Can we remove those leading and trailing spaces?
e.g. `' Main '` vs `'Main'`

https://docs.python.org/3/library/configparser.html#configparser.SECTCRE

In [None]:
import re
custom = configparser.ConfigParser()
custom.SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")
custom.read('config.ini')
custom.sections()

In [None]:
for section in custom.sections():
    print(80*'#'+"\n"+section)
    for key, value in custom[section].items():
        print("{} -> {}".format(key,value))
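
To see the effect of the custom `SECTCRE` without relying on `config.ini`, the sketch below parses a made-up string with both a default parser and a customised one. The section name `Main` and the option `para1` are assumptions for illustration.

```python
import configparser
import re

# Hypothetical content with spaces inside the section brackets.
sample = """
[ Main ]
para1 : 1
"""

# Default parser: the spaces become part of the section name.
typical = configparser.ConfigParser()
typical.read_string(sample)

# Custom parser: same pattern as above, the spaces around the
# header are matched outside of the named group.
custom = configparser.ConfigParser()
custom.SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")
custom.read_string(sample)

print(typical.sections())  # [' Main ']
print(custom.sections())   # ['Main']
```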