# Info
- [Parsing a file](#Parsing-a-file)
    - [Parse a line for keywords](#Parse-a-line-for-keywords)

# Parsing a file

A good example of how to parse a file is writing a parser for a config file.

Let's have a look at the file first: `config.ini`.
> Open it with the text editor of your choice or use a shell command from inside the notebook.
> * Linux : `!cat config.ini`
> * Windows : `!type config.ini`
> * Mac : `!cat config.ini`

In [1]:
# Linux / Mac : !cat config.ini
# Windows     : !type config.ini
!cat config.ini


As you can see, the file has a simple structure (its full content is also printed a few cells below):

1. a section header is written as `[` + **section name** + `]`
2. values are assigned in the format `parameter` `:` `value(s)`
3. comments start with `#`

## Open a file
**Recall**: The typical workflow is:
1. `open` a file
2. `read` its content or `write` something into it
3. `close` it

In [2]:
fp = open('config.ini')
content = fp.read()
fp.close()

In [3]:
print(content)

[ Main ] 
path : /home/me/Documents
Description : Python : Isn't it easy?

[ Window ]
# Information about the window
Hight : 1000
Width :  500
x     :  200
y     :  200


[ a stupid last section ]
alist : 10, 20, 30



**There has to be a better way!**

A better way to do this is using a construct like:
```python
with open("myfile", 'r') as fp:
    content = fp.read()  # do stuff with fp here
```
Everything you do with the file happens inside the indented block. As soon as the block is left, the file is closed automatically, even if an error occurred inside it!

Because it manages the setup and cleanup for you, a construct like this is called a **context manager** in Python, in case you want to google it.
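To see what the context manager does for you, here is a small sketch. The scratch file `demo.ini` and its content are made up for the example; it checks `closed` after the `with` block and shows the `try`/`finally` pattern that the `with` statement roughly replaces for files.

```python
import os
import tempfile

# Hypothetical scratch file, created only so this sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), "demo.ini")
with open(path, "w") as fp:
    fp.write("[ Main ]\npath : /home/me/Documents\n")

# Context-manager version: the file is closed when the block is left
with open(path, "r") as fp:
    demo_content = fp.read()
print(fp.closed)  # True

# Roughly what the with-statement does for you
fp_manual = open(path, "r")
try:
    demo_content_manual = fp_manual.read()
finally:
    fp_manual.close()  # runs even if read() raises an error
print(demo_content == demo_content_manual)  # True
```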

In [4]:
#@solution
with open('config.ini') as fp:
    content = fp.read()

print(content)

[ Main ] 
path : /home/me/Documents
Description : Python : Isn't it easy?

[ Window ]
# Information about the window
Hight : 1000
Width :  500
x     :  200
y     :  200


[ a stupid last section ]
alist : 10, 20, 30



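We can even open several files at once, for example to write the content of one file into another.
You can either nest two `with` statements:
```python
with open('config.ini', 'r') as fp_in:
    with open('new_config.ini', 'w') as fp_out:
        fp_out.write(fp_in.read())
```
or tell the **context manager** to handle both files in a single `with` statement:
```python
with open('config.ini', 'r') as fp_in, open('new_config.ini', 'w') as fp_out:
    fp_out.write(fp_in.read())
```
As you can see, we also pass a second argument, the *mode*, to the `open` function, e.g.
```python
'r', 'w', 'a', 'rb', 'wb', 'ab'
```
These stand for **r**ead, **w**rite, **a**ppend and **b**inary. Without a mode, `open` uses `'r'`. Be careful: `'w'` empties an existing file before writing, while `'a'` keeps its content and writes at the end.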
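A short sketch of how the modes behave, using a hypothetical scratch file `modes_demo.txt` so that `config.ini` is not touched:

```python
import os
import tempfile

# Hypothetical scratch file for the demo
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as fp:   # 'w': create (or empty) the file, then write text
    fp.write("first\n")
with open(path, "w") as fp:   # opening with 'w' again wipes the old content
    fp.write("second\n")
with open(path, "a") as fp:   # 'a': keep the content, write at the end
    fp.write("third\n")

with open(path, "r") as fp:   # 'r': read text, you get a str
    text = fp.read()
with open(path, "rb") as fp:  # 'rb': read raw bytes, you get a bytes object
    raw = fp.read()

print(repr(text))  # 'second\nthird\n'
print(type(raw))   # <class 'bytes'>
```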

In [5]:
#@solution
with open('config.ini', 'r') as fp, open('new_config.ini', 'w') as out:
    # copy config.ini line by line into new_config.ini
    for line in fp:
        out.write(line)
    # append a new parameter at the end
    out.write('new : test\n')

Let us check if everything worked.

In [6]:
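with open('new_config.ini', 'r') as fp:
    line = True  # dummy value so that the loop starts
    while line:
        line = fp.readline()  # returns '' at the end of the file, which stops the loop
        print(line, end='')   # each line already ends with '\n'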

[ Main ] 
path : /home/me/Documents
Description : Python : Isn't it easy?

[ Window ]
# Information about the window
Hight : 1000
Width :  500
x     :  200
y     :  200


[ a stupid last section ]
alist : 10, 20, 30
new : test


## Parse a line for keywords
Now we only want to get the keywords and their values for each section.
Let's try to extract them.
What do we have to do?

1. Throw away empty lines
2. Ignore comments
3. Find out which section we are in
4. If we have a parameter line, find the parameter name and the value
5. Store them in a convenient way

**Tools we have**
- `dict` -> store sections <br>
   a nested `dict` like `{section: {keyword: value}}` -> store *keyword* + *value* in an inner `dict`
   and store that inner `dict` per *section* in an outer `dict`
- `continue` in a `for` loop skips the rest of the current iteration and jumps to the next one
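A tiny sketch of both tools; the section and keyword names are just examples taken from our file, and the variable names are chosen so they don't clash with the notebook's own variables:

```python
# Nested dict: section name -> {keyword: value}
demo_settings = {}
demo_settings["Window"] = {}           # create the inner dict for a new section
demo_settings["Window"]["x"] = "200"   # store keyword + value in that section
demo_settings["Window"]["y"] = "200"
print(demo_settings)  # {'Window': {'x': '200', 'y': '200'}}

# continue: jump straight to the next iteration of the loop
kept = []
for line in ["# a comment", "x : 200", "", "y : 200"]:
    if line == "" or line.startswith("#"):
        continue                       # skip the rest of the loop body
    kept.append(line)
print(kept)  # ['x : 200', 'y : 200']
```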

Before we use our file, let us create some test cases. With them we can check whether our logic works without having to use our *complex* file.

In [7]:
test_cases = [
    "", # empty
    '  ', # empty
    '#', # comment
    '# more', # comment
    ' # still a comment', # comment
    'para1 : 1', # value
    ' para2 : 2', # value
    'parastring : Test: Break the system!'
]

Loop over the test cases:

In [8]:
for line in test_cases:
    print('"{}"'.format(line))

""
"  "
"#"
"# more"
" # still a comment"
"para1 : 1"
" para2 : 2"
"parastring : Test: Break the system!"


<div class='alert alert-block alert-info'>
    How can we find the empty lines?
</div>

`.strip()` removes blanks, tabs and newlines (`\n`) from both ends of a line.
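A quick sketch of what `.strip()` does to a few lines like our test cases (`repr()` makes the blanks and the newline visible):

```python
samples = ["", "  ", " # still a comment", " para2 : 2\n"]
for s in samples:
    # left: the original line, right: the stripped line
    print(repr(s), "->", repr(s.strip()))
```

Lines that consist only of blanks become the empty string `''`, so checking `len(...) == 0` on the stripped line catches both kinds of empty lines.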

In [9]:
#@solution
for line in test_cases:
    if len(line) == 0:
        continue
    print('"{}"'.format(line))

"  "
"#"
"# more"
" # still a comment"
"para1 : 1"
" para2 : 2"
"parastring : Test: Break the system!"


As you can see, lines that consist only of blanks are still printed.

In [10]:
#@solution
for line in test_cases:
    line_strip = line.strip()
    if len(line_strip) == 0:
        continue
    print('"{}"'.format(line))

"#"
"# more"
" # still a comment"
"para1 : 1"
" para2 : 2"
"parastring : Test: Break the system!"


<div class='alert alert-block alert-info'>
    How can we find comments?
</div>

After stripping, the first character of a comment line is `#`, so we check for it.
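As an alternative to indexing with `line_strip[0]`, `str.startswith()` can be used. A short sketch on a shortened copy of the test cases (named `examples` so the notebook's `test_cases` is left untouched):

```python
examples = ["", "  ", "#", "# more", " # still a comment", "para1 : 1", " para2 : 2"]

kept = []
for line in examples:
    line_strip = line.strip()
    # An empty string is falsy, and startswith() never raises,
    # whereas line_strip[0] would raise an IndexError on ''.
    if not line_strip or line_strip.startswith("#"):
        continue
    kept.append(line)
print(kept)  # ['para1 : 1', ' para2 : 2']
```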

In [11]:
#@solution
for line in test_cases:
    line_strip = line.strip()
    if len(line_strip) == 0:
        continue
    if line_strip[0] == '#':
        continue
    print('"{}"'.format(line))

"para1 : 1"
" para2 : 2"
"parastring : Test: Break the system!"


<div class='alert alert-block alert-info'>
    Can we get the values and keywords?
</div>

In [12]:
# as a reminder of what our test cases look like
test_cases = [
    "", # empty
    '  ', # empty
    '#', # comment
    '# more', # comment
    ' # still a comment', # comment
    'para1 : 1', # value
    ' para2 : 2', # value
    'parastring : Test: Break the system!'
]

Let's try `.split(":")`

In [13]:
#@solution
for line in test_cases:
    line_strip = line.strip()
    if len(line_strip) == 0:
        continue
    if line_strip[0] == '#':
        continue
    line_split = line_strip.split(":")
    key = line_split[0]
    value = line_split[1]
    
    print("{} -> {}".format(key, value))

para1  ->  1
para2  ->  2
parastring  ->  Test


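`.split(":")` cuts the line at *every* colon, so the value of `parastring` is cut off after `Test`. Let's try `.find(":")` instead, which returns the index of the *first* colon (or `-1` if there is none).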

In [14]:
#@solution
for line in test_cases:
    line_strip = line.strip()
    if len(line_strip) == 0:
        continue
    if line_strip[0] == '#':
        continue
    
    # get index of the first ':'
    ind_sep = line_strip.find(':')

    key = line_strip[:ind_sep].strip()
    value = line_strip[ind_sep+1:].strip()
    
    print("{} -> {}".format(key, value))

para1 -> 1
para2 -> 2
parastring -> Test: Break the system!
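Besides `.find()`, Python's `str.partition()` also splits a string at the *first* occurrence of a separator only (as does `.split(":", 1)`). A small sketch for comparison:

```python
line = 'parastring : Test: Break the system!'

# split(":") cuts at every colon, so the value falls apart
print(line.split(":"))  # ['parastring ', ' Test', ' Break the system!']

# partition(":") cuts only at the first colon and always returns 3 parts
key, sep, value = line.partition(":")
print(key.strip(), "->", value.strip())  # parastring -> Test: Break the system!

# Without a colon, sep is '' (comparable to find() returning -1)
print("[ Main ]".partition(":"))  # ('[ Main ]', '', '')
```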


<div class='alert alert-block alert-info'>
    Let's use our file!
</div>

In [15]:
#@solution
with open('config.ini', 'r') as fp:
    for line in fp:
        line_strip = line.strip()
        if len(line_strip) == 0:
            continue
        if line_strip[0] == '#':
            continue

        # get index of :
        ind_sep = line_strip.find(':')

        key = line_strip[:ind_sep].strip()
        value = line_strip[ind_sep+1:].strip()

        print("{} -> {}".format(key, value))

[ Main -> [ Main ]
path -> /home/me/Documents
Description -> Python : Isn't it easy?
[ Window -> [ Window ]
Hight -> 1000
Width -> 500
x -> 200
y -> 200
[ a stupid last section -> [ a stupid last section ]
alist -> 10, 20, 30


Things seem to work, but we also got the section headers: they contain no `:`, so `.find()` returns `-1` and the slicing produces nonsense like `[ Main -> [ Main ]`.
Let's make sure that the code stops if a line is not of the form `parameter : value`.

For this we use an `assert` statement, which raises an `AssertionError` as soon as the given condition is not `True`.
```python
assert condition, "My error message which tells me what's wrong"
```

This is a convenient way to make sure that your code only continues if the data is structured the way you expect. Otherwise, you may run into an error much later in your script and have to find out why things broke.
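A minimal sketch of `assert` in action. The helper `check_line` is hypothetical and only wraps the check we are about to add to the parser:

```python
def check_line(line):
    """Return the index of the first ':' or fail loudly (illustrative helper)."""
    ind_sep = line.find(":")
    assert ind_sep != -1, "formatting of the line is off\n" + line
    return ind_sep

print(check_line("x : 200"))  # 2

try:
    check_line("[ Main ]")
except AssertionError as err:
    message = str(err)
print(message)
```

One design note: assertions are skipped entirely when Python is started with the `-O` option, so for files written by other people an explicit `raise ValueError(...)` is the more robust choice. For a quick sanity check in a notebook, `assert` is fine.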

In [16]:
#@solution
with open('config.ini', 'r') as fp:
    for line in fp:
        line_strip = line.strip()
        if len(line_strip) == 0:
            continue
        if line_strip[0] == '#':
            continue

        # get index of :
        ind_sep = line_strip.find(':')

        # make sure everything is formatted correctly
        assert ind_sep != -1, "formatting of the line is off\n"+line

        key = line_strip[:ind_sep].strip()
        value = line_strip[ind_sep+1:].strip()

        print("{} -> {}".format(key, value))

AssertionError: formatting of the line is off
[ Main ] 


<div class='alert alert-block alert-info'>
    Let's find the sections.
</div>

In [17]:
#@solution
with open('config.ini', 'r') as fp:
    for line in fp:
        line_strip = line.strip()
        if len(line_strip) == 0:
            continue
        if line_strip[0] == '#':
            continue

        # section heading, e.g. "[ Main ]" -> "Main"
        if line_strip[0] == '[':
            section = line_strip[1:line_strip.rfind(']')].strip() # get name
            continue

        # get index of the first ':'
        ind_sep = line_strip.find(':')

        # make sure everything is formatted correctly
        assert ind_sep != -1, "formatting of the line is off\n"+line
            
        key = line_strip[:ind_sep].strip()
        value = line_strip[ind_sep+1:].strip()

        print("{} -> {}".format(key, value))

path -> /home/me/Documents
Description -> Python : Isn't it easy?
Hight -> 1000
Width -> 500
x -> 200
y -> 200
alist -> 10, 20, 30


<div class='alert alert-block alert-info'>
    Let's save the values
</div>

Let's use a `dict` for it with key = `section_name`.

The value of each `section_name` will be another `dict`, this time mapping each parameter name (`key`) to its `value`.

We also have to handle parameters that appear before the first section header.
Let's define a dummy section `_all_` for everything that doesn't belong to a section.

In [18]:
settings = {} 
section = '_all_' 
settings[section] = {} 

with open('config.ini', 'r') as fp:
    for line in fp:
        line_strip = line.strip()
        if len(line_strip) == 0:
            continue
        if line_strip[0] == '#':
            continue

        # section heading, e.g. "[ Main ]" -> "Main"
        if line_strip[0] == '[':
            section = line_strip[1:line_strip.rfind(']')].strip() # get name
            settings[section] = {}
            continue

        # get index of the first ':'
        ind_sep = line_strip.find(':')

        # make sure everything is formatted correctly
        assert ind_sep != -1, "formatting of the line is off\n"+line
            
        key = line_strip[:ind_sep].strip()
        value = line_strip[ind_sep+1:].strip()

        settings[section][key] = value 
settings

{'_all_': {},
 'Main': {'path': '/home/me/Documents',
  'Description': "Python : Isn't it easy?"},
 'Window': {'Hight': '1000', 'Width': '500', 'x': '200', 'y': '200'},
 'a stupid last section': {'alist': '10, 20, 30'}}
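Note that all values are still strings. A minimal sketch of converting them when you need numbers, assuming the dict shown above (copied into the code as `example_settings` so the sketch runs on its own):

```python
# Copy of the parsed settings printed above (without the empty '_all_' section)
example_settings = {
    'Main': {'path': '/home/me/Documents',
             'Description': "Python : Isn't it easy?"},
    'Window': {'Hight': '1000', 'Width': '500', 'x': '200', 'y': '200'},
    'a stupid last section': {'alist': '10, 20, 30'},
}

width = int(example_settings['Window']['Width'])  # '500' -> 500
# split the comma-separated list; int() ignores the surrounding blanks
alist = [int(v) for v in example_settings['a stupid last section']['alist'].split(',')]
print(width * 2)  # 1000
print(alist)      # [10, 20, 30]
```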

Now we have a nice piece of code to scan a `config.ini` file.

To make it easier to reuse, let's put it in a function.

Additionally, we can add a few lines to remove the dummy section `_all_` if nothing was stored in it:
```python
if len(settings['_all_']) == 0:
    del settings['_all_']
```

# Put everything in a function

In [19]:
def read_settings(filename='config.ini'):
    """
    Read a simple INI-style config file.

    Returns a dict that maps each section name to a dict of
    parameter -> value. Parameters before the first section header
    are stored under '_all_' (removed again if it stays empty).
    'True', 'False' and 'None' are converted to Python objects,
    all other values stay strings.
    """

    settings = {}
    section = '_all_'
    settings[section] = {}

    with open(filename, 'r') as fp:
        for line in fp:
            line_strip = line.strip()
            if len(line_strip) == 0: continue # skip empty lines
            if line_strip[0] == '#': continue # skip comment lines

            # section heading, e.g. "[ Main ]" -> "Main"
            if line_strip[0] == '[':
                section = line_strip[1:line_strip.rfind(']')].strip() # get name
                settings[section] = {}
                continue

            # get index of the first ':'
            ind_sep = line_strip.find(':')

            # make sure everything is formatted correctly
            assert ind_sep != -1, "formatting of the line is off\n"+line

            key = line_strip[:ind_sep].strip()
            value = line_strip[ind_sep+1:].strip()
            if value == 'False':
                value = False
            elif value == 'True':
                value = True
            elif value == 'None':
                value = None
            settings[section][key] = value

    # drop the dummy section if nothing was stored in it
    if len(settings['_all_']) == 0:
        del settings['_all_']
    return settings

In [20]:
settings = read_settings(filename='config.ini')
settings

{'Main': {'path': '/home/me/Documents',
  'Description': "Python : Isn't it easy?"},
 'Window': {'Hight': '1000', 'Width': '500', 'x': '200', 'y': '200'},
 'a stupid last section': {'alist': '10, 20, 30'}}

# There has to be an easier way!

Let's google: [**parse config file python**](https://www.google.ch/search?q=parse+config+file+python)

First hit: https://docs.python.org/3/library/configparser.html

Let's try this

In [21]:
import configparser

In [22]:
config = configparser.ConfigParser()

In [23]:
config.sections()

[]

Read our config file.

In [24]:
config.read('config.ini')

['config.ini']

Let's see its sections

In [25]:
config.sections()

[' Main ', ' Window ', ' a stupid last section ']

Let's loop over the parameters of each section:

In [26]:
for section in config.sections():
    print(80*'#'+"\n"+section)
    for key, value in config[section].items():
        print("{} -> {}".format(key,value))

################################################################################
 Main 
path -> /home/me/Documents
description -> Python : Isn't it easy?
################################################################################
 Window 
hight -> 1000
width -> 500
x -> 200
y -> 200
################################################################################
 a stupid last section 
alist -> 10, 20, 30
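Note that `configparser` also converts the option names to lower case (`Hight` -> `hight`, `Description` -> `description`). This is done by its `optionxform` method, and the documentation suggests setting it to `str` to keep the names as written. A small sketch using `read_string()` on a shortened copy of our `[ Window ]` section:

```python
import configparser

# Shortened copy of the [ Window ] section of our config.ini
config_text = """[ Window ]
# Information about the window
Hight : 1000
Width :  500
"""

default = configparser.ConfigParser()
default.read_string(config_text)
window = default.sections()[0]          # the only section in this text
print(list(default[window].keys()))     # ['hight', 'width']

case_kept = configparser.ConfigParser()
case_kept.optionxform = str             # keep option names exactly as written
case_kept.read_string(config_text)
window = case_kept.sections()[0]
print(list(case_kept[window].keys()))   # ['Hight', 'Width']
```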


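### Can we remove the surrounding spaces?
e.g. `' Main '` vs `'Main'`

The regular expression that recognizes section headers is stored in the `SECTCRE` attribute and can be replaced:

https://docs.python.org/3/library/configparser.html#configparser.SECTCRE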

In [27]:
import re
custom = configparser.ConfigParser()
# allow blanks after '[' and before ']' without making them part of the name
custom.SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")
custom.read('config.ini')
custom.sections()

['Main', 'Window', 'a stupid last section']
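To see why this pattern works, here is a sketch that applies the same regular expression directly to a few header lines. The key detail is the lazy `+?`: it lets the name stop as early as possible, so the ` *` that follows can absorb the blanks before `]`. With a greedy `+`, the trailing blank would end up in the name.

```python
import re

# Same pattern as above:
#   \[ *                   opening bracket, then any number of blanks
#   (?P<header>[^]]+?)     the name: non-']' characters, as few as possible
#    *\]                   optional blanks, then the closing bracket
SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")

headers = ["[ Main ]", "[Window]", "[ a stupid last section ]"]
names = [SECTCRE.match(h).group("header") for h in headers]
print(names)  # ['Main', 'Window', 'a stupid last section']
```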

In [28]:
for section in custom.sections():
    print(80*'#'+"\n"+section)
    for key, value in custom[section].items():
        print("{} -> {}".format(key,value))

################################################################################
Main
path -> /home/me/Documents
description -> Python : Isn't it easy?
################################################################################
Window
hight -> 1000
width -> 500
x -> 200
y -> 200
################################################################################
a stupid last section
alist -> 10, 20, 30
