# Regular Expressions

Python's standard-library module for regular expressions is `re`.

## Regex Shortcuts

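| Symbol | ASCII equivalent | Description |
|--------|------------------|-------------|
| `\d`   | `[0-9]`          | Matches any decimal digit |
| `\D`   | `[^0-9]`         | Matches any character that is not a decimal digit |
| `\s`   | `[ \t\n\r\f\v]`  | Matches any whitespace character |
| `\S`   | `[^ \t\n\r\f\v]` | Matches any non-whitespace character |
| `\w`   | `[a-zA-Z0-9_]`   | Matches any "word" character: a letter, a digit, or the underscore |
| `\W`   | `[^a-zA-Z0-9_]`  | Matches any character that `\w` does not match |

For `str` patterns these shortcuts are Unicode-aware, so they also match non-ASCII digits, whitespace, and letters; the bracketed equivalents are exact only when the `re.ASCII` flag is set.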
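By default, `\d` and `\w` also match non-ASCII digits and letters, and the `re.ASCII` flag limits them to ASCII. A small comparison of the two behaviours (the sample strings are made up for illustration):

```py
import re

# '\u0664\u0662' is 42 written with Arabic-Indic digits (Unicode category Nd)
text = 'Price: 42 or \u0664\u0662'
unicode_digits = re.findall(r'\d+', text)          # default: Unicode-aware \d
ascii_digits = re.findall(r'\d+', text, re.ASCII)  # \d limited to [0-9]
print(unicode_digits)  # ['42', '٤٢']
print(ascii_digits)    # ['42']

word = 'café_1'
print(re.findall(r'\w+', word))            # ['café_1']
print(re.findall(r'\w+', word, re.ASCII))  # ['caf', '_1'] ('é' is not ASCII)
```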

## Compiling a pattern

```py
import re

# 'a', followed by zero or more 'b's, followed by 'c'
pattern = re.compile(r'ab*c')
```
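`re.compile` turns the pattern string into a reusable `Pattern` object. The same operations also exist as module-level functions such as `re.search(pattern, string)`, which compile the pattern on the fly (recently used patterns are cached). A quick check of what `ab*c` accepts:

```py
import re

pattern = re.compile(r'ab*c')

# The compiled pattern and the module-level function agree
assert pattern.search('xxabbc').group() == re.search(r'ab*c', 'xxabbc').group()

print(pattern.search('ac').group())     # ac (zero 'b's is allowed)
print(pattern.search('abbbc').group())  # abbbc
print(pattern.search('abd'))            # None (no 'c' after the 'b's)

# Raw strings keep backslashes literal, so r'\d+' is the same string as '\\d+'
assert r'\d+' == '\\d+'
```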

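## Pattern methods

| Method        | Description |
|---------------|-------------|
| `match()`     | Return a `Match` object if the pattern matches at the beginning of the string, otherwise `None` |
| `fullmatch()` | Return a `Match` object if the pattern matches the entire string, otherwise `None` |
| `search()`    | Scan through the string and return a `Match` object for the first location where the pattern matches, otherwise `None` |
| `findall()`   | Return all non-overlapping matches as a list of strings (if the pattern has groups, the group contents instead, as tuples when there are several groups) |
| `finditer()`  | Return an iterator of `Match` objects for all non-overlapping matches |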
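The methods differ mainly in where they look and what they return. A sketch comparing them on one string that contains three matches of `ab*c`:

```py
import re

pattern = re.compile(r'ab*c')
text = 'abc xabbc ac'

print(pattern.match(text))         # <re.Match object; span=(0, 3), match='abc'>
print(pattern.match('x' + text))   # None: match() only tries position 0
print(pattern.search('x' + text))  # <re.Match object; span=(1, 4), match='abc'>
print(pattern.fullmatch(text))     # None: the whole string is not one match
print(pattern.findall(text))       # ['abc', 'abbc', 'ac']
print([m.span() for m in pattern.finditer(text)])  # [(0, 3), (5, 9), (10, 12)]
```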

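### Checking whether a string matches the pattern

There are two methods: `match` checks whether the pattern matches at the beginning of the string (whatever follows is ignored), while `fullmatch` requires the pattern to match the whole string. Both return a `Match` object on success and `None` otherwise.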

```py
# Column widths: 35 characters for the label, 10 for each result
line_formatter = '{0:35} {1!s:10} {2!s:10}'


def build_line(item):
    label, text = item
    return line_formatter.format(label,
                                 pattern.match(text) is not None,
                                 pattern.fullmatch(text) is not None)


print(line_formatter.format('Case:', 'Match:', 'Fullmatch:'))
print('\n'.join(map(build_line, {
    'does not match at all': 'aaa',
    'matches from beginning to end': 'abbbbc',
    'matches from beginning to middle': 'abbbbcnnn',
    'matches from middle to end': 'nnnabbc',
    'matches only in the middle': 'nnnabbcnnn',
    'matches to the end': 'nnnabbc',
}.items())))
```

```
Case:                               Match:     Fullmatch:
does not match at all               False      False
matches from beginning to end       True       True
matches from beginning to middle    True       False
matches from middle to end          False      False
matches only in the middle          False      False
matches to the end                  False      False
```
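`fullmatch` is stricter than anchoring a `match` call with `$`: in Python, `$` also matches just before a newline at the end of the string, whereas `\Z` matches only at the very end. A small check with a trailing newline (the sample string is made up for illustration):

```py
import re

text = 'abbc\n'  # trailing newline, as when reading lines from a file

print(re.match(r'ab*c$', text))     # <re.Match object; span=(0, 4), match='abbc'>
print(re.fullmatch(r'ab*c', text))  # None: the newline is not consumed
print(re.match(r'ab*c\Z', text))    # None: \Z only matches at the very end
```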


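#### Using groups

Parentheses in the pattern create capturing groups. Indexing the `Match` object returns their contents: `match[0]` is the whole match, and `match[1]` onward are the groups, numbered from the left.

```py
import re

line = '1-3 a: abcde'  # sample input for illustration
rule_pattern = re.compile(r'(\d+)-(\d+) ([a-z]): ([a-z]+)')
match = rule_pattern.fullmatch(line)
if match:  # None when the line does not fit the pattern
    print(match[1], match[2], match[3], match[4])  # 1 3 a abcde
```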
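Groups can also be named with the `(?P<name>...)` syntax, so the code that reads them no longer depends on group positions. A sketch of the same rule pattern with names; the group names and the sample line are illustrative, not part of the original example:

```py
import re

# Same shape as the pattern above, with a (hypothetical) name for each group
rule = re.compile(r'(?P<low>\d+)-(?P<high>\d+) (?P<letter>[a-z]): (?P<text>[a-z]+)')
m = rule.fullmatch('1-3 a: abcde')  # hypothetical sample input
if m:
    print(m['low'], m['high'], m['letter'])  # 1 3 a
    print(m.groups())     # ('1', '3', 'a', 'abcde')
    print(m.groupdict())  # {'low': '1', 'high': '3', 'letter': 'a', 'text': 'abcde'}
```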

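### Finding the first match anywhere in a string

`search` scans the string and returns a `Match` object for the first location where the pattern matches, or `None` if there is no match.
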
```py
print('Does not match at all:', pattern.search('aaa'))
print('Matches from beginning to end:', pattern.search('abbbbc'))
print('Matches from beginning to middle:', pattern.search('abbbbcnnn'))
print('Matches from middle to end:', pattern.search('nnnabbc'))
print('Matches only in the middle:', pattern.search('nnnabbcnnn'))
print('Matches to the end:', pattern.search('nnnabbc'))
```

```
Does not match at all: None
Matches from beginning to end: <re.Match object; span=(0, 6), match='abbbbc'>
Matches from beginning to middle: <re.Match object; span=(0, 6), match='abbbbc'>
Matches from middle to end: <re.Match object; span=(3, 7), match='abbc'>
Matches only in the middle: <re.Match object; span=(3, 7), match='abbc'>
Matches to the end: <re.Match object; span=(3, 7), match='abbc'>
```
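The `Match` object returned by `search` carries the matched text and its position. A sketch of the usual way to read it, plus the optional start position that `Pattern.search` accepts for resuming after a match:

```py
import re

pattern = re.compile(r'ab*c')
text = 'nnnabbcnnn'

m = pattern.search(text)
if m is not None:  # search() returns None when nothing matches
    print(m.group())           # abbc
    print(m.start(), m.end())  # 3 7
    print(m.span())            # (3, 7)
    # Resume scanning at m.end(); there is no second match in this string
    print(pattern.search(text, m.end()))  # None
```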


### Finding all matches in a string

There are two methods for this: `findall` returns the matched substrings as a list, and `finditer` returns an iterator of `Match` objects.

```py
print('Does not match at all:', pattern.findall('nnnnnnnn'))
print('Matches from beginning to end:', pattern.findall('abbbbc'))
print('Matches from beginning to middle:', pattern.findall('abbbbcnnn'))
print('Matches from middle to end:', pattern.findall('nnnabbc'))
print('Matches only in the middle:', pattern.findall('nnnabbcnnn'))
print('Matches to the end:', pattern.findall('nnnabbc'))
print('Matches several times:', pattern.findall('abc nnn abbc'))
```

```
Does not match at all: []
Matches from beginning to end: ['abbbbc']
Matches from beginning to middle: ['abbbbc']
Matches from middle to end: ['abbc']
Matches only in the middle: ['abbc']
Matches to the end: ['abbc']
Matches several times: ['abc', 'abbc']
```
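The two methods differ in what they hand back. A minimal sketch, using a hypothetical `key=value` pattern rather than `ab*c`: with several groups, `findall` returns one tuple of group strings per match, while `finditer` yields full `Match` objects, so spans and individual groups stay available.

```py
import re

text = 'x=1, y=22, z=333'            # made-up sample input
pair = re.compile(r'(\w)=(\d+)')  # hypothetical pattern with two groups

# With several groups, findall returns one tuple of group strings per match
print(pair.findall(text))  # [('x', '1'), ('y', '22'), ('z', '333')]

# finditer yields Match objects lazily, so positions are still available
for m in pair.finditer(text):
    print(m.group(1), m.group(2), m.span())
# x 1 (0, 3)
# y 22 (5, 9)
# z 333 (11, 16)
```

Because `finditer` is lazy, it is the better fit for long inputs where only the first few matches may be needed.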
