# Regular Expressions

This notebook provides an introduction to using regular expressions in Python.

## Import the `re` module

In [None]:
import re

## Re module basics

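Regex strings are usually prefixed with `r` (raw string literals) so that Python does not interpret backslash escapes such as `\n` before the pattern reaches the regex engine.

The two `re` functions used throughout this notebook for matching regular expressions are:

*   `search`: Finds the first match and returns a match object, or `None` if there is no match
*   `findall`: Finds all non-overlapping matches and returns them as a list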

In [None]:
print('regex: ', r'.+\n', 'non-regex: ', '.+\n')
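
To make the effect of the `r` prefix concrete, the following sketch compares a raw and a regular string literal. The variable names are illustrative only.

```python
raw = r'\n'      # two characters: a backslash and the letter n
cooked = '\n'    # one character: a newline

print(len(raw), len(cooked))  # prints: 2 1

# Without the prefix, each backslash meant for the regex engine must be doubled
print(r'\d+' == '\\d+')  # prints: True
```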

In [None]:
m = re.search(r'hello', 'hello hello')
m.group()

In [None]:
m = re.findall(r'hello', 'hello hello')
m
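
Because `search` returns `None` when nothing matches, calling `.group()` directly on the result can raise an `AttributeError`. A minimal sketch of checking the result first (the variable names are assumptions for illustration):

```python
import re

text = 'hello hello'

# search gives a match object for the first occurrence, or None
first = re.search(r'hello', text)
if first is not None:
    print(first.group(), first.span())  # prints: hello (0, 5)

missing = re.search(r'bye', text)
print(missing)  # prints: None

# findall always gives a list, which is empty when nothing matches
all_hits = re.findall(r'hello', text)
no_hits = re.findall(r'bye', text)
print(all_hits, no_hits)  # prints: ['hello', 'hello'] []
```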

## Disjunctions

Disjunctions allow a pattern to match any one of several alternatives, similar to an `OR` operator.

### Enumeration

Enumeration lists the allowed characters inside square brackets; the bracket matches exactly one of them.

In [None]:
m = re.search(r'[Hh]ello', 'hello')
m.group()

In [None]:
m = re.search(r'[Hh]ello', 'hello world')
m.group()

In [None]:
m = re.search(r'[Hh]ello', 'hi there')
print(m)

### Ranges

A hyphen between two characters inside brackets defines a range, such as `[A-Z]`, `[a-z]` or `[0-9]`.

In [None]:
m = re.search(r'[A-Z][a-z][0-9]', 'hello')
print(m)

In [None]:
m = re.search(r'[A-Z][a-z][0-9]', 'Hi')
print(m)

In [None]:
m = re.search(r'[A-Z][a-z][0-9]', 'Hi19')
m.group()
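
Several ranges can share one bracket. The sketch below uses `findall` on the same sample string as above; the variable names are illustrative.

```python
import re

# Two ranges in one bracket: any ASCII letter, upper or lower case
letters = re.findall(r'[A-Za-z]', 'Hi19')
# A single range: any ASCII digit
digits = re.findall(r'[0-9]', 'Hi19')

print(letters, digits)  # prints: ['H', 'i'] ['1', '9']
```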

### Negation

A caret `^` placed immediately after the opening bracket negates the set, so the bracket matches any single character that is not listed.

In [None]:
m = re.search(r'[^A-Z][^aeiou][^0-9]', 'hi')
print(m)

In [None]:
m = re.search(r'[^A-Z][^aeiou][^0-9]', 'Hyp')
print(m)

In [None]:
m = re.search(r'[^A-Z][^aeiou][^0-9]', 'hello')
m.group()
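
The caret only negates when it is the first character in the bracket. A short sketch of both cases, with illustrative variable names:

```python
import re

# Caret first: match every character that is NOT a lowercase vowel
non_vowels = re.findall(r'[^aeiou]', 'hello')
print(non_vowels)  # prints: ['h', 'l', 'l']

# Caret elsewhere in the bracket: it is an ordinary literal character
a_or_caret = re.findall(r'[a^]', 'a^b')
print(a_or_caret)  # prints: ['a', '^']
```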

### Special Characters

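Shorthand classes match a whole category of characters: `.` matches any character except a newline, `\s` whitespace, `\d` a digit, and `\w` a word character (letter, digit or underscore). The capital-letter equivalents (`\S`, `\D`, `\W`) are the negations.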

In [None]:
m = re.search(r'.\s\S\d\D\w\W', 'h e1lo!')
m.group()

In [None]:
m = re.search(r'.\s\S\d\D\w\W', 'hello')
print(m)

In [None]:
m = re.search(r'.\s\S\d\D\w\W', 'hi world!')
print(m)
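
The combined pattern above is easier to follow once each class is tried on its own. A sketch using `findall` on a small made-up sample string:

```python
import re

sample = 'a1 _!'  # letter, digit, space, underscore, punctuation

digits = re.findall(r'\d', sample)       # ['1']
non_digits = re.findall(r'\D', sample)   # ['a', ' ', '_', '!']
word = re.findall(r'\w', sample)         # ['a', '1', '_']
non_word = re.findall(r'\W', sample)     # [' ', '!']
space = re.findall(r'\s', sample)        # [' ']
non_space = re.findall(r'\S', sample)    # ['a', '1', '_', '!']
any_char = re.findall(r'.', sample)      # every character in the sample

print(digits, word, space)
```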

### Generic Disjunctions

The pipe `|` separates alternatives, which can be single characters or longer sequences.

In [None]:
m = re.search(r'hi|hello', 'hi')
m.group()

In [None]:
m = re.search(r'hi|hello', 'hello')
m.group()

In [None]:
m = re.search(r'1|2|3', '1')
m.group()

In [None]:
m = re.search(r'1|2|3', '3')
m.group()
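
One detail worth checking is the order of alternatives: Python's `re` tries them from left to right at each position and stops at the first one that succeeds, even if a later alternative would match more text. A small sketch, with illustrative variable names:

```python
import re

# 'he' is listed first and succeeds, so 'hello' is never tried
short_first = re.search(r'he|hello', 'hello').group()
print(short_first)  # prints: he

# Listing the longer alternative first gives the longer match
long_first = re.search(r'hello|he', 'hello').group()
print(long_first)  # prints: hello

# The pipe splits the whole pattern: this means 'gra' OR 'ey'
split_whole = re.search(r'gra|ey', 'grey').group()
print(split_whole)  # prints: ey
```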

## Quantifiers

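Quantifiers define how many times the preceding element may repeat: `*` is zero or more, `+` is one or more, `?` is zero or one, and `{m,n}` is between m and n times.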

In [None]:
m = re.search(r'a*b+c?d{3,4}', 'abcddddd')
m.group()

In [None]:
m = re.search(r'a*b+c?d{3,4}', 'bbddd')
m.group()

In [None]:
m = re.search(r'a*b+c?d{3,4}', 'aacdddd')
print(m)

In [None]:
m = re.search(r'a*b+c?d{3,4}', 'aabcdd')
print(m)
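
To isolate each quantifier, the sketch below tests them one at a time with `re.fullmatch`, which only succeeds if the entire string matches. The helper `fully_matches` is hypothetical and exists only for this illustration.

```python
import re

def fully_matches(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(fully_matches(r'a*', ''))           # True: zero repetitions are allowed
print(fully_matches(r'b+', ''))           # False: at least one 'b' is required
print(fully_matches(r'b+', 'bbb'))        # True
print(fully_matches(r'c?', 'cc'))         # False: at most one 'c'
print(fully_matches(r'd{3,4}', 'dd'))     # False: fewer than three
print(fully_matches(r'd{3,4}', 'ddd'))    # True
print(fully_matches(r'd{3,4}', 'ddddd'))  # False: more than four

# Quantifiers are greedy: search takes as many characters as allowed
greedy = re.search(r'd{3,4}', 'ddddd').group()
print(greedy)  # prints: dddd
```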

## Anchors

Anchors tie the match to a position: `^` matches at the start of the string and `$` at the end (or just before a trailing newline).

In [None]:
m = re.search(r'^H', 'Hello')
m.group()

In [None]:
m = re.search(r'^H', 'Say Hello')
print(m)

In [None]:
m = re.search(r'o$', 'Hello')
m.group()

In [None]:
m = re.search(r'o$', 'Hello!')
print(m)

In [None]:
m = re.search(r'^H.{4}$', 'Hello')
m.group()

In [None]:
m = re.search(r'^H.{4}$', 'Hi')
print(m)

In [None]:
m = re.search(r'^H.{4}$', 'Hello!')
print(m)

## Capture Groups

Capture groups `(...)` save parts of the match separately from the whole match, and `(?P<name>...)` gives a group a name. Parentheses also group operations; use `(?:...)` to group without capturing.

In [None]:
m = re.search(r'(?:hello|hi) (world)(?P<exc>!)', 'hello world!')
m.group(), m.groups(), m.groupdict()

In [None]:
m.group(0, 1, 2, 'exc')

In [None]:
m = re.search(r'(?:hello|hi) (world)(?P<exc>!)', 'hi world!')
m.group(), m.groups(), m.groupdict()
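
Two consequences of grouping may be worth seeing in isolation: a group changes what a quantifier applies to, and a capturing group changes what `findall` returns. A sketch with made-up sample strings and illustrative names:

```python
import re

# Without a group, + applies only to the preceding character
no_group = re.search(r'ab+', 'ababab').group()
# A non-capturing group makes + repeat the whole sequence
grouped = re.search(r'(?:ab)+', 'ababab').group()
print(no_group, grouped)  # prints: ab ababab

# With one capturing group, findall returns the group, not the whole match
users = re.findall(r'(\w+)@', 'ann@example.org bob@example.org')
print(users)  # prints: ['ann', 'bob']

# Named groups can be read by name or by position
addr = re.search(r'(?P<user>\w+)@(?P<host>\w+)', 'ann@example')
print(addr.group('user'), addr.group(2))  # prints: ann example
```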

## Lookahead

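A lookahead checks the text immediately after the current position without consuming it: `(?=...)` requires that the text follows, and `(?!...)` requires that it does not.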

In [None]:
m = re.search(r'hello(?= world)(?! world!)', 'hello world!')
print(m)

In [None]:
m = re.search(r'hello(?= world)(?! world!)', 'hello world')
print(m.group())
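
A lookahead is not part of the returned match, which makes it useful for extracting a value based on what comes after it. A sketch with a made-up sample string:

```python
import re

sizes = '10px 20em 30px'

# Numbers that are followed by 'px'; the 'px' itself is not returned
px_values = re.findall(r'\d+(?=px)', sizes)
print(px_values)  # prints: ['10', '30']

# Numbers NOT followed by 'px'. The extra \d alternative stops the
# engine from backtracking to a shorter number such as '1' out of '10'
other_values = re.findall(r'\d+(?!\d|px)', sizes)
print(other_values)  # prints: ['20']

# The match ends before the lookahead text
hello = re.search(r'hello(?= world)', 'hello world')
print(hello.span())  # prints: (0, 5)
```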

## Lookbehind

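A lookbehind checks the text immediately before the current position without consuming it: `(?<=...)` requires that the text precedes, and `(?<!...)` requires that it does not.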

In [None]:
m = re.search(r'(?<=hello )(?<!!hello )world', 'hello world')
print(m.group())

In [None]:
m = re.search(r'(?<=hello )(?<!!hello )world', '!hello world')
print(m)
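
As with lookahead, the lookbehind text is not included in the match. A sketch that picks out amounts by the character in front of them, using a made-up sample string:

```python
import re

prices = 'cost $40 or 50 EUR'

# Numbers preceded by a dollar sign; the '$' itself is not returned
dollar_amounts = re.findall(r'(?<=\$)\d+', prices)
print(dollar_amounts)  # prints: ['40']

# Numbers preceded by neither '$' nor another digit. Excluding digits
# stops the engine from matching the '0' inside '40'
other_amounts = re.findall(r'(?<![$\d])\d+', prices)
print(other_amounts)  # prints: ['50']
```

Note that Python's `re` module requires a lookbehind pattern to match text of a fixed length, so something like `(?<=a+)` is rejected.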