# Regular Expressions

This notebook provides an introduction to using regular expressions in Python.

## Import the `re` module

In [1]:
import re

## Re module basics

Regex patterns are usually written as raw strings (prefixed with `r`) so that Python does not interpret backslash escape sequences before the pattern reaches the regex engine.

The `re` module has two main functions for matching regular expressions:

*   `search`: Finds the first instance of the match and returns a match object, or `None` if there is no match
*   `findall`: Finds all non-overlapping matches and returns them as a list

In [2]:
print('regex: ', r'.+\n', 'non-regex: ', '.+\n')

regex:  .+\n non-regex:  .+
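
The difference matters most for escapes that Python itself interprets. As a small sketch (the sample text `'a cat'` is made up for illustration), `'\b'` in a normal string becomes a backspace character, while `r'\b'` reaches the regex engine as the word-boundary escape:

```python
import re

# In a normal string, Python turns '\b' into a single backspace character.
plain = '\b'
# In a raw string, the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\b'

print(len(plain), len(raw))  # 1 2

# Only the raw-string pattern finds the word 'cat'.
plain_match = re.search('\bcat\b', 'a cat')
raw_match = re.search(r'\bcat\b', 'a cat')
print(plain_match, raw_match.group())  # None cat
```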



In [3]:
m = re.search(r'hello', 'hello hello')
m.group()

'hello'

In [4]:
m = re.findall(r'hello', 'hello hello')
m

['hello', 'hello']
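
Because `search` returns `None` when nothing matches, calling `.group()` directly on the result can raise an `AttributeError`. One possible way to guard against that is sketched below; `first_match` is a hypothetical helper name, not part of the `re` module.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None

found = first_match(r'hello', 'hello hello')
missing = first_match(r'hello', 'goodbye')

# findall never returns None; with no matches it returns an empty list.
all_found = re.findall(r'hello', 'hello hello')
all_missing = re.findall(r'hello', 'goodbye')

print(found, missing)          # hello None
print(all_found, all_missing)  # ['hello', 'hello'] []
```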

## Disjunctions

Disjunctions allow a pattern to match any one of several alternatives, similar to an `OR` operator.

### Enumeration

An enumeration lists several characters inside square brackets; the bracket matches exactly one character from the list.

In [5]:
m = re.search(r'[Hh]ello', 'hello')
m.group()

'hello'

In [6]:
m = re.search(r'[Hh]ello', 'hello world')
m.group()

'hello'

In [7]:
m = re.search(r'[Hh]ello', 'hi there')
print(m)

None
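
As a further illustration with made-up sample strings, the sketch below shows that a bracket stands for a single character position: it picks one of the listed characters and never consumes two.

```python
import re

# The bracket matches one of 'b', 'c' or 'r', so 'mat' is skipped.
words = re.findall(r'[bcr]at', 'bat cat rat mat')
print(words)  # ['bat', 'cat', 'rat']

# [Hh] consumes exactly one character, so the match starts at the second letter.
single = re.search(r'[Hh]ello', 'Hhello')
print(single.group(), single.span())  # hello (1, 6)
```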


### Ranges

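A range, written with a hyphen inside square brackets, matches one character between the two endpoints (for example `[A-Z]`, `[a-z]`, or `[0-9]`).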

In [8]:
m = re.search(r'[A-Z][a-z][0-9]', 'hello')
print(m)

None


In [9]:
m = re.search(r'[A-Z][a-z][0-9]', 'Hi')
print(m)

None


In [10]:
m = re.search(r'[A-Z][a-z][0-9]', 'Hi19')
m.group()

'Hi1'
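
Several ranges can also share one bracket. The following sketch, using invented sample strings, combines ranges so that a single bracket accepts letters and digits together:

```python
import re

# Upper-case letters, lower-case letters and digits in a single bracket.
alnum = re.findall(r'[A-Za-z0-9]', 'Hi-19!')
print(alnum)  # ['H', 'i', '1', '9']

# Digits plus the letters a to f (lower-case hexadecimal digits).
lower_hex = re.findall(r'[0-9a-f]', 'c0ffee!')
print(lower_hex)  # ['c', '0', 'f', 'f', 'e', 'e']
```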

### Negation

A caret `^` placed immediately after the opening bracket negates the disjunction: the bracket then matches any one character that is not listed.

In [11]:
m = re.search(r'[^A-Z][^aeiou][^0-9]', 'hi')
print(m)

None


In [12]:
m = re.search(r'[^A-Z][^aeiou][^0-9]', 'Hyp')
print(m)

None


In [13]:
m = re.search(r'[^A-Z][^aeiou][^0-9]', 'hello')
m.group()

'ell'
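
The position of the caret matters. In the sketch below (the sample string is made up), a caret directly after the opening bracket negates the set, while a caret anywhere else in the bracket is an ordinary character:

```python
import re

text = 'a1^b2'

# Caret first: match every character that is NOT a digit.
negated = re.findall(r'[^0-9]', text)
print(negated)  # ['a', '^', 'b']

# Caret last: match a digit OR a literal caret.
literal = re.findall(r'[0-9^]', text)
print(literal)  # ['1', '^', '2']
```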

### Special Characters

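Special characters are shorthands for whole classes of characters: `.` matches any character except a newline, `\s` matches whitespace, `\d` matches a digit, and `\w` matches a word character (letter, digit, or underscore). The capital-letter equivalents (`\S`, `\D`, `\W`) are their negations.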

In [14]:
m = re.search(r'.\s\S\d\D\w\W', 'h e1lo!')
m.group()

'h e1lo!'

In [15]:
m = re.search(r'.\s\S\d\D\w\W', 'hello')
print(m)

None


In [16]:
m = re.search(r'.\s\S\d\D\w\W', 'hi world!')
print(m)

None
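
To see each shorthand in isolation, the sketch below applies them one at a time to a short made-up string. Every lower-case class and its capital-letter negation split the characters of the text between them.

```python
import re

text = 'a_1 b!'

digits = re.findall(r'\d', text)       # ['1']
non_digits = re.findall(r'\D', text)   # ['a', '_', ' ', 'b', '!']
word_chars = re.findall(r'\w', text)   # ['a', '_', '1', 'b']
non_word = re.findall(r'\W', text)     # [' ', '!']
spaces = re.findall(r'\s', text)       # [' ']
non_spaces = re.findall(r'\S', text)   # ['a', '_', '1', 'b', '!']

# The dot matches any character except a newline (by default).
any_char = re.findall(r'.', 'a\nb')    # ['a', 'b']

print(digits, word_chars, spaces, any_char)
```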


### Generic Disjunctions

The pipe `|` expresses a disjunction between whole alternatives, which may be single characters or multi-character sequences.

In [17]:
m = re.search(r'hi|hello', 'hi')
m.group()

'hi'

In [18]:
m = re.search(r'hi|hello', 'hello')
m.group()

'hello'

In [19]:
m = re.search(r'1|2|3', '1')
m.group()

'1'

In [20]:
m = re.search(r'1|2|3', '3')
m.group()

'3'
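
Two details of the pipe are easy to miss: it splits the whole pattern unless a group limits its scope, and alternatives are tried from left to right. A minimal sketch with invented sample strings:

```python
import re

# Without a group, the alternatives are 'gra' and 'ey'.
ungrouped = re.findall(r'gra|ey', 'gray grey')
print(ungrouped)  # ['gra', 'ey']

# A non-capturing group (?:...) limits the pipe to the single letter.
grouped = re.findall(r'gr(?:a|e)y', 'gray grey')
print(grouped)  # ['gray', 'grey']

# The leftmost alternative that matches wins, even if a later one is longer.
first_wins = re.search(r'he|hello', 'hello').group()
print(first_wins)  # he
```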

## Quantifiers

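Quantifiers define how many times the preceding element may repeat: `*` means zero or more, `+` one or more, `?` zero or one, and `{m,n}` between `m` and `n` times.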

In [21]:
m = re.search(r'a*b+c?d{3,4}', 'abcddddd')
m.group()

'abcdddd'

In [22]:
m = re.search(r'a*b+c?d{3,4}', 'bbddd')
m.group()

'bbddd'

In [23]:
m = re.search(r'a*b+c?d{3,4}', 'aacdddd')
print(m)

None


In [24]:
m = re.search(r'a*b+c?d{3,4}', 'aabcdd')
print(m)

None
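
The combined pattern above can be hard to read at first. The sketch below, on a made-up sample string, applies each quantifier separately to the letter `a` so the individual effects are visible:

```python
import re

text = 'b ab aab'

star = re.findall(r'a*b', text)       # zero or more 'a'
plus = re.findall(r'a+b', text)       # one or more 'a'
optional = re.findall(r'a?b', text)   # zero or one 'a'
counted = re.findall(r'a{2}b', text)  # exactly two 'a'

print(star)      # ['b', 'ab', 'aab']
print(plus)      # ['ab', 'aab']
print(optional)  # ['b', 'ab', 'ab']
print(counted)   # ['aab']

# A bounded quantifier takes as many repetitions as it is allowed to.
bounded = re.search(r'd{3,4}', 'ddddd').group()
print(bounded)   # dddd
```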


## Anchors

Anchors tie the match to a position: `^` matches at the start of the string and `$` at the end.

In [25]:
m = re.search(r'^H', 'Hello')
m.group()

'H'

In [26]:
m = re.search(r'^H', 'Say Hello')
print(m)

None


In [27]:
m = re.search(r'o$', 'Hello')
m.group()

'o'

In [28]:
m = re.search(r'o$', 'Hello!')
print(m)

None


In [29]:
m = re.search(r'^H.{4}$', 'Hello')
m.group()

'Hello'

In [30]:
m = re.search(r'^H.{4}$', 'Hi')
print(m)

None


In [31]:
m = re.search(r'^H.{4}$', 'Hello!')
print(m)

None
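
One common use of anchors is filtering a collection of strings. The sketch below reuses the sample strings from the cells above in a hypothetical list named `words`:

```python
import re

words = ['Hello', 'Say Hello', 'Hello!', 'Hi']

# Keep the strings that start with 'H'.
starts_with_h = [w for w in words if re.search(r'^H', w)]
# Keep the strings that end with 'o'.
ends_with_o = [w for w in words if re.search(r'o$', w)]
# Keep the strings that are exactly 'H' followed by four characters.
exact = [w for w in words if re.search(r'^H.{4}$', w)]

print(starts_with_h)  # ['Hello', 'Hello!', 'Hi']
print(ends_with_o)    # ['Hello', 'Say Hello']
print(exact)          # ['Hello']
```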


## Capture Groups

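Capture groups save parts of the match separately from the whole match. Parentheses also group operations: `(...)` captures by position, `(?P<name>...)` captures by name as well as position, and `(?:...)` groups without capturing.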

In [32]:
m = re.search(r'(?:hello|hi) (world)(?P<exc>!)', 'hello world!')
m.group(), m.groups(), m.groupdict()

('hello world!', ('world', '!'), {'exc': '!'})

In [33]:
m.group(0, 1, 2, 'exc')

('hello world!', 'world', '!', '!')

In [34]:
m.group(3)

IndexError: no such group

In [35]:
m = re.search(r'(?:hello|hi) (world)(?P<exc>!)', 'hi world!')
m.group(), m.groups(), m.groupdict()

('hi world!', ('world', '!'), {'exc': '!'})
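
Capture groups also change what `findall` returns. The sketch below uses a made-up `key=value` string; the group names `key` and `value` are chosen for illustration only.

```python
import re

text = 'set a=1 b=22'

# Named groups can be read by name or by position.
m = re.search(r'(?P<key>\w+)=(?P<value>\d+)', text)
print(m.group(), m.group('key'), m.group(2))  # a=1 a 1

# With two or more groups, findall returns one tuple per match.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)  # [('a', '1'), ('b', '22')]

# With one group, findall returns only that group's text.
values = re.findall(r'\w+=(\d+)', text)
print(values)  # ['1', '22']

# With only non-capturing groups, findall returns the whole matches.
whole = re.findall(r'(?:\w+)=(?:\d+)', text)
print(whole)  # ['a=1', 'b=22']
```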

## Lookahead

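A lookahead checks the text that follows the match without including it in the match: `(?=...)` requires the text to follow, and `(?!...)` requires that it does not.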

In [36]:
m = re.search(r'hello(?= world)(?! world!)', 'hello world!')
print(m)

None


In [37]:
m = re.search(r'hello(?= world)(?! world!)', 'hello world')
print(m.group())

hello
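
Since the lookahead text is only checked and never consumed, the match ends before it. The sketch below illustrates this with an invented string of CSS-like lengths; the extra `\d` in the negative lookahead is an assumption added to stop the engine from backtracking to a shorter number.

```python
import re

# The match stops at index 5; ' world' is checked but not consumed.
m = re.search(r'hello(?= world)', 'hello world')
print(m.group(), m.span())  # hello (0, 5)

text = '10px 20em 30px'

# Numbers that are followed by 'px'.
with_px = re.findall(r'\d+(?=px)', text)
print(with_px)  # ['10', '30']

# Numbers that are not followed by 'px' (or by another digit).
without_px = re.findall(r'\d+(?!\d|px)', text)
print(without_px)  # ['20']
```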


## Lookbehind

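A lookbehind checks the text that precedes the match without including it in the match: `(?<=...)` requires the text to precede, and `(?<!...)` requires that it does not.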

In [38]:
m = re.search(r'(?<=hello )(?<!!hello )world', 'hello world')
print(m.group())

world


In [39]:
m = re.search(r'(?<=hello )(?<!!hello )world', '!hello world')
print(m)

None
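
A typical use of lookbehind is extracting a value that comes after a marker without capturing the marker itself. The sketch below uses a made-up price string. Python's `re` module requires a lookbehind pattern to have a fixed width, which a single character or bracket satisfies.

```python
import re

text = 'cost $15, qty 3, fee $7'

# Numbers that come directly after a dollar sign.
prices = re.findall(r'(?<=\$)\d+', text)
print(prices)  # ['15', '7']

# Numbers that are preceded by neither a dollar sign nor another digit.
plain_numbers = re.findall(r'(?<![$\d])\d+', text)
print(plain_numbers)  # ['3']
```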
