- Title: Regular Expression in Python
- Slug: regular-expression-python
- Date: 2021-03-10 09:27:11
- Category: Computer Science
- Tags: programming, Python, regex, regular expression
- Author: Ben Du


[Online Regular Expression Tester](https://regex101.com/)


1. The module-level functions of the Python module `re` automatically compile a string pattern
  using `re.compile` and cache the compiled pattern,
  so there is usually little benefit in compiling patterns yourself.

2. The regular expression modifier `(?i)` turns on case-insensitive matching.

3. `re.match` matches the regular expression pattern only at the beginning of the string,
    while `re.search` searches for the first match anywhere in the string.
    Generally speaking, `re.search` is preferred over `re.match`
    as it is more flexible.

4. `re.findall` finds all matches in the string and returns them as a list.

5. `re.finditer` finds all matches and returns an iterator of `re.Match` objects.

6. Passing `re.DOTALL` to the argument `flags` makes the dot (`.`) match any character
    including a newline (by default the dot does not match a newline).
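
The points above can be checked with a short sketch. The sample string `text` is made up for illustration; everything else is standard `re` API.

```python
import re

text = "Hello World\nhello again"

# (?i) makes the whole pattern case-insensitive.
ci_matches = re.findall(r"(?i)hello", text)

# re.match only tries position 0, re.search scans the whole string.
at_start = re.match(r"World", text)
anywhere = re.search(r"World", text)

# re.finditer yields Match objects, so positions are available too.
spans = [m.span() for m in re.finditer(r"(?i)hello", text)]

# Without re.DOTALL the dot stops at the newline; with it the dot crosses lines.
no_dotall = re.search(r"World.*again", text)
with_dotall = re.search(r"World.*again", text, flags=re.DOTALL)

print(ci_matches, at_start, anywhere.span(), spans, no_dotall, with_dotall.span())
```

Here `at_start` and `no_dotall` are `None`, which is why the result of `re.match` or `re.search` should be checked before calling methods on it.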

In [1]:
import re

## re.compile

The compiled object is of type `re.Pattern` 
and has methods `search`, `match`, `sub`, `findall`, `finditer`, etc.

In [4]:
p = re.compile(r"\d{4}-\d{2}-\d{2}$")

In [5]:
type(p)

re.Pattern

In [8]:
[mem for mem in dir(p) if not mem.startswith("_")]

['findall',
 'finditer',
 'flags',
 'fullmatch',
 'groupindex',
 'groups',
 'match',
 'pattern',
 'scanner',
 'search',
 'split',
 'sub',
 'subn']
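
A compiled pattern is mostly a convenience when the same pattern is applied many times. The following is a minimal sketch; the name `date_at_end` and the sample `lines` are made up for illustration.

```python
import re

# Compile once, then reuse the Pattern object.
date_at_end = re.compile(r"\d{4}-\d{2}-\d{2}$")

lines = ["Today is 2018-05-02", "2018-05-02 is the date", "no date here"]

# Pattern.search behaves like re.search with the pattern already bound.
hits = [line for line in lines if date_at_end.search(line)]

# Pattern.sub behaves like re.sub with the pattern already bound.
masked = date_at_end.sub("YYYY-mm-dd", lines[0])

print(hits)
print(masked)
```

Only the first line is a hit because the `$` anchor requires the date to sit at the end of the string.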

## re.sub

In [10]:
re.sub(r"\d{4}-\d{2}-\d{2}$", "YYYY-mm-dd", "Today is 2018-05-02")

'Today is YYYY-mm-dd'
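
The `$` anchor in the pattern above matters once a string contains more than one date. A small sketch with a made-up input string, which also shows group references in the replacement:

```python
import re

text = "From 2018-05-01 to 2018-05-02"

# With the `$` anchor only the date at the very end is replaced.
last_only = re.sub(r"\d{4}-\d{2}-\d{2}$", "YYYY-mm-dd", text)

# Without the anchor every date is replaced.
all_dates = re.sub(r"\d{4}-\d{2}-\d{2}", "YYYY-mm-dd", text)

# Groups can be referenced in the replacement, here to reorder the parts.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(last_only)
print(all_dates)
print(reordered)
```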

## re.split

In [16]:
re.split("[+-/*]", "a-b/c*d")

['a', 'b', 'c', 'd']

In [17]:
re.split("[*+-/]", "a-b/c*d")

['a', 'b', 'c', 'd']

In [18]:
re.split("[+*-/]", "a-b/c*d")

['a', 'b', 'c', 'd']

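Inside `[]`, a `-` between two characters denotes a range. The patterns above work only because `+-/` and `*-/` happen to be valid ranges (each of them also covers `,`, `-` and `.`). `+-*` is an invalid range since `+` comes after `*` in code point order, so the pattern below fails to compile. To match a literal minus sign, put `-` first or last in the `[]` list or escape it as `\-`.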

In [19]:
re.split("[+-*/]", "a-b/c*d")

error: bad character range +-* at position 1
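
A sketch of the safer ways to include a literal `-` in a character class. The input strings are made up for illustration.

```python
import re

# The range `+-/` spans five characters, so it also splits on `,` and `.`.
accidental = re.split(r"[+-/*]", "a,b.c")

# Placing `-` last (or first) makes it a literal minus sign.
literal_last = re.split(r"[+*/-]", "a-b/c*d,e.f")

# Escaping `-` works regardless of its position.
escaped = re.split(r"[+\-*/]", "a-b/c*d,e.f")

print(accidental)
print(literal_last)
print(escaped)
```

With the literal forms, `,` and `.` are no longer treated as separators.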

## re.match

In [20]:
re.match(r"^\d{4}-\d{2}-\d{2}$", "2018-07-01")

<re.Match object; span=(0, 10), match='2018-07-01'>

In [21]:
# No output: the string does not start with a date, so re.match returns None.
re.match(r"\d{4}-\d{2}-\d{2}", "Today is 2018-07-01.")
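
Since `re.match` returns `None` when the pattern does not match at the beginning of the string, the result is usually checked before use. A minimal sketch, where `leading_date` is a hypothetical helper name:

```python
import re

m = re.match(r"\d{4}-\d{2}-\d{2}", "Today is 2018-07-01.")


def leading_date(text):
    """Return the date at the start of text, or None if there is none."""
    found = re.match(r"\d{4}-\d{2}-\d{2}", text)
    return found.group(0) if found else None


print(m)
print(leading_date("2018-07-01 is a date"))
print(leading_date("Today is 2018-07-01."))
```

Note that `re.match` anchors only the start of the match. Trailing text after the date is allowed unless the pattern ends with `$` or `re.fullmatch` is used.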

## re.search

In [22]:
import re

re.search(r"^\d{4}-\d{2}-\d{2}$", "2018-07-01")

<re.Match object; span=(0, 10), match='2018-07-01'>

In [23]:
import re

re.search(r"\d{4}-\d{2}-\d{2}", "Today is 2018-07-01.")

<re.Match object; span=(9, 19), match='2018-07-01'>

## re.Match.group / re.Match.groups

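Matched strings in parentheses (groups) can be accessed using the method `Match.group` or `Match.groups`. `Match.group(0)` returns the entire match, `Match.group(1)` returns the first group, and `Match.groups()` returns a tuple of all groups.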

In [17]:
m = re.search(r"(\d{4}-\d{2}-\d{2})", "Today is 2018-07-01.")
m

<re.Match object; span=(9, 19), match='2018-07-01'>

In [18]:
m.groups()

('2018-07-01',)

In [19]:
m.group(0)

'2018-07-01'
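
With several groups the difference between `group` and `groups` becomes clearer. A sketch that splits the same date into parts; the group names `year` and `month` are chosen for illustration.

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Today is 2018-07-01.")

whole = m.group(0)         # the entire match
year = m.group(1)          # the first group
parts = m.groups()         # all groups as a tuple
month_day = m.group(2, 3)  # several groups at once

# Named groups can be looked up by name.
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Today is 2018-07-01.")
by_name = named.groupdict()

print(whole, year, parts, month_day, by_name)
```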

## re.findall

Find all matched strings.

In [5]:
import re

s = 'It is "a" good "day" today.'
re.findall('".*?"', s)

['"a"', '"day"']
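
The `?` after `.*` makes the match non-greedy, which is what keeps the two quoted strings separate. A sketch on the same string that also shows how a capturing group changes what `re.findall` returns:

```python
import re

s = 'It is "a" good "day" today.'

# With one capturing group, findall returns the group contents, not the whole match.
inner = re.findall(r'"(.*?)"', s)

# A greedy `.*` runs to the last quote and yields a single match.
greedy = re.findall(r'".*"', s)

# finditer gives Match objects, so start positions are available.
positions = [(m.group(0), m.start()) for m in re.finditer(r'".*?"', s)]

print(inner)
print(greedy)
print(positions)
```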

In [24]:
import re

s = """this is 
/* BEGIN{NIMA}
what 
ever
END{NIMA} */
an example
"""

In [26]:
# (?s) is the inline form of re.DOTALL, so `.*` can span the newlines inside the block.
re.sub(r"(?s)/\* BEGIN{NIMA}.*END{NIMA} \*/", "", s)

'this is \n\nan example\n'
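
The inline modifier `(?s)` and `flags=re.DOTALL` are interchangeable. When a string may contain more than one such block, the greedy `.*` removes everything between the first `BEGIN` and the last `END`. A sketch with a made-up two-block input, using the non-greedy `.*?` as one possible fix:

```python
import re

s = """keep 1
/* BEGIN{NIMA}
drop 1
END{NIMA} */
keep 2
/* BEGIN{NIMA}
drop 2
END{NIMA} */
keep 3
"""

block = r"/\* BEGIN{NIMA}.*END{NIMA} \*/"
lazy_block = r"/\* BEGIN{NIMA}.*?END{NIMA} \*/"

# Greedy: one match from the first BEGIN to the last END, so "keep 2" is lost.
greedy_result = re.sub(block, "", s, flags=re.DOTALL)

# Non-greedy: each block is matched and removed separately.
lazy_result = re.sub(lazy_block, "", s, flags=re.DOTALL)

# The inline modifier gives the same result as the flag.
inline_result = re.sub("(?s)" + block, "", s)

print(repr(greedy_result))
print(repr(lazy_result))
```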

## Escape & Non-escape

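`{` and `}` need not be escaped when they do not form a valid quantifier such as `{2}` or `{1,3}`. For example, `{NIMA}` in the pattern above is matched literally.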
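
A small sketch of when braces are literal and when they are not. The sample strings are made up, and `re.escape` is shown as one way to avoid thinking about escaping at all.

```python
import re

# `{` followed by a non-quantifier is matched literally.
literal = re.findall(r"{\w+}", "BEGIN{NIMA} END{NIMA}")

# `a{2}` is a quantifier: exactly two `a` characters.
quantified = re.fullmatch(r"a{2}", "aa")
not_literal = re.search(r"a{2}", "a{2}")

# Escape the opening brace to match the text "a{2}" itself.
escaped = re.search(r"a\{2}", "a{2}")

# re.escape builds a pattern that matches an arbitrary string literally.
auto_escaped = re.search(re.escape("a{2}"), "x a{2} y")

print(literal, quantified, not_literal, escaped, auto_escaped)
```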

## References

- [Precedence of Operators in Regular Expression](http://www.legendu.net/misc/blog/precedence-of-operators-in-regular-expression/)
- [re — Regular expression operations](https://docs.python.org/3/library/re.html)
- [Online Regular Expression Tester](https://regex101.com/)