https://realpython.com/regex-python/

In [1]:
import re

In [2]:
s = 'foo123bar'

In [3]:
'123' in s

True

In [4]:
s.find('123')

3

In [5]:
s.index('123')

3
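The two string methods above differ only when the substring is absent. Here is a minimal sketch, reusing the same `s`: `find()` returns -1 in that case, while `index()` raises `ValueError`.

```python
s = 'foo123bar'

# str.find() signals "not found" by returning -1 ...
pos = s.find('xyz')
print(pos)  # -1

# ... while str.index() raises ValueError instead
try:
    s.index('xyz')
except ValueError as exc:
    print('index() raised ValueError:', exc)
```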


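### re.search(regex, string)
- **scans a string for the first location where regex matches; returns a match object, or None if there is no match**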

In [7]:
if re.search('123', s):
    print('Found a match')

Found a match
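The `if` test works because a match object is truthy and `None` is falsy. A short sketch of what the match object itself records, using the standard `start()`, `end()`, `span()` and `group()` methods:

```python
import re

s = 'foo123bar'

m = re.search('123', s)
if m is not None:
    # The match object records where the match starts and ends
    print(m.start(), m.end(), m.span())  # 3 6 (3, 6)
    print(m.group())                      # 123

# When nothing matches, re.search() returns None
print(re.search('456', s))  # None
```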



### .
- **any character except newline**

In [12]:
re.search('1.3', s)

<re.Match object; span=(3, 6), match='123'>

In [13]:
print(re.search('1.3', 'foo13'))

None

In [41]:
print(re.search('foo.bar', 'foo\nbar'))

None
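The "except newline" rule is only the default. As a sketch, the standard `re.DOTALL` flag (passed as the third argument of `re.search()`, like the flags covered later) lets `.` match a newline too:

```python
import re

text = 'foo\nbar'

# By default '.' refuses to match the newline between 'foo' and 'bar'
default = re.search('foo.bar', text)
print(default)  # None

# With re.DOTALL, '.' matches a newline as well
dotall = re.search('foo.bar', text, re.DOTALL)
print(dotall.group() == text)  # True
```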

### *
- **matches 0 or more repetitions**
- **immediately follows a portion of regex and indicates how many times that portion must occur**

In [82]:
re.search('foo-*bar', 'foobar')

<re.Match object; span=(0, 6), match='foobar'>

In [83]:
re.search('foo-*bar', 'foo-bar')

<re.Match object; span=(0, 7), match='foo-bar'>

In [84]:
re.search('foo-*bar', 'foo----bar')

<re.Match object; span=(0, 10), match='foo----bar'>

In [85]:
re.search('foo.*bar', '# foo $qux@grault % bar #')

<re.Match object; span=(2, 23), match='foo $qux@grault % bar'>

In [88]:
print(re.search('foo.*bar', '# foo $quxngrault \n bar #'))

None

### +
- **matches 1 or more repetitions**

In [90]:
print(re.search('foo-+bar', 'foobar'))

None


In [91]:
re.search('foo-+bar', 'foo-bar')

<re.Match object; span=(0, 7), match='foo-bar'>

In [92]:
re.search('foo-+bar', 'foo----bar')

<re.Match object; span=(0, 10), match='foo----bar'>

### ?
- **matches 0 or 1 repetitions**

In [94]:
re.search('foo-?bar', 'foobar')

<re.Match object; span=(0, 6), match='foobar'>

In [95]:
re.search('foo-?bar', 'foo-bar')

<re.Match object; span=(0, 7), match='foo-bar'>

In [97]:
print(re.search('foo-?bar', 'foo---bar'))

None
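To compare the three quantifiers side by side, here is a small sketch that runs each pattern against the same three subjects and records whether `re.search()` finds a match:

```python
import re

patterns = ['foo-*bar', 'foo-+bar', 'foo-?bar']
subjects = ['foobar', 'foo-bar', 'foo---bar']

# For each quantifier, which subjects contain a match?
results = {
    pattern: [re.search(pattern, subject) is not None for subject in subjects]
    for pattern in patterns
}

for pattern, hits in results.items():
    print(f'{pattern:10}', hits)
```

`*` accepts any number of dashes, `+` needs at least one, and `?` allows at most one.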


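- **following a quantifier (* + ? or {m,n}), makes it non-greedy: it matches as few repetitions as possible**
- **after an opening parenthesis, introduces a lookahead or lookbehind assertion, e.g. (?=...) or (?\<=...)**
- **in (?P\<name\>...), creates a named group**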

In [98]:
re.search('<.*>', '%<foo> <bar> <baz>%')

<re.Match object; span=(1, 18), match='<foo> <bar> <baz>'>

In [99]:
re.search('<.*?>', '%<foo> <bar> <baz>%')

<re.Match object; span=(1, 6), match='<foo>'>

In [100]:
re.search('ba?', 'baaa')

<re.Match object; span=(0, 2), match='ba'>

In [102]:
print(re.search('ba??', 'baaa'))

<re.Match object; span=(0, 1), match='b'>
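The difference between greedy and non-greedy matching is easiest to see with `re.findall()`, which returns every non-overlapping match as a list of strings. A minimal sketch on the same input as above:

```python
import re

html = '%<foo> <bar> <baz>%'

greedy = re.findall('<.*>', html)   # .* runs to the last '>'
lazy = re.findall('<.*?>', html)    # .*? stops at the first '>'

print(greedy)  # ['<foo> <bar> <baz>']
print(lazy)    # ['<foo>', '<bar>', '<baz>']
```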


### {}
- **matches an explicitly specified number of repetitions**
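- **regex{m}** : exactly m repetitions
- **regex{m,n}** : any number of repetitions from m to n, inclusive
- **regex{,n}** : any number of repetitions <= n
- **regex{m,}** : any number of repetitions >= m
- **regex{,}** : any number of repetitions, equivalent to *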

In [103]:
re.search('x-{3}x', 'x---x')

<re.Match object; span=(0, 5), match='x---x'>

In [104]:
print(re.search('x-{3}x', 'x--x'))

None


In [105]:
re.search('a{3,5}', 'aaaaaaaaa')

<re.Match object; span=(0, 5), match='aaaaa'>

In [106]:
re.search('a{3,5}?', 'aaaaaaaaa')

<re.Match object; span=(0, 3), match='aaa'>
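A sketch that checks each brace form against subjects with 0 to 5 dashes. It uses `re.fullmatch()`, which requires the whole string to match, so the dash count must satisfy the quantifier exactly:

```python
import re

cases = {
    'x-{3}x':   'exactly 3',
    'x-{2,4}x': 'between 2 and 4',
    'x-{,2}x':  'at most 2',
    'x-{2,}x':  'at least 2',
}

results = {}
for pattern, meaning in cases.items():
    # Keep the dash counts n for which 'x' + n dashes + 'x' matches fully
    results[pattern] = [
        n for n in range(6) if re.fullmatch(pattern, 'x' + '-' * n + 'x')
    ]
    print(f'{pattern:9} {meaning:16} dashes: {results[pattern]}')
```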


### \
- **escapes a metacharacter of its special meaning**
- **introduces a special character class**
- **introduces a grouping backreference**

In [40]:
re.search(r'[\]}]', '[Nom]')

<re.Match object; span=(4, 5), match=']'>

In [58]:
re.search('.', 'foo.bar')

<re.Match object; span=(0, 1), match='f'>

In [59]:
re.search(r'\.', 'foo.bar')

<re.Match object; span=(3, 4), match='.'>
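When the text to search for comes from a variable, escaping every metacharacter by hand is error-prone. As a sketch, the standard `re.escape()` function does it automatically:

```python
import re

literal = '1.3'

# Unescaped, '.' is a metacharacter and matches any character
loose = re.search(literal, 'foo1x3')
print(loose.group())                 # 1x3

# re.escape() backslash-escapes the metacharacters
pattern = re.escape(literal)
print(pattern)                       # 1\.3
print(re.search(pattern, 'foo1x3'))  # None
print(re.search(pattern, 'foo1.3'))  # matches '1.3'
```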

### r
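- **prefix for a raw string literal: Python leaves the backslashes in it untouched**
- Use a raw string whenever your regex contains a backslash, so that sequences such as \b and \d reach the regex engine unchanged instead of being interpreted by Python first.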

In [64]:
s = r'foo\bar'

In [65]:
re.search(r'\\', s)

<re.Match object; span=(3, 4), match='\\'>
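Why the raw string matters: in an ordinary string literal, `'\b'` is the backspace character, so the regex engine never sees a word-boundary anchor. A minimal sketch:

```python
import re

# '\b' is one character (backspace); r'\b' is backslash + 'b'
print(len('\b'), len(r'\b'))          # 1 2

# The pattern below searches for a literal backspace, so it fails
print(re.search('\bbar', 'foo bar'))  # None

# The raw string passes \b through as the word-boundary anchor
print(re.search(r'\bbar', 'foo bar')) # matches 'bar' at (4, 7)
```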


### []
- **specifies a character class** : match any character that is in the class

In [19]:
re.search('ba[teau]', 'batiment')

<re.Match object; span=(0, 3), match='bat'>

In [20]:
re.search('ba[teau]', 'abbaa')

<re.Match object; span=(2, 5), match='baa'>

In [25]:
re.search('[A-E]', 'Felita DONOR')

<re.Match object; span=(7, 8), match='D'>

In [23]:
re.search('[0-9]', '03-05-1993')

<re.Match object; span=(0, 1), match='0'>

In [28]:
re.search("[0-9a-fA-F]", '----a0B----')

<re.Match object; span=(4, 5), match='a'>
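A character class matches exactly one character. This sketch adds a `+` to match a whole run, and uses `re.findall()` to list every match:

```python
import re

# One character from the class ...
single = re.search('[0-9a-fA-F]', '----a0B----').group()
# ... versus a run of one or more such characters
runs = re.findall('[0-9a-fA-F]+', '----a0B----')

# Ranges are case-sensitive: [A-E] skips the lowercase 'e' and 'a'
caps = re.findall('[A-E]', 'Felita DONOR')

print(single, runs, caps)  # a ['a0B'] ['D']
```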


### ^ 
### \A
- **both anchor a match at the start of a string**
- **^ placed first inside a character class complements the class: it matches any character not listed**

In [29]:
re.search('[^0-9]', '123soleil')

<re.Match object; span=(3, 4), match='s'>

In [33]:
re.search('[#9^0)]', 'foo^bar')

<re.Match object; span=(3, 4), match='^'>

In [66]:
re.search('^foo', 'foobar')

<re.Match object; span=(0, 3), match='foo'>

In [67]:
re.search(r'\Afoo', 'foobar')

<re.Match object; span=(0, 3), match='foo'>
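`^` and `\A` behave identically here, but they differ in multi-line text. A sketch using the standard `re.MULTILINE` flag, under which `^` also matches right after each newline while `\A` still matches only at the start of the string:

```python
import re

text = 'foo\nfoo'

caret = re.findall('^foo', text, re.MULTILINE)
start = re.findall(r'\Afoo', text, re.MULTILINE)

print(caret)  # ['foo', 'foo']
print(start)  # ['foo']
```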

### $
### \Z
- **both anchor a match at the end of a string; $ also matches just before a trailing newline, whereas \Z matches only at the very end**

In [68]:
re.search('bar$', 'foobar')

<re.Match object; span=(3, 6), match='bar'>

In [70]:
print(re.search(r'bar\Z', 'barfoo'))

None


In [71]:
re.search(r'bar\Z', 'foobar')

<re.Match object; span=(3, 6), match='bar'>

In [72]:
re.search('bar$', 'foobar\n')

<re.Match object; span=(3, 6), match='bar'>

In [73]:
print(re.search(r'bar\Z', 'foobar\n'))

None
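A compact sketch of the `$` versus `\Z` difference: `$` tolerates exactly one trailing newline, `\Z` tolerates none.

```python
import re

results = {}
for subject in ['foobar', 'foobar\n', 'foobar\n\n']:
    # Tuple: (does 'bar$' match?, does 'bar\Z' match?)
    results[subject] = (
        re.search('bar$', subject) is not None,
        re.search(r'bar\Z', subject) is not None,
    )
    print(repr(subject), results[subject])
```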

### \b
- **anchors a match to a word boundary**

In [74]:
re.search(r'\bbar', 'foo bar')

<re.Match object; span=(4, 7), match='bar'>

In [76]:
print(re.search(r'foo\b', 'foobar'))

None


In [77]:
re.search(r'\bbar\b', 'foo bar baz')

<re.Match object; span=(4, 7), match='bar'>

In [78]:
re.search(r'\bbar\b', 'foo(bar)baz')

<re.Match object; span=(4, 7), match='bar'>
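Word boundaries are most useful for finding whole words. This sketch uses the standard `re.finditer()`, which yields a match object for each match:

```python
import re

text = 'bar barn foobar bar.'

# Only standalone 'bar' has a word boundary on both sides
starts = [m.start() for m in re.finditer(r'\bbar\b', text)]
print(starts)  # [0, 16]

# Without the anchors, 'bar' is also found inside 'barn' and 'foobar'
count = len(re.findall('bar', text))
print(count)  # 4
```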

### \B
- **anchors a match to a location that is not a word boundary**

In [79]:
print(re.search(r'\Bfoo\B', 'foo'))

None


In [80]:
print(re.search(r'\Bfoo\B', '.foo.'))

None


In [81]:
re.search(r'\Bfoo\B', 'barfoobaz')

<re.Match object; span=(3, 6), match='foo'>

In [170]:
re.search('foo', 'barfoobaz')

<re.Match object; span=(3, 6), match='foo'>


### \w
- **matches any alphanumeric character or underscore**

In [48]:
re.search(r'\w', r'#).\_£')

<re.Match object; span=(4, 5), match='_'>

### \W
- **matches any character that is not alphanumeric or underscore**

In [46]:
re.search(r'\W', 'f_e-5')

<re.Match object; span=(3, 4), match='-'>

### \s
- **matches any whitespace character (including newline)**

In [51]:
re.search(r'\s', '\n ')

<re.Match object; span=(0, 1), match='\n'>

### \S
- **matches any character that is not whitespace (newline counts as whitespace)**

In [55]:
re.search(r'\S', '\n ca va?')

<re.Match object; span=(2, 3), match='c'>

### \d
- **matches any decimal digit**

### \D
- **matches any character that is not a decimal digit**


In [56]:
re.search(r'[\d\w\s]', '---3---')

<re.Match object; span=(3, 4), match='3'>

In [57]:
re.search(r'[\d\w\s]', '--- ---')

<re.Match object; span=(3, 4), match=' '>
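\d and \D split a string into complementary pieces. A minimal sketch on a made-up phone string:

```python
import re

phone = 'Tel: 555-0199'

digits = re.findall(r'\d+', phone)      # runs of decimal digits
non_digits = re.findall(r'\D+', phone)  # runs of everything else

print(digits)      # ['555', '0199']
print(non_digits)  # ['Tel: ', '-']
```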


##  Grouping Constructs and Backreferences
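- **Grouping** : a group treats a subexpression as a single syntactic entity, so that, for example, a quantifier applies to the whole group
- **Capturing** : a group also captures the text its subexpression matched, so that it can be retrieved later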

### ()
- **creates a group or subexpression**

In [116]:
re.search('(bar)', 'foo bar baz')

<re.Match object; span=(4, 7), match='bar'>

In [109]:
re.search('bar', 'foo bar baz')

<re.Match object; span=(4, 7), match='bar'>

In [110]:
re.search('(bar)+', 'foo bar baz')

<re.Match object; span=(4, 7), match='bar'>

In [111]:
re.search('(bar)+', 'foo barbarbar baz')

<re.Match object; span=(4, 13), match='barbarbar'>

In [171]:
re.search('(ba[rz]){1,4}(qux)?', 'bazbarbazqux')

<re.Match object; span=(0, 12), match='bazbarbazqux'>

In [117]:
re.search(r'(foo(bar)?)+(\d\d\d)?', 'foofoobar')

<re.Match object; span=(0, 9), match='foofoobar'>

In [176]:
re.search(r'(foo(bar)?)+(\d\d\d)?', 'foo333')

<re.Match object; span=(0, 6), match='foo333'>

### m.groups()
- **returns a tuple containing all the captured groups from a regex match**

In [118]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,quux,baz')

In [120]:
m

<re.Match object; span=(0, 12), match='foo,quux,baz'>

In [121]:
m.groups()

('foo', 'quux', 'baz')

### m.group(n)
- **returns the string matched by the nth captured group; group 0 is the entire match, and several arguments return a tuple**

In [122]:
m.group(1)

'foo'

In [123]:
m.group(0)

'foo,quux,baz'

In [125]:
m.group(1, 3)

('foo', 'baz')
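Two details of captured groups are easy to miss. A quantified group keeps only what it matched on its last repetition, and a group that did not take part in the match is reported as `None`. A sketch reusing the earlier patterns:

```python
import re

regex = r'(foo(bar)?)+(\d\d\d)?'

first = re.search(regex, 'foofoobar').groups()
second = re.search(regex, 'foo333').groups()
print(first)   # ('foobar', 'bar', None)
print(second)  # ('foo', None, '333')

# (bar)+ matched three times, but group 1 holds only the last 'bar'
m = re.search('(bar)+', 'foo barbarbar baz')
print(m.group(0), m.group(1))  # barbarbar bar
```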

### Backreferences

### \\\<n\>
- **matches the content of a previously captured group**

In [126]:
regex = r'(\w+),\1'

In [127]:
m = re.search(regex, 'foo,foo')

In [128]:
m

<re.Match object; span=(0, 7), match='foo,foo'>
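A backreference must repeat the exact captured text, not just match the same pattern again. This sketch also shows a common use, finding doubled words, with `re.findall()` (which returns the captured group when the pattern has a single group):

```python
import re

regex = r'(\w+),\1'

same = re.search(regex, 'foo,foo')       # second word repeats the first
different = re.search(regex, 'foo,bar')  # it doesn't, so no match
print(same, different)

doubled = re.findall(r'\b(\w+) \1\b', 'this is is a test test')
print(doubled)  # ['is', 'test']
```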

### Other grouping constructs

### (?P\<name\>\<regex\>)
- **creates a named capturing group**

In [129]:
m = re.search(r'(?P<w1>\w+),(?P<w2>\w+),(?P<w3>\w+)', 'foo,quux,baz')

In [130]:
m

<re.Match object; span=(0, 12), match='foo,quux,baz'>

In [131]:
m.group('w1')

'foo'

In [132]:
m.group('w3', 'w1')

('baz', 'foo')
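Named groups can also be read all at once. A sketch using the standard `groupdict()` method, which maps each group name to its captured text. Named groups still keep their numbers, so positional access works as well:

```python
import re

m = re.search(r'(?P<w1>\w+),(?P<w2>\w+),(?P<w3>\w+)', 'foo,quux,baz')

named = m.groupdict()
print(named)                        # {'w1': 'foo', 'w2': 'quux', 'w3': 'baz'}
print(m.group(1) == m.group('w1'))  # True
```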

### (?P=\<name\>)
- **matches the contents of a previously captured named group**

In [133]:
m = re.search(r'(?P<word>\w+),(?P=word)', 'foo,foo')

In [134]:
m

<re.Match object; span=(0, 7), match='foo,foo'>

In [135]:
m.group('word')

'foo'

### (?:\<regex\>)
- **creates a non-capturing group: it groups the subexpression but does not capture the text it matches**

In [188]:
m = re.search(r'(\w+),(?:\w+),(\w+)', 'foo,quux,baz')

In [137]:
m.groups()

('foo', 'baz')
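Non-capturing groups matter most with `re.findall()`. When a pattern has a capturing group, `findall()` returns the group's text instead of the whole match. A sketch:

```python
import re

text = 'barbaz'

# Capturing: findall() returns group 1, i.e. its last repetition
capturing = re.findall(r'(ba[rz])+', text)
# Non-capturing: + still repeats the unit, but findall() returns the full match
non_capturing = re.findall(r'(?:ba[rz])+', text)

print(capturing)      # ['baz']
print(non_capturing)  # ['barbaz']
print(re.search(r'(?:ba[rz])+', text).groups())  # ()
```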

### (?(\<n\>)\<yes-regex\>|\<no-regex\>)
### (?(\<name\>)\<yes-regex\>|\<no-regex\>)
- **specifies a conditional match**
- matches against \<yes-regex\> if the group numbered \<n\> or named \<name\> participated in the match
- otherwise matches against \<no-regex\>

In [138]:
regex = r'^(###)?foo(?(1)bar|baz)'

Here group 1 is the optional (###) at the start, so (?(1)bar|baz) matches against 'bar' if group 1 matched and 'baz' if it didn't.

In [139]:
re.search(regex, '###foobar')

<re.Match object; span=(0, 9), match='###foobar'>

In [140]:
print(re.search(regex, '###foobaz'))

None


In [141]:
re.search(regex, 'foobaz')

<re.Match object; span=(0, 6), match='foobaz'>

In [146]:
regex = r'^(?P<ch>\W)?foo(?(ch)(?P=ch)|)$'

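Here the optional group named ch captures a single non-word character in front of 'foo'. If it matched, the conditional matches against \<yes-regex\>, which is (?P=ch), the same character again, so the same character must also follow 'foo' for the entire match to succeed. If it didn't, \<no-regex\> is empty and $ requires that nothing follows 'foo'.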

In [147]:
re.search(regex, 'foo')

<re.Match object; span=(0, 3), match='foo'>

In [148]:
re.search(regex, '@foo@')

<re.Match object; span=(0, 5), match='@foo@'>

In [177]:
print(re.search(regex, '#foo@'))

None
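One more sketch of the same conditional technique, on a hypothetical example: an identifier that may be wrapped in double quotes, but only if the quotes are balanced. Group 1 captures an optional opening quote, and `(?(1)")` demands a closing quote only when group 1 matched.

```python
import re

# Hypothetical pattern: optional balanced double quotes around a word
regex = r'^(")?\w+(?(1)")$'

results = {}
for subject in ['foo', '"foo"', '"foo', 'foo"']:
    results[subject] = re.search(regex, subject) is not None
    print(f'{subject!r:8}', results[subject])
```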



##  Lookahead and lookbehind assertions
- **check what immediately precedes or follows the current position without consuming any characters, so the checked text is not part of the match**

### (?=\<lookahead_regex\>)
- **creates a positive lookahead assertion**

In [150]:
re.search('foo(?=[a-z])', 'foobar')

<re.Match object; span=(0, 3), match='foo'>

In [151]:
print(re.search('foo(?=[a-z])', 'foo123'))

None


In [181]:
m = re.search('foo(?=[a-z])(?P<ch>.)', 'foobar')

In [182]:
m

<re.Match object; span=(0, 4), match='foob'>

In [155]:
m.group('ch')

'b'

In [156]:
m = re.search('foo([a-z])(?P<ch>.)', 'foobar')

In [157]:
m.group('ch')

'a'

### (?!\<lookahead_regex\>)
- **creates a negative lookahead assertion**
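The negative lookahead is the mirror image of the positive examples above: 'foo' counts only if it is NOT followed by a lowercase letter. A minimal sketch on the same subjects:

```python
import re

rejected = re.search('foo(?![a-z])', 'foobar')  # 'b' follows, so no match
accepted = re.search('foo(?![a-z])', 'foo123')  # '1' follows, so it matches

print(rejected)  # None
print(accepted)  # <re.Match object; span=(0, 3), match='foo'>
```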

### (?\<=\<lookbehind_regex\>)
- **creates a positive lookbehind assertion**

In [158]:
re.search('(?<=foo)bar', 'foobar')

<re.Match object; span=(3, 6), match='bar'>

In [159]:
re.search('(?<=a{3})def', 'aaadef')

<re.Match object; span=(3, 6), match='def'>
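`a{3}` is accepted inside a lookbehind because it always spans exactly three characters. In CPython's `re` module, a lookbehind must have a fixed width, so a variable-width pattern such as `a+` is rejected with `re.error` at compile time. A sketch:

```python
import re

# Fixed width: compiles and works
fixed = re.compile('(?<=a{3})def')
print(fixed.search('aaadef'))

# Variable width: rejected when the pattern is compiled
rejected = False
try:
    re.compile('(?<=a+)def')
except re.error as exc:
    rejected = True
    print('re.error:', exc)
```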

### (?\<!\<lookbehind_regex\>)
- **creates a negative lookbehind assertion**

In [160]:
re.search('(?<!qux)bar', 'foobar')

<re.Match object; span=(3, 6), match='bar'>

In [161]:
print(re.search('(?<!foo)bar', 'foobar'))

None



### Miscellaneous Metacharacters 

### (?#...)
- **comment: the regex engine ignores everything inside**

In [162]:
re.search('bar(?#This is a comment) *baz', 'foo bar baz qux')

<re.Match object; span=(4, 11), match='bar baz'>

### |
- **specifies a set of alternatives: matches any one of them, trying them from left to right**

In [184]:
re.search('foo|bar|baz', 'bar')

<re.Match object; span=(0, 3), match='bar'>

In [185]:
re.search('foo|grault', 'foograult')

<re.Match object; span=(0, 3), match='foo'>

In [165]:
re.search('(foo|bar|baz)+', 'foofoofoo')

<re.Match object; span=(0, 9), match='foofoofoo'>

In [187]:
re.search('([0-9]+|[a-f]+)', 'ffda33')

<re.Match object; span=(0, 4), match='ffda'>
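Because alternatives are tried from left to right, the first one that matches wins, even when a later alternative would produce a longer match. A sketch:

```python
import re

short_first = re.search('foo|foobar', 'foobar').group()
long_first = re.search('foobar|foo', 'foobar').group()

print(short_first)  # foo
print(long_first)   # foobar
```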


## Modified Regular Expression Matching with Flags

### re.search(\<regex\>, \<string\>, \<flags\>)
- **scans a string for a regex match, applying the specified modifier flag**

### re.I
### re.IGNORECASE
- **makes matching case insensitive**

In [168]:
re.search('a+', 'aaaAAA')

<re.Match object; span=(0, 3), match='aaa'>

In [167]:
re.search('a+', 'aaaAAA', re.I)

<re.Match object; span=(0, 6), match='aaaAAA'>
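Here is a short sketch of three related points. `re.I` and `re.IGNORECASE` are the same flag. The flag can also be set inline with `(?i)` at the start of the pattern. Several flags can be combined with `|` (here with `re.M`, the short name of `re.MULTILINE`).

```python
import re

same_flag = re.I == re.IGNORECASE
inline = re.search('(?i)a+', 'aaaAAA').group()

# re.M lets ^ match after the newline; re.I lets 'bar' match 'BAR'
combined = re.search('^bar', 'FOO\nBAR', re.I | re.M).group()

print(same_flag, inline, combined)  # True aaaAAA BAR
```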

### re.DEBUG
- **displays debugging information about the compiled regex**

In [169]:
re.search('foo.bar', 'fooxbar', re.DEBUG)

LITERAL 102
LITERAL 111
LITERAL 111
ANY None
LITERAL 98
LITERAL 97
LITERAL 114

 0. INFO 12 0b1 7 7 (to 13)
      prefix_skip 3
      prefix [0x66, 0x6f, 0x6f] ('foo')
      overlap [0, 0, 0]
13: LITERAL 0x66 ('f')
15. LITERAL 0x6f ('o')
17. LITERAL 0x6f ('o')
19. ANY
20. LITERAL 0x62 ('b')
22. LITERAL 0x61 ('a')
24. LITERAL 0x72 ('r')
26. SUCCESS


<re.Match object; span=(0, 7), match='fooxbar'>


### :
### #
### =
### !
- **following (?, each of these characters designates a specialized group: (?:...), (?#...), (?=...) and (?!...)**


### <>
- **in (?P\<name\>...), delimits the name of a named group**
