<a name="0"></a>
##### $\hspace{15pt}$ **Filename: regularExpressions.ipynb**
##### $\hspace{1.5pt}$ **Date Created: January 28, 2024**
##### **Date Modified: April 29, 2024**
##### $\rule{9.65in}{1pt}$
##### **Provide examples of working with regular expressions using the [`re`](https://docs.python.org/3/library/re.html) module. This notebook is divided into the following sections.**
* **[re module functions](#1)**
* **[Anchors](#2)**
* **[Character Types](#3)**
* **[Character Classes Indicated by `[]`](#4)**
* **[Repetition](#5)**
* **[Grouping and Backreferences](#6)**
* **[Lookahead](#7)**

##### $\rule{9.65in}{1pt}$

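##### Load the `re` module.

Patterns that contain backslashes are best written as raw strings (`r"..."`), so that Python passes each backslash to the regular expression engine instead of treating it as a string escape.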

In [1]:
import re

$\hspace{1in}$

<a name="1"></a>
##### **`re` module functions**

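`re.findall(pattern, string, flags = 0)` returns all non-overlapping matches of `pattern` in `string` as a list. If `pattern` has no groups, the list holds the matched strings; with one group, it holds that group's text; with two or more groups, it holds tuples of the groups' text.

`re.search(pattern, string, flags = 0)` scans through `string` looking for the first location where the regular expression `pattern` produces a match, and returns a corresponding `Match` object, or `None` if no position in `string` matches.

`re.split(pattern, string, maxsplit = 0, flags = 0)` splits `string` by the occurrences of `pattern` and returns a list of the pieces. If `maxsplit` is nonzero, at most `maxsplit` splits occur and the remainder of the string is returned as the final element.

`re.sub(pattern, repl, string, count = 0, flags = 0)` returns the string obtained by replacing the leftmost non-overlapping occurrences of `pattern` in `string` by the replacement `repl`. If `count` is nonzero, at most `count` occurrences are replaced; the default `0` replaces all of them.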
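
A minimal sketch that exercises all four functions on one short string. The sample text and the variable names (`text`, `found`, `pairs`, `m`, `parts`, `swapped`) are illustrative assumptions, not part of the examples that follow.

```python
import re

text = "cat bat rat"

# findall: every non-overlapping match; no groups, so a list of strings
found = re.findall(r"[cb]at", text)          # ['cat', 'bat']

# findall with two groups: one tuple of group texts per match
pairs = re.findall(r"(\w)(at)", text)        # [('c', 'at'), ('b', 'at'), ('r', 'at')]

# search: the first match as a Match object, or None when nothing matches
m = re.search(r"bat", text)
print(m.span(), m.group())                   # (4, 7) bat
print(re.search(r"dog", text))               # None

# split: break the string at each match of the pattern
parts = re.split(r" ", text)                 # ['cat', 'bat', 'rat']

# sub: replace matches; count=1 limits it to the leftmost match
swapped = re.sub(r"at", "og", text, count=1) # 'cog bat rat'
```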

Back to the list of [**section links**](#0)

$\hspace{1in}$

<a name="2"></a>
##### **Anchors**

Anchors match a position before or after characters rather than the characters themselves, so they add nothing to the matched text.

`^` matches the start of the string, and in `MULTILINE` mode also matches immediately after each newline.

In [2]:
string = '''apples, avocadoes, and apples.
bananas, grapes, and guavas.
apples, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.'''

print(string)

apples, avocadoes, and apples.
bananas, grapes, and guavas.
apples, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.


In [3]:
re.findall("^apples", string)

['apples']

In [4]:
re.findall("^apples", string, flags = re.MULTILINE)

['apples', 'apples', 'apples']

In [5]:
re.search("^apples", string)

<re.Match object; span=(0, 6), match='apples'>

In [6]:
re.search("^apples", string, flags = re.MULTILINE)

<re.Match object; span=(0, 6), match='apples'>

In [7]:
re.split("^apples", string)

['',
 ', avocadoes, and apples.\nbananas, grapes, and guavas.\napples, lemons, and mangoes.\nmelons, oranges, and pineapples.\napples, strawberries, and watermelons.']

In [8]:
re.split("^apples", string, flags = re.MULTILINE)

['',
 ', avocadoes, and apples.\nbananas, grapes, and guavas.\n',
 ', lemons, and mangoes.\nmelons, oranges, and pineapples.\n',
 ', strawberries, and watermelons.']

In [9]:
re.split("^apples", string, maxsplit = 2, flags = re.MULTILINE)

['',
 ', avocadoes, and apples.\nbananas, grapes, and guavas.\n',
 ', lemons, and mangoes.\nmelons, oranges, and pineapples.\napples, strawberries, and watermelons.']

In [10]:
print(re.sub("^apples", "mangoes", string))

mangoes, avocadoes, and apples.
bananas, grapes, and guavas.
apples, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.


In [11]:
print(re.sub("^apples", "mangoes", string, flags = re.MULTILINE))

mangoes, avocadoes, and apples.
bananas, grapes, and guavas.
mangoes, lemons, and mangoes.
melons, oranges, and pineapples.
mangoes, strawberries, and watermelons.


In [12]:
print(re.sub("^apples", "mangoes", string, count = 2, flags = re.MULTILINE))

mangoes, avocadoes, and apples.
bananas, grapes, and guavas.
mangoes, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.


$\hspace{1in}$

`\A` matches only at the start of the string.

In [13]:
string = '''apples, avocadoes, and apples.
bananas, grapes, and guavas.
apples, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.'''

print(string)

apples, avocadoes, and apples.
bananas, grapes, and guavas.
apples, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.


In [14]:
re.findall(r"\Aapples", string)

['apples']

In [15]:
re.findall(r"\Aapples", string, flags = re.MULTILINE)

['apples']

In [16]:
re.search(r"\Aapples", string)

<re.Match object; span=(0, 6), match='apples'>

In [17]:
re.search(r"\Aapples", string, flags = re.MULTILINE)

<re.Match object; span=(0, 6), match='apples'>

In [18]:
re.split(r"\Aapples", string)

['',
 ', avocadoes, and apples.\nbananas, grapes, and guavas.\napples, lemons, and mangoes.\nmelons, oranges, and pineapples.\napples, strawberries, and watermelons.']

In [19]:
re.split(r"\Aapples", string, flags = re.MULTILINE)

['',
 ', avocadoes, and apples.\nbananas, grapes, and guavas.\napples, lemons, and mangoes.\nmelons, oranges, and pineapples.\napples, strawberries, and watermelons.']

In [20]:
print(re.sub(r"\Aapples", "mangoes", string))

mangoes, avocadoes, and apples.
bananas, grapes, and guavas.
apples, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.


In [21]:
print(re.sub(r"\Aapples", "mangoes", string, flags = re.MULTILINE))

mangoes, avocadoes, and apples.
bananas, grapes, and guavas.
apples, lemons, and mangoes.
melons, oranges, and pineapples.
apples, strawberries, and watermelons.


$\hspace{1in}$

`$` matches the end of the string or just before the newline at the end of the string, and in `MULTILINE` mode also matches before each newline.

In [22]:
string = '''apples, avocadoes, and apples.
bananas, grapes, and guavas.
lemons, mangoes, and apples.
melons, oranges, and pineapples.
strawberries, watermelons, and apples.'''

print(string)

apples, avocadoes, and apples.
bananas, grapes, and guavas.
lemons, mangoes, and apples.
melons, oranges, and pineapples.
strawberries, watermelons, and apples.


In [23]:
re.findall(r"apples\.$", string)

['apples.']

In [24]:
re.findall(r"apples\.$", string, flags = re.MULTILINE)

['apples.', 'apples.', 'apples.', 'apples.']

In [25]:
re.search(r"apples\.$", string)

<re.Match object; span=(153, 160), match='apples.'>

In [26]:
re.search(r"apples\.$", string, flags = re.MULTILINE)

<re.Match object; span=(23, 30), match='apples.'>

In [27]:
re.split(r"apples\.$", string)

['apples, avocadoes, and apples.\nbananas, grapes, and guavas.\nlemons, mangoes, and apples.\nmelons, oranges, and pineapples.\nstrawberries, watermelons, and ',
 '']

In [28]:
re.split(r"apples\.$", string, flags = re.MULTILINE)

['apples, avocadoes, and ',
 '\nbananas, grapes, and guavas.\nlemons, mangoes, and ',
 '\nmelons, oranges, and pine',
 '\nstrawberries, watermelons, and ',
 '']

In [29]:
re.split(r"apples\.$", string, maxsplit = 2, flags = re.MULTILINE)

['apples, avocadoes, and ',
 '\nbananas, grapes, and guavas.\nlemons, mangoes, and ',
 '\nmelons, oranges, and pineapples.\nstrawberries, watermelons, and apples.']

In [30]:
print(re.sub(r"apples\.$", "mangoes.", string))

apples, avocadoes, and apples.
bananas, grapes, and guavas.
lemons, mangoes, and apples.
melons, oranges, and pineapples.
strawberries, watermelons, and mangoes.


In [31]:
print(re.sub(r"apples\.$", "mangoes.", string, flags = re.MULTILINE))

apples, avocadoes, and mangoes.
bananas, grapes, and guavas.
lemons, mangoes, and mangoes.
melons, oranges, and pinemangoes.
strawberries, watermelons, and mangoes.


In [32]:
print(re.sub(r"apples\.$", "mangoes.", string, count = 2, flags = re.MULTILINE))

apples, avocadoes, and mangoes.
bananas, grapes, and guavas.
lemons, mangoes, and mangoes.
melons, oranges, and pineapples.
strawberries, watermelons, and apples.


$\hspace{1in}$

`\Z` matches only at the end of the string.
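
`$` and `\Z` only behave differently when the string ends with a newline: `$` can still match just before that final newline, while `\Z` cannot. A small sketch, where `text` is a made-up string rather than the fruit list used in this section:

```python
import re

text = "apples.\n"  # note the trailing newline

# `$` may match just before a final newline, so this succeeds
dollar = re.findall(r"apples\.$", text)        # ['apples.']

# `\Z` only matches at the very end, which is after the newline
end_z = re.findall(r"apples\.\Z", text)        # []

# Matching the newline explicitly lets `\Z` succeed
end_z_nl = re.findall(r"apples\.\n\Z", text)   # ['apples.\n']
```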

In [33]:
string = '''apples, avocadoes, and apples.
bananas, grapes, and guavas.
lemons, mangoes, and apples.
melons, oranges, and pineapples.
strawberries, watermelons, and apples.'''

print(string)

apples, avocadoes, and apples.
bananas, grapes, and guavas.
lemons, mangoes, and apples.
melons, oranges, and pineapples.
strawberries, watermelons, and apples.


In [34]:
re.findall(r"apples\.\Z", string)

['apples.']

In [35]:
re.findall(r"apples\.\Z", string, flags = re.MULTILINE)

['apples.']

In [36]:
re.search(r"apples\.\Z", string)

<re.Match object; span=(153, 160), match='apples.'>

In [37]:
re.search(r"apples\.\Z", string, flags = re.MULTILINE)

<re.Match object; span=(153, 160), match='apples.'>

In [38]:
re.split(r"apples\.\Z", string)

['apples, avocadoes, and apples.\nbananas, grapes, and guavas.\nlemons, mangoes, and apples.\nmelons, oranges, and pineapples.\nstrawberries, watermelons, and ',
 '']

In [39]:
re.split(r"apples\.\Z", string, flags = re.MULTILINE)

['apples, avocadoes, and apples.\nbananas, grapes, and guavas.\nlemons, mangoes, and apples.\nmelons, oranges, and pineapples.\nstrawberries, watermelons, and ',
 '']

In [40]:
print(re.sub(r"apples\.\Z", "mangoes", string))

apples, avocadoes, and apples.
bananas, grapes, and guavas.
lemons, mangoes, and apples.
melons, oranges, and pineapples.
strawberries, watermelons, and mangoes


In [41]:
print(re.sub(r"apples\.\Z", "mangoes", string, flags = re.MULTILINE))

apples, avocadoes, and apples.
bananas, grapes, and guavas.
lemons, mangoes, and apples.
melons, oranges, and pineapples.
strawberries, watermelons, and mangoes


$\hspace{1in}$

`\b` matches the empty string, but only at the beginning or end of a word.
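
The `\b` examples below use raw strings because, in an ordinary Python string literal, `\b` is the backspace character (`\x08`), so the regular expression engine never sees a word-boundary escape. A quick check on a made-up sample string:

```python
import re

sample = "es best"

# In a normal string literal "\b" is one backspace character
print(len("\b"), len(r"\b"))          # 1 2

# Without the r prefix the pattern is backspace + "es": no match
plain = re.findall("\bes", sample)    # []

# With the r prefix the engine receives the two characters \ and b
raw = re.findall(r"\bes", sample)     # ['es']
```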

In [42]:
string = "apples, best, es, estimate, and es!"

print(string)

apples, best, es, estimate, and es!


In [43]:
re.findall(r"\bes", string)

['es', 'es', 'es']

In [44]:
re.findall(r"es\b", string)

['es', 'es', 'es']

In [45]:
re.findall(r"\bes\b", string)

['es', 'es']

In [46]:
re.search(r"\bes", string)

<re.Match object; span=(14, 16), match='es'>

In [47]:
re.search(r"es\b", string)

<re.Match object; span=(4, 6), match='es'>

In [48]:
re.search(r"\bes\b", string)

<re.Match object; span=(14, 16), match='es'>

In [49]:
re.split(r"\bes", string)

['apples, best, ', ', ', 'timate, and ', '!']

In [50]:
re.split(r"es\b", string)

['appl', ', best, ', ', estimate, and ', '!']

In [51]:
re.split(r"\bes\b", string)

['apples, best, ', ', estimate, and ', '!']

In [52]:
print(re.sub(r"\bes", "?????", string))

apples, best, ?????, ?????timate, and ?????!


In [53]:
print(re.sub(r"es\b", "?????", string))

appl?????, best, ?????, estimate, and ?????!


In [54]:
print(re.sub(r"\bes\b", "?????", string))

apples, best, ?????, estimate, and ?????!


$\hspace{1in}$

`\B` matches the empty string, but only when it is not at the beginning or end of a word.

In [55]:
string = "apples, best, es, estimate, and es!"

print(string)

apples, best, es, estimate, and es!


In [56]:
re.findall(r"\Bes", string)

['es', 'es']

In [57]:
re.findall(r"es\B", string)

['es', 'es']

In [58]:
re.findall(r"\Bes\B", string)

['es']

In [59]:
re.search(r"\Bes", string)

<re.Match object; span=(4, 6), match='es'>

In [60]:
re.search(r"es\B", string)

<re.Match object; span=(9, 11), match='es'>

In [61]:
re.search(r"\Bes\B", string)

<re.Match object; span=(9, 11), match='es'>

In [62]:
re.split(r"\Bes", string)

['appl', ', b', 't, es, estimate, and es!']

In [63]:
re.split(r"es\B", string)

['apples, b', 't, es, ', 'timate, and es!']

In [64]:
re.split(r"\Bes\B", string)

['apples, b', 't, es, estimate, and es!']

In [66]:
print(re.sub(r"\Bes", "?????", string))

appl?????, b?????t, es, estimate, and es!


In [67]:
print(re.sub(r"es\B", "?????", string))

apples, b?????t, es, ?????timate, and es!


In [68]:
print(re.sub(r"\Bes\B", "?????", string))

apples, b?????t, es, estimate, and es!


Back to the list of [**section links**](#0)

$\hspace{1in}$

<a name="3"></a>
##### **Character Types**

Match specific types of characters such as decimal digits, Unicode word characters, Unicode whitespace characters, and metacharacters.

`.` matches any character except a newline.
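
The newline exception can be lifted with the `re.DOTALL` flag. A minimal sketch on a made-up two-line string:

```python
import re

text = "a\nb"

# By default `.` refuses to match the newline
default = re.findall(r"a.b", text)                    # []

# With DOTALL, `.` matches the newline as well
dotall = re.findall(r"a.b", text, flags=re.DOTALL)    # ['a\nb']
```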

In [69]:
string = "23 apples"

print(string)

23 apples


In [70]:
re.findall(".pp.", string)

['appl']

In [71]:
re.search(".pp.", string)

<re.Match object; span=(3, 7), match='appl'>

In [72]:
re.split(".pp.", string)

['23 ', 'es']

In [73]:
re.sub(".pp.", "?", string)

'23 ?es'

$\hspace{1in}$

`\d` matches a decimal digit.
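
For `str` patterns, "decimal digit" means any Unicode decimal digit, not only `0` to `9`; the `re.ASCII` flag narrows it to `[0-9]`. A sketch assuming a sample string that mixes an ASCII `3` with ARABIC-INDIC DIGIT THREE (U+0663):

```python
import re

# "3" followed by ARABIC-INDIC DIGIT THREE (U+0663)
text = "3\u0663"

# \d accepts both digits in a str pattern
unicode_digits = re.findall(r"\d", text)                 # ['3', '\u0663']

# re.ASCII restricts \d to [0-9]
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)   # ['3']
```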

In [74]:
string = "23 apples"

print(string)

23 apples


In [75]:
re.findall(r"\d", string)

['2', '3']

In [76]:
re.search(r"\d", string)

<re.Match object; span=(0, 1), match='2'>

In [77]:
re.split(r"\d", string)

['', '', ' apples']

In [78]:
re.sub(r"\d", "?", string)

'?? apples'

$\hspace{1in}$

`\D` matches a character which is not a decimal digit.

In [79]:
string = "23 apples"

print(string)

23 apples


In [80]:
re.findall(r"\D", string)

[' ', 'a', 'p', 'p', 'l', 'e', 's']

In [81]:
re.search(r"\D", string)

<re.Match object; span=(2, 3), match=' '>

In [82]:
re.split(r"\D", string)

['23', '', '', '', '', '', '', '']

In [83]:
re.sub(r"\D", "?", string)

'23???????'

$\hspace{1in}$

`\w` matches a Unicode word character. This includes all Unicode alphanumeric characters as well as the underscore `_`.
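
Because `\w` is Unicode-aware, accented letters and the underscore stay inside a word, while `re.ASCII` limits `\w` to `[a-zA-Z0-9_]`. A sketch on a made-up phrase (the `é` is written as the escape `\u00e9` so the example does not depend on file encoding):

```python
import re

text = "caf\u00e9_au lait"   # "café_au lait"

# \w+ keeps the accented letter and the underscore in one word
words = re.findall(r"\w+", text)                         # ['café_au', 'lait']

# Under re.ASCII the "é" is not a word character, so the word breaks there
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)   # ['caf', '_au', 'lait']
```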

In [84]:
string = "23 apples"

print(string)

23 apples


In [85]:
re.findall(r"\w", string)

['2', '3', 'a', 'p', 'p', 'l', 'e', 's']

In [86]:
re.search(r"\w", string)

<re.Match object; span=(0, 1), match='2'>

In [87]:
re.split(r"\w", string)

['', '', ' ', '', '', '', '', '', '']

In [88]:
re.sub(r"\w", "?", string)

'?? ??????'

$\hspace{1in}$

`\W` matches a character which is not a Unicode word character.

In [89]:
string = "23 apples"

print(string)

23 apples


In [90]:
re.findall(r"\W", string)

[' ']

In [91]:
re.search(r"\W", string)

<re.Match object; span=(2, 3), match=' '>

In [92]:
re.split(r"\W", string)

['23', 'apples']

In [93]:
re.sub(r"\W", "?", string)

'23?apples'

$\hspace{1in}$

`\s` matches a Unicode whitespace character.
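
`\s` covers tabs and newlines as well as spaces, and a single `\s` splits once per whitespace character. When the input may contain runs of whitespace, `\s+` treats each run as one separator. A sketch on a made-up string with two spaces, a tab, and a newline:

```python
import re

text = "23  apples\t45\nbananas"

# One split per whitespace character: the double space leaves an empty piece
single = re.split(r"\s", text)    # ['23', '', 'apples', '45', 'bananas']

# \s+ treats each run of whitespace as a single separator
runs = re.split(r"\s+", text)     # ['23', 'apples', '45', 'bananas']
```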

In [94]:
string = '''23 apples
45 bananas'''

print(string)

23 apples
45 bananas


In [95]:
re.findall(r"\s", string)

[' ', '\n', ' ']

In [96]:
re.search(r"\s", string)

<re.Match object; span=(2, 3), match=' '>

In [97]:
re.split(r"\s", string)

['23', 'apples', '45', 'bananas']

In [98]:
re.sub(r"\s", "?", string)

'23?apples?45?bananas'

$\hspace{1in}$

`\S` matches a character which is not a Unicode whitespace character.

In [99]:
string = '''23 apples
45 bananas'''

print(string)

23 apples
45 bananas


In [100]:
re.findall(r"\S", string)

['2',
 '3',
 'a',
 'p',
 'p',
 'l',
 'e',
 's',
 '4',
 '5',
 'b',
 'a',
 'n',
 'a',
 'n',
 'a',
 's']

In [101]:
re.search(r"\S", string)

<re.Match object; span=(0, 1), match='2'>

In [102]:
re.split(r"\S", string)

['', '', ' ', '', '', '', '', '', '\n', '', ' ', '', '', '', '', '', '', '']

In [103]:
re.sub(r"\S", "?", string)

'?? ??????\n?? ???????'

$\hspace{1in}$

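A backslash before a metacharacter, as in `\^`, matches that metacharacter literally. The metacharacters are `.`, `^`, `$`, `*`, `+`, `?`, `{`, `}`, `[`, `]`, `\`, `|`, `(`, and `)`.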
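
When a whole literal string has to be matched, `re.escape` adds the backslashes for you. A sketch on a made-up sentence, contrasting an escaped and an unescaped `2.3`:

```python
import re

text = "2^3 = 8 and 2.3 = 2.3"

# re.escape backslash-escapes the special characters in a literal string
pattern = re.escape("2.3")
print(pattern)                                # 2\.3

escaped = re.findall(pattern, text)           # ['2.3', '2.3']
caret = re.findall(re.escape("2^3"), text)    # ['2^3']

# Unescaped, `.` is a wildcard, so "2^3" matches too
loose = re.findall("2.3", text)               # ['2^3', '2.3', '2.3']
```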

In [104]:
string = "2^3 = 8"

print(string)

2^3 = 8


In [105]:
re.findall(r"\^", string)

['^']

In [106]:
re.search(r"\^", string)

<re.Match object; span=(1, 2), match='^'>

In [107]:
re.split(r"\^", string)

['2', '3 = 8']

In [108]:
re.sub(r"\^", "?", string)

'2?3 = 8'

Back to the list of [**section links**](#0)

$\hspace{1in}$

<a name="4"></a>
##### **Character Classes Indicated by `[]`**

Match a single character from a set or range of characters.

Characters listed individually inside `[]` match any one of them, so `b[ao]l` matches `bal` or `bol`.

In [109]:
string = "ball, basketball, bee, bell, bill, bird, and symbol"

print(string)

ball, basketball, bee, bell, bill, bird, and symbol


In [110]:
re.findall("b[ao]l", string)

['bal', 'bal', 'bol']

In [111]:
re.search("b[ao]l", string)

<re.Match object; span=(0, 3), match='bal'>

In [112]:
re.split("b[ao]l", string)

['', 'l, basket', 'l, bee, bell, bill, bird, and sym', '']

In [113]:
re.sub("b[ao]l", "?", string)

'?l, basket?l, bee, bell, bill, bird, and sym?'

$\hspace{1in}$

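A hyphen between two characters inside `[]` gives a range, so `b[a-o]l` matches `b`, then any one character from `a` through `o`, then `l`.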

In [114]:
string = "ball, basketball, bee, bell, bill, bird, and symbol"

print(string)

ball, basketball, bee, bell, bill, bird, and symbol


In [115]:
re.findall("b[a-o]l", string)

['bal', 'bal', 'bel', 'bil', 'bol']

In [116]:
re.search("b[a-o]l", string)

<re.Match object; span=(0, 3), match='bal'>

In [117]:
re.split("b[a-o]l", string)

['', 'l, basket', 'l, bee, ', 'l, ', 'l, bird, and sym', '']

In [118]:
re.sub("b[a-o]l", "?", string)

'?l, basket?l, bee, ?l, ?l, bird, and sym?'

$\hspace{1in}$

When `^` is the first character inside `[]`, the class is negated and matches any single character except those listed, so `b[^ao]l` matches `b`, one character other than `a` or `o`, then `l`.

In [119]:
string = "ball, basketball, bee, bell, bill, bird, and symbol"

print(string)

ball, basketball, bee, bell, bill, bird, and symbol


In [120]:
re.findall("b[^ao]l", string)

['bel', 'bil']

In [121]:
re.search("b[^ao]l", string)

<re.Match object; span=(23, 26), match='bel'>

In [122]:
re.split("b[^ao]l", string)

['ball, basketball, bee, ', 'l, ', 'l, bird, and symbol']

In [123]:
re.sub("b[^ao]l", "?", string)

'ball, basketball, bee, ?l, ?l, bird, and symbol'

$\hspace{1in}$

Metacharacters inside `[]` can be matched literally by escaping each one with a backslash, as in `[\^\.\-]`.
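
Escaping inside `[]` is often optional: within a class `.` has no special meaning, `^` is only special in the first position, and `-` is literal when it comes last. A sketch reusing this section's sentence to show that both spellings find the same matches:

```python
import re

string = "2^3 = 8, 2.3 plus 3.2 = 5.5, 2-3 is 2 followed by 3."

escaped = re.findall(r"2[\^\.\-]3", string)   # ['2^3', '2.3', '2-3']

# "." is literal in [], "^" is not first, and "-" is last
unescaped = re.findall(r"2[.^-]3", string)    # ['2^3', '2.3', '2-3']
```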

In [124]:
string = "2^3 = 8, 2.3 plus 3.2 = 5.5, 2-3 is 2 followed by 3."

print(string)

2^3 = 8, 2.3 plus 3.2 = 5.5, 2-3 is 2 followed by 3.


In [125]:
re.findall(r"2[\^\.\-]3", string)

['2^3', '2.3', '2-3']

In [126]:
re.search(r"2[\^\.\-]3", string)

<re.Match object; span=(0, 3), match='2^3'>

In [127]:
re.split(r"2[\^\.\-]3", string)

['', ' = 8, ', ' plus 3.2 = 5.5, ', ' is 2 followed by 3.']

In [128]:
re.sub(r"2[\^\.\-]3", "?", string)

'? = 8, ? plus 3.2 = 5.5, ? is 2 followed by 3.'

Back to the list of [**section links**](#0)

$\hspace{1in}$

<a name="5"></a>
##### **Repetition**

Match repeated occurrences of the preceding character or expression.

`?` matches the expression to its left zero or one time.

In [129]:
string = "alloy, balloon, ballot, boat, calories, carabao"

print(string)

alloy, balloon, ballot, boat, calories, carabao


In [130]:
re.findall("al?o", string)

['alo', 'ao']

In [131]:
re.search("al?o", string)

<re.Match object; span=(31, 34), match='alo'>

In [132]:
re.split("al?o", string)

['alloy, balloon, ballot, boat, c', 'ries, carab', '']

In [133]:
re.sub("al?o", "?", string)

'alloy, balloon, ballot, boat, c?ries, carab?'

$\hspace{1in}$

`*` matches the expression to its left zero or more times.

In [134]:
string = "alloy, balloon, ballot, boat, calories, carabao"

print(string)

alloy, balloon, ballot, boat, calories, carabao


In [135]:
re.findall("al*o", string)

['allo', 'allo', 'allo', 'alo', 'ao']

In [136]:
re.search("al*o", string)

<re.Match object; span=(0, 4), match='allo'>

In [137]:
re.split("al*o", string)

['', 'y, b', 'on, b', 't, boat, c', 'ries, carab', '']

In [138]:
re.sub("al*o", "?", string)

'?y, b?on, b?t, boat, c?ries, carab?'

$\hspace{1in}$

`+` matches the expression to its left one or more times.

In [139]:
string = "alloy, balloon, ballot, boat, calories, carabao"

print(string)

alloy, balloon, ballot, boat, calories, carabao


In [140]:
re.findall("al+o", string)

['allo', 'allo', 'allo', 'alo']

In [141]:
re.search("al+o", string)

<re.Match object; span=(0, 4), match='allo'>

In [142]:
re.split("al+o", string)

['', 'y, b', 'on, b', 't, boat, c', 'ries, carabao']

In [143]:
re.sub("al+o", "?", string)

'?y, b?on, b?t, boat, c?ries, carabao'

$\hspace{1in}$

`{m}` matches the expression to its left exactly `m` times.

In [144]:
string = "123456789, 123333456789, 1233456789, 12333456789, 12334566789"

print(string)

123456789, 123333456789, 1233456789, 12333456789, 12334566789


In [145]:
re.findall("3{2}", string)

['33', '33', '33', '33', '33']

In [146]:
re.search("3{2}", string)

<re.Match object; span=(13, 15), match='33'>

In [147]:
re.split("3{2}", string)

['123456789, 12', '', '456789, 12', '456789, 12', '3456789, 12', '4566789']

In [148]:
re.sub("3{2}", "?", string)

'123456789, 12??456789, 12?456789, 12?3456789, 12?4566789'

$\hspace{1in}$

`{m,}` matches the expression to its left `m` or more times.

In [149]:
string = "123456789, 123333456789, 1233456789, 12333456789, 12334566789"

print(string)

123456789, 123333456789, 1233456789, 12333456789, 12334566789


In [150]:
re.findall("3{2,}", string)

['3333', '33', '333', '33']

In [151]:
re.search("3{2,}", string)

<re.Match object; span=(13, 17), match='3333'>

In [152]:
re.split("3{2,}", string)

['123456789, 12', '456789, 12', '456789, 12', '456789, 12', '4566789']

In [153]:
re.sub("3{2,}", "?", string)

'123456789, 12?456789, 12?456789, 12?456789, 12?4566789'

$\hspace{1in}$

`{m,n}` matches the expression to its left at least `m` and at most `n` times, taking as many repetitions as possible.
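
The brace form generalizes the earlier quantifiers: `?` behaves like `{0,1}`, `*` like `{0,}`, and `+` like `{1,}`. A quick check on the string from the `?`, `*`, and `+` examples:

```python
import re

string = "alloy, balloon, ballot, boat, calories, carabao"

# Each shorthand quantifier next to its brace equivalent
pairs = [("al?o", "al{0,1}o"), ("al*o", "al{0,}o"), ("al+o", "al{1,}o")]

for short, braces in pairs:
    same = re.findall(short, string) == re.findall(braces, string)
    print(short, braces, same)   # prints True for each pair
```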

In [154]:
string = "123456789, 123333456789, 1233456789, 12333456789, 12334566789"

print(string)

123456789, 123333456789, 1233456789, 12333456789, 12334566789


In [155]:
re.findall("3{1,3}", string)

['3', '333', '3', '33', '333', '33']

In [156]:
re.search("3{1,3}", string)

<re.Match object; span=(2, 3), match='3'>

In [157]:
re.split("3{1,3}", string)

['12', '456789, 12', '', '456789, 12', '456789, 12', '456789, 12', '4566789']

In [158]:
re.sub("3{1,3}", "?", string)

'12?456789, 12??456789, 12?456789, 12?456789, 12?4566789'
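
These quantifiers are greedy, which is why `3{1,3}` takes `333` and leaves a single `3` for the next match. Appending `?` to a quantifier makes it lazy, so it takes as few repetitions as possible. A sketch on a made-up substring and a made-up tag string:

```python
import re

string = "123333456789"

greedy = re.findall("3{1,3}", string)   # ['333', '3']
lazy = re.findall("3{1,3}?", string)    # ['3', '3', '3', '3']

# The same contrast with * on bracketed text
tags = "<a><b>"
greedy_tags = re.findall("<.*>", tags)  # ['<a><b>']
lazy_tags = re.findall("<.*?>", tags)   # ['<a>', '<b>']
```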

Back to the list of [**section links**](#0)

$\hspace{1in}$

<a name="6"></a>
##### **Grouping and Backreferences**

Capture a specific part of a string that is matched by a regular expression.

`()` groups the regular expression inside it and captures the string that the group matches, so the capture can be referenced later.

`(?:)` groups the regular expression inside it without capturing, so the group cannot be referenced later and is skipped when groups are numbered.

`\n` references a previous capture, where `n` is the group number; groups are numbered from 1 in the order of their opening parentheses.
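
Captures can also be read back from the `Match` object and reused in a `re.sub` replacement. A sketch using a made-up phone-number string:

```python
import re

m = re.search(r"(\d+)-(\d+)", "call 555-1234 now")

print(m.group(0))   # 555-1234  (the whole match)
print(m.group(1))   # 555
print(m.groups())   # ('555', '1234')

# \1 and \2 also work inside a re.sub replacement string
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "555-1234")   # '1234-555'

# (?:call ) is not numbered, so (\d+) is still group 1
m2 = re.search(r"(?:call )(\d+)", "call 555-1234 now")
print(m2.group(1))  # 555
```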

In [159]:
stringList = ["12", "12342", "1562", "1234562342", "123456562", "123456234562"]
stringList

['12', '12342', '1562', '1234562342', '123456562', '123456234562']

In [160]:
[s for s in stringList if re.search(r"\d*(234)\d*\1", s)]

['1234562342', '123456234562']

In [161]:
[s for s in stringList if re.search(r"\d*(56)\d*\1", s)]

['123456562', '123456234562']

In [162]:
[s for s in stringList if re.search(r"\d*(234)\d*(56)\1\2", s)]

['123456234562']

In [163]:
[s for s in stringList if re.search(r"\d*(?:234)\d*(56)\1", s)]

['123456562']

$\hspace{1in}$

`(?P<name>)` creates a named group where the string matched by the group is captured and can be referenced later using the name `name`.

`(?P=name)` references a previous capture where `name` is the name of the captured group.
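
Named groups can be read by name from the `Match` object, keep their position-based numbers, and are referenced in a `re.sub` replacement with `\g<name>`. A sketch with made-up group names `area` and `line`:

```python
import re

m = re.search(r"(?P<area>\d+)-(?P<line>\d+)", "call 555-1234 now")

print(m.group("area"))   # 555
print(m.groupdict())     # {'area': '555', 'line': '1234'}

# Named groups are still numbered, so group(1) equals group("area")
print(m.group(1))        # 555

# \g<name> refers to a named group in the replacement
swapped = re.sub(r"(?P<area>\d+)-(?P<line>\d+)", r"\g<line>-\g<area>", "555-1234")
```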

In [164]:
stringList = ["12", "12342", "1562", "1234562342", "123456562", "123456234562"]
stringList

['12', '12342', '1562', '1234562342', '123456562', '123456234562']

In [165]:
[s for s in stringList if re.search(r"\d*(?P<this>234)\d*(?P=this)", s)]

['1234562342', '123456234562']

In [166]:
[s for s in stringList if re.search(r"\d*(?P<that>56)\d*(?P=that)", s)]

['123456562', '123456234562']

In [167]:
[s for s in stringList if re.search(r"\d*(?P<this>234)\d*(?P<that>56)(?P=this)(?P=that)", s)]

['123456234562']

In [168]:
[s for s in stringList if re.search(r"\d*(?P<this>234)\d*(?P<that>56)(?P=that)", s)]

['123456562']

$\hspace{1in}$

`(|)` matches either of the two regular expressions on each side of `|` inside the group, and the group captures whichever one matched.
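
Two details of `|` are easy to miss: alternatives are tried from left to right and the first one that succeeds wins, and without parentheses `|` splits the whole pattern. A sketch on made-up strings:

```python
import re

# "a" is tried first and succeeds, so "ab" is never considered
first = re.search(r"a|ab", "ab").group()     # 'a'
longer = re.search(r"ab|a", "ab").group()    # 'ab'

# Without a group this means "^cat" OR "dog$"
loose = re.findall(r"^cat|dog$", "hotdog")         # ['dog']

# With a group, both anchors apply to every alternative
strict = re.findall(r"^(?:cat|dog)$", "hotdog")    # []
```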

In [169]:
stringList = ["12", "12342", "1562", "1234562342", "123456562", "123456234562"]
stringList

['12', '12342', '1562', '1234562342', '123456562', '123456234562']

In [170]:
[s for s in stringList if re.search(r"\d*(234|56)\d*", s)]

['12342', '1562', '1234562342', '123456562', '123456234562']

In [171]:
[s for s in stringList if re.search(r"\d*(234|56)\d*\1", s)]

['1234562342', '123456562', '123456234562']

$\hspace{1in}$

`(?i:)` ignores case when matching the regular expression inside the group; the rest of the pattern stays case-sensitive.
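
The scoped form differs from the `re.IGNORECASE` flag, which applies to the whole pattern. A sketch on two made-up spellings of "pineapple":

```python
import re

words = ["PINEapple", "PINEAPPLE"]

# Only "pine" ignores case; "apple" must be lowercase
scoped = [s for s in words if re.search(r"(?i:pine)apple", s)]
# ['PINEapple']

# The flag makes the entire pattern case-insensitive
whole = [s for s in words if re.search(r"pineapple", s, flags=re.IGNORECASE)]
# ['PINEapple', 'PINEAPPLE']
```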

In [172]:
stringList = ["apple", "aPpLe", "banana", "mango", "pineapple", "pineApPlE"]
stringList

['apple', 'aPpLe', 'banana', 'mango', 'pineapple', 'pineApPlE']

In [173]:
[s for s in stringList if re.search("(apple)", s)]

['apple', 'pineapple']

In [174]:
[s for s in stringList if re.search("(?i:apple)", s)]

['apple', 'aPpLe', 'pineapple', 'pineApPlE']

Back to the list of [**section links**](#0)

$\hspace{1in}$

<a name="7"></a>
##### **Lookahead**

Require that specific characters appear, or do not appear, immediately after or before a position, without including those characters in the match.

`(?=)` is a positive lookahead: it requires the characters that follow to match the expression inside, without consuming them as part of the match.
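
Because the lookahead text is not consumed, it is checked but left out of the result. A sketch on a made-up sentence:

```python
import re

text = "5 dollars and 7 euros"

# The number must be followed by " dollars", which is not part of the match
amounts = re.findall(r"\d+(?= dollars)", text)   # ['5']

m = re.search(r"\d+(?= dollars)", text)
print(m.span())   # (0, 1): the match ends before the lookahead text
```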

In [175]:
stringList = ["angle", "mango", "mangrove", "orange", "wrangler"]
stringList

['angle', 'mango', 'mangrove', 'orange', 'wrangler']

In [176]:
[s for s in stringList if re.search(r"\w*ang(?=l)\w*", s)]

['angle', 'wrangler']

$\hspace{1in}$

`(?!)` is a negative lookahead: it requires that the characters that follow do not match the expression inside.
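
A negative lookahead after a quantifier can be satisfied by backtracking: the engine may give up characters until the lookahead no longer sees the forbidden text. A sketch on a made-up amount:

```python
import re

text = "50 dollars"

# \d+ backtracks to "5", which is followed by "0", not " dollars"
naive = re.findall(r"\d+(?! dollars)", text)      # ['5']

# Also forbidding a following digit rules out the partial number
fixed = re.findall(r"\d+(?!\d| dollars)", text)   # []
```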

In [177]:
stringList = ["angle", "mango", "mangrove", "orange", "wrangler"]
stringList

['angle', 'mango', 'mangrove', 'orange', 'wrangler']

In [178]:
[s for s in stringList if re.search(r"\w*ang(?!l)\w*", s)]

['mango', 'mangrove', 'orange']

$\hspace{1in}$

`(?<=)` is a positive lookbehind: it requires the characters just before the current position to match the expression inside, without including them in the match.
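
A sketch on a made-up price string. It also shows a Python restriction: the expression inside a lookbehind must match strings of a fixed length, so a pattern such as `(?<=a+)` is rejected with `re.error` when compiled.

```python
import re

text = "cost $30, tax 4"

# Digits right after a dollar sign; the "$" itself is not matched
prices = re.findall(r"(?<=\$)\d+", text)   # ['30']

# A variable-width lookbehind is rejected at compile time
rejected = False
try:
    re.compile(r"(?<=a+)b")
except re.error as exc:
    rejected = True
    print("re.error:", exc)
```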

In [179]:
stringList = ["angle", "mango", "mangrove", "orange", "wrangler"]
stringList

['angle', 'mango', 'mangrove', 'orange', 'wrangler']

In [180]:
[s for s in stringList if re.search(r"\w*(?<=m)ang\w*", s)]

['mango', 'mangrove']

$\hspace{1in}$

`(?<!)` is a negative lookbehind: it requires that the characters just before the current position do not match the expression inside.
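
A negative lookbehind only inspects the character before where a match starts, so a match can begin in the middle of a longer token. A sketch on a made-up list of numbers, where the goal (an assumption for illustration) is to skip negative numbers entirely:

```python
import re

text = "3 -45 6"

# Checking only for "-" lets a match start at the "5" inside "-45"
naive = re.findall(r"(?<!-)\d+", text)       # ['3', '5', '6']

# Also rejecting a preceding digit keeps "-45" out completely
fixed = re.findall(r"(?<![-\d])\d+", text)   # ['3', '6']
```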

In [181]:
stringList = ["angle", "mango", "mangrove", "orange", "wrangler"]
stringList

['angle', 'mango', 'mangrove', 'orange', 'wrangler']

In [182]:
[s for s in stringList if re.search(r"\w*(?<!m)ang\w*", s)]

['angle', 'orange', 'wrangler']

Back to the list of [**section links**](#0)