<h2> Regular Expression Using Python </h2>

In [1]:
import re

In [2]:
# the r prefix makes this a "raw string": backslashes are kept as literal characters instead of being treated as escape sequences (in a regular string, "\n" would become a newline)
path = r"E:\Documents\RegEx" 


In [3]:
string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."
s = r"\d{3}"  # matches three consecutive digits
type(s)

str

In [4]:
t = re.compile(s)
type(t)

re.Pattern

The `re.compile()` function compiles a pattern string into a regular expression object (`re.Pattern`), which can then be reused for many searches.
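A small sketch (the sample sentence is a made-up, shortened version of `string`) showing that a compiled pattern exposes the same methods as the module-level functions and gives the same result:

```python
import re

text = "The Euro STOXX 600 index fell in 1998."

# Compile once; the Pattern object stores the pattern and can be reused
three_digits = re.compile(r"\d{3}")

# Pattern methods take only the string (the pattern is already built in)
from_pattern = three_digits.findall(text)
# The module-level function compiles (and caches) the pattern internally
from_module = re.findall(r"\d{3}", text)

print(three_digits.pattern)         # \d{3}
print(from_pattern)                 # ['600', '199']
print(from_pattern == from_module)  # True
```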

In [5]:
result = re.findall(t, string)
print(result)

['600', '199', '600']


<b>The <i>`re.findall()`</i> function of the re (regular expression) module scans a string for a pattern and returns all non-overlapping matches as a list of strings. Like the other module-level functions (`re.search()`, `re.match()`, `re.sub()`, ...), it accepts either a pattern string or a compiled pattern, so we can call <i>`re.findall`</i> directly without first compiling the pattern with <i>`re.compile`</i>.</b>
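The shape of the list returned by `findall` depends on how many capturing groups the pattern has. A short sketch on a made-up sentence:

```python
import re

text = "STOXX 600 fell 11.48% on 19 February"

# No groups: a list of the full matches
no_groups = re.findall(r"\d+", text)
# One group: a list containing only that group's text
one_group = re.findall(r"(\d+)%", text)
# Two or more groups: a list of tuples, one item per group
two_groups = re.findall(r"(\d+)\.(\d+)", text)

print(no_groups)   # ['600', '11', '48', '19']
print(one_group)   # ['48']
print(two_groups)  # [('11', '48')]
```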

In [6]:
result = re.search(r"\d{3}", string)
result

<re.Match object; span=(15, 18), match='600'>

<b>The `re.search()` function scans the string and returns a `re.Match` object for the first location where the pattern matches, or `None` if there is no match. Because it stops after the first match, it is best suited for testing whether a pattern occurs rather than for extracting every occurrence.</b>
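Since `re.search()` can return `None`, it is common to check the result before using it. The sketch below (sample text made up) also uses `re.finditer`, which yields a match object for every occurrence:

```python
import re

text = "The Euro STOXX 600 index"

match = re.search(r"\d{3}", text)
if match:  # a Match object is truthy, None is falsy
    print(match.group(), match.span())  # 600 (15, 18)

missing = re.search(r"\d{5}", text)
print(missing)  # None

# re.finditer yields a Match object for every occurrence, not just the first
digit_spans = [m.span() for m in re.finditer(r"\d", text)]
print(digit_spans)  # [(15, 16), (16, 17), (17, 18)]
```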

In [7]:
result = re.match(r"\w{3}", string)
result

<re.Match object; span=(0, 3), match='The'>

<b>`re.match()` only checks for a match at the beginning of the string. If the pattern does not match there, it returns `None`, even if the pattern occurs later in the string. This holds even with the `re.MULTILINE` flag: `re.match()` does not match at the start of the second or later lines.</b>
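A minimal sketch of the difference between `re.match()` and `re.search()` on a two-line string (the string is made up):

```python
import re

text = "Hello\nPython"

# re.match only tries position 0, even with re.MULTILINE
at_start = re.match(r"Python", text, re.M)
print(at_start)  # None

# re.search with ^ and re.MULTILINE finds "Python" at the start of line 2
on_line_two = re.search(r"^Python", text, re.M)
print(on_line_two.group(), on_line_two.start())  # Python 6
```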

In [8]:
# .{285} matches exactly 285 characters (any character except a newline '\n'); fullmatch succeeds only because the whole string is exactly 285 characters long
result = re.fullmatch(r".{285}", string) 

result

<re.Match object; span=(0, 285), match='The Euro STOXX 600 index, which tracks all stock >

<b><u>Syntax:</u> <i>`re.fullmatch(pattern, string, flags=0)`</i><br>
`re.fullmatch()` returns a match object if and only if the <i>entire string</i> matches the pattern; otherwise it returns `None`. The optional `flags` argument modifies matching, for example `re.I` to ignore case.</b>
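A short sketch contrasting `re.fullmatch()` with `re.match()` on made-up inputs, plus the optional flag:

```python
import re

whole = re.fullmatch(r"\d{3}", "600")
too_long = re.fullmatch(r"\d{3}", "6000")
prefix = re.match(r"\d{3}", "6000")  # match only anchors at the start
ignore_case = re.fullmatch(r"stoxx", "STOXX", re.I)

print(whole.group())        # 600
print(too_long)             # None
print(prefix.group())       # 600
print(ignore_case.group())  # STOXX
```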

In [9]:
# this command finds runs of exactly three digits (note that '199' is taken from '1998')
result = re.findall(r"\d{3}", string)  

print(result)

['600', '199', '600']


The <b>`findall()`</b> method is used to find all non-overlapping matches of a pattern in a string. The string is scanned left-to-right, and matches are returned in the order found.

In [10]:
result = string.split(" ")
print(result)

['The', 'Euro', 'STOXX', '600', 'index,', 'which', 'tracks', 'all', 'stock', 'markets', 'across', 'Europe', 'including', 'the', 'FTSE,', 'fell', 'by', '11.48%', '–', 'the', 'worst', 'day', 'since', 'it', 'launched', 'in', '1998.', 'The', 'panic', 'selling', 'prompted', 'by', 'the', 'coronavirus', 'has', 'wiped', '£2.7tn', 'off', 'the', 'value', 'of', 'STOXX', '600', 'shares', 'since', 'its', 'all-time', 'peak', 'on', '19', 'February.']


In [11]:
# \s matches a single whitespace character, such as a space, tab or newline
result = re.split(r"\s", string) 

print(result)

['The', 'Euro', 'STOXX', '600', 'index,', 'which', 'tracks', 'all', 'stock', 'markets', 'across', 'Europe', 'including', 'the', 'FTSE,', 'fell', 'by', '11.48%', '–', 'the', 'worst', 'day', 'since', 'it', 'launched', 'in', '1998.', 'The', 'panic', 'selling', 'prompted', 'by', 'the', 'coronavirus', 'has', 'wiped', '£2.7tn', 'off', 'the', 'value', 'of', 'STOXX', '600', 'shares', 'since', 'its', 'all-time', 'peak', 'on', '19', 'February.']


The `re.split()` function splits a string at every occurrence of a regular expression pattern. It takes the pattern and the string to be split, plus the optional arguments `maxsplit` (the maximum number of splits; 0, the default, means no limit) and `flags`.
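A small sketch on a made-up string showing how `\s` and `\s+` differ, how `maxsplit` limits the result, and that a capturing group keeps the separators:

```python
import re

text = "STOXX  600,\tFTSE"   # two spaces, then a comma and a tab

single = re.split(r"\s", text)              # every whitespace char is a separator
runs = re.split(r"\s+", text)               # a run of whitespace counts as one
limited = re.split(r"\s+", text, maxsplit=1)
kept = re.split(r"(,)", "a,b,c")            # a capturing group keeps separators

print(single)   # ['STOXX', '', '600,', 'FTSE']
print(runs)     # ['STOXX', '600,', 'FTSE']
print(limited)  # ['STOXX', '600,\tFTSE']
print(kept)     # ['a', ',', 'b', ',', 'c']
```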

In [12]:
# here we replace every run of two or more consecutive capital letters (e.g. 'STOXX', 'FTSE') with the word INDEX
result = re.sub(r"[A-Z]{2,}", "INDEX", string)

result

'The Euro INDEX 600 index, which tracks all stock markets across Europe including the INDEX, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of INDEX 600 shares since its all-time peak on 19 February.'

In [13]:
# same as above, but only the first 2 occurrences are replaced
result = re.sub(r"[A-Z]{2,}", "INDEX", string, count=2)

result

'The Euro INDEX 600 index, which tracks all stock markets across Europe including the INDEX, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February.'

The `re.sub()` function replaces every occurrence of a regular expression pattern in a string. It takes three required arguments: <i>the regular expression pattern, the replacement string, and the string to be searched</i>.

The full syntax of the re.sub() function is as follows:<br>
<b>re.sub(pattern, repl, string, count=0, flags=0)</b>
<br><br>
<i><b>pattern:</b> The regular expression pattern to be searched for.<br>
<b>repl:</b> The replacement: either a string (which may contain backreferences such as `\1`) or a function that receives each match object and returns the replacement string.<br>
<b>string:</b> The string to be searched.<br>
<b>count:</b> The maximum number of replacements to make. If count is 0 (the default), all occurrences of the pattern are replaced.<br>
<b>flags:</b> A set of flags that modify the behavior of the regular expression pattern.</i><br>
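A sketch of the two forms of `repl` and of passing `count` as a keyword, on a made-up sentence:

```python
import re

text = "STOXX fell 11.48% and FTSE fell 3%"

# Backreference: \1 inserts the text captured by group 1
with_backref = re.sub(r"([\d.]+)%", r"\1 percent", text)

# repl can also be a function that receives each Match object
with_function = re.sub(r"[A-Z]{2,}", lambda m: m.group().title(), text)

# count (and flags) are clearest when passed as keyword arguments
first_only = re.sub(r"fell", "rose", text, count=1)

print(with_backref)   # STOXX fell 11.48 percent and FTSE fell 3 percent
print(with_function)  # Stoxx fell 11.48% and Ftse fell 3%
print(first_only)     # STOXX rose 11.48% and FTSE fell 3%
```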

In [14]:
result = re.subn(r"[A-Z]{2,}", "INDEX", string)

result

('The Euro INDEX 600 index, which tracks all stock markets across Europe including the INDEX, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of INDEX 600 shares since its all-time peak on 19 February.',
 3)

The `re.subn()` function performs the same substitution as `re.sub()`, but returns a <i><b>tuple</b></i> of two elements: the new string and the number of replacements that were made.
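Because `re.subn()` returns a tuple, its result is usually unpacked into two names. A tiny sketch with a made-up sentence:

```python
import re

text = "STOXX and FTSE, then STOXX again"

new_text, n = re.subn(r"[A-Z]{2,}", "INDEX", text)

print(new_text)  # INDEX and INDEX, then INDEX again
print(n)         # 3
```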


In [15]:
# here we capture two parts of the string: a word ending in 'ex', and a two-digit number followed by a space and the rest of the text (excluding the final character)
result = re.search(r".+\s(.+ex).+(\d\d\s.+).", string)
result.groups()

('index', '19 February')

The `groups()` method of a match object returns a tuple containing the text matched by every capturing group. A capturing group is a part of the regular expression pattern enclosed in parentheses, and each string in the tuple is the text that matched the corresponding group.<br>

Here is a breakdown of the pattern:<br><br>

`.+` : Matches any character except a newline, one or more times (greedy).<br>
`\s` : Matches a whitespace character.<br>
`(.+ex)` : First capturing group: one or more characters followed by the literal letters "ex" (here, "index").<br>
`.+` : Matches everything in between.<br>
`(\d\d\s.+)` : Second capturing group: two digits (`\d\d`), a whitespace character (`\s`), then one or more characters (here, "19 February").<br>
`.` : Matches the final character (the full stop), which is left outside the group.<br>
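A sketch (made-up input) showing `group(0)` and what `groups()` returns when an optional group does not take part in the match:

```python
import re

# (%)? is optional, so it may not participate in the match
m = re.search(r"(\d+)(%)?", "fell by 11 points")

print(m.group(0))    # 11  (group 0 is the whole match)
print(m.groups())    # ('11', None)
print(m.groups(""))  # ('11', '')  the argument replaces None for unmatched groups
```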

In [16]:
# group numbering starts at 1: group(1) is the first capturing group, group(2) the second, and so on (group(0) is the whole match)
result.group(1)

'index'

In [17]:
result.group(2)

'19 February'

In [18]:
result = re.findall(r".+\s(.+ex).+(\d\d\s.+).", string)
result

[('index', '19 February')]

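So `findall` gives the same captured text, but because the pattern contains groups it returns a list of tuples (one tuple per match) instead of a match object, so methods such as `group()` and `groups()` are not available on the result.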

In [19]:
result = re.findall(r"\s(\d\d\s.+).", string)
result

['19 February']

In [20]:
# a shorter version of the pattern from In [15]: \w+ex matches the word ending in 'ex' directly
result = re.search(r"(\w+ex).+(\d\d\s.+).", string)

result.groups()

('index', '19 February')

In [21]:
result.start(1)

19

In [22]:
result.start(2)

273

The `start()` method of a match object returns the index in the input string where a given capturing group's match begins (`start(1)` for group 1, and so on). This is useful when you need to know where a particular group's match is located within the input string.
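A sketch on a made-up sentence showing `start()` together with the related `end()` and `span()` methods, and using the indices to slice the group back out:

```python
import re

text = "The index fell on 19 February."
m = re.search(r"(\w+ex).+(\d\d\s\w+)", text)

print(m.start(1), m.end(1))        # 4 9
print(m.span(2))                   # (18, 29)
# start/end can slice the original string back out
print(text[m.start(2):m.end(2)])   # 19 February
```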

In [23]:
result = re.findall(r"the", string, re.I)
result

['The', 'the', 'the', 'The', 'the', 'the']

`re.I (re.IGNORECASE):`<br>

This flag makes the regular expression matching case-insensitive. It allows the regex pattern to match both uppercase and lowercase letters without distinction.
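Flags can be combined with the `|` operator, or written inline at the start of the pattern. A small sketch on a made-up two-line string:

```python
import re

text = "The index\nthe end"

# Flags can be combined with the | operator
combined = re.findall(r"^the", text, re.I | re.M)
# ...or written inline at the start of the pattern: (?i) = re.I, (?m) = re.M
inline = re.findall(r"(?im)^the", text)

print(combined)  # ['The', 'the']
print(inline)    # ['The', 'the']
```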

In [24]:
string2 = "Hello \n Python"
result = re.search(r".+", string2)
result

<re.Match object; span=(0, 6), match='Hello '>

In [25]:
string2 = "Hello \n Python"
result = re.search(r".+", string2, re.S)
result

<re.Match object; span=(0, 14), match='Hello \n Python'>

`re.S (re.DOTALL):`<br>

By default, the `.` metacharacter in regular expressions matches any character except a newline `(\n)`. However, when you use the `re.S` flag, it allows the `.` to match newline characters as well.

In [26]:
result = re.search(r""".+\s          # any text followed by a whitespace character
                   (.+ex)          # group 1: text ending in 'ex' (index)
                   .+(\d\d\s.+).   # group 2: the date at the end, before the final full stop""", string, re.X)
result.groups()

('index', '19 February')

`re.X (re.VERBOSE):`<br>

The `re.X` flag lets you write regular expressions in a more readable, organized way: whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and `#` starts a comment that runs to the end of the line.
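A sketch of a commented verbose pattern for a percentage (the sample text is made up), plus the common pitfall that a literal space has to be escaped in verbose mode:

```python
import re

pct = re.compile(r"""
    (\d+)   # integer part
    \.      # a literal decimal point
    (\d+)   # fractional part
    %       # the percent sign
""", re.X)
parts = pct.search("fell by 11.48% on the day").groups()
print(parts)  # ('11', '48')

# Whitespace in a verbose pattern is ignored, so a literal space must be escaped
unescaped = re.fullmatch(r"a b", "a b", re.X)   # the pattern is really "ab"
escaped = re.fullmatch(r"a\ b", "a b", re.X)
print(unescaped)            # None
print(escaped is not None)  # True
```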

In [27]:
result = re.search(r"(.+)", string)
result.group()

'The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February.'

`r"(.+)"`: This is the regular expression pattern we are searching for within the string. <br>
<br>
`.`: Matches any character except a newline character.<br>
`+`: Matches <b>one or more</b> of the preceding element. So, `.+` matches one or more of any character.<br>
`(` and `)`: Parentheses are used to create a capturing group. In this case, `(.+)` captures one or more of any character.

In [28]:
result = re.search(r"^\w{3}", string)
result

<re.Match object; span=(0, 3), match='The'>

In [29]:
string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998.\nThe panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."

In [30]:
print(string)

The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998.
The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February.


In [31]:
# here we search for three word characters at the beginning of each line
result = re.findall(r"^\w{3}", string, re.M)
result

['The', 'The']

`^`  is called a metacharacter and serves as an anchor that specifies the start of a line or string. The <b>caret</b> `(^)` is used to match a pattern only if it appears at the beginning of a line or the start of the entire input string, depending on the context in which it is used.<br><br>
`re.M (or re.MULTILINE)`: This flag makes the `^` anchor match at the <i>start of each line within the input string</i>, rather than only at the start of the entire string (and likewise makes `$` match at the end of each line). It makes the regex pattern operate in "multi-line" mode.

In [32]:
# here we search for a word of at least 2 word characters (letters, digits or underscore), preceded by whitespace and followed by one non-word character at the end of each line
result = re.findall(r"\s(\w{2,})\W$", string, re.M)
result

['1998', 'February']

`r"\s(\w{2,})\W$"`: This is the regular expression pattern that we are searching for:<br><br>

`\s`: Matches a whitespace character (space, tab, newline, etc.).<br>

`(\w{2,})`: This is a capturing group that matches two or more word characters. The `\w` matches word characters, and `{2,}` specifies that there should be at least two or more consecutive word characters. The parentheses `( )` indicate a capturing group, which allows you to extract the matched word.

`\W`: Matches a non-word character (anything that is not a word character).

`$`: Anchors the pattern to the end of a line.

In [33]:
# here we find runs of at least two digits: two digits followed by zero or more further digits
result = re.findall(r"\d\d\d*", string)
result   

['600', '11', '48', '1998', '600', '19']

In [34]:
result = re.findall(r"E.* ", string)
result

['Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in ']

`r"E.* "`: This is the regular expression pattern that we are searching for:

`E`: Matches the character 'E' literally.

`.*`: Matches zero or more of any character except a newline (`.` matches any character, and `*` matches zero or more of the preceding element). Because `*` is greedy, the match extends to the last space on the line.

`" "`: Matches a space character.

In [35]:
result = re.findall(r"E\w*", string)
result

['Euro', 'Europe', 'E']

 The asterisk `(*)` is a metacharacter that has a special meaning. It is used to specify that the preceding element (character or group) can occur zero or more times. In other words, it quantifies the preceding element as <b>"zero or more occurrences."</b>

In [36]:
# here we search for two digits, optionally followed by a third
result = re.findall(r"\d\d\d?", string)
result

['600', '11', '48', '199', '600', '19']

`?` is a metacharacter in regular expressions with a special meaning. <br> It specifies that the <b>preceding element</b> in the regex pattern is <b>optional</b>, meaning it can occur <b><i>zero times or one time (at most)</i></b>.

In [37]:
result = re.findall(r"E.? ", string)
result

['E, ']

In [38]:
result = re.findall(r"E\w?", string)
result

['Eu', 'Eu', 'E']

In [39]:
result = re.findall(r"\d\d\d*", string)
result

['600', '11', '48', '1998', '600', '19']

In [40]:
result = re.findall(r"\d\d\d*?", string)
result

['60', '11', '48', '19', '98', '60', '19']

Comparing the two examples above, the first is `greedy` and the second is `non-greedy` (lazy).<br> In the second one, `*?` makes the quantifier match as few repetitions as possible, so `\d\d\d*?` stops after two digits.
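The difference is easiest to see when the same delimiter appears several times. A sketch on a made-up HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs as far as possible, then backtracks to the last '>'
greedy = re.findall(r"<.+>", html)
# Lazy: .+? stops at the first '>' that lets the match succeed
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```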

In [41]:
result = re.findall(r"\d\d\d+", string)
result

['600', '1998', '600']

In [42]:
# this is also a non-greedy example, where we are getting the minimum digits possible
result = re.findall(r"\d\d\d+?", string)
result

['600', '199', '600']

In [43]:
result = re.findall(r"[wxkq]", string)
result

['x', 'w', 'k', 'k', 'k', 'w', 'w', 'k']

Here we use square brackets `[]` containing the characters we want to search for. A character class matches any <i>one</i> of the listed characters, so it acts like an `OR`: look for `w` OR `x` OR `k` OR `q`.<br>The result is a list of the individual matching characters.

In [44]:
result = re.findall(r"[a-q]", string)
result

['h',
 'e',
 'o',
 'i',
 'n',
 'd',
 'e',
 'h',
 'i',
 'c',
 'h',
 'a',
 'c',
 'k',
 'a',
 'l',
 'l',
 'o',
 'c',
 'k',
 'm',
 'a',
 'k',
 'e',
 'a',
 'c',
 'o',
 'o',
 'p',
 'e',
 'i',
 'n',
 'c',
 'l',
 'd',
 'i',
 'n',
 'g',
 'h',
 'e',
 'f',
 'e',
 'l',
 'l',
 'b',
 'h',
 'e',
 'o',
 'd',
 'a',
 'i',
 'n',
 'c',
 'e',
 'i',
 'l',
 'a',
 'n',
 'c',
 'h',
 'e',
 'd',
 'i',
 'n',
 'h',
 'e',
 'p',
 'a',
 'n',
 'i',
 'c',
 'e',
 'l',
 'l',
 'i',
 'n',
 'g',
 'p',
 'o',
 'm',
 'p',
 'e',
 'd',
 'b',
 'h',
 'e',
 'c',
 'o',
 'o',
 'n',
 'a',
 'i',
 'h',
 'a',
 'i',
 'p',
 'e',
 'd',
 'n',
 'o',
 'f',
 'f',
 'h',
 'e',
 'a',
 'l',
 'e',
 'o',
 'f',
 'h',
 'a',
 'e',
 'i',
 'n',
 'c',
 'e',
 'i',
 'a',
 'l',
 'l',
 'i',
 'm',
 'e',
 'p',
 'e',
 'a',
 'k',
 'o',
 'n',
 'e',
 'b',
 'a']

In [45]:
result = re.findall(r"[1-5]", string)
result

['1', '1', '4', '1', '2', '1']

In [46]:
# we are looking for two consecutive letters: the 1st in the range a-f, the 2nd in the range c-w
result = re.findall(r"[a-f][c-w]", string)
result

['de',
 'ch',
 'ac',
 'al',
 'ck',
 'ar',
 'et',
 'ac',
 'cl',
 'di',
 'fe',
 'ce',
 'au',
 'ch',
 'ed',
 'an',
 'el',
 'ed',
 'co',
 'av',
 'as',
 'ed',
 'ff',
 'al',
 'ar',
 'es',
 'ce',
 'al',
 'ak',
 'br',
 'ar']

In [47]:
# here we search for every character in the string except X (a ^ at the start of a character class negates it)
result = re.findall(r"[^X]", string)
result

['T',
 'h',
 'e',
 ' ',
 'E',
 'u',
 'r',
 'o',
 ' ',
 'S',
 'T',
 'O',
 ' ',
 '6',
 '0',
 '0',
 ' ',
 'i',
 'n',
 'd',
 'e',
 'x',
 ',',
 ' ',
 'w',
 'h',
 'i',
 'c',
 'h',
 ' ',
 't',
 'r',
 'a',
 'c',
 'k',
 's',
 ' ',
 'a',
 'l',
 'l',
 ' ',
 's',
 't',
 'o',
 'c',
 'k',
 ' ',
 'm',
 'a',
 'r',
 'k',
 'e',
 't',
 's',
 ' ',
 'a',
 'c',
 'r',
 'o',
 's',
 's',
 ' ',
 'E',
 'u',
 'r',
 'o',
 'p',
 'e',
 ' ',
 'i',
 'n',
 'c',
 'l',
 'u',
 'd',
 'i',
 'n',
 'g',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'F',
 'T',
 'S',
 'E',
 ',',
 ' ',
 'f',
 'e',
 'l',
 'l',
 ' ',
 'b',
 'y',
 ' ',
 '1',
 '1',
 '.',
 '4',
 '8',
 '%',
 ' ',
 '–',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'w',
 'o',
 'r',
 's',
 't',
 ' ',
 'd',
 'a',
 'y',
 ' ',
 's',
 'i',
 'n',
 'c',
 'e',
 ' ',
 'i',
 't',
 ' ',
 'l',
 'a',
 'u',
 'n',
 'c',
 'h',
 'e',
 'd',
 ' ',
 'i',
 'n',
 ' ',
 '1',
 '9',
 '9',
 '8',
 '.',
 '\n',
 'T',
 'h',
 'e',
 ' ',
 'p',
 'a',
 'n',
 'i',
 'c',
 ' ',
 's',
 'e',
 'l',
 'l',
 'i',
 'n',
 'g',
 ' ',
 'p',
 'r',
 ...]

In [48]:
result = re.findall(r"[(.+?)]", string)
result

['.', '.', '.', '.']

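<b>Inside square brackets, most special characters lose their special meaning and are matched literally, so `[(.+?)]` simply matches any one of the characters `(`, `.`, `+`, `?` or `)`.</b>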
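A few characters do keep a role inside brackets: `-` forms a range between two characters, and `^` negates the class only when it comes first. A sketch on made-up strings:

```python
import re

text = "1.5+2-3 (x)"

# . + ( ) are ordinary characters inside a class
literal_specials = re.findall(r"[.+()]", text)
# '-' would form a range, so escape it (or put it first or last)
plus_minus = re.findall(r"[+\-]", text)
# '^' is literal when it is not the first character in the class
caret_literal = re.findall(r"[\d^]", "2^3")

print(literal_specials)  # ['.', '+', '(', ')']
print(plus_minus)        # ['+', '-']
print(caret_literal)     # ['2', '^', '3']
```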

In [49]:
# searching for words of exactly 4 word characters, delimited by word boundaries
result = re.findall(r"\b\w{4}\b", string)
result

['Euro', 'FTSE', 'fell', '1998', 'time', 'peak']

The `\b` metacharacter is a word boundary assertion: it matches the empty position between a word character and a non-word character (or the beginning/end of the string). Placing it on both sides ensures that the matched words are exactly four characters long, rather than four-character pieces of longer words.
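A sketch on a made-up string contrasting `\b` with its opposite `\B` (not at a word boundary). It also shows why the `r` prefix matters here: in a regular Python string, `"\b"` is a backspace character.

```python
import re

text = "index ex expand"

at_boundary = re.findall(r"\bex", text)  # 'ex' and the start of 'expand'
inside_word = re.findall(r"\Bex", text)  # the 'ex' inside 'index'
# Without the r prefix, "\b" is a backspace character, so nothing matches
not_raw = re.findall("\bex", text)

print(at_boundary)  # ['ex', 'ex']
print(inside_word)  # ['ex']
print(not_raw)      # []
```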

In [50]:
# here we search for words that are 3 to 5 word characters long
result = re.findall(r"\b\w{3,5}\b", string)
result

['The',
 'Euro',
 'STOXX',
 '600',
 'index',
 'which',
 'all',
 'stock',
 'the',
 'FTSE',
 'fell',
 'the',
 'worst',
 'day',
 'since',
 '1998',
 'The',
 'panic',
 'the',
 'has',
 'wiped',
 '7tn',
 'off',
 'the',
 'value',
 'STOXX',
 '600',
 'since',
 'its',
 'all',
 'time',
 'peak']

In [51]:
result = re.findall(r"\b\w{3,}\b", string)
result

['The',
 'Euro',
 'STOXX',
 '600',
 'index',
 'which',
 'tracks',
 'all',
 'stock',
 'markets',
 'across',
 'Europe',
 'including',
 'the',
 'FTSE',
 'fell',
 'the',
 'worst',
 'day',
 'since',
 'launched',
 '1998',
 'The',
 'panic',
 'selling',
 'prompted',
 'the',
 'coronavirus',
 'has',
 'wiped',
 '7tn',
 'off',
 'the',
 'value',
 'STOXX',
 '600',
 'shares',
 'since',
 'its',
 'all',
 'time',
 'peak',
 'February']


`Curly braces {}` in regular expressions serve as quantifiers: they <i>specify how many times the preceding character or group must occur</i>. `{n}` means exactly n times, `{m,n}` between m and n times, and `{m,}` at least m times.<br> They give precise control over repetition in a regex pattern.
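A quick sketch on a made-up string of numbers comparing the three forms of the quantifier:

```python
import re

text = "1 12 123 1234"

exact = re.findall(r"\d{2}", text)      # exactly 2 digits
ranged = re.findall(r"\d{2,3}", text)   # 2 to 3 digits (greedy, so 3 when possible)
at_least = re.findall(r"\d{2,}", text)  # 2 or more digits

print(exact)     # ['12', '12', '12', '34']
print(ranged)    # ['12', '123', '123']
print(at_least)  # ['12', '123', '1234']
```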

In [52]:
# here we search for a 3-digit number OR a 4-digit number OR a 4-letter uppercase word; re.search returns only the first match
result = re.search(r"\d{3}|\d{4}|\b[A-Z]{4}\b", string)
result

<re.Match object; span=(15, 18), match='600'>

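The `vertical bar | (pipe)` serves as an `OR` operator. It lets you specify multiple alternative patterns, and the regular expression engine matches any one of them. The alternatives are tried from left to right, and the first one that succeeds at a given position wins; in the pattern above, `\d{4}` can therefore never match, because wherever four digits appear, `\d{3}` already matches their first three.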
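A sketch on made-up strings showing that the order of alternatives matters, and how a group limits the scope of `|` (`(?:...)` is a non-capturing group, so `findall` still returns the whole match):

```python
import re

three_first = re.findall(r"\d{3}|\d{4}", "1998")
four_first = re.findall(r"\d{4}|\d{3}", "1998")
# Without the group, | would split the whole pattern into "STOXX 600" or "50"
grouped = re.findall(r"STOXX (?:600|50)", "STOXX 600 and STOXX 50")

print(three_first)  # ['199']
print(four_first)   # ['1998']
print(grouped)      # ['STOXX 600', 'STOXX 50']
```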

In [53]:
string1 = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998.\nThe panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."

In [54]:
result = re.findall(r"^([A-Z].*?)\s", string1, re.M)
result

['The', 'The']

In [55]:
result = re.findall(r"\A([A-Z].*?)\s", string1, re.M)
result

['The']

The `\A` metacharacter matches only at the start of the string. It anchors the pattern to the very beginning of the string, so the match can occur only there and nowhere else within it. Unlike the caret `^` metacharacter, which matches at the start of every line when `re.M` is used, <b><i>`\A` always matches only the start of the entire string.</i></b>

In [56]:
result = re.findall(r"\W\Z", string1, re.M)
result

['.']

In [57]:
result = re.findall(r"\W$", string1, re.M)
result

['.', '.']

`\Z` is a metacharacter that matches only at the very end of the string. It is similar to the dollar sign `$`, but there is a subtle difference between them:<br><i>`\Z` matches only at the absolute end of the string, whereas `$` also matches just before a trailing newline at the end of the string and, with `re.M`, at the end of every line.</i> That is why `\W$` with `re.M` finds the full stop at the end of both lines, while `\W\Z` finds only the last one.
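A minimal sketch of the trailing-newline difference, on a made-up string:

```python
import re

text = "the end\n"

# $ also matches just before a trailing newline at the very end
dollar = re.search(r"end$", text)
# \Z matches only at the absolute end of the string
z_only = re.search(r"end\Z", text)
z_with_newline = re.search(r"end\n\Z", text)

print(dollar is not None)          # True
print(z_only)                      # None
print(z_with_newline is not None)  # True
```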

In [58]:
result = re.search(r".+(\b.+ex\b).+(\b[A-Z]{4}\b)", string)
result.groups()

('index', 'FTSE')

In [59]:
result.group(1)

'index'

In [60]:
result = re.search(r".+(?:\b.+ex\b).+(\b[A-Z]{4}\b)", string)
result.groups()

('FTSE',)

`(?:...)` creates a non-capturing group: it groups part of a pattern so that it can be matched as a unit, without capturing the matched text as a separate group in the result.<br><br>
It is helpful when we need grouping to apply a quantifier or to limit an alternation (`|`), but don't want to capture the matched text.
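A sketch on a made-up string of a group with a quantifier. With a capturing group, `findall` returns only the group's last repetition; with a non-capturing group it returns the whole match:

```python
import re

text = "ababx ab"

capturing = re.findall(r"(ab)+", text)
non_capturing = re.findall(r"(?:ab)+", text)

print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['abab', 'ab']
```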

In [61]:
result = re.search(r".+(?P<wordex>\b.+ex\b).+(?P<uppercase>\b[A-Z]{4}\b)", string)
result.groups()

('index', 'FTSE')

Named groups allow us to capture specific patterns within a matched string and assign them meaningful names <i>(wordex, uppercase)</i>.<br><br>
<b>Syntax:</b> -<br>
To create a named group, use the syntax `(?P<name>...)`.

In [62]:
result.group("wordex")

'index'

In [63]:
result.group("uppercase")

'FTSE'

In [64]:
result.groupdict()

{'wordex': 'index', 'uppercase': 'FTSE'}
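Named groups can also be referred to by name in a replacement string (`\g<name>`) and later in the same pattern (`(?P=name)`). A sketch on made-up strings; the group names `day`, `month` and `word` are illustrative:

```python
import re

date = re.compile(r"(?P<day>\d{1,2}) (?P<month>[A-Z][a-z]+)")

# \g<name> inserts the named group's text in the replacement
swapped = date.sub(r"\g<month> \g<day>", "peak on 19 February.")

# (?P=name) matches the same text the named group matched earlier
repeated = re.findall(r"\b(?P<word>\w+) (?P=word)\b", "the the index fell fell")

print(swapped)   # peak on February 19.
print(repeated)  # ['the', 'fell']
```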

In [65]:
# here we search for five capital letters followed by a space, but only where a 3-digit number comes next (the digits themselves are not part of the match)
result = re.findall(r"([A-Z]{5})\s(?=[0-9]{3})", string)
result

['STOXX', 'STOXX']

In [66]:
# here we search for 'Euro' only where it is part of a longer word (followed by lowercase letters), as in 'Europe'
result = re.findall(r"Euro(?=[a-z]+)", string)
result

['Euro']

A <b>positive lookahead assertion</b> in regular expressions specifies a pattern that must follow the main pattern, without including that following text in the match.<br><br>
<b>Syntax: -</b>
To create a positive lookahead assertion, use the syntax `(?=...)`, where `...` represents the pattern that must follow the main pattern.
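A sketch on a made-up sentence showing that the lookahead checks for `%` without consuming it:

```python
import re

text = "fell by 11.48% and 3%"

# Digits (and dots) followed by '%'; the '%' is not part of the match
amounts = re.findall(r"[\d.]+(?=%)", text)
# Without the lookahead, '%' becomes part of each match
with_percent = re.findall(r"[\d.]+%", text)

print(amounts)       # ['11.48', '3']
print(with_percent)  # ['11.48%', '3%']
```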


In [67]:
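# a digit NOT followed by 5-9 or by a non-digit, i.e. followed by 0-4 (or by the end of the string)
result = re.findall(r"\d(?![5-9]|\D)", string)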
result

['6', '0', '1', '6', '0']

In [68]:
# words that are NOT followed by a whitespace character
result = re.findall(r"\b\w+\b(?!\s)", string)
result

['index', 'FTSE', '11', '48', '1998', '2', 'all', 'February']

A <b>negative lookahead assertion</b> in regular expressions specifies a pattern that must <b>not</b> follow the current position for a match to occur.<br><br>
<b>Syntax: -</b><br>
To create a negative lookahead assertion, use the syntax `(?!...)`, where `...` represents the pattern that must not follow the main pattern.
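A sketch on a made-up sentence that keeps only the whole numbers not followed by `%`. The `\b` after `\d+` stops the engine from backtracking to a shorter number such as `1` inside `11`:

```python
import re

text = "11% of 600 shares"

not_percent = re.findall(r"\b\d+\b(?!%)", text)
print(not_percent)  # ['600']
```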

In [69]:
# runs of digits preceded by a whitespace character
result = re.findall(r"(?<=\s)\d{1,}", string)
result

['600', '11', '1998', '600', '19']

In [70]:
# words preceded by a comma and a whitespace character
result = re.findall(r"(?<=,\s)\b\w+\b", string)
result

['which', 'fell']

A <b>positive lookbehind assertion</b> specifies a condition that must be met immediately before the main part of the pattern, without including that text in the match.<br><br>
<b>Syntax: -</b><br>
To create a positive lookbehind assertion, use the syntax `(?<=...)`, where `...` represents the pattern that must precede the main pattern.
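A sketch on a made-up sentence that extracts the amount after a pound sign. It also illustrates that Python's `re` module only accepts lookbehind patterns of fixed width, so a variable-width one such as `\s+` raises `re.error` at compile time (the exact error message may vary between Python versions):

```python
import re

text = "wiped £2.7tn off 600 shares"

# Amount that directly follows '£'; the '£' is not part of the match
pounds = re.findall(r"(?<=£)[\d.]+\w*", text)
print(pounds)  # ['2.7tn']

# Variable-width lookbehind is rejected by the re module
variable_width_error = None
try:
    re.compile(r"(?<=\s+)\d")
except re.error as exc:
    variable_width_error = exc
    print("re.error:", exc)
```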

In [71]:
# runs of digits NOT preceded by whitespace, so matches may start in the middle of a number
result = re.findall(r"(?<!\s)\d{1,}", string)
result

['00', '1', '48', '998', '2', '7', '00', '9']

In [72]:
# a single 'x' (case-insensitive) that is neither preceded nor followed by another 'x'
result = re.findall(r"(?<!x)x(?!x)", string, re.I)
result

['x']

A <b>negative lookbehind assertion</b> specifies a condition that must <b>not</b> be met immediately before the main part of the pattern.<br><br>
<b>Syntax: -</b><br>
To create a negative lookbehind assertion, use the syntax `(?<!...)`, where `...` represents the pattern that must not precede the main pattern.
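A sketch on a made-up sentence that finds numbers which are not part of the `£` amount, by rejecting digits preceded by `£`, another digit or a `.`:

```python
import re

text = "wiped £2.7tn off 600 shares"

not_pounds = re.findall(r"(?<![£\d.])\d+", text)
print(not_pounds)  # ['600']
```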