# <center> RegEx in Python</center>

![](images/memes/meme1.jpg)


> ### *A regular expression is a sequence of characters that defines a search pattern.*

## Understanding the Regular Expression Syntax

A regex pattern is a simple sequence of characters. The components of a regex pattern are:

- **literals (ordinary characters)**: these characters carry no special meaning and are matched as is.

- **metacharacters (special characters)**: these characters carry a special meaning and are processed in a special way.


![](images/components.png)

Let's start with a simple example.

Suppose we have a list of filenames in a folder.

```
file1.xml
file1.txt
file2.txt
file15.xml
file5.docx
file60.txt
file.txt
file5.txt
```

We want to keep only those filenames which follow a specific pattern, i.e. `file<one or more digits>.txt`.

> Let's try to do this on an online tool to learn, build, & test Regular Expressions (RegEx / RegExp), [RegExr](https://regexr.com).

So, the regular expression we need here is:

`file\d+\.txt`

This expression can be understood as follows:

- `file` is a substring of literals which are matched with the input as is.

- `\d` is a metacharacter which instructs the software to match this position with a digit (0-9).

- `+` is also a metacharacter which instructs the software to match one or more repetitions of the preceding element (`\d` in this case).

- `\.` matches a literal dot. `.` is a metacharacter, but we want to use it as a literal in this case. Hence, we escape it using the `\` character.

- `txt` is a substring of literals which are matched with the input as is.

![](images/example1.png)

# Getting started with RegEx in Python

The **[re](https://docs.python.org/3/howto/regex.html)** module provides an interface to the regular expression engine, allowing you to **compile regular expressions into objects and then perform matches with them**.

In [2]:
import re

## 1. Compiling Regular Expressions

Regular expressions are **compiled** into `Pattern` objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.


### `re.compile(pattern, flags=0)`

Compile a regular expression pattern, returning a pattern object.
- `re.compile()` also accepts an optional `flags` argument, used to enable various special features and syntax variations. [More about flags](http://xahlee.info/python/python_regex_flags.html)

<br>

For example, the flag `re.I` (short for `re.IGNORECASE`) makes matching case-insensitive.
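As a minimal sketch of how the `flags` argument changes matching, the snippet below compiles the same pattern with and without `re.I`. The variable names and sample strings are made up for illustration; `re.M` is covered later in this document.

```python
import re

# Compile once, reuse many times; re.I makes matching case-insensitive.
strict = re.compile("hello")
relaxed = re.compile("hello", re.I)

print(strict.search("Hello World"))           # None, because the case differs
print(relaxed.search("Hello World").group())  # Hello

# Several flags can be combined with the bitwise OR operator.
multi = re.compile("^hello", re.I | re.M)
print(multi.findall("Hello\nhello\nHELLO"))   # ['Hello', 'hello', 'HELLO']
```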

## 2. Performing Matches

So, we have created a `Pattern` object representing a compiled regular expression using `re.compile()` method.

Pattern objects have several methods and attributes.

Here is the list of different methods used for performing matches:


<table style="border: 1px solid black; font-size:15px;">
<thead>
    <th>Method/Attribute</th>
    <th>Purpose</th>
</thead>
    
<tbody>
<tr>
    <td>match()</td>
    <td>Determine if the RE matches at the beginning of the string.</td>
</tr>
    
<tr>
    <td>search()</td>
    <td>Scan through a string, looking for any location where this RE matches.</td>
</tr>

<tr>
    <td>findall()</td>
    <td>Find all substrings where the RE matches, and returns them as a list.</td>
</tr>

<tr>
    <td>finditer()</td>
    <td>Find all substrings where the RE matches, and returns them as an iterator.</td>
</tr>
</tbody>
</table>

Let us go through them one by one:

### `match(string[, pos[, endpos]])`

- A match is checked only at the beginning (by default).

- Checking starts from `pos` index of the string. (default is 0)

- Checking is done until `endpos` index of string. `endpos` is set as a very large integer (by default).

- Returns `None` if no match found.

- If a match is found, a `Match` object is returned, containing information about the match: where it starts and ends, the substring it matched, and more.
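The points above can be sketched as follows. This is only an illustration with a made-up sample string; it shows how `pos` and `endpos` move the window that `match()` looks at, and which `Match` methods expose the position and text of a match.

```python
import re

pattern = re.compile("world")
text = "hello world"

# "world" does not start at index 0, so match() fails by default.
print(pattern.match(text))        # None

# With pos=6, checking starts at index 6.
m = pattern.match(text, 6)
print(m.group(), m.start(), m.end(), m.span())  # world 6 11 (6, 11)

# With endpos=9 only "wor" is visible, so there is no match.
print(pattern.match(text, 6, 9))  # None
```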

In [8]:
# re.I makes the match case-insensitive, so "Hello" also matches "hello"
pattern = re.compile("Hello", re.I)
match = pattern.match("hello world")
print(match)

<re.Match object; span=(0, 5), match='hello'>


### `search(string[, pos[, endpos]])`

- A match is checked throughout the string.

- Same behaviour of `pos` and `endpos` as the `match()` function.

- Returns `None` if no match found.

- If a match is found, a `Match` object is returned.

In [9]:
pattern = re.compile("hello")
pattern.search("say hello")

<re.Match object; span=(4, 9), match='hello'>

### `findall(string[, pos[, endpos]])`

- Finds **all non-overlapping substrings** where the match is found, and returns them as a list.

- Same behaviour of `pos` and `endpos` as the `match()` and `search()` function.

In [12]:
pattern = re.compile("Hello",re.I)
pattern.findall("world hello")

['hello']

### `finditer(string[, pos[, endpos]])`

- Finds **all non-overlapping substrings** where the match is found, and returns them as an iterator of the `Match` objects.

- Same behaviour of `pos` and `endpos` as the `match()`, `search()` and `findall()` function.

In [16]:
pattern = re.compile("hello")
matches = pattern.finditer("say hello hello")
for match in matches:
    print(match.span())

(4, 9)
(10, 15)


### Note:

It is not mandatory to create a `Pattern` object explicitly using `re.compile()` method in order to perform a regex operation.

You can directly use the module-level functions such as:
- `re.match(pattern, string, flags=0)`

- `re.search(pattern, string, flags=0)`

- `re.findall(pattern, string, flags=0)`

- `re.finditer(pattern, string, flags=0)`

and so on.

In a module level function, you can simply pass a **string** as your **regex pattern** as shown in the examples below.
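A short sketch, using a made-up sample string, of how a module-level call compares with the equivalent call on a compiled `Pattern` object:

```python
import re

text = "say hello hello"
compiled = re.compile("hello")

# The module-level function takes the pattern string as its first argument.
print(re.search("hello", text).span())       # (4, 9)
print(compiled.search(text).span())          # (4, 9)

# Flags are passed through the optional flags argument.
print(re.findall("HELLO", text, flags=re.I))  # ['hello', 'hello']
```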

### Important Example

Consider the example below:

In [17]:
txt = "This book costs $15."

Search for the pattern `$15`.

In [18]:
re.findall("$15", txt)

[]

### No match found. Why?

`$` is a metacharacter and has a special meaning for regex engine. Here, we want to treat it like a literal.

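In order to treat a metacharacter like a literal, you need to **escape** it using the `\` character. Writing the pattern as a raw string (`r"..."`) keeps Python from interpreting the backslash itself.

In [29]:
re.findall(r"\$15", txt)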

['$15']

In regular expressions, there are twelve metacharacters that should be escaped if they are to be used with their literal meaning:

- Backslash `\`
- Caret `^`
- Dollar sign `$`
- Dot `.`
- Pipe symbol `|`
- Question mark `?`
- Asterisk `*`
- Plus sign `+`
- Opening parenthesis `(`
- Closing parenthesis `)`
- Opening square bracket `[`
- Opening curly brace `{`
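When the text to search for is not known in advance, escaping by hand is error-prone. One option is the standard library's `re.escape()`, which escapes the metacharacters in a string for you. The `needle` variable below is just an illustrative name.

```python
import re

txt = "This book costs $15."
needle = "$15."  # contains two metacharacters: $ and .

escaped = re.escape(needle)
print(escaped)                   # \$15\.
print(re.findall(escaped, txt))  # ['$15.']
```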

# 3. Character Classes

- **Character classes** (also known as **character sets**) let us define a set of characters, any one of which may match at a given position.


- To define a character class, we should use the opening square bracket metacharacter `[`, then any accepted characters, and finally close with a closing square bracket `]`.

### Example 1

Consider the example below, where the spellings `license` and `licence` have been mixed up and we want to find all occurrences of either spelling in the text.

In [19]:
txt = """
Yesterday, I was driving my car without a driving licence. The traffic police stopped me and asked me for my 
license. I told them that I forgot my licence at home. 
"""

In [31]:
re.findall("licen[cs]e",txt)

['licence', 'license', 'licence']

# Character Set Range

> It is possible to also use the range of a character. This is done by leveraging the hyphen symbol (-) between two related characters; for example, to match any lowercase letter we can use `[a-z]`. Likewise, to match any single digit we can define the character set `[0-9]`.

Let us consider an example in which we want to retrieve all the years from the given text.

In [23]:
txt = """
The first season of Indian Premiere League (IPL) was played in 2008. 
The second season was played in 2009 in South Africa. 
Last season was played in 2018 and won by Chennai Super Kings (CSK).
CSK won the title in 2010 and 2011 as well.
Mumbai Indians (MI) has also won the title 3 times in 2013, 2015 and 2017.
"""

pattern = re.compile("[1-9][0-9][0-9][0-9]")

pattern.findall(txt)

['2008', '2009', '2018', '2010', '2011', '2013', '2015', '2017']

> There is another possibility—the negation of ranges. We can invert the meaning
of a character set by placing a caret (`^`) symbol right after the opening square
bracket metacharacter (`[`).

For example, to find all the characters used in a text except lowercase vowels (uppercase vowels such as `I` and `A` are not in the set, so they still match), we can use the pattern:

In [34]:
pattern = re.compile("[^aeiou]")
pattern.findall(txt)

['\n', 'T', 'h', ' ', 'f', 'r', 's', 't', ' ', 's', 's', 'n', ' ', 'f', ' ',
 'I', 'n', 'd', 'n', ' ', 'P', 'r', 'm', 'r', ' ', 'L', 'g', ' ', '(', 'I',
 'P', 'L', ')', ' ', 'w', 's', ' ', 'p', 'l', 'y', 'd', ' ', 'n', ' ', '2',
 '0', '0', '8', '.', ' ', '\n',
 ...]
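A negated set only excludes the characters that are listed in it. As a small sketch on a made-up sample string, adding the uppercase vowels to the set excludes them as well:

```python
import re

sample = "IPL in Africa"

# Only lowercase vowels are excluded, so 'I' and 'A' still match.
print("".join(re.findall("[^aeiou]", sample)))       # IPL n Afrc

# Listing both cases excludes every vowel.
print("".join(re.findall("[^aeiouAEIOU]", sample)))  # PL n frc
```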


# Predefined Character Classes

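Some frequently used character classes have predefined shorthands. The bracketed equivalents in the table hold exactly for ASCII matching (bytes patterns, or `str` patterns compiled with `re.ASCII`); by default, `\d`, `\s`, and `\w` on `str` patterns in Python 3 also match the corresponding Unicode characters.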


<table style="border: 1px solid black; font-size:15px;">
<thead>
    <th>Element</th>
    <th>Description</th>
</thead>
    
<tbody>
<tr>
    <td>.</td>
    <td>This element matches any character except newline</td>
</tr>

<tr>
    <td>\d</td>
    <td>This matches any decimal digit; this is equivalent to the class [0-9]</td>
</tr>

<tr>
    <td>\D</td>
    <td>This matches any non-digit character; this is equivalent to the class [^0-9]</td>
</tr>

<tr>
    <td>\s</td>
    <td>This matches any whitespace character; this is equivalent to the class
[ \t\n\r\f\v]</td>
</tr>

<tr>
    <td>\S</td>
    <td>This matches any non-whitespace character; this is equivalent to the class
[^ \t\n\r\f\v]</td>
</tr>

<tr>
    <td>\w</td>
    <td>This matches any alphanumeric character; this is equivalent to the class
[a-zA-Z0-9_]</td>
</tr>
    
<tr>
    <td>\W</td>
    <td>This matches any non-alphanumeric character; this is equivalent to the
class [^a-zA-Z0-9_]</td>
</tr>
</tbody>
</table>


Now, we can improve our pattern to find years in a given text a bit:

In [35]:
pattern = re.compile(r"[1-9]\d\d\d")

In [36]:
pattern.findall(txt)

['2008', '2009', '2018', '2010', '2011', '2013', '2015', '2017']
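One detail worth checking when relying on the predefined classes: for `str` patterns in Python 3, `\w` matches Unicode word characters unless the `re.ASCII` flag is given. The sample text below is made up for illustration and uses escape codes for the accented letters.

```python
import re

text = "na\u00efve caf\u00e9_1"  # "naïve café_1"

# Default: accented letters count as word characters.
print(re.findall(r"\w+", text))                  # ['naïve', 'café_1']

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
print(re.findall(r"\w+", text, flags=re.ASCII))  # ['na', 've', 'caf', '_1']
```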

# Alternation

Just like character classes are used to match a single character out of several possible characters, **alternation** is used to match a single regular expression out of several possible regular expressions.

This is accomplished using the pipe symbol `|`.

Consider a scenario where you want to find all occurrences of `and`, `or`, `the` in a given text.

> One way is to write and execute 3 separate regular expressions. Using alternation, it can be done in a single regular expression!

In [27]:
txt = """
the most common conjunctions are and, or and but.
"""
pattern = re.compile("and|or|the")
pattern.findall(txt)

['the', 'and', 'or', 'and']
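Alternatives are tried from left to right, and the first one that matches at a position wins. The sketch below, with a made-up sample string, shows why the order of alternatives can matter when one is a prefix of another.

```python
import re

text = "in order or not"

# "or" is tried first, so it wins even inside "order".
print(re.findall("or|order", text))  # ['or', 'or']

# Putting the longer alternative first lets it match in full.
print(re.findall("order|or", text))  # ['order', 'or']
```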

# Quantifiers

**Quantifiers** are the mechanisms to define how a **character**, **metacharacter**, or **character set** can be **repeated**.

Here is the list of 4 basic quantifiers:

<table style="border: 1px solid black; font-size:15px;">
<thead>
    <th>Symbol</th>
    <th>Name</th>
    <th>Quantification of previous character</th>
</thead>
    
<tbody>
<tr>
    <td>?</td>
    <td>Question Mark</td>
    <td>Optional (0 or 1 repetitions)</td>
</tr>
    
<tr>
    <td>*</td>
    <td>Asterisk</td>
    <td>Zero or more times</td>
</tr>

<tr>
    <td>+</td>
    <td>Plus Sign</td>
    <td>One or more times</td>
</tr>

<tr>
    <td>{n,m}</td>
    <td>Curly Braces</td>
    <td>Between n and m times</td>
</tr>
</tbody>
</table>


Let us go through different examples to understand them one by one.

### Example 1

Find all the matches for `dog` and `dogs` in the given text.

In [30]:
txt = """
I have 2 dogs. One dog is 1 year old and other one is 2 years old. Both dogs are very cute! 
"""
pattern = re.compile("dogs?")  # the trailing "s" may occur 0 or 1 times
pattern.findall(txt)

['dogs', 'dog', 'dogs']

### Example 2

Find all filenames starting with `file` and ending with `.txt` in the given text.

In [49]:
txt = """
file1.txt
file_one.txt
file.txt
fil.txt
file.xml
file-1.txt
"""

In [54]:
pattern = re.compile(r"file[\w-]*\.txt")

In [55]:
pattern.findall(txt)

['file1.txt', 'file_one.txt', 'file.txt', 'file-1.txt']

### Example 3

Find all filenames starting with `file` followed by 1 or more digits and ending with `.txt` in the given text.

In [10]:
txt = """
file1.txt
file_one.txt
file09.txt
fil.txt
file23.xml
file.txt
"""

In [11]:
pattern = re.compile(r"file\d+\.txt")

In [12]:
pattern.findall(txt)

['file1.txt', 'file09.txt']

We can use the curly brackets syntax here with these modifications:

<table style="border: 1px solid black; font-size:15px;">
<thead>
    <th>Syntax</th>
    <th>Description</th>
</thead>
    
<tbody>
<tr>
    <td>{n}</td>
    <td>The previous character is repeated exactly n times.</td>
</tr>
    
<tr>
    <td>{n,}</td>
    <td>The previous character is repeated at least n times.</td>
</tr>

<tr>
    <td>{,n}</td>
    <td>The previous character is repeated at most n times.</td>
</tr>

<tr>
    <td>{n,m}</td>
    <td>The previous character is repeated between n and m times (both inclusive).</td>
</tr>
</tbody>
</table>

### Example 4

Find years in the given text.


In [14]:
txt = """
The first season of Indian Premiere League (IPL) was played in 2008. 
The second season was played in 2009 in South Africa. 
Last season was played in 2018 and won by Chennai Super Kings (CSK).
CSK won the title in 2010 and 2011 as well.
Mumbai Indians (MI) has also won the title 3 times in 2013, 2015 and 2017.
"""

In [15]:
pattern = re.compile(r"\d{4}")

In [16]:
pattern.findall(txt)

['2008', '2009', '2018', '2010', '2011', '2013', '2015', '2017']

### Example 5

In the given text, find all numbers with 4 or more digits.

In [17]:
txt = """
123143
432
5657
4435
54
65111
"""

In [18]:
pattern = re.compile(r"\d{4,}")

In [19]:
re.findall(pattern, txt)

['123143', '5657', '4435', '65111']
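The remaining two curly-brace forms, `{n,m}` and `{,n}`, can be checked in the same way. This sketch uses `re.fullmatch()` on a made-up list of strings so that each string must match the pattern in its entirety.

```python
import re

samples = ["", "1", "22", "333", "4444"]

# {2,3}: between 2 and 3 digits (both inclusive).
two_to_three = [s for s in samples if re.fullmatch(r"\d{2,3}", s)]
print(two_to_three)  # ['22', '333']

# {,2}: at most 2 digits, which includes the empty string.
at_most_two = [s for s in samples if re.fullmatch(r"\d{,2}", s)]
print(at_most_two)   # ['', '1', '22']
```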

### Example 6

Write a pattern to validate telephone numbers.

Telephone numbers can be of the form: `555-555-5555`, `555 555 5555`, `5555555555`

In [20]:
txt = """
555-555-5555
555 555 5555
5555555555
"""

In [21]:
pattern = re.compile(r"\d{3}[-\s]?\d{3}[-\s]?\d{4}")

In [22]:
pattern.findall(txt)

['555-555-5555', '555 555 5555', '5555555555']
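`findall()` locates phone numbers inside a larger text. To validate a single candidate string, one possible approach is `fullmatch()`, which succeeds only if the whole string matches. The helper name `is_phone_number` is made up for this sketch.

```python
import re

PHONE = re.compile(r"\d{3}[-\s]?\d{3}[-\s]?\d{4}")

def is_phone_number(candidate):
    """Return True only if the entire string has the phone number shape."""
    return PHONE.fullmatch(candidate) is not None

print(is_phone_number("555-555-5555"))       # True
print(is_phone_number("5555555555"))         # True
print(is_phone_number("call 555-555-5555"))  # False, extra text before the number
print(is_phone_number("555-555-55555"))      # False, one digit too many
```

Note that this pattern treats each separator independently, so a mixed form such as `555-555 5555` is also accepted.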

# Boundary Matchers

Consider a scenario where you want to find all occurrences of `and`, `or` and `the` in the given text.

In [31]:
txt = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, 
when an unknown printer took a galley of type and scrambled it to make a type specimen book. 
It has survived not only five centuries, but also the leap into electronic typesetting, 
remaining essentially unchanged. 
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, 
and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
"""

In [32]:
pattern = re.compile("and|or|the")

In [33]:
pattern.findall(txt)

['or',
 'the',
 'and',
 'or',
 'the',
 'and',
 'the',
 'and',
 'the',
 'the',
 'the',
 'or',
 'and',
 'or',
 'or']

There is a slight problem with the above pattern: `and`, `or`, and `the` inside other words are also counted as matches, whereas we want to match them only as standalone words.

### What is the solution?

The solution is to use this pattern:

`\b(and|or|the)\b`

where `\b` is a metacharacter that matches at a position that is called a **word boundary**. 

Such identifiers that correspond to a particular position inside of the input are called **Boundary Matchers**.

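**Note:** In an ordinary Python string literal, `\b` is the escape sequence for the backspace character, so the backslash must be doubled (`\\b`) for the regex engine to receive `\b` as a metacharacter. Writing the pattern as a raw string, `r"\b(and|or|the)\b"`, has the same effect.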
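The difference between the string forms can be seen directly. This is a small sketch on a made-up sentence:

```python
import re

# "\b" is a single backspace character; r"\b" and "\\b" are backslash + "b".
print(len("\b"), len(r"\b"), len("\\b"))  # 1 2 2

text = "the order and the band"

# Raw string: the regex engine sees \b and treats it as a word boundary.
print(re.findall(r"\b(and|or|the)\b", text))  # ['the', 'and', 'the']

# Ordinary string: the engine looks for literal backspace characters instead.
print(re.findall("\b(and|or|the)\b", text))   # []
```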

In [6]:
pattern = re.compile("\\b(and|or|the)\\b")

In [7]:
highlight_regex_matches(pattern, txt)


Lorem Ipsum is simply dummy text of <mark>the</mark> printing <mark>and</mark> typesetting industry. 
Lorem Ipsum has been <mark>the</mark> industry's standard dummy text ever since <mark>the</mark> 1500s, 
when an unknown printer took a galley of type <mark>and</mark> scrambled it to make a type specimen book. 
It has survived not only five centuries, but also <mark>the</mark> leap into electronic typesetting, 
remaining essentially unchanged. 
It was popularised in <mark>the</mark> 1960s with <mark>the</mark> release of Letraset sheets containing Lorem Ipsum passages, 
<mark>and</mark> more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.



Here is a table which shows the list of all boundary matchers available in Python:

<table style="border: 1px solid black; font-size:15px;">
<thead>
    <th>Matcher</th>
    <th>Description</th>
</thead>
    
<tbody>
<tr>
    <td>^</td>
    <td>Matches at the beginning of a line</td>
</tr>
    
<tr>
    <td>$</td>
    <td>Matches at the end of a line</td>
</tr>

<tr>
    <td>\b</td>
    <td>Matches a word boundary</td>
</tr>

<tr>
    <td>\B</td>
    <td>Matches the opposite of \b. Anything that is not a word boundary</td>
</tr>

<tr>
    <td>\A</td>
    <td>Matches the beginning of the input</td>
</tr>

<tr>
    <td>\Z</td>
    <td>Matches the end of the input</td>
</tr>
</tbody>
</table>

### Example 1

Consider a scenario where we want to find all the lines in the given text which **start** with the pattern `Name:`.

In [59]:
txt = """
Name:
Age: 0
Roll No.: 15
Grade: S

Name: Ravi
Age: -1
Roll No.: 123 Name: ABC
Grade: K

Name: Ram
Age: N/A
Roll No.: 1
Grade: G
"""

In [60]:
import re
pattern = re.compile(r"^Name: \w+", flags=re.M)

In [61]:
pattern.findall(txt)

['Name: Ravi', 'Name: Ram']

> `re.M` (short for `re.MULTILINE`) is a flag that makes `^` and `$` match at the start and end of each line, not only at the start and end of the whole string.
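The effect of `re.M` is easiest to see by running the same pattern with and without the flag. The three-line sample string is made up for illustration.

```python
import re

text = "one\ntwo\nthree"

# Without re.M, ^ matches only at the very start of the string
# and $ only at the very end.
print(re.findall(r"^\w+", text))              # ['one']
print(re.findall(r"\w+$", text))              # ['three']

# With re.M, ^ and $ also match at the start and end of every line.
print(re.findall(r"^\w+", text, flags=re.M))  # ['one', 'two', 'three']
print(re.findall(r"\w+$", text, flags=re.M))  # ['one', 'two', 'three']
```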

### Example 2

Find all the lines which do not end with a full stop (`.`) in the given text.

In [11]:
txt = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s!
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages
More recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""

In [12]:
pattern = re.compile(r"^.+[^\.]$", flags=re.M)

In [13]:
pattern.findall(txt)

["Lorem Ipsum has been the industry's standard dummy text ever since the 1500s!",
 'It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages']

In [14]:
highlight_regex_matches(pattern, txt)


Lorem Ipsum is simply dummy text of the printing and typesetting industry.
<mark>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s!</mark>
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
<mark>It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages</mark>
More recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.


# Split using RegEx

> Most languages provide a split operation on strings. The `split` in the `re` module is more powerful because the separator can be a regex: the string is split at every match of the pattern.

### `split(string[, maxsplit])`

- Every pattern object has a `split()` method which splits the input string at all positions where a match is found.

- `maxsplit` is an optional argument (default value 0) which specifies the max no. of splits that can take place. `0` value means there is no limit on the no. of splits.

- Pattern match is not included in any of the substrings obtained after splitting.

#### Example 1

Let us try to split a string to get individual lines in it.

In [1]:
import re

In [2]:
txt = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated."""

In [3]:
pattern = re.compile("\n")

In [4]:
pattern.split(txt)

['Beautiful is better than ugly.',
 'Explicit is better than implicit.',
 'Simple is better than complex.',
 'Complex is better than complicated.']

#### Example 2

Let us try one more example in which we want to get all the words in the given text.

In [5]:
pattern = re.compile(r"\W")

In [6]:
pattern.split(txt)

['Beautiful',
 'is',
 'better',
 'than',
 'ugly',
 '',
 'Explicit',
 'is',
 'better',
 'than',
 'implicit',
 '',
 'Simple',
 'is',
 'better',
 'than',
 'complex',
 '',
 'Complex',
 'is',
 'better',
 'than',
 'complicated',
 '']
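The empty strings in this output appear because `.` and the newline that follows it are two separate matches of `\W`, with nothing between them. One possible way to get only the words, sketched here on a shortened version of the text, is to split on runs of non-word characters with `\W+` and drop any empty string left at the edges.

```python
import re

txt = "Beautiful is better than ugly.\nExplicit is better than implicit."

# \W+ treats ".\n" as a single separator; the final "." still leaves
# one empty string at the end, which the filter removes.
words = [w for w in re.split(r"\W+", txt) if w]
print(words)
```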

#### Example 3

What if we want only the first 3 words? We need to split only 3 times in this case, which can be done by setting `maxsplit` to 3.

In [7]:
pattern.split(txt, maxsplit=3)

['Beautiful',
 'is',
 'better',
 'than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.']

# Substitution

Now, we are going to look at a method which will replace all the **leftmost non-overlapping occurrences** of a pattern in a given string and return the new string as result.

### `sub(repl, string[, count=0])`

- `repl` is the replacement string which gets substituted in the place of match

- `string` is the input text on which substitution takes place.

- `count` is an optional argument (default is 0) which specifies the max no. of substitutions that can take place.  0 means there is no limit on substitution count.


Let us consider a case where we want to replace all occurrences of numbers with a `-` in the given text.

In [1]:
import re

In [2]:
txt = "100 cats, 23 dogs, 3 rabbits"

In [3]:
pattern = re.compile(r"\d+")

In [4]:
pattern.sub("-", txt)

'- cats, - dogs, - rabbits'
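Two further uses of `sub()` are sketched below: limiting the number of substitutions with `count`, and passing a function as `repl`. The function form goes beyond the plain replacement string described above; in it, `repl` is called with each `Match` object and its return value becomes the replacement.

```python
import re

txt = "100 cats, 23 dogs, 3 rabbits"
pattern = re.compile(r"\d+")

# count=1 replaces only the first (leftmost) match.
first_only = pattern.sub("-", txt, count=1)
print(first_only)  # - cats, 23 dogs, 3 rabbits

# A callable repl computes each replacement from the match itself.
doubled = pattern.sub(lambda m: str(int(m.group()) * 2), txt)
print(doubled)     # 200 cats, 46 dogs, 6 rabbits
```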

### `subn(repl, string[, count=0])`

- Returns the substituted string as well as the no. of substitutions.

- Can be thought of as a utility function over `sub()`.

In [5]:
pattern.subn("-", txt)

('- cats, - dogs, - rabbits', 3)

# Grouping

> Frequently you need to obtain more information than just whether the regex pattern matched or not.

By placing part of a regular expression inside round brackets or parentheses `(`, `)`, you can **group that part** of the regex pattern together.
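As a small sketch of what a group gives you beyond a yes/no answer, the text matched by each parenthesised part can be read back from the `Match` object. The sample strings are made up for illustration.

```python
import re

pattern = re.compile(r"my name is (ram|sam)")
m = pattern.search("hi, my name is sam")

print(m.group(0))  # my name is sam  (the whole match)
print(m.group(1))  # sam             (the first group only)
print(m.span(1))   # (15, 18)        (where the first group matched)

# With exactly one group in the pattern, findall() returns the group text.
print(pattern.findall("my name is ram\nmy name is sam"))  # ['ram', 'sam']
```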

### Applications of grouping:

#### 1. apply a quantifier to the entire group.

For example, `(ab)+` will match one or more repetitions of `ab`.

In [1]:
import re
from utils import highlight_regex_matches
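The `highlight_regex_matches` helper comes from a local `utils` module whose source is not shown in this document. If that module is not available, a rough stand-in can be sketched as below. The name `highlight_matches_sketch`, its parameters, and the default ANSI codes (yellow background, bold) are assumptions for illustration, not the original implementation, and unlike the original it returns the text instead of displaying it.

```python
import re

def highlight_matches_sketch(pattern, text, start="\033[43m\033[1m", end="\033[0m"):
    """Return `text` with every match of the compiled `pattern` wrapped in markers.

    Hypothetical stand-in for utils.highlight_regex_matches.
    """
    return pattern.sub(lambda m: f"{start}{m.group(0)}{end}", text)

pattern = re.compile("(ab)+")

# Default markers are ANSI escape codes, visible as colours in a terminal.
print(highlight_matches_sketch(pattern, "abbbbbabbbb"))

# Plain-text markers make the wrapped spans easy to read anywhere.
print(highlight_matches_sketch(pattern, "abbbbbabbbb", start="[", end="]"))  # [ab]bbbb[ab]bbb
```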

In [2]:
txt = "abbbbbabbbb"

In [3]:
pattern1 = re.compile("ab+")
pattern2 = re.compile("(ab)+")

In [4]:
highlight_regex_matches(pattern1, txt)

<mark>abbbbb</mark><mark>abbbb</mark>


In [5]:
highlight_regex_matches(pattern2, txt)

<mark>ab</mark>bbbb<mark>ab</mark>bbb


#### 2. restrict alternation to part of the regex.

For example, `my name is ram|sam` will match `my name is ram` and `sam` whereas `my name is (ram|sam)` will match `my name is ram` and `my name is sam`.

In [72]:
txt = """
my name is ram
my name is sam
"""

In [73]:
pattern1 = re.compile("my name is ram|sam")
pattern2 = re.compile("my name is (ram|sam)")

In [8]:
highlight_regex_matches(pattern1, txt)


<mark>my name is ram</mark>
my name is <mark>sam</mark>



In [9]:
highlight_regex_matches(pattern2, txt)


<mark>my name is ram</mark>
<mark>my name is sam</mark>

