### Boundary Matchers
Consider a scenario where you want to find all occurrences of the words and, or, and the in the given text.

In [11]:
import re 
from colorama import Back, Style


def highlight_regex_matches(pattern, text, print_output=True):
	output = text
	len_inc = 0
	for match in pattern.finditer(text):
		start, end = match.start() + len_inc, match.end() + len_inc
		output = output[:start] + Back.YELLOW + Style.BRIGHT + output[start:end] + Style.RESET_ALL + output[end:]
		len_inc = len(output) - len(text)  

	if print_output:
		print(output)
	else:
		return output


In [12]:
txt = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, 
when an unknown printer took a galley of type and scrambled it to make a type specimen book. 
It has survived not only five centuries, but also the leap into electronic typesetting, 
remaining essentially unchanged. 
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, 
and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
"""

In [13]:
pattern = re.compile("and|or|the")


In [14]:
pattern.findall(txt)

['or',
 'the',
 'and',
 'or',
 'the',
 'and',
 'the',
 'and',
 'the',
 'the',
 'the',
 'or',
 'and',
 'or',
 'or']

In [15]:
highlight_regex_matches(pattern,txt)


L**or**em Ipsum is simply dummy text of **the** printing **and** typesetting industry.
L**or**em Ipsum has been **the** industry's st**and**ard dummy text ever since **the** 1500s,
when an unknown printer took a galley of type **and** scrambled it to make a type specimen book.
It has survived not only five centuries, but also **the** leap into electronic typesetting,
remaining essentially unchanged.
It was popularised in **the** 1960s with **the** release of Letraset sheets containing L**or**em Ipsum passages,
**and** m**or**e recently with desktop publishing software like Aldus PageMaker including versions of L**or**em Ipsum.



There is a slight problem with the above pattern: and, or, and the are also matched when they occur inside longer words (for example, the or in Lorem and the and in standard), whereas we want to match them only when they appear as standalone words.

### What is the solution?
The solution is to use this pattern:

\b(and|or|the)\b

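where \b is a metacharacter that matches at a position called a word boundary, that is, a position between a word character (a letter, digit, or underscore) and a non-word character or the start or end of the input.

Metacharacters like this, which match a position in the input rather than a character, are called boundary matchers.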
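To make the idea of a position concrete, the following minimal sketch marks every word boundary in a short string with a vertical bar. The helper mark_boundaries and the sample strings are illustrative assumptions, not part of the original notebook. The pattern is written as a raw string so that Python passes the backslash through to the regex engine unchanged.

```python
import re


def mark_boundaries(text):
    # \b matches an empty position, so re.sub inserts "|" without consuming any character
    return re.sub(r"\b", "|", text)


print(mark_boundaries("the band"))    # |the| |band|

words = "and band standard"
print(re.findall(r"and", words))      # ['and', 'and', 'and']
print(re.findall(r"\band\b", words))  # ['and']
```

Only the first and is surrounded by word boundaries on both sides; the other two sit inside band and standard.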

Note: In an ordinary Python string, \b is itself an escape sequence (the backspace character). To pass it to the regex engine as a metacharacter, either escape the backslash (\\b) or use a raw string (r"\b").
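The note above can be checked directly. This small sketch (the variable names and sample text are illustrative) compares the different ways of writing the pattern; a raw string is a commonly used equivalent of doubling the backslash.

```python
import re

plain = "\b"     # Python string escape: a single backspace character
escaped = "\\b"  # a backslash followed by 'b': the regex word boundary
raw = r"\b"      # raw string: the same two characters as "\\b"

print(len(plain), len(escaped), len(raw))  # 1 2 2
print(escaped == raw)                      # True

text = "the other"
# With unescaped "\b" the regex looks for literal backspace characters
print(re.findall(plain + "the" + plain, text))  # []
print(re.findall(r"\bthe\b", text))             # ['the']
```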

In [16]:
pattern = re.compile("\\b(and|or|the)\\b")

In [17]:
highlight_regex_matches(pattern,txt)


Lorem Ipsum is simply dummy text of **the** printing **and** typesetting industry.
Lorem Ipsum has been **the** industry's standard dummy text ever since **the** 1500s,
when an unknown printer took a galley of type **and** scrambled it to make a type specimen book.
It has survived not only five centuries, but also **the** leap into electronic typesetting,
remaining essentially unchanged.
It was popularised in **the** 1960s with **the** release of Letraset sheets containing Lorem Ipsum passages,
**and** more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.



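The following table lists the boundary matchers available in Python:

| Matcher | Description |
| --- | --- |
| ^ | Matches at the beginning of the input; with re.MULTILINE, also at the beginning of each line |
| $ | Matches at the end of the input (or just before a trailing newline); with re.MULTILINE, also at the end of each line |
| \b | Matches at a word boundary |
| \B | The opposite of \b: matches at any position that is not a word boundary |
| \A | Matches only at the beginning of the input |
| \Z | Matches only at the end of the input |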
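As a rough illustration of how these matchers differ, the sketch below runs several of them against a small two-line string. The sample text is made up for this example and is not part of the original notebook.

```python
import re

text = "cat concat\ncatalog cat"

# \b on both sides: "cat" as a whole word only
print(re.findall(r"\bcat\b", text))            # ['cat', 'cat']

# \B on the left: "cat" directly preceded by another word character (inside "concat")
print(re.findall(r"\Bcat", text))              # ['cat']

# \A and \Z always anchor to the whole input, even with re.MULTILINE
print(re.findall(r"\Acat", text, flags=re.M))  # ['cat']
print(re.findall(r"^cat", text, flags=re.M))   # ['cat', 'cat']
print(re.findall(r"cat\Z", text, flags=re.M))  # ['cat']
print(re.findall(r"cat$", text, flags=re.M))   # ['cat', 'cat']
```

With re.M, ^ and $ find one extra match each (the cat at the start of catalog and the cat at the end of concat), while \A and \Z still match only at the two ends of the input.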

### Example 1
Consider a scenario where we want to find all the lines in the given text which start with the pattern Name:.


In [30]:
txt = """
Name: 123
Age: 0
Roll No.: 15
Grade: S

Name: jayesh123
Age: 0
Roll No.: 15
Grade: S

Name: Ravi
Age: -1
Roll No.: 123 Name: ABC
Grade: K

Name: Ram
Age: N/A
Roll No.: 1
Grade: G
"""

In [31]:
pattern = re.compile(r"^Name: \w+", flags=re.M)  # raw string keeps \w intact for the regex engine

In [32]:
pattern.findall(txt)

['Name: 123', 'Name: jayesh123', 'Name: Ravi', 'Name: Ram']

re.M (short for re.MULTILINE) is a flag that makes ^ and $ match at the beginning and end of each line, rather than only at the beginning and end of the whole input.
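The effect of the flag is easiest to see by running the same pattern with and without it. The sketch below uses a shortened, made-up record (an assumption for illustration) rather than the full text above.

```python
import re

record = "Name: Ravi\nRoll No.: 123 Name: ABC\nName: Ram"

# Without re.M, ^ matches only at the very start of the string
print(re.findall(r"^Name: \w+", record))              # ['Name: Ravi']

# With re.M, ^ also matches right after every newline
print(re.findall(r"^Name: \w+", record, flags=re.M))  # ['Name: Ravi', 'Name: Ram']

# Without the ^ anchor, the mid-line "Name: ABC" is picked up as well
print(re.findall(r"Name: \w+", record))               # ['Name: Ravi', 'Name: ABC', 'Name: Ram']
```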

### Example 2
Find all the lines in the given text that do not end with a full stop (.).

In [33]:
txt = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s!
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages
More recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""


In [34]:
pattern = re.compile(r"^.+[^.]$", flags=re.M)  # inside [...] a dot is literal, so no escape is needed


In [35]:
pattern.findall(txt)

["Lorem Ipsum has been the industry's standard dummy text ever since the 1500s!",
 'It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages']
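One caveat of the pattern above is that it needs at least two characters on a line (one or more for .+ and one for the final character class), so a one-character line is skipped. As a hedged alternative, the sketch below uses a negative lookbehind, (?<!\.), which is a different technique from the one used in this example. The sample text and variable names are made up for illustration.

```python
import re

sample = "First line.\nSecond line!\nX\nLast line"

# Pattern from the example: one or more characters followed by a final non-dot character
two_plus = re.compile(r"^.+[^.]$", flags=re.M)
print(two_plus.findall(sample))    # ['Second line!', 'Last line']

# Alternative: any non-empty line whose end is not immediately preceded by a dot
lookbehind = re.compile(r"^.+(?<!\.)$", flags=re.M)
print(lookbehind.findall(sample))  # ['Second line!', 'X', 'Last line']
```

Both patterns agree on the text used in this example; they differ only on very short lines such as X.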