# Word Boundaries

We will now learn about another special sequence that you can create using the backslash:

* **\b** 

This special sequence doesn't match any characters; instead, it matches a position in the string called a word boundary. A word in this context is a sequence of alphanumeric characters (Python's `re` module also counts the underscore as a word character). A boundary is the position between a word character and a non-word character, such as whitespace or punctuation, or between a word character and the beginning or end of the string. We can have boundaries either before or after a word. Let's see how this works with an example.
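
To make the idea of a boundary as a position more concrete, here is a minimal sketch that lists where `\b` matches in a short string. The strings `'a subclass.'` and `'my_class class'` are made up for illustration and are not part of the exercises in this notebook.

```python
import re

text = 'a subclass.'

# \b matches an empty string, so every match has zero width.
# We record only the index at which each boundary sits.
boundary_positions = [m.start() for m in re.finditer(r'\b', text)]
print(boundary_positions)

# The underscore counts as a word character in Python's re module,
# so there is no boundary between '_' and 'class' in 'my_class'.
underscore_matches = re.findall(r'\bclass\b', 'my_class class')
print(underscore_matches)
```

The first list should be `[0, 1, 2, 10]`: one boundary on each side of `a`, one before `subclass`, and one between `subclass` and the period. There is no boundary at the very end of the string, because the last character is a period rather than a word character. The second list should contain a single `'class'`, the standalone one.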

In the code below we have a sentence that contains the word `class` in three different positions:

1. As a standalone word, `class`, with whitespace both before and after the word.
2. At the beginning of a word, `classroom`, with a whitespace before the word.
3. At the end of a word, `subclass`, with a whitespace after the word.

If we use `class` as our regular expression, we will match the word `class` in all three positions, as shown in the code below:

In [1]:
import re

sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

regex = re.compile(r'class')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

<_sre.SRE_Match object; span=(12, 17), match='class'>
<_sre.SRE_Match object; span=(47, 52), match='class'>
<_sre.SRE_Match object; span=(85, 90), match='class'>


We can see that we have three matches. Now, let's use \\b to only find the word `class` when it has a boundary directly before it:

In [2]:
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

regex = re.compile(r'\bclass')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

<_sre.SRE_Match object; span=(12, 17), match='class'>
<_sre.SRE_Match object; span=(47, 52), match='class'>


We can see that now we only have two matches. It's matching the standalone word, `class`, and the `class` in `classroom`, because each has whitespace directly before it. We can also see that it is not matching the `class` in `subclass`, because there is no word boundary directly before it.
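
One detail of the patterns used here is the `r` prefix, which makes them raw strings. The sketch below, which uses a shortened version of the sample sentence as an assumption for brevity, suggests why the prefix matters for `\b` in particular: in an ordinary Python string, `'\b'` is the backspace character, so the regex engine never sees a word-boundary sequence.

```python
import re

sample_text = 'The biology class will meet in the first floor classroom.'

# Raw string: the regex engine receives a backslash followed by 'b',
# which it interprets as a word boundary.
raw_matches = re.findall(r'\bclass', sample_text)

# Plain string: Python converts '\b' into a backspace character before
# the regex engine sees the pattern, so nothing in the sentence matches.
plain_matches = re.findall('\bclass', sample_text)

print(raw_matches)
print(plain_matches)
```

The raw-string pattern should find two matches, while the plain-string pattern should return an empty list.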

Now, let's use \\b to only find the word `class` when it has a boundary directly after it:

In [3]:
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

regex = re.compile(r'class\b')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

<_sre.SRE_Match object; span=(12, 17), match='class'>
<_sre.SRE_Match object; span=(85, 90), match='class'>


We can see that now we have two matches as well. It's matching the standalone word, `class`, again, and the `class` in `subclass`, because each has whitespace directly after it. We can also see that it is not matching the `class` in `classroom`, because there is no word boundary directly after it.

Now, let's use \\b to only find the word `class` when it has a boundary directly before and after:

In [4]:
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

regex = re.compile(r'\bclass\b')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

<_sre.SRE_Match object; span=(12, 17), match='class'>


We can see that now we have only one match. This match corresponds to the standalone word, `class`, because it is the only occurrence in the sentence that has whitespace both before and after it.
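
In the sentence above the boundaries happen to be whitespace, but as the definition at the start of this section says, any non-alphanumeric character can form a boundary. The following sketch uses a made-up sentence (an assumption, not part of the original exercises) to illustrate this with punctuation.

```python
import re

# Hypothetical sentence in which 'class' is delimited by punctuation.
text = 'First class, second-class; (class) subclass classes.'

regex = re.compile(r'\bclass\b')

# Collect the (start, end) span of every whole-word match.
spans = [match.span() for match in regex.finditer(text)]
print(spans)
print('Whole-word matches:', len(spans))
```

This should report three whole-word matches: `class,`, the `class` in `second-class;`, and `(class)`. The occurrences inside `subclass` and `classes` are skipped, because each has a word character on one side.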

# TODO: Find all 3-letter Words

In the code below, write a regular expression that matches all 3-letter words in `sample_text`. Then use the `.finditer()` method to search the `sample_text` string for the regex. Finally, write a loop to print the `matches`.

In [5]:
# import re module
import re

sample_text = 'John went to the store in his car, but forgot to buy bread.'

# Write a regex that matches all 3-letter words
regex = re.compile(r'\b\w\w\w\b')

# Use the .finditer method to find the above regex
matches = regex.finditer(sample_text)

# Write a loop to print the matches
for match in matches:
    print(match)

<_sre.SRE_Match object; span=(13, 16), match='the'>
<_sre.SRE_Match object; span=(26, 29), match='his'>
<_sre.SRE_Match object; span=(30, 33), match='car'>
<_sre.SRE_Match object; span=(35, 38), match='but'>
<_sre.SRE_Match object; span=(49, 52), match='buy'>
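
The printed representation of a match object differs between Python versions, so it can be more convenient to pull out the matched text and its position directly. A minimal sketch, assuming the same `sample_text` and regex as in the cell above, using the match object's `.group()` and `.span()` methods:

```python
import re

sample_text = 'John went to the store in his car, but forgot to buy bread.'

regex = re.compile(r'\b\w\w\w\b')

# .group() returns the matched text; .span() returns (start, end) indices.
words = [match.group() for match in regex.finditer(sample_text)]
spans = [match.span() for match in regex.finditer(sample_text)]

for word, span in zip(words, spans):
    print(word, span)
```

The words and spans should agree with the match objects printed above: `the`, `his`, `car`, `but`, and `buy`.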


As with the other special sequences that we saw before, we also have the uppercase version of \\b:

* **\B** 

This sequence does the opposite of \\b, only matching when the current position is not at a word boundary. Let's see how this works:
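
Before applying this to the sample sentence, a small sketch can suggest how the two sequences complement each other. The string `'a sub.'` is made up for illustration; every position in it should be matched by exactly one of `\b` or `\B`.

```python
import re

text = 'a sub.'

# Positions that are word boundaries.
b_positions = [m.start() for m in re.finditer(r'\b', text)]

# Positions that are NOT word boundaries.
B_positions = [m.start() for m in re.finditer(r'\B', text)]

print(r'\b matches at:', b_positions)
print(r'\B matches at:', B_positions)
```

Here `\b` should match at positions 0, 1, 2, and 5, and `\B` at positions 3, 4, and 6. Together they cover every position from 0 to `len(text)` with no overlap.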

Let's use \\B to only find the word `class` when it doesn't have a boundary directly before it:

In [6]:
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

regex = re.compile(r'\Bclass')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

<_sre.SRE_Match object; span=(85, 90), match='class'>


We can see that we only have one match. This is because the `class` in `subclass` is the only one that doesn't have a boundary directly before it.

Similarly, let's use \\B to only find the word `class` when it doesn't have a boundary directly after it:

In [7]:
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

regex = re.compile(r'class\B')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

<_sre.SRE_Match object; span=(47, 52), match='class'>


We can see that again we only have one match. This is because the `class` in `classroom` is the only one that doesn't have a boundary directly after it.

Finally, let's use \\B to only find the word `class` when it has no boundary directly before it and none directly after it:

In [8]:
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

regex = re.compile(r'\Bclass\B')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

We can see that we have no matches in this case. This is because every instance of the word `class` in our sentence has a boundary either before or after it. In order to have a match in this case, the word `class` would have to be in the middle of a word, such as in the word `declassified`. Let's see an example:

In [9]:
sample_text = 'declassified'

regex = re.compile(r'\Bclass\B')

matches = regex.finditer(sample_text)

for match in matches:
    print(match)

<_sre.SRE_Match object; span=(2, 7), match='class'>


# TODO: Finding Last Digits

In the code below, we have a string with some numbers. Write code that counts how many numbers in this string are greater than 3 and have 3 as their last digit. For example, 93 is greater than 3 and its last digit is 3, so your code should count this number as a match. However, the number 3 by itself should not be counted as a match. Write a regular expression to help you do this. Then use the `.finditer()` method to search the `sample_text` string for the regex. Finally, write a loop to print and count the number of `matches`.

In [10]:
# import re module
import re

sample_text = '203 3 403 687 283 234 983 345 23 3 74 978'

# Write a regex that matches 3 at the end of a number greater than 3
regex = re.compile(r'\B3\b')

# Use the .finditer method to find the above regex
matches = regex.finditer(sample_text)

num_matches = 0

# Write a loop to print the matches
for match in matches:
    print(match)
    num_matches += 1
    
# Print the total number of matches    
print('\nTotal Number of Matches:', num_matches)

<_sre.SRE_Match object; span=(2, 3), match='3'>
<_sre.SRE_Match object; span=(8, 9), match='3'>
<_sre.SRE_Match object; span=(16, 17), match='3'>
<_sre.SRE_Match object; span=(24, 25), match='3'>
<_sre.SRE_Match object; span=(31, 32), match='3'>

Total Number of Matches: 5
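
As a sanity check on the count above, the same question can be answered without word boundaries by converting each token to an integer. This is only a sketch of one possible cross-check, and it assumes the numbers in `sample_text` are separated by whitespace, as they are here.

```python
import re

sample_text = '203 3 403 687 283 234 983 345 23 3 74 978'

# Count using the regex from the exercise: a 3 that ends a number
# and is preceded by at least one other word character.
regex_count = sum(1 for _ in re.finditer(r'\B3\b', sample_text))

# Count using plain arithmetic: greater than 3 and last digit equal to 3.
numbers = [int(token) for token in sample_text.split()]
arithmetic_count = sum(1 for n in numbers if n > 3 and n % 10 == 3)

print('Regex count:     ', regex_count)
print('Arithmetic count:', arithmetic_count)
```

Both approaches should report 5, corresponding to the numbers 203, 403, 283, 983, and 23.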
