# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [2]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [3]:
text = "This is going to be a sentence with a good number of vowels in it."

In [6]:
pattern1 = '[aeiou]'  # Matches any single lowercase vowel
re.findall(pattern1, text)

['i', 'i', 'o', 'i', 'o', 'e', 'a', 'e', 'e', 'e', 'i',
 'a', 'o', 'o', 'u', 'e', 'o', 'o', 'e', 'i', 'i']
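The character class `[aeiou]` only covers lowercase letters. As a small sketch of how uppercase vowels could be included, the snippet below passes the `re.IGNORECASE` flag; the `sample` string is made up for illustration and is not part of the lab.

```python
import re

# Hypothetical sample containing uppercase vowels (not from the lab text).
sample = "An Owl is up"

# Without a flag, only lowercase vowels match.
vowels_lower = re.findall(r'[aeiou]', sample)

# re.IGNORECASE makes the same class match 'A' and 'O' as well.
vowels_any = re.findall(r'[aeiou]', sample, flags=re.IGNORECASE)

print(vowels_lower)
print(vowels_any)
```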

### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [11]:
text2 = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [16]:
pattern2 = 'pupp[a-z]+'
re.findall(pattern2, text2)

['puppy', 'puppies', 'puppy']
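A pattern such as `pupp[a-z]+` accepts any lowercase letters after "pupp", so it would also pick up unrelated words. One possible stricter variant, sketched here with a made-up second sentence, spells out the two endings with a non-capturing alternation.

```python
import re

text2 = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

# Only "puppy" or "puppies", delimited by word boundaries.
strict_pattern = r'\bpupp(?:y|ies)\b'
matches = re.findall(strict_pattern, text2)

# Hypothetical sentence showing where the looser pattern over-matches.
other = "a puppet and a puppy"
loose = re.findall(r'pupp[a-z]+', other)
strict = re.findall(strict_pattern, other)

print(matches)
print(loose, strict)
```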

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [17]:
text3 = "I ran the relay race the only way I knew how to run it."

In [18]:
pattern3 = 'r[a-z]n'
re.findall(pattern3, text3)

['ran', 'run']

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [25]:
pattern4 = '\\br[a-z]+'
re.findall(pattern4, text3)

['ran', 'relay', 'race', 'run']
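The word boundary `\b` is what restricts matches to words that start with "r". The sketch below uses a made-up sentence (the lab text happens to have no "r" in the middle of a word) to show the difference, and checks that `'\\b'` in a regular string and `r'\b'` in a raw string spell the same pattern.

```python
import re

# Hypothetical sentence with "r" in the middle of several words.
sample = "three rivers are rarely dry"

# Without \b the match can start at any "r", even mid-word.
without_boundary = re.findall(r'r[a-z]+', sample)

# With \b the "r" must be the first character of a word.
with_boundary = re.findall(r'\br[a-z]+', sample)

# A doubled backslash in a regular string equals a raw-string backslash.
same_pattern = '\\br[a-z]+' == r'\br[a-z]+'

print(without_boundary)
print(with_boundary)
```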

### 5. Use a regular expression to find the exclamation marks in the text below and replace each one with the letter "i".

In [27]:
text5 = "Th!s !s a sentence w!th spec!al characters !n !t."

In [29]:
pattern5 = r'!'  # "!" has no special meaning in a regex, so it needs no escaping
re.sub(pattern5, 'i', text5)

'This is a sentence with special characters in it.'
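As a quick check on the substitution, `re.subn` returns the new string together with the number of replacements made. The sketch below also compares the result with plain `str.replace`, which is enough when the thing being replaced is a fixed character.

```python
import re

text5 = "Th!s !s a sentence w!th spec!al characters !n !t."

# re.subn returns a (new_string, number_of_substitutions) tuple.
fixed, n_replaced = re.subn(r'!', 'i', text5)

# For a literal character, str.replace gives the same string.
same_as_replace = fixed == text5.replace('!', 'i')

print(fixed)
print(n_replaced)
```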

### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [30]:
text6 = "This sentence has words of varying lengths."

In [31]:
pattern6 = r"[\w']{5,}"  # raw string, so \w is passed to the regex engine unchanged
print(re.findall(pattern6, text6))

['sentence', 'words', 'varying', 'lengths']


### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [32]:
text7 = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [42]:
pattern7 = r'\bb\w+t\b'
re.findall(pattern7, text7)

['bet', 'beat', 'bot', 'bat', 'but', 'bit']

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [35]:
text8 = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."

In [37]:
pattern8 = r'\w*ea\w*|\w*eo\w*'
re.findall(pattern8, text8)

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']


### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [43]:
text9 = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [44]:
pattern9 = '[A-Z][a-z]+'
re.findall(pattern9, text9)

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']

### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [45]:
pattern10 = '([A-Z][a-z]+ ?[A-Z][a-z]+)'
re.findall(pattern10, text9)

['Teddy Roosevelt', 'Abraham Lincoln']
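The pattern above handles runs of exactly two capitalized words. A possible generalization to runs of two or more is sketched below with a repeated non-capturing group; the `sample` sentence is invented for illustration.

```python
import re

text9 = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

# One capitalized word followed by one or more " Capitalized" words.
run_pattern = r'[A-Z][a-z]+(?: [A-Z][a-z]+)+'
pairs = re.findall(run_pattern, text9)

# Hypothetical sentence with a three-word name and a lone capitalized word.
sample = "Franklin Delano Roosevelt met Abraham Lincoln and Teddy."
runs = re.findall(run_pattern, sample)

print(pairs)
print(runs)
```

A single capitalized word such as "Teddy" in the sample is skipped because the group must repeat at least once.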

### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [46]:
text11 = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'


In [60]:
pattern11 = '"([^"]*)"'
re.findall(pattern11, text11)

['I will bet you $50 I can get the bartender to give me a free drink.',
 'I am in!']
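The negated class `[^"]*` stops at the next quotation mark, which is why each quote is captured separately. The sketch below contrasts a greedy `.*`, which runs from the first quotation mark to the last, with the lazy `.*?`, which gives the same result as the negated class on this text.

```python
import re

text11 = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

# Greedy: a single match spanning both quotes and the text between them.
greedy = re.findall(r'"(.*)"', text11)

# Lazy: stops at the first closing quotation mark each time.
lazy = re.findall(r'"(.*?)"', text11)

# Negated character class, as used in the exercise.
negated = re.findall(r'"([^"]*)"', text11)

print(len(greedy), len(lazy))
```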

### 12. Use a regular expression to find and extract all the numbers from the text below.

In [61]:
text12 = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [63]:
pattern12 = r'\d+'  # one or more consecutive digits
# pattern12 = r'\d'  # would extract every single digit separately
re.findall(pattern12, text12)

['30', '30', '14', '16', '10']
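`re.findall` returns strings, so the matches need converting before any arithmetic. A minimal sketch of that step, along with the digit-by-digit alternative for comparison:

```python
import re

text12 = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."

# \d+ groups consecutive digits into whole numbers (still strings).
number_strings = re.findall(r'\d+', text12)
numbers = [int(s) for s in number_strings]

# \d alone matches one digit at a time.
single_digits = re.findall(r'\d', text12)

print(numbers, sum(numbers))
print(len(single_digits))
```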

### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [64]:
text13 = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [65]:
pattern13 = r'\d{3}-\d{2}-\d{4}'
re.findall(pattern13, text13)

['876-93-2289', '098-32-5295']

### 14. Use a regular expression to find and extract all the phone numbers from the previous text.

In [68]:
pattern14 = r'(\(\d{3}\)\d{3}-\d{4})'
re.findall(pattern14, text13)

['(847)789-0984', '(987)222-0901']

### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text above.

In [74]:
pattern15 = r'(\d{3}-\d{2}-\d{4})|(\(\d{3}\)\d{3}-\d{4})'
re.findall(pattern15, text13)

[('876-93-2289', ''),
 ('', '(847)789-0984'),
 ('098-32-5295', ''),
 ('', '(987)222-0901')]

In [76]:
result15 = [i for j in re.findall(pattern15, text13) for i in j if i != '']
result15

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
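The tuples appear because `re.findall` returns one entry per capturing group when the pattern contains groups. One alternative, sketched below, is to drop the capturing parentheses so that `findall` returns the full match directly and no flattening step is needed.

```python
import re

text13 = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# No capturing groups: findall returns each whole match as a string.
# \( and \) are escaped literal parentheses, not groups.
pattern_flat = r'\d{3}-\d{2}-\d{4}|\(\d{3}\)\d{3}-\d{4}'
formatted_numbers = re.findall(pattern_flat, text13)

print(formatted_numbers)
```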