# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [4]:
import re

* https://www.programiz.com/python-programming/regex

##### Test regex
* https://www.regexpal.com/97161

#### search or match?
* https://docs.python.org/3/library/re.html#matching-vs-searching
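
The difference between the two functions is easiest to see on a short string. The snippet below is a minimal sketch; `sample` is a made-up string used only for illustration.

```python
import re

sample = "I ran the relay race"  # hypothetical sample string

# re.match only tries the pattern at the very start of the string
match_result = re.match(r"ran", sample)

# re.search scans forward and returns the first match found anywhere
search_result = re.search(r"ran", sample)

print(match_result)           # None, because the string starts with "I"
print(search_result.group())  # ran
print(search_result.span())   # (2, 5)
```

In these exercises `re.findall` is used instead, because it returns every non-overlapping match rather than only the first one.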

#### Regex cheatsheet
* https://www.debuggex.com/cheatsheet/regex/python

### 1. Use a regular expression to find and extract all vowels in the following text.

In [2]:
text = "This is going to be a sentence with a good number of vowels in it."

In [3]:
regex = '[aeiou]'

In [4]:
re.findall(regex,text)

['i',
 'i',
 'o',
 'i',
 'o',
 'e',
 'a',
 'e',
 'e',
 'e',
 'i',
 'a',
 'o',
 'o',
 'u',
 'e',
 'o',
 'o',
 'e',
 'i',
 'i']

### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [6]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [7]:
regex = r'pupp(?:y|ies)'  # only "puppy" or "puppies"; [yies]+ would also accept strings like "puppsy"

In [8]:
re.findall(regex,text)

['puppy', 'puppies', 'puppy']

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [11]:
text = "I ran the relay race the only way I knew how to run it."

In [12]:
regex = 'r[au]n'

In [13]:
re.findall(regex,text)

['ran', 'run']

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

###### To match a word we fix its starting letter, in this case 'r', and then use `\S*` to consume non-whitespace characters up to the end of the word.

* `\S` = any character except whitespace (newline, tab, space)
* `*` = zero or more repetitions of the preceding token

In [14]:
regex = r'r\S*'

In [15]:
re.findall(regex,text)

['ran', 'relay', 'race', 'run']
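
One caveat: a pattern that simply starts with `r` can also match an "r" in the middle of a word, and `\S*` will swallow trailing punctuation. The sketch below compares it with a word-boundary version (`\b` plus `\w*`) on a made-up sentence; `sample`, `loose` and `anchored` are illustrative names, not part of the lab.

```python
import re

# Hypothetical sentence with an "r" inside and at the end of some words
sample = "The other runner reran the relay."

# Matches from any "r" up to the next whitespace
loose = re.findall(r"r\S*", sample)

# \b requires the "r" to sit at the start of a word; \w* stops at punctuation
anchored = re.findall(r"\br\w*", sample)

print(loose)     # ['r', 'runner', 'reran', 'relay.']
print(anchored)  # ['runner', 'reran', 'relay']
```

On the sentence used in this exercise both patterns give the same result, because no word there has an "r" after its first letter.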

### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [368]:
# text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [17]:
text = "This is a sentence with special characters in it."

In [22]:
regex = 'i'

In [23]:
"!".join(re.split(regex,text))

'Th!s !s a sentence w!th spec!al characters !n !t.'

In [45]:
# Another way to do this:
text = "This is a sentence with special characters in it."
pattern = "i"
replace = '!'
re.sub(pattern, replace, text)

'Th!s !s a sentence w!th spec!al characters !n !t.'
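
Read literally, the exercise asks for the opposite direction: putting the letter "i" back in place of the exclamation marks. A minimal sketch of that reading, starting from the commented-out string above (the names `masked` and `restored` are made up for illustration):

```python
import re

masked = "Th!s !s a sentence w!th spec!al characters !n !t."

# "!" has no special meaning in a regex, so it needs no escaping
restored = re.sub(r"!", "i", masked)

print(restored)  # This is a sentence with special characters in it.
```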

### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [39]:
text = "This sentence has words of varying lengths."

In [46]:
regex = r'\w{5,}'

In [47]:
re.findall(regex,text)

['sentence', 'words', 'varying', 'lengths']

### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [48]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [49]:
regex = r'b[a-z]+t'

In [50]:
re.findall(regex,text)

['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [51]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."


In [54]:
regex = r'\w*e[ao]\w*'

In [55]:
re.findall(regex, text)

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']

### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [56]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [57]:
regex = r'[A-Z]\w*'

In [58]:
re.findall(regex, text)

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']

### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [59]:
regex = r'\w*[A-Z]{2,}\w*'


In [62]:
text = "BBeavers BTint aoij asop nfueifp KLOp in WYinodins"

In [63]:
re.findall(regex, text)

['BBeavers', 'BTint', 'KLOp', 'WYinodins']
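
The cells above find words that contain two or more consecutive capital letters. The exercise can also be read as asking for runs of adjacent capitalized words in the sentence from exercise 9. A minimal sketch of that reading, assuming the words in a run are separated by a single whitespace character (`sentence`, `pattern` and `runs` are illustrative names):

```python
import re

sentence = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

# One capitalized word, followed by one or more
# "whitespace + capitalized word" repetitions.
# (?:...) is non-capturing, so findall returns whole matches.
pattern = r"[A-Z]\w*(?:\s[A-Z]\w*)+"

runs = re.findall(pattern, sentence)
print(runs)  # ['Teddy Roosevelt', 'Abraham Lincoln']
```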

### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [46]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'


In [16]:
#https://www.regextester.com/3269

regex = '("([^"]|"")*")'

In [21]:
quotation_extractor = re.findall(regex, text)

print(quotation_extractor)

quotation_extractor = [tup[0] for tup in quotation_extractor]

print("\n")
print(quotation_extractor)

[('"I will bet you $50 I can get the bartender to give me a free drink."', '.'), ('"I am in!"', '!')]


['"I will bet you $50 I can get the bartender to give me a free drink."', '"I am in!"']
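
`re.findall` returns tuples here because the pattern contains two capturing groups, which is why the list comprehension is needed afterwards. As a sketch of an alternative, the same pattern with the outer group removed and the inner one made non-capturing (`(?:...)`) should return the quotes directly; `quotes` is an illustrative name.

```python
import re

text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

# No capturing groups, so findall yields plain strings
quotes = re.findall(r'"(?:[^"]|"")*"', text)

for q in quotes:
    print(q)
```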


In [49]:
#alternative: take (") then any character (.) one or more times (+), but as few characters as possible (?), then (")


In [51]:
regex = '".+?"'

In [52]:
re.findall(regex, text)

['"I will bet you $50 I can get the bartender to give me a free drink."',
 '"I am in!"']
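
The `?` after `+` matters whenever a string holds more than one quote. A small sketch on a made-up sentence (`sample`, `greedy` and `lazy` are illustrative names) shows the difference:

```python
import re

sample = 'He said "yes" and she said "no".'  # hypothetical sample string

# Greedy: .+ runs to the last quotation mark in the string
greedy = re.findall(r'".+"', sample)

# Lazy: .+? stops at the first closing quotation mark it can
lazy = re.findall(r'".+?"', sample)

print(greedy)  # ['"yes" and she said "no"']
print(lazy)    # ['"yes"', '"no"']
```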

In [53]:
#alternative2: take (") then anything that is not (") one or more times (that is what [^"]+ means), then (")
regex = '"[^"]+"'

In [54]:
re.findall(regex, text)

['"I will bet you $50 I can get the bartender to give me a free drink."',
 '"I am in!"']

* https://www.regular-expressions.info/brackets.html

### 12. Use a regular expression to find and extract all the numbers from the text below.

In [23]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [22]:
regex = r"\d+"

In [24]:
re.findall(regex, text)

['30', '30', '14', '16', '10']

### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [35]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [28]:
regex = r'\d{3}-\d{2}-\d{4}'

In [29]:
re.findall(regex, text)

['876-93-2289', '098-32-5295']

### 14. Use a regular expression to find and extract all the phone numbers from the text above.

In [30]:
regex = r'\(\d{3}\)\d{3}-\d{4}'

In [31]:
re.findall(regex, text)

['(847)789-0984', '(987)222-0901']

### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text above.

In [43]:
regex = r'\d{3}-\d{2}-\d{4}|\(\d{3}\)\d{3}-\d{4}'

In [44]:
re.findall(regex, text)

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
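
If it is useful to know which kind of number each match is, one possible extension is to wrap each alternative in a named group and read `lastgroup` from the match objects. This is only a sketch; the group names `ssn` and `phone` and the variable `labeled` are made up for illustration.

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# Each alternative gets its own named group
pattern = re.compile(
    r"(?P<ssn>\d{3}-\d{2}-\d{4})|(?P<phone>\(\d{3}\)\d{3}-\d{4})"
)

# m.lastgroup is the name of the group that matched
labeled = [(m.lastgroup, m.group()) for m in pattern.finditer(text)]

for kind, value in labeled:
    print(kind, value)
```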