# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [1]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [249]:
text = "This is going to be a sentence with a good number of vowels in it."

In [251]:
# Find every lowercase vowel in the string and collect the matches in a list
vowels = re.findall(r"[aeiou]", text)
vowels

['i',
 'i',
 'o',
 'i',
 'o',
 'e',
 'a',
 'e',
 'e',
 'e',
 'i',
 'a',
 'o',
 'o',
 'u',
 'e',
 'o',
 'o',
 'e',
 'i',
 'i']
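The character class `[aeiou]` is lowercase only, so a capital vowel would be skipped. The lab text happens to contain none, but as a minimal sketch (the string `sample` is a hypothetical example, not part of the lab), the `re.IGNORECASE` flag widens the match:

```python
import re

# Hypothetical sample with capital vowels; not part of the lab text.
sample = "An Owl sat In the Elm."

# Without the flag, only lowercase vowels are returned.
lowercase_only = re.findall(r"[aeiou]", sample)

# With re.IGNORECASE, capital vowels are returned as well.
any_case = re.findall(r"[aeiou]", sample, flags=re.IGNORECASE)

print(lowercase_only)  # ['a', 'e']
print(any_case)        # ['A', 'O', 'a', 'I', 'e', 'E']
```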

### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [226]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [227]:
# Search the whole string for "pupp" followed by either "y" or "ies".
# A group is needed for the alternation: a character class such as [y|ies] matches only one character.
puppies = re.findall(r"pupp(?:y|ies)", text)
puppies
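A common pitfall with this kind of pattern is writing the alternation inside square brackets. Inside a character class, `|` is a literal character and the class matches exactly one character, so partial words slip through. A small comparison, assuming a made-up string `sample` chosen to expose the difference:

```python
import re

# Hypothetical sample; "pupp|" and "puppet" are there to expose the pitfall.
sample = "puppy pupp| puppies puppet"

# Character class: "pupp" plus ONE of the characters y, |, i, e, s.
with_class = re.findall(r"pupp[y|ies]", sample)

# Non-capturing group: "pupp" plus either "y" or "ies".
with_group = re.findall(r"pupp(?:y|ies)", sample)

print(with_class)  # ['puppy', 'pupp|', 'puppi', 'puppe']
print(with_group)  # ['puppy', 'puppies']
```

The group is non-capturing (`(?:...)`) because `re.findall` returns only the group contents when a pattern contains a capturing group.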

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [191]:
text = "I ran the relay race the only way I knew how to run it."

In [192]:
# Search the whole string for the whole words "run" or "ran"
run_tenses = re.findall(r"\br[ua]n\b", text)
run_tenses

['ran', 'run']

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [28]:
# Search the string for whole words that begin with a lowercase "r"
word_starting_with_r = re.findall(r"\br\w*", text)
word_starting_with_r

['ran', 'relay', 'race', 'run']
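When searching a whole string rather than a list of words, the pattern needs a word boundary (`\b`) to tie the "r" to the start of a word. A short sketch with a hypothetical string `sample` that has an "r" in the middle of a word:

```python
import re

# Hypothetical sample; "from" contains an "r" that is not at the start.
sample = "The rider ran far from the river."

# Unanchored: also matches the tail of "from".
anywhere = re.findall(r"r[a-z]+", sample)

# Anchored with \b: only words that start with "r".
at_word_start = re.findall(r"\br[a-z]+", sample)

print(anywhere)       # ['rider', 'ran', 'rom', 'river']
print(at_word_start)  # ['rider', 'ran', 'river']
```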

### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [41]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [43]:
# Replace every exclamation mark in the text with the letter "i"
not_amazed_text = re.sub(r"!", "i", text)
not_amazed_text

'This is a sentence with special characters in it.'
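If it is useful to know how many substitutions were made, `re.subn` returns the new string together with a count. A minimal sketch on the same sentence (the names `fixed_text` and `replacements_made` are illustrative):

```python
import re

text = "Th!s !s a sentence w!th spec!al characters !n !t."

# re.subn works like re.sub but also reports the number of replacements.
fixed_text, replacements_made = re.subn(r"!", "i", text)

print(fixed_text)         # This is a sentence with special characters in it.
print(replacements_made)  # 6
```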

### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [213]:
text = "This sentence has words of varying lengths."

In [215]:
# Search the string for whole words of five or more word characters (that is, longer than 4).
# The word boundaries keep trailing punctuation such as the final period out of the matches.
four_range_words = re.findall(r"\b\w{5,}\b", text)
four_range_words

['sentence', 'words', 'varying', 'lengths']
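The quantifier matters when the pattern is applied to the whole string: `\w{5}` matches exactly five characters and truncates longer words, while `\w{5,}` between word boundaries returns each long word in full. A brief comparison on the same sentence:

```python
import re

text = "This sentence has words of varying lengths."

# Exactly five word characters: longer words are cut off after five.
exactly_five = re.findall(r"\w{5}", text)

# Five or more word characters, delimited by word boundaries.
five_or_more = re.findall(r"\b\w{5,}\b", text)

print(exactly_five)  # ['sente', 'words', 'varyi', 'lengt']
print(five_or_more)  # ['sentence', 'words', 'varying', 'lengths']
```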

### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [218]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [219]:
# Search the string for whole words that contain a "b", one or more word characters, and then a "t".
# Matching on word characters only leaves periods and commas out of the results.
b_blablabla_t = re.findall(r"\w*b\w+t\w*", text)
b_blablabla_t

['bet', 'robot', 'beat', 'bot', 'bat', 'but', 'bit']

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [229]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."


In [231]:
# Search the string for whole words that contain the sequence "ea" or "eo"
e_groups = re.findall(r"\w*e[ao]\w*", text)
e_groups

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']

### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [113]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [83]:
# Search the string for whole words that start with an uppercase letter followed by lowercase letters
capitalized_words = re.findall(r"\b[A-Z][a-z]+\b", text)
capitalized_words

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']

### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [121]:
# Look for a word that starts with an uppercase letter, followed by a single whitespace character
# and another word that starts with an uppercase letter
consecutive_capitalized_words = re.findall(r'[A-Z]\w+\s[A-Z]\w+', text)
consecutive_capitalized_words

['Teddy Roosevelt', 'Abraham Lincoln']
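The pattern above captures pairs, which is enough for this text. For names of three or more words, one possible generalization is to repeat the "whitespace plus capitalized word" part. A sketch, assuming a hypothetical string `sample` with a three-word name:

```python
import re

# Hypothetical sample with a three-word name; not part of the lab text.
sample = "Franklin Delano Roosevelt met Abraham Lincoln in a bar."

# Pairs only: the third word of a longer run is dropped.
pairs_only = re.findall(r"[A-Z]\w+\s[A-Z]\w+", sample)

# One capitalized word followed by one or more further capitalized words.
full_runs = re.findall(r"[A-Z]\w+(?:\s[A-Z]\w+)+", sample)

print(pairs_only)  # ['Franklin Delano', 'Abraham Lincoln']
print(full_runs)   # ['Franklin Delano Roosevelt', 'Abraham Lincoln']
```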

### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [244]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'


In [245]:
# Look for everything enclosed in double quotes.
# The "?" makes ".*" lazy, so each match stops at the nearest closing quote.
quotes = re.findall(r'".*?"', text)
quotes

['"I will bet you $50 I can get the bartender to give me a free drink."',
 '"I am in!"']
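To see why the lazy quantifier is needed, it helps to compare it with the greedy form on the same text. The negated character class `[^"]*` is a common alternative that gives the same result here. A minimal comparison:

```python
import re

text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

# Greedy: ".*" runs to the LAST quote, so both quotes merge into one match.
greedy = re.findall(r'".*"', text)

# Lazy: ".*?" stops at the nearest closing quote.
lazy = re.findall(r'".*?"', text)

# Negated class: anything except a double quote between the two quotes.
negated = re.findall(r'"[^"]*"', text)

print(len(greedy))      # 1
print(len(lazy))        # 2
print(lazy == negated)  # True
```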

### 12. Use a regular expression to find and extract all the numbers from the text below.

In [247]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [248]:
# Search the string for every run of one or more digits
numbers = re.findall(r"\d+", text)
numbers

['30', '30', '14', '16', '10']
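Matching token by token only checks how each token starts, so punctuation attached to a number stays in the result. Searching the whole string with `\d+` returns the digits alone, which can then be converted with `int`. A sketch, assuming a hypothetical string `sample` where the numbers are followed by punctuation:

```python
import re

# Hypothetical sample; both numbers are followed by punctuation.
sample = "Room 12, seats 30."

# Token-based filter: keeps the whole token, punctuation included.
tokens = [token for token in sample.split(" ") if re.match(r"[0-9]", token)]

# Whole-string search: returns only the runs of digits.
digits = re.findall(r"\d+", sample)
as_integers = [int(number) for number in digits]

print(tokens)       # ['12,', '30.']
print(digits)       # ['12', '30']
print(as_integers)  # [12, 30]
```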

### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [146]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [151]:
# Look for digits in the social security number format: 3 digits, 2 digits and 4 digits, separated by hyphens
social_security_numbers = re.findall(r'\d{3}-\d{2}-\d{4}', text)
social_security_numbers

['876-93-2289', '098-32-5295']
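On messier input, the bare pattern can also match inside a longer run of digits. One way to guard against that, shown here as a sketch rather than a required part of the exercise, is to add lookarounds that reject a digit on either side. The string `sample` is a hypothetical example:

```python
import re

# Hypothetical sample: the first number has an extra digit on each side.
sample = "ID 1876-93-22895 and SSN 098-32-5295"

# Bare pattern: also matches inside the longer number.
bare = re.findall(r"\d{3}-\d{2}-\d{4}", sample)

# Lookarounds: no digit may come directly before or after the match.
guarded = re.findall(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)", sample)

print(bare)     # ['876-93-2289', '098-32-5295']
print(guarded)  # ['098-32-5295']
```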

### 14. Use a regular expression to find and extract all the phone numbers from the text below.

In [152]:
# Look for digits in the phone number format: a 3-digit area code in parentheses, 3 digits, a hyphen and 4 digits
phone_numbers = re.findall(r'\(\d{3}\)\d{3}-\d{4}', text)
phone_numbers

['(847)789-0984', '(987)222-0901']

### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text below.

In [153]:
# Look for anything in either the social security number format or the phone number format
all_formatted_numbers = re.findall(r'\d{3}-\d{2}-\d{4}|\(\d{3}\)\d{3}-\d{4}', text)
all_formatted_numbers

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
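The combined result does not say which kind of number each match is. One possible extension is to name the two alternatives and read `lastgroup` from each match object. The group names `ssn` and `phone` and the variable names are illustrative choices, not part of the exercise:

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# Each alternative is wrapped in a named group so it can be identified later.
formatted_number = re.compile(
    r"(?P<ssn>\d{3}-\d{2}-\d{4})"
    r"|(?P<phone>\(\d{3}\)\d{3}-\d{4})"
)

# lastgroup holds the name of the group that produced the match.
labelled = [(match.lastgroup, match.group()) for match in formatted_number.finditer(text)]

for kind, value in labelled:
    print(kind, value)
# ssn 876-93-2289
# phone (847)789-0984
# ssn 098-32-5295
# phone (987)222-0901
```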