## Regular Expressions (Regex) Exercises

In [1]:
# Imports
import pandas as pd
import re

##### 1. Write a function named is_vowel. It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value that indicates whether or not the regular expression matches the given string.

In [2]:
def is_vowel(string):
    regex = r'^[aeiouAEIOU]$'
    return bool(re.search(regex, string))

print(is_vowel("A"))
print(is_vowel("a"))
print(is_vowel("X"))
print(is_vowel("aeiou")) #multiple vowels

True
True
False
False
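
The note in exercise 1 about treating the result of re.search as a boolean can be seen directly: re.search returns a match object when the pattern matches and None when it does not. A minimal sketch (the variable names `hit` and `miss` are only illustrative):

```python
import re

vowel = r'^[aeiouAEIOU]$'

# re.search returns a match object on success and None on failure.
hit = re.search(vowel, 'e')
miss = re.search(vowel, 'x')

# A match object is truthy and None is falsy, so bool() maps them to True/False.
print(hit is not None, bool(hit))
print(miss is None, bool(miss))
```

This is why `bool(re.search(regex, string))` in is_vowel returns a plain True or False rather than a match object.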


##### 2. Write a function named is_valid_username that accepts a string as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the _ character. It should also be no longer than 32 characters. The function should return either True or False depending on whether the passed string is a valid username.

In [3]:
def is_valid_username(string):
    return bool(re.search(r'^[a-z][a-z0-9_]{,31}$', string))

print(is_valid_username("password123"))
print(is_valid_username("PASSWORD123")) #invalid uppercase characters
print(is_valid_username("abcdefghijklmnopqrstuvwxyz1234567890")) #too long
print(is_valid_username("password!@#$%")) #invalid special characters

True
False
False
False
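
One subtlety worth knowing: in Python, `$` also matches just before a trailing newline, so the anchored re.search version accepts a username that ends in "\n". The sketch below compares it with re.fullmatch, which must consume the entire string. The helper name `is_valid_username_strict` is hypothetical and not part of the exercise.

```python
import re

def is_valid_username_strict(string):
    # fullmatch requires the pattern to cover the whole string,
    # so no ^ or $ anchors are needed.
    return re.fullmatch(r'[a-z][a-z0-9_]{0,31}', string) is not None

# The anchored search version from the exercise, for comparison.
with_search = bool(re.search(r'^[a-z][a-z0-9_]{,31}$', 'codeup\n'))
with_fullmatch = is_valid_username_strict('codeup\n')

print(with_search, with_fullmatch)  # True False

# Length boundary: 32 characters is allowed, 33 is not.
print(is_valid_username_strict('a' * 32), is_valid_username_strict('a' * 33))  # True False
```

Whether the trailing-newline case matters depends on where the usernames come from; for input read line by line from a file it often does.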


##### 3. Write a regular expression to capture phone numbers. It should match all of the following:
(210) 867 5309  
+1 210.867.5309  
867-5309  
210-867-5309  

In [4]:
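# Optional "+1" prefix, optional area code (with or without parentheses), then the 7-digit number.
# Separators may be a space, a dot, or a hyphen.
phone_number = re.compile(r'(?:(\+1)[ .-])?(?:\(?(\d{3})\)?[ .-])?(\d{3})[ .-]?(\d{4})')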

df = pd.DataFrame()
df['Phone_Number'] = [
    '(210) 867 5309',
    '+1 210.867.5309',
    '867-5309',
    '210-867-5309']
df

      Phone_Number
0   (210) 867 5309
1  +1 210.867.5309
2         867-5309
3     210-867-5309
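
The cell above only displays the sample numbers. As a rough check that one pattern can match all four formats, here is a sketch that uses re.VERBOSE and named groups. The group names (`country_code`, `area_code`, `exchange`, `line_number`) are illustrative choices, and this pattern is just one of many that would work.

```python
import re

phone_pattern = re.compile(
    r'''^
    (?:(?P<country_code>\+1)[ .-])?        # optional +1 prefix and a separator
    (?:\(?(?P<area_code>\d{3})\)?[ .-])?   # optional area code, parentheses optional
    (?P<exchange>\d{3})[ .-]?              # first three digits of the local number
    (?P<line_number>\d{4})                 # last four digits
    $''',
    re.VERBOSE,
)

examples = [
    '(210) 867 5309',
    '+1 210.867.5309',
    '867-5309',
    '210-867-5309',
]

matches = [phone_pattern.search(number) for number in examples]
for number, match in zip(examples, matches):
    # Print whether the number matched and the local part that was captured.
    print(number, '->', match is not None, match.group('exchange'), match.group('line_number'))
```

Making the area code optional as a unit (digits plus separator) is what lets the 7-digit form `867-5309` match.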


##### 4. Use regular expressions to convert the dates below to the standardized year-month-day format.
02/04/19  
02/05/19  
02/06/19  
02/07/19  
02/08/19  
02/09/19  
02/10/19  

In [5]:
dates = [
    '02/04/19',
    '02/05/19',
    '02/06/19',
    '02/07/19',
    '02/08/19',
    '02/09/19',
    '02/10/19',
]

dates = [re.sub(r'(\d+)/(\d+)/(\d+)', r'20\3-\1-\2', date) for date in dates]
dates

['2019-02-04',
 '2019-02-05',
 '2019-02-06',
 '2019-02-07',
 '2019-02-08',
 '2019-02-09',
 '2019-02-10']
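
The substitution above hard-codes the century as "20". As a sanity check, the same conversion can be done with datetime.strptime, which also rejects impossible dates. This is only a cross-check sketch, not a replacement for the regex the exercise asks for.

```python
import re
from datetime import datetime

raw_dates = ['02/04/19', '02/05/19', '02/10/19']

# Regex approach: reorder the captured month, day, and year groups.
via_regex = [re.sub(r'(\d+)/(\d+)/(\d+)', r'20\3-\1-\2', d) for d in raw_dates]

# datetime approach: parse month/day/two-digit year, then format as ISO.
via_datetime = [datetime.strptime(d, '%m/%d/%y').strftime('%Y-%m-%d') for d in raw_dates]

print(via_regex)
print(via_regex == via_datetime)  # True
```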

##### 5. Write a regex to extract the various parts of these logfile lines:

GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
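
No solution is recorded for exercise 5, so the following is one possible sketch. The field names (`method`, `path`, `timestamp`, `http_version`, `status`, `bytes_sent`, `user_agent`, `ip`) are assumptions based on how the lines look; the exercise itself does not name the parts.

```python
import re

log_pattern = re.compile(
    r'''^
    (?P<method>[A-Z]+)\s+                 # HTTP method, e.g. GET or POST
    (?P<path>\S+)\s+                      # request path including any query string
    \[(?P<timestamp>[^\]]+)\]\s+          # everything between the square brackets
    (?P<http_version>HTTP/\d\.\d)\s+      # protocol version
    \{(?P<status>\d{3})\}\s+              # status code between curly braces
    (?P<bytes_sent>\d+)\s+                # a number, assumed here to be the response size
    "(?P<user_agent>[^"]*)"\s+            # quoted user agent string
    (?P<ip>\d{1,3}(?:\.\d{1,3}){3})       # IPv4 address
    $''',
    re.VERBOSE,
)

lines = [
    'GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58',
    'POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58',
    'GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58',
]

parsed = [log_pattern.search(line).groupdict() for line in lines]
for record in parsed:
    print(record['method'], record['path'], record['status'], record['ip'])
```

With pandas, the same pattern could likely be passed to `Series.str.extract` to get one column per named group.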

##### 6. You can find a list of words on your Mac at /usr/share/dict/words. Use this file to answer the following questions:

In [6]:
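# The squeeze= keyword of read_csv was removed in pandas 2.0; squeezing the one-column frame is equivalent.
dictionary_words = pd.read_csv('/usr/share/dict/words', header=None).squeeze('columns').dropna()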
dictionary_words = dictionary_words.str.lower()
dictionary_words

0                  a
1                  a
2                 aa
3                aal
4              aalii
             ...    
235881        zythem
235882        zythia
235883        zythum
235884       zyzomys
235885    zyzzogeton
Name: 0, Length: 235884, dtype: object

- How many words have at least 3 vowels?

In [7]:
at_least_3_vowels = (dictionary_words.str.count(r"[aeiou]")  >= 3)
at_least_3_vowels.sum()

191365

- How many words have at least 3 vowels in a row?

In [8]:
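# Note: str.count tallies non-overlapping matches, so a word with two separate
# runs of vowels is counted twice; str.contains(...).sum() would count words instead.
dictionary_words.str.count(r"[aeiou]{3}").sum()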

6251
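
The difference between counting matches and counting words can be seen on a tiny sample. This sketch uses plain `re` on a hand-picked list rather than the dictionary file: `findall` plays the role of `str.count` and `search` the role of `str.contains`.

```python
import re

three_vowels = re.compile(r'[aeiou]{3}')
sample = ['beauteous', 'queue', 'cat']

# Total number of non-overlapping matches across all words.
# 'beauteous' contributes two: 'eau' and 'eou'.
total_matches = sum(len(three_vowels.findall(word)) for word in sample)

# Number of words that contain at least one match.
words_with_run = sum(bool(three_vowels.search(word)) for word in sample)

print(total_matches, words_with_run)  # 3 2
```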

- How many words have at least 4 consonants in a row?

In [9]:
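# Rows 1 and 2 of the output are the words with at least one run of 4+ consonants:
# 18881 + 360 = 19241 words. Note that [^aeiou] also matches any non-letter character.
dictionary_words.str.count(r'[^aeiou]{4,}').value_counts()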

0    216643
1     18881
2       360
Name: 0, dtype: int64
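
Because `[^aeiou]` matches anything that is not a lowercase vowel, digits and punctuation count as "consonants" too. A sketch of the difference using an explicit consonant class on a hand-picked sample (treating "y" as a consonant is an assumption here):

```python
import re

loose = re.compile(r'[^aeiou]{4,}')              # any four non-vowel characters
strict = re.compile(r'[b-df-hj-np-tv-z]{4,}')    # four lowercase consonant letters

sample = ['strength', 'catch-22', 'banana']

# Count words, not matches, by testing each word once with search().
loose_count = sum(bool(loose.search(word)) for word in sample)
strict_count = sum(bool(strict.search(word)) for word in sample)

print(loose_count, strict_count)  # 2 1
```

In 'catch-22' the loose pattern matches 'tch-22', while the strict pattern only sees the three letters 'tch'.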

- How many words start and end with the same letter?

In [10]:
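import warnings
# str.contains warns when a pattern has capture groups; the group is required here for the \1 backreference.
warnings.filterwarnings("ignore", category=UserWarning)
dictionary_words.str.contains(r'^(.).*\1$').sum()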

11452
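
The pattern `^(.).*\1$` needs at least two characters, so single-letter words such as "a" are not counted. Whether they should be is a matter of interpretation. The sketch below shows a variant that includes them; the name `inclusive` is illustrative only.

```python
import re

two_plus = re.compile(r'^(.).*\1$')          # first character repeated at the end
inclusive = re.compile(r'^(.)(?:.*\1)?$')    # also accepts one-character words

sample = ['a', 'area', 'level', 'apple']

two_plus_count = sum(bool(two_plus.search(word)) for word in sample)
inclusive_count = sum(bool(inclusive.search(word)) for word in sample)

print(two_plus_count, inclusive_count)  # 2 3
```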

- How many words start and end with a vowel?

In [11]:
dictionary_words.str.contains(r'^[aeiou].*[aeiou]$').sum()

14657

- How many words contain the same letter 3 times in a row?

In [12]:
dictionary_words[dictionary_words.str.contains(r'(.)\1\1')]

24988             bossship
50636      demigoddessship
78498          goddessship
82997     headmistressship
140481       patronessship
230262            wallless
231688           whenceeer
Name: 0, dtype: object

- What other interesting patterns in words can you find?

In [13]:
# Words containing "cei", i.e. the "except after c" half of the "i before e" rule
dictionary_words[dictionary_words.str.contains(r"cei")]

11795       apperceive
12514     archdeceiver
28764       calceiform
32143            ceiba
32144            ceibo
              ...     
221429     unreceipted
221430    unreceivable
221431      unreceived
221432     unreceiving
225782       urceiform
Name: 0, Length: 156, dtype: object