# Regular Expressions Exercises

### Imports

In [1]:
import pandas as pd
import re

#### 1. Write a function named is_vowel. It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value that indicates whether or not the regular expression matches the given string.

In [2]:
# create function to determine if string is a vowel

def is_vowel(x): 
   
    '''
    This function accepts a string input and uses regex to determine if it is a vowel.
    '''
    
    # regex for vowel check
    regex = r'^[aeiouAEIOU]$'
    
    # use search() method
    if(re.search(regex, x)):
        result=True
        
    else:
        result=False
        
    return result

In [3]:
is_vowel('a')

True

In [4]:
is_vowel('E')

True

In [5]:
is_vowel('b')

False

In [6]:
is_vowel('C')

False

#### 2. Write a function named is_valid_username that accepts a string as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the _ character. It should also be no longer than 32 characters. The function should return either True or False depending on whether the passed string is a valid username.

- is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
    - False
- is_valid_username('codeup')
    - True
- is_valid_username('Codeup')
    - False
- is_valid_username('codeup123')
    - True
- is_valid_username('1codeup')
    - False

In [7]:
# create function to check for valid usernames

def is_valid_username(username):
    
    '''
    This function accepts a string as input and checks to see if username: 
    starts with a lowercase letter, 
    only consists of lowercase letters, numbers, or the _ character,
    is no longer than 32 characters. 
    Returns True if valid and False if invalid. 
    
    '''
    
    # regex for valid username parameters: one lowercase letter,
    # then up to 31 more lowercase letters, digits, or underscores (32 max)
    regex = r'^[a-z][a-z0-9_]{0,31}$'
    
    # use search() method
    if(re.search(regex, username)):
        result=True
        
    else:
        result=False
        
    return result

In [8]:
is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')

False

In [9]:
is_valid_username('codeup')

True

In [10]:
is_valid_username('Codeup')

False

In [11]:
is_valid_username('codeup123')

True

In [12]:
is_valid_username('1codeup')

False

#### 3. Write a regular expression to capture phone numbers. It should match all of the following:

- (210) 867 5309
- +1 210.867.5309
- 867-5309
- 210-867-5309

In [13]:
# the question mark metacharacter makes the item to its left optional, e.g. [0-9]?

In [14]:
# use findall to return all matches 
re.findall(r'[(]\d{3}[)]\s[0-9]{3}\s[0-9]{4}', '(210) 867 5309')

['(210) 867 5309']

In [15]:
# use findall to return all matches 
re.findall(r'[+]\d\s\d{3}[.][0-9]{3}[.][0-9]{4}', '+1 210.867.5309')

['+1 210.867.5309']

In [16]:
# use findall to return all matches 
re.findall(r'[0-9]{3}[-][0-9]{4}', '867-5309')

['867-5309']

In [18]:
# use findall to return all matches 
re.findall(r'[0-9]{3}[-][0-9]{3}[-][0-9]{4}', '210-867-5309')

['210-867-5309']
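
The cells above use a separate pattern for each format, while the exercise asks for one expression that matches all four. A minimal sketch of one possible combined pattern follows. The name `phone_regex` is chosen here for illustration and is not part of the original notebook, and the pattern has only been checked against the four sample numbers, not against phone formats in general.

```python
import re

# Hypothetical combined pattern (an assumption, not the notebook's own solution)
phone_regex = re.compile(
    r'(?:\+\d\s)?'                    # optional country code such as "+1 "
    r'(?:\(\d{3}\)\s|\d{3}[-.\s])?'   # optional area code: "(210) " or "210-" / "210."
    r'\d{3}[-.\s]\d{4}'               # local number: "867-5309", "867.5309", "867 5309"
)

numbers = ['(210) 867 5309', '+1 210.867.5309', '867-5309', '210-867-5309']

# fullmatch requires the whole string to fit the pattern
matches = [phone_regex.fullmatch(number) is not None for number in numbers]
print(matches)
```

Building the pattern from optional groups keeps the seven-digit local number mandatory while letting the country code and area code drop out independently.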

#### 4. Use regular expressions to convert the dates below to the standardized year-month-day format.

- 02/04/19
- 02/05/19
- 02/06/19
- 02/07/19
- 02/08/19
- 02/09/19
- 02/10/19

In [19]:
# re.sub allows us to match a regex and substitute in a new substring for the match
# re.sub(pattern, repl, string, count=0, flags=0)

In [20]:
# Create list of dates
dates = [
    '02/04/19',
    '02/05/19',
    '02/06/19',
    '02/07/19',
    '02/08/19',
    '02/09/19',
    '02/10/19']

In [21]:
[re.sub(r'(\d+)/(\d+)/(\d+)', r'20\3-\1-\2', date) for date in dates]

['2019-02-04',
 '2019-02-05',
 '2019-02-06',
 '2019-02-07',
 '2019-02-08',
 '2019-02-09',
 '2019-02-10']

#### 5. Write a regex to extract the various parts of these logfile lines:

- GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58

- POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58

- GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
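
One way to approach this exercise is a single pattern with named groups, so that each part of a line can be read back by name. The sketch below is an assumption about how the fields divide: the group names (`method`, `path`, `timestamp`, `http_version`, `status`, `bytes_sent`, `user_agent`, `ip`) are made up for illustration, and the pattern has only been checked against the three sample lines.

```python
import re

# Hypothetical field names; the pattern assumes every line follows the sample layout
log_regex = re.compile(
    r'^(?P<method>[A-Z]+)\s+'              # HTTP method, e.g. GET or POST
    r'(?P<path>\S+)\s+'                    # request path including any query string
    r'\[(?P<timestamp>[^\]]+)\]\s+'        # everything between the square brackets
    r'(?P<http_version>HTTP/\d\.\d)\s+'    # protocol version
    r'\{(?P<status>\d{3})\}\s+'            # status code inside curly braces
    r'(?P<bytes_sent>\d+)\s+'              # response size
    r'"(?P<user_agent>[^"]*)"\s+'          # quoted user agent string
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3})$'    # IPv4 address at the end of the line
)

line = ('GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} '
        '510348 "python-requests/2.21.0" 97.105.19.58')

# groupdict() returns the named groups as a dictionary
parts = log_regex.search(line).groupdict()
print(parts)
```

Since the result is a plain dictionary, a list of parsed lines could be passed to `pd.DataFrame` for further analysis.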

#### Bonus Exercise

#### You can find a list of words on your Mac at /usr/share/dict/words. Use this file to answer the following questions:


- How many words have at least 3 vowels?
- How many words have at least 3 vowels in a row?
- How many words have at least 4 consonants in a row?
- How many words start and end with the same letter?
- How many words start and end with a vowel?
- How many words contain the same letter 3 times in a row?
- What other interesting patterns in words can you find?
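
The questions above can each be phrased as one pattern and counted with the same helper. The following is a rough sketch under a few assumptions: `PATTERNS` and `count_matches` are names invented here, `y` is treated as a consonant, matching is case-insensitive, and a single-letter word counts as starting and ending with the same letter. The sample list is made up so the code runs without the dictionary file.

```python
import re

# Hypothetical patterns, one per question (assumes y is a consonant)
PATTERNS = {
    'at least 3 vowels': r'(?:[aeiou].*){3}',
    '3 vowels in a row': r'[aeiou]{3}',
    '4 consonants in a row': r'[b-df-hj-np-tv-z]{4}',
    'same first and last letter': r'^(.)(?:.*\1)?$',
    'starts and ends with a vowel': r'^[aeiou](?:.*[aeiou])?$',
    'same letter 3 times in a row': r'(.)\1\1',
}

def count_matches(words, pattern):
    '''Count how many words contain a match for the pattern, ignoring case.'''
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(1 for word in words if regex.search(word))

# Made-up sample; to use the real file, something like:
# words = open('/usr/share/dict/words').read().split()
words = ['queue', 'rhythm', 'area', 'level', 'bookkeeper', 'aaa', 'strengths']

counts = {question: count_matches(words, pattern)
          for question, pattern in PATTERNS.items()}
print(counts)
```

Keeping the patterns in a dictionary makes it easy to add entries for the open-ended last question.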

#### Extra Regex Stuff: 

In [48]:
#https://docs.python.org/3/library/re.html
#>>> def dashrepl(matchobj):
#...     if matchobj.group(0) == '-': return ' '
#...     else: return '-'
#>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
#'pro--gram files'
#>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
#'Baked Beans & Spam'

In [49]:
def forward_slash_replace(matchobj):
    if matchobj.group(0) == '/': return '-'
    else: return ' '

In [50]:
re.sub('/{1,3}', forward_slash_replace, '02/04/19')

'02-04-19'