In [107]:
import re
import pandas as pd

**1) Write a function named is_vowel.** It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value that indicates whether or not the regular expression matches the given string.
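
The hint can be checked directly: `re.search` returns a `Match` object (truthy) when the pattern is found and `None` (falsy) when it is not. The snippet below also shows why an unanchored search is not enough on its own for this exercise. The variable names are only for illustration.

```python
import re

# re.search returns None (falsy) when the pattern is not found...
no_match = re.search(r'[aeiou]', 'b', re.IGNORECASE)
# ...and a Match object (truthy) when it is found anywhere in the string
found_anywhere = bool(re.search(r'[aeiou]', 'bat'))  # True, although 'bat' is not a vowel
# Anchoring the pattern (or using re.fullmatch) requires the whole string to be one vowel
anchored = bool(re.search(r'^[aeiou]$', 'bat'))       # False
whole_string = bool(re.fullmatch(r'[aeiou]', 'A', re.IGNORECASE))  # True

print(no_match is None, found_anywhere, anchored, whole_string)  # prints: True True False True
```

`re.fullmatch` is slightly stricter than `^...$`, because `$` also matches just before a trailing newline, so `re.search(r'^[aeiou]$', 'a\n')` still finds a match.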

In [13]:
def is_vowel(var):
    '''
    Takes in a string and returns True if it is a single vowel, False otherwise.
    Must be a single character to return True; ignores case.
    Prints an error to the screen and returns None if a string was not passed.
    '''
    if not isinstance(var, str):
        print('Error: expected a string')
        return None
    # re.search returns a Match object (truthy) or None; also require exactly one character
    return bool(re.search(r'[aeiou]', var, re.IGNORECASE)) and len(var) == 1

In [14]:
is_vowel('a')

True

In [15]:
is_vowel('A')

True

In [16]:
is_vowel('Ab')

False

In [20]:
is_vowel('aA')

False

In [18]:
is_vowel('b')

False

In [19]:
is_vowel('B')

False

**2) Write a function named is_valid_username** that accepts a string as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the _ character. It should also be no longer than 32 characters. The function should return either True or False depending on whether the passed string is a valid username.

TEST:
- is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') returns False
- is_valid_username('codeup') returns True
- is_valid_username('Codeup') returns False
- is_valid_username('codeup123') returns True
- is_valid_username('1codeup') returns False


In [34]:
def is_valid_username(username):
    '''
    Takes in a string and checks whether it is a valid username.
    Returns True if the username is valid; otherwise prints the reasons
    it is invalid and returns False.
    Prints an error and returns None if a string is not passed.
    
    A username is valid if:
        It starts with a lowercase letter
        It only contains lowercase letters, numbers and the _ character
        It is 32 characters or fewer
    '''
    # error out if not passed a string
    if not isinstance(username, str):
        print('Error: expected a string')
        return None
    # initialize the error message string
    err_msg = ''
    # must start with a lowercase letter (re.match also handles the empty string safely)
    if not re.match(r'[a-z]', username):
        err_msg += "Username must start with a lowercase letter.\n"
    
    # anything other than lowercase letters, numbers or the _ character is invalid
    if re.search(r'[^0-9a-z_]', username):
        err_msg += "Username can only contain lowercase letters, numbers and the _ character (underscore).\n"
        
    # must be 32 characters or fewer
    if len(username) > 32:
        err_msg += "Username must be 32 characters or less.\n"
    
    # return True if we didn't find any errors
    if err_msg == '':
        return True
    # otherwise print the error message and return False
    print(err_msg)
    return False

In [24]:
is_valid_username('codeup_3')

True

In [32]:
is_valid_username('codeup123')

True

In [25]:
is_valid_username('Codeup')

Username must start with a lowercase letter.
Username can only contain lowercase letters, numbers and the _ character (underscore).



False

In [27]:
is_valid_username('codeup!')

Username can only contain lowercase letters, numbers and the _ character (underscore).



False

In [29]:
is_valid_username('1codeup')

Username must start with a lowercase letter.



False

In [31]:
is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')

Username must be 32 characters or less.



False

In [33]:
is_valid_username(2)

Error: expected a string
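
Because every rule is about the shape of the string, the three checks can also be collapsed into one anchored pattern with `re.fullmatch`. This is a hypothetical variant (`is_valid_username_regex` is not part of the exercise): it gives up the per-rule error messages and returns False, rather than None, for non-string input.

```python
import re

def is_valid_username_regex(username):
    """Hypothetical one-pattern variant of is_valid_username.

    A lowercase letter followed by up to 31 more lowercase letters, digits,
    or underscores, so the whole username is at most 32 characters.
    """
    if not isinstance(username, str):
        return False
    # fullmatch anchors both ends, so extra characters anywhere make it fail
    return re.fullmatch(r'[a-z][a-z0-9_]{0,31}', username) is not None

for name in ['codeup', 'Codeup', 'codeup123', '1codeup', 'a' * 33]:
    print(repr(name), is_valid_username_regex(name))
```

Using the explicit class `[0-9]` instead of `\d` keeps non-ASCII digits out, since `\d` matches any Unicode decimal digit in Python 3 string patterns.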


**3) Write a regular expression to capture phone numbers.** It should match all of the following:

In [92]:
phones = '''a (210) 867 5309\n
b+1 210.867.5309\n 
c867-5309:
d210-867-5309
e980 - 4356q\n
'''

In [36]:
def show_all_matches(regexes, subject, re_length=6):
    print('Sentence:')
    print()
    print('    {}'.format(subject))
    print()
    print(' regexp{} | matches'.format(' ' * (re_length - 6)))
    print(' ------{} | -------'.format(' ' * (re_length - 6)))
    for regexp in regexes:
        fmt = ' {:<%d} | {!r}' % re_length
        matches = re.findall(regexp, subject)
        if len(matches) > 8:
            matches = matches[:8] + ['...']
        print(fmt.format(regexp, matches))


In [98]:
# 1) Test \d{3}[\.| \- | ]\d{4}: not bad, but it doesn't find ' - '
# 2) Try allowing an optional space on either side of the separator
# 3) Keep #2, which handles the last 7 digits, and try to add the first 3
show_all_matches([
    r'\d{3}',
    r'LAST 7',
    r'\d{3}[\.| \- | ]\d{4}',
    r'\d{3} {0,1}[\.|\-] {0,1}\d{4}',
    r'FIRST 3',
    r'\(\d{3}\)|\d{3}',
    r'COMBINE',
    r'(\(\d{3}\)|\d{3})?[ \.]?\d{3} {0,1}[\.|\-] {0,1}\d{4}',
    r'(\(\d{3}\)|\d{3})[ \.]\d{3} {0,1}[\.|\-] {0,1}\d{4}',
    r'BREAK',
    r'\(\d{3}\) \d{3}[\.| \- | ]\d{4}',
    r'\+1 \d{3}[ \.]\d{3}[\.| \- | ]\d{4}',
    r'\d{3}\-\d{4}',
    r'\d{3}\.\d{4}',
    r'\d{3}\-\d{3}\-\d{4}',
    r'(\+\d{1,2} )?(\(\d{3}\)|\d{3})?[ \.\-]?(\d{3} {0,1}[\.|\-] {0,1}\d{4})',
    r'CAPTURE EACH PIECE',
    r'(\+1 )?\(?(\d{3})?\)?[ \.\-]?(\d{3}) {0,1}[\.|\-] {0,1}(\d{4})'
], phones)
# For the optional first 3 digits >>> (\(\d{3}\)|\d{3})?
# Went ahead and split this up to capture only the numbers. I don't want to split it up further to strip the '(' from the capture, because then it would allow:
#   "(###"  or "###)"
# Also, we don't really need the

Sentence:

    a (210) 867 5309

b+1 210.867.5309
 
c867-5309:
d210-867-5309
e980 - 4356q

b+34 210.867.5309

 regexp | matches
 ------ | -------
 \d{3}  | ['210', '867', '530', '210', '867', '530', '867', '530', '...']
 LAST 7 | []
 \d{3}[\.| \- | ]\d{4} | ['867 5309', '867.5309', '867-5309', '867-5309', '867.5309']
 \d{3} {0,1}[\.|\-] {0,1}\d{4} | ['867.5309', '867-5309', '867-5309', '980 - 4356', '867.5309']
 FIRST 3 | []
 \(\d{3}\)|\d{3} | ['(210)', '867', '530', '210', '867', '530', '867', '530', '...']
 COMBINE | []
 (\(\d{3}\)|\d{3})?[ \.]?\d{3} {0,1}[\.|\-] {0,1}\d{4} | ['210', '', '', '', '210']
 (\(\d{3}\)|\d{3})[ \.]\d{3} {0,1}[\.|\-] {0,1}\d{4} | ['210', '210']
 BREAK  | []
 \(\d{3}\) \d{3}[\.| \- | ]\d{4} | ['(210) 867 5309']
 \+1 \d{3}[ \.]\d{3}[\.| \- | ]\d{4} | ['+1 210.867.5309']
 \d{3}\-\d{4} | ['867-5309', '867-5309']
 \d{3}\.\d{4} | ['867.5309', '867.5309']
 \d{3}\-\d{3}\-\d{4} | ['210-867-5309']
 (\+\d{1,2} )?(\(\d{3}\)|\d{3})?[ \.\-]?(\d{3} {0,1}[\.|\-] {0,1}\d{4}
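
One way to consolidate the exploration above into a single pattern is sketched below; it is an assumption-laden sketch, not an official answer to the exercise. It uses named groups (the names `country`, `area`, `exchange`, and `line` are my own) and a conditional group, `(?(open)\))`, so that an opening parenthesis around the area code must be closed. That addresses the "(###" / "###)" concern noted in the comments while still capturing only the digits.

```python
import re

# Hypothetical consolidated pattern; group names are illustrative
PHONE_RE = re.compile(
    r'(?:\+(?P<country>\d{1,2}) ?)?'                # optional country code, e.g. "+1 "
    r'(?:(?P<open>\()?(?P<area>\d{3})(?(open)\))'   # optional area code; a "(" must be closed by ")"
    r'[ .\-]*)?'                                    # separator(s) after the area code
    r'(?P<exchange>\d{3})'                          # 3-digit exchange
    r' ?[.\- ] ?'                                   # ".", "-" or " ", optionally padded by spaces
    r'(?P<line>\d{4})'                              # 4-digit line number
)

for sample in ['(210) 867 5309', '+1 210.867.5309', '867-5309', '210-867-5309', '980 - 4356']:
    match = PHONE_RE.fullmatch(sample)
    print(sample, '->', match.groupdict() if match else None)
```

`fullmatch` validates one candidate string at a time; to pull every number out of a longer text such as `phones`, `PHONE_RE.finditer(...)` can be used instead.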

**4) Use regular expressions to convert the dates below to the standardized year-month-day format.**

- 02/04/19
- 02/05/19
- 02/06/19
- 02/07/19
- 02/08/19
- 02/09/19
- 02/10/19

In [105]:
test = '02/04/19'
mn, dy, yr = re.split(r'\/',test)

new_date = yr+'-'+mn+'-'+dy
new_date

'19-02-04'

In [112]:
# Alternative: to convert every value in a Series, use .str.replace with capture groups
dates = pd.Series(['02/04/19', '02/05/19', '02/06/19','02/07/19', '02/08/19', '02/09/19', '02/10/19'])
# The pattern captures month, day and year as groups 1-3; in the replacement, \3, \1 and \2 put them in year-month-day order, with '20' prepended on the assumption that every year is in the 2000s
dates.str.replace(r'(\d+)/(\d+)/(\d+)', r'20\3-\1-\2',regex=True)

0    2019-02-04
1    2019-02-05
2    2019-02-06
3    2019-02-07
4    2019-02-08
5    2019-02-09
6    2019-02-10
dtype: object
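
The same idea works on a plain Python list without pandas, using `re.sub` with named groups and `\g<name>` references in the replacement. This is a minimal sketch: `to_year_month_day` and its `century` parameter are hypothetical helpers, and like the pandas version it assumes every two-digit year falls in the same century.

```python
import re

# MM/DD/YY with named groups (the group names are illustrative)
DATE_RE = re.compile(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{2})')

def to_year_month_day(date_str, century='20'):
    """Rewrite MM/DD/YY as YYYY-MM-DD, assuming all years share one century."""
    # \g<name> refers to a named group; it avoids ambiguity when digits follow a group number
    return DATE_RE.sub(century + r'\g<year>-\g<month>-\g<day>', date_str)

dates = ['02/04/19', '02/05/19', '02/06/19', '02/07/19', '02/08/19', '02/09/19', '02/10/19']
converted = [to_year_month_day(d) for d in dates]
print(converted)
```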

**5) Write a regex to extract the various parts of these logfile lines:**

GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58


In [115]:
lines = '''
GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
'''
line = '''GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58'''

In [122]:
match = re.search(r'(?P<method>[^ ]*) (?P<path>[^ ]*) \[(?P<date_time>[^ ]*)\]', line)
match.groups()

('GET', '/api/v1/sales?page=86', '16/Apr/2019:193452+0000')

In [123]:
match.group('method')

'GET'

## TO DO - finish this log parser
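
One way to finish the parser is to give every field its own named group and apply the pattern line by line with `re.MULTILINE`. This sketch is based only on the three sample lines above: the group names (`method`, `path`, `timestamp`, `http_version`, `status_code`, `bytes_out`, `user_agent`, `ip`) are my own, and it assumes the user agent never contains a double quote and the client address is always IPv4.

```python
import re

# One named group per field; the field names are illustrative, not a standard log format
LOG_RE = re.compile(
    r'^(?P<method>[A-Z]+)\s+'                # HTTP method, e.g. GET or POST
    r'(?P<path>\S+)\s+'                      # request path, including the query string
    r'\[(?P<timestamp>[^\]]+)\]\s+'          # everything between the square brackets
    r'(?P<http_version>HTTP/\d\.\d)\s+'      # protocol version
    r'\{(?P<status_code>\d{3})\}\s+'         # status code inside curly braces
    r'(?P<bytes_out>\d+)\s+'                 # response size
    r'"(?P<user_agent>[^"]*)"\s+'            # quoted user agent
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3})$',     # IPv4 address at the end of the line
    re.MULTILINE,
)

log_text = '''GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58'''

# groupdict() turns each match into {group name: captured text}
records = [m.groupdict() for m in LOG_RE.finditer(log_text)]
for record in records:
    print(record['method'], record['path'], record['status_code'])
```

Passing `records` to `pd.DataFrame` gives one column per named group. Every value is still a string, so columns such as `status_code` and `bytes_out` would need converting before doing arithmetic on them.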

**BONUS** You can find a list of words on your Mac at /usr/share/dict/words. Use this file to answer the following questions:
- How many words have at least 3 vowels?
- How many words have at least 3 vowels in a row?
- How many words have at least 4 consonants in a row?
- How many words start and end with the same letter?
- How many words start and end with a vowel?
- How many words contain the same letter 3 times in a row?
- What other interesting patterns in words can you find?
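
Each bonus question maps onto one pattern tested with `re.search`. The sketch below rests on a few stated assumptions: words are lower-cased before matching, "y" counts as a consonant, a one-letter word counts as starting and ending with the same letter (and, if it is a vowel, as starting and ending with a vowel), and the default path is the macOS location given above. `PATTERNS`, `load_words` and `count_matches` are hypothetical names.

```python
import re
from pathlib import Path

# One pattern per question; assumes lower-cased input and treats "y" as a consonant
PATTERNS = {
    'at least 3 vowels': r'[aeiou].*[aeiou].*[aeiou]',
    '3 vowels in a row': r'[aeiou]{3}',
    '4 consonants in a row': r'[b-df-hj-np-tv-z]{4}',
    'same first and last letter': r'^([a-z])(?:.*\1)?$',      # \1 repeats the first letter
    'starts and ends with a vowel': r'^[aeiou](?:.*[aeiou])?$',
    'same letter 3 times in a row': r'([a-z])\1\1',
}

def load_words(path='/usr/share/dict/words'):
    """Read one word per line, skipping blank lines."""
    text = Path(path).read_text(encoding='utf-8')
    return [w.strip() for w in text.splitlines() if w.strip()]

def count_matches(words, patterns=PATTERNS):
    """Return {question: number of words in which the pattern is found}."""
    compiled = {name: re.compile(p) for name, p in patterns.items()}
    return {name: sum(1 for w in words if rx.search(w.lower()))
            for name, rx in compiled.items()}
```

On a Mac, `count_matches(load_words())` returns one count per question; any other newline-separated word list can be passed to `load_words` instead.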