# REGULAR EXPRESSIONS


In [1]:
import re
import pandas as pd

### Do your work for this exercise in a file named regex_exercises.

**1. Write a function named is_vowel. It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value that indicates whether or not the regular expression matches the given string.**

In [2]:
re.search(r"^(a|e|i|o|u)$", "a", re.IGNORECASE)

<re.Match object; span=(0, 1), match='a'>

In [3]:
re.search(r"^(a|e|i|o|u)$", "aeiou", re.IGNORECASE)

In [4]:
# Function to check whether a string is a single vowel
def is_vowel(string):
    return bool(re.search(r"^[aeiou]$", string, re.IGNORECASE))


assert is_vowel("a") == True
assert is_vowel("E") == True
assert is_vowel("aaa") == False
assert is_vowel("aeiou") == False
assert is_vowel("r") == False


# Print a message if all assertions passed
print("All assertions passed.")


All assertions passed.


**2. Write a function named is_valid_username that accepts a string as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the _ character. It should also be no longer than 32 characters. The function should return either True or False depending on whether the passed string is a valid username.**


- is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
False
- is_valid_username('codeup')
True
- is_valid_username('Codeup')
False
- is_valid_username('codeup123')
True
- is_valid_username('1codeup')
False

In [5]:
# Function to validate a username
def is_valid_username(string):
    regexp_usnm = r"^[a-z][a-z0-9_]{,31}$"
    return bool(re.search(regexp_usnm, string))

# Test the function with example usernames
assert is_valid_username("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa") == False
assert is_valid_username("codeup") == True
assert is_valid_username("Codeup") == False
assert is_valid_username("codeup123") == True
assert is_valid_username("1codeup") == False

# Print a message if all assertions passed
print("All assertions passed.")


All assertions passed.
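
One subtlety with the username pattern above: in Python, `$` also matches just before a trailing newline, so `"codeup\n"` passes the check. The sketch below is one possible stricter variant, assuming trailing newlines should be rejected. The name `is_valid_username_strict` is hypothetical and not part of the exercise.

```python
import re


def is_valid_username_strict(string):
    # re.fullmatch requires the entire string to match, so no ^ or $ is
    # needed and a trailing newline is rejected.
    return bool(re.fullmatch(r"[a-z][a-z0-9_]{0,31}", string))


print(is_valid_username_strict("codeup"))
print(is_valid_username_strict("codeup\n"))
# For comparison, the $-anchored pattern accepts the trailing newline
print(bool(re.search(r"^[a-z][a-z0-9_]{,31}$", "codeup\n")))
```

Whether the stricter behavior matters depends on where the usernames come from; input read line by line from a file often still carries its newline.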


**3. Write a regular expression to capture phone numbers. It should match all of the following:**


- (210) 867 5309
- +1 210.867.5309
- 867-5309
- 210-867-5309

In [6]:
# Regular expression to capture phone numbers, with one named group per part
phone_numbers = [
    '(210) 867 5309',
    '+1 210.867.5309',
    '867-5309',
    '210-867-5309'
]
# Parentheses around the area code are optional and are not checked for balance
phone_regex = (
    r'^(?:\+?(?P<country_code>\d{1,2})\s)?'    # optional country code, e.g. "+1 "
    r'(?:\(?(?P<area_code>\d{3})\)?[-.\s]?)?'  # optional area code
    r'(?P<exchange>\d{3})[-.\s]?'              # three-digit exchange
    r'(?P<line>\d{4})$'                        # four-digit line number
)

# Check each example phone number against the pattern
for number in phone_numbers:
    if re.match(phone_regex, number):
        print(f"Valid phone number: {number}")
    else:
        print(f"Not a valid phone number: {number}")

Valid phone number: (210) 867 5309
Valid phone number: +1 210.867.5309
Valid phone number: 867-5309
Valid phone number: 210-867-5309


In [7]:
df_numbers = pd.DataFrame()
df_numbers['number'] = [
    '(210) 867 5309',
    '+1 210.867.5309',
    '867-5309',
    '210-867-5309',
    '2108675309',
]
df_numbers


Unnamed: 0,number
0,(210) 867 5309
1,+1 210.867.5309
2,867-5309
3,210-867-5309
4,2108675309


In [8]:
df_numbers.number.str.extract(phone_regex)

Unnamed: 0,country_code,area_code,exchange,line
0,,210,867,5309
1,1,210,867,5309
2,,,867,5309
3,,210,867,5309
4,,210,867,5309


In [9]:
df_numbers = pd.concat([df_numbers, df_numbers.number.str.extract(phone_regex)], axis=1)
df_numbers

Unnamed: 0,number,country_code,area_code,exchange,line
0,(210) 867 5309,,210,867,5309
1,+1 210.867.5309,1,210,867,5309
2,867-5309,,,867,5309
3,210-867-5309,,210,867,5309
4,2108675309,,210,867,5309


**4. Use regular expressions to convert the dates below to the standardized year-month-day format.**


- 02/04/19
- 02/05/19
- 02/06/19
- 02/07/19
- 02/08/19
- 02/09/19
- 02/10/19

In [14]:
# Pattern with named groups for the month, day, and two-digit year
dates_regexp = re.compile(r"""
    (?P<month>\d{2})/
    (?P<day>\d{2})/
    (?P<year>\d{2})
""", re.VERBOSE)

In [15]:
# Dates
dates = [
    "02/04/19",
    "02/05/19",
    "02/06/19",
    "02/07/19",
    "02/08/19",
    "02/09/19",
    "02/10/19"
]
# Reformat each date as year-month-day (assumes every two-digit year is in the 2000s)
for date in dates:
    match = re.search(dates_regexp, date)
    if match:
        year = "20" + match.group("year")
        formatted_date = f"{year}-{match.group('month')}-{match.group('day')}"
        print(formatted_date)


2019-02-04
2019-02-05
2019-02-06
2019-02-07
2019-02-08
2019-02-09
2019-02-10
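
The same conversion can also be written as a single substitution with `re.sub`, reordering the captured groups in the replacement string. This is a minimal sketch rather than the only way to do it. The names `date_pattern` and `standardize_date` are hypothetical, and, as in the loop above, it assumes every two-digit year belongs to the 2000s.

```python
import re

# Capture month, day, and two-digit year as groups 1, 2, and 3.
date_pattern = re.compile(r"(\d{2})/(\d{2})/(\d{2})")


def standardize_date(date):
    # \g<3> is the year, \g<1> the month, \g<2> the day.
    # Assumption: the century is always "20".
    return date_pattern.sub(r"20\g<3>-\g<1>-\g<2>", date)


for date in ["02/04/19", "02/05/19", "02/10/19"]:
    print(standardize_date(date))
```

The `\g<n>` form is used instead of `\3` so that the group reference cannot be misread next to the literal digits `20`.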


**5. Write a regex to extract the various parts of these logfile lines:**


GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58

In [16]:
file_regexp = re.compile(r"""
    (?P<method>GET|POST) 
    \s
    (?P<path>/.+?)
    \s
    \[(?P<timestamp>.+)\]
    \s
    (?P<http_version>HTTP/\d+\.\d+)
    \s
    \{(?P<status_code>\d+)\}
    \s
    (?P<bytes>\d+)
    \s
    "(?P<user_agent>.+)"
    \s
    (?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
    $
""", re.VERBOSE)

In [17]:
lines = """
GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
"""
# Search each log line separately; iterating over the string itself would yield single characters
for line in lines.strip().split("\n"):
    match = re.search(file_regexp, line)
    if match:
        print(match.groupdict())

{'method': 'GET', 'path': '/api/v1/sales?page=86', 'timestamp': '16/Apr/2019:193452+0000', 'http_version': 'HTTP/1.1', 'status_code': '200', 'bytes': '510348', 'user_agent': 'python-requests/2.21.0', 'ip': '97.105.19.58'}
{'method': 'POST', 'path': '/users_accounts/file-upload', 'timestamp': '16/Apr/2019:193452+0000', 'http_version': 'HTTP/1.1', 'status_code': '201', 'bytes': '42', 'user_agent': 'User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36', 'ip': '97.105.19.58'}
{'method': 'GET', 'path': '/api/v1/items?page=3', 'timestamp': '16/Apr/2019:193453+0000', 'http_version': 'HTTP/1.1', 'status_code': '429', 'bytes': '3561', 'user_agent': 'python-requests/2.21.0', 'ip': '97.105.19.58'}
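
An alternative to handling the log text line by line is to compile the pattern with `re.MULTILINE`, so that `^` and `$` anchor at every line, and collect all matches with `finditer`. The sketch below assumes the same field layout as the pattern above. The names `log_regexp`, `log_text`, and `records` are hypothetical, and the tighter character classes (`\S+`, `[^\]]+`, `[^"]+`) are a variation, not the original pattern.

```python
import re

# re.MULTILINE makes ^ and $ match at the start and end of each line.
log_regexp = re.compile(r"""
    ^(?P<method>GET|POST)\s
    (?P<path>/\S+)\s
    \[(?P<timestamp>[^\]]+)\]\s
    (?P<http_version>HTTP/\d+\.\d+)\s
    \{(?P<status_code>\d+)\}\s
    (?P<bytes>\d+)\s
    "(?P<user_agent>[^"]+)"\s
    (?P<ip>\d{1,3}(?:\.\d{1,3}){3})$
""", re.VERBOSE | re.MULTILINE)

log_text = """
GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
"""

# One dictionary of named groups per matching line
records = [match.groupdict() for match in log_regexp.finditer(log_text)]
for record in records:
    print(record["method"], record["path"], record["status_code"])
```

A list of dictionaries like `records` can be passed straight to `pd.DataFrame` if a tabular view of the log is wanted.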

### Bonus Exercise

You can find a list of words on your Mac at /usr/share/dict/words. Use this file to answer the following questions:


- How many words have at least 3 vowels?
- How many words have at least 3 vowels in a row?
- How many words have at least 4 consonants in a row?
- How many words start and end with the same letter?
- How many words start and end with a vowel?
- How many words contain the same letter 3 times in a row?
- What other interesting patterns in words can you find?
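
As a starting point for these questions, the sketch below counts how many words match each pattern. It is only one possible approach: `count_matches`, `patterns`, and `sample_words` are hypothetical names, the short sample list stands in for the real word file, and `y` is treated as a consonant.

```python
import re

# Hypothetical sample; to use the real list, something like
# words = open("/usr/share/dict/words").read().split()
sample_words = ["queue", "rhythm", "banana", "eerie",
                "strengths", "area", "bookkeeper", "aaa"]


def count_matches(pattern, words):
    # Count the words in which the pattern is found at least once.
    regexp = re.compile(pattern, re.IGNORECASE)
    return sum(1 for word in words if regexp.search(word))


patterns = {
    "at least 3 vowels": r"(?:[aeiou].*){3}",
    "3 vowels in a row": r"[aeiou]{3}",
    "4 consonants in a row": r"[b-df-hj-np-tv-z]{4}",
    "same first and last letter": r"^(\w)(?:.*\1)?$",
    "starts and ends with a vowel": r"^[aeiou](?:.*[aeiou])?$",
    "same letter 3 times in a row": r"(\w)\1\1",
}

for description, pattern in patterns.items():
    print(f"{description}: {count_matches(pattern, sample_words)}")
```

The optional group in the last-letter patterns lets single-letter words such as "a" count as starting and ending with the same letter; drop it if those should be excluded.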