# Strings and Regular Expressions

### What are the most common special characters used in Python regular expressions?

Python regular expressions use a small set of meta-characters to define search patterns:
- the dot (`.`) matches any single character except a newline (unless the `re.DOTALL` flag is set);
- the caret (`^`) and dollar sign (`$`) anchor a match to the start or end of the string (or of each line when `re.MULTILINE` is set);
- the asterisk (`*`) matches zero or more repetitions of the preceding item;
- the plus (`+`) matches one or more repetitions;
- the question mark (`?`) matches zero or one occurrence, making the preceding item optional;
- square brackets (`[]`) define a character set, such as `[aeiou]`;
- the pipe (`|`) acts as a logical OR between alternatives;
- the backslash (`\`) either escapes a meta-character so it matches literally (for example, `\.` matches a period) or introduces a special sequence such as `\d` for digits, `\w` for word characters (letters, digits, and underscore), and `\s` for whitespace.
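A short sketch that exercises each meta-character from the list with `re.findall` and `re.split`. The sample strings are invented for illustration; the comment on each line shows the list that call returns.

```python
import re

text = "cat cot cut coat 2024-05-01"

dot_matches = re.findall(r'c.t', text)      # -> ['cat', 'cot', 'cut']
start_match = re.findall(r'^c\w+', text)    # -> ['cat'] (only at the start)
end_match = re.findall(r'\d+$', text)       # -> ['01'] (only at the end)
optional = re.findall(r'co*a?t', text)      # -> ['cat', 'cot', 'coat']
digit_runs = re.findall(r'\d+', text)       # -> ['2024', '05', '01']
char_set = re.findall(r'c[aou]t', text)     # -> ['cat', 'cot', 'cut']
either = re.findall(r'cat|coat', text)      # -> ['cat', 'coat']

# Escaping: r'3\.14' requires a literal period; r'3.14' accepts any character there.
literal_dot = re.findall(r'3\.14', "3.14 and 3x14")  # -> ['3.14']
any_char = re.findall(r'3.14', "3.14 and 3x14")      # -> ['3.14', '3x14']

# \s matches spaces, tabs, and newlines.
words = re.split(r'\s+', "a  b\tc")                  # -> ['a', 'b', 'c']
```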

Writing patterns as raw strings (e.g., `r'\d'`) is standard practice in Python: it passes every backslash through to the regex engine unchanged, instead of letting Python first interpret it as a string escape sequence such as `\n` or `\b`.
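The classic pitfall is `\b`: in a normal string literal it is the backspace character, not a word boundary. The sketch below (with a made-up sample sentence) shows the same pattern failing without the `r` prefix and working with it.

```python
import re

# '\bcat\b' in a normal literal contains two backspace characters ('\x08'),
# so the regex engine never sees a word-boundary token.
plain = re.search('\bcat\b', 'a cat sat')
raw = re.search(r'\bcat\b', 'a cat sat')

print(plain)        # None
print(raw.group())  # cat

# Doubling each backslash is the non-raw way to write the same pattern.
print('\\bcat\\b' == r'\bcat\b')  # True
```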

```python
# Write a Python regular expression that matches a 10-digit phone number with hyphens.

import re

# This pattern assumes the format ###-###-####; the \b word boundaries keep it
# from matching inside a longer run of digits.
phone_regex = r'\b\d{3}-\d{3}-\d{4}\b'
```
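A usage sketch for the phone-number pattern, written with `\b` word boundaries so it does not match a 10-digit fragment inside a longer digit run. The sample text is invented; `re.findall` collects every match, and `re.fullmatch` checks whether an entire string has the expected shape.

```python
import re

phone_regex = r'\b\d{3}-\d{3}-\d{4}\b'

text = "Call 555-867-5309 or 555-123-4567, but not 1555-867-53090."
found = re.findall(phone_regex, text)
print(found)  # ['555-867-5309', '555-123-4567']

# fullmatch requires the whole string to fit the pattern.
print(re.fullmatch(phone_regex, '555-8675-309'))  # None
```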

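```python
# Write a Python regular expression that matches a street address with a number
# and a street name, followed by ST or AVE.

# Each word of the street name must be followed by whitespace, so a name such as
# "EAST" is not split into "EA" + "ST"; the final \b stops the pattern from
# matching only the "ST" at the start of a longer word such as "STREET".
address_regex = r'\b\d+\s+(?:[A-Za-z]+\s+)+(?:ST|AVE)\b'
```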
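A usage sketch comparing a tightened address pattern (every street-name word must be followed by whitespace) with the looser `[A-Za-z\s]+` form. The addresses are invented examples; the last two lines show the looser form accepting `12 EAST` by splitting it into `EA` + `ST`.

```python
import re

address_regex = r'\b\d+\s+(?:[A-Za-z]+\s+)+(?:ST|AVE)\b'

text = "Ship to 221 BAKER ST or 1600 PENNSYLVANIA AVE today."
found = re.findall(address_regex, text)
print(found)  # ['221 BAKER ST', '1600 PENNSYLVANIA AVE']

# The looser character class lets the regex backtrack into the street name.
loose_regex = r'\d+\s+[A-Za-z\s]+(?:ST|AVE)'
print(re.search(loose_regex, '12 EAST').group())  # 12 EAST
print(re.search(address_regex, '12 EAST'))        # None
```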

```python
# Write a Python regular expression that matches a full name with any common
# title like Mr or Mrs, followed by any number of names beginning with capital
# letters, possibly with hyphens between some names.

name_regex = r'(?:Mr|Mrs|Ms|Dr|Prof)\.?\s+[A-Z][a-z]+(?:[-\s][A-Z][a-z]+)*'
```
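A usage sketch for the name pattern on an invented guest list. It shows that the optional period after a title, hyphenated surnames, and titles without a period are all accepted; when `Mr` fails at the `s` of `Mrs`, the engine backtracks and tries the `Mrs` alternative.

```python
import re

name_regex = r'(?:Mr|Mrs|Ms|Dr|Prof)\.?\s+[A-Z][a-z]+(?:[-\s][A-Z][a-z]+)*'

text = "Guests: Mrs. Anne Smith-Jones, Dr Watson, and Prof. Ada Lovelace."
found = re.findall(name_regex, text)
print(found)  # ['Mrs. Anne Smith-Jones', 'Dr Watson', 'Prof. Ada Lovelace']
```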

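```python
# A simple email pattern: a local part of word characters, dots, or hyphens,
# then '@', a domain, and a final '.suffix'. It is a practical approximation,
# not the full RFC 5322 address grammar.
email_regex = r'[\w\.-]+@[\w\.-]+\.\w+'
```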
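A usage sketch for the email pattern on invented addresses. Because `[\w\.-]+` is greedy, the domain part first swallows the trailing sentence period and then backtracks until `\.\w+` can match, so the period after the second address is left out of the result.

```python
import re

email_regex = r'[\w\.-]+@[\w\.-]+\.\w+'

text = "Contact alice.smith@example.com or bob-99@mail.example.org."
found = re.findall(email_regex, text)
print(found)  # ['alice.smith@example.com', 'bob-99@mail.example.org']
```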

### What is a raw string in Python?

A raw string is a string literal in which the backslash (`\`) is treated as an ordinary character rather than the start of an escape sequence. You create one by prefixing the literal with `r` or `R`, as in `r'\d+'`.
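A small sketch contrasting a normal and a raw literal for the same characters (the Windows-style path is just an example string). In the normal literal, `\n` and `\t` collapse into single newline and tab characters; in the raw literal every backslash survives.

```python
normal = 'C:\new\table'  # '\n' becomes a newline and '\t' a tab
raw = r'C:\new\table'    # every backslash is kept as-is

print(len(normal), len(raw))    # 10 12
print(raw == 'C:\\new\\table')  # True
```

One caveat: a raw string cannot end with an odd number of backslashes, so `r'C:\'` is a syntax error; write `'C:\\'` instead.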

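```python
# Write a function that does the same thing as the shell command `head`
# (run in a notebook as `!head`): output the first num_lines lines of a file.

from itertools import islice

def head_clone(input_filename, num_lines, output_filename=None):
    """Print the first num_lines lines of input_filename, or write them to
    output_filename if one is given."""
    # `with` closes the input file even if an error occurs.
    with open(input_filename, 'r') as fin:
        # islice stops reading after num_lines lines, like `head -n`.
        first_lines = islice(fin, num_lines)
        if output_filename is None:
            for line in first_lines:
                print(line, end='')  # each line already ends with '\n'
        else:
            with open(output_filename, 'w') as fout:
                fout.writelines(first_lines)
```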

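```python
# Write a function called check_word that takes a five-letter word and checks
# whether it could be the target Wordle word, given the guesses SPADE and CLERK.

# Minimal versions of the helpers check_word relies on. They are assumed to come
# from earlier exercises and are defined here so this cell runs on its own.
def uses_none(word, forbidden):
    """Return True if word uses none of the letters in forbidden."""
    return not any(letter in forbidden for letter in word.lower())

def uses_all(word, required):
    """Return True if word uses every letter in required."""
    return all(letter in word.lower() for letter in required)

def check_word(word):
    """Checks whether a five-letter word could be the target Wordle word.

    Constraints based on guesses SPADE and CLERK:
    - Must be exactly 5 letters.
    - Must not contain 's', 'p', 'a', 'd', 'c', 'l', 'r', or 'k'.
    - Must contain 'e'.
    - 'e' cannot be at index 4 or index 2.
    """
    word = word.lower()  # compare letters case-insensitively throughout
    if len(word) != 5:
        return False

    if not uses_none(word, 'spadclrk'):
        return False

    if not uses_all(word, 'e'):
        return False

    # 'e' was yellow at index 4 (SPADE) and index 2 (CLERK)
    if word[4] == 'e' or word[2] == 'e':
        return False

    return True
```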

```python
# Continuing the previous exercise, suppose you guess the word TOTEM and learn
# that the E is still not in the right place, but the M is. How many words are left?

def check_word(word):
    """Checks if a word is valid given guesses SPADE, CLERK, and TOTEM.

    Constraints:
    - Length is 5.
    - Must not use letters in 'spadclrkto'.
    - Must contain 'e'.
    - Position 5 (index 4) must be 'm'.
    - 'e' cannot be at index 4, 2, or 3.
    """
    word = word.lower()
    if len(word) != 5:
        return False

    if not uses_none(word, 'spadclrkto'):
        return False

    if not uses_all(word, 'e'):
        return False

    # 'm' is green at index 4
    if word[4] != 'm':
        return False

    # 'e' was yellow at indices 4, 2, and 3; index 4 is already 'm',
    # so only indices 2 and 3 need checking
    if word[2] == 'e' or word[3] == 'e':
        return False

    return True


# Hypothetical helper for the "how many words are left?" question. It assumes a
# word-list file with one word per line; the filename is up to you.
def count_candidates(filename):
    """Count the words in filename that pass check_word."""
    with open(filename) as fin:
        return sum(1 for line in fin if check_word(line.strip()))
```

```python
# Let's count the number of times the word "pale" appears in any form,
# including pale, pales, paled, and paleness, as well as the related word
# pallor. Use a single regular expression that matches all of these words
# and no others.

import re

def count_eco_words(text):
    """
    Counts specific variations of 'pale' and 'pallor' in the given text.

    The pattern matches:
    - 'pale'
    - 'pales'
    - 'paled'
    - 'paleness'
    - 'pallor'
    """
    # \b word boundaries exclude longer words such as "palest" or "palette";
    # non-capturing groups (?:...) make findall return whole matches.
    pattern = r'\b(?:pale(?:s|d|ness)?|pallor)\b'
    return len(re.findall(pattern, text, re.IGNORECASE))
```
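A usage sketch of the pale/pallor pattern on an invented sentence, written with non-capturing groups. It shows that `re.IGNORECASE` picks up the capitalized "Pale" while the `\b` boundaries reject "palest" and "palette". For comparison, the last lines show that a version with capturing groups makes `re.findall` return tuples of group values instead of whole words; the count is the same, but the results are harder to read.

```python
import re

pattern = r'\b(?:pale(?:s|d|ness)?|pallor)\b'

text = "Pale light paled the paleness of her pallor; palest and palette do not count."
matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)       # ['Pale', 'paled', 'paleness', 'pallor']
print(len(matches))  # 4

# With capturing groups, findall returns one tuple of groups per match.
grouped = re.findall(r'\b(pale(s|d|ness)?|pallor)\b', 'pale paled')
print(grouped)       # [('pale', ''), ('paled', 'd')]
```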