**Table of contents**<a id='toc0_'></a>    
- [Character Classes](#toc1_)    
  - [Predefined Character Classes](#toc1_1_)    
  - [Custom Character Classes](#toc1_2_)    


# <a id='toc1_'></a>[Character Classes](#toc0_)

## <a id='toc1_1_'></a>[Predefined Character Classes](#toc0_)

A character class matches a single character from a specific set.

Python's re module provides several predefined character classes:

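- \d - Matches any decimal digit (0-9 for ASCII text; by default it also matches other Unicode digits)

- \D - Matches any character that is not a digit

- \w - Matches any word character: letters, digits, and underscore ([a-zA-Z0-9_] for ASCII text)

- \W - Matches any character that is not a word character

- \s - Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab)

- \S - Matches any non-whitespace character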
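
For str patterns in Python 3, these classes are Unicode-aware by default, and the re.ASCII flag restricts them to their ASCII definitions. The following is a minimal sketch of that difference; the sample strings are made up for illustration.

```python
import re

# Hypothetical sample: ASCII digits followed by two Arabic-Indic digits
sample = "Room 42, \u0664\u0662"

# Default (Unicode) matching: \d matches any Unicode decimal digit
unicode_digits = re.findall(r"\d", sample)

# With re.ASCII, \d only matches the characters 0-9
ascii_digits = re.findall(r"\d", sample, flags=re.ASCII)

# The same applies to \w: accented letters are word characters by default
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", flags=re.ASCII)

print(len(unicode_digits), len(ascii_digits))  # 4 2
print(unicode_words, ascii_words)
```

If the input is expected to be plain ASCII, passing re.ASCII (or writing [0-9] explicitly) avoids surprises with non-Latin digits.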


In [6]:
import re

In [None]:
text = "The quick brown fox jumps over the lazy dog. It was born in 2019!"
print(f"Sample text: '{text}'")
print("-" * 80)

# Example of \d - matches any digit
digits = re.findall(r"\d", text)
print(f"\\d (digits): {digits}")

# Example of \D - matches any non-digit
non_digits = re.findall(r"\D", text)
print(f"\\D (non-digits): {non_digits[:10]}... (showing first 10 of {len(non_digits)})")

# Example of \w - matches any word character (alphanumeric + underscore)
word_chars = re.findall(r"\w", text)
print(f"\\w (word characters): {word_chars[:15]}... (showing first 15 of {len(word_chars)})")

# Example of \W - matches any non-word character
non_word_chars = re.findall(r"\W", text)
print(f"\\W (non-word characters): {non_word_chars}")

# Example of \s - matches any whitespace character
whitespace_chars = re.findall(r"\s", text)
print(f"\\s (whitespace): {whitespace_chars} (represented as spaces)")

# Example of \S - matches any non-whitespace character
non_whitespace = re.findall(r"\S", text)
print(f"\\S (non-whitespace): {non_whitespace[:15]}... (showing first 15 of {len(non_whitespace)})")

# Practical example: Finding all words in a text
words = re.findall(r"\b\w+\b", text)
print(f"\nAll words using \\w: {words}")

# Practical example: Extracting all numbers from a text
numbers_text = "Order #12345 includes 2 items at $19.99 each for delivery on 04/23/2023"
numbers = re.findall(r"\d+", numbers_text)
print(f"\nAll numbers using \\d+: {numbers}")

Sample text: 'The quick brown fox jumps over the lazy dog. It was born in 2019!'
--------------------------------------------------------------------------------
\d (digits): ['2', '0', '1', '9']
\D (non-digits): ['T', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ' ']... (showing first 10 of 61)
\w (word characters): ['T', 'h', 'e', 'q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o']... (showing first 15 of 50)
\W (non-word characters): [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.', ' ', ' ', ' ', ' ', ' ', '!']
\s (whitespace): [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '] (represented as spaces)
\S (non-whitespace): ['T', 'h', 'e', 'q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o']... (showing first 15 of 52)

All words using \w: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', 'It', 'was', 'born', 'in', '2019']

All numbers using \d+: ['12345', '2', '19', '99', '04', '23', '2023']
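
Note that \d+ splits the price 19.99 into '19' and '99', because the dot is not a digit. One possible way to keep decimals together, sketched here as an illustration rather than a general number parser, is to allow an optional fractional part after the integer digits.

```python
import re

numbers_text = "Order #12345 includes 2 items at $19.99 each for delivery on 04/23/2023"

# \d+          one or more digits
# (?:\.\d+)?   optionally followed by a literal dot and more digits
decimal_aware = re.findall(r"\d+(?:\.\d+)?", numbers_text)

print(decimal_aware)
# ['12345', '2', '19.99', '04', '23', '2023']
```

The date is still split on its slashes, since "/" is not part of the pattern.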


## <a id='toc1_2_'></a>[Custom Character Classes](#toc0_)

You can define your own character classes using square brackets []:

- [abc] - Matches any of the characters a, b, or c

- [a-z] - Matches any lowercase letter from a to z

- [^abc] - Matches any character except a, b, or c
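
Most regex metacharacters lose their special meaning inside square brackets, while -, ^, and ] need some care. The sketch below uses a made-up string to illustrate this.

```python
import re

# Hypothetical sample containing characters that are special outside a class
expr = "a.b*c-d^e]"

# Inside [], '.' and '*' are literal characters, not a wildcard or quantifier
dots_and_stars = re.findall(r"[.*]", expr)

# A '-' placed last (or first) is a literal hyphen, not a range operator
a_or_hyphen = re.findall(r"[a-]", expr)

# ']' and '^' can be escaped with a backslash to match them literally
brackets_and_carets = re.findall(r"[\]\^]", expr)

print(dots_and_stars)       # ['.', '*']
print(a_or_hyphen)          # ['a', '-']
print(brackets_and_carets)  # ['^', ']']
```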

In [10]:
text = "The quick brown fox jumps over the lazy dog. It was born in 2019!"
print(f"Sample text: '{text}'")
print("-" * 80)


# Basic custom character class - matches any vowel
vowels = re.findall(r"[aeiou]", text.lower())
print(f"Vowels [aeiou]: {vowels[:15]}... (showing first 15 of {len(vowels)})")

# Character ranges - matches any lowercase letter
lowercase_letters = re.findall(r"[a-z]", text)
print(f"Lowercase [a-z]: {lowercase_letters[:15]}... (showing first 15 of {len(lowercase_letters)})")

# Multiple ranges - matches any letter (uppercase or lowercase)
all_letters = re.findall(r"[a-zA-Z]", text)
print(f"All letters [a-zA-Z]: {all_letters[:15]}... (showing first 15 of {len(all_letters)})")

# Combining different types - matches letters and digits
alphanumeric = re.findall(r"[a-zA-Z0-9]", text)
print(f"Alphanumeric [a-zA-Z0-9]: {alphanumeric[:15]}... (showing first 15 of {len(alphanumeric)})")

# Combining individual characters and ranges
mixed_class = re.findall(r"[aeiouAEIOU0-9]", text)
print(f"Vowels and digits [aeiouAEIOU0-9]: {mixed_class}")

# Practical example: Finding words containing specific letters
words_with_z = re.findall(r"\b\w*[z]\w*\b", text.lower())
print(f"\nWords containing 'z': {words_with_z}")

# Practical example: Finding words starting with specific letters
words_starting_with_t = re.findall(r"\b[tT]\w+\b", text)
print(f"Words starting with 't' or 'T': {words_starting_with_t}")


Sample text: 'The quick brown fox jumps over the lazy dog. It was born in 2019!'
--------------------------------------------------------------------------------
Vowels [aeiou]: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o', 'i', 'a', 'o', 'i']... (showing first 15 of 15)
Lowercase [a-z]: ['h', 'e', 'q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o', 'x']... (showing first 15 of 44)
All letters [a-zA-Z]: ['T', 'h', 'e', 'q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o']... (showing first 15 of 46)
Alphanumeric [a-zA-Z0-9]: ['T', 'h', 'e', 'q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o']... (showing first 15 of 50)
Vowels and digits [aeiouAEIOU0-9]: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o', 'I', 'a', 'o', 'i', '2', '0', '1', '9']

Words containing 'z': ['lazy']
Words starting with 't' or 'T': ['The', 'the']
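
Lowercasing the text or spelling out both cases ([tT]) works, but the original capitalization is lost in the first case and the class grows in the second. As an alternative sketch, the re.IGNORECASE flag makes a class match both cases while leaving the matched text untouched.

```python
import re

text = "The quick brown fox jumps over the lazy dog. It was born in 2019!"

# [aeiou] with IGNORECASE also matches the capital 'I' in "It"
vowels = re.findall(r"[aeiou]", text, flags=re.IGNORECASE)

# Equivalent to \b[tT]\w+\b without listing both cases by hand
t_words = re.findall(r"\bt\w+\b", text, flags=re.IGNORECASE)

print(len(vowels))    # 15
print("I" in vowels)  # True
print(t_words)        # ['The', 'the']
```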


## Negated Character Classes

In [11]:
text = "The quick brown fox jumps over the lazy dog. It was born in 2019!"
print(f"Sample text: '{text}'")
print("-" * 80)


# Negated character class - matches consonants by excluding vowels, whitespace, non-word characters, and digits
consonants = re.findall(r"[^aeiou\s\W\d]", text.lower())
print(f"Consonants [^aeiou\\s\\W\\d]: {consonants}")

# Negated ranges - matches any character that is not a letter
non_letters = re.findall(r"[^a-zA-Z]", text)
print(f"Non-letters [^a-zA-Z]: {non_letters}")

# Combining negations - matches any character that is not a letter or digit
non_alphanumeric = re.findall(r"[^a-zA-Z0-9]", text)
print(f"Non-alphanumeric [^a-zA-Z0-9]: {non_alphanumeric}")

# Important: ^ only means negation when it's the first character in a class
includes_caret = re.findall(r"[a^e]", "a^e")
print(f"Characters 'a', '^', or 'e' [a^e]: {includes_caret}")

# Practical example: Finding words without specific letters
# [^\We] matches word characters other than 'e'; a plain [^e] would also match spaces and punctuation.
# Every word in this sentence contains an 'e', so the result is empty.
words_without_e = re.findall(r"\b[^\We]+\b", "The green tree seems fresh.")
print(f"\nWords without 'e': {words_without_e}")

# Practical example: Removing non-alphanumeric characters
text_with_symbols = "Hello, World! How's it going? 123"
clean_text = re.sub(r"[^a-zA-Z0-9\s]", "", text_with_symbols)
print(f"Original: '{text_with_symbols}'")
print(f"Cleaned: '{clean_text}'")

Sample text: 'The quick brown fox jumps over the lazy dog. It was born in 2019!'
--------------------------------------------------------------------------------
Consonants [^aeiou\s\W\d]: ['t', 'h', 'q', 'c', 'k', 'b', 'r', 'w', 'n', 'f', 'x', 'j', 'm', 'p', 's', 'v', 'r', 't', 'h', 'l', 'z', 'y', 'd', 'g', 't', 'w', 's', 'b', 'r', 'n', 'n']
Non-letters [^a-zA-Z]: [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.', ' ', ' ', ' ', ' ', ' ', '2', '0', '1', '9', '!']
Non-alphanumeric [^a-zA-Z0-9]: [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.', ' ', ' ', ' ', ' ', ' ', '!']
Characters 'a', '^', or 'e' [a^e]: ['a', '^', 'e']

Words without 'e': []
Original: 'Hello, World! How's it going? 123'
Cleaned: 'Hello World Hows it going 123'
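
A negated class such as [^e] matches any character other than 'e', including spaces and punctuation, so on its own it cannot isolate words. One way to search at the word level, sketched here with the pangram used earlier, is to put \W inside the negation so that only word characters remain.

```python
import re

sentence = "The quick brown fox jumps over the lazy dog"

# [^\WeE] reads as "not a non-word character and not e/E",
# i.e. any word character except the letter e in either case.
words_without_e = re.findall(r"\b[^\WeE]+\b", sentence)

print(words_without_e)
# ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

The word boundaries on both sides ensure that fragments of words containing 'e' (such as "Th" in "The") are not returned.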


## Combining with Quantifiers

In [12]:
# Matching sequences of digits
numbers_text = "Order #12345 includes 2 items at $19.99 each for delivery on 04/23/2023"
multi_digit_numbers = re.findall(r"\d+", numbers_text)
print(f"All number sequences: {multi_digit_numbers}")

# Finding words with specific length ranges using character classes and quantifiers
text = "The quick brown fox jumps over the lazy dog"
medium_words = re.findall(r"\b[a-zA-Z]{4,6}\b", text)
print(f"Words with 4-6 letters: {medium_words}")

# Find all words starting with a vowel
vowel_start_words = re.findall(r"\b[aeiou][a-z]*\b", text.lower())
print(f"Words starting with vowels: {vowel_start_words}")

# Practical example: Validating US phone numbers
phone_numbers = ["123-456-7890", "(123) 456-7890", "123.456.7890", "123 456 7890", 
                "1234567890", "12-34-56", "123-4567"]
valid_numbers = []

for phone in phone_numbers:
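    # Accept common US formats: optional parentheses around the area code, optional -, ., or space separators
    # (format check only: the parentheses are not required to be balanced)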
    if re.fullmatch(r"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}", phone):
        valid_numbers.append(phone)

print(f"\nValid US phone numbers: {valid_numbers}")

All number sequences: ['12345', '2', '19', '99', '04', '23', '2023']
Words with 4-6 letters: ['quick', 'brown', 'jumps', 'over', 'lazy']
Words starting with vowels: ['over']

Valid US phone numbers: ['123-456-7890', '(123) 456-7890', '123.456.7890', '123 456 7890', '1234567890']
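
Because \(? and \)? are independent, the pattern above also accepts half-open area codes such as "(123-456-7890". A stricter variant, sketched below under the assumption that the area code is either fully parenthesized or bare, uses alternation inside a non-capturing group. The name strict_phone and the extra test strings are illustrative.

```python
import re

# Area code: either "(ddd)" or "ddd", never a single unmatched parenthesis
strict_phone = re.compile(
    r"(?:\([0-9]{3}\)|[0-9]{3})[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"
)

candidates = ["(123) 456-7890", "123-456-7890", "(123-456-7890", "123)456-7890"]
accepted = [c for c in candidates if strict_phone.fullmatch(c)]

print(accepted)
# ['(123) 456-7890', '123-456-7890']
```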


In [14]:
# Example 1: Email validation
def is_valid_email(email):
    # Basic email validation pattern using character classes
    pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    return bool(re.fullmatch(pattern, email))

emails = ["user@example.com", "john.doe@company.co.uk", "invalid@", 
         "missing-domain@.com", "special#char@domain.com"]

print("Email validation:")
for email in emails:
    print(f"  {email}: {'Valid' if is_valid_email(email) else 'Invalid'}")

# Example 2: Password strength checker
def check_password_strength(password):
    # Check length (at least 8 characters)
    if len(password) < 8:
        return "Too short"
    
    # Check for at least one lowercase letter
    if not re.search(r"[a-z]", password):
        return "Missing lowercase letter"
    
    # Check for at least one uppercase letter
    if not re.search(r"[A-Z]", password):
        return "Missing uppercase letter"
    
    # Check for at least one digit
    if not re.search(r"[0-9]", password):
        return "Missing digit"
    
    # Check for at least one special character
    if not re.search(r"[!@#$%^&*(),.?\":{}|<>]", password):
        return "Missing special character"
    
    return "Strong"

passwords = ["password", "Password1", "p@ssW0rd", "StrongP@ss123"]
print("\nPassword strength checking:")
for password in passwords:
    strength = check_password_strength(password)
    print(f"  '{password}': {strength}")

# Example 3: Extract data from text
log_entry = "192.168.1.1 - - [21/Apr/2023:14:31:25 +0000] \"GET /index.html HTTP/1.1\" 200 1234"

# Extract IP address using character classes
# (format check only: octets are not range-checked, so 999.999.999.999 would also match)
ip_pattern = r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
ip_match = re.search(ip_pattern, log_entry)
ip_address = ip_match.group(1) if ip_match else "Not found"

# Extract date and time
date_pattern = r"\[([0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2})"
date_match = re.search(date_pattern, log_entry)
date_time = date_match.group(1) if date_match else "Not found"

print("\nLog file parsing:")
print(f"  IP Address: {ip_address}")
print(f"  Date/Time: {date_time}")

Email validation:
  user@example.com: Valid
  john.doe@company.co.uk: Valid
  invalid@: Invalid
  missing-domain@.com: Invalid
  special#char@domain.com: Invalid

Password strength checking:
  'password': Missing uppercase letter
  'Password1': Missing special character
  'p@ssW0rd': Strong
  'StrongP@ss123': Strong

Log file parsing:
  IP Address: 192.168.1.1
  Date/Time: 21/Apr/2023:14:31:25
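
The [0-9]{1,3} pattern checks only the shape of an IP address, so a string like 999.300.1.1 would also be extracted. One possible refinement, shown here as a sketch, is to let the regex find candidates and then validate each one with the standard library's ipaddress module. The helper name extract_valid_ipv4 and the sample line are hypothetical.

```python
import ipaddress
import re

def extract_valid_ipv4(line):
    """Return dotted-quad candidates in line that parse as IPv4 addresses."""
    candidates = re.findall(r"\b[0-9]{1,3}(?:\.[0-9]{1,3}){3}\b", line)
    valid = []
    for candidate in candidates:
        try:
            # Raises a ValueError subclass when an octet is out of range
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        valid.append(candidate)
    return valid

found = extract_valid_ipv4("192.168.1.1 ok, 999.300.1.1 not ok")
print(found)
# ['192.168.1.1']
```

Splitting the work this way keeps the regex short while leaving the numeric range check to code that is designed for it.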
