# MODULE 3: STRINGS AND TEXT PROCESSING 📝

## Comprehensive Guide to Python Strings

Welcome to Module 3! This module provides an in-depth exploration of Python's string handling capabilities, text processing techniques, and regular expressions.

---

## 📚 Module Overview

This module covers:
- **String Basics**: Creation, immutability, and operations
- **String Methods**: Comprehensive coverage of built-in methods
- **String Formatting**: Modern formatting techniques
- **Unicode and Encoding**: Working with international text
- **Regular Expressions**: Pattern matching and text processing
- **Text Processing**: Real-world applications

---

## 📑 Table of Contents

### 3.1 String Basics
### 3.2 String Indexing and Slicing
### 3.3 String Methods
### 3.4 String Formatting
### 3.5 Unicode and Encoding
### 3.6 Regular Expressions

---

Let's master Python strings! 🚀

# 3.1 String Basics

Python strings are immutable sequences of Unicode characters. Understanding their properties and behavior is fundamental to text processing in Python.

## 3.1.1 String Literals

Python offers multiple ways to create string literals, each suited for different use cases.

In [None]:
# Different ways to create strings
print("String Literal Types")
print("="*60)

# Single quotes
single = 'Hello, World!'
print(f"Single quotes: {single}")

# Double quotes
double = "Python Programming"
print(f"Double quotes: {double}")

# Triple quotes for multiline
multiline = '''This is a
multiline string that
preserves line breaks'''
print(f"\nMultiline string:\n{multiline}")

# Docstrings
def example_function():
    """
    This is a docstring.
    It documents the function's purpose.
    """
    pass

print(f"\nDocstring: {example_function.__doc__}")

# Mixing quotes to avoid escaping
mixed1 = "It's easy to use apostrophes"
mixed2 = 'She said, "Hello!"'
print(f"\nMixed quotes:")
print(f"  {mixed1}")
print(f"  {mixed2}")

# Escape sequences
escaped = "Line 1\nLine 2\tTabbed\\Backslash"
print(f"\nEscape sequences:\n{escaped}")
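
Two further literal behaviors can be sketched briefly: adjacent string literals are joined into a single string, and a triple-quoted literal keeps the indentation it has in the source. The variable names below are made up for illustration, and `textwrap.dedent` is just one common way to remove that indentation.

```python
import textwrap

# Adjacent literals are joined at compile time; handy for long messages.
message = (
    "This sentence is split across "
    "two source lines."
)

# A triple-quoted string keeps the indentation of the surrounding code.
# textwrap.dedent() removes the common leading whitespace.
indented = """
    first line
    second line
"""
cleaned = textwrap.dedent(indented).strip()

print(message)
print(cleaned)
```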

## 3.1.2 Raw Strings

Raw strings (prefixed with `r`) treat backslashes as literal characters, which makes them well suited to regular expressions and Windows file paths.

In [None]:
# Raw strings demonstration
print("Raw Strings (r-strings)")
print("="*60)

# Normal vs raw string
normal = "Line 1\nLine 2\tTabbed"
raw = r"Line 1\nLine 2\tTabbed"

print("Normal string:")
print(normal)
print(f"\nRaw string:")
print(raw)

# Perfect for regex patterns
import re

# Without raw string - need double backslashes
pattern_normal = "\\d{3}-\\d{3}-\\d{4}"
# With raw string - single backslashes
pattern_raw = r"\d{3}-\d{3}-\d{4}"

print(f"\nRegex patterns:")
print(f"Normal: {pattern_normal}")
print(f"Raw: {pattern_raw}")

# Windows file paths
windows_path_normal = "C:\\Users\\Alice\\Documents\\file.txt"
windows_path_raw = r"C:\Users\Alice\Documents\file.txt"

print(f"\nFile paths:")
print(f"Normal: {windows_path_normal}")
print(f"Raw: {windows_path_raw}")

# Testing regex with raw string
phone = "123-456-7890"
if re.match(pattern_raw, phone):
    print(f"\n'{phone}' matches phone pattern!")
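
One caveat the examples above do not show: a raw string literal cannot end with a single backslash, because that backslash would still escape the closing quote. The sketch below uses a hypothetical `folder` path and shows one possible workaround.

```python
# A raw string keeps the backslash as an ordinary character.
newline_len = len("\n")    # 1: a single newline character
raw_len = len(r"\n")       # 2: a backslash followed by the letter n
print(newline_len, raw_len)

# Writing r"C:\Users\Alice\" is a SyntaxError, so the trailing backslash
# is appended here from a normal (escaped) literal instead.
folder = r"C:\Users\Alice" + "\\"
print(folder)
```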

## 3.1.3 Unicode Strings

Python 3 strings are Unicode by default, supporting international characters and emojis.

In [None]:
# Unicode string handling
print("Unicode Strings")
print("="*60)

# Unicode text with various scripts
multilingual = "Hello 世界 مرحبا Здравствуй 🌍🐍"
print(f"Multilingual text: {multilingual}")
print(f"Length: {len(multilingual)} characters")

# Unicode escapes
print("\nUnicode escapes:")
omega = "\u03A9"  # Greek Omega
emoji = "\U0001F600"  # Grinning face
print(f"\\u03A9 = {omega}")
print(f"\\U0001F600 = {emoji}")

# Character information
import unicodedata

chars = ['A', '中', '🐍', 'Ω', '♠']
print("\nCharacter details:")
for char in chars:
    code = ord(char)
    name = unicodedata.name(char, 'UNKNOWN')
    category = unicodedata.category(char)
    print(f"{char}: U+{code:04X}, {name}, Category: {category}")

# Emoji support
emojis = "😀😁😂🤣😃😄😅😆😉😊"
print(f"\nEmojis: {emojis}")
print(f"Number of emoji: {len(emojis)}")

# Unicode normalization
text1 = "caf\u00e9"   # é as a single precomposed character (U+00E9)
text2 = "cafe\u0301"  # e followed by a combining acute accent (U+0301)
print(f"\nLook same? '{text1}' == '{text2}': {text1 == text2}")
print(f"After normalization: {unicodedata.normalize('NFC', text1) == unicodedata.normalize('NFC', text2)}")
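
It may help to see that `len()` counts code points rather than the characters a reader perceives. The following sketch writes every character as an escape sequence so the code points are unambiguous; the variable names are illustrative.

```python
import unicodedata

composed = "caf\u00e9"       # 'é' stored as one code point
decomposed = "cafe\u0301"    # 'e' plus a combining acute accent
lengths = (len(composed), len(decomposed))
print(lengths)               # (4, 5)

# NFD splits precomposed characters; NFC recombines them.
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == composed)   # True

# One visible emoji can be several code points joined by U+200D
# (zero width joiner): man + ZWJ + woman + ZWJ + girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family))           # 5
```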

## 3.1.4 String Immutability

Strings cannot be modified in place. Any operation that appears to modify a string actually creates a new one.

In [None]:
# String immutability demonstration
print("String Immutability")
print("="*60)

# Strings cannot be modified
s = "Python"
print(f"Original string: {s}")
print(f"Original ID: {id(s)}")

# This creates a new string
s = s + " Programming"
print(f"After concatenation: {s}")
print(f"New ID: {id(s)}")
print("Note: Different ID means new object created")

# Cannot modify individual characters
text = "Hello"
# text[0] = 'J'  # This would raise TypeError
print(f"\nCannot do: text[0] = 'J'")
print("Must create new string instead:")
new_text = 'J' + text[1:]
print(f"Original: {text}")
print(f"New: {new_text}")

# Performance implications
import timeit

# Bad: String concatenation in loop
def bad_concat():
    result = ""
    for i in range(100):
        result += str(i)  # Creates new string each time
    return result

# Good: Use list and join
def good_concat():
    parts = []
    for i in range(100):
        parts.append(str(i))
    return ''.join(parts)

# Most concise: generator expression passed straight to join
def best_concat():
    return ''.join(str(i) for i in range(100))

# Timings vary by interpreter and version; CPython can often extend a string in place for +=
print("\nPerformance comparison (1000 iterations):")
print(f"Bad (+=): {timeit.timeit(bad_concat, number=1000):.4f}s")
print(f"Good (list+join): {timeit.timeit(good_concat, number=1000):.4f}s")
print(f"Best (generator+join): {timeit.timeit(best_concat, number=1000):.4f}s")
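
Immutability also explains why string methods return a new string instead of changing the original. The short sketch below shows that, and uses `io.StringIO` as one more way to build text piece by piece; the names are illustrative, and this is not a claim about which approach is fastest.

```python
import io

title = "python"
shouted = title.upper()      # returns a new string
print(title, shouted)        # python PYTHON  (the original is unchanged)

# io.StringIO accumulates text in a buffer that can be read back at the end.
buffer = io.StringIO()
for i in range(5):
    buffer.write(str(i))
result = buffer.getvalue()
print(result)                # 01234
```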

## 3.1.5 String Interning

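CPython can save memory by reusing identical string objects through interning. Which strings are interned automatically is an implementation detail and may differ between versions.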

In [None]:
# String interning demonstration
import sys

print("String Interning")
print("="*60)

# CPython typically interns identifier-like literals (an implementation detail)
a = "hello"
b = "hello"
print("Identical literals:")
print(f"a = '{a}', b = '{b}'")
print(f"a is b: {a is b}")
print(f"Same object: ID(a)={id(a)}, ID(b)={id(b)}")

# Strings built at run time are usually separate objects
base = "hello world "
c = base * 10
d = base * 10
print(f"\nStrings built at run time:")
print(f"c is d: {c is d}")

# Force interning with sys.intern()
e = sys.intern("hello world " * 10)
f = sys.intern("hello world " * 10)
print(f"\nAfter sys.intern():")
print(f"e is f: {e is f}")

# Memory savings with interning
print("\nMemory savings demonstration:")
# Without interning: build each string at run time so every element is a new object
# (a literal expression such as "Python" * 10 is folded into one shared constant)
strings_no_intern = ["".join(["Python"] * 10) for _ in range(1000)]
unique_ids = len(set(id(s) for s in strings_no_intern))
print(f"Without intern: {unique_ids} unique objects")

# With interning
strings_interned = [sys.intern("".join(["Python"] * 10)) for _ in range(1000)]
unique_ids = len(set(id(s) for s in strings_interned))
print(f"With intern: {unique_ids} unique object")

# When to use interning
print("\nUse interning for:")
print("• Dictionary keys that repeat frequently")
print("• String constants used for comparison")
print("• Large datasets with repeated strings")
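
Because automatic interning is an implementation detail, string contents are best compared with `==`, not `is`. A minimal sketch with illustrative variable names:

```python
import sys

prefix = "hello"
a = prefix + " world"      # built at run time
b = prefix + " world"      # equal text, built separately

equal = (a == b)
print(equal)               # True: same characters

# Whether `a is b` holds depends on the interpreter, so identity should not
# be used to compare contents. After explicit interning it is guaranteed.
same_after_intern = sys.intern(a) is sys.intern(b)
print(same_after_intern)   # True
```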

# 3.2 String Indexing and Slicing

Python provides powerful indexing and slicing operations for accessing and extracting parts of strings.

In [None]:
# String indexing
text = "Python Programming"
print("String Indexing")
print("="*60)
print(f"Text: '{text}'")
print(f"Length: {len(text)}")

# Positive indexing (0-based)
print("\nPositive indexing:")
print(f"text[0] = '{text[0]}'  (first)")
print(f"text[6] = '{text[6]}'  (7th character)")
print(f"text[17] = '{text[17]}'  (last)")

# Negative indexing
print("\nNegative indexing:")
print(f"text[-1] = '{text[-1]}'  (last)")
print(f"text[-2] = '{text[-2]}'  (second to last)")
print(f"text[-18] = '{text[-18]}'  (first)")

# Visualization
print("\nIndex visualization:")
print("  P  y  t  h  o  n     P  r  o  g  r  a  m  m  i  n  g")
print("  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17")
print("-18-17-16-15-14-13-12-11-10 -9 -8 -7 -6 -5 -4 -3 -2 -1")

In [None]:
# String slicing
text = "Python Programming Language"
print("String Slicing")
print("="*60)
print(f"Text: '{text}'")

# Basic slicing [start:stop]
print("\nBasic slicing [start:stop]:")
print(f"text[0:6] = '{text[0:6]}'")
print(f"text[7:18] = '{text[7:18]}'")
print(f"text[19:27] = '{text[19:27]}'")

# Default values
print("\nDefault values:")
print(f"text[:6] = '{text[:6]}'  (from start)")
print(f"text[7:] = '{text[7:]}'  (to end)")
print(f"text[:] = '{text[:]}'  (whole string)")

# Extended slicing [start:stop:step]
print("\nExtended slicing [start:stop:step]:")
print(f"text[::2] = '{text[::2]}'  (every 2nd)")
print(f"text[1::2] = '{text[1::2]}'  (odd positions)")
print(f"text[::-1] = '{text[::-1]}'  (reversed)")
print(f"text[::-2] = '{text[::-2]}'  (reversed, every 2nd)")

# Common patterns
print("\nCommon slicing patterns:")
print(f"First 5: '{text[:5]}'")
print(f"Last 5: '{text[-5:]}'")
print(f"Without first: '{text[1:]}'")
print(f"Without last: '{text[:-1]}'")
print(f"Middle 10: '{text[9:19]}'")
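
Slices and single indexes treat out-of-range positions differently: a slice is clamped to the string's bounds, while an index raises `IndexError`. The sketch below also shows a `slice` object, which gives a slice a reusable name; `word` and `first_three` are illustrative.

```python
word = "Python"

print(word[2:100])   # thon  (stop is clamped to the length)
print(repr(word[10:]))   # ''  (an empty string, not an error)

try:
    word[10]
except IndexError as exc:
    print(f"IndexError: {exc}")

# A slice object can be stored and reused.
first_three = slice(0, 3)
print(word[first_three])  # Pyt
```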

# 3.3 String Methods

Python strings come with a rich set of built-in methods for manipulation and analysis.

## 3.3.1 Case Methods

In [None]:
# Case conversion methods
text = "Python Programming Language"
print("Case Conversion Methods")
print("="*60)
print(f"Original: '{text}'")

# All case methods
print(f"\nupper(): '{text.upper()}'")
print(f"lower(): '{text.lower()}'")
print(f"capitalize(): '{text.capitalize()}'")
print(f"title(): '{text.title()}'")
print(f"swapcase(): '{text.swapcase()}'")

# casefold() for aggressive lowercase (better for comparisons)
german = "Straße"
print(f"\nCasefold vs lower for '{german}':")
print(f"lower(): '{german.lower()}'")
print(f"casefold(): '{german.casefold()}'  (ß → ss)")

# Practical use: case-insensitive comparison
def case_insensitive_compare(s1, s2):
    return s1.casefold() == s2.casefold()

print("\nCase-insensitive comparison:")
print(f"'HELLO' == 'hello': {case_insensitive_compare('HELLO', 'hello')}")
print(f"'Straße' == 'STRASSE': {case_insensitive_compare('Straße', 'STRASSE')}")
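
One known pitfall of `title()` is that it treats an apostrophe as a word boundary. The sketch below compares it with `string.capwords`, which splits on whitespace only; the sample phrase follows the one used in the Python documentation.

```python
import string

phrase = "they're bill's friends"
titled = phrase.title()
capped = string.capwords(phrase)

print(titled)   # They'Re Bill'S Friends
print(capped)   # They're Bill's Friends
```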

## 3.3.2 Search Methods

In [None]:
# String search methods
text = "Python is awesome. Python is powerful. Python is versatile."
print("String Search Methods")
print("="*60)
print(f"Text: '{text}'")

# find() method
print("\nfind() method:")
print(f"find('Python'): {text.find('Python')}")
print(f"find('Python', 10): {text.find('Python', 10)}  (start from 10)")
print(f"find('Java'): {text.find('Java')}  (not found = -1)")

# index() method
print("\nindex() method:")
print(f"index('Python'): {text.index('Python')}")
try:
    text.index('Java')
except ValueError as e:
    print(f"index('Java'): ValueError - {e}")

# rfind() and rindex() - search from right
print("\nRight-to-left search:")
print(f"rfind('Python'): {text.rfind('Python')}  (last occurrence)")
print(f"rindex('is'): {text.rindex('is')}")

# count() method
print("\ncount() method:")
print(f"count('Python'): {text.count('Python')}")
print(f"count('is'): {text.count('is')}")
print(f"count('o'): {text.count('o')}")

# Finding all occurrences
def find_all(text, substring):
    """Find all occurrences of substring in text."""
    positions = []
    start = 0
    while True:
        pos = text.find(substring, start)
        if pos == -1:
            break
        positions.append(pos)
        start = pos + 1
    return positions

print(f"\nAll positions of 'Python': {find_all(text, 'Python')}")
print(f"All positions of 'is': {find_all(text, 'is')}")
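
The `find_all` helper above advances one character at a time, so it reports overlapping matches, whereas `count()` counts non-overlapping ones. The sketch below makes the difference visible on a small made-up string and shows `re.finditer` as an alternative way to collect positions.

```python
import re

sample = "aaaa"
print("aa" in sample)        # True: membership test
non_overlapping = sample.count("aa")
print(non_overlapping)       # 2

# re.finditer also yields non-overlapping matches, with their spans.
spans = [m.span() for m in re.finditer("aa", sample)]
print(spans)                 # [(0, 2), (2, 4)]

# A zero-width lookahead is one way to collect overlapping start positions.
overlapping = [m.start() for m in re.finditer("(?=aa)", sample)]
print(overlapping)           # [0, 1, 2]
```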

## 3.3.3 Test Methods

In [None]:
# String test methods
print("String Test Methods")
print("="*60)

# Character type tests
test_strings = ['Hello', '12345', 'Hello123', '  \t\n', 'UPPER', 'lower']

print("Character type tests:")
for s in test_strings:
    print(f"\n'{s}':")
    print(f"  isalpha(): {s.isalpha()}")
    print(f"  isdigit(): {s.isdigit()}")
    print(f"  isalnum(): {s.isalnum()}")
    print(f"  isspace(): {s.isspace()}")
    print(f"  isupper(): {s.isupper()}")
    print(f"  islower(): {s.islower()}")

# Additional test methods
print("\nAdditional tests:")
print(f"'Hello World'.istitle(): {'Hello World'.istitle()}")
print(f"'valid_identifier'.isidentifier(): {'valid_identifier'.isidentifier()}")
print(f"'2invalid'.isidentifier(): {'2invalid'.isidentifier()}")
print(f"'Hello'.isprintable(): {'Hello'.isprintable()}")
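newline_text = 'Hello\n'  # backslashes inside f-string expressions are a SyntaxError before Python 3.12
print(f"'Hello\\n'.isprintable(): {newline_text.isprintable()}")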
print(f"'Hello'.isascii(): {'Hello'.isascii()}")
print(f"'Hello 世界'.isascii(): {'Hello 世界'.isascii()}")

# startswith() and endswith()
filename = "document.pdf"
url = "https://example.com"

print("\nPrefix/Suffix tests:")
print(f"'{filename}'.endswith('.pdf'): {filename.endswith('.pdf')}")
print(f"'{filename}'.endswith(('.txt', '.pdf', '.doc')): {filename.endswith(('.txt', '.pdf', '.doc'))}")
print(f"'{url}'.startswith('https://'): {url.startswith('https://')}")
print(f"'{url}'.startswith(('http://', 'https://')): {url.startswith(('http://', 'https://'))}")
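
`isdigit()` is broader than it looks: it accepts characters such as superscript digits that `int()` rejects. The sketch below compares it with `isdecimal()` and `isnumeric()`; `to_int_or_none` is a hypothetical helper that only handles unsigned decimal text.

```python
samples = ["5", "\u00b2", "\u00bd"]   # '5', superscript two, vulgar fraction one half
for s in samples:
    print(f"{s!r}: isdecimal={s.isdecimal()}, "
          f"isdigit={s.isdigit()}, isnumeric={s.isnumeric()}")

def to_int_or_none(text):
    """Return int(text) when text consists of decimal digits, else None."""
    # isdecimal() matches what int() accepts for unsigned input.
    return int(text) if text.isdecimal() else None

print(to_int_or_none("42"))       # 42
print(to_int_or_none("\u00b2"))   # None
```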

## 3.3.4 Modification Methods

In [None]:
# String modification methods
print("String Modification Methods")
print("="*60)

# strip(), lstrip(), rstrip()
text = "   Python Programming   "
print("Strip methods:")
print(f"Original: '{text}'")
print(f"strip(): '{text.strip()}'")
print(f"lstrip(): '{text.lstrip()}'")
print(f"rstrip(): '{text.rstrip()}'")

# Strip specific characters
text2 = "***Python***"
print(f"\nStrip '*' from '{text2}': '{text2.strip('*')}'")

# replace() method
text = "apple apple apple orange apple"
print("\nreplace() method:")
print(f"Original: '{text}'")
print(f"Replace all 'apple' with 'mango': '{text.replace('apple', 'mango')}'")
print(f"Replace first 2 'apple': '{text.replace('apple', 'mango', 2)}'")

# split() and join()
text = "Python,Java,JavaScript,C++"
print("\nsplit() method:")
languages = text.split(',')
print(f"Split by comma: {languages}")

print("\njoin() method:")
print(f"Join with ' | ': '{' | '.join(languages)}'")
print(f"Join with newline:")
print('\n'.join(languages))

# translate() and maketrans()
text = "Hello World"
trans_table = str.maketrans('aeiou', '12345')
print(f"\nTranslate vowels to numbers:")
print(f"Original: '{text}'")
print(f"Translated: '{text.translate(trans_table)}'")

# Remove characters with translate
remove_table = str.maketrans('', '', 'aeiou')
print(f"Remove vowels: '{text.translate(remove_table)}'")
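
A few details of these methods are easy to misread. `strip()` and its variants take a set of characters rather than a prefix or suffix, and `split()` with no argument behaves differently from `split(" ")`. The sketch below uses made-up values and assumes Python 3.9+ for `removesuffix()`.

```python
report = "report.txt"
stripped = report.rstrip(".txt")               # strips any of '.', 't', 'x'
suffix_removed = report.removesuffix(".txt")   # removes the exact suffix (3.9+)
print(stripped)          # repor
print(suffix_removed)    # report

spaced = "a  b   c"
print(spaced.split())      # ['a', 'b', 'c']  (runs of whitespace collapse)
print(spaced.split(" "))   # ['a', '', 'b', '', '', 'c']

# partition() splits once, at the first separator.
key, sep, value = "path=/usr/bin=x".partition("=")
print(key, value)          # path /usr/bin=x
```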

# 3.4 String Formatting

Python offers multiple ways to format strings, from old-style % formatting to modern f-strings.

## 3.4.1 F-strings (Formatted String Literals) - Recommended

In [None]:
# F-strings - The modern way (Python 3.6+)
print("F-strings (Formatted String Literals)")
print("="*60)

# Basic f-string
name = "Alice"
age = 30
print(f"Hello, {name}! You are {age} years old.")

# Expressions in f-strings
print("\nExpressions in f-strings:")
print(f"2 + 2 = {2 + 2}")
print(f"Next year you'll be {age + 1}")
print(f"Your name in uppercase: {name.upper()}")
print(f"Name has {len(name)} characters")

# Format specifiers
pi = 3.14159265359
large_num = 1234567
print("\nFormat specifiers:")
print(f"Pi to 2 decimals: {pi:.2f}")
print(f"Pi to 4 decimals: {pi:.4f}")
print(f"Scientific notation: {pi:e}")
print(f"Percentage: {0.255:.1%}")
print(f"With commas: {large_num:,}")
print(f"Binary: {42:b}")
print(f"Hexadecimal: {255:x}")

# Alignment and padding
text = "Python"
print("\nAlignment and padding:")
print(f"Left aligned: '{text:<15}'")
print(f"Right aligned: '{text:>15}'")
print(f"Centered: '{text:^15}'")
print(f"Filled: '{text:*^15}'")
print(f"Zero padding: '{42:05d}'")

# Debug format (Python 3.8+)
x = 10
y = 20
print("\nDebug format (shows variable name):")
print(f"{x=}, {y=}")
print(f"{x + y=}")
print(f"{name.upper()=}")

# Nested formatting
width = 10
precision = 2
value = pi
print(f"\nNested formatting: {value:{width}.{precision}f}")
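
A few f-string features are not covered by the cell above: conversion flags (`!r`, `!a`), type-specific format specs such as dates, and doubled braces for literal `{` and `}`. A short sketch with illustrative values:

```python
from datetime import date

label = "café"
release = date(2024, 1, 15)

as_repr = f"{label!r}"             # repr(): adds quotes
as_ascii = f"{label!a}"            # ascii(): escapes non-ASCII characters
as_date = f"{release:%Y-%m-%d}"    # the spec is passed to date.__format__
braces = f"{{literal braces}}"     # doubled braces produce single braces

print(as_repr)    # 'café'
print(as_ascii)   # 'caf\xe9'
print(as_date)    # 2024-01-15
print(braces)     # {literal braces}
```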

## 3.4.2 str.format() Method

In [None]:
# str.format() method
print("str.format() Method")
print("="*60)

# Basic format
print("Hello, {}! You are {} years old.".format("Bob", 25))

# Positional arguments
print("\nPositional arguments:")
print("{0} is {1} years old. {0} lives in {2}.".format("Charlie", 35, "NYC"))
print("{2} {1} {0}".format("World", "Hello", "Say"))

# Keyword arguments
print("\nKeyword arguments:")
print("{name} is {age} years old".format(name="David", age=40))

# Format specification
pi = 3.14159
print("\nFormat specification:")
print("Pi: {:.2f}".format(pi))
print("Binary: {:b}".format(42))
print("Percentage: {:.1%}".format(0.255))

# Accessing attributes
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
print("\nAttribute access:")
print("Point: ({0.x}, {0.y})".format(p))

# Dictionary access
person = {'name': 'Eve', 'age': 28}
print("Person: {0[name]} is {0[age]} years old".format(person))
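
One case where `str.format()` remains useful is a template that is defined once and filled in later, which an f-string cannot express because it is evaluated where it is written. The sketch below uses a hypothetical `ROW` template and sample data.

```python
# A template defined once and reused for several records.
ROW = "{name:<10}|{score:>6.1f}"

rows = [
    {"name": "Ann", "score": 91.5},
    {"name": "Bo", "score": 78.0},
]
lines = [ROW.format(**row) for row in rows]
for line in lines:
    print(line)

# format_map() takes the mapping directly instead of unpacking it.
same_line = ROW.format_map(rows[0])
print(same_line == lines[0])   # True
```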

## 3.4.3 Template Strings

In [None]:
# Template strings for safer user input
from string import Template

print("Template Strings")
print("="*60)

# Basic template
t = Template("Hello, $name! You are $age years old.")
result = t.substitute(name="Frank", age=45)
print(result)

# Braces for disambiguation
t = Template("${noun}ing is important")
print(t.substitute(noun="Test"))

# Safe substitution (doesn't raise on missing keys)
t = Template("$name is $age years old and lives in $city")
result = t.safe_substitute(name="Grace", age=50)  # city is missing
print(f"\nSafe substitution: {result}")

# Custom delimiter
class MyTemplate(Template):
    delimiter = '#'

t = MyTemplate("Hello, #name!")
print(f"Custom delimiter: {t.substitute(name='Henry')}")

# Why use templates?
print("\nUse Template strings when:")
print("• Processing user-provided format strings")
print("• Security is a concern")
print("• Simple substitution is sufficient")
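
To make the security point concrete: a `str.format()` template can follow attributes of the objects it is given, so a template supplied by a user can read data the program never meant to expose. The classes and the `secret` value below are hypothetical and exist only for this sketch.

```python
from string import Template

class Config:
    def __init__(self):
        self.secret = "hunter2"   # hypothetical sensitive value

class Event:
    def __init__(self, name, config):
        self.name = name
        self.config = config

event = Event("launch", Config())

# A user-controlled str.format() template can traverse attributes.
untrusted = "{event.config.secret}"
leaked = untrusted.format(event=event)
print(leaked)    # hunter2

# Template only substitutes plain names, so no attribute access is possible.
safe = Template("Event: $name").substitute(name=event.name)
print(safe)      # Event: launch
```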

## 3.4.4 Performance Comparison

In [None]:
# Performance comparison of formatting methods
import timeit
from string import Template  # needed by template_format() when this cell runs on its own

name = "Alice"
age = 30

def percent_format():
    return "%s is %d years old" % (name, age)

def str_format():
    return "{} is {} years old".format(name, age)

def f_string():
    return f"{name} is {age} years old"

def template_format():
    t = Template("$name is $age years old")
    return t.substitute(name=name, age=age)

print("Performance Comparison (100,000 iterations)")
print("="*60)

methods = [
    ("% formatting", percent_format),
    ("str.format()", str_format),
    ("f-string", f_string),
    ("Template", template_format)
]

for label, func in methods:  # 'label' avoids overwriting the global 'name' that the functions read
    elapsed = timeit.timeit(func, number=100000)
    print(f"{label:15}: {elapsed:.4f}s")

print("\nRecommendation:")
print("• Use f-strings for Python 3.6+ (readable and usually the fastest)")
print("• Use str.format() when the template is defined separately from its values")
print("• Use Template for user-provided format strings")
print("• Avoid % formatting in new code (the logging module's %-style messages are the usual exception)")

# 3.5 Unicode and Encoding

Understanding Unicode and character encoding is essential for handling international text and binary data.

In [None]:
# Unicode and encoding basics
print("Unicode and Encoding")
print("="*60)

# Unicode text
text = "Hello 世界 🐍"
print(f"Text: {text}")
print(f"Type: {type(text)}")
print(f"Length: {len(text)} characters")

# Character code points
print("\nCharacter code points:")
for char in text:
    if not char.isspace():
        print(f"'{char}': U+{ord(char):04X} (decimal: {ord(char)})")

# Encoding to bytes
print("\nEncoding to bytes:")
utf8_bytes = text.encode('utf-8')
utf16_bytes = text.encode('utf-16')
print(f"UTF-8: {utf8_bytes}")
print(f"UTF-8 length: {len(utf8_bytes)} bytes")
print(f"UTF-16: {utf16_bytes}")
print(f"UTF-16 length: {len(utf16_bytes)} bytes (includes a 2-byte byte order mark)")

# Decoding bytes to string
print("\nDecoding from bytes:")
decoded = utf8_bytes.decode('utf-8')
print(f"Decoded: {decoded}")

# Handling encoding errors
text_with_emoji = "Python 🐍"
print("\nError handling strategies:")

# Different error handling
try:
    ascii_bytes = text_with_emoji.encode('ascii')
except UnicodeEncodeError:
    print("strict: UnicodeEncodeError (default)")

print(f"ignore: {text_with_emoji.encode('ascii', errors='ignore')}")
print(f"replace: {text_with_emoji.encode('ascii', errors='replace')}")
print(f"xmlcharrefreplace: {text_with_emoji.encode('ascii', errors='xmlcharrefreplace')}")
print(f"backslashreplace: {text_with_emoji.encode('ascii', errors='backslashreplace')}")
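
The error handlers above cover encoding. Decoding has its own failure modes: the wrong codec can either raise an error or silently produce garbled text. The sketch below uses a made-up sample word to show both cases.

```python
original = "café"
utf8_data = original.encode("utf-8")        # b'caf\xc3\xa9'

# Wrong codec, no error: each UTF-8 byte becomes a separate Latin-1 character.
garbled = utf8_data.decode("latin-1")
print(garbled)                              # cafÃ©

# Wrong codec, with an error: a lone 0xE9 byte is not valid UTF-8.
latin1_data = original.encode("latin-1")    # b'caf\xe9'
try:
    latin1_data.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(f"UnicodeDecodeError: {exc.reason}")

# errors="replace" substitutes U+FFFD for bytes that cannot be decoded.
replaced = latin1_data.decode("utf-8", errors="replace")
print(replaced)
```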

# 3.6 Regular Expressions

Regular expressions provide powerful pattern matching capabilities for text processing.

In [None]:
# Regular expressions basics
import re

print("Regular Expressions Basics")
print("="*60)

# Simple pattern matching
text = "The phone number is 123-456-7890 and email is user@example.com"

# Phone number pattern
phone_pattern = r"\d{3}-\d{3}-\d{4}"
phone_match = re.search(phone_pattern, text)
if phone_match:
    print(f"Found phone: {phone_match.group()}")

# Email pattern
email_pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
email_match = re.search(email_pattern, text)
if email_match:
    print(f"Found email: {email_match.group()}")

# Common regex operations
print("\nCommon regex operations:")

# findall - find all matches
text = "cat bat rat mat"
pattern = r".at"
matches = re.findall(pattern, text)
print(f"findall('.at'): {matches}")

# split - split by pattern
text = "apple,banana;orange|grape"
pattern = r"[,;|]"
parts = re.split(pattern, text)
print(f"split by delimiters: {parts}")

# sub - substitute pattern
text = "The year is 2024"
pattern = r"\d{4}"
result = re.sub(pattern, "YYYY", text)
print(f"substitute year: {result}")

# Groups and capturing
print("\nGroups and capturing:")
text = "John Smith (age: 30)"
pattern = r"(\w+) (\w+) \(age: (\d+)\)"
match = re.match(pattern, text)
if match:
    print(f"Full match: {match.group(0)}")
    print(f"First name: {match.group(1)}")
    print(f"Last name: {match.group(2)}")
    print(f"Age: {match.group(3)}")
    print(f"All groups: {match.groups()}")
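
Two behaviors of the basic operations above are worth a brief sketch: quantifiers are greedy unless followed by `?`, and `re.sub` accepts a function in place of a replacement string. The sample text and the `double` helper are illustrative.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)    # .+ grabs as much as it can
lazy = re.findall(r"<.+?>", html)     # .+? stops at the first '>'
print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']

# re.sub calls the function once per match and uses its return value.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```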

In [None]:
# Advanced regex patterns
print("Advanced Regular Expressions")
print("="*60)

# Compile patterns for reuse
email_regex = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

# Validate emails
emails = [
    "valid@example.com",
    "also.valid+tag@domain.co.uk",
    "invalid@",
    "@invalid.com",
    "no-at-sign.com"
]

print("Email validation:")
for email in emails:
    is_valid = bool(email_regex.match(email))
    print(f"  {email:30} : {'Valid' if is_valid else 'Invalid'}")

# Named groups
print("\nNamed groups:")
log_pattern = r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>\w+): (?P<message>.+)'
log_line = "2024-01-15 14:30:45 ERROR: Database connection failed"

match = re.match(log_pattern, log_line)
if match:
    print(f"Date: {match.group('date')}")
    print(f"Time: {match.group('time')}")
    print(f"Level: {match.group('level')}")
    print(f"Message: {match.group('message')}")

# Lookbehind (a lookahead works the same way, written (?=...) after the match)
print("\nLookbehind:")
text = "price: $100, discount: $20, total: $80"

# Positive lookbehind - find numbers preceded by $
pattern = r'(?<=\$)\d+'
prices = re.findall(pattern, text)
print(f"Prices (numbers after $): {prices}")

# Common regex patterns
print("\nCommon patterns library:")
patterns = {
    'URL': r'https?://(?:www\.)?[\w.-]+\.[a-zA-Z]{2,}(?:/[\w.-]*)*',
    'IPv4': r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b',  # checks the shape only, not that each octet is 0-255
    'Date': r'\d{4}-\d{2}-\d{2}',
    'Time': r'\d{2}:\d{2}:\d{2}',
    'Hex Color': r'#[0-9A-Fa-f]{6}'
}

test_text = "Visit https://www.example.com at 192.168.1.1 on 2024-01-15 at 14:30:00. Color: #FF5733"

for name, pattern in patterns.items():
    matches = re.findall(pattern, test_text)
    if matches:
        print(f"{name}: {matches}")
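
The sketch below rounds out the cell above with a positive lookahead, a pattern written with `re.VERBOSE`, and a stricter IPv4 check that uses the standard-library `ipaddress` module in place of a regex. The sample data and the `is_ipv4` helper are hypothetical.

```python
import ipaddress
import re

weights = "10kg, 250g, 3kg, 42"
# Positive lookahead: digits followed by "kg"; the unit itself is not consumed.
kilos = re.findall(r"\d+(?=kg)", weights)
print(kilos)   # ['10', '3']

# re.VERBOSE lets a pattern carry whitespace and comments.
phone_re = re.compile(r"""
    (?P<area>\d{3})   -   # area code
    (?P<prefix>\d{3}) -   # prefix
    (?P<line>\d{4})       # line number
""", re.VERBOSE)
match = phone_re.fullmatch("123-456-7890")
print(match.groupdict())   # {'area': '123', 'prefix': '456', 'line': '7890'}

def is_ipv4(text):
    """Hypothetical helper: True when text parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True

print(is_ipv4("192.168.1.1"), is_ipv4("999.1.1.1"))   # True False
```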

# Module 3 Summary

## 🎯 Key Takeaways

You've completed Module 3: Strings and Text Processing! Here's what you've learned:

### String Basics
✅ Multiple literal formats for different needs  
✅ Raw strings for regex and paths  
✅ Unicode support by default  
✅ Immutability and its implications  
✅ String interning for memory optimization  

### String Operations
✅ Powerful indexing and slicing  
✅ Rich set of built-in methods  
✅ Efficient string building with join()  
✅ Pattern matching with regex  

### String Formatting
✅ F-strings for modern Python (fastest)  
✅ str.format() for templates defined apart from their values  
✅ Template strings for user input  
✅ Performance considerations  

### Unicode and Encoding
✅ UTF-8 as standard encoding  
✅ Proper error handling  
✅ Bytes vs strings distinction  

## 🚀 Next Steps

With mastery of strings, you're ready for:
- **Module 4**: Data Structures (Lists, Tuples, Sets, Dictionaries)
- **Module 5**: Control Flow
- **Module 6**: Functions

## 💡 Best Practices

1. **Use f-strings**: Most readable and performant for Python 3.6+
2. **Raw strings for regex**: Avoid escaping nightmares
3. **Join for concatenation**: More efficient than repeated +
4. **Encode/decode explicitly**: Handle encoding at boundaries
5. **Compile regex patterns**: Name and reuse patterns that run often

## 📝 Practice Exercises

Try these exercises to reinforce your learning:

1. Build a text analyzer that counts words, sentences, and characters
2. Create a password validator with regex
3. Implement a simple markdown to HTML converter
4. Build a log file parser
5. Create a text-based game with string formatting

---

**Congratulations on completing Module 3!** 🎉

You now have comprehensive knowledge of Python's string handling capabilities. This foundation is essential for text processing, data manipulation, and building user interfaces.