# 🐍 Complete Python String Operations Guide

This notebook covers the essential Python string operations with practical examples. It is meant for learning, quick reference, and everyday use.

**Topics Covered:**
- String creation and formatting
- Indexing, slicing, and manipulation
- Case conversion and normalization
- Searching, splitting, and joining
- Character classification and validation
- Encoding and decoding
- Regular expressions
- Performance optimization
- Best practices

**Python Version:** 3.9+  
**Date:** November 21, 2025

## 1. Imports and Sample Data

Only a handful of standard library modules are required for the demonstrations. We also set up reusable sample strings to keep examples focused.

In [1]:
import re
import textwrap
import unicodedata
from string import Template

# Sample data for demonstrations
product_description = """Pro-sound Wireless Headphones
Model: WH-200
Features: Bluetooth 5.3, Active Noise Cancellation, 32hrs battery"""

support_email = "  Support@Example.COM  "
# "$$" escapes a literal dollar sign, so "$$${amount}" renders as "$" plus the value.
invoice_template = Template("""Invoice for ${customer}
Amount Due: $$${amount}
Due Date: ${due_date}""")

log_lines = [
    "2025-11-21 09:15:24,123 INFO user_id=42 action=login status=success",
    "2025-11-21 09:16:02,532 WARNING user_id=42 action=checkout status=retry",
    "2025-11-21 09:16:45,889 ERROR user_id=42 action=payment status=failure",
]

print("✓ Sample data loaded successfully")

✓ Sample data loaded successfully


## 2. Creating Strings

String literals can be declared with single, double, or triple quotes. Raw strings are useful when working with paths or regular expressions.

In [2]:
single = 'Hello'
double = "World"
triple = """Line1
Line2
Line3"""
raw_path = r"C:\new_folder\data.txt"

# F-strings (formatted string literals)
name = "Python"
version = 3.11
f_literal = f"Language: {name} {version}"

print(f"Single quotes: {single}")
print(f"Double quotes: {double}")
print(f"Triple quotes lines: {triple.splitlines()}")
print(f"Raw string path: {raw_path}")
print(f"F-string: {f_literal}")

Single quotes: Hello
Double quotes: World
Triple quotes lines: ['Line1', 'Line2', 'Line3']
Raw string path: C:\new_folder\data.txt
F-string: Language: Python 3.11
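
The difference between a regular and a raw literal is easiest to see by comparing lengths. The sketch below is illustrative only; the variable names are not part of the notebook's sample data.

```python
# In a regular literal, "\n" is a single newline character.
regular = "C:\new_folder"
# In a raw literal, the backslash and the "n" stay as two separate characters.
raw = r"C:\new_folder"

print(len(regular), len(raw))          # 12 13
print("\n" in regular, "\n" in raw)    # True False

# Adjacent string literals are joined at compile time,
# which helps when splitting long text across lines.
joined = ("Pro-sound "
          "Wireless")
print(joined)  # Pro-sound Wireless
```

The raw form is usually the safer choice for Windows paths and regular expression patterns, where backslashes are meant literally.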


## 3. Length, Indexing, and Slicing

Indexing retrieves individual characters, while slicing extracts substrings. Negative indices count from the end.

In [3]:
title = "Wireless"
length = len(title)
first = title[0]
last = title[-1]
first_four = title[:4]
every_second = title[::2]
reversed_title = title[::-1]

print(f"Original: {title}")
print(f"Length: {length}")
print(f"First character: {first}")
print(f"Last character: {last}")
print(f"First four: {first_four}")
print(f"Every second char: {every_second}")
print(f"Reversed: {reversed_title}")

# Slicing with start, stop, step
text = "Python Programming"
print(f"\nSlicing examples on '{text}':")
print(f"[7:18]: {text[7:18]}")  # Programming
print(f"[:6]: {text[:6]}")      # Python
print(f"[7:]: {text[7:]}")      # Programming
print(f"[-11:]: {text[-11:]}")  # Programming
print(f"[::2]: {text[::2]}")    # Pto rgamn

Original: Wireless
Length: 8
First character: W
Last character: s
First four: Wire
Every second char: Wrls
Reversed: sseleriW

Slicing examples on 'Python Programming':
[7:18]: Programming
[:6]: Python
[7:]: Programming
[-11:]: Programming
[::2]: Pto rgamn
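
One detail that is easy to miss: slices tolerate out-of-range bounds, while single-index access does not. A small sketch, with illustrative variable names:

```python
title = "Wireless"

# Slices clamp out-of-range bounds instead of raising IndexError.
clamped = title[2:100]
empty = title[50:]
print(clamped, repr(empty))  # reless ''

# Indexing a single position past the end does raise.
try:
    title[50]
except IndexError as exc:
    index_error = str(exc)
print(index_error)  # string index out of range

# A slice object gives a reusable, named range.
first_four = slice(0, 4)
print(title[first_four])  # Wire
```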


## 4. Immutability

Strings are immutable. Methods return new objects rather than modifying the original.

In [4]:
device = "WH-200"
print(f"Original: {device}")

try:
    device[0] = "X"  # This will raise TypeError
except TypeError as exc:
    print(f"Error: {exc}")

# Correct way: create a new string
updated_device = "X" + device[1:]
print(f"Original (unchanged): {device}")
print(f"Updated (new string): {updated_device}")

# Another example
word = "Hello"
# word[0] = 'J'  # Would fail
new_word = 'J' + word[1:]
print(f"\n'{word}' -> '{new_word}'")

Original: WH-200
Error: 'str' object does not support item assignment
Original (unchanged): WH-200
Updated (new string): XH-200

'Hello' -> 'Jello'
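
When several positions need to change, one common pattern is to convert the string to a list, edit the list, and join it back. This is a minimal sketch of that pattern, not the only way to do it:

```python
device = "WH-200"

# String methods return a new object; the original is left untouched.
lowered = device.lower()
print(device, lowered)  # WH-200 wh-200

# Lists are mutable, so individual positions can be reassigned.
chars = list(device)
chars[0] = "X"
chars[-1] = "1"
edited = "".join(chars)
print(edited)  # XH-201
```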


## 5. Formatting Strings

Python supports several formatting styles: f-strings, `str.format`, and `string.Template` cover most use cases.

In [8]:
customer = "Avery Sky"
amount = 249.99
due_date = "2025-12-01"

# 1. F-strings (modern, recommended)
f_string = f"Invoice for {customer} totals ${amount:.2f} due {due_date}"
print("F-string:", f_string)

# 2. str.format() method
format_method = "Invoice for {} totals ${:.2f} due {}".format(customer, amount, due_date)
print("format():", format_method)

# 3. Template strings (safe for user input)
template_based = invoice_template.substitute(customer=customer, amount=f"{amount:,.2f}", due_date=due_date)
print("Template:", template_based)

# More f-string formatting examples
pi = 3.14159265359
quantity = 42
percentage = 0.856


print(f"\nAdvanced formatting:")
print(f"Pi to 2 decimals: {pi:.2f}")
print(f"Pi to 4 decimals: {pi:.4f}")
print(f"Quantity padded: {quantity:05d}")
print(f"Percentage: {percentage:.1%}")
print(f"Right aligned: {customer:>20}")
print(f"Left aligned: {customer:<20}")
print(f"Centered: {customer:^20}")
print(f"Amount with comma: ${amount:,.2f}")

# Expression in f-strings
items = 5
price = 29.99
print(f"Total: ${items * price:.2f}")

F-string: Invoice for Avery Sky totals $249.99 due 2025-12-01
format(): Invoice for Avery Sky totals $249.99 due 2025-12-01
Template: Invoice for Avery Sky
Amount Due: $249.99
Due Date: 2025-12-01

Advanced formatting:
Pi to 2 decimals: 3.14
Pi to 4 decimals: 3.1416
Quantity padded: 00042
Percentage: 85.6%
Right aligned:            Avery Sky
Left aligned: Avery Sky           
Centered:      Avery Sky      
Amount with comma: $249.99
Total: $149.95
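
`string.Template` has two behaviours worth seeing in isolation: `$$` is an escaped dollar sign, and `safe_substitute` tolerates missing placeholders. The templates below are made up for illustration.

```python
from string import Template

# "$$" is an escape for a literal dollar sign, so "$$${amount}" renders as
# "$" followed by the substituted value.
price_line = Template("Amount Due: $$${amount}")
rendered = price_line.substitute(amount="249.99")
print(rendered)  # Amount Due: $249.99

# substitute() raises KeyError for a missing placeholder,
# while safe_substitute() leaves the placeholder in place.
greeting = Template("Hello ${name}, order ${order_id}")
partial = greeting.safe_substitute(name="Avery")
print(partial)  # Hello Avery, order ${order_id}
```

Writing `$${amount}` instead would produce the literal text `${amount}`, because the first two dollar signs are consumed as the escape.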


## 6. Concatenation and Repetition

Combine smaller pieces with concatenation or repeat substrings to build labels quickly.

In [9]:
model = "WH-" + "200"
tagline = "Noise Free " * 2
parts = ["Bluetooth", "Noise Cancellation", "32hrs battery"]
feature_line = ", ".join(parts)

print(f"Concatenation: {model}")
print(f"Repetition: {tagline}")
print(f"Join list: {feature_line}")

# More practical examples
separator = "=" * 50
print(f"\n{separator}")
print("Report Header".center(50))
print(separator)

# Building paths
base = "/home/user"
folder = "documents"
filename = "report.pdf"
full_path = base + "/" + folder + "/" + filename
print(f"\nPath: {full_path}")

# Better way with join
full_path_better = "/".join([base, folder, filename])
print(f"Path (using join): {full_path_better}")

Concatenation: WH-200
Repetition: Noise Free Noise Free 
Join list: Bluetooth, Noise Cancellation, 32hrs battery
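
==================================================
                  Report Header                   
==================================================
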
Path: /home/user/documents/report.pdf
Path (using join): /home/user/documents/report.pdf
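
For real file paths, the standard library's `pathlib` is often a sturdier option than joining strings by hand. A brief sketch, assuming POSIX-style paths as in the example above:

```python
from pathlib import PurePosixPath

base = "/home/user"
# PurePosixPath joins components with "/" and never touches the filesystem.
full_path = PurePosixPath(base) / "documents" / "report.pdf"

print(full_path)         # /home/user/documents/report.pdf
print(full_path.name)    # report.pdf
print(full_path.suffix)  # .pdf
```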


## 7. Case Conversion and Normalization

Use casing helpers to standardize text, and `unicodedata` when combining characters need normalization.

In [10]:
email_clean = support_email.strip().lower()
headline = product_description.splitlines()[0].title()
normalized = unicodedata.normalize("NFC", "Cafe\u0301")

print(f"Original email: '{support_email}'")
print(f"Cleaned email: '{email_clean}'")
print(f"Headline (title case): {headline}")
print(f"Normalized: {normalized}")

# More case conversion examples
text = "Python Programming Language"
print(f"\nOriginal: {text}")
print(f"upper(): {text.upper()}")
print(f"lower(): {text.lower()}")
print(f"title(): {text.title()}")
print(f"capitalize(): {text.capitalize()}")
print(f"swapcase(): {text.swapcase()}")

# Case checking
word1 = "HELLO"
word2 = "hello"
word3 = "Hello"
print(f"\n'{word1}'.isupper(): {word1.isupper()}")
print(f"'{word2}'.islower(): {word2.islower()}")
print(f"'{word3}'.istitle(): {word3.istitle()}")

# Practical example: normalizing user input
user_input = "  JoHn.DoE@EmAiL.CoM  "
normalized_email = user_input.strip().lower()
print(f"\nUser input: '{user_input}'")
print(f"Normalized: '{normalized_email}'")

Original email: '  Support@Example.COM  '
Cleaned email: 'support@example.com'
Headline (title case): Pro-Sound Wireless Headphones
Normalized: Café

Original: Python Programming Language
upper(): PYTHON PROGRAMMING LANGUAGE
lower(): python programming language
title(): Python Programming Language
capitalize(): Python programming language
swapcase(): pYTHON pROGRAMMING lANGUAGE

'HELLO'.isupper(): True
'hello'.islower(): True
'Hello'.istitle(): True

User input: '  JoHn.DoE@EmAiL.CoM  '
Normalized: 'john.doe@email.com'
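
Normalization matters most when comparing text that looks identical but is built from different code points. The helper `same_text` below is a hypothetical name used for illustration; it is a simple approach, and stricter caseless matching may call for additional normalization steps.

```python
import unicodedata

composed = "Caf\u00e9"      # é as a single code point
decomposed = "Cafe\u0301"   # e followed by a combining acute accent

print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 4 5

def same_text(a: str, b: str) -> bool:
    """Compare after NFC normalization and case folding."""
    norm_a = unicodedata.normalize("NFC", a).casefold()
    norm_b = unicodedata.normalize("NFC", b).casefold()
    return norm_a == norm_b

print(same_text(composed, decomposed.upper()))  # True
```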


## 8. Stripping and Cleaning

Trim whitespace or specific characters from both ends, then combine with `replace` for simple cleaning pipelines.

In [11]:
dirty = "\t  \nLaunch Offer: 20% OFF!   \n"
stripped = dirty.strip()
left_stripped = dirty.lstrip()
right_stripped = dirty.rstrip()
cleaned = stripped.replace("%", " percent")

print(f"Original: '{dirty}'")
print(f"strip(): '{stripped}'")
print(f"lstrip(): '{left_stripped}'")
print(f"rstrip(): '{right_stripped}'")
print(f"replace('%', ' percent'): '{cleaned}'")

# Stripping specific characters
url = "https://example.com/"
clean_url = url.rstrip("/")
print(f"\nOriginal URL: {url}")
print(f"Without trailing slash: {clean_url}")

# Remove specific characters from both ends
text = "...Hello World!!!"
result = text.strip(".!")
print(f"\nOriginal: '{text}'")
print(f"strip('.!'): '{result}'")

# Practical example: cleaning CSV data
csv_value = "  $1,234.56  "
numeric = csv_value.strip().replace("$", "").replace(",", "")
print(f"\nCSV value: '{csv_value}'")
print(f"Cleaned: '{numeric}' -> {float(numeric)}")

Original: '	  
Launch Offer: 20% OFF!   
'
strip(): 'Launch Offer: 20% OFF!'
lstrip(): 'Launch Offer: 20% OFF!   
'
rstrip(): '	  
Launch Offer: 20% OFF!'
replace('%', ' percent'): 'Launch Offer: 20 percent OFF!'

Original URL: https://example.com/
Without trailing slash: https://example.com

Original: '...Hello World!!!'
strip('.!'): 'Hello World'

CSV value: '  $1,234.56  '
Cleaned: '1234.56' -> 1234.56
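
A frequent pitfall is that the argument to `strip`, `lstrip`, and `rstrip` is a set of characters, not a prefix or suffix. The domains below are made-up examples.

```python
domain = "www.example.com"

# The argument is a set of characters to remove, in any order.
stripped = domain.lstrip("w.")
print(stripped)  # example.com

tricky = "www.web.com"
# Every leading 'w' or '.' is removed, including the start of "web".
print(tricky.lstrip("w."))  # eb.com

# To drop one exact prefix, slice after a startswith() check.
prefix = "www."
exact = tricky[len(prefix):] if tricky.startswith(prefix) else tricky
print(exact)  # web.com
```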


## 9. Searching and Counting

Use `in`, `find`, `count`, and `startswith` / `endswith` to inspect text without reaching for regular expressions.

In [12]:
description = product_description.lower()
bluetooth_index = description.find("bluetooth")
has_noise = "noise" in description
count_hours = description.count("hrs")
starts_with_model = product_description.startswith("Pro-sound")
ends_with_hours = product_description.endswith("battery")

print(f"Description: {product_description[:50]}...")
print(f"'bluetooth' found at index: {bluetooth_index}")
print(f"Contains 'noise': {has_noise}")
print(f"Count of 'hrs': {count_hours}")
print(f"Starts with 'Pro-sound': {starts_with_model}")
print(f"Ends with 'battery': {ends_with_hours}")

# More search examples
text = "Python is awesome. Python is powerful."
print(f"\nText: {text}")
print(f"First 'Python' at: {text.find('Python')}")
print(f"Last 'Python' at: {text.rfind('Python')}")
print(f"'Python' appears {text.count('Python')} times")

# find vs index
print(f"\nfind('Java'): {text.find('Java')}")  # Returns -1 if not found
try:
    text.index('Java')  # Raises ValueError if not found
except ValueError:
    print("index('Java'): Raised ValueError - not found")

# startswith and endswith with tuples
filename = "report.pdf"
print(f"\nFilename: {filename}")
print(f"Is document: {filename.endswith(('.pdf', '.doc', '.docx'))}")
print(f"Is temp file: {filename.startswith(('temp_', 'tmp_', '~'))}")

# Case-insensitive search
text2 = "Hello World"
search = "hello"
print(f"\nCase-insensitive search for '{search}' in '{text2}':")
print(f"Found: {search.lower() in text2.lower()}")

Description: Pro-sound Wireless Headphones
Model: WH-200
Featur...
'bluetooth' found at index: 54
Contains 'noise': True
Count of 'hrs': 1
Starts with 'Pro-sound': True
Ends with 'battery': True

Text: Python is awesome. Python is powerful.
First 'Python' at: 0
Last 'Python' at: 19
'Python' appears 2 times

find('Java'): -1
index('Java'): Raised ValueError - not found

Filename: report.pdf
Is document: True
Is temp file: False

Case-insensitive search for 'hello' in 'Hello World':
Found: True
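
`find` accepts an optional start position, which makes it possible to collect every occurrence without regular expressions. The function name `find_all` is an illustrative choice, not a built-in.

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        return []  # an empty substring would otherwise match forever
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the current match.
        start = text.find(sub, start + len(sub))
    return positions

text = "Python is awesome. Python is powerful."
print(find_all(text, "Python"))  # [0, 19]
print(find_all(text, "is"))      # [7, 26]
```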


## 10. Replacing and Translation

`str.replace` handles direct substitutions, while `str.translate` lets you map multiple characters efficiently.

In [None]:
text = "SKU: wh-200"
upper_replaced = text.replace("wh", "WH")
table = str.maketrans({"-": "_", " ": ""})
translated = text.translate(table)

print(f"Original: {text}")
print(f"replace('wh', 'WH'): {upper_replaced}")
print(f"translate (- to _, remove spaces): {translated}")

# Replace with limit
sentence = "the cat and the dog and the bird"
print(f"\nOriginal: {sentence}")
print(f"Replace all 'the': {sentence.replace('the', 'a')}")
print(f"Replace first 2 'the': {sentence.replace('the', 'a', 2)}")

# Remove characters with replace
phone = "(555) 123-4567"
clean_phone = phone.replace("(", "").replace(")", "").replace(" ", "").replace("-", "")
print(f"\nOriginal phone: {phone}")
print(f"Cleaned: {clean_phone}")

# Using translate for efficiency
# Remove all vowels
text2 = "Hello World"
vowels = "aeiouAEIOU"
remove_vowels = str.maketrans("", "", vowels)
no_vowels = text2.translate(remove_vowels)
print(f"\nOriginal: {text2}")
print(f"Without vowels: {no_vowels}")

# Character mapping with translate
leetspeak = str.maketrans("aeilost", "4311057")
message = "leetspeak"
leet = message.translate(leetspeak)
print(f"\nOriginal: {message}")
print(f"Leetspeak: {leet}")

# Practical: sanitize filename
filename = "My Report: 2025 (Final).txt"
safe_chars = str.maketrans({":": "-", " ": "_", "(": "", ")": ""})
safe_filename = filename.translate(safe_chars)
print(f"\nUnsafe filename: {filename}")
print(f"Safe filename: {safe_filename}")
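
The deletion form of `str.maketrans` pairs well with the constants in the `string` module. The review text below is invented for the example.

```python
import string

# The third argument to str.maketrans lists characters to delete.
drop_punctuation = str.maketrans("", "", string.punctuation)

review = "Great sound - terrible case... 4/5!"
cleaned = review.translate(drop_punctuation)
print(repr(cleaned))  # 'Great sound  terrible case 45'

# Deleting characters can leave doubled spaces; split/join collapses them.
collapsed = " ".join(cleaned.split())
print(collapsed)  # Great sound terrible case 45
```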

## 11. Splitting and Joining

Transform strings into lists (or vice versa) with `split`, `splitlines`, `partition`, and `join`.

In [None]:
lines = product_description.splitlines()
features = lines[2].replace("Features: ", "").split(", ")
pipe_separated = " | ".join(features)
path = "products/2025/wearables"
head, sep, tail = path.partition("/2025/")

print(f"Splitlines: {lines}")
print(f"Features list: {features}")
print(f"Pipe separated: {pipe_separated}")
print(f"Partition result: head='{head}', sep='{sep}', tail='{tail}'")

# More split examples
csv_line = "Alice,30,Engineer,New York"
fields = csv_line.split(",")
print(f"\nCSV: {csv_line}")
print(f"Fields: {fields}")

# Split with maxsplit
url = "https://example.com/api/v1/users/123"
protocol, rest = url.split("://", 1)
print(f"\nURL: {url}")
print(f"Protocol: {protocol}, Rest: {rest}")

# rsplit (split from right)
filename = "archive.tar.gz"
name, ext = filename.rsplit(".", 1)
print(f"\nFilename: {filename}")
print(f"Name: {name}, Extension: {ext}")

# splitlines with different line endings
multiline = "Line1\nLine2\r\nLine3\rLine4"
print(f"\nSplitlines: {multiline.splitlines()}")
print(f"Splitlines (keep ends): {multiline.splitlines(keepends=True)}")

# join examples
words = ["Python", "is", "awesome"]
sentence = " ".join(words)
print(f"\nWords: {words}")
print(f"Sentence: {sentence}")

# join with different separator
hyphenated = "-".join(words)
print(f"Hyphenated: {hyphenated}")

# join path components
path_parts = ["home", "user", "documents", "file.txt"]
file_path = "/".join(path_parts)
print(f"\nPath: {file_path}")

# partition and rpartition
email = "user@example.com"
username, at, domain = email.partition("@")
print(f"\nEmail: {email}")
print(f"Username: {username}, Domain: {domain}")

full_domain = "mail.example.com"
subdomain, dot, base = full_domain.rpartition(".")
print(f"\nDomain: {full_domain}")
print(f"Base: {base}, Subdomain: {subdomain}")
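
Two behaviours are worth checking directly: `split()` with no argument differs from `split(" ")`, and `partition` never fails when the separator is missing. A short sketch with made-up input:

```python
messy = "  alpha   beta  gamma "

# With no argument, split() collapses whitespace runs and drops empty strings.
compact = messy.split()
print(compact)  # ['alpha', 'beta', 'gamma']

# With an explicit separator, every occurrence splits, so empty strings appear.
literal = messy.split(" ")
print(literal)  # ['', '', 'alpha', '', '', 'beta', '', 'gamma', '']

# partition() always returns three parts, even when the separator is absent.
head, sep, tail = "no-at-sign".partition("@")
print(repr(head), repr(sep), repr(tail))  # 'no-at-sign' '' ''
```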

## 12. Alignment and Padding

`ljust`, `rjust`, `center`, and `zfill` help format tabular text without external libraries.

In [None]:
label = "Total"
left = label.ljust(10, '.')
right = label.rjust(10)
centered = label.center(10, '-')
padded_number = "42".zfill(6)

print(f"Original: '{label}'")
print(f"ljust(10, '.'): '{left}'")
print(f"rjust(10): '{right}'")
print(f"center(10, '-'): '{centered}'")
print(f"'42'.zfill(6): '{padded_number}'")

# Practical example: formatted receipt
print("\n" + "=" * 40)
print("RECEIPT".center(40))
print("=" * 40)
items = [("Coffee", 3.50), ("Sandwich", 7.25), ("Cookie", 2.00)]
for item, price in items:
    print(f"{item.ljust(30, '.')} ${price:6.2f}")
print("-" * 40)
total = sum(price for _, price in items)
print(f"{'TOTAL'.ljust(30, '.')} ${total:6.2f}")
print("=" * 40)

# Zero-fill for IDs
order_id = 127
invoice_no = str(order_id).zfill(8)
print(f"\nOrder ID: {order_id}")
print(f"Invoice Number: INV-{invoice_no}")

# Aligning columns
print("\nEmployee Report:")
print(f"{'Name':<15} {'Department':<15} {'Salary':>10}")
print("-" * 40)
employees = [
    ("Alice", "Engineering", 95000),
    ("Bob", "Marketing", 72000),
    ("Charlie", "Sales", 68000),
]
for name, dept, salary in employees:
    print(f"{name:<15} {dept:<15} ${salary:>9,}")
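
Column widths do not have to be hard-coded. Format specs accept nested replacement fields, so a width can be computed from the data. A minimal sketch using the receipt items from above:

```python
rows = [("Coffee", 3.5), ("Sandwich", 7.25), ("Cookie", 2.0)]

# Size the first column from the longest name.
name_width = max(len(name) for name, _ in rows)

# The inner {name_width} is substituted into the format spec before formatting.
lines = [f"{name:<{name_width}} {price:>6.2f}" for name, price in rows]
print("\n".join(lines))
```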

## 13. Character Classification

`str.is*` helpers make validation rules easy to express.

In [None]:
serial = "WH200"
price = "249.99"
promo = "Save20"
username = "john_doe"
spaces = "   "

print(f"'{serial}'.isalnum(): {serial.isalnum()}")
print(f"'{price}'.isdigit(): {price.isdigit()}")
print(f"'{price}'.replace('.', '', 1).isdigit(): {price.replace('.', '', 1).isdigit()}")
print(f"'{promo}'.isalpha(): {promo.isalpha()}")
print(f"'{promo}'.islower(): {promo.islower()}")

# More classification methods
print(f"\n'{username}'.isalnum(): {username.isalnum()}")  # False (has underscore)
print(f"'{username}'.isidentifier(): {username.isidentifier()}")  # True
print(f"'{spaces}'.isspace(): {spaces.isspace()}")

# Checking different types
text1 = "HELLO"
text2 = "hello"
text3 = "Hello World"
text4 = "123"
text5 = "\t\n"

print(f"\n'{text1}'.isupper(): {text1.isupper()}")
print(f"'{text2}'.islower(): {text2.islower()}")
print(f"'{text3}'.istitle(): {text3.istitle()}")
print(f"'{text4}'.isnumeric(): {text4.isnumeric()}")
print(f"'{text5}'.isspace(): {text5.isspace()}")

# isdecimal, isdigit, isnumeric differences
num1 = "123"
num2 = "½"
num3 = "²"

print(f"\n'{num1}' -> decimal: {num1.isdecimal()}, digit: {num1.isdigit()}, numeric: {num1.isnumeric()}")
print(f"'{num2}' -> decimal: {num2.isdecimal()}, digit: {num2.isdigit()}, numeric: {num2.isnumeric()}")
print(f"'{num3}' -> decimal: {num3.isdecimal()}, digit: {num3.isdigit()}, numeric: {num3.isnumeric()}")

# Practical validation
def validate_username(name):
    if not name:
        return False, "Username cannot be empty"
    if not name[0].isalpha():
        return False, "Must start with a letter"
    if not name.replace("_", "").isalnum():
        return False, "Can only contain letters, numbers, and underscores"
    return True, "Valid username"

test_names = ["john_doe", "123user", "user@name", "Valid_User_1"]
print("\nUsername validation:")
for name in test_names:
    valid, msg = validate_username(name)
    print(f"  '{name}': {msg}")
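
The gap between `isdigit` and `isdecimal` matters when the next step is `int()`. The helper `parse_int` is a hypothetical name used for illustration.

```python
def parse_int(value: str):
    """Return int(value) for plain decimal digits, otherwise None."""
    cleaned = value.strip()
    # isdecimal() accepts only characters that int() can convert,
    # whereas isdigit() also accepts characters such as superscript two.
    if cleaned.isdecimal():
        return int(cleaned)
    return None

print("\u00b2".isdigit(), "\u00b2".isdecimal())  # True False
print(parse_int(" 42 "))    # 42
print(parse_int("\u00b2"))  # None
print(parse_int("-7"))      # None (the sign is not a digit)
```

Inputs with signs or separators need their own handling; wrapping `int()` in a `try`/`except ValueError` is another reasonable option.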

## 14. Encoding and Decoding

Convert between `str` and `bytes` and handle errors explicitly.

In [None]:
message = "Noise cancellation"
encoded = message.encode("utf-8")
decoded = encoded.decode("utf-8")
ascii_safe = message.encode("ascii", errors="ignore")

print(f"Original: {message}")
print(f"Encoded (UTF-8): {encoded}")
print(f"Decoded: {decoded}")
print(f"ASCII safe: {ascii_safe}")

# Different encodings
text = "Café"
print(f"\nOriginal text: {text}")
print(f"UTF-8: {text.encode('utf-8')}")
print(f"Latin-1: {text.encode('latin-1')}")
print(f"ASCII (ignore): {text.encode('ascii', errors='ignore')}")
print(f"ASCII (replace): {text.encode('ascii', errors='replace')}")

# Unicode handling
emoji_text = "Python 🐍 is awesome!"
print(f"\nText with emoji: {emoji_text}")
print(f"Encoded: {emoji_text.encode('utf-8')}")
print(f"Length in chars: {len(emoji_text)}")
print(f"Length in bytes: {len(emoji_text.encode('utf-8'))}")

# Different error handlers
special = "Hello © World"
print(f"\nOriginal: {special}")
try:
    special.encode('ascii')  # Will fail
except UnicodeEncodeError as e:
    print(f"Strict mode error: {e}")

print(f"Ignore errors: {special.encode('ascii', errors='ignore')}")
print(f"Replace errors: {special.encode('ascii', errors='replace')}")
print(f"XML char refs: {special.encode('ascii', errors='xmlcharrefreplace')}")

# Decoding with error handling
bad_bytes = b"Hello \xff World"
print(f"\nBytes: {bad_bytes}")
try:
    bad_bytes.decode('utf-8')
except UnicodeDecodeError as e:
    print(f"Strict decode error: {e}")
    
print(f"Ignore: {bad_bytes.decode('utf-8', errors='ignore')}")
print(f"Replace: {bad_bytes.decode('utf-8', errors='replace')}")
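
To see why character counts and byte counts diverge, it helps to encode a few characters one at a time. The sample characters are arbitrary choices for illustration.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F40D"]  # A, é, €, snake emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    # UTF-8 uses 1 to 4 bytes per code point.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded!r}")

byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]

# backslashreplace keeps undecodable bytes visible instead of dropping them.
recovered = b"Hello \xff World".decode("utf-8", errors="backslashreplace")
print(recovered)  # Hello \xff World
```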

## 15. Regular Expression Helpers

Regular expressions shine when structured text needs to be parsed into fields.

In [None]:
pattern = re.compile(r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ (?P<level>\w+) user_id=(?P<user_id>\d+) action=(?P<action>\w+) status=(?P<status>\w+)")
parsed_logs = [pattern.search(line).groupdict() for line in log_lines]

print("Parsed log entries:")
for log in parsed_logs:
    print(log)

# More regex examples
# Email validation
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
emails = ["user@example.com", "invalid.email", "test@domain.co.uk"]
print("\nEmail validation:")
for email in emails:
    print(f"  {email}: {bool(email_pattern.match(email))}")

# Extract phone numbers
text = "Contact: (555) 123-4567 or 555-987-6543"
phone_pattern = re.compile(r'\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}')
phones = phone_pattern.findall(text)
print(f"\nPhone numbers found: {phones}")

# Find all words
sentence = "Python 3.11 is awesome! It's really great."
words = re.findall(r'\b\w+\b', sentence)
print(f"\nWords in sentence: {words}")

# Search and replace
text2 = "The price is $50 and the tax is $5."
updated = re.sub(r'\$(\d+)', r'USD \1.00', text2)
print(f"\nOriginal: {text2}")
print(f"Updated: {updated}")

# Split on multiple delimiters
data = "apple,banana;cherry:date|elderberry"
fruits = re.split(r'[,;:|]', data)
print(f"\nFruits: {fruits}")

# Extract dates
text3 = "Events on 2025-11-21 and 2025-12-25"
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text3)
print(f"\nDates found: {dates}")

# Capture groups
log_entry = "2025-11-21 ERROR: Database connection failed"
match = re.search(r'(\d{4}-\d{2}-\d{2}) (\w+): (.+)', log_entry)
if match:
    date, level, message = match.groups()
    print(f"\nParsed log:")
    print(f"  Date: {date}")
    print(f"  Level: {level}")
    print(f"  Message: {message}")
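
Matching functions return `None` on failure, so calling `.groupdict()` directly on the result raises `AttributeError` for any line that does not fit the pattern. One defensive sketch, with an illustrative pattern and function name:

```python
import re

LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+): (?P<message>.+)"
)

def parse_entries(lines):
    """Yield a dict per matching line and skip lines that do not match."""
    for line in lines:
        match = LOG_PATTERN.fullmatch(line)
        if match is None:
            continue  # no match: skip rather than crash
        yield match.groupdict()

entries = list(parse_entries([
    "2025-11-21 ERROR: Database connection failed",
    "not a log line",
    "2025-11-22 INFO: Retry succeeded",
]))
print(entries)
```

Whether to skip, log, or raise on a non-matching line depends on the application; skipping is just the simplest choice for a demonstration.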

## 16. Utility Function Example

Combine string helpers inside small utilities to standardize data consistently.

In [None]:
def normalize_email(value: str) -> str:
    """Normalize email address to lowercase."""
    local, domain = value.strip().split('@')
    return f"{local.lower()}@{domain.lower()}"

result = normalize_email(support_email)
print(f"Normalized email: {result}")

# More utility functions
def slugify(text: str) -> str:
    """Convert text to URL-friendly slug."""
    import re
    text = text.lower().strip()
    text = re.sub(r'[^\w\s-]', '', text)
    text = re.sub(r'[\s_-]+', '-', text)
    text = re.sub(r'^-+|-+$', '', text)
    return text

titles = ["Hello World!", "Python 3.11: New Features", "Data Science & AI"]
print("\nSlugify examples:")
for title in titles:
    print(f"  '{title}' -> '{slugify(title)}'")

def extract_initials(name: str) -> str:
    """Extract initials from a name."""
    parts = name.strip().split()
    return ''.join(p[0].upper() for p in parts if p)

names = ["John Doe", "Mary Jane Watson", "bob"]
print("\nExtract initials:")
for name in names:
    print(f"  '{name}' -> '{extract_initials(name)}'")

def mask_credit_card(card: str) -> str:
    """Mask credit card number, showing only last 4 digits."""
    digits = card.replace(" ", "").replace("-", "")
    return f"****-****-****-{digits[-4:]}"

cards = ["1234567890123456", "1234-5678-9012-3456"]
print("\nMask credit cards:")
for card in cards:
    print(f"  {card} -> {mask_credit_card(card)}")

def parse_csv_line(line: str, delimiter: str = ',') -> list:
    """Parse CSV line handling quoted values."""
    import csv
    import io
    reader = csv.reader(io.StringIO(line), delimiter=delimiter)
    return next(reader)

csv_lines = [
    'Alice,30,Engineer',
    '"Smith, John",25,"Developer, Senior"'
]
print("\nParse CSV lines:")
for line in csv_lines:
    print(f"  {parse_csv_line(line)}")
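
Small utilities benefit from explicit failure modes. As one possible variation, an email normalizer can use `rpartition` so that input without exactly one `@` produces a clear error instead of an unpacking failure. `normalize_email_safe` is a hypothetical name, and the check is deliberately minimal rather than a full address validator.

```python
def normalize_email_safe(value: str) -> str:
    """Lowercase an address, raising ValueError when it lacks a usable '@'."""
    cleaned = value.strip()
    # rpartition splits on the last '@' and always returns three parts.
    local, sep, domain = cleaned.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {value!r}")
    return f"{local.lower()}@{domain.lower()}"

print(normalize_email_safe("  Support@Example.COM  "))  # support@example.com

try:
    normalize_email_safe("no-at-sign")
except ValueError as exc:
    print(f"Rejected: {exc}")
```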

## 17. Text Wrapping

`textwrap` keeps console output readable without manual line breaks.

In [None]:
wrapped = textwrap.fill(product_description, width=40)
print("Wrapped text (width=40):")
print(wrapped)

# More textwrap examples
long_text = "Python is a high-level, interpreted programming language known for its simplicity and readability. It supports multiple programming paradigms and has a comprehensive standard library."

print("\nWrapped to 60 characters:")
print(textwrap.fill(long_text, width=60))

# Dedent: remove common leading whitespace
indented = """
    def hello():
        print("Hello")
        print("World")
"""
dedented = textwrap.dedent(indented)
print("\nOriginal (indented):")
print(repr(indented))
print("\nDedented:")
print(repr(dedented))

# Indent: add prefix to lines
text_to_indent = "Line 1\nLine 2\nLine 3"
indented_result = textwrap.indent(text_to_indent, "    ")
print("\nIndented with 4 spaces:")
print(indented_result)

# Custom predicate for indent
code = "import os\nprint('hello')\n# comment"
indented_code = textwrap.indent(code, ">>> ", predicate=lambda line: not line.startswith("#"))
print("\nCode with >>> prefix (except comments):")
print(indented_code)

# Shorten text
long_desc = "This is a very long description that needs to be shortened to fit in a limited space like a preview or tooltip."
shortened = textwrap.shorten(long_desc, width=50, placeholder="...")
print(f"\nOriginal: {long_desc}")
print(f"Shortened (50 chars): {shortened}")

# Wrap with subsequent indent
email_body = "Hello, this is a long email message that will be wrapped with a hanging indent for better readability."
wrapped_email = textwrap.fill(email_body, width=40, initial_indent="From: ", subsequent_indent="      ")
print("\nEmail with hanging indent:")
print(wrapped_email)
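
When the wrapped lines need further processing, `textwrap.wrap` returns them as a list instead of a single string. A small sketch; the numbering format is an arbitrary choice:

```python
import textwrap

text = ("Python is a high-level, interpreted programming language known "
        "for its simplicity and readability.")

# wrap() returns a list of lines; break_on_hyphens=False keeps
# hyphenated words such as "high-level" on one line.
lines = textwrap.wrap(text, width=30, break_on_hyphens=False)
for number, line in enumerate(lines, start=1):
    print(f"{number:>2}: {line}")

# No line exceeds the requested width.
longest = max(len(line) for line in lines)
print(longest <= 30)  # True
```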

## 19. Additional String Methods

Explore other useful string methods like `removeprefix`, `removesuffix`, `expandtabs`, and more.

In [None]:
# removeprefix and removesuffix (Python 3.9+)
filename = "report_2025.pdf"
without_ext = filename.removesuffix(".pdf")
print(f"Original: {filename}")
print(f"Without extension: {without_ext}")

url = "https://example.com"
without_protocol = url.removeprefix("https://")
print(f"\nURL: {url}")
print(f"Without protocol: {without_protocol}")

# expandtabs
tabbed = "Name\tAge\tCity"
expanded = tabbed.expandtabs(15)
print(f"\nTabbed: {repr(tabbed)}")
print(f"Expanded: {expanded}")

# casefold - more aggressive lowercase
german = "ß"
print(f"\nGerman ß:")
print(f"  lower(): {german.lower()}")
print(f"  casefold(): {german.casefold()}")

# ASCII check
text1 = "Hello"
text2 = "Café"
print(f"\n'{text1}'.isascii(): {text1.isascii()}")
print(f"'{text2}'.isascii(): {text2.isascii()}")

# format_map
data = {"name": "Alice", "age": 30}
template = "Name: {name}, Age: {age}"
result = template.format_map(data)
print(f"\nformat_map: {result}")

# Practical examples
def clean_extension(filename: str, ext: str) -> str:
    """Remove extension if present."""
    return filename.removesuffix(ext)

print("\nClean extensions:")
files = ["image.jpg", "document.pdf", "script.py.txt"]
for f in files:
    print(f"  {f} -> {clean_extension(f, '.jpg')}")

# Case-insensitive comparison using casefold
def case_insensitive_equal(s1: str, s2: str) -> bool:
    return s1.casefold() == s2.casefold()

pairs = [("Python", "python"), ("Straße", "STRASSE")]
print("\nCase-insensitive comparison:")
for a, b in pairs:
    print(f"  '{a}' == '{b}': {case_insensitive_equal(a, b)}")
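
`format_map` differs from `format(**mapping)` in that it uses the mapping object directly, which allows a `dict` subclass to supply defaults for missing keys. `DefaultingDict` is a hypothetical helper written for this example.

```python
class DefaultingDict(dict):
    """dict that returns a visible marker for missing keys."""

    def __missing__(self, key):
        return f"<{key}?>"

template = "Name: {name}, Age: {age}, City: {city}"
data = DefaultingDict(name="Alice", age=30)

# format_map() consults the mapping itself, so __missing__ is honoured.
# str.format(**data) would unpack into a plain dict and raise KeyError.
result = template.format_map(data)
print(result)  # Name: Alice, Age: 30, City: <city?>
```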

## 20. String Comparison and Sorting

Understanding how Python compares strings and sorts them.

In [None]:
# Lexicographic comparison
print("Comparison operators:")
print(f"'apple' < 'banana': {'apple' < 'banana'}")
print(f"'Apple' < 'apple': {'Apple' < 'apple'}")  # uppercase comes first
print(f"'10' < '2': {'10' < '2'}")  # lexicographic, not numeric

# Sorting strings
fruits = ["banana", "Apple", "cherry", "date"]
print(f"\nOriginal: {fruits}")
print(f"Sorted (default): {sorted(fruits)}")
print(f"Sorted (case-insensitive): {sorted(fruits, key=str.lower)}")
print(f"Sorted (by length): {sorted(fruits, key=len)}")
print(f"Sorted (reverse): {sorted(fruits, reverse=True)}")

# Natural sorting (with numbers)
files = ["file1.txt", "file10.txt", "file2.txt", "file20.txt"]
print(f"\nFiles: {files}")
print(f"Regular sort: {sorted(files)}")

import re
def natural_sort_key(s):
    return [int(c) if c.isdigit() else c.lower() for c in re.split(r'(\d+)', s)]

print(f"Natural sort: {sorted(files, key=natural_sort_key)}")

# Comparing with operators
s1 = "hello"
s2 = "world"
print(f"\n'{s1}' vs '{s2}':")
print(f"  ==: {s1 == s2}")
print(f"  !=: {s1 != s2}")
print(f"  <: {s1 < s2}")
print(f"  >: {s1 > s2}")

# Case-insensitive comparison
name1 = "Alice"
name2 = "alice"
print(f"\nCase-sensitive: {name1 == name2}")
print(f"Case-insensitive: {name1.lower() == name2.lower()}")

# Locale-aware sorting
words = ["élève", "école", "être"]
print(f"\nFrench words: {words}")
print(f"Default sort: {sorted(words)}")
# For locale-aware order, call locale.setlocale() first and pass key=locale.strxfrm (locale.strcoll is a comparison function, not a key)
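
Where configuring a locale is not an option, one rough approximation is to strip accents before comparing. This is not real collation, only a sketch; `accent_insensitive_key` is a hypothetical helper name.

```python
import unicodedata

def accent_insensitive_key(word: str) -> str:
    """Sort key that ignores accents and case."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks (category Mn) left after decomposition.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()

words = ["zebra", "\u00e9cole", "Apple", "\u00eatre"]

# Default order compares code points, so accented letters land after "z".
print(sorted(words))
# With the key, accented words sort next to their unaccented neighbours.
print(sorted(words, key=accent_insensitive_key))
```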

## 21. Performance Tips and Best Practices

Efficient string operations and common patterns.

In [None]:
import time

# 1. Use join() instead of concatenation in loops
print("Performance comparison: concatenation vs join")

# Bad: concatenation in loop
start = time.perf_counter()
result = ""
for i in range(1000):
    result += str(i)
time_concat = time.perf_counter() - start

# Good: using join
start = time.perf_counter()
result = "".join(str(i) for i in range(1000))
time_join = time.perf_counter() - start

print(f"Concatenation: {time_concat:.6f}s")
print(f"Join: {time_join:.6f}s")
print(f"Join is {time_concat/time_join:.2f}x faster")

# 2. Use f-strings for formatting (they're fast!)
name = "Alice"
age = 30
print(f"\nF-string: {f'Name: {name}, Age: {age}'}")

# 3. Use 'in' for substring checking (faster than find)
text = "Hello World"
print(f"\nUse 'in': {'World' in text}")

# 4. Use str methods instead of regex when possible
# Good for simple cases
email = "user@example.com"
if "@" in email and "." in email.split("@")[1]:
    print("Valid email (simple check)")

# 5. Reuse compiled regex patterns
pattern = re.compile(r'\d+')
numbers = pattern.findall("abc 123 def 456")
print(f"\nReused pattern: {numbers}")

# 6. Use string constants
from string import ascii_lowercase, ascii_uppercase, digits, punctuation
print(f"\nLowercase: {ascii_lowercase}")
print(f"Uppercase: {ascii_uppercase}")
print(f"Digits: {digits}")
print(f"Punctuation: {punctuation[:20]}...")

# 7. String interning
# CPython may reuse one object for identical string literals, but this is an
# implementation detail; compare strings with ==, not is
s1 = "hello"
s2 = "hello"
print(f"\nInterning: s1 is s2: {s1 is s2}")

# 8. Use appropriate methods
# Bad: multiple replace calls
text = "a b c d"
result = text.replace("a", "").replace("b", "").replace("c", "")

# Better: use translate
trans = str.maketrans("", "", "abc")
result = text.translate(trans)
print(f"\nTranslate result: '{result}'")

# 9. Generator expressions for large datasets
large_data = [f"line_{i}" for i in range(1000)]
# Process lazily
uppercase_lines = (line.upper() for line in large_data)
print(f"Generator created: {uppercase_lines}")

# 10. Best practices summary
print("\nBest Practices:")
print("  ✓ Use f-strings for readability and performance")
print("  ✓ Use join() for building strings from sequences")
print("  ✓ Use 'in' for substring checks")
print("  ✓ Compile regex patterns if using multiple times")
print("  ✓ Use str methods over regex for simple operations")
print("  ✓ Remember strings are immutable")
print("  ✓ Use string constants from 'string' module")
print("  ✓ Consider generator expressions for memory efficiency")
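
Single-run timings with `time.perf_counter` can be noisy. The standard library's `timeit` module repeats the measurement, which usually gives a steadier comparison. The function names below are illustrative, and the actual numbers depend on the machine and Python version.

```python
import timeit

def build_with_concat(n: int) -> str:
    result = ""
    for i in range(n):
        result += str(i)
    return result

def build_with_join(n: int) -> str:
    return "".join(str(i) for i in range(n))

# Both strategies must produce the same string before timing means anything.
same = build_with_concat(1000) == build_with_join(1000)
print(f"Same output: {same}")

# timeit runs each callable many times, which smooths out one-off noise.
concat_time = timeit.timeit(lambda: build_with_concat(1000), number=200)
join_time = timeit.timeit(lambda: build_with_join(1000), number=200)
print(f"concat: {concat_time:.4f}s  join: {join_time:.4f}s")
```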

## 22. Summary and Quick Reference

Quick reference guide for Python string operations.

In [None]:
print("PYTHON STRING OPERATIONS QUICK REFERENCE")
print("=" * 60)

print("\n📝 CREATION & FORMATTING")
print("  'text', \"text\", '''text''', r'raw'")
print("  f'{var}', '{}'.format(val), Template('$var')")

print("\n🔍 SEARCHING & TESTING")
print("  'sub' in s, s.find('sub'), s.index('sub')")
print("  s.startswith('pre'), s.endswith('suf')")
print("  s.count('sub')")

print("\n✂️ SPLITTING & JOINING")
print("  s.split(), s.split(','), s.splitlines()")
print("  s.partition('sep'), s.rpartition('sep')")
print("  ', '.join(list)")

print("\n🔄 TRANSFORMING")
print("  s.upper(), s.lower(), s.title(), s.capitalize()")
print("  s.replace('old', 'new'), s.translate(table)")
print("  s.strip(), s.lstrip(), s.rstrip()")
print("  s.removeprefix('pre'), s.removesuffix('suf')")

print("\n📏 ALIGNMENT & PADDING")
print("  s.ljust(n), s.rjust(n), s.center(n)")
print("  s.zfill(n)")

print("\n✅ VALIDATION")
print("  s.isalpha(), s.isdigit(), s.isalnum()")
print("  s.isupper(), s.islower(), s.istitle()")
print("  s.isspace(), s.isidentifier(), s.isascii()")

print("\n🔢 INDEXING & SLICING")
print("  s[i], s[-i], s[start:end:step]")
print("  len(s), s[::-1] (reverse)")

print("\n💾 ENCODING")
print("  s.encode('utf-8'), bytes.decode('utf-8')")

print("\n🎯 REGULAR EXPRESSIONS")
print("  re.search(pattern, s), re.findall(pattern, s)")
print("  re.sub(pattern, repl, s), re.split(pattern, s)")

print("\n📦 STRING MODULE")
print("  from string import ascii_letters, digits, punctuation")
print("  from string import Template")

print("\n" + "=" * 60)
print("Remember: Strings are IMMUTABLE - methods return new strings!")
print("=" * 60)