# Chapter 1: Pythonic Thinking

This notebook covers essential Python concepts with comprehensive code examples and explanations.

---

## Item 1: Know Which Version of Python You're Using

### Why Version Matters

Python has two major versions: Python 2 (end-of-life January 1, 2020) and Python 3 (actively maintained). Always use Python 3 for new projects.

### Python 2 vs Python 3 Comparison Table

| Feature | Python 2 | Python 3 |
|---------|----------|----------|
| Support Status | Ended Jan 1, 2020 | Actively maintained |
| Library Support | Limited compatibility | Full compatibility |
| Migration Tools | 2to3, six | Native features |
| Future Proof | No updates | Regular improvements |
| Print Statement | `print "hello"` | `print("hello")` |
| Division | `5/2 = 2` | `5/2 = 2.5` |
| Unicode | Separate unicode type | Strings are Unicode by default |

**Recommendation**: Use Python 3 for all projects

### Checking Your Python Version

In [None]:
import sys

# Check Python version at runtime
print("Version Info:", sys.version_info)
print("Full Version:", sys.version)
print("\nMajor Version:", sys.version_info.major)
print("Minor Version:", sys.version_info.minor)
print("Micro Version:", sys.version_info.micro)

In [None]:
# Conditional code based on Python version
if sys.version_info.major >= 3:
    print("✓ You're using Python 3!")
else:
    print("⚠ Warning: Python 2 is no longer supported!")
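
When code needs more than a major-version check, comparing `sys.version_info` against a tuple is a common approach. The sketch below assumes a hypothetical minimum of Python 3.8; the `MIN_VERSION` constant and the `meets_minimum` helper are illustrative names, not part of the standard library.

```python
import sys

# Hypothetical minimum version, chosen only for illustration.
MIN_VERSION = (3, 8)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) pair is at least `minimum`."""
    # Slicing yields a plain tuple, and tuples compare element by element.
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Interpreter version is new enough")
else:
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is recommended")
```

Tuple comparison avoids the mistakes that come from comparing version strings, where `"3.10"` would sort before `"3.9"`.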

### Key Differences Example: Division Behavior

In [None]:
# Python 3 division behavior
print("Regular division (/):", 5 / 2)      # Returns float: 2.5
print("Floor division (//):", 5 // 2)     # Returns int: 2
print("Modulo (%):", 5 % 2)               # Returns remainder: 1

# Note: In Python 2, 5/2 would return 2 (integer division)
# In Python 3, you must explicitly use // for floor division

### Key Takeaways

- Python 3 is the most up-to-date and well-supported version
- Always verify the Python version on your system
- Avoid Python 2 as it's no longer maintained

---

## Item 2: Follow the PEP 8 Style Guide

### What is PEP 8?

PEP 8 is the official style guide for Python code. Following it makes your code:
- More readable
- Easier to maintain
- Consistent with community standards

### Whitespace Rules

| Rule | Guideline | Example |
|------|-----------|----------|
| Indentation | 4 spaces per level | `def func():` |
| Line Length | 79 characters max | Break long lines |
| Function Spacing | 2 blank lines | Between functions |
| Method Spacing | 1 blank line | Between methods |
| Assignment | Space around `=` | `x = 5` |
| Operators | Space around operators | `x + y` |
| Commas | Space after comma | `[1, 2, 3]` |

In [None]:
# GOOD: Proper spacing and indentation
def calculate_area(length, width):
    """Calculate rectangle area."""
    area = length * width
    return area


def calculate_volume(length, width, height):
    """Calculate box volume."""
    volume = length * width * height
    return volume


# Test the functions
print("Area:", calculate_area(5, 3))
print("Volume:", calculate_volume(5, 3, 2))

In [None]:
# BAD: Poor spacing (for comparison - don't do this!)
def bad_function(x,y):
    result=x+y  # No spaces around operators
    return result

# GOOD: Proper spacing
def good_function(x, y):
    result = x + y  # Spaces around operators
    return result

print("Bad function:", bad_function(3, 4))
print("Good function:", good_function(3, 4))

### Naming Conventions

| Element | Convention | Example |
|---------|-----------|----------|
| Functions | lowercase_underscore | `calculate_total()` |
| Variables | lowercase_underscore | `user_name` |
| Classes | CapitalizedWord | `DataProcessor` |
| Constants | ALL_CAPS | `MAX_SIZE` |
| Protected | _leading_underscore | `_internal` |
| Private | __double_underscore | `__private` |

# Private vs Protected Attributes in Python

## Conceptual Overview

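Python does not enforce access control. A single leading underscore (`_attribute`) is a naming convention that marks an attribute as internal, while a double leading underscore (`__attribute`) triggers name mangling, which rewrites the name to `_ClassName__attribute` inside the class body. The two prefixes signal different levels of intended encapsulation rather than hard access rules.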

---

## Comprehensive Comparison

| Aspect | Protected (`_attribute`) | Private (`__attribute`) |
|:-------|:------------------------|:------------------------|
| **Syntax Convention** | Single underscore prefix | Double underscore prefix |
| **Access Control Mechanism** | Convention-based (social contract) | Name mangling applied at compile time (not true access control) |
| **Subclass Access** | Directly accessible in derived classes | Not directly accessible (requires mangled name) |
| **Name Transformation** | No transformation by interpreter | Transformed to `_ClassName__attribute` |
| **Intended Use Case** | Internal implementation for class family | Strictly internal to specific class only |
| **Encapsulation Strength** | Weak (advisory only) | Moderate (name is obscured, but still reachable) |
| **External Access** | Possible but discouraged by convention | Technically possible via mangled name but highly discouraged |
| **Inheritance Philosophy** | Trust-based: subclasses expected to respect boundaries | Zero-trust: explicit barrier against inheritance access |
| **Namespace Collision Prevention** | No protection | Prevents attribute name conflicts in complex hierarchies |
| **Refactoring Impact** | Lower barrier to modification | Higher stability guarantee for internal changes |
| **Design Signal** | "Extension point—use with care" | "Implementation detail—do not touch" |
| **Python Philosophy Alignment** | "We're all consenting adults" | Rare enforcement mechanism in otherwise permissive language |
| **Typical Application** | Helper methods, extensible state, inheritance contracts | Financial calculations, invariants, algorithm internals |
| **Example from ShoppingCart** | `self._items` (list of cart items) | `self.__total` (cumulative price value) |
| **Access Pattern** | `object._attribute` works directly | `object.__attribute` raises `AttributeError` outside the defining class |
| **Workaround Difficulty** | Trivial (ignore convention) | Moderate (requires knowledge of mangling scheme) |

---

## Key Distinction

**Protected attributes** rely entirely on developer discipline and naming conventions, while **private attributes** additionally use Python's name mangling, which makes accidental access or overriding less likely but does not prevent deliberate access through the mangled name.

---

## Architectural Implications

### When to Use Protected (`_`)
- Attributes intended for subclass extension points
- Implementation details stable enough for inheritance contracts
- Data requiring controlled but flexible access within class families
- Balance between encapsulation and extensibility requirements

### When to Use Private (`__`)
- Attributes whose modification would violate class invariants
- Implementation details subject to change without interface impact
- Data requiring integrity guarantees within class boundaries
- Prevention of namespace collision in complex inheritance hierarchies
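
The namespace-collision point can be illustrated with a small sketch. The `Base` and `Child` classes below are hypothetical and exist only to show that each class gets its own mangled attribute name.

```python
class Base:
    def __init__(self):
        self.__token = "base"       # stored as _Base__token

    def base_token(self):
        return self.__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"      # stored as _Child__token, no clash

    def child_token(self):
        return self.__token


obj = Child()
print(obj.base_token())     # base
print(obj.child_token())    # child
print(sorted(vars(obj)))    # ['_Base__token', '_Child__token']
```

With a single underscore, `Child` would have silently overwritten the value that `Base` relies on.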

---

## Practical Example from ShoppingCart

```python
class ShoppingCart:
    def __init__(self):
        self._items = []      # Protected: subclasses can extend item handling
        self.__total = 0      # Private: maintains financial integrity
    
    def add_item(self, item, price):
        self._items.append((item, price))
        self.__total += price
    
    def get_total(self):
        return self.__total
```

### Protected Access Pattern
```python
class ExtendedCart(ShoppingCart):
    def get_item_count(self):
        return len(self._items)  # ✓ Direct access works
```

### Private Access Pattern
```python
class ExtendedCart(ShoppingCart):
    def inspect_total(self):
        return self.__total  # ✗ AttributeError
        # Must use: self._ShoppingCart__total (discouraged)
```
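
To see what name mangling actually does, it can help to inspect the instance dictionary. The following sketch repeats a trimmed copy of the `ShoppingCart` class so that it runs on its own.

```python
class ShoppingCart:
    def __init__(self):
        self._items = []
        self.__total = 0            # stored as _ShoppingCart__total

    def add_item(self, item, price):
        self._items.append((item, price))
        self.__total += price


cart = ShoppingCart()
cart.add_item("Apple", 1.50)

# The instance dictionary shows the names as they are really stored.
print(vars(cart))

# Outside the class body no mangling happens, so this name does not exist.
try:
    cart.__total
except AttributeError as e:
    print("AttributeError:", e)

# The mangled name is still reachable (discouraged, but possible).
print(cart._ShoppingCart__total)    # 1.5

# Assigning from outside creates a NEW attribute called '__total';
# it does not touch the value the class uses internally.
cart.__total = 999
print(cart._ShoppingCart__total)    # 1.5
```

This is why the mechanism is better described as collision avoidance than as access control.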

---

## Design Consideration

Choosing between the two prefixes is a judgment about how the class is expected to be extended. Use a single underscore when subclasses may reasonably need the attribute, and a double underscore only when you specifically want to avoid name clashes with subclasses.

In [None]:
# Naming examples demonstrating different conventions

# Functions and variables: lowercase_underscore
def calculate_total_price(item_price, quantity):
    """Calculate total price for multiple items."""
    total_price = item_price * quantity
    return total_price


# Classes: CapitalizedWord (PascalCase)
class ShoppingCart:
    """A shopping cart that holds items."""
    
    def __init__(self):
        self._items = []          # Protected attribute
        self.__total = 0          # Private attribute
    
    def add_item(self, item, price):
        """Add an item to the cart."""
        self._items.append((item, price))
        self.__total += price
    
    def get_total(self):
        """Get the cart total."""
        return self.__total


# Constants: ALL_CAPS
MAX_CART_ITEMS = 100
DEFAULT_DISCOUNT = 0.1
SALES_TAX_RATE = 0.08

# Example usage
cart = ShoppingCart()
cart.add_item("Apple", 1.50)
cart.add_item("Banana", 0.75)

print(f"Cart total: ${cart.get_total():.2f}")
print(f"Max items allowed: {MAX_CART_ITEMS}")
print(f"Sales tax rate: {SALES_TAX_RATE * 100}%")

### More Naming Examples

In [None]:
# Example showing different naming conventions in action

# Constants (configuration values)
DATABASE_URL = "postgresql://localhost/mydb"
API_TIMEOUT = 30
MAX_RETRIES = 3

# Class with various attribute types
class UserAccount:
    """Represents a user account."""
    
    # Class variable (shared across instances)
    total_accounts = 0
    
    def __init__(self, username, email):
        # Public attributes
        self.username = username
        self.email = email
        
        # Protected attribute (internal use, but accessible)
        self._created_at = "2025-01-01"
        
        # Private attribute (name mangling applied)
        self.__password_hash = "secret_hash"
        
        UserAccount.total_accounts += 1
    
    def get_account_info(self):
        """Return account information."""
        return f"User: {self.username}, Email: {self.email}"
    
    def _internal_method(self):
        """Protected method (by convention, for internal use)."""
        return "This is for internal use"
    
    def __private_method(self):
        """Private method (name mangled)."""
        return "This is truly private"


# Usage
user1 = UserAccount("john_doe", "john@example.com")
user2 = UserAccount("jane_smith", "jane@example.com")

print(user1.get_account_info())
print(f"Total accounts created: {UserAccount.total_accounts}")

# Accessing different attribute types
print(f"Public attribute: {user1.username}")
print(f"Protected attribute: {user1._created_at}")  # Accessible but not recommended
# print(user1.__password_hash)  # This would raise AttributeError!

### Expressions and Statements Best Practices

In [None]:
# GOOD: Use inline negation (`is not`) for identity checks
a = [1, 2]
b = [1, 2]  # Equal value, but a separate object
if a is not b:
    print("Different objects")

# BAD: Negation of positive expression (don't do this!)
# if not a is b:
#     print("Different objects")


In [None]:
# GOOD: Check for empty containers using truthiness
my_list = []
if not my_list:
    print("List is empty")

# BAD: Checking length explicitly (unnecessarily verbose)
# if len(my_list) == 0:
#     print("List is empty")

# Empty Container Verification Methods in Python

## Comprehensive Comparison

| Method | Pythonic? | Performance | Readability | Type Safety | Works Across Containers | Limitations |
|:-------|:----------|:------------|:------------|:------------|:------------------------|:------------|
| **Truthiness Check** (`not container`) | ✓ Highly idiomatic | Fastest | Excellent | Implicit | Yes | Treats `None` as empty |
| **Length Comparison** (`len(container) == 0`) | Less idiomatic | Slightly slower | Verbose but explicit | Explicit | Yes | More ceremonious |
| **Direct Equality** (`container == []`) | ✗ Anti-pattern | Slowest | Poor | Type-specific | No | Only works for specific type; breaks polymorphism |
| **Length Truthiness** (`not len(container)`) | Redundant | Slower | Confusing | Implicit | Yes | Unnecessarily verbose |
| **Boolean Constructor** (`bool(container) == False`) | ✗ Anti-pattern | Slower | Verbose | Explicit but awkward | Yes | Overly ceremonious |

---

## Detailed Analysis

### Truthiness Check ✓ **RECOMMENDED**

```python
if not my_list:
    print("List is empty")
```

**Advantages:**
- Canonical Python idiom endorsed by PEP 8 style guide
- Optimal performance through single boolean evaluation
- Works polymorphically across all container types (lists, tuples, sets, dicts, strings)
- Minimal cognitive overhead
- Leverages Python's data model design philosophy
- Most readable to experienced Python developers

**Considerations:**
- Treats `None` as empty (may require explicit None check if uninitialized containers are possible)
- Relies on understanding Python's truthiness protocol

**When to Use:**
- Default choice for all container emptiness checks
- When working with guaranteed initialized containers
- When polymorphic behavior is desired
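
The truthiness protocol mentioned above can be sketched with a small custom container. `Playlist` is a hypothetical class used only for illustration: when a class defines `__len__` but not `__bool__`, Python treats a length of zero as falsy.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, tracks=()):
        self._tracks = list(tracks)

    def __len__(self):
        # With no __bool__ defined, bool(obj) falls back to len(obj) != 0.
        return len(self._tracks)


empty = Playlist()
full = Playlist(["intro", "outro"])

print(bool(empty))   # False
print(bool(full))    # True

# Generators define neither __len__ nor __bool__, so they are always
# truthy, even when they will yield nothing.
gen = (x for x in [])
print(bool(gen))     # True
```

The generator case is a reminder that the idiom applies to containers, not to arbitrary iterables.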

---

### Length Comparison

```python
if len(my_list) == 0:
    print("List is empty")
```

**Advantages:**
- Explicit intent for readers unfamiliar with truthiness
- Clear numerical comparison
- Works across all container types
- Immune to None confusion (raises TypeError if None)

**Disadvantages:**
- More verbose than truthiness check
- Slightly slower (function call + comparison vs single boolean check)
- Less idiomatic in Python community
- Not the form PEP 8 recommends for sequences

**When to Use:**
- Code bases prioritizing explicitness over idiomaticity
- Teaching contexts where truthiness may be unfamiliar
- When explicit TypeError on None is desired

---

### Direct Equality Comparison ✗ **ANTI-PATTERN**

```python
if my_list == []:
    print("List is empty")
```

**Disadvantages:**
- Type-specific: breaks if container type changes (list → set → tuple)
- Slowest performance (constructs empty container, performs equality comparison)
- Violates polymorphism principles
- Not idiomatic Python
- Creates unnecessary object allocation

**Why It's Problematic:**
- A function meant to accept any container silently gets `False` for an empty tuple, set, or dict instead of failing loudly
- Code maintenance burden when refactoring container types
- Signals unfamiliarity with Python conventions

**Never Use This Approach**

---

### Length Truthiness ✗ **REDUNDANT**

```python
if not len(my_list):
    print("List is empty")
```

**Disadvantages:**
- Redundant boolean conversion
- Unnecessarily calls `len()` when container already has boolean protocol
- Less readable than direct truthiness check
- Slower than direct approach
- Violates principle of parsimony

**Why It's Redundant:**
Containers already evaluate to False when empty; wrapping in `len()` adds no semantic value

---

### Boolean Constructor ✗ **ANTI-PATTERN**

```python
if bool(my_list) == False:
    print("List is empty")
```

**Disadvantages:**
- Overly ceremonious and verbose
- Explicit False comparison is non-Pythonic
- Slower than direct truthiness
- Signals unfamiliarity with Python idioms
- Violates PEP 8 guidance against explicit True/False comparisons

**Never Use This Approach**

---

## Performance Hierarchy

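Typical ordering in CPython, from fastest to slowest (the differences are small and can vary between versions, so measure before relying on them):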

1. **Truthiness check**: `if not container:` — Single boolean evaluation
2. **Length comparison**: `len(container) == 0` — Function call + comparison
3. **Length truthiness**: `not len(container)` — Function call + boolean conversion
4. **Boolean constructor**: `bool(container) == False` — Constructor + comparison
5. **Direct equality**: `container == []` — Object construction + equality check
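
Rather than taking the ordering on trust, it can be measured with `timeit`. This is a rough sketch: the iteration count is an arbitrary assumption, and absolute numbers depend on the machine and Python version.

```python
import timeit

container = []  # the empty list every statement is evaluated against

statements = [
    "not container",
    "len(container) == 0",
    "not len(container)",
    "bool(container) == False",
    "container == []",
]

# Time each statement; `globals` makes `container` visible to timeit.
timings = {
    stmt: timeit.timeit(stmt, globals={"container": container}, number=200_000)
    for stmt in statements
}

# Print the results from fastest to slowest for this particular run.
for stmt, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{stmt:<28} {seconds:.4f}s")
```

For most programs the gap is too small to matter, so readability is usually the better reason to prefer the truthiness check.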

---

## Container Type Compatibility

### Works with All Container Types:
```python
my_list = []
my_tuple = ()
my_set = set()
my_dict = {}
my_string = ""

# All evaluate to False in boolean context
for container in (my_list, my_tuple, my_set, my_dict, my_string):
    assert not container   # Every empty container is falsy
```

### Type-Specific Failures:
```python
my_set = set()

# Truthiness works for any container type:
print(not my_set)        # True

# Equality against a list literal gives the wrong answer:
print(my_set == [])      # False (a set never equals a list, even when both are empty)
print(my_set == set())   # True, but still type-specific
```

---

## Handling None vs Empty

### Problem Scenario:
```python
my_list = None

if not my_list:  # True (None is falsy)
    print("This executes for None!")
```

### Solution for None-Safe Checking:
```python
# Explicit None check first
if my_list is None:
    print("Uninitialized")
elif not my_list:
    print("Empty but initialized")
else:
    print("Contains items")

# Alternative: default to empty container
my_list = my_list or []
if not my_list:
    print("Empty")
```
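
One caveat with the `or []` shortcut is that it replaces every falsy value, including an empty list the caller passed in deliberately. The sketch below contrasts the two styles; both function names are hypothetical.

```python
def add_tag_or(tag, tags=None):
    tags = tags or []        # Replaces ANY falsy value, even the caller's empty list
    tags.append(tag)
    return tags


def add_tag_is_none(tag, tags=None):
    if tags is None:         # Replaces only a missing argument
        tags = []
    tags.append(tag)
    return tags


shared_or = []
add_tag_or("a", shared_or)
print(shared_or)             # [] (the caller's list was swapped for a new one)

shared_is_none = []
add_tag_is_none("a", shared_is_none)
print(shared_is_none)        # ['a']
```

When the caller expects its own list to be modified, the explicit `is None` check is the safer choice.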

---

## PEP 8 Official Guidance

**From PEP 8 - Style Guide for Python Code:**

> "For sequences (strings, lists, tuples), use the fact that empty sequences are false."

**Recommended:**
```python
if not seq:
if seq:
```

**Not Recommended:**
```python
if len(seq):
if not len(seq):
```

---

## Conclusion

The truthiness check `if not my_list:` is the idiomatic way to test for an empty container in Python, and it is the form PEP 8 recommends for sequences. Use it by default, and switch to an explicit `is None` or `len()` check only when you need to distinguish `None` from empty or want the comparison spelled out for readers new to the language.

In [None]:
# GOOD: Check for non-empty containers
my_list = [1, 2, 3]
if my_list:
    print("List has items")
    print(f"Items: {my_list}")

# Demonstrating truthiness with different container types
empty_dict = {}
filled_dict = {"key": "value"}
empty_string = ""
filled_string = "hello"

print(f"Empty dict is falsy: {not empty_dict}")
print(f"Filled dict is truthy: {bool(filled_dict)}")
print(f"Empty string is falsy: {not empty_string}")
print(f"Filled string is truthy: {bool(filled_string)}")

### Import Organization

**Order of imports:**
1. Standard library modules
2. Third-party modules
3. Your own modules

Each section should be in alphabetical order and separated by a blank line.

In [None]:
# GOOD import organization

# Standard library imports (alphabetical)
import os
import sys
from collections import defaultdict
from datetime import datetime

# Third-party imports (would go here if available)
# import numpy as np
# import pandas as pd
# import requests

# Your own module imports
# from mypackage import mymodule
# from myproject.utils import helper_function

print("Imports organized correctly!")
print(f"Current directory: {os.getcwd()}")
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")

### Line Length and Continuation

In [None]:
# GOOD: Breaking long lines with parentheses (preferred)
long_variable_name = (
    "This is a very long string that would exceed "
    "the 79 character limit if written on one line"
)

# GOOD: Function calls with many arguments
def complex_function(arg1, arg2, arg3, arg4, arg5):
    return arg1 + arg2 + arg3 + arg4 + arg5

result = complex_function(
    arg1=10,
    arg2=20,
    arg3=30,
    arg4=40,
    arg5=50
)

print(long_variable_name)
print(f"Result: {result}")

In [None]:
# GOOD: List comprehensions and long expressions
long_list_comp = [
    item * 2 
    for item in range(10) 
    if item % 2 == 0
]

# GOOD: Dictionary with many key-value pairs
configuration = {
    "database_url": "postgresql://localhost/mydb",
    "api_key": "your_api_key_here",
    "timeout": 30,
    "max_retries": 3,
    "debug_mode": False,
}

print(f"Even numbers doubled: {long_list_comp}")
print(f"Config keys: {list(configuration.keys())}")

### Key Takeaways

- Always follow the PEP 8 style guide
- Consistent style facilitates collaboration
- Use tools such as Pylint (linting) or Black (auto-formatting) to enforce style automatically
- Well-formatted code is easier to maintain
- Readability counts!

---

## Item 3: Know the Differences Between bytes and str

### bytes vs str Overview

| Type | Contains | Example | Usage |
|------|----------|---------|-------|
| bytes | Raw 8-bit values | `b'hello'` | Binary data, files |
| str | Unicode code points | `'hello'` | Text data |

Convert between the two types with `str.encode()` and `bytes.decode()`.

### Understanding bytes

In [None]:
# bytes contain raw 8-bit values
a = b'h\x65llo'  # \x65 is hexadecimal for 'e'
print("Bytes as list:", list(a))  # Shows numeric values
print("Bytes repr:", a)           # Shows bytes representation

# Each element is an integer (0-255)
for byte in a:
    print(f"Byte value: {byte}, Character: {chr(byte)}")

In [None]:
# More bytes examples
binary_data = bytes([72, 101, 108, 108, 111])  # ASCII values for "Hello"
print("From integers:", binary_data)

# Creating bytes from a bytes literal (ASCII characters only)
text_bytes = b'Python 3'
print("Text bytes:", text_bytes)
print("Length:", len(text_bytes))
print("First byte:", text_bytes[0])  # Returns integer

### Understanding str

In [None]:
# str contains Unicode code points
a = 'a\u0300 propos'  # \u0300 is a combining grave accent
print("String as list:", list(a))  # Shows individual characters
print("String repr:", a)           # Shows the rendered string

# Unicode examples
unicode_string = 'Hello 世界 🌍'  # Mixed scripts and emoji
print("\nUnicode string:", unicode_string)
print("Length (code points):", len(unicode_string))
for char in unicode_string:
    print(f"Character: '{char}', Unicode: U+{ord(char):04X}")
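
Because `str` stores code points, the same visible text can have more than one representation. As a brief sketch, the standard `unicodedata` module can normalize the combining-accent string from the cell above into its precomposed form.

```python
import unicodedata

decomposed = 'a\u0300 propos'                        # 'a' + combining grave accent
composed = unicodedata.normalize('NFC', decomposed)  # single precomposed character

print(len(decomposed), len(composed))   # 9 8
print(decomposed == composed)           # False
print(composed == '\u00e0 propos')      # True
```

Normalizing text before comparing or storing it avoids surprises where two strings look identical but compare unequal.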

### Converting Between Types

In [None]:
# str to bytes: use encode()
text = 'hello'
data = text.encode('utf-8')
print(f"String: {text!r}")
print(f"Bytes: {data!r}")
print(f"Type: {type(data)}")

# bytes to str: use decode()
decoded = data.decode('utf-8')
print(f"\nDecoded: {decoded!r}")
print(f"Type: {type(decoded)}")

# Verify they're equal
assert text == decoded
print("\n✓ Encoding and decoding are symmetric")

In [None]:
# Helper functions to ensure correct types
def to_str(bytes_or_str):
    """Convert bytes or str to str."""
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str


def to_bytes(bytes_or_str):
    """Convert bytes or str to bytes."""
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes


# Test the helper functions
print("to_str tests:")
print(repr(to_str(b'foo')))
print(repr(to_str('bar')))

print("\nto_bytes tests:")
print(repr(to_bytes(b'foo')))
print(repr(to_bytes('bar')))

### Different Encodings Example

In [None]:
# Demonstrating different encodings
text = 'こんにちは'  # "Hello" in Japanese

# UTF-8 encoding (variable-length, 1-4 bytes)
utf8_bytes = text.encode('utf-8')
print(f"UTF-8: {utf8_bytes}")
print(f"UTF-8 length: {len(utf8_bytes)} bytes")

# UTF-16 encoding (2 or 4 bytes per code point; the 'utf-16' codec also prepends a 2-byte BOM)
utf16_bytes = text.encode('utf-16')
print(f"\nUTF-16: {utf16_bytes}")
print(f"UTF-16 length: {len(utf16_bytes)} bytes")

# ASCII would fail for non-ASCII characters
try:
    ascii_bytes = text.encode('ascii')
except UnicodeEncodeError as e:
    print(f"\nASCII encoding failed: {e}")

# But ASCII works for simple English text
english = "Hello"
ascii_bytes = english.encode('ascii')
print(f"\nASCII works for English: {ascii_bytes}")
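
Decoding can fail in the same way that encoding does. The sketch below shows what happens when bytes are decoded with the wrong codec, and how the `errors` parameter of `bytes.decode()` changes the outcome; the sample word is chosen only for illustration.

```python
data = 'café'.encode('utf-8')                    # b'caf\xc3\xa9'

# A codec that cannot represent the bytes raises an exception.
try:
    data.decode('ascii')
except UnicodeDecodeError as e:
    print(f"ASCII decoding failed: {e}")

# A wrong but permissive codec silently produces garbled text.
print(data.decode('latin-1'))                    # cafÃ©

# The errors parameter trades exceptions for lossy output.
print(data.decode('ascii', errors='replace'))    # caf followed by two U+FFFD characters
print(data.decode('ascii', errors='ignore'))     # caf
```

Silent corruption is usually harder to debug than an exception, so the default strict mode is a reasonable choice unless lossy output is acceptable.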

### Common Gotchas: Incompatibility Between bytes and str

In [None]:
# GOTCHA 1: Can't concatenate bytes and str
print("✓ bytes + bytes:", b'one' + b'two')
print("✓ str + str:", 'one' + 'two')

try:
    result = b'one' + 'two'
except TypeError as e:
    print(f"\n✗ bytes + str fails: {e}")

try:
    result = 'one' + b'two'
except TypeError as e:
    print(f"✗ str + bytes fails: {e}")

In [None]:
# GOTCHA 2: Can't compare bytes and str
print("✓ bytes comparison:", b'red' > b'blue')
print("✓ str comparison:", 'red' > 'blue')

try:
    result = 'red' > b'blue'
except TypeError as e:
    print(f"\n✗ str > bytes fails: {e}")

# Equality always returns False (no exception)
print(f"\n✗ bytes == str: {b'foo' == 'foo'}  (always False!)")

In [None]:
# GOTCHA 3: Format strings behave differently
print("✓ bytes formatting:", b'red %s' % b'blue')
print("✓ str formatting:", 'red %s' % 'blue')

try:
    result = b'red %s' % 'blue'
except TypeError as e:
    print(f"\n✗ bytes format with str: {e}")

# str format with bytes calls __repr__
result = 'red %s' % b'blue'
print(f"\n⚠ str format with bytes: {result!r}  (includes b'' prefix!)")

### File Operations with bytes and str

In [None]:
import tempfile
import os

# Create a temporary directory for our examples
temp_dir = tempfile.mkdtemp()
binary_file = os.path.join(temp_dir, 'data.bin')
text_file = os.path.join(temp_dir, 'data.txt')

# Writing binary data (mode 'wb')
with open(binary_file, 'wb') as f:
    f.write(b'\xf1\xf2\xf3\xf4\xf5')

print("✓ Binary file written successfully")

# Reading binary data (mode 'rb')
with open(binary_file, 'rb') as f:
    data = f.read()

print(f"✓ Binary data read: {data}")
assert data == b'\xf1\xf2\xf3\xf4\xf5'

In [None]:
# Writing text with specific encoding
with open(text_file, 'w', encoding='utf-8') as f:
    f.write('Hello 世界')

print("✓ Text file written with UTF-8 encoding")

# Reading text with specific encoding
with open(text_file, 'r', encoding='utf-8') as f:
    text = f.read()

print(f"✓ Text data read: {text}")

# Cleanup
import shutil
shutil.rmtree(temp_dir)
print("✓ Temporary files cleaned up")
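
The reason binary mode matters becomes clear when a binary file is opened in text mode. This sketch writes the same five bytes used above to a temporary file and then tries to read them back as UTF-8 text.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')
    with open(path, 'wb') as f:
        f.write(b'\xf1\xf2\xf3\xf4\xf5')

    # Text mode tries to decode the bytes, which are not valid UTF-8.
    try:
        with open(path, 'r', encoding='utf-8') as f:
            f.read()
        outcome = 'decoded'
    except UnicodeDecodeError:
        outcome = 'UnicodeDecodeError'

    # Binary mode returns the bytes untouched.
    with open(path, 'rb') as f:
        raw = f.read()

print(outcome)   # UnicodeDecodeError
print(raw)       # b'\xf1\xf2\xf3\xf4\xf5'
```

The reverse mistake fails as well: writing `bytes` to a file opened with `'w'` raises `TypeError`.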

### The Unicode Sandwich Pattern

In [None]:
# Unicode Sandwich: Decode at boundaries, use str internally, encode at boundaries

def process_data(input_bytes):
    """
    Demonstrates the Unicode Sandwich pattern.
    
    1. Decode bytes to str at input boundary
    2. Process as str internally
    3. Encode str to bytes at output boundary
    """
    # Decode at input boundary
    text = input_bytes.decode('utf-8')
    print(f"1. Decoded input: {text!r}")
    
    # Process as str (all string operations)
    text = text.upper()
    text = text.replace('HELLO', 'HI')
    print(f"2. Processed: {text!r}")
    
    # Encode at output boundary
    output_bytes = text.encode('utf-8')
    print(f"3. Encoded output: {output_bytes!r}")
    
    return output_bytes


# Example usage
input_data = b'hello world'
output_data = process_data(input_data)
print(f"\nFinal result: {output_data}")

### Key Takeaways

- `bytes` contains sequences of 8-bit values, `str` contains Unicode code points
- Use helper functions to ensure correct types
- `bytes` and `str` can't be used together with operators like `>`, `==`, `+`, and `%`
- Always use binary mode (`'rb'` or `'wb'`) for binary data
- Always specify encoding when reading/writing text files
- Follow the Unicode Sandwich pattern: decode early, process as str, encode late

---

## Item 4: Prefer Interpolated F-Strings Over C-style Format Strings and str.format

### Evolution of String Formatting in Python

| Method | Introduced | Status | Recommendation |
|--------|-----------|--------|----------------|
| C-style (%) | Python 1.0 | Legacy | Avoid |
| str.format() | Python 2.6 / 3.0 | Verbose | Avoid |
| F-strings | Python 3.6 | Modern | Use this |
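
As a quick side-by-side sketch, the three styles in the table can produce identical output. The variable names and values here are illustrative.

```python
key = 'my_var'
value = 1.234

c_style = '%-10s = %.2f' % (key, value)              # C-style
with_format = '{:<10} = {:.2f}'.format(key, value)   # str.format
f_string = f'{key:<10} = {value:.2f}'                # f-string

print(f_string)   # my_var     = 1.23
print(c_style == with_format == f_string)   # True
```

The f-string keeps each value next to its format specifier, which removes the ordering mistakes that the other two styles allow.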

### Problem 1: Type Conversion Errors with C-Style Formatting

In [None]:
"""
Comprehensive Guide to C-Style String Formatting in Python
===========================================================

This module demonstrates the old-style (printf-style) string formatting
using the % operator, which originates from C's printf() function.

While this formatting style is considered legacy (superseded by str.format()
and f-strings), it remains in use in legacy codebases and certain contexts.
"""

# ============================================================================
# SECTION 1: Basic C-Style Formatting
# ============================================================================

print("=" * 70)
print("SECTION 1: Basic C-Style Formatting")
print("=" * 70)

# Simple string substitution with %s (string formatter)
name = "Alice"
greeting = "Hello, %s!" % name
print(f"String substitution: {greeting}")

# Simple integer substitution with %d (decimal integer formatter)
age = 30
age_statement = "Age: %d years old" % age
print(f"Integer substitution: {age_statement}")

# Simple float substitution with %f (floating-point formatter)
price = 19.99
price_statement = "Price: %f" % price
print(f"Float substitution: {price_statement}")

print()

# ============================================================================
# SECTION 2: Multiple Value Substitution (Tuple Required)
# ============================================================================

print("=" * 70)
print("SECTION 2: Multiple Value Substitution")
print("=" * 70)

# When substituting multiple values, they MUST be provided as a tuple
# The order in the tuple MUST match the order of format specifiers
key = 'my_var'
value = 1.234

# Correct: Values provided in matching order
formatted = '%-10s = %.2f' % (key, value)
print(f"Correct order: {formatted}")

# CRITICAL: The order matters! Swapping the values causes a type mismatch
try:
    # %-10s accepts any object (it calls str()), so the float is fine there.
    # The failure comes from %.2f, which needs a number but receives 'my_var'.
    wrong_order = '%-10s = %.2f' % (value, key)
except TypeError as e:
    print(f"\nError with swapped values: {e}")

# The format specifiers must also match the value types
try:
    # Same problem from the other side: %.2f now comes first and
    # receives the string 'my_var', which raises TypeError.
    wrong_format = '%.2f = %-10s' % (key, value)
except TypeError as e:
    print(f"Error with swapped format: {e}")

print()

# ============================================================================
# SECTION 3: Format Specifier Anatomy
# ============================================================================

print("=" * 70)
print("SECTION 3: Format Specifier Anatomy")
print("=" * 70)

# Format specifier structure: %[flags][width][.precision]type
# Let's break down %-10s and %.2f

print("\nAnalyzing %-10s:")
print("  % : Format specifier marker")
print("  - : Left-align flag (default is right-align)")
print("  10: Minimum field width (pad to 10 characters)")
print("  s : Type specifier (string)")

test_string = "test"
result = "%-10s|" % test_string
print(f"  Result: '{result}' (string padded with spaces on right)")

print("\nAnalyzing %.2f:")
print("  % : Format specifier marker")
print("  .2: Precision (2 decimal places)")
print("  f : Type specifier (floating-point)")

test_float = 3.14159
result = "%.2f" % test_float
print(f"  Result: '{result}' (rounded to 2 decimals)")

print()

# ============================================================================
# SECTION 4: Common Type Specifiers
# ============================================================================

print("=" * 70)
print("SECTION 4: Common Type Specifiers")
print("=" * 70)

number = 42
decimal = 3.14159
text = "Python"

# Note: %% in a template prints a literal %, so each line below shows
# the specifier itself followed by the formatted value.

# %s - String (accepts any type, converts with str())
print("%%s (string):           '%s'" % text)

# %d or %i - Signed decimal integer
print("%%d (decimal int):      '%d'" % number)

# %f - Floating-point decimal
print("%%f (float default):    '%f'" % decimal)

# %e - Scientific notation (lowercase e)
print("%%e (scientific):       '%e'" % decimal)

# %E - Scientific notation (uppercase E)
print("%%E (scientific upper): '%E'" % decimal)

# %g - Shortest representation (%e or %f)
print("%%g (shortest):         '%g'" % decimal)

# %x - Hexadecimal (lowercase)
print("%%x (hex lowercase):    '%x'" % number)

# %X - Hexadecimal (uppercase)
print("%%X (hex uppercase):    '%X'" % number)

# %o - Octal
print("%%o (octal):            '%o'" % number)

# %c - Single character
print("%%c (character):        '%c'" % 65)  # Code point 65 = 'A'

# %% - Literal percent sign (the empty tuple triggers formatting with no values)
print("%%%% (literal percent): '%%'" % ())

print()

# ============================================================================
# SECTION 5: Width and Alignment Control
# ============================================================================

print("=" * 70)
print("SECTION 5: Width and Alignment Control")
print("=" * 70)

word = "test"

# Right-aligned (default) with width 10
print("Right-aligned (default):")
print("'%10s'" % word)  # Spaces on left

# Left-aligned with width 10
print("\nLeft-aligned with - flag:")
print("'%-10s'" % word)  # Spaces on right

# Center alignment not directly supported in %-formatting
# (use str.format() or f-strings for center alignment)

# Minimum width with numbers
num = 42
print("\nNumber formatting with width:")
print("'%5d'" % num)   # Right-aligned number
print("'%-5d'" % num)  # Left-aligned number

# Width with floats
pi = 3.14159
print("\nFloat with width and precision:")
print("'%10.2f'" % pi)   # Width 10, 2 decimals, right-aligned
print("'%-10.2f'" % pi)  # Width 10, 2 decimals, left-aligned

print()

# ============================================================================
# SECTION 6: Precision Control
# ============================================================================

print("=" * 70)
print("SECTION 6: Precision Control")
print("=" * 70)

pi = 3.14159265359

# Different precision levels for floats
print("Float precision variations:")
print("%%.1f  (1 decimal):  %.1f" % pi)
print("%%.2f  (2 decimals): %.2f" % pi)
print("%%.5f  (5 decimals): %.5f" % pi)
print("%%.10f (10 decimals): %.10f" % pi)

# Precision with strings (truncates to max length)
long_text = "Python Programming"
print("\nString precision (truncation):")
print("'%%.6s' from '%s' = '%.6s'" % (long_text, long_text))
print("'%%.10s' from '%s' = '%.10s'" % (long_text, long_text))

print()

# ============================================================================
# SECTION 7: Padding with Zeros
# ============================================================================

print("=" * 70)
print("SECTION 7: Zero-Padding")
print("=" * 70)

# Use 0 flag for zero-padding (primarily for numbers)
number = 42

print("Zero-padding for numbers:")
print("'%%05d' formats %d as '%05d'" % (number, number))  # 00042
print("'%%08d' formats %d as '%08d'" % (number, number))  # 00000042

# Combining zero-padding with negative numbers (the sign counts toward the width)
negative = -7
print("\n'%%05d' formats %d as '%05d'" % (negative, negative))  # -0007

# Zero-padding with floats (the width includes the decimal point)
decimal_num = 3.14
print("\n'%%08.2f' formats %s as '%08.2f'" % (decimal_num, decimal_num))  # 00003.14

print()

# ============================================================================
# SECTION 8: Sign Control
# ============================================================================

print("=" * 70)
print("SECTION 8: Sign Display Control")
print("=" * 70)

positive = 42
negative = -42

# Default: only show sign for negative numbers
print("Default sign behavior:")
print("Positive: '%d'" % positive)
print("Negative: '%d'" % negative)

# + flag: always show sign
print("\nAlways show sign with + flag:")
print("Positive: '%+d'" % positive)
print("Negative: '%+d'" % negative)

# Space flag: space for positive, minus for negative
print("\nSpace for positive, sign for negative:")
print("Positive: '% d'" % positive)
print("Negative: '% d'" % negative)

print()

# ============================================================================
# SECTION 9: Dictionary-Based Formatting
# ============================================================================

print("=" * 70)
print("SECTION 9: Dictionary-Based Formatting (Named Placeholders)")
print("=" * 70)

# Instead of positional tuples, can use dictionary with named keys
# Format: %(key)s where 'key' is the dictionary key

person = {
    'name': 'Bob',
    'age': 35,
    'salary': 75000.50
}

# Using named placeholders - order doesn't matter!
statement = "%(name)s is %(age)d years old and earns $%(salary).2f" % person
print(f"Dictionary formatting: {statement}")

# Major advantage: can reference same key multiple times
repeated = "%(name)s says: 'Hi, I'm %(name)s!'" % person
print(f"Repeated keys: {repeated}")

# Order independence demonstration
config = {'host': 'localhost', 'port': 8080, 'protocol': 'https'}
url1 = "%(protocol)s://%(host)s:%(port)d" % config
url2 = "%(host)s uses %(protocol)s on port %(port)d" % config
print(f"\nFlexible ordering 1: {url1}")
print(f"Flexible ordering 2: {url2}")

print()

# ============================================================================
# SECTION 10: Practical Real-World Examples
# ============================================================================

print("=" * 70)
print("SECTION 10: Real-World Applications")
print("=" * 70)

# Example 1: Formatted table output
print("\nExample 1: Data Table")
print("-" * 40)

products = [
    ('Apple', 1.20, 50),
    ('Banana', 0.80, 120),
    ('Orange', 1.50, 75)
]

# Header
print("%-15s %8s %10s" % ("Product", "Price", "Quantity"))
print("-" * 40)

# Data rows
for product, price, qty in products:
    print("%-15s $%7.2f %10d" % (product, price, qty))

# Example 2: Log message formatting
print("\n\nExample 2: Log Messages")
print("-" * 40)

log_level = "ERROR"
timestamp = "2025-10-28 14:32:15"
message = "Connection timeout"
error_code = 504

log_entry = "[%-5s] %s - %s (Code: %d)" % (log_level, timestamp, message, error_code)
print(log_entry)

# Example 3: Progress indicator
print("\n\nExample 3: Progress Indicator")
print("-" * 40)

total = 100
for completed in [0, 25, 50, 75, 100]:
    percentage = (completed / total) * 100
    progress = "Progress: %3d%% [%3d/%3d]" % (percentage, completed, total)
    print(progress)

# Example 4: Scientific data presentation
print("\n\nExample 4: Scientific Notation")
print("-" * 40)

measurements = [
    ('Planck constant', 6.62607015e-34, 'J⋅s'),
    ('Speed of light', 299792458, 'm/s'),
    ('Avogadro number', 6.02214076e23, 'mol⁻¹')
]

for name, value, unit in measurements:
    print("%-20s: %12.6e %-10s" % (name, value, unit))

print()

# ============================================================================
# SECTION 11: Common Pitfalls and Error Prevention
# ============================================================================

print("=" * 70)
print("SECTION 11: Common Pitfalls")
print("=" * 70)

# Pitfall 1: Forgetting tuple for multiple values
print("\nPitfall 1: Missing tuple parentheses")
try:
    # % binds tighter than the comma, so this parses as
    # ("%s %s" % "first"), "second" and the format receives only one value
    result = "%s %s" % "first", "second"
except TypeError as e:
    print(f"Error: {e}")
    print("Solution: Use tuple: '%s %s' % ('first', 'second')")

# Pitfall 2: Type mismatch
print("\nPitfall 2: Type mismatch")
try:
    result = "%d" % "not a number"
except TypeError as e:
    print(f"Error: {e}")
    print("Solution: Use correct type or %s for automatic conversion")

# Pitfall 3: Missing value
print("\nPitfall 3: Not enough values")
try:
    result = "%s %s %s" % ("one", "two")
except TypeError as e:
    print(f"Error: {e}")
    print("Solution: Ensure tuple has correct number of values")

# Pitfall 4: Too many values
print("\nPitfall 4: Too many values")
try:
    result = "%s" % ("one", "two")
except TypeError as e:
    print(f"Error: {e}")
    print("Solution: Match format specifiers to value count")

print()

# ============================================================================
# SECTION 12: Modern Alternatives Comparison
# ============================================================================

print("=" * 70)
print("SECTION 12: Comparison with Modern Methods")
print("=" * 70)

name = "Charlie"
age = 28
score = 95.7

# Old style (%-formatting)
old_style = "Name: %-10s | Age: %3d | Score: %5.1f" % (name, age, score)
print(f"Old style (%-format):  {old_style}")

# str.format() method
format_method = "Name: {:<10} | Age: {:3d} | Score: {:5.1f}".format(name, age, score)
print(f"str.format() method:   {format_method}")

# f-strings (Python 3.6+)
f_string = f"Name: {name:<10} | Age: {age:3d} | Score: {score:5.1f}"
print(f"f-string (modern):     {f_string}")

print("\nRecommendation: Use f-strings for new code (more readable, faster)")
print("Use %-formatting only when maintaining legacy code or in specific")
print("contexts where it's required (e.g., logging configuration).")

print()

# ============================================================================
# SECTION 13: Summary and Best Practices
# ============================================================================

print("=" * 70)
print("SECTION 13: Best Practices Summary")
print("=" * 70)

print("""
C-Style (%-formatting) Best Practices:
======================================

1. ORDER MATTERS: Values must match format specifiers exactly
   ✓ Correct: '%-10s = %.2f' % (string_var, float_var)
   ✗ Wrong:   '%-10s = %.2f' % (float_var, string_var)

2. USE TUPLES: Multiple values require tuple syntax
   ✓ Correct: '%s %s' % (val1, val2)
   ✗ Wrong:   '%s %s' % val1, val2

3. TYPE SAFETY: Match type specifiers to value types
   ✓ %s for strings, %d for integers, %f for floats
   ✗ Avoid: %d or %f for strings (raises TypeError)
   ⚠ Note:  %f accepts integers, and %d silently truncates floats

4. DICTIONARY STYLE: Use for named, order-independent formatting
   ✓ '%(name)s is %(age)d' % {'name': 'Alice', 'age': 30}

5. PREFER MODERN: Use f-strings for new Python 3.6+ code
   ✓ f"{name} is {age}"  (more readable, faster, safer)

6. LEGACY CONTEXTS: Only use %-formatting when:
   - Maintaining existing codebases
   - Required by framework (e.g., logging config)
   - Interfacing with C libraries

7. PRECISION CONTROL: Use .N for decimal places
   ✓ %.2f for currency (12.50)
   ✓ %.4f for scientific precision

8. WIDTH ALIGNMENT: Use width and flags for columnar output
   ✓ %-10s for left-aligned columns
   ✓ %10s for right-aligned columns
""")

print("=" * 70)
print("END OF COMPREHENSIVE GUIDE")
print("=" * 70)
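
The script above names logging as a context where %-style placeholders are still in use. The following is a minimal sketch of that case using only the standard-library `logging` module. The logger name `demo` and the in-memory stream are illustrative assumptions, not part of the original notebook.

```python
import io
import logging

# Capture log output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

# Formatter templates use %(key)s-style named placeholders by default.
handler.setFormatter(logging.Formatter("[%(levelname)-5s] %(message)s"))

logger = logging.getLogger("demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep this output out of the root logger

# Values are passed as separate arguments; logging applies the % operator
# only when the record is actually emitted.
logger.info("Connection timeout (Code: %d)", 504)
logger.debug("Skipped: %s", "below the INFO threshold")

log_output = stream.getvalue()
print(log_output, end="")
```

Passing the values as arguments rather than pre-formatting the message means the DEBUG call above never formats its string, which is one reason %-style templates remain common in logging code.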

### Problem 2: Readability with Complex Modifications

In [None]:
# Pantry inventory example
pantry = [
    ('avocados', 1.25),
    ('bananas', 2.5),
    ('cherries', 15),
]

# Simple version (without modifications)
print("Simple formatting:")
for i, (item, count) in enumerate(pantry):
    print('#%d: %-10s = %.2f' % (i, item, count))

In [None]:
# Complex version (with inline modifications)
print("\nWith modifications (harder to read):")
for i, (item, count) in enumerate(pantry):
    print('#%d: %-10s = %d' % (
        i + 1,              # Add 1 to index
        item.title(),       # Capitalize item name
        round(count)))      # Round the count

# The tuple becomes very long and splits across multiple lines
print("\n✗ This style is hard to read!")

### Problem 3: Repetition with Multiple References

In [None]:
# Using the same value multiple times
template = '%s loves food. See %s cook.'
name = 'Max'

# Must repeat the value
formatted = template % (name, name)
print(formatted)

# Easy to forget or make mistakes
name = 'brad'
formatted = template % (name.title(), name.title())  # Error-prone!
print(formatted)

### Dictionary Formatting (Partial Solution)

In [None]:
# Dictionary formatting reduces some problems
key = 'my_var'
value = 1.234

# Can swap order in the dictionary without errors
old_way = '%-10s = %.2f' % (key, value)
new_way = '%(key)-10s = %(value).2f' % {'key': key, 'value': value}
reordered = '%(key)-10s = %(value).2f' % {'value': value, 'key': key}

assert old_way == new_way == reordered
print(f"All equal: {old_way}")

In [None]:
# Solves repetition problem
template = '%(name)s loves food. See %(name)s cook.'
formatted = template % {'name': 'Max'}
print(formatted)

# But introduces verbosity
soup = 'lentil'
formatted = "Today's soup is %(soup)s." % {'soup': soup}
print(formatted)
print("\n✗ Very verbose with redundant keys!")

### str.format() Method

In [None]:
# The format() built-in function
a = 1234.5678
formatted = format(a, ',.2f')
print(f"Formatted number: {formatted}")

b = 'my string'
formatted = format(b, '^20s')
print(f"Centered: '*{formatted}*'")

In [None]:
# The str.format() method
key = 'my_var'
value = 1.234

# Basic usage with positional arguments
formatted = '{} = {}'.format(key, value)
print(formatted)

# With format specifiers
formatted = '{:<10} = {:.2f}'.format(key, value)
print(formatted)

# Can reference positions multiple times
formatted = '{0} loves food. See {0} cook.'.format('Max')
print(formatted)
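
Beyond the basics shown above, `str.format()` placeholders can reorder arguments, escape braces, and index into their arguments. The sketch below illustrates these three features; the `menu` dictionary is a made-up example value.

```python
key = 'my_var'
value = 1.234

# Positional indexes let the template reorder its arguments.
swapped = '{1} = {0}'.format(key, value)
print(swapped)

# Doubled braces produce literal braces around the substituted value.
literal = '{{{}}}'.format(key)
print(literal)

# Placeholders can index into an argument and apply a !r conversion.
menu = {'soup': 'lentil'}  # hypothetical data for illustration
indexed = 'First letter of {menu[soup]!r} is {menu[soup][0]!r}'.format(menu=menu)
print(indexed)
```

Note that dictionary keys inside a placeholder are written without quotes (`menu[soup]`), which differs from normal Python indexing.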

### F-Strings: The Modern Solution

In [None]:
# F-strings are concise and readable
key = 'my_var'
value = 1.234

formatted = f'{key} = {value}'
print(formatted)

# With format specifiers
formatted = f'{key!r:<10} = {value:.2f}'
print(formatted)

# Comparison of all methods
f_string  = f'{key:<10} = {value:.2f}'
c_tuple   = '%-10s = %.2f' % (key, value)
str_args  = '{:<10} = {:.2f}'.format(key, value)
str_kw    = '{key:<10} = {value:.2f}'.format(key=key, value=value)
c_dict    = '%(key)-10s = %(value).2f' % {'key': key, 'value': value}

assert c_tuple == c_dict == f_string == str_args == str_kw
print(f"\n✓ F-string is the shortest: {f_string}")

### F-Strings with Expressions

In [None]:
# F-strings allow full Python expressions
pantry = [
    ('avocados', 1.25),
    ('bananas', 2.5),
    ('cherries', 15),
]

print("F-string with inline expressions:")
for i, (item, count) in enumerate(pantry):
    # All modifications inline - clear and concise!
    print(f'#{i+1}: {item.title():<10s} = {round(count)}')

In [None]:
# Complex expressions in f-strings
import math

x = 5
y = 10

# Mathematical operations
print(f'Sum: {x + y}')
print(f'Product: {x * y}')
print(f'Square root of {x}: {math.sqrt(x):.2f}')

# Conditional expressions
print(f'x is {"even" if x % 2 == 0 else "odd"}')

# Method calls and list operations
words = ['hello', 'world']
print(f'Joined: {", ".join(words)}')
print(f'Uppercase: {[w.upper() for w in words]}')
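
Because f-strings evaluate expressions in place, they also support a self-documenting form: adding `=` after the expression prints both the expression text and its value. This sketch assumes Python 3.8 or later, where the `=` specifier was introduced.

```python
x = 5
y = 10
name = 'Max'

# The text before = is echoed verbatim, followed by the value.
debug_basic = f'{x=}'
debug_expr = f'{x + y=}'

# Without a format specifier, the value is shown with repr().
debug_repr = f'{name=}'

# A format specifier can follow the = sign.
debug_spec = f'{x / y=:.1%}'

for line in (debug_basic, debug_expr, debug_repr, debug_spec):
    print(line)
```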

### Common Format Specifiers

| Specifier | Description | Example | Output |
|-----------|-------------|---------|--------|
| `:.2f` | Float with 2 decimal places | `f'{3.14159:.2f}'` | `'3.14'` |
| `:<10` | Left align, width 10 | `f'{"hi":<10}'` | `'hi        '` |
| `:>10` | Right align, width 10 | `f'{"hi":>10}'` | `'        hi'` |
| `:^10` | Center align, width 10 | `f'{"hi":^10}'` | `'    hi    '` |
| `:,` | Thousands separator | `f'{1234567:,}'` | `'1,234,567'` |
| `!r` | Conversion: apply `repr()` | `f'{"hi"!r}'` | `"'hi'"` |
| `!s` | Conversion: apply `str()` | `f'{obj!s}'` | `str(obj)` |
| `!a` | Conversion: apply `ascii()` | `f'{obj!a}'` | `ascii(obj)` |

In [None]:
# Format specifier examples
value = 1234.5678
text = "hello"

print("Number formatting:")
print(f"2 decimals: {value:.2f}")
print(f"Thousands: {value:,.2f}")
print(f"Percentage: {0.1234:.1%}")
print(f"Scientific: {value:.2e}")

print("\nString formatting:")
print(f"Left align:  '{text:<10}'")
print(f"Right align: '{text:>10}'")
print(f"Center:      '{text:^10}'")
print(f"Repr:        {text!r}")
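
Most C-style flags carry over to f-string format specifiers unchanged. The sketch below pairs a few %-format expressions with f-string equivalents and asserts that each pair produces the same text. The sample values are arbitrary.

```python
number = 42
negative = -7
pi = 3.14159

# Each tuple holds (C-style result, f-string result).
pairs = [
    ('%05d' % number,    f'{number:05d}'),     # zero-padding
    ('%05d' % negative,  f'{negative:05d}'),   # zero-padding keeps the sign first
    ('%+d' % number,     f'{number:+d}'),      # always show sign
    ('% d' % number,     f'{number: d}'),      # space for positive numbers
    ('%x' % 255,         f'{255:x}'),          # hexadecimal
    ('%10.2f' % pi,      f'{pi:10.2f}'),       # width and precision
    ('%-10s|' % 'test',  f'{"test":<10}|'),    # left-aligned string
    # Strings are right-aligned by default with %10s but left-aligned by
    # default in format specifiers, so the f-string needs an explicit >.
    ('%10s' % 'test',    f'{"test":>10}'),
]

for c_style, f_style in pairs:
    assert c_style == f_style
    print(f'{c_style!r:>14} == {f_style!r}')
```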

### Dynamic Format Specifiers

In [None]:
# Format specifiers can be variables
places = 3
number = 1.23456

print(f'My number is {number:.{places}f}')

# Dynamic width
width = 15
text = "Python"
print(f'Centered: |{text:^{width}}|')

# Both dynamic
precision = 4
field_width = 12
value = 123.456789
print(f'Custom: |{value:{field_width}.{precision}f}|')
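
For comparison, the older styles can also take width and precision from variables, though less readably. This is a small sketch reusing the values from the cell above: C-style formatting uses `*` to pull the width or precision from the argument tuple, and `str.format()` uses nested placeholders.

```python
field_width = 12
precision = 4
value = 123.456789

# C-style: each * consumes one value from the tuple, in order.
c_style = '|%*.*f|' % (field_width, precision, value)

# str.format(): nested {} placeholders are filled from later arguments.
str_format = '|{:{}.{}f}|'.format(value, field_width, precision)

# f-string: the variable names appear where they are used.
f_string = f'|{value:{field_width}.{precision}f}|'

assert c_style == str_format == f_string
print(f_string)
```

In the first two forms the reader has to match positions against the argument list, which is the same ordering problem described earlier for C-style tuples.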

### Key Takeaways

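- C-style format strings are error-prone (argument order, type mismatches) and verbose
- str.format() adds a more flexible specifier syntax but is still repetitive
- F-strings are succinct, powerful, and Pythonic
- F-strings allow arbitrary Python expressions inside the placeholders
- Prefer f-strings for string formatting in Python 3.6+; keep %-formatting for legacy code and APIs that require it (e.g., logging)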