## Key Takeaways

1. **Root logger = CEO**  
   Configure it once; every other logger inherits from it

2. **Log levels = audience**  
   - DEBUG → developers (internal minutiae)
   - INFO → operators (milestones)
   - WARNING → when you fix something automatically
   - ERROR → when you can't fix something
   - CRITICAL → pipeline must stop

3. **Handlers = destinations**  
   - Console (real-time)
   - Full log file (forensic)
   - Error file (quick triage)

4. **Formatters = presentation**  
   - Detailed for files (who, what, where, when)
   - Simple for console (just the message)

5. **Rotation = safety**  
   - Keeps disk from filling up
   - RotatingFileHandler handles it automatically

**In one sentence:**  
*setup_logging() creates a central logging hub with multiple outputs, each curated for its specific audience.*

## Summary: The Complete Picture

Here's what `setup_logging()` does in ONE DIAGRAM:

```
┌─────────────────────────────────────────────────────────────┐
│ ROOT LOGGER (central headquarters)                          │
│ - Level: DEBUG (catch everything)                           │
│ - Has 3 handlers attached:                                  │
└─────────────────────────────────────────────────────────────┘
         |
    ┌────────────────────────────────────────────┐
    │ Every logger created via                   │
    │ logging.getLogger(__name__)                │
    │ automatically inherits these 3 handlers    │
    └────────────────────────────────────────────┘
     |                  |                  |
┌─────────────┐  ┌────────────────┐  ┌──────────────┐
│  CONSOLE    │  │  pipeline.log  │  │  errors.log  │
│ (INFO+)     │  │  (DEBUG+)      │  │  (ERROR+)    │
│             │  │                │  │              │
│ Operators   │  │ Developers     │  │ On-call      │
│ see:        │  │ see:           │  │ engineers    │
│ Y INFO      │  │ Y DEBUG        │  │ see:         │
│ Y WARNING   │  │ Y INFO         │  │ Y ERROR      │
│ Y ERROR     │  │ Y WARNING      │  │ Y CRITICAL   │
│ Y CRITICAL  │  │ Y ERROR        │  │              │
│ N DEBUG     │  │ Y CRITICAL     │  │ N DEBUG      │
│             │  │ (rotates @ 5MB)│  │ (rotates)    │
└─────────────┘  └────────────────┘  └──────────────┘
```

**The genius:** 
- One function call sets everything up
- Different audiences get different detail levels
- Rotation caps how much disk space the log files can use
- Developers can trace any issue using the full DEBUG record in `pipeline.log`
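
The project's real `setup_logging()` is not reproduced in this notebook, so here is a rough sketch of what a function matching the diagram could look like. The name `setup_logging_sketch` and its `log_dir` default are assumptions made for illustration; the levels, file names, formats, and rotation settings are the ones described in the sections of this notebook.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_logging_sketch(log_dir="logs"):
    """Hypothetical stand-in for the project's setup_logging()."""
    os.makedirs(log_dir, exist_ok=True)

    # Detailed format for files, short format for the console
    file_fmt = logging.Formatter(
        "%(asctime)s | %(levelname)-8s | %(name)-20s | %(filename)s:%(lineno)d | %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    console_fmt = logging.Formatter(
        "%(asctime)s | %(levelname)-8s | %(message)s", datefmt="%H:%M:%S"
    )

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(console_fmt)

    def rotating(filename, level):
        # 5 MB per file, 3 backups, as in the diagram above
        handler = RotatingFileHandler(
            os.path.join(log_dir, filename),
            maxBytes=5 * 1024 * 1024,
            backupCount=3,
            encoding="utf-8",
        )
        handler.setLevel(level)
        handler.setFormatter(file_fmt)
        return handler

    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    # Remove and close old handlers so repeated calls do not duplicate output
    for old in list(root.handlers):
        root.removeHandler(old)
        old.close()
    for handler in (
        console,
        rotating("pipeline.log", logging.DEBUG),
        rotating("errors.log", logging.ERROR),
    ):
        root.addHandler(handler)
    return root
```

Calling `setup_logging_sketch("test_logs")` once at start-up would then be enough; every module keeps using `logging.getLogger(__name__)` as before.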

In [3]:
# Check what was actually written to the files
import os
import sys

# Define project root (this cell needs to be self-contained)
project_root = r"c:\Users\adegb\Desktop\python-logging-mastery"

test_log_dir = os.path.join(project_root, "test_logs")
pipeline_log = os.path.join(test_log_dir, "pipeline.log")
errors_log = os.path.join(test_log_dir, "errors.log")

print("CONTENTS OF pipeline.log (DEBUG+):")
print("=" * 60)
if os.path.exists(pipeline_log):
    with open(pipeline_log, 'r') as f:
        content = f.read()
        # Show last 5 lines (the ones we just wrote)
        lines = content.strip().split('\n')
        for line in lines[-5:]:
            print(line)
else:
    print("File not found")

print()
print("CONTENTS OF errors.log (ERROR+):")
print("=" * 60)
if os.path.exists(errors_log):
    with open(errors_log, 'r') as f:
        content = f.read()
        lines = content.strip().split('\n')
        for line in lines[-5:]:
            print(line)
else:
    print("File not found")

print()
if os.path.exists(pipeline_log) and os.path.exists(errors_log):
    print("Expected observations:")
    print("   - pipeline.log HAS all 5 messages (DEBUG, INFO, WARNING, ERROR, CRITICAL)")
    print("   - errors.log HAS ONLY 2 messages (ERROR and CRITICAL)")
    print("   - Console (above) didn't show DEBUG")
    print()
    print("If the files match, the filtering works correctly!")
else:
    print("Log files not found - run the setup_logging() and logging cells first.")

CONTENTS OF pipeline.log (DEBUG+):
============================================================
File not found

CONTENTS OF errors.log (ERROR+):
============================================================
File not found

Log files not found - run the setup_logging() and logging cells first.


In [5]:
# Now let's log at different levels and check what appears where
import logging

# Create a named logger (like etl/extractor.py does)
logger = logging.getLogger("etl.extractor")

print("Logging test messages...")
print()

logger.debug("This is a DEBUG message")
logger.info("This is an INFO message")
logger.warning("This is a WARNING message")
logger.error("This is an ERROR message")
logger.critical("This is a CRITICAL message")

print("Messages sent! Now let's check what was written to files...")

This is an ERROR message
This is a CRITICAL message


Logging test messages...

Messages sent! Now let's check what was written to files...


In [6]:
# Import from the actual project
import sys
import os

# Add project root to path
project_root = r"c:\Users\adegb\Desktop\python-logging-mastery"
sys.path.insert(0, project_root)

# Import and run the actual setup
from config.logging_config import setup_logging
import logging

# Setup logging (creates the handlers, formatters, etc.)
test_log_dir = os.path.join(project_root, "test_logs")
setup_logging(log_dir=test_log_dir)

print("Logging system initialized!")
print(f"   Log directory: {test_log_dir}")
print()

# Get root logger to inspect what was added
root_logger = logging.getLogger()
print(f"Root logger has {len(root_logger.handlers)} handlers:")
for i, handler in enumerate(root_logger.handlers, 1):
    print(f"   {i}. {handler.__class__.__name__} @ level {handler.level}")
    if hasattr(handler, 'baseFilename'):
        print(f"      File: {handler.baseFilename}")

22:31:53 | INFO     | Logging system initialized -- console=INFO, file=DEBUG, errors=ERROR


Logging system initialized!
   Log directory: c:\Users\adegb\Desktop\python-logging-mastery\test_logs

Root logger has 3 handlers:
   1. StreamHandler @ level 20
   2. RotatingFileHandler @ level 10
      File: c:\Users\adegb\Desktop\python-logging-mastery\test_logs\pipeline.log
   3. RotatingFileHandler @ level 40
      File: c:\Users\adegb\Desktop\python-logging-mastery\test_logs\errors.log


## Section 8: Testing the Actual Logging System

Let's test the real `setup_logging()` function from your project and verify everything works.

In [7]:
# Demo: The magic of adding handlers to root logger
import logging
import io

# Clear any existing handlers
logging.getLogger().handlers.clear()

# Simplified demo setup
console_buffer = io.StringIO()
file_buffer = io.StringIO()
error_buffer = io.StringIO()

console_handler = logging.StreamHandler(console_buffer)
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(logging.Formatter("%(message)s"))

file_handler = logging.StreamHandler(file_buffer)
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(logging.Formatter("%(message)s"))

error_handler = logging.StreamHandler(error_buffer)
error_handler.setLevel(logging.ERROR)
error_handler.setFormatter(logging.Formatter("%(message)s"))

# STEP 1: Add handlers to root logger
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
root_logger.addHandler(console_handler)
root_logger.addHandler(file_handler)
root_logger.addHandler(error_handler)

print("Setup complete. Now let's use it:")
print()

# STEP 2: In a different module (e.g., etl/extractor.py)
# They just do this:
logger_from_extractor = logging.getLogger("etl.extractor")

# STEP 3: They call logger.warning() ONE TIME
logger_from_extractor.warning("CSV file is stale")

print("Called: logger.warning('CSV file is stale') ONE TIME")
print()
print("The record was offered to ALL THREE handlers:")
print()
print("1. Console (operator watching):")
print("   ", repr(console_buffer.getvalue()))
print()
print("2. pipeline.log (developer analyzing):")
print("   ", repr(file_buffer.getvalue()))
print()
print("3. errors.log (on-call engineer triaging):")
print("   ", repr(error_buffer.getvalue()), "<- Empty because WARNING < ERROR level")
print()
print("Key insight: Set up logging ONCE in config/logging_config.py")
print("   Every module's logger.xxx() call is then routed through all handlers!")

Setup complete. Now let's use it:

Called: logger.warning('CSV file is stale') ONE TIME

The record was offered to ALL THREE handlers:

1. Console (operator watching):
    'CSV file is stale\n'

2. pipeline.log (developer analyzing):
    'CSV file is stale\n'

3. errors.log (on-call engineer triaging):
    '' <- Empty because WARNING < ERROR level

Key insight: Set up logging ONCE in config/logging_config.py
   Every module's logger.xxx() call is then routed through all handlers!


## Section 7: Attaching All Handlers to Root Logger

This is where the magic happens:

```python
root_logger.addHandler(console_handler)
root_logger.addHandler(file_handler)
root_logger.addHandler(error_handler)
```

**What does this do?**

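Now, when ANY logger in ANY module logs a message, the record is offered to all three handlers, and each handler emits it only if it meets that handler's level:

1. ✅ Console (if level >= INFO)
2. ✅ pipeline.log (if level >= DEBUG)
3. ✅ errors.log (if level >= ERROR)

So one `logger.error()` call lands in all three places, while a `logger.info()` call reaches the console and `pipeline.log` only. No extra work needed in the calling module.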

In [8]:
# Demo: Three handlers, three views of the same log messages
import logging
import io

# Clear any existing handlers
logging.getLogger().handlers.clear()

# Create three buffers
console_buffer = io.StringIO()
file_buffer = io.StringIO()
error_buffer = io.StringIO()

# Console handler - INFO+
console_handler = logging.StreamHandler(console_buffer)
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(logging.Formatter("%(levelname)-8s | %(message)s"))

# File handler - DEBUG+
file_handler = logging.StreamHandler(file_buffer)
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(logging.Formatter("%(levelname)-8s | %(message)s"))

# Error handler - ERROR+
error_handler = logging.StreamHandler(error_buffer)
error_handler.setLevel(logging.ERROR)
error_handler.setFormatter(logging.Formatter("%(levelname)-8s | %(message)s"))

# Attach all to root logger
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
root_logger.addHandler(console_handler)
root_logger.addHandler(file_handler)
root_logger.addHandler(error_handler)

# Log messages at different levels
logger = logging.getLogger("etl.extractor")
logger.debug("File size: 3838 bytes")
logger.info("Successfully extracted 50 rows")
logger.warning("CSV file is 7 days old - data may be stale")
logger.error("API request failed - HTTP 500")
logger.critical("Database connection lost - aborting pipeline")

print("CONSOLE OUTPUT (INFO+):")
print(console_buffer.getvalue())
print("\nFILE OUTPUT (DEBUG+):")
print(file_buffer.getvalue())
print("\nERRORS.LOG OUTPUT (ERROR+):")
print(error_buffer.getvalue())

print("\nSame messages, BUT:")
print("   - Console: User sees key milestones (less noise)")
print("   - File: Developer sees everything (including DEBUG)")
print("   - Errors: On-call sees only failures (quick triage)")

CONSOLE OUTPUT (INFO+):
INFO     | Successfully extracted 50 rows
WARNING  | CSV file is 7 days old - data may be stale
ERROR    | API request failed - HTTP 500
CRITICAL | Database connection lost - aborting pipeline


FILE OUTPUT (DEBUG+):
DEBUG    | File size: 3838 bytes
INFO     | Successfully extracted 50 rows
WARNING  | CSV file is 7 days old - data may be stale
ERROR    | API request failed - HTTP 500
CRITICAL | Database connection lost - aborting pipeline


ERRORS.LOG OUTPUT (ERROR+):
ERROR    | API request failed - HTTP 500
CRITICAL | Database connection lost - aborting pipeline


Same messages, BUT:
   - Console: User sees key milestones (less noise)
   - File: Developer sees everything (including DEBUG)
   - Errors: On-call sees only failures (quick triage)


## Section 6: Setting Up Error-Only Handler

This is a **separate** file handler that captures ONLY `ERROR` and `CRITICAL` records.

```python
error_handler = RotatingFileHandler(
    filename="logs/errors.log",
    maxBytes=5 * 1024 * 1024,
    backupCount=3
)
error_handler.setLevel(logging.ERROR)  # ERROR and CRITICAL only
```

**Why a separate error file?**

Imagine `pipeline.log` has 100,000 lines. An on-call engineer wakes up at 3 AM because the pipeline failed. 

❌ Bad: "Read all 100k lines"  
✅ Good: "Read `errors.log` – it has just the 5 failures"

Much faster triage!

In [9]:
# Demo: Understanding log rotation
import os

print("How RotatingFileHandler Works:")
print()
print("Scenario: maxBytes=5MB, backupCount=3")
print()
print("Initial state:")
print("  pipeline.log (0 MB)")
print()
print("After pipeline runs (3 MB):")
print("  pipeline.log (3 MB)")
print()
print("After pipeline runs again (8 MB total written):")
print("  pipeline.log (~3 MB - fresh file started at rotation)")
print("  pipeline.log.1 (~5 MB - the full file, renamed)")
print()
print("After 2 more rotations (3 in total):")
print("  pipeline.log (up to 5 MB)")
print("  pipeline.log.1 (5 MB)")
print("  pipeline.log.2 (5 MB)")
print("  pipeline.log.3 (5 MB)")
print()
print("Next rotation (the 4th):")
print("  old pipeline.log.3 is DELETED (only keeping 3 backups)")
print("  pipeline.log.2 -> .3, pipeline.log.1 -> .2, pipeline.log -> .1")
print("  pipeline.log starts again at 0 MB")
print()
print("Result: Disk space is bounded! Max = 5MB x 4 = 20 MB total")

How RotatingFileHandler Works:

Scenario: maxBytes=5MB, backupCount=3

Initial state:
  pipeline.log (0 MB)

After pipeline runs (3 MB):
  pipeline.log (3 MB)

After pipeline runs again (8 MB total written):
  pipeline.log (~3 MB - fresh file started at rotation)
  pipeline.log.1 (~5 MB - the full file, renamed)

After 2 more rotations (3 in total):
  pipeline.log (up to 5 MB)
  pipeline.log.1 (5 MB)
  pipeline.log.2 (5 MB)
  pipeline.log.3 (5 MB)

Next rotation (the 4th):
  old pipeline.log.3 is DELETED (only keeping 3 backups)
  pipeline.log.2 -> .3, pipeline.log.1 -> .2, pipeline.log -> .1
  pipeline.log starts again at 0 MB

Result: Disk space is bounded! Max = 5MB x 4 = 20 MB total


## Section 5: Setting Up File Handler with Rotation

The **file handler** writes everything to a log file for forensic analysis.

**The problem:** Log files can grow infinitely! Your disk fills up. Bad.

**The solution:** RotatingFileHandler
- Writes to `pipeline.log`
- When the next record would push the file past 5 MB, it rotates: `pipeline.log` → `pipeline.log.1`
- Keeps at most 3 backups; each later rotation discards the oldest one
- Caps total log size at about 20 MB (4 files × 5 MB)

```python
file_handler = RotatingFileHandler(
    filename="logs/pipeline.log",
    maxBytes=5 * 1024 * 1024,  # 5 MB
    backupCount=3              # Keep 3 old files
)
file_handler.setLevel(logging.DEBUG)  # Capture EVERYTHING
```
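
To watch rotation happen without writing megabytes of logs, the sketch below shrinks `maxBytes` to 200 bytes and writes into a temporary directory. The function name `rotation_demo`, the logger name, and the tiny sizes are assumptions chosen for the demo; only `RotatingFileHandler` and its `maxBytes` / `backupCount` parameters come from the configuration above.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def rotation_demo(max_bytes=200, backup_count=3, n_records=50):
    """Log n_records short lines through a tiny rotating handler.

    Returns a dict mapping each resulting file name to its size in bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        handler = RotatingFileHandler(
            os.path.join(tmp, "demo.log"),
            maxBytes=max_bytes,
            backupCount=backup_count,
            encoding="utf-8",
        )
        handler.setFormatter(logging.Formatter("%(message)s"))

        logger = logging.getLogger("rotation_demo")  # hypothetical logger name
        logger.setLevel(logging.DEBUG)
        logger.propagate = False  # keep demo records away from the root handlers
        logger.addHandler(handler)
        try:
            for i in range(n_records):
                logger.info("record %03d %s", i, "x" * 28)
        finally:
            # Close the file before the temporary directory is cleaned up
            logger.removeHandler(handler)
            handler.close()

        return {
            name: os.path.getsize(os.path.join(tmp, name))
            for name in sorted(os.listdir(tmp))
        }


sizes = rotation_demo()
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

However many records are written, the directory should end up with the live file plus at most `backupCount` backups, which is what keeps disk usage bounded.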

In [None]:
# Demo: Console handler filtering
import logging
import io
import sys

# Clear any existing handlers
logging.getLogger().handlers.clear()

# Create a string buffer to capture output
string_buffer = io.StringIO()

# Create handlers
console_handler = logging.StreamHandler(string_buffer)
console_handler.setLevel(logging.INFO)  # INFO and above only

# Console formatter
console_formatter = logging.Formatter(
    fmt="%(asctime)s | %(levelname)-8s | %(message)s",
    datefmt="%H:%M:%S"
)
console_handler.setFormatter(console_formatter)

# Setup root logger
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)  # Capture everything
root_logger.addHandler(console_handler)

# Now log at different levels
logger = logging.getLogger("demo")
logger.debug("This is DEBUG - should NOT appear on console")
logger.info("This is INFO - SHOULD appear")
logger.warning("This is WARNING - SHOULD appear")
logger.error("This is ERROR - SHOULD appear")

# Show what went to console
console_output = string_buffer.getvalue()
print("What appears on the CONSOLE (console level = INFO):")
print(console_output)
print()
print("Notice: DEBUG message was FILTERED OUT")
print("   Debug goes to files, not console!")

## Section 4: Setting Up Console Handler (What Operators See)

A **handler** decides where log records are sent.

The console handler sends logs to the terminal screen that the operator is watching.

```python
console_handler = logging.StreamHandler()       # Create handler
console_handler.setLevel(logging.INFO)          # Only show INFO+ (not DEBUG)
console_handler.setFormatter(console_formatter) # Use simple format
```

**Why INFO and not DEBUG?**
- Operators don't need verbose DEBUG noise while watching in real-time
- Too much noise = they miss important warnings
- DEBUG is for developers analyzing logs later

In [None]:
# Demo: Formatters
import logging

# FILE FORMATTER - detailed, for archival
file_formatter = logging.Formatter(
    fmt="%(asctime)s | %(levelname)-8s | %(name)-20s | %(filename)s:%(lineno)d | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)

# CONSOLE FORMATTER - simple, for operators
console_formatter = logging.Formatter(
    fmt="%(asctime)s | %(levelname)-8s | %(message)s",
    datefmt="%H:%M:%S"
)

print("Formatter Template Variables:")
print("   %(asctime)s    -> timestamp")
print("   %(levelname)s   -> DEBUG, INFO, WARNING, ERROR, CRITICAL")
print("   %(name)s        -> logger name (e.g., 'etl.extractor')")
print("   %(filename)s    -> source file name")
print("   %(lineno)d      -> line number where logger.xxx() was called")
print("   %(message)s     -> the actual log message")
print()

# Let's create a dummy log record to show the difference
handler = logging.StreamHandler()
logger = logging.getLogger("etl.extractor")
logger.setLevel(logging.DEBUG)

# Create a log record manually for demo
record = logging.LogRecord(
    name="etl.extractor",
    level=logging.INFO,
    pathname="/path/to/extractor.py",
    lineno=69,
    msg="Successfully extracted 50 rows",
    args=(),
    exc_info=None
)

print("Same log message, different formatters:")
print()
print("FILE output:")
print("  ", file_formatter.format(record))
print()
print("CONSOLE output:")
print("  ", console_formatter.format(record))

## Section 3: Defining Custom Formatters

A **formatter** controls HOW a log message looks. Think of it as a template.

The code creates **two different formatters**:

**1. File Formatter (detailed):**
```
2026-02-10 21:44:02 | INFO     | etl.extractor | extractor.py:69 | Successfully extracted 50 rows
```
Components: `timestamp | level | logger_name | filename:line | message`

**2. Console Formatter (simple):**
```
21:44:02 | INFO     | Successfully extracted 50 rows
```
Components: `time | level | message` (no filename or logger name, because operators don't need them)

In [None]:
# Demo: Root logger vs named loggers
import logging

# Clear existing handlers for demo
logging.getLogger().handlers.clear()

# Step 1: Configure the ROOT logger (happens once)
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)

print("Root logger setup:")
print(f"   - Root logger name: '{root_logger.name}' (empty = root)")
print(f"   - Root logger level: {root_logger.level} (DEBUG)")
print()

# Step 2: Create named loggers in different modules
# These inherit root logger's settings
logger_extractor = logging.getLogger("etl.extractor")
logger_transformer = logging.getLogger("etl.transformer")
logger_loader = logging.getLogger("etl.loader")

print("Named loggers created:")
print(f"   - logger_extractor name: '{logger_extractor.name}'")
print(f"   - logger_transformer name: '{logger_transformer.name}'")
print(f"   - logger_loader name: '{logger_loader.name}'")
print()

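print("Key point:")
print(f"   - logger_extractor's own level: {logger_extractor.level} (NOTSET = defer to parent)")
print(f"   - logger_extractor's effective level: {logger_extractor.getEffectiveLevel()} (DEBUG, from root)")
print("   We set it once on the root, everyone gets it!")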

## Section 2: Creating the Root Logger

The **root logger** is like the CEO of all loggers. Every logger you create reports up to it: records propagate to the root's handlers, and a logger with no level of its own uses the root's level.

```python
root_logger = logging.getLogger()  # No name = root logger
root_logger.setLevel(logging.DEBUG)  # Capture EVERYTHING
root_logger.handlers.clear()  # Wipe any old handlers
```

**Why do this?**
- You configure ONCE here
- Every module that does `logger = logging.getLogger(__name__)` automatically inherits these settings
- Prevents logging configuration chaos
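
What "inherits" means can be checked directly. The sketch below assumes a made-up module name, `etl.demo_child`, and compares the logger's own level with the level it resolves to through the root:

```python
import logging

root = logging.getLogger()
previous_level = root.level  # remember it so the demo can restore it afterwards
root.setLevel(logging.DEBUG)

child = logging.getLogger("etl.demo_child")  # hypothetical module name

own_level = child.level                      # NOTSET: nothing set on the child itself
effective_level = child.getEffectiveLevel()  # resolved by walking up to the root

print(f"own level: {logging.getLevelName(own_level)}")
print(f"effective level: {logging.getLevelName(effective_level)}")
print(f"propagates to parent handlers: {child.propagate}")

root.setLevel(previous_level)
```

The child never stores DEBUG itself; it looks the level up at call time, which is why changing the root level later affects every module at once.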

In [None]:
# Let's see the severity order in Python
import logging

levels = {
    'DEBUG': logging.DEBUG,
    'INFO': logging.INFO,
    'WARNING': logging.WARNING,
    'ERROR': logging.ERROR,
    'CRITICAL': logging.CRITICAL
}

for name, value in levels.items():
    print(f"{name:10} = {value}")

print("\nRemember: Higher numbers = More severe")
print("   If you set level to INFO (20), you accept: INFO, WARNING, ERROR, CRITICAL")
print("   But you REJECT: DEBUG (10)")

## Section 1: Understanding Logging Levels

Python's `logging` module defines **5 standard severity levels**. Think of them like priority signals:

| Level | Severity | When to Use | Example |
|-------|----------|------------|---------|
| DEBUG | Low | Internal workings you need as a developer | "File size: 3838 bytes" |
| INFO | Low-Medium | Important milestones, everything is OK | "Successfully extracted 50 rows" |
| WARNING | Medium | Something odd happened, but we fixed it | "Missing email field – filling with placeholder" |
| ERROR | High | Something failed for this specific item | "Invalid quantity type – skipping row" |
| CRITICAL | Very High | The whole pipeline must stop | "Database connection failed – aborting" |

**Key insight:** A level is a threshold: it admits that level and everything more severe. So if you set `INFO`, you get INFO + WARNING + ERROR + CRITICAL, but NOT DEBUG.
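
The threshold rule can be verified with `Logger.isEnabledFor()`. This is a small sketch; the logger name `levels_demo` is an arbitrary choice for the example.

```python
import logging

logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)

all_levels = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

# isEnabledFor() answers: would a record at this level pass the logger's threshold?
accepted = [name for name in all_levels if logger.isEnabledFor(getattr(logging, name))]
rejected = [name for name in all_levels if name not in accepted]

print("accepted:", accepted)
print("rejected:", rejected)
```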

# Understanding Python Logging Configuration
## A Deep Dive into the ETL Pipeline's Logging Setup

This notebook explains **exactly** what the `logging_config.py` code does, step by step, with interactive examples.