# Python Exception Handling (try / except / else / finally / raise)

Exception handling is a **non-negotiable** skill for data engineers.

In data pipelines, one failure (one bad file / one bad table) should **not crash** the entire job.

## Topics
- Why exception handling matters (pipeline resilience)
- `try / except` basics
- Catching specific errors vs generic `except`
- `Exception as e` for error details
- `else` block (runs only if no exception)
- `finally` block (runs no matter what)
- `return` + `finally` caveat (finally still runs)
- `raise` for custom validation
- Custom exceptions (domain/business rules)
- Real-world pattern: looping through many tables/files with dynamic logging
- Extra: `traceback` for full stack trace
- Extra: simple retry pattern


## 1) Why Exception Handling?

By default, an unhandled exception stops the whole program: Python prints a traceback and exits.

In data engineering:
- You may process **100+ tables**
- One table might fail due to a schema mismatch, unexpected nulls, type issues, or missing permissions

✅ Exception handling lets you:
- **Avoid script termination**
- **Log error details**
- **Continue processing** remaining items


## 2) Basic try/except

### Idea
- **Try** the risky block
- If it fails → **Except** runs

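⚠️ Avoid bare `except:` in production. It also catches `KeyboardInterrupt` and `SystemExit`, so if you must use it, log the error and re-raise. The cell below uses it only to show the basic flow.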


In [None]:
try:
    x = "hello"
    # This will raise TypeError because you cannot compare str and int
    if x > 10:
        print("Greater")
except:
    print("Hey, you have an error!")
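
To make the warning about bare `except:` concrete, here is a minimal sketch comparing what a bare `except:` and `except Exception` do when code calls `sys.exit()`. The helper names `run_with_bare_except` and `run_with_except_exception` are made up for this illustration.

```python
import sys

def run_with_bare_except():
    try:
        sys.exit(1)          # raises SystemExit, which derives from BaseException
    except:                  # bare except: catches SystemExit as well
        return "swallowed SystemExit"

def run_with_except_exception():
    try:
        sys.exit(1)
    except Exception:        # SystemExit is not a subclass of Exception
        return "swallowed SystemExit"

print(run_with_bare_except())

try:
    run_with_except_exception()
except SystemExit as e:
    print("SystemExit propagated with code", e.code)
```

With the bare `except:`, the request to exit is silently ignored. With `except Exception`, it reaches the caller as intended.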


## 3) Better: capture technical error details (`Exception as e`)

`e` holds the exception object. Printing it shows the **actual interpreter message**, which is crucial for debugging and logging.

In [None]:
try:
    x = "hello"
    if x > 10:
        print("Greater")
except Exception as e:
    print(f"Error details: {e}")
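
Beyond the message, the exception object also carries its type, which is often worth logging. The sketch below reuses the comparison error from the cell above. `describe_error` is a hypothetical helper, not a standard-library function.

```python
def describe_error(exc):
    """Return a small dict of details that is handy for logs."""
    return {
        "type": type(exc).__name__,   # class name of the exception
        "message": str(exc),          # same text that print(e) shows
        "repr": repr(exc),            # type and message together
    }

try:
    x = "hello"
    if x > 10:
        print("Greater")
except Exception as e:
    details = describe_error(e)
    print(details["type"])
    print(details["message"])
    print(details["repr"])
```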


## 4) Catch specific exceptions (recommended)

Catching specific exceptions is safer than `except Exception`: you handle only the failures you expect, and unexpected bugs still surface.

Common exception types:
- `TypeError`
- `ValueError`
- `KeyError`
- `IndexError`
- `ZeroDivisionError`
- `FileNotFoundError`


In [None]:
try:
    a = 10
    b = 0
    print(a / b)
except ZeroDivisionError as e:
    print("Cannot divide by zero!")
    print("Details:", e)


## 5) Multiple `except` blocks

Use different handlers for different error types.

In [None]:
def parse_int(s):
    try:
        return int(s)
    except ValueError:
        return f"ValueError: cannot convert '{s}' to int"
    except TypeError:
        return "TypeError: input must be a string or number"

print(parse_int("123"))
print(parse_int("12.3"))
print(parse_int(None))
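
Two related details are worth a short sketch: several exception types can share one handler by listing them in a tuple, and handlers are checked top to bottom, so a subclass must come before its parent class. `safe_lookup` and `classify` are hypothetical names used only for illustration.

```python
def safe_lookup(record, key, index):
    """Return record[key][index], or None when the key or index is missing."""
    try:
        return record[key][index]
    except (KeyError, IndexError):
        # One handler for two related "not found" errors
        return None

def classify(exc):
    """Raise exc and report which handler caught it (first match wins)."""
    try:
        raise exc
    except FileNotFoundError:      # subclass of OSError, so it is listed first
        return "missing file"
    except OSError:
        return "other OS error"

row = {"tags": ["a", "b"]}
print(safe_lookup(row, "tags", 1))
print(safe_lookup(row, "tags", 5))
print(safe_lookup(row, "owner", 0))
print(classify(FileNotFoundError("x")))
print(classify(PermissionError("x")))
```

If the `except OSError` clause came first, it would also catch `FileNotFoundError`, and the more specific handler would never run.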


## 6) `try / except / else`

`else` runs **only if try succeeded** (no exception).

✅ Use `else` for logic that should only run after success.

In [None]:
try:
    value = int("50")
except ValueError as e:
    print("Conversion failed:", e)
else:
    print("Conversion succeeded. value * 2 =", value * 2)
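
One practical reason to use `else` rather than putting everything in `try`: errors raised inside `else` are not caught by the preceding `except` clauses. The sketch below illustrates this with a hypothetical function `convert_then_check`.

```python
def convert_then_check(raw):
    try:
        value = int(raw)            # only the risky conversion is guarded
    except ValueError:
        return "bad input"
    else:
        # Runs only when int(raw) succeeded. A ValueError raised here
        # is NOT handled by the `except ValueError` above.
        if value < 0:
            raise ValueError("negative values are not allowed")
        return value

print(convert_then_check("50"))
print(convert_then_check("abc"))

try:
    convert_then_check("-5")
except ValueError as e:
    print("Propagated from else:", e)
```

This keeps the `try` block small, so the handler cannot accidentally mask an error from the follow-up logic.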


## 7) `finally` block (runs no matter what)

`finally` executes whether:
- try succeeded
- try failed
- exception was handled or not

✅ Use `finally` for cleanup: closing files, releasing connections, writing final log lines.

In [None]:
try:
    print("Try block running...")
    x = 1 / 0
except ZeroDivisionError as e:
    print("Except caught:", e)
finally:
    print("Finally: I always run!")
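
The "handled or not" case can be sketched as follows: the function has no `except` clause at all, yet its cleanup still runs before the error reaches the caller. The connection here is simulated with a list of event strings, and `load_with_cleanup` is a hypothetical name.

```python
events = []

def load_with_cleanup(fail):
    events.append("open connection")
    try:
        if fail:
            raise RuntimeError("load failed")
        events.append("load ok")
    finally:
        # No except clause: the error is not handled in this function,
        # but the cleanup runs before it propagates.
        events.append("close connection")

load_with_cleanup(fail=False)

try:
    load_with_cleanup(fail=True)
except RuntimeError as e:
    events.append(f"caller saw: {e}")

for event in events:
    print(event)
```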


## 8) `return` + `finally` caveat (important interview concept)

Even if the function hits `return`, the `finally` block still runs before control goes back to the caller.


In [None]:
def my_function(px):
    try:
        if px % 2 == 0:
            return 1
        return 0
    except Exception as e:
        return e
    finally:
        print("I will always run! (finally)")

print("Return:", my_function(10))
print("Return:", my_function(11))
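
A related detail: the return value is computed before `finally` runs, so rebinding a local variable inside `finally` does not change what the caller receives. A minimal sketch, with `demo` and `order` as illustrative names:

```python
order = []

def demo():
    x = 1
    try:
        order.append("try: about to return")
        return x                      # the value 1 is saved here
    finally:
        order.append("finally: cleanup")
        x = 2                         # rebinding x does not alter the saved value

value = demo()
order.append("caller: got value")

print(value)
print(order)
```

Note that a `return` statement placed inside `finally` itself behaves differently: it replaces the value from `try`, which is why it is usually avoided.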


## 9) Raising exceptions for custom validation (`raise`)

Sometimes the code runs without error, but **business logic** says the input should be rejected.

Example: value must be >= 100.

In [None]:
x = 99
try:
    if x < 100:
        raise ValueError("Value less than 100 is not allowed")
    print("Valid value")
except ValueError as e:
    print("Validation failed:", e)
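
In a pipeline, the rule usually lives in a function, and a handler may want to log the failure and then pass it on. The sketch below shows one way to do that with a bare `raise`, which re-raises the exception currently being handled. `require_min` and `checked_load` are hypothetical helpers.

```python
def require_min(value, minimum=100):
    """Raise ValueError when `value` violates the business rule."""
    if value < minimum:
        raise ValueError(f"Value {value} is less than the allowed minimum {minimum}")
    return value

def checked_load(value, log):
    try:
        return require_min(value)
    except ValueError as e:
        log.append(f"validation failed: {e}")
        raise   # re-raise the same exception, keeping its traceback

log = []
print(checked_load(150, log))

try:
    checked_load(99, log)
except ValueError as e:
    print("Caller received:", e)

print(log)
```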


## 10) Custom Exception Classes (cleaner for pipelines)

Custom exceptions help you distinguish:
- business validation errors
- data quality failures
- external system errors


In [None]:
class DataQualityError(Exception):
    """Raised when data fails validation rules."""

def validate_age(age):
    if age is None:
        raise DataQualityError("Age is missing")
    if age < 0:
        raise DataQualityError("Age cannot be negative")
    return True

for a in [25, None, -1]:
    try:
        validate_age(a)
        print(a, "-> ok")
    except DataQualityError as e:
        print(a, "-> data quality issue:", e)
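
One way to make the three categories above distinguishable is a small exception hierarchy. This is a sketch under assumptions: the base class `PipelineError`, the subclasses `BusinessRuleError` and `ExternalSystemError`, the `column` attribute, and the status strings are all invented for illustration, and `DataQualityError` is redefined here to sit under the shared base class.

```python
class PipelineError(Exception):
    """Hypothetical base class for every error this pipeline raises."""

class BusinessRuleError(PipelineError):
    """A business validation rule was violated."""

class DataQualityError(PipelineError):
    """Data failed a quality check."""
    def __init__(self, message, column=None):
        super().__init__(message)
        self.column = column   # extra context for logs

class ExternalSystemError(PipelineError):
    """A dependency (database, API, storage) failed."""

def run_step(step):
    """Run a step and map its outcome to a short status string."""
    try:
        step()
        return "ok"
    except DataQualityError as e:
        return f"quarantine rows (column={e.column}): {e}"
    except ExternalSystemError as e:
        return f"retry later: {e}"
    except PipelineError as e:     # catches the remaining pipeline errors
        return f"pipeline error: {e}"

def bad_age():
    raise DataQualityError("Age cannot be negative", column="age")

def db_down():
    raise ExternalSystemError("warehouse unreachable")

def over_limit():
    raise BusinessRuleError("order exceeds credit limit")

for step in (bad_age, db_down, over_limit, lambda: None):
    print(run_step(step))
```

A shared base class lets callers catch every pipeline-specific error in one clause while still reacting differently to each category.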


# ✅ Real-world Data Engineering Pattern

## 11) Process many tables/files without crashing

We simulate processing tables where some fail.

Goal:
- try to load/process each table
- if one fails, log error with table name
- continue with the next table


In [None]:
tables = ["orders", "products", "customers", "bad_table", "logs"]

def process_table(table_name):
    # Simulate failures
    if table_name == "bad_table":
        raise RuntimeError("Schema mismatch: missing required column")
    if table_name == "logs":
        raise PermissionError("Permission denied for logs")
    return f"{table_name} processed successfully"

success = []
failures = []

for t in tables:
    try:
        msg = process_table(t)
        print("✅", msg)
        success.append(t)
    except Exception as e:
        # Dynamic logging using f-strings
        print(f"❌ Error processing table '{t}': {e}")
        failures.append((t, str(e)))

print("\nSummary")
print("Success:", success)
print("Failures:", failures)
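
In a real job, `print` is typically replaced by the standard `logging` module. The sketch below uses a trimmed copy of `process_table` and writes log records to an in-memory buffer so it is self-contained. The logger name `pipeline_demo` and the shape of the failure records are assumptions, not a fixed convention.

```python
import io
import logging

# Send log records to an in-memory buffer to keep the example self-contained;
# a real job would more likely log to a file or a central service.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("pipeline_demo")   # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

def process_table(table_name):
    # Simulate one failing table
    if table_name == "bad_table":
        raise RuntimeError("Schema mismatch: missing required column")
    return f"{table_name} processed successfully"

failures = []
for t in ["orders", "bad_table", "customers"]:
    try:
        logger.info(process_table(t))
    except Exception as e:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Error processing table '%s'", t)
        failures.append(
            {"table": t, "error_type": type(e).__name__, "message": str(e)}
        )

print(buffer.getvalue())
print(failures)
```

Storing the error type next to the message makes the summary easier to filter later, for example to separate permission problems from schema problems.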


## 12) Extra: capture full stack trace using `traceback`

Printing `e` gives only the message, not where the error occurred.

For debugging production failures, you usually need the stack trace as well.


In [None]:
import traceback

def risky():
    return 1 / 0

try:
    risky()
except Exception as e:
    print("Error:", e)
    print("\nFull traceback:")
    print(traceback.format_exc())
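
When failures are collected for a later report rather than printed on the spot, the traceback text can be stored alongside the message. A minimal sketch, assuming a hypothetical helper `capture_failure` and a made-up record layout:

```python
import traceback

def risky():
    return 1 / 0

def capture_failure(func):
    """Run func; return None on success or a dict describing the failure."""
    try:
        func()
        return None
    except Exception as e:
        return {
            "error_type": type(e).__name__,
            "message": str(e),
            # The three-argument form works on older and newer Python 3 versions
            "traceback": "".join(
                traceback.format_exception(type(e), e, e.__traceback__)
            ),
        }

record = capture_failure(risky)
print(record["error_type"], "-", record["message"])
print(record["traceback"])
```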


## 13) Extra: simple retry pattern

In pipelines, you might retry transient issues:
- network timeouts
- temporary service errors

Below is a simple retry wrapper.

In [None]:
import time

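def retry(func, retries=3, delay_s=0.2, allowed_exceptions=(Exception,)):
    # Guard: with retries < 1 the loop never runs and `raise last_err` would raise None
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_err = None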
    for attempt in range(1, retries + 1):
        try:
            return func()
        except allowed_exceptions as e:
            last_err = e
            print(f"Attempt {attempt}/{retries} failed: {e}")
            if attempt < retries:
                time.sleep(delay_s)
    # After retries exhausted
    raise last_err

# Simulated flaky function
state = {"calls": 0}
def flaky_job():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("Temporary timeout")
    return "Success on attempt 3"

print(retry(flaky_job, retries=5, allowed_exceptions=(TimeoutError,)))
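
A common variation is to wait longer after each failed attempt (exponential backoff). The sketch below is one possible shape, not a definitive implementation: `retry_with_backoff`, its `factor` parameter, and the injectable `sleep` argument are assumptions made for this example. Passing a fake `sleep` lets the demo run instantly and record the delays.

```python
import time

def retry_with_backoff(func, retries=3, base_delay_s=0.1, factor=2.0,
                       allowed_exceptions=(Exception,), sleep=time.sleep):
    """Call func until it succeeds, waiting longer after each failure."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    delay = base_delay_s
    for attempt in range(1, retries + 1):
        try:
            return func()
        except allowed_exceptions:
            if attempt == retries:
                raise            # out of attempts: re-raise the last error
            sleep(delay)
            delay *= factor      # each wait is `factor` times the previous one

# Simulated flaky function plus a fake sleep that records the waits
calls = {"n": 0}
waits = []

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("Temporary timeout")
    return "ok"

result = retry_with_backoff(
    flaky,
    retries=5,
    base_delay_s=0.1,
    allowed_exceptions=(TimeoutError,),
    sleep=waits.append,   # record delays instead of actually sleeping
)
print(result, waits)
```

Restricting `allowed_exceptions` to transient errors matters here: retrying a schema mismatch or a validation failure only delays the inevitable.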


## 14) Best Practices (quick checklist)

✅ Prefer:
- Catch **specific exceptions** when possible
- Log enough context (file/table name, key ids)
- Use `finally` for cleanup
- Use `raise` for business validation
- Use `traceback` for full debug logs

⚠️ Avoid:
- Bare `except:` that swallows errors silently (it hides bugs)
- Catching broad exceptions without logging
- Ignoring failures without tracking them (collect them in a failures list)


---
## ✅ Final Recap

- `try/except` keeps one failure from crashing the whole script.
- `Exception as e` captures the technical message.
- `else` runs only when no exception happens.
- `finally` always runs (even after `return`).
- `raise` is used for custom business rules.
- In data engineering, loop + try/except enables resilient multi-table/file processing.
