# 🛠️ Python Standard Library Essentials: Power Tools for Everyday Tasks

**Welcome!** This notebook is your guide to some of the most powerful and frequently used modules within Python's extensive Standard Library: `datetime`, `math`, `re` (Regular Expressions), and `functools`. Mastering these modules unlocks significant capabilities for handling time, calculations, text patterns, and function manipulation, forming the bedrock of many Python applications.

**Target Audience:** Beginner to Intermediate Python developers looking to leverage the built-in capabilities of Python for common programming tasks.

**Learning Objectives:**
*   Manipulate dates, times, durations, and timezones effectively using `datetime`.
*   Perform advanced mathematical calculations with the `math` module.
*   Master the art of pattern matching in strings using regular expressions (`re`).
*   Enhance function capabilities with higher-order functions and decorators using `functools`.
*   Understand best practices, performance considerations, and common pitfalls for each module.
*   Apply these modules to solve practical problems.

## 1. `datetime`: Working with Dates and Times

**Introduction:** Handling dates and times is fundamental in many applications, from logging events and scheduling tasks to calculating durations and analyzing time-series data. Python's `datetime` module provides a comprehensive set of classes for these tasks.

**Real-world Use Cases:**
*   **Logging:** Timestamping events accurately.
*   **Web Applications:** Handling user session timeouts, displaying post dates, calculating durations.
*   **Data Analysis:** Parsing date strings, calculating time differences, working with time series.
*   **Scheduling:** Determining when tasks should run (e.g., cron jobs).
*   **Finance:** Calculating interest periods, expiration dates.

**Analogy: The Timekeeper's Toolkit**
Think of the `datetime` module as a sophisticated toolkit used by a master timekeeper. It contains:
*   A precise **Calendar** (`date`): To represent specific days.
*   An accurate **Clock** (`time`): To represent time within a day (independent of date).
*   A combined **Chronometer** (`datetime`): To pinpoint an exact moment in time (date and time).
*   A flexible **Stopwatch/Timer** (`timedelta`): To measure durations or calculate future/past points.
*   A **World Atlas of Time Zones** (`timezone`, `tzinfo`, often used with `zoneinfo` or `pytz`): To handle different time zones correctly.

### 1.1 Core Classes

*   `datetime.date`: Represents a date (year, month, day).
*   `datetime.time`: Represents a time (hour, minute, second, microsecond), independent of day. Can be *naive* or *aware* of time zones.
*   `datetime.datetime`: Represents a specific point in time (combines date and time). Can be *naive* or *aware*.
*   `datetime.timedelta`: Represents a duration or difference between two dates or times.
*   `datetime.timezone`: Represents a timezone offset from UTC (e.g., `timezone(timedelta(hours=-5))`). Introduced in Python 3.2.
*   `datetime.tzinfo`: Abstract base class for timezone information. You typically use concrete implementations like those from `zoneinfo` or `pytz`.

### 1.2 Getting Current Date and Time

**Important Distinction:**
*   **Naive:** Objects don't have enough information to determine their offset from UTC or handle DST unambiguously.
*   **Aware:** Objects *do* contain timezone information.
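
The distinction can be checked in code. The sketch below follows the rule given in the `datetime` documentation: an object is aware when `tzinfo` is set and `utcoffset()` returns a value. The helper name `is_aware` is an illustrative choice, not part of the standard library.

```python
from datetime import datetime, timezone

def is_aware(dt: datetime) -> bool:
    """Return True if dt carries a usable UTC offset (hypothetical helper)."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None

naive = datetime(2024, 7, 15, 14, 30)
aware = datetime(2024, 7, 15, 14, 30, tzinfo=timezone.utc)

print(is_aware(naive))  # False
print(is_aware(aware))  # True

# Ordering comparisons between naive and aware objects raise TypeError.
try:
    naive < aware
except TypeError as exc:
    print(f"TypeError: {exc}")
```

A check like this is most useful at the boundaries of a program (parsing input, reading from storage), where naive values tend to slip in.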

In [1]:
from datetime import date, time, datetime, timedelta, timezone
import time as time_module # To avoid name collision

# --- Current Date --- 
today = date.today()
print(f"Today's date: {today}")
print(f"Year: {today.year}, Month: {today.month}, Day: {today.day}")
print(f"Weekday (0=Mon, 6=Sun): {today.weekday()}")
print(f"ISO Weekday (1=Mon, 7=Sun): {today.isoweekday()}")

# --- Current Date and Time --- 
# datetime.now() - Local time (can be naive or aware depending on tz arg)
now_local_naive = datetime.now() # Naive local time by default
print(f"\nNow (local, naive): {now_local_naive}")

# datetime.utcnow() - UTC time, BUT **NAIVE** (legacy, deprecated since Python 3.12; avoid for new code)
now_utc_naive_legacy = datetime.utcnow()
print(f"Now (UTC, naive - Legacy): {now_utc_naive_legacy}") 

# **Modern Best Practice: Get Aware UTC Time**
now_utc_aware = datetime.now(timezone.utc)
print(f"Now (UTC, aware - Recommended): {now_utc_aware}") 
print(f"Is aware? {now_utc_aware.tzinfo is not None}")

# --- Unix Timestamp --- 
# Seconds since the Epoch (Jan 1, 1970 UTC)
timestamp = time_module.time()
print(f"\nCurrent Unix Timestamp: {timestamp}")

# Convert timestamp to aware datetime object
dt_from_timestamp = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(f"Datetime from timestamp (UTC): {dt_from_timestamp}")

# Convert aware datetime back to timestamp
timestamp_from_dt = now_utc_aware.timestamp()
print(f"Timestamp from aware datetime: {timestamp_from_dt}")

Today's date: 2025-04-20
Year: 2025, Month: 4, Day: 20
Weekday (0=Mon, 6=Sun): 6
ISO Weekday (1=Mon, 7=Sun): 7

Now (local, naive): 2025-04-20 16:11:53.066694
Now (UTC, naive - Legacy): 2025-04-20 10:41:53.066901
Now (UTC, aware - Recommended): 2025-04-20 10:41:53.067024+00:00
Is aware? True

Current Unix Timestamp: 1745145713.0672643
Datetime from timestamp (UTC): 2025-04-20 10:41:53.067264+00:00
Timestamp from aware datetime: 1745145713.067024




### 1.3 Creating Specific Dates and Times

In [2]:
from datetime import date, time, datetime

d1 = date(2024, 7, 15)
print(f"Specific date: {d1}")

t1 = time(14, 30, 5) # 2:30:05 PM (naive)
print(f"Specific time (naive): {t1}")

dt1 = datetime(2024, 7, 15, 14, 30, 5, 123456) # Naive datetime
print(f"Specific datetime (naive): {dt1}")

# Combine date and time
dt2 = datetime.combine(d1, t1)
print(f"Combined datetime (naive): {dt2}")

Specific date: 2024-07-15
Specific time (naive): 14:30:05
Specific datetime (naive): 2024-07-15 14:30:05.123456
Combined datetime (naive): 2024-07-15 14:30:05


### 1.4 Time Deltas (`timedelta`)

Used for representing durations and performing date/time arithmetic.

In [3]:
from datetime import datetime, timedelta

delta1 = timedelta(days=5, hours=3, minutes=15, seconds=10)
print(f"Time delta: {delta1}")
print(f"Total seconds in delta: {delta1.total_seconds()}")

now = datetime.now() # Naive local
print(f"\nCurrent time: {now}")

# Add timedelta
future_time = now + delta1
print(f"Future time: {future_time}")

# Subtract timedelta
past_time = now - timedelta(weeks=2)
print(f"Past time: {past_time}")

# Calculate difference between two datetimes
dt_start = datetime(2023, 1, 1, 10, 0, 0)
dt_end = datetime(2023, 1, 5, 14, 30, 0)
duration = dt_end - dt_start
print(f"\nDuration between {dt_start} and {dt_end}: {duration}")
print(f"Duration in days: {duration.days}")
print(f"Duration total seconds: {duration.total_seconds()}")

Time delta: 5 days, 3:15:10
Total seconds in delta: 443710.0

Current time: 2025-04-20 16:11:53.092456
Future time: 2025-04-25 19:27:03.092456
Past time: 2025-04-06 16:11:53.092456

Duration between 2023-01-01 10:00:00 and 2023-01-05 14:30:00: 4 days, 4:30:00
Duration in days: 4
Duration total seconds: 361800.0
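
One detail worth spelling out: `timedelta.seconds` holds only the sub-day remainder, while `total_seconds()` covers the whole duration. The sketch below breaks a duration into parts with `divmod`; the function name `split_duration` is hypothetical and it assumes a non-negative duration.

```python
from datetime import timedelta

def split_duration(delta: timedelta) -> tuple[int, int, int, int]:
    """Break a non-negative timedelta into (days, hours, minutes, seconds)."""
    total_seconds = int(delta.total_seconds())
    days, remainder = divmod(total_seconds, 86_400)
    hours, remainder = divmod(remainder, 3_600)
    minutes, seconds = divmod(remainder, 60)
    return days, hours, minutes, seconds

duration = timedelta(days=4, hours=4, minutes=30)
print(split_duration(duration))  # (4, 4, 30, 0)
print(duration.days)             # 4
print(duration.seconds)          # 16200 (the sub-day remainder only)
print(duration.total_seconds())  # 361800.0
```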


### 1.5 Formatting (`strftime`) and Parsing (`strptime`)

*   `strftime` (String **Format** Time): Converts a `datetime` object to a formatted string.
*   `strptime` (String **Parse** Time): Converts a string into a `datetime` object based on a format.

Common Format Codes (Full list in Python docs):
*   `%Y`: Year with century (e.g., 2023)
*   `%y`: Year without century (e.g., 23)
*   `%m`: Month as zero-padded decimal (01, ..., 12)
*   `%d`: Day of the month as zero-padded decimal (01, ..., 31)
*   `%H`: Hour (24-hour clock) as zero-padded decimal (00, ..., 23)
*   `%I`: Hour (12-hour clock) as zero-padded decimal (01, ..., 12)
*   `%M`: Minute as zero-padded decimal (00, ..., 59)
*   `%S`: Second as zero-padded decimal (00, ..., 59)
*   `%f`: Microsecond as decimal number (000000-999999)
*   `%a`: Abbreviated weekday name (Sun, Mon, ...)
*   `%A`: Full weekday name (Sunday, Monday, ...)
*   `%b`: Abbreviated month name (Jan, Feb, ...)
*   `%B`: Full month name (January, February, ...)
*   `%Z`: Time zone name (empty string if the object is naive).
*   `%z`: UTC offset in the form ±HHMM[SS[.ffffff]] (e.g., +0000, -0400).

In [4]:
from datetime import datetime

now = datetime.now()

# --- Formatting (datetime to string) --- 
formatted_str1 = now.strftime("%Y-%m-%d %H:%M:%S")
print(f"Formatted (YYYY-MM-DD HH:MM:SS): {formatted_str1}")

formatted_str2 = now.strftime("%A, %B %d, %Y - %I:%M %p") # More verbose
print(f"Formatted (Verbose): {formatted_str2}")

# --- Parsing (string to datetime) --- 
date_string1 = "2023-10-26 15:45:30"
format_code1 = "%Y-%m-%d %H:%M:%S"
try:
    parsed_dt1 = datetime.strptime(date_string1, format_code1)
    print(f"\nParsed '{date_string1}' using '{format_code1}': {parsed_dt1}")
    print(f"Parsed object type: {type(parsed_dt1)}")
except ValueError as e:
    print(f"Error parsing '{date_string1}': {e}")

date_string2 = "Thu, 26 Oct 2023"
format_code2 = "%a, %d %b %Y"
try:
    parsed_dt2 = datetime.strptime(date_string2, format_code2)
    # Note: time part defaults to 00:00:00 when not in format string
    print(f"Parsed '{date_string2}' using '{format_code2}': {parsed_dt2}") 
except ValueError as e:
    print(f"Error parsing '{date_string2}': {e}")

# Pitfall: a string that does not match the format exactly raises ValueError.
# An ambiguous format (e.g. day/month order) does NOT raise; it silently parses the wrong date.

Formatted (YYYY-MM-DD HH:MM:SS): 2025-04-20 16:11:53
Formatted (Verbose): Sunday, April 20, 2025 - 04:11 PM

Parsed '2023-10-26 15:45:30' using '%Y-%m-%d %H:%M:%S': 2023-10-26 15:45:30
Parsed object type: <class 'datetime.datetime'>
Parsed 'Thu, 26 Oct 2023' using '%a, %d %b %Y': 2023-10-26 00:00:00
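
When input may arrive in more than one layout, a common approach is to try a short list of formats in order. This is a minimal sketch: the `KNOWN_FORMATS` tuple and the `parse_date` helper are assumptions made for illustration, and `%B` relies on the default (English) locale.

```python
from datetime import date, datetime

# Hypothetical list of formats an application might accept, tried in order.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")

def parse_date(text: str) -> date:
    """Return a date parsed with the first matching format, else raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            # strptime returns a datetime; .date() drops the 00:00:00 time part.
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

print(parse_date("2023-10-26"))        # 2023-10-26
print(parse_date("26/10/2023"))        # 2023-10-26
print(parse_date("October 26, 2023"))  # 2023-10-26
```

The order of the formats matters: put the least ambiguous layout first so that a string such as `01/02/2023` is never interpreted by accident.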


### 1.6 ISO 8601 Format (Modern Standard)

ISO 8601 is the international standard format for representing dates and times (e.g., `2023-10-26T15:50:00.123456+00:00`). Python has built-in support:
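*   `datetime.isoformat()`: Converts a `datetime` object to an ISO 8601 string.
*   `datetime.fromisoformat(date_string)`: Parses an ISO 8601 string back into a `datetime` object, including any UTC offset. Before Python 3.11 it accepts only the formats that `isoformat()` emits (for example, no trailing `Z`); from 3.11 on it accepts most ISO 8601 formats.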

**Best Practice:** Use ISO 8601 for data interchange (APIs, file storage) whenever possible. It's unambiguous and machine-friendly.

In [5]:
from datetime import datetime, timezone, timedelta

now_utc_aware = datetime.now(timezone.utc)
now_offset_aware = datetime.now(timezone(timedelta(hours=-5))) # Example timezone

# --- Formatting to ISO 8601 --- 
iso_utc = now_utc_aware.isoformat()
iso_offset = now_offset_aware.isoformat()
print(f"ISO 8601 (UTC): {iso_utc}")
print(f"ISO 8601 (Offset): {iso_offset}")

# --- Parsing from ISO 8601 --- 
iso_string_utc = "2023-10-26T10:30:00+00:00"
iso_string_offset = "2023-10-26T05:30:00-05:00"
iso_string_naive = "2023-10-26T12:00:00" # Naive (no tz info)

parsed_iso_utc = datetime.fromisoformat(iso_string_utc)
parsed_iso_offset = datetime.fromisoformat(iso_string_offset)
parsed_iso_naive = datetime.fromisoformat(iso_string_naive)

print(f"\nParsed ISO (UTC): {parsed_iso_utc}, Timezone: {parsed_iso_utc.tzinfo}")
print(f"Parsed ISO (Offset): {parsed_iso_offset}, Timezone: {parsed_iso_offset.tzinfo}")
print(f"Parsed ISO (Naive): {parsed_iso_naive}, Timezone: {parsed_iso_naive.tzinfo}")

# Verify they represent the same point in time (if aware)
print(f"Do UTC and Offset strings represent the same time? {parsed_iso_utc == parsed_iso_offset}")

ISO 8601 (UTC): 2025-04-20T10:41:53.115766+00:00
ISO 8601 (Offset): 2025-04-20T05:41:53.115816-05:00

Parsed ISO (UTC): 2023-10-26 10:30:00+00:00, Timezone: UTC
Parsed ISO (Offset): 2023-10-26 05:30:00-05:00, Timezone: UTC-05:00
Parsed ISO (Naive): 2023-10-26 12:00:00, Timezone: None
Do UTC and Offset strings represent the same time? True
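
A typical interchange pattern is a round trip: serialize with `isoformat()`, parse with `fromisoformat()`. The sketch below also shows one way to accept a trailing `Z` on older Python versions; the variable names are illustrative.

```python
from datetime import datetime, timezone

original = datetime(2023, 10, 26, 10, 30, tzinfo=timezone.utc)

# Serialize, then parse back: the round trip preserves the instant and the offset.
wire_value = original.isoformat()
restored = datetime.fromisoformat(wire_value)
print(wire_value)            # 2023-10-26T10:30:00+00:00
print(restored == original)  # True

# Many APIs emit a trailing "Z" for UTC. Replacing it with "+00:00" keeps the
# string parseable on Python versions before 3.11.
api_value = "2023-10-26T10:30:00Z"
parsed = datetime.fromisoformat(api_value.replace("Z", "+00:00"))
print(parsed == original)    # True
```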


### 1.7 Time Zones (`zoneinfo` - Python 3.9+)

Handling time zones correctly, especially with Daylight Saving Time (DST), is complex.
*   **Legacy:** `pytz` was the standard third-party library.
*   **Modern (Python 3.9+):** The `zoneinfo` module provides access to the IANA Time Zone Database. It uses the system's copy where one exists and otherwise the first-party `tzdata` package from PyPI.

**Best Practice:** Use `zoneinfo` if your Python version supports it. Always work with **aware** datetime objects, preferably storing and calculating internally in **UTC** and converting to local time zones only for display.

In [6]:
from datetime import datetime, timedelta

# zoneinfo requires Python 3.9+. It reads the system time zone database and
# falls back to the first-party tzdata package when no system data exists
# (typically on Windows): pip install tzdata
# A missing module raises ImportError here; missing zone data raises
# zoneinfo.ZoneInfoNotFoundError later, when ZoneInfo("...") is called.
try:
    from zoneinfo import ZoneInfo
    ZONEINFO_AVAILABLE = True
except ImportError:
    ZONEINFO_AVAILABLE = False
    print("zoneinfo module not available (requires Python 3.9+). Skipping timezone examples.")

if ZONEINFO_AVAILABLE:
    # --- Create aware datetime objects in specific time zones ---
    dt_naive = datetime(2023, 11, 1, 10, 0, 0) # A naive datetime
    
    tz_london = ZoneInfo("Europe/London")
    tz_newyork = ZoneInfo("America/New_York")
    tz_tokyo = ZoneInfo("Asia/Tokyo")
    
    # Make a naive datetime aware by assuming it *is* in a specific timezone
    dt_london = dt_naive.replace(tzinfo=tz_london)
    print(f"Naive datetime assumed to be in London: {dt_london}")
    
    # Get current time in a specific timezone
    now_newyork = datetime.now(tz_newyork)
    print(f"Current time in New York: {now_newyork}")

    # --- Convert between time zones --- 
    # Start with an aware object (best practice: start from UTC)
    utc_now = datetime.now(ZoneInfo("UTC"))
    print(f"\nCurrent time (UTC): {utc_now}")
    
    london_time = utc_now.astimezone(tz_london)
    newyork_time = utc_now.astimezone(tz_newyork)
    tokyo_time = utc_now.astimezone(tz_tokyo)
    
    print(f"Converted to London: {london_time}")
    print(f"Converted to New York: {newyork_time}")
    print(f"Converted to Tokyo: {tokyo_time}")
    
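    # --- DST Handling --- 
    # zoneinfo picks the correct UTC offset for each wall-clock time, but
    # adding a timedelta to an aware datetime is wall-clock arithmetic:
    # the result is NOT adjusted for a DST transition in between.
    # Example: DST starts in New York on 2024-03-10 at 2:00 AM
    dst_start_naive = datetime(2024, 3, 10, 1, 59, 59)
    dst_start_aware = dst_start_naive.replace(tzinfo=tz_newyork)
    dst_after = dst_start_aware + timedelta(seconds=1)
    print(f"\nTime just before NY DST start: {dst_start_aware}")
    print(f"Time just after NY DST start:  {dst_after}") # 02:00:00-05:00 does not exist on the wall clock; add in UTC to land on 03:00:00-04:00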

    # Ambiguous/non-existent times during DST changes require careful handling
    # naive_dt_in_gap = datetime(2024, 3, 10, 2, 30, 0)
    # aware_dt = naive_dt_in_gap.replace(tzinfo=tz_newyork) # zoneinfo does not raise here (pytz's localize(..., is_dst=None) would)
    # print(aware_dt.astimezone(ZoneInfo("UTC")).astimezone(tz_newyork)) # A UTC round trip normalizes it to 03:30:00-04:00
    
    # Use fold=1 during DST fall-back to specify the second occurrence of an hour
    # naive_dt_ambiguous = datetime(2024, 11, 3, 1, 30, 0)
    # aware_dt_first = naive_dt_ambiguous.replace(tzinfo=tz_newyork, fold=0)
    # aware_dt_second = naive_dt_ambiguous.replace(tzinfo=tz_newyork, fold=1)
    # print(f"Ambiguous time (first hour): {aware_dt_first}")
    # print(f"Ambiguous time (second hour): {aware_dt_second}")

Naive datetime assumed to be in London: 2023-11-01 10:00:00+00:00
Current time in New York: 2025-04-20 06:41:53.134106-04:00

Current time (UTC): 2025-04-20 10:41:53.134545+00:00
Converted to London: 2025-04-20 11:41:53.134545+01:00
Converted to New York: 2025-04-20 06:41:53.134545-04:00
Converted to Tokyo: 2025-04-20 19:41:53.134545+09:00

Time just before NY DST start: 2024-03-10 01:59:59-05:00
Time just after NY DST start:  2024-03-10 02:00:00-05:00
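
The output above shows `02:00:00-05:00`, a wall-clock time that New York skipped that night. Adding a `timedelta` to an aware datetime moves the wall clock without re-checking the offset. When real elapsed time is what matters, one option is to do the arithmetic in UTC and convert back. The helper name `add_elapsed` is hypothetical, and the sketch assumes time zone data is available to `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def add_elapsed(dt: datetime, delta: timedelta) -> datetime:
    """Add real elapsed time to an aware datetime by doing the arithmetic in UTC."""
    if dt.tzinfo is None:
        raise ValueError("add_elapsed expects an aware datetime")
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)

tz_newyork = ZoneInfo("America/New_York")
before = datetime(2024, 3, 10, 1, 59, 59, tzinfo=tz_newyork)

wall_clock = before + timedelta(seconds=1)           # wall-clock arithmetic
elapsed = add_elapsed(before, timedelta(seconds=1))  # elapsed-time arithmetic

print(wall_clock)  # 2024-03-10 02:00:00-05:00
print(elapsed)     # 2024-03-10 03:00:00-04:00
```

This is the practical reason behind the advice to calculate in UTC and convert to local time only for display.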


### 1.8 `datetime` Best Practices & Pitfalls

**Best Practices:**
*   **Be Timezone Aware:** Store and perform calculations using timezone-aware `datetime` objects, preferably in UTC.
*   **Use `zoneinfo` (Python 3.9+):** Prefer `zoneinfo` over `pytz` for modern timezone handling.
*   **Use ISO 8601:** Use `isoformat()` and `fromisoformat()` for reliable data exchange.
*   **Explicit is Better:** Be explicit about whether a `datetime` is naive or aware, and which timezone it represents.
*   **Careful Arithmetic:** Understand that adding `timedelta` to a naive `datetime` yields a naive `datetime`.

**Pitfalls:**
*   **Naive vs. Aware Confusion:** Mixing naive and aware objects in comparisons or calculations often leads to `TypeError` or incorrect results.
*   **DST Ambiguity:** Incorrectly handling the hour that repeats or is skipped during DST transitions.
*   **`utcnow()` is Naive:** Using the legacy `utcnow()` thinking it's aware.
*   **Parsing Ambiguity:** Relying on `strptime` with potentially ambiguous formats without strict validation.
*   **Leap Seconds:** Standard `datetime` does not handle leap seconds (most applications don't need to, but be aware if you work in specialized fields).

### 1.9 `datetime` Interview Questions

1.  What's the difference between a naive and an aware datetime object?
2.  How do you get the current time in UTC as an aware object?
3.  How do you convert a string like "2023-10-26" into a date object?
4.  How do you format a datetime object into a string like "October 26, 2023"?
5.  What is a `timedelta` object used for? Give an example.
6.  How would you represent a specific timezone like "Europe/Berlin"?
7.  How do you convert a datetime object from one timezone to another?
8.  What is ISO 8601 format and why is it useful?

## 2. `math`: Mathematical Functions

**Introduction:** Provides access to standard mathematical functions beyond the basic arithmetic operators.

**Real-world Use Cases:**
*   **Scientific & Engineering:** Trigonometry, logarithms, exponentiation.
*   **Graphics:** Calculating distances, angles, transformations.
*   **Statistics:** Square roots, factorials (though `statistics` module is more specialized).
*   **Financial Modeling:** Exponential growth, powers.

**Analogy: The Scientific Calculator**
The `math` module is like Python's built-in scientific calculator, offering functions found on such devices (sin, cos, log, sqrt, etc.) and important mathematical constants.

### 2.1 Common Functions and Constants

In [7]:
import math

# --- Constants --- 
print(f"Pi (π): {math.pi}")
print(f"Euler's number (e): {math.e}")
print(f"Infinity (float): {math.inf}")
print(f"Not a Number (float): {math.nan}")

# --- Number-theoretic and representation functions --- 
print(f"\n--- Rounding & Representation ---")
x = 4.7
y = -4.2
print(f"Ceiling of {x}: {math.ceil(x)}") # Smallest integer >= x
print(f"Floor of {x}: {math.floor(x)}") # Largest integer <= x
print(f"Ceiling of {y}: {math.ceil(y)}")
print(f"Floor of {y}: {math.floor(y)}")
print(f"Absolute value of {y}: {math.fabs(y)}")
print(f"Truncated value of {x}: {math.trunc(x)}") # Remove fractional part
print(f"Factorial of 5: {math.factorial(5)}") # 5*4*3*2*1
print(f"Is 10 close to 10.0000001? {math.isclose(10, 10.0000001)}")
print(f"Is 10 close to 10.001? {math.isclose(10, 10.001)}")
print(f"Is 10 close to 10.001 (tighter tolerance)? {math.isclose(10, 10.001, abs_tol=0.0001)}") # abs_tol is an extra allowance, not a tighter bound; the gap 0.001 exceeds 0.0001, so still False
print(f"Is {math.inf} finite? {math.isfinite(math.inf)}")
print(f"Is {math.nan} NaN? {math.isnan(math.nan)}")
print(f"Greatest Common Divisor (GCD) of 48 and 180: {math.gcd(48, 180)}")
# lcm available in Python 3.9+
if hasattr(math, 'lcm'):
    print(f"Least Common Multiple (LCM) of 12 and 18: {math.lcm(12, 18)}")

# --- Power and logarithmic functions --- 
print(f"\n--- Power & Logarithms ---")
print(f"Square root of 16: {math.sqrt(16)}")
print(f"2 to the power of 10: {math.pow(2, 10)}")
print(f"e to the power of 3: {math.exp(3)}")
print(f"Natural logarithm (base e) of 100: {math.log(100)}")
print(f"Log base 10 of 1000: {math.log10(1000)}")
print(f"Log base 2 of 1024: {math.log2(1024)}")

# --- Trigonometric functions --- 
# Angles are typically in radians
print(f"\n--- Trigonometry ---")
angle_rad = math.pi / 4 # 45 degrees in radians
print(f"Sine of pi/4 radians: {math.sin(angle_rad)}")
print(f"Cosine of pi/4 radians: {math.cos(angle_rad)}")
print(f"Tangent of pi/4 radians: {math.tan(angle_rad)}")

angle_deg = 60
print(f"{angle_deg} degrees in radians: {math.radians(angle_deg)}")
print(f"pi/3 radians in degrees: {math.degrees(math.pi/3)}")

# --- Combinatorics (Python 3.8+) --- 
if hasattr(math, 'comb'):
    print(f"\n--- Combinatorics (Py 3.8+) ---")
    # How many ways to choose 2 items from 5 (order doesn't matter)
    print(f"Combinations (5 choose 2): {math.comb(5, 2)}") 
    # How many ways to arrange 3 items from 5 (order matters)
    print(f"Permutations (5 permute 3): {math.perm(5, 3)}")

Pi (π): 3.141592653589793
Euler's number (e): 2.718281828459045
Infinity (float): inf
Not a Number (float): nan

--- Rounding & Representation ---
Ceiling of 4.7: 5
Floor of 4.7: 4
Ceiling of -4.2: -4
Floor of -4.2: -5
Absolute value of -4.2: 4.2
Truncated value of 4.7: 4
Factorial of 5: 120
Is 10 close to 10.0000001? False
Is 10 close to 10.001? False
Is 10 close to 10.001 (tighter tolerance)? False
Is inf finite? False
Is nan NaN? True
Greatest Common Divisor (GCD) of 48 and 180: 12
Least Common Multiple (LCM) of 12 and 18: 36

--- Power & Logarithms ---
Square root of 16: 4.0
2 to the power of 10: 1024.0
e to the power of 3: 20.085536923187668
Natural logarithm (base e) of 100: 4.605170185988092
Log base 10 of 1000: 3.0
Log base 2 of 1024: 10.0

--- Trigonometry ---
Sine of pi/4 radians: 0.7071067811865475
Cosine of pi/4 radians: 0.7071067811865476
Tangent of pi/4 radians: 0.9999999999999999
60 degrees in radians: 1.0471975511965976
pi/3 radians in degrees: 59.99999999999999

--- Combinatorics (Py 3.8+) ---
Combinations (5 choose 2): 10
Permutations (5 permute 3): 60
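
For the graphics use case mentioned earlier (distances and angles), `math.hypot` and `math.atan2` cover most needs. A minimal sketch, with `distance_and_angle` as a hypothetical helper name:

```python
import math

def distance_and_angle(x1: float, y1: float, x2: float, y2: float) -> tuple[float, float]:
    """Return (distance, angle in degrees) from point 1 to point 2."""
    dx, dy = x2 - x1, y2 - y1
    distance = math.hypot(dx, dy)             # sqrt(dx**2 + dy**2)
    angle = math.degrees(math.atan2(dy, dx))  # atan2 handles all four quadrants
    return distance, angle

distance, angle = distance_and_angle(0, 0, 3, 4)
print(distance)         # 5.0
print(round(angle, 2))  # 53.13
```

`atan2(dy, dx)` is preferable to `atan(dy / dx)` because it avoids division by zero and returns the angle in the correct quadrant.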

### 2.2 `cmath` for Complex Numbers

If you need to work with complex numbers (numbers with a real and imaginary part), use the `cmath` module, which provides complex-aware versions of many `math` functions.

In [8]:
import cmath
import math # Needed for the math.sqrt() comparison below

z = 3 + 4j # Complex number
print(f"Complex number z: {z}")
print(f"Real part: {z.real}")
print(f"Imaginary part: {z.imag}")

# Note: math.sqrt() would raise ValueError for negative numbers
try:
    root_neg_one_math = math.sqrt(-1)
except ValueError as e:
    print(f"math.sqrt(-1) error: {e}")

# cmath.sqrt() handles complex results
root_neg_one_cmath = cmath.sqrt(-1)
print(f"cmath.sqrt(-1): {root_neg_one_cmath}")

# Phase (angle) and magnitude (polar coordinates)
print(f"Phase of {z}: {cmath.phase(z)} radians")
print(f"Magnitude (abs) of {z}: {abs(z)}") # abs() works for complex too

Complex number z: (3+4j)
Real part: 3.0
Imaginary part: 4.0
math.sqrt(-1) error: math domain error
cmath.sqrt(-1): 1j
Phase of (3+4j): 0.9272952180016122 radians
Magnitude (abs) of (3+4j): 5.0
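
Magnitude and phase can also be obtained together with `cmath.polar`, and `cmath.rect` converts back. A short sketch of the round trip:

```python
import cmath

z = 3 + 4j

# polar() returns (magnitude, phase in radians); rect() is the inverse.
magnitude, phase = cmath.polar(z)
print(magnitude)  # 5.0
print(phase)      # 0.9272952180016122

rebuilt = cmath.rect(magnitude, phase)
# The round trip goes through sin/cos, so compare with a tolerance.
print(cmath.isclose(rebuilt, z))  # True
```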


### 2.3 `math` Best Practices & Pitfalls

**Best Practices:**
*   **Use `math.isclose()` for Float Comparisons:** Avoid direct equality checks (`==`) with floats due to potential precision issues. Use `math.isclose()` with appropriate tolerances.
*   **Understand Radians vs. Degrees:** Trigonometric functions in `math` use radians. Convert using `math.radians()` and `math.degrees()` if needed.
*   **Check Domain:** Be aware of the valid input domain for functions (e.g., `math.sqrt()` requires non-negative input, `math.log()` requires positive input).

**Pitfalls:**
*   **Floating-Point Inaccuracy:** Standard floats have limited precision. Don't expect exact results for all calculations (e.g., `0.1 + 0.2` is not exactly `0.3`). For exact decimal arithmetic, use the `decimal` module.
*   **Domain Errors:** Passing invalid input to functions (e.g., `math.sqrt(-1)`, `math.log(0)`) raises `ValueError`.
*   **Integer Overflow (Less Common in Python):** Python integers have arbitrary precision, but operations involving massive numbers can consume significant memory/time.
*   **Confusion with NumPy:** NumPy provides its own, often faster, array-oriented mathematical functions that operate element-wise on NumPy arrays. Don't mix them up with standard `math` functions if you're not using NumPy arrays.
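
The floating-point points above can be seen in a few lines. This sketch uses only `math.isclose` and the `decimal` module; the tolerance values are illustrative.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# Near zero, the default relative tolerance is not enough; supply abs_tol.
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True

# Decimal values built from strings add exactly.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

`isclose` returns True when the difference is within either tolerance, so `abs_tol` widens the check rather than narrowing it.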

### 2.4 `math` Interview Questions

1.  How should you compare two floating-point numbers for equality in Python?
2.  What is the difference between `math.ceil()` and `math.floor()`?
3.  What units do Python's trigonometric functions (like `math.sin`) expect for angles?
4.  What happens when you call `math.sqrt(-1)`? What module should you use if you need the complex result?
5.  Name two mathematical constants available in the `math` module.

## 3. `re`: Regular Expressions

**Introduction:** Regular expressions (regex) are sequences of characters that define a search pattern. They are incredibly powerful for finding, extracting, and manipulating text based on complex patterns.

**Real-world Use Cases:**
*   **Input Validation:** Checking if strings match formats like email addresses, URLs, phone numbers, zip codes.
*   **Data Scraping:** Extracting specific information (e.g., prices, dates) from web pages or text documents.
*   **Log File Analysis:** Finding specific error messages, IP addresses, or timestamps in log files.
*   **Search and Replace:** Performing complex substitutions in text.
*   **Parsing:** Breaking down structured text into components.

**Analogy: The Expert Pattern Finder**
Imagine you have a massive library of books (text data). Regular expressions are like giving instructions to an expert librarian who can instantly find:
*   All lines starting with "To be" (`^To be.*` with the `re.MULTILINE` flag).
*   All occurrences of phone numbers in a specific format (`\d{3}-\d{3}-\d{4}`).
*   All email addresses (`[\w.-]+@[\w.-]+\.\w+`).
*   All occurrences of "color", so they can be replaced with "colour" (`re.sub('color', 'colour', text)`).

The power lies in defining precise *patterns*, not just literal strings.

### 3.1 Core Functions

*   `re.search(pattern, string, flags=0)`: Scans through `string` looking for the *first* location where `pattern` produces a match. Returns a match object if found, else `None`.
*   `re.match(pattern, string, flags=0)`: Tries to match `pattern` only at the *beginning* of the `string`. Returns a match object if found, else `None`.
*   `re.findall(pattern, string, flags=0)`: Finds *all* non-overlapping matches of `pattern` in `string` and returns them as a list. With no capturing groups the list holds the full matches; with one group it holds that group's text; with several groups it holds tuples of groups.
*   `re.finditer(pattern, string, flags=0)`: Finds all non-overlapping matches and returns an *iterator* yielding match objects. More memory-efficient than `findall` for many matches.
*   `re.sub(pattern, repl, string, count=0, flags=0)`: Replaces occurrences of `pattern` in `string` with `repl`. `repl` can be a string (with backreferences like `\1`, `\g<name>`) or a function.
*   `re.split(pattern, string, maxsplit=0, flags=0)`: Splits `string` by occurrences of `pattern`.
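*   `re.compile(pattern, flags=0)`: **Compiles** a regular expression pattern into a regex object. Worth doing when the same pattern is reused many times: the `re` module already caches recently used patterns, so the speed gain is usually modest, but a compiled object skips the cache lookup and gives the pattern a reusable name.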
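
The differences between these functions are easiest to see side by side. A small sketch on a made-up input string:

```python
import re

text = "id=42, id=7, name=alice"

print(re.match(r"\d+", text))              # None: the string does not start with a digit
print(re.search(r"\d+", text).group())     # 42
print(re.findall(r"\d+", text))            # ['42', '7']
print(re.findall(r"id=(\d+)", text))       # ['42', '7'] (one group: list of that group)
print(re.findall(r"(\w+)=(\w+)", text))    # [('id', '42'), ('id', '7'), ('name', 'alice')]
print(re.split(r",\s*", text))             # ['id=42', 'id=7', 'name=alice']
print(re.sub(r"\d+", "#", text, count=1))  # id=#, id=7, name=alice
```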

### 3.2 Basic Syntax and Metacharacters

**Recommendation:** Use **raw strings** (`r"..."`) for regex patterns to avoid issues with backslashes being interpreted by Python before the regex engine sees them.

In [9]:
import re

text = "The quick brown fox jumps over the lazy dog. Phone: 123-456-7890. Email: test.user@example.com."

# --- Simple Literal Match --- 
pattern_literal = r"fox"
match_literal = re.search(pattern_literal, text)
if match_literal:
    print(f"Found '{match_literal.group(0)}' at index {match_literal.start()}-{match_literal.end()}")

# --- Metacharacters --- 
# '.' - Any character (except newline)
# '^' - Start of string (or line in MULTILINE mode)
# '$' - End of string (or line in MULTILINE mode)
# '*' - 0 or more occurrences of the preceding character/group
# '+' - 1 or more occurrences
# '?' - 0 or 1 occurrence
# '{}' - Specific number of occurrences (e.g., {3}, {2,4}, {5,})
# '[]' - Character set (e.g., [aeiou], [a-zA-Z0-9])
# '|' - OR operator (e.g., fox|dog)
# '()' - Capturing group
# '\' - Escape special characters (e.g., \., \*, \?)

# --- Character Classes --- 
# '\d' - Digit (0-9)
# '\D' - Non-digit
# '\s' - Whitespace (space, tab, newline, etc.)
# '\S' - Non-whitespace
# '\w' - Word character (letters, numbers, underscore)
# '\W' - Non-word character

# --- Examples --- 
print("\n--- Regex Examples ---")

# Find a phone number (North American format)
pattern_phone = r"\d{3}-\d{3}-\d{4}" 
match_phone = re.search(pattern_phone, text)
if match_phone:
    print(f"Found phone number: {match_phone.group(0)}")

# Find all words starting with 'q' or 'l'
pattern_words = r"\b[ql]\w*\b" # \b is word boundary
matches_words = re.findall(pattern_words, text, re.IGNORECASE) # Ignore case flag
print(f"Found words starting with q/l: {matches_words}")

# Find email address
pattern_email = r"[\w\.-]+@[\w\.-]+\.\w+" # Simplified email pattern
match_email = re.search(pattern_email, text)
if match_email:
    print(f"Found email: {match_email.group(0)}")

# Use match() - only matches at the beginning
pattern_start = r"The"
match_start = re.match(pattern_start, text)
match_fail = re.match(r"quick", text) # Won't match
print(f"Match 'The' at start? {'Yes' if match_start else 'No'}")
print(f"Match 'quick' at start? {'Yes' if match_fail else 'No'}")

# Use finditer() - iterating over matches
print("Finding all words (using finditer):")
pattern_all_words = r"\b\w+\b"
for i, match in enumerate(re.finditer(pattern_all_words, text)):
    print(f"  Word {i+1}: {match.group(0)} at index {match.start()}")

Found 'fox' at index 16-19

--- Regex Examples ---
Found phone number: 123-456-7890
Found words starting with q/l: ['quick', 'lazy']
Found email: test.user@example.com
Match 'The' at start? Yes
Match 'quick' at start? No
Finding all words (using finditer):
  Word 1: The at index 0
  Word 2: quick at index 4
  Word 3: brown at index 10
  Word 4: fox at index 16
  Word 5: jumps at index 20
  Word 6: over at index 26
  Word 7: the at index 31
  Word 8: lazy at index 35
  Word 9: dog at index 40
  Word 10: Phone at index 45
  Word 11: 123 at index 52
  Word 12: 456 at index 56
  Word 13: 7890 at index 60
  Word 14: Email at index 66
  Word 15: test at index 73
  Word 16: user at index 78
  Word 17: example at index 83
  Word 18: com at index 91
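
The metacharacter notes above mention that `^` and `$` behave differently in MULTILINE mode. A short sketch on a hypothetical multi-line log string shows the effect:

```python
import re

log = "INFO start\nERROR disk full\nINFO done\nERROR timeout"

# Without the flag, ^ matches only at the very start of the string.
print(re.findall(r"^ERROR.*$", log))                # []

# With re.MULTILINE, ^ and $ match at the start and end of every line.
print(re.findall(r"^ERROR.*$", log, re.MULTILINE))  # ['ERROR disk full', 'ERROR timeout']
```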


### 3.3 Groups and Capturing

Parentheses `()` create capturing groups, allowing you to extract specific parts of the matched text.

In [10]:
import re

log_line = "2023-10-26 16:30:15 ERROR [UserThread] Failed login attempt for user 'admin' from 192.168.1.100"

# Pattern to extract timestamp, level, user, and IP
# Using capturing groups ()
pattern_log_simple = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\w+).*user '(\w+)'.*from (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

match_log_simple = re.search(pattern_log_simple, log_line)

if match_log_simple:
    print("--- Simple Group Capturing ---")
    print(f"Full match (group 0): {match_log_simple.group(0)}")
    print(f"Timestamp (group 1): {match_log_simple.group(1)}")
    print(f"Level (group 2): {match_log_simple.group(2)}")
    print(f"User (group 3): {match_log_simple.group(3)}")
    print(f"IP Address (group 4): {match_log_simple.group(4)}")
    print(f"All groups as tuple: {match_log_simple.groups()}")

# Using Named Groups (?P<name>...)
pattern_log_named = r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<level>\w+).*user '(?P<user>\w+)'.*from (?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
match_log_named = re.search(pattern_log_named, log_line)

if match_log_named:
    print("\n--- Named Group Capturing ---")
    print(f"Timestamp: {match_log_named.group('timestamp')}")
    print(f"Level: {match_log_named.group('level')}")
    print(f"User: {match_log_named.group('user')}")
    print(f"IP Address: {match_log_named.group('ip')}")
    print(f"All groups as dict: {match_log_named.groupdict()}")

# Non-capturing group (?:...)
text_nums = "Order 123, Item 45; Order 456, Item 78"
pattern_non_capture = r"Order (\d+), (?:Item|Product) (\d+)" # Match 'Item' or 'Product' but don't capture it
matches_non_capture = re.findall(pattern_non_capture, text_nums)
print(f"\n--- Non-Capturing Group (findall) ---")
print(f"Found pairs (Order ID, Item ID): {matches_non_capture}") # Only captures the digits

--- Simple Group Capturing ---
Full match (group 0): 2023-10-26 16:30:15 ERROR [UserThread] Failed login attempt for user 'admin' from 192.168.1.100
Timestamp (group 1): 2023-10-26 16:30:15
Level (group 2): ERROR
User (group 3): admin
IP Address (group 4): 192.168.1.100
All groups as tuple: ('2023-10-26 16:30:15', 'ERROR', 'admin', '192.168.1.100')

--- Named Group Capturing ---
Timestamp: 2023-10-26 16:30:15
Level: ERROR
User: admin
IP Address: 192.168.1.100
All groups as dict: {'timestamp': '2023-10-26 16:30:15', 'level': 'ERROR', 'user': 'admin', 'ip': '192.168.1.100'}

--- Non-Capturing Group (findall) ---
Found pairs (Order ID, Item ID): [('123', '45'), ('456', '78')]


### 3.4 Substitution (`re.sub`)

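`re.sub(pattern, repl, string)` replaces every non-overlapping match of `pattern` in `string`. The replacement `repl` can be a string (optionally with backreferences such as `\1`) or a function that receives each match object and returns the replacement text.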

In [11]:
import re

text = "Contact us at support@example.com or sales-info@example.co.uk for details."

# Replace all email domains with '[REDACTED]'
# Use backreference \1 to keep the username part
pattern_email_sub = r"([\w\.-]+)@([\w\.-]+\.\w+)"
redacted_text = re.sub(pattern_email_sub, r"\1@[REDACTED]", text)
print(f"Redacted text: {redacted_text}")

# Using a function for replacement (e.g., convert temperatures)
temp_text = "Temperature is 32F today, yesterday was 75F."
def fahrenheit_to_celsius(match_obj):
    fahrenheit = int(match_obj.group(1)) # Group 1 captures the digits
    celsius = (fahrenheit - 32) * 5 / 9
    return f"{celsius:.1f}C" # Return the replacement string

pattern_temp = r"(\d+)F\b" # Capture digits before 'F'
celsius_text = re.sub(pattern_temp, fahrenheit_to_celsius, temp_text)
print(f"Converted temperatures: {celsius_text}")

Redacted text: Contact us at support@[REDACTED] or sales-info@[REDACTED] for details.
Converted temperatures: Temperature is 0.0C today, yesterday was 23.9C.
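
The replacement string can also refer to named groups, and `re.sub` accepts a `count` argument. The sketch below is a small illustration under assumed inputs: the sample sentence and the names `iso_date`, `european`, `first_only`, and `masked` are made up for this example and do not appear elsewhere in the notebook.

```python
import re

# Hypothetical sample text for illustration
text = "Released 2023-10-26, patched 2023-11-02."

iso_date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# Named groups can be referenced in the replacement string as \g<name>
european = iso_date.sub(r"\g<day>/\g<month>/\g<year>", text)
print(european)

# count= limits how many matches are replaced (leftmost first)
first_only = iso_date.sub("<date>", text, count=1)
print(first_only)

# subn() returns the new string together with the number of replacements made
masked, n_replaced = iso_date.subn("<date>", text)
print(masked, n_replaced)
```

`\g<name>` is usually easier to read than `\1`, `\2` once a pattern has more than two groups, and `subn()` is handy when you need to know whether anything changed.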


### 3.5 Compiled Expressions (`re.compile`)

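**Performance Best Practice:** If you use a regex pattern many times (e.g., inside a loop), compile it once with `re.compile()`. The module-level functions such as `re.match()` already cache recently compiled patterns, so the gain comes mainly from skipping that cache lookup on every call. Expect a modest speedup (about 1.5x in the run below) rather than an order of magnitude.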

In [12]:
import re
import timeit

emails = [f"user{i}@test.com" for i in range(1000)] + ["invalid-email"] * 10
pattern_email = r"^[\w\.-]+@[\w\.-]+\.\w+$" # Stricter pattern with ^ $

# --- Without compiling --- 
def validate_emails_no_compile(email_list):
    valid_count = 0
    for email in email_list:
        if re.match(pattern_email, email):
            valid_count += 1
    return valid_count

# --- With compiling --- 
COMPILED_EMAIL_REGEX = re.compile(pattern_email)
def validate_emails_compiled(email_list):
    valid_count = 0
    for email in email_list:
        if COMPILED_EMAIL_REGEX.match(email):
            valid_count += 1
    return valid_count

# --- Timing comparison --- 
iterations = 100
time_no_compile = timeit.timeit(lambda: validate_emails_no_compile(emails), number=iterations)
time_compiled = timeit.timeit(lambda: validate_emails_compiled(emails), number=iterations)

print(f"Time without compiling ({iterations} runs): {time_no_compile:.6f} seconds")
print(f"Time with compiling ({iterations} runs):    {time_compiled:.6f} seconds")
print(f"Compiled is faster by factor: {time_no_compile / time_compiled:.2f}x (approx)")

# Also allows using methods directly on the compiled object
match = COMPILED_EMAIL_REGEX.match("test@domain.com")
if match:
    print("Compiled regex match successful.")

Time without compiling (100 runs): 0.115639 seconds
Time with compiling (100 runs):    0.075046 seconds
Compiled is faster by factor: 1.54x (approx)
Compiled regex match successful.


### 3.6 Verbose Flag (`re.VERBOSE`)

For complex patterns, use the `re.VERBOSE` flag to allow whitespace and comments within the pattern string for greatly improved readability.

In [13]:
import re

pattern_complex = r"^(\d{4}-\d{2}-\d{2})\s+([A-Z]+)\s+\[(.*?)\]\s+(.*)$"

pattern_verbose = r"""
^                 # Start of the string/line
(\d{4}-\d{2}-\d{2}) # Capture YYYY-MM-DD date (Group 1)
\s+               # One or more whitespace characters
([A-Z]+)          # Capture log level (e.g., ERROR) (Group 2)
\s+               # One or more whitespace characters
\[(.*?)\]         # Capture thread name inside [] (non-greedy) (Group 3)
\s+               # One or more whitespace characters
(.*)              # Capture the rest of the message (Group 4)
$                 # End of the string/line
"""

log_entry = "2023-10-27 INFO [MainThread] Application started successfully."

match1 = re.match(pattern_complex, log_entry)
# Use re.VERBOSE flag
match2 = re.match(pattern_verbose, log_entry, re.VERBOSE)

print(f"Match with complex pattern successful: {match1 is not None}")
print(f"Match with verbose pattern successful: {match2 is not None}")
if match2:
    print(f"  Date: {match2.group(1)}")
    print(f"  Level: {match2.group(2)}")
    print(f"  Thread: {match2.group(3)}")
    print(f"  Message: {match2.group(4)}")

Match with complex pattern successful: True
Match with verbose pattern successful: True
  Date: 2023-10-27
  Level: INFO
  Thread: MainThread
  Message: Application started successfully.
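
One consequence of `re.VERBOSE` is worth seeing in isolation: because unescaped whitespace in the pattern is ignored, a literal space has to be written explicitly (for example as `[ ]`, `\ `, or `\s`). The following is a minimal sketch, with hypothetical pattern names `loose` and `strict` and a made-up input line.

```python
import re

# In verbose mode, unescaped whitespace inside the pattern is ignored.
loose = re.compile(r"""
    error code     # the space between the two words is dropped
    \s+ (\d+)
""", re.VERBOSE)

strict = re.compile(r"""
    error[ ]code   # a character class keeps the literal space
    \s+ (\d+)
""", re.VERBOSE)

line = "error code 42"  # hypothetical input
print(loose.search(line))            # no match: pattern is effectively errorcode\s+(\d+)
print(strict.search(line).group(1))  # the captured number
```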


### 3.7 `re` Best Practices & Pitfalls

**Best Practices:**
*   **Use Raw Strings (`r"..."`):** Almost always use raw strings for regex patterns.
*   **Compile Often-Used Patterns:** Use `re.compile()` for performance.
*   **Be Specific:** Write patterns that are specific enough to avoid unwanted matches. Avoid overly broad patterns like `.*` where possible.
*   **Use `re.VERBOSE`:** Make complex patterns readable with comments and whitespace.
*   **Test Your Regex:** Use online tools (like regex101.com) or Python's interactive mode to test patterns thoroughly on sample data.
*   **Consider Alternatives:** For simple tasks (like checking prefixes/suffixes, simple splitting), standard string methods (`.startswith()`, `.endswith()`, `.split()`, `.find()`, `.replace()`) might be clearer and faster.
*   **Use Non-Greedy Quantifiers (`*?`, `+?`)**: When you want the *shortest* possible match for `*` or `+`.
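
To make the greedy versus non-greedy distinction concrete, here is a small sketch on a made-up HTML-like string. The variable names are illustrative only, and the third pattern shows the "be specific" advice as an alternative to a lazy quantifier.

```python
import re

# Hypothetical sample input
html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)        # + grabs as much as it can
lazy = re.findall(r"<.+?>", html)         # +? stops at the first possible '>'
specific = re.findall(r"<[^>]+>", html)   # a negated class avoids the problem entirely

print(greedy)
print(lazy)
print(specific)
```

The greedy version returns the whole string as a single match, while the lazy and specific versions each return the four individual tags.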

**Pitfalls:**
*   **Forgetting Raw Strings:** Backslashes get interpreted by Python first, breaking the pattern.
*   **Greedy Matching:** Default quantifiers (`*`, `+`) match as *much* as possible. Use non-greedy (`*?`, `+?`) or more specific patterns if needed.
*   **Complexity:** Overly complex regexes are hard to read, debug, and maintain.
*   **Catastrophic Backtracking (ReDoS):** Poorly written regexes (especially with nested quantifiers and alternation) can take exponential time on certain inputs, leading to Denial of Service vulnerabilities. Test patterns on edge cases.
*   **`match()` vs `search()`:** Forgetting that `match()` only checks the beginning of the string.
*   **Character Encoding:** `str` patterns only match `str`, and `bytes` patterns only match `bytes`; mixing them raises `TypeError`. When reading raw bytes, either decode them first or use a bytes pattern (`rb"..."`).
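
Two of these pitfalls are easy to reproduce in a few lines. The sketch below uses a made-up input string and illustrative variable names; it contrasts `match()`, `search()`, and `fullmatch()`, then shows what happens when a `str` pattern is applied to `bytes`.

```python
import re

text = "ID: 12345"  # hypothetical input

m_match = re.match(r"\d+", text)             # None: match() only tries position 0
m_search = re.search(r"\d+", text)           # finds the digits anywhere in the string
m_full = re.fullmatch(r"\d+", "12345abc")    # None: the whole string must match
print(m_match, m_search.group(), m_full)

# bytes input needs a bytes pattern
raw = b"ID: 12345"
m_bytes = re.search(rb"\d+", raw)
print(m_bytes.group())

try:
    re.search(r"\d+", raw)  # str pattern on bytes input
except TypeError as exc:
    mixing_error = exc
    print(f"TypeError: {exc}")
```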

### 3.8 `re` Interview Questions

1.  What are regular expressions used for?
2.  What is the difference between `re.match()` and `re.search()`?
3.  Why should you use raw strings (`r"..."`) for regex patterns?
4.  Explain the meaning of these metacharacters: `.` `*` `+` `?` `[]` `()` `^` `$` `\b`.
5.  How do you capture a group within a regex pattern?
6.  What does `re.findall()` return?
7.  When and why would you use `re.compile()`?
8.  What is the purpose of the `re.IGNORECASE` flag?
9.  How can you replace parts of a string using a regex pattern? (`re.sub`)

## 4. `functools`: Tools for Functions and Callables

**Introduction:** This module provides higher-order functions (functions that operate on or return other functions) and tools for creating, modifying, and introspecting callable objects.

**Real-world Use Cases:**
*   **Decorators:** Modifying or enhancing function behavior (`@wraps`).
*   **Caching/Memoization:** Speeding up expensive function calls (`@lru_cache`, `@cache`).
*   **Partial Function Application:** Creating specialized versions of functions with some arguments pre-filled (`partial`).
*   **Function Composition:** Combining functions (though often done manually or with other libraries).
*   **Generic Functions:** Dispatching function calls based on argument types (`@singledispatch`).

**Analogy: The Function Enhancer Kit**
Think of `functools` as a kit containing special tools to modify and improve your existing Python functions:
*   **`@wraps`:** A label maker that ensures when you wrap a function (like putting it in a new box - a decorator), it still retains its original name tag and instructions.
*   **`@lru_cache` / `@cache`:** A smart sticky note pad that remembers the results of function calls for specific inputs, so you don't have to recalculate them.
*   **`partial`:** A template maker that lets you create pre-filled versions of a function's order form (arguments).
*   **`@singledispatch`:** A smart dispatcher that routes a task to the right specialist function based on the type of item (argument) it receives.

### 4.1 Decorator Helper: `functools.wraps`

**Problem:** When you write a decorator, the decorated function often loses its original metadata (name, docstring, etc.), which can confuse debuggers and documentation tools.
**Solution:** Decorate the inner wrapper function with `@functools.wraps(original_func)` to copy the essential metadata (`__name__`, `__doc__`, `__module__`, and more) from the original function to the wrapper.

In [14]:
import functools
import time

# --- Decorator WITHOUT @wraps --- 
def timing_decorator_no_wraps(func):
    def wrapper(*args, **kwargs):
        """Wrapper docstring - This hides the original!"""
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        print(f"[No wraps] {func.__name__} took {end_time - start_time:.6f} seconds")
        return result
    return wrapper

@timing_decorator_no_wraps
def slow_operation_a(n):
    """This is the original docstring for slow_operation_a."""
    time.sleep(n)
    return n * n

print("--- Calling function decorated WITHOUT @wraps ---")
slow_operation_a(0.1)
print(f"Function name: {slow_operation_a.__name__}") # Output: wrapper
print(f"Function docstring: {slow_operation_a.__doc__}") # Output: Wrapper docstring...

# --- Decorator WITH @wraps --- 
def timing_decorator_with_wraps(func):
    @functools.wraps(func) # Apply wraps to the inner wrapper
    def wrapper(*args, **kwargs):
        """Wrapper docstring - not visible externally now."""
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        print(f"[With wraps] {func.__name__} took {end_time - start_time:.6f} seconds")
        return result
    return wrapper

@timing_decorator_with_wraps
def slow_operation_b(n):
    """This is the original docstring for slow_operation_b."""
    time.sleep(n)
    return n + n

print("\n--- Calling function decorated WITH @wraps ---")
slow_operation_b(0.1)
print(f"Function name: {slow_operation_b.__name__}") # Output: slow_operation_b
print(f"Function docstring: {slow_operation_b.__doc__}") # Output: This is the original docstring...



--- Calling function decorated WITHOUT @wraps ---
[No wraps] slow_operation_a took 0.100085 seconds
Function name: wrapper
Function docstring: Wrapper docstring - This hides the original!

--- Calling function decorated WITH @wraps ---
[With wraps] slow_operation_b took 0.100087 seconds
Function name: slow_operation_b
Function docstring: This is the original docstring for slow_operation_b.


**Best Practice:** *Always* use `@functools.wraps` when writing decorators.
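
Besides copying metadata, `functools.wraps` also stores the original function on the wrapper as `__wrapped__`, which is useful in tests when you want to bypass the decorator. A minimal sketch, using a hypothetical `shout` decorator and `greet` function:

```python
import functools

def shout(func):
    """Hypothetical decorator that upper-cases a function's string result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    """Return a greeting."""
    return f"hello, {name}"

print(greet("ada"))               # decorated behaviour
print(greet.__wrapped__("ada"))   # original function, decorator bypassed
print(greet.__name__, "|", greet.__doc__)
```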

### 4.2 Caching: `functools.lru_cache` and `functools.cache` (Py 3.9+)

Memoization automatically caches the results of function calls. If the function is called again with the same arguments, the cached result is returned immediately, avoiding re-computation.

*   `@lru_cache(maxsize=128, typed=False)`: Least Recently Used cache. Stores up to `maxsize` results. When full, discards the least recently used item. `typed=True` treats arguments of different types (e.g., `3` and `3.0`) as distinct cache keys.
*   `@cache`: (Python 3.9+) Simpler version of `lru_cache` with `maxsize=None` (cache grows indefinitely) and `typed=False`.

In [15]:
import functools
import time
import sys

# --- Expensive Fibonacci calculation (recursive) ---
def fibonacci_recursive(n):
    if n < 2:
        return n
    return fibonacci_recursive(n-1) + fibonacci_recursive(n-2)

# --- Apply LRU Cache --- 
@functools.lru_cache(maxsize=128)
def fibonacci_cached(n):
    # print(f"Calculating fibonacci_cached({n})") # Uncomment to see calls
    if n < 2:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)

# --- Apply Simple Cache (Python 3.9+) --- 
if sys.version_info >= (3, 9):
    @functools.cache
    def factorial_cached(n):
        # print(f"Calculating factorial_cached({n})") # Uncomment to see calls
        return n * factorial_cached(n-1) if n else 1
else:
    # Fallback for older Python using lru_cache
    @functools.lru_cache(maxsize=None) 
    def factorial_cached(n):
        # print(f"Calculating factorial_cached({n}) - fallback")
        return n * factorial_cached(n-1) if n else 1

# --- Timing Comparisons --- 
print("--- Fibonacci Calculations (n=35) ---")
start = time.perf_counter()
result_recursive = fibonacci_recursive(35) 
end = time.perf_counter()
print(f"Recursive Fibonacci took: {end - start:.6f} seconds. Result: {result_recursive}")

start = time.perf_counter()
result_cached = fibonacci_cached(35)
end = time.perf_counter()
print(f"Cached Fibonacci took:    {end - start:.6f} seconds. Result: {result_cached}")

# Call again - should be near instant
start = time.perf_counter()
result_cached_again = fibonacci_cached(35)
end = time.perf_counter()
print(f"Cached Fibonacci (2nd call): {end - start:.6f} seconds. Result: {result_cached_again}")

print(f"Cache Info: {fibonacci_cached.cache_info()}")
fibonacci_cached.cache_clear() # Clear the cache if needed
print(f"Cache Info after clear: {fibonacci_cached.cache_info()}")


print("\n--- Factorial Calculations (n=20) ---")
start = time.perf_counter()
result_fact = factorial_cached(20)
end = time.perf_counter()
print(f"Cached Factorial took: {end - start:.6f} seconds. Result: {result_fact}")

start = time.perf_counter()
result_fact_again = factorial_cached(20)
end = time.perf_counter()
print(f"Cached Factorial (2nd call): {end - start:.6f} seconds. Result: {result_fact_again}")

print(f"Cache Info: {factorial_cached.cache_info()}")

# Note: Arguments to cached functions must be hashable.

--- Fibonacci Calculations (n=35) ---
Recursive Fibonacci took: 1.573020 seconds. Result: 9227465
Cached Fibonacci took:    0.000096 seconds. Result: 9227465
Cached Fibonacci (2nd call): 0.000040 seconds. Result: 9227465
Cache Info: CacheInfo(hits=34, misses=36, maxsize=128, currsize=36)
Cache Info after clear: CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)

--- Factorial Calculations (n=20) ---
Cached Factorial took: 0.000047 seconds. Result: 2432902008176640000
Cached Factorial (2nd call): 0.000041 seconds. Result: 2432902008176640000
Cache Info: CacheInfo(hits=1, misses=21, maxsize=None, currsize=21)
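
The `typed` parameter is easiest to understand side by side. The sketch below uses two hypothetical two-argument functions, `add_untyped` and `add_typed`. It uses two arguments deliberately, since the Python documentation notes that some single `int` or `str` arguments may be cached separately even when `typed` is false.

```python
import functools

@functools.lru_cache(maxsize=32)
def add_untyped(a, b):
    return a + b

@functools.lru_cache(maxsize=32, typed=True)
def add_typed(a, b):
    return a + b

add_untyped(1, 2)
untyped_result = add_untyped(1.0, 2.0)  # (1.0, 2.0) equals (1, 2): cached int is reused
print(repr(untyped_result), add_untyped.cache_info())

add_typed(1, 2)
typed_result = add_typed(1.0, 2.0)      # types differ, so this is a separate entry
print(repr(typed_result), add_typed.cache_info())
```

With `typed=False` the float call returns the previously cached integer result, which may or may not be what you want; `typed=True` trades a larger cache for type-exact results.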


### 4.3 Partial Function Application: `functools.partial`

Creates a new callable object with some of the arguments of the original function pre-filled.

In [16]:
import functools

def power(base, exponent):
    """Calculates base to the power of exponent."""
    return base ** exponent

# Create specialized versions of the power function
square = functools.partial(power, exponent=2)
cube = functools.partial(power, exponent=3)

# Create a version with a fixed base
power_of_two = functools.partial(power, 2) # Fills 'base' argument

print(f"Square of 5: {square(5)}") # Only need to provide 'base'
print(f"Cube of 5: {cube(5)}")
print(f"Power of 2 to 8: {power_of_two(8)}") # Only need to provide 'exponent'

# --- Use Case: Callbacks --- 
def process_event(event_type, data, callback):
    print(f"Processing event '{event_type}' with data: {data}")
    # Simulate processing
    result = f"Processed {data}"
    callback(result) # Call the provided callback

def handle_success(result_data):
    print(f"Success Handler: Received '{result_data}'")

def handle_error(error_code, message):
    print(f"Error Handler: Code {error_code}, Message: '{message}'")

# Create a partial function for the error handler with a specific code
specific_error_handler = functools.partial(handle_error, 404)

print("\n--- Partial for Callbacks ---")
process_event("USER_LOGIN", {"user": "alice"}, handle_success)
process_event("FILE_NOT_FOUND", {"path": "/a/b/c"}, specific_error_handler) # Only needs 'message'

Square of 5: 25
Cube of 5: 125
Power of 2 to 8: 256

--- Partial for Callbacks ---
Processing event 'USER_LOGIN' with data: {'user': 'alice'}
Success Handler: Received 'Processed {'user': 'alice'}'
Processing event 'FILE_NOT_FOUND' with data: {'path': '/a/b/c'}
Error Handler: Code 404, Message: 'Processed {'path': '/a/b/c'}'
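
A `partial` object is inspectable, and keyword arguments it pre-fills can still be overridden at call time. The following short sketch redefines the `power` function from the cell above so that it runs on its own.

```python
import functools

def power(base, exponent):
    return base ** exponent

square = functools.partial(power, exponent=2)

# partial objects expose what they wrap and what they pre-fill
print(square.func is power)
print(square.args)
print(square.keywords)

# A keyword supplied at call time overrides the pre-filled one
overridden = square(5, exponent=3)
print(overridden)
```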


### 4.4 Reduction: `functools.reduce`

Applies a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. 

**Note:** While powerful, `reduce` can often be less readable than an explicit loop or built-in functions like `sum()` for simple cases. Use judiciously.

In [17]:
import functools
import operator # Provides functions corresponding to operators

numbers = [1, 2, 3, 4, 5]

# Calculate sum using reduce
sum_result = functools.reduce(lambda x, y: x + y, numbers)
# Equivalent using operator module (often slightly faster/clearer)
sum_result_op = functools.reduce(operator.add, numbers)
# Easiest way: use built-in sum()
sum_result_builtin = sum(numbers)
print(f"Sum using reduce (lambda): {sum_result}")
print(f"Sum using reduce (operator.add): {sum_result_op}")
print(f"Sum using built-in sum(): {sum_result_builtin}")

# Calculate product using reduce
product_result = functools.reduce(operator.mul, numbers)
print(f"Product using reduce (operator.mul): {product_result}")

# Find the maximum value
max_result = functools.reduce(lambda x, y: x if x > y else y, numbers)
# Easiest way: use built-in max()
max_result_builtin = max(numbers)
print(f"Max using reduce: {max_result}")
print(f"Max using built-in max(): {max_result_builtin}")

Sum using reduce (lambda): 15
Sum using reduce (operator.add): 15
Sum using built-in sum(): 15
Product using reduce (operator.mul): 120
Max using reduce: 5
Max using built-in max(): 5
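
`reduce` also accepts an optional third argument, the initial value. It makes the call safe for empty sequences and lets the accumulator have a different type from the items. The sketch below uses a made-up `config` dictionary and `path` list purely for illustration.

```python
import functools
import operator

# Without an initial value, reducing an empty sequence raises TypeError
try:
    functools.reduce(operator.add, [])
except TypeError as exc:
    print(f"TypeError: {exc}")

# With an initial value, the empty case simply returns it
empty_sum = functools.reduce(operator.add, [], 0)
print(empty_sum)

# The accumulator can differ from the items: walk a nested dict by a list of keys
config = {"db": {"primary": {"port": 5432}}}  # hypothetical data
path = ["db", "primary", "port"]
port = functools.reduce(operator.getitem, path, config)
print(port)
```

The nested-lookup case is one where `reduce` arguably reads better than the equivalent loop, because there is no built-in that does the same thing.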


### 4.5 Generic Functions: `functools.singledispatch`

Allows a function to have different implementations based on the type of its *first* argument. Similar to function overloading in other languages, but based on runtime types.

In [18]:
import functools
from decimal import Decimal

# Define the generic function using the decorator
@functools.singledispatch
def format_value(arg):
    """Default implementation for unrecognized types."""
    print(f"(Default) Formatting value: {arg}")
    return str(arg)

# Register specific implementations for different types
@format_value.register(int)
def _(arg):
    """Implementation for integers."""
    print(f"(Int) Formatting integer: {arg}")
    return f"Integer: {arg:,}" # Add comma separators

@format_value.register(float)
def _(arg):
    """Implementation for floats."""
    print(f"(Float) Formatting float: {arg}")
    return f"Float: {arg:.2f}" # Format to 2 decimal places

# Can register for complex types too
@format_value.register(list)
def _(arg):
    print(f"(List) Formatting list: {arg}")
    return f"List with {len(arg)} items: [{', '.join(map(str, arg))}]"

# You can also register functions separately
def _format_decimal(arg):
    print(f"(Decimal) Formatting Decimal: {arg}")
    return f"Decimal: {arg.normalize()}"
format_value.register(Decimal, _format_decimal)

# --- Calling the generic function --- 
print("--- Single Dispatch Examples ---")
formatted1 = format_value(1234567)
formatted2 = format_value(9876.54321)
formatted3 = format_value([1, 'a', 3.0])
formatted4 = format_value("a string") # Uses default
formatted5 = format_value(Decimal("1234.5600"))

print("--- Results ---")
print(formatted1)
print(formatted2)
print(formatted3)
print(formatted4)
print(formatted5)

--- Single Dispatch Examples ---
(Int) Formatting integer: 1234567
(Float) Formatting float: 9876.54321
(List) Formatting list: [1, 'a', 3.0]
(Default) Formatting value: a string
(Decimal) Formatting Decimal: 1234.5600
--- Results ---
Integer: 1,234,567
Float: 9876.54
List with 3 items: [1, a, 3.0]
a string
Decimal: 1234.56
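
Two further details of `singledispatch` can be sketched briefly: since Python 3.7, `register` can infer the type from the first parameter's annotation, and dispatch follows the class hierarchy, so a subclass uses its nearest registered base. The function `describe` below is a hypothetical example, separate from `format_value` above.

```python
import functools

@functools.singledispatch
def describe(arg):
    return f"object: {arg!r}"

# Python 3.7+: the type is taken from the annotation of the first parameter
@describe.register
def _(arg: int):
    return f"int: {arg}"

@describe.register
def _(arg: str):
    return f"str: {arg!r}"

print(describe(True))   # bool is a subclass of int, so the int version runs
print(describe("hi"))
print(describe(2.5))    # no float implementation registered: falls back to default

# dispatch() reports which implementation a given type would use
print(describe.dispatch(bool) is describe.dispatch(int))
```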


### 4.6 `functools` Best Practices & Pitfalls

**Best Practices:**
*   **`@wraps` is Essential:** Always use `@wraps` in decorators to preserve function metadata.
*   **Choose Cache Wisely:** Use `@cache` (3.9+) for simple unbounded caching. Use `@lru_cache` if you need a size limit or typed caching.
*   **`partial` Readability:** Use `partial` to create cleaner code when passing callbacks or specializing functions, rather than complex lambdas.
*   **`reduce` Sparingly:** Prefer explicit loops or built-ins (`sum`, `max`, `min`, `all`, `any`) over `reduce` for common operations as they are often more readable.
*   **`singledispatch` for Type Overloading:** Use `@singledispatch` when you need a function to behave differently based on the type of its first argument, offering an alternative to complex `if/elif/else` type checks.

**Pitfalls:**
*   **Forgetting `@wraps`:** Leads to confusing debugging and documentation issues.
*   **`lru_cache` with Unhashable Args:** Arguments to cached functions must be hashable (e.g., lists or dicts cannot be arguments directly unless converted to tuples/frozensets).
*   **Cache Invalidation:** Caches (`lru_cache`, `cache`) live for the lifetime of the function object. If the data the function depends on can change, cached results go stale; call the function's `cache_clear()` method to reset the cache.
*   **Mutable Defaults with `partial`:** If a partial function pre-fills an argument with a mutable object, that object is shared across all calls to the partial function.
*   **Overly Complex `reduce`:** Can quickly become hard to understand compared to a simple loop.
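
The unhashable-argument and shared-mutable pitfalls can both be reproduced in a few lines. This is a minimal sketch; `total`, `append_to`, and `shared` are hypothetical names chosen for the example.

```python
import functools

@functools.lru_cache(maxsize=None)
def total(values):
    return sum(values)

# Lists are unhashable, so they cannot be used as cache keys
try:
    total([1, 2, 3])
except TypeError as exc:
    print(f"TypeError: {exc}")

# Converting to a tuple makes the argument hashable
tuple_total = total((1, 2, 3))
print(tuple_total)

def append_to(item, bucket):
    bucket.append(item)
    return bucket

# The same list object is reused by every call to the partial
shared = []
add_item = functools.partial(append_to, bucket=shared)
add_item(1)
add_item(2)
print(shared)
```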

### 4.7 `functools` Interview Questions

1.  What problem does `functools.wraps` solve when writing decorators?
2.  What is memoization, and how does `functools.lru_cache` help achieve it?
3.  What are the main differences between `@lru_cache` and `@cache` (Python 3.9+)?
4.  What does `functools.partial` do? Give a use case.
5.  What is the purpose of `functools.reduce`? When might it be less preferred than a loop?
6.  What problem does `functools.singledispatch` address?

## 5. Combined Challenge: Log File Analysis

**Goal:** Write a script that parses a log file, extracts specific information using regex, converts timestamps, and calculates the time difference between the first and last relevant log entries.

**Tasks:**

1.  **Create Sample Log File (`app.log`):** Create a text file with entries like:
    ```
    2023-11-01 10:05:15 INFO [MainThread] Service started.
    2023-11-01 10:05:20 DEBUG [Worker-1] Processing item 1...
    2023-11-01 10:05:22 INFO [Worker-1] Item 1 processed successfully.
    2023-11-01 10:06:05 WARNING [MainThread] Low disk space detected.
    2023-11-01 10:07:10 ERROR [DBThread] Database connection failed. Retrying...
    2023-11-01 10:07:12 INFO [DBThread] Database connection successful.
    2023-11-01 10:08:00 INFO [MainThread] Service shutting down.
    INVALID LOG ENTRY
    ```
2.  **Define Regex:** Create a compiled regular expression (`re.compile`) to capture the timestamp (YYYY-MM-DD HH:MM:SS) and the log level (e.g., INFO, ERROR) from valid log lines. Use named groups.
3.  **Process File:**
    *   Read the log file line by line (`pathlib` or `open` with `with`).
    *   Use the compiled regex's `search()` method to find matches on each line.
    *   If a line matches:
        *   Extract the timestamp string and log level using group names.
        *   Use `datetime.strptime()` to convert the timestamp string into a **naive** `datetime` object.
        *   Keep track of the first and last valid timestamps encountered.
        *   (Optional) Count occurrences of each log level.
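    *   Handle potential `OSError` (`IOError` is an alias for it in Python 3) when reading the file, and check the result of `search()` before using it: it returns `None` for non-matching lines, and calling `.group()` on `None` raises `AttributeError`.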
4.  **Calculate Duration:** If both a first and last timestamp were found, calculate the `timedelta` between them.
5.  **Output:** Print the total number of valid log entries processed, the first timestamp, the last timestamp, and the total duration. (Optional: Print log level counts).
6.  **(Bonus using `functools`):** If you were repeatedly parsing timestamps in the *exact same string format*, how could `@lru_cache` be applied to potentially speed up the `datetime.strptime` conversion (assuming identical timestamp strings appear multiple times)?

In [19]:
# --- Solution Space for Challenge ---
import re
from datetime import datetime, timedelta
from pathlib import Path
import logging
from collections import Counter
import functools # For bonus

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s', force=True)

# 1. Create Sample Log File
log_content = """
2023-11-01 10:05:15 INFO [MainThread] Service started.
2023-11-01 10:05:20 DEBUG [Worker-1] Processing item 1...
2023-11-01 10:05:22 INFO [Worker-1] Item 1 processed successfully.
2023-11-01 10:06:05 WARNING [MainThread] Low disk space detected.
2023-11-01 10:07:10 ERROR [DBThread] Database connection failed. Retrying...
2023-11-01 10:07:12 INFO [DBThread] Database connection successful.
2023-11-01 10:08:00 INFO [MainThread] Service shutting down.
INVALID LOG ENTRY
2023-11-01 10:07:10 ERROR [DBThread] Another error for testing counts.
"""
log_file_path = Path("app.log")
try:
    log_file_path.write_text(log_content.strip(), encoding='utf-8')
    logging.info(f"Created sample log file: {log_file_path}")
except IOError as e:
    logging.error(f"Failed to write log file: {e}")
    # Stop here if the file can't be written for the test
    raise SystemExit(1)

# 2. Define Regex
LOG_ENTRY_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    # Optional: r"\s+\[(?P<thread>.*?)\]\s+(?P<message>.*)$"
)
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Bonus: Cached strptime function
@functools.lru_cache(maxsize=1024) # Cache up to 1024 unique timestamp strings
def parse_timestamp_cached(ts_string, fmt):
    # print(f"Parsing timestamp: {ts_string}") # Uncomment to see cache misses
    return datetime.strptime(ts_string, fmt)

def analyze_log_file(filepath: Path):
    first_timestamp: datetime | None = None
    last_timestamp: datetime | None = None
    valid_entry_count = 0
    level_counts = Counter()

    logging.info(f"Analyzing log file: {filepath}")
    try:
        with filepath.open('r', encoding='utf-8') as f:
            for line_num, line in enumerate(f, 1):
                match = LOG_ENTRY_PATTERN.search(line)
                if match:
                    valid_entry_count += 1
                    data = match.groupdict()
                    timestamp_str = data['timestamp']
                    level = data['level']
                    level_counts[level] += 1
                    
                    try:
                        # Use the cached version for parsing
                        current_dt = parse_timestamp_cached(timestamp_str, TIMESTAMP_FORMAT)
                        # Or without cache: 
                        # current_dt = datetime.strptime(timestamp_str, TIMESTAMP_FORMAT)
                        
                        if first_timestamp is None:
                            first_timestamp = current_dt
                        # Always update last_timestamp. This assumes the log is chronological.
                        # The final sample line is out of order, so the reported "last" time is
                        # 10:07:10 rather than 10:08:00; keep the larger value if order can vary.
                        last_timestamp = current_dt 
                        
                    except ValueError:
                        logging.warning(f"Line {line_num}: Invalid date format '{timestamp_str}' despite regex match.")
                # else: # Optional: Log lines that don't match the pattern
                    # logging.debug(f"Line {line_num}: Skipping line, no pattern match.")

    except FileNotFoundError:
        logging.error(f"Log file not found: {filepath}")
        return
    except IOError as e:
        logging.error(f"Error reading log file {filepath}: {e}")
        return
    except Exception as e:
        logging.exception(f"An unexpected error occurred during analysis: {e}")
        return

    # 4. Calculate Duration & 5. Output
    print("\n--- Log Analysis Results ---")
    print(f"Processed {valid_entry_count} valid log entries.")
    
    if first_timestamp and last_timestamp:
        print(f"First Timestamp: {first_timestamp.strftime(TIMESTAMP_FORMAT)}")
        print(f"Last Timestamp:  {last_timestamp.strftime(TIMESTAMP_FORMAT)}")
        duration = last_timestamp - first_timestamp
        print(f"Total Duration:  {duration} (Total Seconds: {duration.total_seconds():.2f})")
    else:
        print("No valid timestamps found to calculate duration.")
        
    print("\nLog Level Counts:")
    for level, count in level_counts.items():
        print(f"  {level}: {count}")

# Run the analysis
analyze_log_file(log_file_path)

INFO: Created sample log file: app.log
INFO: Analyzing log file: app.log



--- Log Analysis Results ---
Processed 8 valid log entries.
First Timestamp: 2023-11-01 10:05:15
Last Timestamp:  2023-11-01 10:07:10
Total Duration:  0:01:55 (Total Seconds: 115.00)

Log Level Counts:
  INFO: 4
  DEBUG: 1
  WARNING: 1
  ERROR: 2


## 6. Conclusion

The Python Standard Library is a treasure trove of powerful tools. `datetime`, `math`, `re`, and `functools` are prime examples, providing essential building blocks for a vast range of programming tasks.

*   `datetime` allows precise control over dates, times, and timezones.
*   `math` offers fundamental mathematical operations.
*   `re` provides sophisticated text pattern matching and manipulation.
*   `functools` enhances function capabilities through caching, partial application, and decorator support.

By understanding their capabilities, best practices, and potential pitfalls, you can leverage these modules to write more efficient, robust, and readable Python code. Continuously exploring the standard library is key to becoming a more effective Python developer.