# Module 4: Functions

## Topics Covered
1. Defining and Calling Functions
2. Parameters and Arguments (`*args`, `**kwargs`)
3. Return Values
4. Lambda Functions
5. Built-in Functions for Data Science
6. Scope and Namespaces
7. Decorators (Introduction)

## Learning Objectives

By the end of this module, you will be able to:
- Define and call your own functions to organize code
- Work with different types of function parameters
- Return single and multiple values from functions
- Use lambda functions for quick, one-line operations
- Apply built-in functions commonly used in data science
- Understand variable scope and namespaces
- Create simple decorators to extend function behavior

---
# Section 1: Defining and Calling Functions
---

## What is a Function?

A **function** is a reusable block of code that performs a specific task. Instead of writing the same code multiple times, you write it once as a function and call it whenever needed.

Think of a function like a recipe: you define the steps once, and then you can follow that recipe (call the function) whenever you want to make that dish.

### Why This Matters in Data Science

Functions are essential for:
- **Reusability**: Write data cleaning code once, use it on multiple datasets
- **Organization**: Break complex analysis into manageable pieces
- **Testing**: Test individual pieces of logic independently
- **Collaboration**: Share well-defined functions with teammates
- **Maintainability**: Fix bugs in one place, not dozens

## Syntax

```python
def function_name(parameters):
    """Docstring: Describes what the function does."""
    # Function body - code to execute
    return result  # Optional

# Calling the function
result = function_name(arguments)
```

**Components:**
- `def`: Keyword that starts a function definition
- `function_name`: Name you choose (follow snake_case convention)
- `parameters`: Inputs the function accepts (optional)
- `docstring`: Documentation describing the function
- `return`: Sends a value back to the caller (optional)

In [None]:
# Example: A simple function with no parameters

def greet():
    """Print a greeting message."""
    print("Hello! Welcome to data science.")

# Call the function
greet()
greet()  # Can call multiple times

In [None]:
# Example: Function with a parameter

def greet_user(name):
    """Print a personalized greeting."""
    print(f"Hello, {name}! Welcome to data science.")

# Call with different arguments
greet_user("Alice")
greet_user("Bob")
greet_user("Data Analyst")

In [None]:
# Example: Function with multiple parameters

def calculate_total(price, quantity):
    """Calculate total cost for items."""
    total = price * quantity
    print(f"{quantity} items at ${price:.2f} each = ${total:.2f}")

calculate_total(29.99, 3)
calculate_total(149.99, 2)

In [None]:
# Example: Function that returns a value

def calculate_average(numbers):
    """Calculate and return the average of a list of numbers."""
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    return average

# Store the returned value
sales = [1500, 2200, 1800, 2500, 1900]
avg_sales = calculate_average(sales)
print(f"Sales data: {sales}")
print(f"Average sales: ${avg_sales:.2f}")
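
Note that `calculate_average` divides by `len(numbers)`, so an empty list raises `ZeroDivisionError`. One possible guard is sketched below; the name `safe_average` and the choice to return `None` are assumptions for illustration, not part of the original function.

```python
def safe_average(numbers):
    """Return the average of numbers, or None if the list is empty."""
    if not numbers:  # Empty list: nothing to average
        return None
    return sum(numbers) / len(numbers)

print(safe_average([1500, 2200, 1800, 2500, 1900]))  # 1980.0
print(safe_average([]))                               # None
```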

In [None]:
# Example: Data science function - calculate percentage change

def percentage_change(old_value, new_value):
    """Calculate percentage change between two values."""
    change = ((new_value - old_value) / old_value) * 100
    return change

# Compare monthly sales
jan_sales = 45000
feb_sales = 52000

growth = percentage_change(jan_sales, feb_sales)
print(f"January: ${jan_sales:,}")
print(f"February: ${feb_sales:,}")
print(f"Growth: {growth:.1f}%")

In [None]:
# Example: Function with docstring

def calculate_bmi(weight_kg, height_m):
    """
    Calculate Body Mass Index (BMI).
    
    Parameters:
        weight_kg (float): Weight in kilograms
        height_m (float): Height in meters
    
    Returns:
        float: BMI value
    """
    bmi = weight_kg / (height_m ** 2)
    return bmi

# Use the function
my_bmi = calculate_bmi(70, 1.75)
print(f"BMI: {my_bmi:.1f}")

# Access the docstring
print("\nFunction documentation:")
print(calculate_bmi.__doc__)

## Practice Exercise 1.1

**Task:** Create a function called `calculate_profit` that:
- Takes two parameters: `revenue` and `costs`
- Calculates profit (revenue - costs)
- Calculates profit margin ((profit / revenue) * 100)
- Prints both values formatted nicely

Test with: revenue = 150000, costs = 95000

**Expected Output:**
```
Revenue: $150,000
Costs: $95,000
Profit: $55,000
Profit Margin: 36.7%
```

In [None]:
# Your code here


In [None]:
# Solution 1.1

def calculate_profit(revenue, costs):
    """Calculate and display profit metrics."""
    profit = revenue - costs
    profit_margin = (profit / revenue) * 100
    
    print(f"Revenue: ${revenue:,}")
    print(f"Costs: ${costs:,}")
    print(f"Profit: ${profit:,}")
    print(f"Profit Margin: {profit_margin:.1f}%")

# Test the function
calculate_profit(150000, 95000)

## Practice Exercise 1.2

**Task:** Create a function called `analyze_sales` that:
- Takes a list of sales numbers as a parameter
- Returns a dictionary with: total, average, minimum, maximum, and count

Test with: `[1200, 1800, 1500, 2200, 1900, 2100]`

**Expected Output:**
```
{'total': 10700, 'average': 1783.33, 'minimum': 1200, 'maximum': 2200, 'count': 6}
```

In [None]:
# Your code here


In [None]:
# Solution 1.2

def analyze_sales(sales):
    """Analyze a list of sales and return summary statistics."""
    return {
        'total': sum(sales),
        'average': round(sum(sales) / len(sales), 2),
        'minimum': min(sales),
        'maximum': max(sales),
        'count': len(sales)
    }

# Test the function
sales_data = [1200, 1800, 1500, 2200, 1900, 2100]
result = analyze_sales(sales_data)
print(result)

---
# Section 2: Parameters and Arguments (`*args`, `**kwargs`)
---

## Types of Parameters

Python lets you define parameters and pass arguments in several ways:

1. **Positional arguments**: Matched to parameters by position
2. **Keyword arguments**: Matched to parameters by name
3. **Default parameters**: Take a preset value when the caller omits the argument
4. **`*args`**: Collects any extra positional arguments into a tuple
5. **`**kwargs`**: Collects any extra keyword arguments into a dictionary

### Why This Matters in Data Science

Flexible parameters allow you to:
- Create functions that handle varying amounts of data
- Set sensible defaults for common use cases
- Build APIs that are easy to use but powerful

In [None]:
# Example: Positional vs keyword arguments

def describe_product(name, price, category):
    """Display product information."""
    print(f"Product: {name}")
    print(f"Price: ${price:.2f}")
    print(f"Category: {category}")

# Positional arguments (order matters)
print("Using positional arguments:")
describe_product("Laptop", 999.99, "Electronics")

print("\nUsing keyword arguments:")
# Keyword arguments (order doesn't matter)
describe_product(category="Electronics", name="Mouse", price=29.99)

In [None]:
# Example: Default parameter values

def calculate_tax(amount, tax_rate=0.08):
    """Calculate tax with optional tax rate (default 8%)."""
    tax = amount * tax_rate
    total = amount + tax
    return total

# Use default tax rate
price = 100
print(f"Price: ${price}")
print(f"With default tax (8%): ${calculate_tax(price):.2f}")

# Override with custom rate
print(f"With 10% tax: ${calculate_tax(price, 0.10):.2f}")
print(f"With 5% tax: ${calculate_tax(price, tax_rate=0.05):.2f}")

In [None]:
# Example: Multiple default parameters

def format_currency(amount, symbol="$", decimals=2, thousands_sep=True):
    """Format a number as currency with customizable options."""
    if thousands_sep:
        formatted = f"{symbol}{amount:,.{decimals}f}"
    else:
        formatted = f"{symbol}{amount:.{decimals}f}"
    return formatted

amount = 1234567.89

print(format_currency(amount))                          # All defaults
print(format_currency(amount, symbol="EUR "))           # Change symbol
print(format_currency(amount, decimals=0))              # No decimals
print(format_currency(amount, thousands_sep=False))     # No separator
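
Default values are evaluated once, when the function is defined, not on every call. A mutable default such as a list is therefore shared between calls. The sketch below uses hypothetical function names to illustrate the pitfall and the common `None` sentinel workaround.

```python
def add_tag_shared(tag, tags=[]):
    """Pitfall: the same default list is reused on every call."""
    tags.append(tag)
    return tags

def add_tag(tag, tags=None):
    """Safer: create a fresh list whenever the caller omits tags."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags

print(add_tag_shared("a"))  # ['a']
print(add_tag_shared("b"))  # ['a', 'b']  <- state leaked from the first call
print(add_tag("a"))         # ['a']
print(add_tag("b"))         # ['b']
```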

In [None]:
# Example: *args - variable positional arguments

def calculate_sum(*numbers):
    """Sum any number of values."""
    print(f"Received: {numbers}")  # numbers is a tuple
    return sum(numbers)

# Call with different numbers of arguments
print(f"Sum of 1, 2: {calculate_sum(1, 2)}")
print(f"Sum of 1, 2, 3, 4, 5: {calculate_sum(1, 2, 3, 4, 5)}")
print(f"Sum of 10, 20, 30, 40, 50, 60: {calculate_sum(10, 20, 30, 40, 50, 60)}")

In [None]:
# Example: Practical use of *args

def calculate_average(*values):
    """Calculate average of any number of values."""
    if len(values) == 0:
        return 0
    return sum(values) / len(values)

print(f"Average of 85, 90, 78: {calculate_average(85, 90, 78):.1f}")
print(f"Average of 100, 95: {calculate_average(100, 95):.1f}")

# Unpack a list with *
scores = [88, 92, 85, 90, 87]
print(f"Average of {scores}: {calculate_average(*scores):.1f}")

In [None]:
# Example: **kwargs - variable keyword arguments

def create_profile(**info):
    """Create a profile from any keyword arguments."""
    print("Profile:")
    for key, value in info.items():
        print(f"  {key}: {value}")

# Call with different keyword arguments
create_profile(name="Alice", age=30, city="New York")
print()
create_profile(name="Bob", role="Data Scientist", company="TechCorp", experience=5)
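
Just as `*` unpacks a list into positional arguments, `**` unpacks a dictionary into keyword arguments. A minimal sketch, using a hypothetical `describe` function and sample dictionary:

```python
def describe(**info):
    """Return the keyword arguments as a 'key=value' summary string."""
    return ", ".join(f"{key}={value}" for key, value in info.items())

person = {"name": "Alice", "age": 30}

# **person expands to name="Alice", age=30
print(describe(**person))                # name=Alice, age=30
print(describe(**person, city="Paris"))  # name=Alice, age=30, city=Paris
```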

In [None]:
# Example: Combining all parameter types

def process_data(data_name, *values, multiplier=1, **metadata):
    """
    Process data with flexible inputs.
    
    Parameters:
        data_name: Name of the dataset (required)
        *values: Any number of numeric values
        multiplier: Factor to multiply results (default 1)
        **metadata: Any additional info about the data
    """
    print(f"Dataset: {data_name}")
    print(f"Values: {values}")
    
    if values:
        result = sum(values) * multiplier
        print(f"Sum (x{multiplier}): {result}")
    
    if metadata:
        print("Metadata:")
        for key, value in metadata.items():
            print(f"  {key}: {value}")

# Use the flexible function
process_data("Sales Q1", 100, 200, 300, multiplier=2, source="CRM", year=2024)
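
Because `multiplier` comes after `*values` in the signature, it can only be passed by keyword; a value passed positionally is collected into `values` instead. The sketch below uses a simplified, hypothetical `scaled_sum` to show the difference:

```python
def scaled_sum(*values, multiplier=1):
    """Sum the positional values, then scale by the keyword-only multiplier."""
    return sum(values) * multiplier

print(scaled_sum(100, 200, 300, multiplier=2))  # 1200
print(scaled_sum(100, 200, 300, 2))             # 602 (the 2 joined values)
```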

## Practice Exercise 2.1

**Task:** Create a function called `create_report` that:
- Has a required parameter: `title`
- Has a default parameter: `author="Unknown"`
- Accepts any number of data points using `*data`
- Prints a formatted report

Test with: `create_report("Sales Report", 100, 200, 150, 300, author="Alice")`

**Expected Output:**
```
=== Sales Report ===
Author: Alice
Data points: 4
Values: (100, 200, 150, 300)
Total: 750
```

In [None]:
# Your code here


In [None]:
# Solution 2.1

def create_report(title, *data, author="Unknown"):
    """Create a formatted report with flexible data."""
    print(f"=== {title} ===")
    print(f"Author: {author}")
    print(f"Data points: {len(data)}")
    print(f"Values: {data}")
    if data:
        print(f"Total: {sum(data)}")

# Test the function
create_report("Sales Report", 100, 200, 150, 300, author="Alice")

## Practice Exercise 2.2

**Task:** Create a function called `build_query` that:
- Takes a required `table_name` parameter
- Accepts any filter conditions as `**filters`
- Returns a simple query-like string

Test with: `build_query("customers", status="active", country="USA", min_orders=5)`

**Expected Output:**
```
SELECT * FROM customers WHERE status='active' AND country='USA' AND min_orders=5
```

In [None]:
# Your code here


In [None]:
# Solution 2.2

def build_query(table_name, **filters):
    """Build a simple query string from table name and filters."""
    # Illustration only: real database code should use parameterized queries
    query = f"SELECT * FROM {table_name}"
    
    if filters:
        conditions = [f"{key}='{value}'" if isinstance(value, str) else f"{key}={value}" 
                      for key, value in filters.items()]
        query += " WHERE " + " AND ".join(conditions)
    
    return query

# Test the function
result = build_query("customers", status="active", country="USA", min_orders=5)
print(result)

---
# Section 3: Return Values
---

## Understanding Return Values

The `return` statement sends a value back to the code that called the function. A function can:
- Return nothing (implicitly returns `None`)
- Return a single value
- Return multiple values (as a tuple)
- Return early based on conditions

### Why This Matters in Data Science

Return values allow you to:
- Chain operations together
- Store results for later use
- Return multiple metrics from analysis functions
- Build data transformation pipelines
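
As a small illustration of chaining, the return value of one function can be passed straight into the next. The function names and sample data below are hypothetical:

```python
def remove_missing(values):
    """Return a new list without None entries."""
    return [v for v in values if v is not None]

def to_thousands(values):
    """Return each value divided by 1000."""
    return [v / 1000 for v in values]

def average(values):
    """Return the mean of a non-empty list."""
    return sum(values) / len(values)

raw_sales = [1500, None, 2500, 2000, None]

# Each step consumes the previous step's return value
pipeline_result = average(to_thousands(remove_missing(raw_sales)))
print(pipeline_result)  # 2.0
```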

In [None]:
# Example: Function with no return (returns None)

def print_header(title):
    """Print a formatted header."""
    print("=" * 40)
    print(title.center(40))
    print("=" * 40)

result = print_header("Sales Report")
print(f"\nReturn value: {result}")
print(f"Type: {type(result)}")

In [None]:
# Example: Returning multiple values

def get_statistics(numbers):
    """Calculate and return multiple statistics."""
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    minimum = min(numbers)
    maximum = max(numbers)
    
    return total, average, minimum, maximum  # Returns a tuple

data = [23, 45, 67, 89, 12, 34, 56, 78]

# Get all values as tuple
stats = get_statistics(data)
print(f"Stats tuple: {stats}")

# Unpack into separate variables
total, avg, min_val, max_val = get_statistics(data)
print(f"\nTotal: {total}")
print(f"Average: {avg:.2f}")
print(f"Min: {min_val}")
print(f"Max: {max_val}")

In [None]:
# Example: Return a dictionary for named results

def analyze_dataset(data):
    """Analyze data and return results as a dictionary."""
    return {
        'count': len(data),
        'sum': sum(data),
        'mean': sum(data) / len(data),
        'min': min(data),
        'max': max(data),
        'range': max(data) - min(data)
    }

sales = [1500, 2200, 1800, 3100, 2700]
analysis = analyze_dataset(sales)

print("Sales Analysis:")
for metric, value in analysis.items():
    if isinstance(value, float):
        print(f"  {metric}: {value:.2f}")
    else:
        print(f"  {metric}: {value}")

In [None]:
# Example: Early return for validation

def calculate_growth_rate(old_value, new_value):
    """Calculate percentage growth rate with validation."""
    # Early return for invalid input
    if old_value == 0:
        return None  # Can't divide by zero
    
    if old_value < 0 or new_value < 0:
        return None  # Negative values don't make sense here
    
    # Normal calculation
    growth = ((new_value - old_value) / old_value) * 100
    return growth

# Test with various inputs
print(f"100 to 150: {calculate_growth_rate(100, 150)}%")
print(f"0 to 100: {calculate_growth_rate(0, 100)}")
print(f"200 to 180: {calculate_growth_rate(200, 180)}%")

In [None]:
# Example: Returning different types based on conditions

def safe_divide(a, b, return_tuple=False):
    """Divide two numbers safely with optional error info."""
    if b == 0:
        if return_tuple:
            return None, "Division by zero error"
        return None
    
    result = a / b
    if return_tuple:
        return result, "Success"
    return result

# Simple usage
print(f"10 / 2 = {safe_divide(10, 2)}")
print(f"10 / 0 = {safe_divide(10, 0)}")

# With status info
result, status = safe_divide(10, 2, return_tuple=True)
print(f"\nResult: {result}, Status: {status}")

result, status = safe_divide(10, 0, return_tuple=True)
print(f"Result: {result}, Status: {status}")

## Practice Exercise 3.1

**Task:** Create a function called `grade_score` that:
- Takes a numeric score (0-100)
- Returns a tuple: (letter_grade, pass_status)
- Grading: A (90+), B (80-89), C (70-79), D (60-69), F (below 60)
- Pass status: "Pass" if grade is D or above, "Fail" otherwise

Test with scores: 95, 72, 55

**Expected Output:**
```
Score 95: ('A', 'Pass')
Score 72: ('C', 'Pass')
Score 55: ('F', 'Fail')
```

In [None]:
# Your code here


In [None]:
# Solution 3.1

def grade_score(score):
    """Convert a numeric score to letter grade and pass status."""
    if score >= 90:
        grade = 'A'
    elif score >= 80:
        grade = 'B'
    elif score >= 70:
        grade = 'C'
    elif score >= 60:
        grade = 'D'
    else:
        grade = 'F'
    
    status = 'Pass' if grade != 'F' else 'Fail'
    
    return grade, status

# Test the function
for score in [95, 72, 55]:
    print(f"Score {score}: {grade_score(score)}")

## Practice Exercise 3.2

**Task:** Create a function called `validate_email` that:
- Takes an email string
- Returns a dictionary with: `valid` (bool), `username`, `domain`, `error` (if any)
- An email is valid if it contains exactly one "@" and at least one "." after the @

Test with: "alice@example.com", "invalid-email", "bob@domain"

**Expected Output:**
```
alice@example.com: {'valid': True, 'username': 'alice', 'domain': 'example.com', 'error': None}
invalid-email: {'valid': False, 'username': None, 'domain': None, 'error': 'Missing @ symbol'}
bob@domain: {'valid': False, 'username': None, 'domain': None, 'error': 'Invalid domain'}
```

In [None]:
# Your code here


In [None]:
# Solution 3.2

def validate_email(email):
    """Validate an email and return detailed results."""
    result = {'valid': False, 'username': None, 'domain': None, 'error': None}
    
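    # Check for exactly one @ symbol
    at_count = email.count('@')
    if at_count == 0:
        result['error'] = 'Missing @ symbol'
        return result
    if at_count > 1:
        result['error'] = 'Multiple @ symbols'
        return result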
    
    # Split email
    username, domain = email.split('@')
    
    # Check domain has a dot
    if '.' not in domain:
        result['error'] = 'Invalid domain'
        return result
    
    # Valid email
    result['valid'] = True
    result['username'] = username
    result['domain'] = domain
    
    return result

# Test the function
test_emails = ["alice@example.com", "invalid-email", "bob@domain"]
for email in test_emails:
    print(f"{email}: {validate_email(email)}")

---
# Section 4: Lambda Functions
---

## What are Lambda Functions?

A **lambda function** is a small, anonymous function written as a single expression. Lambdas are useful for short operations where a full function definition would be overkill.

Think of lambdas as "throwaway" functions – quick tools for simple tasks.

### Why This Matters in Data Science

Lambda functions are everywhere in data science:
- Pandas `apply()`, `map()`, and `transform()`
- Sorting with custom keys
- Filtering data
- Quick data transformations

## Syntax

```python
lambda arguments: expression
```

**Key points:**
- `lambda`: Keyword to define the function
- `arguments`: Input parameters (comma-separated)
- `expression`: Single expression that gets returned
- No `return` keyword needed – the expression result is automatically returned

In [None]:
# Example: Regular function vs lambda

# Regular function
def square(x):
    return x ** 2

# Equivalent lambda
square_lambda = lambda x: x ** 2

print(f"Regular function: {square(5)}")
print(f"Lambda function: {square_lambda(5)}")

In [None]:
# Example: Lambda with multiple arguments

# Calculate total price
calculate_total = lambda price, quantity: price * quantity

print(f"5 items at $10: ${calculate_total(10, 5)}")

# Calculate percentage
percentage = lambda part, whole: (part / whole) * 100

print(f"35 out of 50: {percentage(35, 50)}%")

In [None]:
# Example: Using lambda with sorted()

products = [
    {"name": "Laptop", "price": 999},
    {"name": "Mouse", "price": 29},
    {"name": "Keyboard", "price": 79},
    {"name": "Monitor", "price": 249}
]

# Sort by price (ascending)
by_price = sorted(products, key=lambda p: p["price"])
print("Sorted by price (low to high):")
for p in by_price:
    print(f"  {p['name']}: ${p['price']}")

# Sort by name
by_name = sorted(products, key=lambda p: p["name"])
print("\nSorted by name:")
for p in by_name:
    print(f"  {p['name']}: ${p['price']}")

In [None]:
# Example: Using lambda with filter()

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Filter even numbers
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(f"Even numbers: {evens}")

# Filter numbers greater than 5
greater_than_5 = list(filter(lambda x: x > 5, numbers))
print(f"Greater than 5: {greater_than_5}")

# Filter products over $100
products = [{"name": "Laptop", "price": 999}, {"name": "Mouse", "price": 29}, {"name": "Monitor", "price": 249}]
expensive = list(filter(lambda p: p["price"] > 100, products))
print(f"Products over $100: {[p['name'] for p in expensive]}")

In [None]:
# Example: Using lambda with map()

prices = [100, 200, 150, 300, 250]

# Apply 10% discount
discounted = list(map(lambda p: p * 0.9, prices))
print(f"Original: {prices}")
print(f"Discounted: {discounted}")

# Convert to formatted strings
formatted = list(map(lambda p: f"${p:,.2f}", prices))
print(f"Formatted: {formatted}")

# Square all numbers
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x ** 2, numbers))
print(f"\nNumbers: {numbers}")
print(f"Squared: {squared}")
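
The same transformations can also be written as list comprehensions, which many Python programmers find easier to read than `map()` or `filter()` with a lambda. A brief comparison, using a hypothetical price list:

```python
sample_prices = [100, 200, 150, 300, 250]

# map() with a lambda vs. an equivalent list comprehension
doubled_map = list(map(lambda p: p * 2, sample_prices))
doubled_comp = [p * 2 for p in sample_prices]

# filter() with a lambda vs. an equivalent list comprehension
over_150_filter = list(filter(lambda p: p > 150, sample_prices))
over_150_comp = [p for p in sample_prices if p > 150]

print(doubled_map == doubled_comp)       # True
print(over_150_filter == over_150_comp)  # True
print(over_150_comp)                     # [200, 300, 250]
```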

In [None]:
# Example: Lambda with conditional expression

# Categorize numbers
categorize = lambda x: "positive" if x > 0 else "negative" if x < 0 else "zero"

numbers = [-5, 0, 10, -3, 7]
for num in numbers:
    print(f"{num}: {categorize(num)}")

In [None]:
# Example: Practical data cleaning with lambda

# Clean messy names
names = ["  ALICE  ", "bob", "  Charlie", "DIANA  "]

clean_name = lambda s: s.strip().title()
cleaned = list(map(clean_name, names))

print(f"Original: {names}")
print(f"Cleaned: {cleaned}")

## Practice Exercise 4.1

**Task:** Given this list of employees:
```python
employees = [
    {"name": "Alice", "salary": 75000, "dept": "Engineering"},
    {"name": "Bob", "salary": 65000, "dept": "Marketing"},
    {"name": "Charlie", "salary": 85000, "dept": "Engineering"},
    {"name": "Diana", "salary": 70000, "dept": "Sales"}
]
```

Use lambda functions to:
1. Sort by salary (highest first)
2. Filter only Engineering employees
3. Get a list of just the names

**Expected Output:**
```
By salary: ['Charlie', 'Alice', 'Diana', 'Bob']
Engineering only: ['Alice', 'Charlie']
All names: ['Alice', 'Bob', 'Charlie', 'Diana']
```

In [None]:
# Your code here


In [None]:
# Solution 4.1

employees = [
    {"name": "Alice", "salary": 75000, "dept": "Engineering"},
    {"name": "Bob", "salary": 65000, "dept": "Marketing"},
    {"name": "Charlie", "salary": 85000, "dept": "Engineering"},
    {"name": "Diana", "salary": 70000, "dept": "Sales"}
]

# 1. Sort by salary (highest first)
by_salary = sorted(employees, key=lambda e: e["salary"], reverse=True)
print(f"By salary: {[e['name'] for e in by_salary]}")

# 2. Filter Engineering only
engineering = list(filter(lambda e: e["dept"] == "Engineering", employees))
print(f"Engineering only: {[e['name'] for e in engineering]}")

# 3. Get all names
names = list(map(lambda e: e["name"], employees))
print(f"All names: {names}")

---
# Section 5: Built-in Functions for Data Science
---

## Essential Built-in Functions

Python comes with many built-in functions that are particularly useful for data work. Here are the most important ones:

| Function | Purpose | Example |
|----------|---------|--------|
| `len()` | Count items | `len([1,2,3])` → 3 |
| `sum()` | Add numbers | `sum([1,2,3])` → 6 |
| `min()` | Find minimum | `min([3,1,2])` → 1 |
| `max()` | Find maximum | `max([3,1,2])` → 3 |
| `sorted()` | Sort items | `sorted([3,1,2])` → [1,2,3] |
| `abs()` | Absolute value | `abs(-5)` → 5 |
| `round()` | Round number | `round(3.7)` → 4 |
| `enumerate()` | Index + value | `list(enumerate(['a','b']))` → [(0,'a'),(1,'b')] |
| `zip()` | Combine iterables | `list(zip([1,2], ['a','b']))` → [(1,'a'),(2,'b')] |
| `map()` | Apply function | `list(map(str, [1,2,3]))` → ['1','2','3'] |
| `filter()` | Filter items | `list(filter(bool, [0,1,2]))` → [1,2] |
| `all()` | All truthy? | `all([True, True])` → True |
| `any()` | Any truthy? | `any([False, True])` → True |

In [None]:
# Example: Basic aggregation functions

sales = [1500, 2200, 1800, 3100, 2700, 1900, 2400]

print(f"Sales data: {sales}")
print(f"Count: {len(sales)}")
print(f"Total: ${sum(sales):,}")
print(f"Average: ${sum(sales) / len(sales):,.2f}")
print(f"Minimum: ${min(sales):,}")
print(f"Maximum: ${max(sales):,}")
print(f"Range: ${max(sales) - min(sales):,}")

In [None]:
# Example: sorted() with key parameter

words = ["banana", "Apple", "cherry", "Date"]

# Default sort (case-sensitive)
print(f"Default sort: {sorted(words)}")

# Case-insensitive sort
print(f"Case-insensitive: {sorted(words, key=str.lower)}")

# Sort by length
print(f"By length: {sorted(words, key=len)}")

# Reverse sort
print(f"Reverse: {sorted(words, reverse=True)}")

In [None]:
# Example: round() for data presentation

values = [3.14159, 2.71828, 1.41421, 1.73205]

# Round to different decimal places
print("Original values:", values)
print("Rounded to 2 decimals:", [round(v, 2) for v in values])
print("Rounded to 1 decimal:", [round(v, 1) for v in values])
print("Rounded to integers:", [round(v) for v in values])

# Round to negative places (tens, hundreds)
big_numbers = [1234, 5678, 9012]
print(f"\nOriginal: {big_numbers}")
print(f"Rounded to tens: {[round(n, -1) for n in big_numbers]}")
print(f"Rounded to hundreds: {[round(n, -2) for n in big_numbers]}")
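
One detail worth knowing: Python's `round()` rounds exact halves to the nearest even number, and floating-point representation can produce results that look surprising. A short demonstration:

```python
# Exact halves round to the nearest even integer
print(round(0.5))   # 0
print(round(1.5))   # 2
print(round(2.5))   # 2
print(round(3.5))   # 4

# 2.675 is stored as a value slightly below 2.675, so it rounds down
print(round(2.675, 2))  # 2.67
```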

In [None]:
# Example: all() and any() for data validation

# Check if all values meet a condition
scores = [85, 90, 78, 92, 88]
all_passing = all(score >= 60 for score in scores)
print(f"Scores: {scores}")
print(f"All passing (>=60): {all_passing}")

# Check if any value meets a condition
has_perfect = any(score == 100 for score in scores)
print(f"Any perfect score: {has_perfect}")

# Practical: Check data completeness
record = {"name": "Alice", "email": "alice@test.com", "phone": None}
all_fields_present = all(record.values())
print(f"\nRecord: {record}")
print(f"All fields complete: {all_fields_present}")
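
`all(record.values())` tests truthiness, so legitimate values such as `0` or an empty string also count as incomplete. If only `None` should mean missing, an explicit check is one option. The record below is a hypothetical example:

```python
order_record = {"name": "Bob", "orders": 0, "phone": "555-0100"}

# Truthiness check: 0 is falsy, so this reports the record as incomplete
truthy_check = all(order_record.values())
print(truthy_check)    # False

# Explicit check: only None counts as missing
explicit_check = all(v is not None for v in order_record.values())
print(explicit_check)  # True
```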

In [None]:
# Example: enumerate() for indexed loops

products = ["Laptop", "Mouse", "Keyboard", "Monitor"]

print("Product List:")
for index, product in enumerate(products, start=1):
    print(f"  {index}. {product}")

In [None]:
# Example: zip() to combine data

products = ["Laptop", "Mouse", "Keyboard"]
prices = [999, 29, 79]
quantities = [10, 50, 30]

print("Inventory:")
for product, price, qty in zip(products, prices, quantities):
    value = price * qty
    print(f"  {product}: {qty} units @ ${price} = ${value:,}")
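
`zip()` stops at the shortest input, so mismatched lists are silently truncated. `itertools.zip_longest` from the standard library pads the shorter inputs instead. A minimal sketch with hypothetical data:

```python
from itertools import zip_longest

items = ["Laptop", "Mouse", "Keyboard"]
item_prices = [999, 29]  # One price is missing

# zip() silently drops "Keyboard"
truncated = list(zip(items, item_prices))
print(truncated)
# [('Laptop', 999), ('Mouse', 29)]

# zip_longest() keeps every item and fills the gap
padded = list(zip_longest(items, item_prices, fillvalue=None))
print(padded)
# [('Laptop', 999), ('Mouse', 29), ('Keyboard', None)]
```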

In [None]:
# Example: Combining multiple built-in functions

data = [45, -12, 78, 0, -5, 92, 33, -8, 67]

# Analysis
positive_count = sum(1 for x in data if x > 0)
negative_count = sum(1 for x in data if x < 0)
abs_values = [abs(x) for x in data]

print(f"Data: {data}")
print(f"Positive values: {positive_count}")
print(f"Negative values: {negative_count}")
print(f"Absolute values: {abs_values}")
print(f"Sum of absolutes: {sum(abs_values)}")
print(f"Max absolute: {max(abs_values)}")

## Practice Exercise 5.1

**Task:** Given this data, use built-in functions to answer the questions:

```python
temperatures = [72, 68, 75, 80, 82, 78, 71, 69, 85, 77]
```

1. What is the average temperature? (rounded to 1 decimal)
2. What is the temperature range (max - min)?
3. How many days were above 75?
4. Were all days above freezing (32)?
5. Sort temperatures from lowest to highest

**Expected Output:**
```
Average: 75.7
Range: 17 degrees
Days above 75: 5
All above freezing: True
Sorted: [68, 69, 71, 72, 75, 77, 78, 80, 82, 85]
```

In [None]:
# Your code here


In [None]:
# Solution 5.1

temperatures = [72, 68, 75, 80, 82, 78, 71, 69, 85, 77]

# 1. Average temperature
average = round(sum(temperatures) / len(temperatures), 1)
print(f"Average: {average}")

# 2. Temperature range
temp_range = max(temperatures) - min(temperatures)
print(f"Range: {temp_range} degrees")

# 3. Days above 75
above_75 = sum(1 for t in temperatures if t > 75)
print(f"Days above 75: {above_75}")

# 4. All above freezing
all_above_freezing = all(t > 32 for t in temperatures)
print(f"All above freezing: {all_above_freezing}")

# 5. Sorted
print(f"Sorted: {sorted(temperatures)}")

---
# Section 6: Scope and Namespaces
---

## What is Scope?

**Scope** refers to where a variable can be accessed in your code. Python has several scope levels:

1. **Local scope**: Variables defined inside a function
2. **Enclosing scope**: Variables in outer functions (for nested functions)
3. **Global scope**: Variables defined at the module level
4. **Built-in scope**: Python's built-in names (like `print`, `len`)

This is called the **LEGB rule** (Local → Enclosing → Global → Built-in).
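
The lookup order can be seen in a single expression that draws one name from each level. This is a minimal sketch; `rate`, `make_scaler`, and `scale` are hypothetical names:

```python
rate = 0.5                      # Global scope

def make_scaler():
    offset = 10                 # Enclosing scope for scale()

    def scale(x):
        doubled = x * 2         # Local scope
        # doubled: local, offset: enclosing, rate: global, round: built-in
        return round(doubled * rate + offset)

    return scale

scale = make_scaler()
print(scale(4))  # 14
```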

### Why This Matters in Data Science

Understanding scope helps you:
- Avoid bugs from variable name conflicts
- Write cleaner, more maintainable code
- Understand how functions interact with data
- Debug issues where variables have unexpected values

In [None]:
# Example: Local vs global scope

# Global variable
message = "I am global"

def show_message():
    # Local variable (shadows global)
    message = "I am local"
    print(f"Inside function: {message}")

show_message()
print(f"Outside function: {message}")

In [None]:
# Example: Function can read global variables

tax_rate = 0.08  # Global

def calculate_tax(amount):
    # Can read tax_rate from global scope
    return amount * tax_rate

price = 100
tax = calculate_tax(price)
print(f"Price: ${price}")
print(f"Tax rate: {tax_rate}")
print(f"Tax: ${tax:.2f}")

In [None]:
# Example: The global keyword

counter = 0  # Global

def increment_counter():
    global counter  # Explicitly use global variable
    counter += 1
    print(f"Counter inside function: {counter}")

print(f"Counter before: {counter}")
increment_counter()
increment_counter()
increment_counter()
print(f"Counter after: {counter}")

In [None]:
# Example: Enclosing scope (nested functions)

def outer_function():
    outer_var = "I'm from outer"
    
    def inner_function():
        # Can access outer_var from enclosing scope
        print(f"Inner sees: {outer_var}")
    
    inner_function()

outer_function()
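
Reading an enclosing variable works automatically, but rebinding it requires the `nonlocal` keyword, the nested-function counterpart of `global`. A minimal sketch with a hypothetical counter factory:

```python
def make_counter():
    count = 0  # Lives in the enclosing scope of increment()

    def increment():
        nonlocal count  # Rebind the enclosing variable, not a new local
        count += 1
        return count

    return increment

counter_a = make_counter()
print(counter_a())  # 1
print(counter_a())  # 2

counter_b = make_counter()  # Each call gets its own independent count
print(counter_b())  # 1
```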

In [None]:
# Example: Best practice - avoid global mutable state

# Bad: modifying global state
results = []

def bad_add_result(value):
    # 'global' is not needed to mutate a list in place, only to rebind the name
    results.append(value)  # Modifies global

# Good: return new values
def good_add_result(results_list, value):
    return results_list + [value]  # Returns new list

# Even better: use a class or pass data explicitly
def process_data(data):
    """Process data and return results - no side effects."""
    return [x * 2 for x in data]

my_data = [1, 2, 3]
processed = process_data(my_data)
print(f"Original: {my_data}")
print(f"Processed: {processed}")

In [None]:
# Example: Common scope pitfall

# Assigning to x inside a function creates a LOCAL variable; the global x is unchanged
x = 10

def try_to_modify():
    x = 20  # Creates new local variable, doesn't modify global
    print(f"Inside: x = {x}")

try_to_modify()
print(f"Outside: x = {x}")  # Global x unchanged!
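
A related pitfall: if a function assigns to a name anywhere in its body, Python treats that name as local throughout the function, so reading it before the assignment raises `UnboundLocalError`. The names below are hypothetical:

```python
total = 10

def add_to_total():
    # 'total' is assigned below, so it is local to this function;
    # reading it first fails because the local has no value yet
    total = total + 1
    return total

try:
    add_to_total()
except UnboundLocalError as error:
    print(f"Error: {type(error).__name__}")  # Error: UnboundLocalError

print(total)  # 10 (the global is unchanged)
```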

## Practice Exercise 6.1

**Task:** Predict the output of this code without running it. Then run it to verify.

```python
value = 100

def outer():
    value = 200
    
    def inner():
        value = 300
        print(f"Inner: {value}")
    
    inner()
    print(f"Outer: {value}")

outer()
print(f"Global: {value}")
```

**Expected Output:**
```
Inner: 300
Outer: 200
Global: 100
```

In [None]:
# Your code here - run to verify


In [None]:
# Solution 6.1

value = 100

def outer():
    value = 200
    
    def inner():
        value = 300
        print(f"Inner: {value}")
    
    inner()
    print(f"Outer: {value}")

outer()
print(f"Global: {value}")

# Each function has its own local 'value' variable
# They don't interfere with each other

---
# Section 7: Decorators (Introduction)
---

## What are Decorators?

A **decorator** is a function that takes another function and extends its behavior without explicitly modifying it. Think of it as a wrapper that adds functionality.

The `@decorator` syntax is just shorthand for `function = decorator(function)`.

### Why This Matters in Data Science

Decorators are used for:
- **Timing functions**: Measure execution time of data processing
- **Logging**: Track function calls and parameters
- **Caching**: Store results of expensive computations
- **Validation**: Check inputs before processing
- **Retry logic**: Retry failed API calls

In [None]:
# Example: Understanding the concept

def simple_decorator(func):
    """A decorator that prints before and after the function runs."""
    def wrapper():
        print("Before the function")
        func()
        print("After the function")
    return wrapper

def say_hello():
    print("Hello!")

# Apply decorator manually
decorated_hello = simple_decorator(say_hello)
decorated_hello()

In [None]:
# Example: Using @ syntax

def simple_decorator(func):
    def wrapper():
        print("Before the function")
        func()
        print("After the function")
    return wrapper

@simple_decorator  # Same as: greet = simple_decorator(greet)
def greet():
    print("Hello, World!")

greet()

In [None]:
# Example: Practical decorator - timing function execution

import time

def timer(func):
    """Decorator that measures function execution time."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # High-resolution clock intended for timing
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took {end - start:.4f} seconds")
        return result
    return wrapper

@timer
def slow_function():
    """Simulate a slow operation."""
    time.sleep(0.5)
    return "Done!"

@timer
def process_data(data):
    """Process a list of numbers."""
    return sum(x ** 2 for x in data)

print(slow_function())
print()
result = process_data(range(100000))
print(f"Result: {result}")
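
One thing the `timer` decorator above does not do is preserve the decorated function's metadata: after decoration, `slow_function.__name__` is `'wrapper'` and its docstring is lost. The standard library's `functools.wraps` addresses this. The sketch below compares the two approaches; the names `timer_plain`, `timer_wrapped`, `load_a`, and `load_b` are hypothetical.

```python
import functools
import time

def timer_plain(func):
    """Decorator without functools.wraps (metadata is lost)."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def timer_wrapped(func):
    """Timing decorator that keeps the original name and docstring."""
    @functools.wraps(func)  # Copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} seconds")
        return result
    return wrapper

@timer_plain
def load_a():
    """Load dataset A."""

@timer_wrapped
def load_b():
    """Load dataset B."""

print(load_a.__name__, load_a.__doc__)  # wrapper None
print(load_b.__name__, load_b.__doc__)  # load_b Load dataset B.
```

Keeping the metadata matters once you rely on `help()`, logging, or debugging output that reports function names.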

In [None]:
# Example: Logging decorator

def log_call(func):
    """Log function calls with arguments and results."""
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        print(f"  Args: {args}")
        print(f"  Kwargs: {kwargs}")
        result = func(*args, **kwargs)
        print(f"  Returned: {result}")
        return result
    return wrapper

@log_call
def add(a, b):
    return a + b

@log_call
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

add(5, 3)
print()
greet("Alice", greeting="Hi")

In [None]:
# Example: Validation decorator

def validate_positive(func):
    """Ensure all positional numeric arguments are positive (greater than zero)."""
    def wrapper(*args, **kwargs):
        for arg in args:
            # Reject zero as well: zero is not a positive number
            if isinstance(arg, (int, float)) and arg <= 0:
                raise ValueError(f"All arguments must be positive, got {arg}")
        return func(*args, **kwargs)
    return wrapper

@validate_positive
def calculate_area(width, height):
    return width * height

print(f"Area (5, 3): {calculate_area(5, 3)}")

try:
    calculate_area(-5, 3)
except ValueError as e:
    print(f"Error: {e}")
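
The decorator above inspects only positional arguments, so a call such as `calculate_area(5, height=-3)` would pass unchecked. A possible extension is sketched below under two assumptions of this example: zero counts as invalid, and booleans are skipped because `bool` is a subclass of `int`. The name `validate_positive_all` is hypothetical.

```python
import functools

def validate_positive_all(func):
    """Hypothetical variant: check positional and keyword numeric arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for value in (*args, *kwargs.values()):
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value <= 0:
                raise ValueError(f"All numeric arguments must be positive, got {value}")
        return func(*args, **kwargs)
    return wrapper

@validate_positive_all
def calculate_area(width, height):
    return width * height

print(calculate_area(5, height=3))  # 15

try:
    calculate_area(5, height=-3)
except ValueError as e:
    print(f"Error: {e}")  # Error: All numeric arguments must be positive, got -3
```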

In [None]:
# Example: Using functools.lru_cache for memoization

from functools import lru_cache
import time

@lru_cache(maxsize=100)
def expensive_calculation(n):
    """Simulate an expensive calculation."""
    time.sleep(0.1)  # Simulate slow computation
    return n ** 2

# First calls are slow: each one runs the function body
start = time.perf_counter()
for i in range(5):
    expensive_calculation(i)
print(f"First run: {time.perf_counter() - start:.2f} seconds")

# Repeated calls with the same arguments are served from the cache
start = time.perf_counter()
for i in range(5):
    expensive_calculation(i)
print(f"Cached run: {time.perf_counter() - start:.4f} seconds")
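
Functions decorated with `lru_cache` also expose `cache_info()` and `cache_clear()`, which help confirm that the cache is actually being hit. A minimal sketch, using a hypothetical `square` function as a stand-in for an expensive computation:

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def square(n):
    """Stand-in for an expensive computation (hypothetical)."""
    return n ** 2

for i in range(5):
    square(i)  # 5 misses: each value is computed once
for i in range(5):
    square(i)  # 5 hits: each value comes from the cache

info = square.cache_info()
print(info)  # CacheInfo(hits=5, misses=5, maxsize=100, currsize=5)

square.cache_clear()  # Empty the cache, e.g. after the underlying data changes
print(square.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=100, currsize=0)
```

Arguments to a cached function must be hashable, so passing a list raises `TypeError`; converting it to a tuple first is one workaround.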

## Practice Exercise 7.1

**Task:** Create a decorator called `repeat` that runs a function 3 times.

Test with a function that prints "Processing data..."

**Expected Output:**
```
Processing data...
Processing data...
Processing data...
```

In [None]:
# Your code here


In [None]:
# Solution 7.1

def repeat(func):
    """Decorator that runs a function 3 times."""
    def wrapper(*args, **kwargs):
        for _ in range(3):
            func(*args, **kwargs)
    return wrapper

@repeat
def process():
    print("Processing data...")

process()
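
The solution above hard-codes three repetitions and discards the function's return value. One way to make the count configurable is a decorator factory: a function that takes the setting and returns the actual decorator. The sketch below uses the hypothetical name `repeat_n` and, as a design choice of this example, returns the result of the last call.

```python
import functools

def repeat_n(times):
    """Decorator factory: returns a decorator that calls func `times` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result  # Result of the last call (None if times is 0)
        return wrapper
    return decorator

@repeat_n(times=2)
def process():
    print("Processing data...")
    return "ok"

last = process()
print(last)

# Processing data...
# Processing data...
# ok
```

Here `@repeat_n(times=2)` is shorthand for `process = repeat_n(times=2)(process)`: the factory is called first, and its return value is applied as the decorator.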

## Practice Exercise 7.2

**Task:** Create a decorator called `debug` that prints:
- The function name being called
- The arguments passed
- The result returned

Test with a function that calculates the average of numbers.

**Expected Output:**
```
DEBUG: Calling calculate_avg
DEBUG: Arguments: ([10, 20, 30, 40],)
DEBUG: Returned: 25.0
Average: 25.0
```

In [None]:
# Your code here


In [None]:
# Solution 7.2

def debug(func):
    """Decorator that prints debug information."""
    def wrapper(*args, **kwargs):
        print(f"DEBUG: Calling {func.__name__}")
        print(f"DEBUG: Arguments: {args}")
        result = func(*args, **kwargs)
        print(f"DEBUG: Returned: {result}")
        return result
    return wrapper

@debug
def calculate_avg(numbers):
    return sum(numbers) / len(numbers)

result = calculate_avg([10, 20, 30, 40])
print(f"Average: {result}")

---
# Module Summary

## Key Takeaways

### Defining Functions
- Use `def function_name(parameters):` to define functions
- Include docstrings to document what functions do
- Functions make code reusable and organized

### Parameters and Arguments
- Positional vs keyword arguments
- Default parameter values for optional settings
- `*args` for variable positional arguments
- `**kwargs` for variable keyword arguments

### Return Values
- Use `return` to send values back
- Can return multiple values as tuples
- Return dictionaries for named results
- Early return for validation

### Lambda Functions
- Syntax: `lambda arguments: expression`
- Great with `sorted()`, `map()`, `filter()`
- Keep them simple and readable

### Built-in Functions
- `len()`, `sum()`, `min()`, `max()` for aggregations
- `sorted()` with custom keys
- `all()` and `any()` for conditions
- `enumerate()` and `zip()` for iteration

### Scope
- LEGB rule: Local → Enclosing → Global → Built-in
- Prefer passing data as arguments over global variables
- Use the `global` keyword sparingly

### Decorators
- Functions that wrap other functions to extend their behavior
- `@decorator` syntax for clean application
- Common uses: timing, logging, caching, validation

---

## Next Module

In **Module 5: File Handling**, we'll learn about:
- Reading and writing text files
- Working with CSV files
- Handling JSON data
- Using context managers for safe file operations

File handling is essential for loading data into your Python programs!

---

## Additional Practice

For extra practice, try these challenges:

1. **Statistics Calculator**: Create a function that takes a list of numbers and returns a dictionary with count, mean, median, mode, standard deviation, and variance.

2. **Data Validator**: Create a function with `**kwargs` that validates different data types (email, phone, age, etc.) and returns validation results.

3. **Retry Decorator**: Create a decorator that retries a function up to 3 times if it raises an exception, with a delay between retries.

4. **Pipeline Builder**: Create functions that can be chained together to process data (clean → transform → validate → analyze).