# 🐍 Notebook 02: Python Essentials for Generative AI

**Week 1-2: Python & ML Foundations**  
**Gen AI Masters Program**

---

## 📋 Objectives

Welcome to the second notebook in our Python foundations module. While the previous notebook focused on setting up your environment, this one dives deep into the **core Python programming concepts** that are essential for building any AI or machine learning application. A solid grasp of these fundamentals is non-negotiable for a successful career in this field.

By the end of this notebook, you will have a strong command of:
1.  **Core Data Structures**: Master the use of lists, dictionaries, sets, and tuples for storing and organizing data.
2.  **Control Flow**: Implement sophisticated logic using `if/else` statements, loops, and highly efficient comprehensions.
3.  **Modular Code**: Write clean, reusable, and maintainable code with functions and lambda expressions.
4.  **Object-Oriented Programming (OOP)**: Structure your code effectively with classes, a key skill for building complex systems like model training pipelines.
5.  **Robustness and I/O**: Learn to handle files for data loading/saving and manage potential errors gracefully with exception handling.
6.  **Pythonic Best Practices**: Write code that is not just functional but also efficient, readable, and maintainable, which is the hallmark of a professional developer.

**Estimated Time:** 2-3 hours

---

## 📚 Why is Python the Undisputed Language of AI?

Python's dominance in the AI and machine learning landscape is a result of a powerful combination of factors:

-   🚀 **A Rich and Mature Ecosystem**: Python is the home of essential libraries that power modern AI, including **PyTorch**, **TensorFlow**, and **Hugging Face Transformers**. This ecosystem provides pre-built tools for nearly every task, from model training to deployment.
-   📊 **Data Science Powerhouses**: It seamlessly integrates with libraries like **NumPy**, **Pandas**, and **Matplotlib**, which are the industry standards for numerical computation, data manipulation, and visualization.
-   🤝 **Simple and Flexible Integration**: Python's simplicity allows it to be easily connected with web frameworks (like Flask or FastAPI for creating APIs), databases, and cloud services, making it ideal for building end-to-end AI applications.
-   👥 **A Vibrant and Supportive Community**: Python is backed by a massive global community of developers and researchers. This means you have access to extensive documentation, tutorials, and open-source libraries for almost any problem you might encounter.

This notebook will equip you with the essential Python skills needed to leverage this powerful ecosystem and build cutting-edge AI applications. Let's get started! 🎯

## 1️⃣ Section 1: Python's Core Data Structures

Data structures are the fundamental constructs used to organize, manage, and store data in a program. Choosing the right data structure is crucial for writing efficient and readable code. In AI, they are used to store everything from **model configurations** and **training data batches** to **tokenized text sequences** and **probability distributions** from a model's output layer.

### Lists: Ordered, Mutable Collections

A **list** is one of the most versatile data structures in Python. It is an ordered and mutable collection, meaning you can change its contents, add new items, or remove existing ones.

-   **Ordered**: The items in a list have a defined order, and that order will not change unless you explicitly modify it.
-   **Mutable**: You can add, remove, or change items in a list after it has been created.

In AI, lists are commonly used for:
-   Storing sequences of data, like a series of text prompts for a language model.
-   Holding a history of model outputs or training loss values.
-   Representing a batch of data points before they are converted into a tensor.

In [None]:
# --- List Creation and Basic Operations ---
# Here, we define two lists: one for popular language models and another for their corresponding performance scores.
# This is a common pattern for storing related data, although dictionaries or lists of objects are often better for more complex scenarios.
models = ["GPT-4", "Claude 3", "Gemini 1.5", "Llama 3"]
scores = [0.95, 0.93, 0.91, 0.89]

print(f"Initial models: {models}")
# Accessing elements by index is a fundamental list operation. Python uses 0-based indexing.
print(f"Score of the first model ('{models[0]}'): {scores[0]}")

# --- Modifying Lists ---
# Lists are mutable, which means you can change their content after they are created.
# The .append() method adds an item to the end of the list.
models.append("Mistral")
scores.append(0.85)
print(f"\nUpdated models after append: {models}")

# --- Slicing Lists ---
# Slicing is a powerful feature for accessing sub-lists. The syntax is `list[start:stop:step]`.
# The `stop` index is exclusive, meaning the element at that index is not included.
print(f"The top 2 models are: {models[:2]}")  # From the beginning up to (but not including) index 2
print(f"Models from the third position onwards: {models[2:]}")  # From index 2 to the end

# --- Common List Operations ---
# The `len()` function returns the number of items in a list.
print(f"\nTotal number of models: {len(models)}")
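# The `in` operator checks whether an element exists in a list. For a list this is a linear scan,
# which is fine for small collections; sets and dictionaries offer much faster membership tests.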
print(f"Is 'Gemini 1.5' in our list? {'Gemini 1.5' in models}")

### Dictionaries: Key-Value Pairs

A **dictionary** is a collection of key-value pairs. Unlike lists, which are indexed by a range of numbers, dictionaries are indexed by **keys**, which can be any hashable type (such as strings, numbers, or tuples that contain only hashable items).

-   **Insertion-Ordered**: In Python versions before 3.7, the language did not guarantee any ordering for dictionaries. Since Python 3.7, they are **insertion ordered**, meaning they remember the order in which items were inserted.
-   **Mutable**: You can add, remove, or change key-value pairs.
-   **Efficient Lookups**: Dictionaries are hash tables, so retrieving a value by its key takes roughly constant time on average, regardless of size. Searching a list for an element, by contrast, scans the items one by one.

Dictionaries are perfect for storing structured information, such as:
-   **Hyperparameters** for a model (e.g., learning rate, batch size).
-   **Metadata** for a dataset (e.g., name, version, author).
-   **JSON-like objects** returned from an API call.

In [None]:
# --- Dictionary for Model Configuration ---
# Dictionaries are the ideal data structure for storing model hyperparameters and configurations.
# The keys are strings that describe the hyperparameter, and the values are the settings.
model_config = {
    "name": "GPT-4",
    "parameters": "1.8T",  # Illustrative value; OpenAI has not published GPT-4's parameter count.
    "context_length": 128000,
    "temperature": 0.7,
    "top_p": 0.9,
    "provider": "OpenAI"
}

print("--- Model Configuration ---")
# The .items() method returns a view of the dictionary's (key, value) pairs.
# This is useful for iterating over all key-value pairs in a dictionary.
for key, value in model_config.items():
    print(f"  - {key.replace('_', ' ').title()}: {value}")

# --- Accessing and Modifying Dictionary Values ---
# You can access the value of a key using square bracket notation.
# This will raise a `KeyError` if the key does not exist.
print(f"\nModel Name: {model_config['name']}")

# A safer way to access keys is using the .get() method.
# It returns `None` (or a specified default value) if the key is not found, avoiding an error.
print(f"Model's learning rate: {model_config.get('learning_rate', 'Not specified')}")

# Add a new key-value pair or update an existing one.
model_config["status"] = "production"
print(f"\nUpdated config with status: {model_config}")
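
The objectives for this notebook also list **tuples** and **sets**, so here is a minimal sketch of both. A tuple is an ordered, immutable sequence; a set is an unordered collection of unique, hashable items with fast membership tests. All variable names and values below (`image_shape`, `results`, `tokens`, and so on) are illustrative assumptions, not part of the course material.

```python
# --- Tuples: ordered and immutable, handy for fixed-size records ---
image_shape = (3, 224, 224)  # (channels, height, width); illustrative values
channels, height, width = image_shape  # Tuple unpacking assigns each item to a name.

try:
    image_shape[0] = 1  # Tuples do not support item assignment.
except TypeError as error:
    print(f"Tuples are immutable: {error}")

# A tuple of hashable items is itself hashable, so it can serve as a dictionary key.
results = {("GPT-4", "summarization"): 0.95}  # Hypothetical score, for illustration only.
print(f"Lookup by tuple key: {results[('GPT-4', 'summarization')]}")

# --- Sets: unordered collections of unique items ---
tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocabulary = set(tokens)  # The duplicate "the" is dropped.
print(f"Unique tokens: {sorted(vocabulary)}")
print(f"Is 'cat' in the vocabulary? {'cat' in vocabulary}")

# Set algebra is useful for comparing vocabularies or label sets.
other_vocabulary = {"the", "dog", "sat"}
shared = vocabulary & other_vocabulary      # Intersection
only_first = vocabulary - other_vocabulary  # Difference
print(f"Shared: {sorted(shared)}, only in first: {sorted(only_first)}")
```

Because sets have no defined order, the example sorts them before printing so the output is stable.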

## 2️⃣ Section 2: Control Flow

**Control flow** refers to the order in which the statements in your program are executed. Python provides several constructs to control this flow, allowing you to execute code conditionally and repeatedly. These form the logical core of any program.

### Conditional Logic: `if`, `elif`, `else`

Conditional statements allow you to execute certain blocks of code only when specific conditions are met. This is fundamental for decision-making in a program.

In machine learning, conditional statements are used everywhere:
-   **Making decisions** based on model performance metrics (e.g., if accuracy > 90%, deploy the model).
-   **Controlling the training loop** (e.g., if validation loss stops improving, stop training early).
-   **Preprocessing data** based on feature values (e.g., if a value is missing, fill it with the mean).

In [None]:
# --- Function for Model Evaluation ---
# This function uses conditional logic (`if`, `elif`, `else`) to classify a model's performance.
# Using functions to encapsulate logic like this is a core principle of good programming.
def evaluate_model_performance(score: float) -> str:
    """
    Evaluates a model's performance tier based on a given score.
    
    Args:
        score (float): The performance score of the model (e.g., accuracy, F1-score).

    Returns:
        str: A string describing the performance tier.
    """
    if score >= 0.95:
        return "Tier 1: Excellent performance. Ready for production."
    elif score >= 0.85:
        return "Tier 2: Good performance. Suitable for fine-tuning or specific tasks."
    elif score >= 0.75:
        return "Tier 3: Average performance. Consider retraining or using a different model."
    else:
        return "Tier 4: Poor performance. Requires significant review and retraining."

# --- Testing the Evaluation Function ---
# It's crucial to test your functions with a range of inputs to ensure they work as expected.
test_scores = [0.98, 0.87, 0.76, 0.65]

print("--- Model Performance Evaluation ---")
for score in test_scores:
    evaluation = evaluate_model_performance(score)
    print(f"Score: {score:.2f} -> {evaluation}")
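
The early-stopping use case mentioned at the start of this section combines a loop with conditional logic. The sketch below is one simple way to express it, assuming a "patience" rule (stop after a fixed number of epochs without improvement). The function name `find_early_stop_epoch` and the loss values are hypothetical; real training frameworks provide their own early-stopping utilities.

```python
from typing import List, Optional


def find_early_stop_epoch(val_losses: List[float], patience: int = 2) -> Optional[int]:
    """
    Returns the 1-based epoch at which training would stop early,
    or None if the loss keeps improving often enough.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # The model improved: remember the new best and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Simulated validation losses: improvement stalls after the third epoch.
simulated_losses = [0.90, 0.72, 0.65, 0.66, 0.67, 0.64]
stop_epoch = find_early_stop_epoch(simulated_losses, patience=2)
print(f"Training would stop at epoch: {stop_epoch}")
```

With these values the loss fails to improve at epochs 4 and 5, so the function returns 5 and the final epoch is never reached.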

### Loops and Comprehensions

**Loops** are used for iterating over a sequence (like a list, tuple, dictionary, or string) and executing a block of code for each item.

**Comprehensions** provide a concise, elegant, and often more efficient way to create lists, dictionaries, or sets from other iterables. They are considered more "Pythonic" than traditional loops for many common tasks and can result in more readable code.

-   **List Comprehension**: `[expression for item in iterable if condition]`
-   **Dictionary Comprehension**: `{key_expression: value_expression for item in iterable if condition}`
-   **Set Comprehension**: `{expression for item in iterable if condition}`

In AI, you might use comprehensions to:
-   Quickly preprocess a list of text documents.
-   Create a dictionary mapping words to their frequencies.
-   Filter a list of results based on a confidence score.

In [None]:
# --- Standard `for` loop ---
# A `for` loop is the standard way to iterate over a sequence.
prompts = ["Summarize this long article", "Translate this sentence to Spanish", "Extract key entities from this text"]

print("--- Iterating with a standard `for` loop: ---")
# `enumerate` is a useful function that adds a counter to an iterable.
# It returns pairs of (index, item), which is often cleaner than managing the index manually.
for i, prompt in enumerate(prompts, 1):  # Start counting from 1
    print(f"  Prompt {i}: '{prompt}'")

# --- List Comprehension ---
# This is a more concise and often faster way to create a new list by transforming an existing one.
# The following line is equivalent to:
# prompt_lengths = []
# for p in prompts:
#     prompt_lengths.append(len(p))
print("\n--- Using a list comprehension to get prompt lengths: ---")
prompt_lengths = [len(p) for p in prompts]
print(f"  Lengths of prompts: {prompt_lengths}")

# --- List Comprehension with Filtering ---
# Comprehensions can also include a conditional `if` clause to filter items.
# This creates a new list containing only the prompts that are longer than 30 characters.
print("\n--- Using a list comprehension to find long prompts: ---")
long_prompts = [p for p in prompts if len(p) > 30]
print(f"  Long prompts: {long_prompts}")

# --- Dictionary Comprehension ---
# You can also use comprehensions to create dictionaries.
# Here, we create a dictionary mapping each prompt to its length.
print("\n--- Using a dictionary comprehension: ---")
prompt_length_map = {p: len(p) for p in prompts}
print(f"  Prompt-to-length map: {prompt_length_map}")
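
The cell above covers list and dictionary comprehensions. As a complementary sketch, the example below shows a set comprehension, a word-frequency dictionary, and confidence-based filtering, which are the remaining use cases listed earlier. The sample `documents` and `predictions` are made up for illustration, and `collections.Counter` is used as a standard-library shortcut for counting.

```python
from collections import Counter

documents = [
    "Generative AI is here",
    "AI is transforming industries",
    "Python powers AI",
]

# --- Set comprehension: collect the unique lowercase words across all documents ---
unique_words = {word.lower() for doc in documents for word in doc.split()}
print(f"Number of unique words: {len(unique_words)}")

# --- Word frequencies: Counter consumes a generator expression ---
word_frequencies = Counter(word.lower() for doc in documents for word in doc.split())

# --- Dictionary comprehension with a filter: keep words that appear more than once ---
frequent_words = {word: count for word, count in word_frequencies.items() if count > 1}
print(f"Words appearing more than once: {frequent_words}")

# --- List comprehension with a filter: keep labels above a confidence threshold ---
predictions = [("cat", 0.92), ("dog", 0.41), ("bird", 0.78)]  # (label, confidence) pairs
confident_labels = [label for label, confidence in predictions if confidence >= 0.75]
print(f"Confident labels: {confident_labels}")
```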

## 3️⃣ Section 3: Functions and Lambda Expressions

**Functions** are reusable blocks of code that perform a specific task. They are the cornerstone of writing modular, organized, and maintainable programs. By breaking down a complex problem into smaller, manageable functions, you make your code easier to read, debug, and scale.

In machine learning, functions are used to define everything from:
-   **Data preprocessing steps** (e.g., cleaning text, normalizing images).
-   **Model training and evaluation loops**.
-   **API endpoints** that serve model predictions.

A well-written function should have:
-   A clear, descriptive name.
-   A **docstring** that explains what it does, its parameters, and what it returns.
-   **Type hints** to specify the expected data types for arguments and the return value.

In [None]:
import string

# --- A Well-Documented Function with Type Hinting ---
# This function demonstrates several best practices:
# - Type hints (`text: str`, `-> str`) make the function's contract clear.
# - Default arguments (`lowercase: bool = True`) make the function more flexible.
# - A detailed docstring explains the function's purpose, arguments, and return value.
def preprocess_text(
    text: str, 
    lowercase: bool = True, 
    remove_punctuation: bool = False
) -> str:
    """
    Cleans and preprocesses raw text for Natural Language Processing (NLP) tasks.

    Args:
        text (str): The input string to be processed.
        lowercase (bool): If True, converts the text to lowercase. Defaults to True.
        remove_punctuation (bool): If True, removes all punctuation marks. Defaults to False.

    Returns:
        str: The cleaned and preprocessed text.
    """
    # Apply lowercase transformation if the flag is set
    if lowercase:
        text = text.lower()
    
    # Apply punctuation removal if the flag is set
    if remove_punctuation:
        # `str.maketrans` creates a translation table. Here, it's configured to map
        # every character in `string.punctuation` to `None`, effectively deleting them.
        translator = str.maketrans('', '', string.punctuation)
        text = text.translate(translator)
    
    # `strip()` removes any leading or trailing whitespace.
    return text.strip()

# --- Testing the Preprocessing Function ---
sample_text = "  Hello, World! This is the Gen AI Masters Program.  "
print(f"Original text: '{sample_text}'")

# Test case 1: Default behavior (lowercase, keep punctuation)
processed_default = preprocess_text(sample_text)
print(f"Processed (default): '{processed_default}'")

# Test case 2: Both lowercase and remove punctuation
processed_full = preprocess_text(sample_text, remove_punctuation=True)
print(f"Processed (lowercase & no punctuation): '{processed_full}'")

# Test case 3: Keep original case, remove punctuation
processed_no_lower = preprocess_text(sample_text, lowercase=False, remove_punctuation=True)
print(f"Processed (no lowercase, no punctuation): '{processed_no_lower}'")

### Lambda Functions: Small, Anonymous Functions

A **lambda function** is a small, anonymous function defined with the `lambda` keyword. It can take any number of arguments but can only have one expression.

**Syntax**: `lambda arguments: expression`

Lambda functions are syntactically restricted: the body must be a single expression, so it cannot contain statements (such as assignments or `return`) or type annotations. They are best suited for short, one-off operations where a full function definition would be overly verbose.

They are particularly useful when working with higher-order functions (functions that take other functions as arguments), such as:
-   `map()`: Applies a function to every item of an iterable.
-   `filter()`: Creates a new iterable with elements that satisfy a condition.
-   `sorted()`: Sorts an iterable, optionally with a custom key.

In [None]:
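# --- Lambda for a Simple Calculation ---
# This lambda function performs min-max normalization, a common technique in machine learning
# to scale features to a fixed range (usually 0 to 1).
# Note: the lambda is bound to a name here only for illustration; PEP 8 recommends `def` for
# named functions. The formula also assumes max_val != min_val (otherwise it divides by zero).
normalize = lambda x, min_val, max_val: (x - min_val) / (max_val - min_val)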

scores = [65, 78, 82, 95, 100]
min_score, max_score = min(scores), max(scores)

# We can use this lambda function directly within a list comprehension for a concise transformation.
normalized_scores = [normalize(s, min_score, max_score) for s in scores]
print(f"Original scores: {scores}")
# Using an f-string with formatting to make the output cleaner.
print(f"Normalized scores (0-1): {[f'{s:.2f}' for s in normalized_scores]}")


# --- Using Lambda with `map` and `filter` ---
# These built-in functions are powerful when combined with lambdas for data processing.
texts = ["hello world", "generative ai is transforming industries", "python is a key skill"]

# `map` applies the lambda function to every item in the `texts` list.
# `map` returns a map object, so we convert it to a list to see the results.
word_counts = list(map(lambda t: len(t.split()), texts))
print(f"\nWord count of each text: {word_counts}")

# `filter` applies the lambda function to each item and returns only those for which the function returns True.
# Like `map`, `filter` returns an iterator, so we convert it to a list.
long_texts = list(filter(lambda t: len(t) > 20, texts))
print(f"Texts longer than 20 characters: {long_texts}")
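
The cell above demonstrates `map()` and `filter()`. To round out the list of higher-order functions, here is a short sketch of `sorted()` (and the closely related `max()`) with a lambda as the `key` argument. The `leaderboard` data is illustrative.

```python
# Each entry is a (model_name, score) pair; the values are illustrative.
leaderboard = [("Llama 3", 0.89), ("GPT-4", 0.95), ("Mistral", 0.85), ("Claude 3", 0.93)]

# `key` tells sorted() which value to compare; here, the score at index 1.
# `reverse=True` sorts from highest to lowest. The original list is left unchanged.
ranked = sorted(leaderboard, key=lambda entry: entry[1], reverse=True)
print(f"Ranked by score: {ranked}")

# max() accepts the same kind of `key` function.
best_name, best_score = max(leaderboard, key=lambda entry: entry[1])
print(f"Best model: {best_name} ({best_score})")

# When an existing function already does the job, pass it directly instead of a lambda.
texts = ["python is a key skill", "hello world", "generative ai is transforming industries"]
texts_by_length = sorted(texts, key=len)
print(f"Texts from shortest to longest: {texts_by_length}")
```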

## 4️⃣ Section 4: Object-Oriented Programming (OOP)

**Object-Oriented Programming (OOP)** is a programming paradigm that structures a program around "objects" rather than functions and logic. An object is a self-contained unit that combines data (called **attributes**) and behavior that operates on that data (called **methods**).

The core principles of OOP are:
-   **Encapsulation**: Bundling data and methods that work on that data within one unit (the class).
-   **Abstraction**: Hiding the complex implementation details and exposing only the necessary parts of an object.
-   **Inheritance**: Allowing a new class (child) to inherit attributes and methods from an existing class (parent).
-   **Polymorphism**: Allowing objects of different classes to be treated as objects of a common superclass.

In machine learning, classes are essential for creating reusable and organized components, such as:
-   **Models**: A `Model` class can encapsulate the model architecture, its weights (attributes), and its forward pass logic (a method).
-   **DataLoaders**: A `DataLoader` class can manage a dataset, batching, and preprocessing logic.
-   **Training Pipelines**: A `Trainer` class can orchestrate the entire training process, including the training loop, validation, and saving checkpoints.

In [None]:
from dataclasses import dataclass, asdict
from typing import List

# --- Using `@dataclass` for Configuration ---
# Dataclasses are a modern Python feature (introduced in 3.7) that provide a concise
# way to create classes that are primarily used for storing data.
# They automatically generate special methods like `__init__`, `__repr__`, and `__eq__`.
@dataclass
class LLMConfig:
    """A dataclass to hold the configuration for a Language Model."""
    name: str
    model_id: str
    temperature: float = 0.7
    max_tokens: int = 1024
    top_p: float = 0.9

# --- Defining a Class for a Text Generator ---
class TextGenerator:
    """A simple class representing a text generation model."""
    
    def __init__(self, config: LLMConfig):
        """
        Initializes the generator with a given configuration.
        The `__init__` method initializes a new instance (it is commonly called the constructor).
        """
        self.config = config
        self.history: List[str] = []  # An attribute to store generation history
        print(f"Initialized TextGenerator with model: '{self.config.name}'")
    
    def generate(self, prompt: str) -> str:
        """
        Simulates text generation based on a prompt.
        This is a method, a function that belongs to the class.
        """
        print(f"\nGenerating response for prompt: '{prompt}'...")
        # In a real-world scenario, this method would contain the logic to call a model's API
        # or run a local model to get a response.
        response = f"[{self.config.name}] This is a simulated response to your prompt."
        self.history.append(prompt)  # Modify the object's state
        return response
    
    def clear_history(self):
        """Clears the generation history."""
        self.history = []
        print("\nGeneration history has been cleared.")
    
    def __repr__(self) -> str:
        """
        Provides a developer-friendly string representation of the object.
        This is useful for debugging and logging.
        """
        return f"TextGenerator(model='{self.config.name}', history_count={len(self.history)})"

# --- Creating and Using an Instance of the Class ---
# 1. Create a configuration object using our dataclass.
gpt4_config = LLMConfig(name="GPT-4 Turbo", model_id="gpt-4-1106-preview")

# 2. Instantiate the TextGenerator class with the configuration object.
# This creates an "object" or "instance" of the class.
my_generator = TextGenerator(config=gpt4_config)
print(my_generator)  # This calls the __repr__ method

# 3. Use the object's methods to perform actions.
response1 = my_generator.generate("What is Generative AI?")
print(f"  -> Response: {response1}")

response2 = my_generator.generate("How do transformers work?")
print(f"  -> Response: {response2}")

# 4. Inspect the object's state (its attributes).
print(f"\nCurrent generator state: {my_generator}")
print(f"Generation history: {my_generator.history}")

# Dataclasses provide a handy `asdict` function to convert an instance to a dictionary,
# which is useful for serialization (e.g., saving to a JSON file).
print(f"Configuration as dictionary: {asdict(gpt4_config)}")
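
The `TextGenerator` example mainly illustrates encapsulation. Inheritance and polymorphism, the other principles listed at the start of this section, can be sketched as follows. The classes `BaseGenerator`, `EchoGenerator`, and `ShoutGenerator` are hypothetical and exist only for this illustration; using `abc.ABC` for the parent is one common design choice, not the only one.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseGenerator(ABC):
    """Hypothetical parent class that defines a shared interface."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Each subclass must provide its own generation logic."""

    def describe(self) -> str:
        # Inherited unchanged by every subclass.
        return f"{type(self).__name__}(name='{self.name}')"


class EchoGenerator(BaseGenerator):
    """Child class: inherits `__init__` and `describe`, overrides `generate`."""

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] You said: {prompt}"


class ShoutGenerator(BaseGenerator):
    """Another child class with different behavior behind the same method name."""

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt.upper()}!"


# Polymorphism: the loop calls `generate` without knowing the concrete class.
generators: List[BaseGenerator] = [EchoGenerator("echo-v1"), ShoutGenerator("shout-v1")]
outputs = [generator.generate("hello") for generator in generators]

for generator, output in zip(generators, outputs):
    print(f"{generator.describe()} -> {output}")
```

Because `generate` is marked with `@abstractmethod`, Python refuses to instantiate `BaseGenerator` directly, which makes the interface explicit.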

## 5️⃣ Section 5: File I/O and Exception Handling

**File I/O (Input/Output)** is the process of reading from and writing to files on the file system. This is a daily task in machine learning, whether you're:
-   Loading a dataset from a CSV or JSON file.
-   Saving a trained model's weights (a "checkpoint").
-   Writing logs to track the progress of a training run.

**Exception Handling** is a critical mechanism for building robust and reliable programs. It allows you to gracefully handle errors that might occur during program execution (e.g., a file not being found, a network error) instead of letting your program crash.

The primary tool for this in Python is the **`try...except`** statement:
-   The **`try`** block contains the code that might raise an error.
-   The **`except`** block contains the code that will be executed if a specific error occurs.
-   The optional **`else`** block contains code that runs only if the `try` block raised no error.
-   The **`finally`** block contains code that will be executed no matter what, whether an error occurred or not (useful for cleanup operations).

In [None]:
import json
from pathlib import Path

# --- Creating a Sample Dataset ---
# This dictionary represents a simple dataset we want to save to a file.
# This structure is very similar to what you would find in a JSON file.
dataset = {
    "metadata": {
        "name": "Instruction Prompts Dataset",
        "version": "1.0",
        "author": "Gen AI Masters Program",
        "total_samples": 3
    },
    "prompts": [
        {"id": "001", "task": "summarization", "text": "Summarize the provided article on climate change."},
        {"id": "002", "task": "translation", "text": "Translate 'Hello, how are you?' to French."},
        {"id": "003", "task": "classification", "text": "Classify this movie review as positive or negative."}
    ]
}

# --- Writing to and Reading from a JSON File ---
# Using `pathlib.Path` is the modern, object-oriented way to handle file paths.
output_file = Path("sample_dataset.json")

# The `try...except...finally` block allows us to handle potential errors gracefully.
try:
    print(f"--- Writing data to '{output_file}' ---")
    # The `with` statement is a context manager that ensures the file is automatically
    # closed, even if errors occur. This is the recommended way to work with files.
    with output_file.open('w', encoding='utf-8') as f:
        # `json.dump` serializes a Python dictionary into a JSON formatted string and writes it to the file.
        # `indent=4` makes the JSON file human-readable.
        json.dump(dataset, f, indent=4)
    print("✅ Successfully saved dataset.")
    
    print(f"\n--- Reading data back from '{output_file}' ---")
    with output_file.open('r', encoding='utf-8') as f:
        # `json.load` deserializes a JSON formatted string from a file back into a Python dictionary.
        loaded_dataset = json.load(f)
    
    print("✅ Successfully loaded dataset.")
    print(f"   - Dataset Name: {loaded_dataset['metadata']['name']}")
    print(f"   - Number of prompts: {len(loaded_dataset['prompts'])}")

# --- Graceful Error Handling ---
# We can catch specific errors to provide more helpful feedback.
except FileNotFoundError:
    print(f"❌ Error: The file '{output_file}' was not found during the read operation.")
except json.JSONDecodeError:
    print("❌ Error: Failed to decode JSON. The file may be corrupted or not in valid JSON format.")
except Exception as e:
    # This is a general catch-all for any other unexpected errors.
    print(f"❌ An unexpected error occurred: {e}")
finally:
    # The `finally` block is always executed, making it ideal for cleanup tasks
    # like closing connections or releasing resources.
    print("\n✨ File I/O operations complete.")
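
In practice, this kind of error handling is often wrapped in a small helper so that callers get a usable value even when loading fails. The sketch below also uses the optional `else` clause, which runs only when the `try` block succeeds. The helper name `load_json_or_default` and the file name are hypothetical, and the example assumes no file with that name exists in the working directory.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_json_or_default(path: Path, default: Dict[str, Any]) -> Dict[str, Any]:
    """Returns the parsed JSON from `path`, or `default` if the file is missing or invalid."""
    try:
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        print(f"'{path}' was not found; falling back to the default.")
        return default
    except json.JSONDecodeError as error:
        print(f"'{path}' is not valid JSON ({error}); falling back to the default.")
        return default
    else:
        # Runs only if the `try` block finished without raising an exception.
        print(f"Loaded '{path}' successfully.")
        return data
    finally:
        # Runs in every case, just before the function actually returns.
        print(f"Finished attempting to load '{path}'.")


fallback = {"prompts": []}
missing = load_json_or_default(Path("no_such_dataset_12345.json"), fallback)
print(f"Result for a missing file: {missing}")
```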

## 6️⃣ Section 6: Python Best Practices for Machine Learning

Writing code that is not only correct but also **clean, efficient, and maintainable** is crucial for building robust and scalable ML systems. Adhering to best practices makes your code easier to debug, reuse, and collaborate on.

### The Importance of Type Hints and Docstrings

As demonstrated in previous examples, clear documentation and type hints are not just "nice-to-haves"; they are essential for professional software development.

-   **Type Hints**: By specifying the expected data types of function arguments and return values (for example, a signature such as `def tokenize(text: str) -> List[str]:`), you make your code more self-documenting. This also allows static analysis tools and IDEs (like VS Code) to catch potential bugs before you even run the code.
-   **Docstrings**: A good docstring explains the "why" and "how" of a function. It should describe what the function does, what its parameters are, what it returns, and any errors it might raise. This is invaluable for anyone using your code, including your future self.

In [None]:
from typing import List, Dict
import numpy as np

def calculate_regression_metrics(
    y_true: List[float], 
    y_pred: List[float]
) -> Dict[str, float]:
    """
    Calculates key regression metrics: Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

    This function is a good example of Python best practices:
    - Clear type hints for inputs and outputs.
    - A comprehensive docstring explaining the function, its arguments, return value, and potential errors.
    - An example of usage right in the docstring (which can even be used for automated testing with `doctest`).
    - Defensive programming with a check for input validity.

    Args:
        y_true (List[float]): A list of ground truth target values.
        y_pred (List[float]): A list of predicted values from the model.

    Returns:
        Dict[str, float]: A dictionary containing the calculated MSE and RMSE, 
                          rounded to four decimal places, along with the sample count.
                          
    Raises:
        ValueError: If the input lists `y_true` and `y_pred` have different lengths.
    
    Example:
        >>> true_values = [1.0, 2.0, 3.0]
        >>> pred_values = [1.1, 2.1, 2.9]
        >>> metrics = calculate_regression_metrics(true_values, pred_values)
        >>> print(f"MSE: {metrics['mse']:.4f}, RMSE: {metrics['rmse']:.4f}")
        MSE: 0.0100, RMSE: 0.1000
    """
    # Input validation is crucial for robust functions.
    if len(y_true) != len(y_pred):
        raise ValueError("Input lists `y_true` and `y_pred` must have the same length.")
    
    # Using libraries like NumPy for numerical operations is highly recommended.
    # NumPy arrays allow for efficient, vectorized calculations that are much faster than Python loops.
    y_true_np = np.array(y_true)
    y_pred_np = np.array(y_pred)
    
    # Calculate Mean Squared Error (MSE)
    # This is the average of the squared differences between the predicted and actual values.
    mse = np.mean((y_true_np - y_pred_np) ** 2)
    
    # Calculate Root Mean Squared Error (RMSE)
    # This is the square root of the MSE and is often preferred as it is in the same units as the target variable.
    rmse = np.sqrt(mse)
    
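    # Convert the NumPy scalars to built-in floats so the dictionary holds plain Python numbers
    # (NumPy 2 would otherwise display them as `np.float64(...)`).
    return {
        "mse": round(float(mse), 4),
        "rmse": round(float(rmse), 4),
        "samples": len(y_true)
    }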

# --- Test the metrics function ---
true_values = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_values = [1.1, 2.2, 2.9, 4.1, 5.2]

try:
    metrics = calculate_regression_metrics(true_values, predicted_values)
    print(f"Calculated Regression Metrics: {metrics}")
    
    # Test the error handling
    print("\n--- Testing error handling with mismatched lengths ---")
    calculate_regression_metrics([1, 2], [1, 2, 3])
except ValueError as e:
    print(f"Caught expected error: {e}")
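
The docstring above notes that `Example:` sections can double as automated tests via `doctest`. Here is a minimal, self-contained sketch of that idea. The function `mean_absolute_error` is a hypothetical helper written for this illustration; the example parses its docstring explicitly with `doctest.DocTestParser` so that it behaves the same in a script and in a notebook.

```python
import doctest
from typing import List


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """
    Calculates the average absolute difference between two equal-length lists.

    Example:
        >>> mean_absolute_error([1.0, 3.0], [2.0, 5.0])
        1.5
        >>> mean_absolute_error([2.0], [2.0])
        0.0
    """
    if len(y_true) != len(y_pred):
        raise ValueError("Inputs must have the same length.")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Extract the `>>>` examples from the docstring and run them as tests.
parser = doctest.DocTestParser()
docstring_test = parser.get_doctest(
    mean_absolute_error.__doc__,
    globs={"mean_absolute_error": mean_absolute_error},  # Names visible to the examples
    name="mean_absolute_error",
    filename=None,
    lineno=0,
)
results = doctest.DocTestRunner(verbose=False).run(docstring_test)
print(f"Doctest examples attempted: {results.attempted}, failed: {results.failed}")
```

For everyday use, calling `doctest.testmod()` runs every docstring example in the current module in one step.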

### Advanced Python: Decorators and Context Managers

These are more advanced but powerful Python features that you will frequently encounter in ML libraries and frameworks.

-   **Decorators**: A decorator is a function that takes another function as an argument and extends its behavior without explicitly modifying it. They are a form of metaprogramming and are heavily used for tasks like:
    -   **Logging**: Automatically log when a function is called and what it returns.
    -   **Timing**: Measure the execution time of a function.
    -   **Caching**: Cache the results of expensive function calls.
    -   **Authentication/Authorization**: In web frameworks, decorators are used to check if a user is logged in before allowing them to access an endpoint.

-   **Context Managers** (`with` statements): A context manager is an object that defines the methods `__enter__` and `__exit__`. It is used to manage resources by ensuring that setup and teardown operations are always executed. The `with` statement guarantees that the `__exit__` method is called, even if an error occurs within the block. This is crucial for:
    -   **File Handling**: Automatically closing files to prevent resource leaks.
    -   **Database Connections**: Ensuring a database connection is closed.
    -   **Model States**: In PyTorch, a context manager like `torch.no_grad()` is used to temporarily disable gradient calculations during inference, which saves memory and computation.

In [None]:
import time
from functools import wraps
from contextlib import contextmanager

# --- A Decorator for Timing Function Execution ---
def timing_decorator(func):
    """A decorator that prints the execution time of the decorated function."""
    @wraps(func)  # `@wraps` is a helper decorator that preserves the original function's metadata (like its name and docstring).
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()  # Use perf_counter for high-precision timing
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        elapsed_time = end_time - start_time
        print(f"⏱️  Function '{func.__name__}' executed in {elapsed_time:.4f} seconds.")
        return result
    return wrapper

# --- Applying the Decorator ---
# We apply the decorator to our function using the `@` syntax, which is shorthand for
# `simulate_data_processing = timing_decorator(simulate_data_processing)`.
# `timing_decorator` itself runs once, at definition time; the `wrapper` it returns runs on every call.
@timing_decorator
def simulate_data_processing(num_records: int):
    """A function that simulates a time-consuming data processing task."""
    print(f"Processing {num_records:,} records...")
    # A generator expression is used here for memory efficiency.
    # It doesn't create a full list in memory.
    _ = sum(i * i for i in range(num_records))
    print("Processing complete.")

# Test the decorated function
simulate_data_processing(5_000_000)


# --- A Context Manager for Model Inference ---
# The `@contextmanager` decorator from the `contextlib` module is a convenient way to create a context manager from a simple generator function.
@contextmanager
def inference_mode(model_name: str):
    """A context manager to simulate setting up and tearing down a model for inference."""
    print(f"\n🔄 Preparing model '{model_name}' for inference...")
    # Code before the `yield` is the setup phase (like __enter__).
    # e.g., loading model weights, setting model to evaluation mode.
    try:
        yield  # The `yield` keyword passes control back to the `with` block.
    finally:
        # Code after the `yield` is the teardown phase (like __exit__).
        # This code is guaranteed to run, even if errors occur in the `with` block.
        print(f"✅ Finished inference with '{model_name}'. Cleaning up resources.")

# Use the context manager with a `with` statement.
with inference_mode("ResNet-50"):
    print("   -> Running inference on a batch of images...")
    time.sleep(0.5) # Simulate the work being done.
    print("   -> Inference successful.")
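
The timing decorator above relies on `@wraps` to preserve the decorated function's metadata. The following sketch shows what that means in practice by comparing two otherwise identical pass-through decorators. The names `without_wraps`, `with_wraps`, `load_batch`, and `load_labels` are made up for this illustration.

```python
from functools import wraps


def without_wraps(func):
    """A pass-through decorator that does NOT preserve metadata."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_wraps(func):
    """A pass-through decorator that copies metadata onto the wrapper."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@without_wraps
def load_batch():
    """Load one batch of data."""
    return [1, 2, 3]


@with_wraps
def load_labels():
    """Load the labels for one batch."""
    return [0, 1, 0]


# Without @wraps, the function reports the wrapper's name and has no docstring.
print(load_batch.__name__, load_batch.__doc__)    # wrapper None
# With @wraps, the original name and docstring are preserved.
print(load_labels.__name__, load_labels.__doc__)  # load_labels Load the labels for one batch.
```

Preserved metadata matters for debugging, `help()` output, and any tool that inspects function names, such as the timing message printed by `timing_decorator`.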

## 🎯 Section 7: Practice Exercises

The best way to solidify your understanding is through practice. These exercises are designed to reinforce the concepts covered in this notebook. Try to solve them yourself before looking at the provided solutions.

**Goal**: Apply your knowledge of functions, decorators, and classes to solve common programming challenges in the context of machine learning.

In [None]:
from typing import List  # Needed for the `List[...]` type hints used in this cell.

# --- Exercise 1: Text Tokenization Function ---
# A common first step in many NLP tasks is tokenization.
def tokenize_text(text: str) -> List[str]:
    """
    Splits a string into a list of cleaned tokens.
    
    This function should perform the following steps:
    1. Convert the input text to lowercase to ensure consistency.
    2. Split the text by whitespace to get a list of potential tokens.
    3. Remove any resulting empty strings that might appear from multiple spaces.
    """
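    # A list comprehension is a clean and efficient way to implement this.
    # Note: `str.split()` with no arguments already collapses runs of whitespace,
    # so the `if token` filter is a defensive check rather than a necessity here.
    return [token for token in text.lower().split() if token]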

# Test Exercise 1
text_to_tokenize = "  This is a sample sentence for tokenization. "
tokens = tokenize_text(text_to_tokenize)
print("--- Tokenization Exercise ---")
print(f"Original: '{text_to_tokenize}'")
print(f"Tokens: {tokens}\n")


# --- Exercise 2: Caching Decorator ---
# Caching is a powerful optimization technique. This exercise implements a simple version.
def simple_cache(func):
    """A decorator that caches the results of a function based on its arguments."""
    _cache = {}  # The cache is stored in a dictionary within the decorator's scope.
    @wraps(func)
    def wrapper(*args):
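        # The arguments to the function are used as the cache key.
        # `args` is a tuple, so it is hashable as long as every argument is hashable
        # (e.g., ints or strings); passing a list would raise a TypeError.
        # This simple version does not support keyword arguments.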
        if args in _cache:
            print(f"(Result for {args} found in cache)")
            return _cache[args]
        
        print(f"(Calculating result for {args} and caching...)")
        result = func(*args)
        _cache[args] = result
        return result
    return wrapper

# We can chain decorators. The one closest to the function (`@timing_decorator`) is applied first,
# so `simple_cache` wraps the timed function: on a cache hit, the timing message is not printed.
@simple_cache
@timing_decorator
def slow_computation(a: int, b: int) -> int:
    """A function that simulates a slow computation that we want to cache."""
    time.sleep(1)  # Simulate a 1-second delay
    return a + b

print("--- Caching Decorator Exercise ---")
# The first call will be slow and will cache the result.
slow_computation(5, 10)
# The second call with the same arguments should be almost instantaneous and hit the cache.
slow_computation(5, 10)
# A call with different arguments will be slow again.
slow_computation(3, 7)
print("")


# --- Exercise 3: Iterable DataLoader Class ---
# In ML, data is often processed in batches. This class mimics that behavior.
class SimpleDataLoader:
    """A class that takes a list of data and makes it iterable in batches."""
    def __init__(self, data: List, batch_size: int):
        if not isinstance(batch_size, int) or batch_size <= 0:
            raise ValueError("Batch size must be a positive integer.")
        self.data = data
        self.batch_size = batch_size
    
    def __len__(self) -> int:
        """
        Returns the total number of batches. This is a special "dunder" method.
        The calculation ensures that even a partial final batch is counted.
        """
        return (len(self.data) + self.batch_size - 1) // self.batch_size
    
    def __iter__(self):
        """
        This dunder method makes the object iterable, allowing it to be used in a `for` loop.
        It yields one batch at a time, so batches are sliced out only as they are requested.
        """
        for i in range(0, len(self.data), self.batch_size):
            yield self.data[i:i + self.batch_size]

print("--- DataLoader Exercise ---")
my_data = list(range(33))  # A sample dataset of 33 data points
data_loader = SimpleDataLoader(my_data, batch_size=8)

# Because we implemented __len__, we can call len() on our object.
print(f"Data: {my_data}")
print(f"Batch Size: {data_loader.batch_size}, Total Batches: {len(data_loader)}")

# Because we implemented __iter__, we can loop over our object directly.
print("\nIterating through the data loader:")
for i, batch in enumerate(data_loader):
    print(f"  Batch {i+1}: {batch}")

print("\n\n✅ All exercises completed successfully!")
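
The `simple_cache` decorator in Exercise 2 is a teaching version. For real code, the standard library already provides `functools.lru_cache`, which also accepts keyword arguments and can bound the cache size. The sketch below is an alternative to the exercise solution, not a replacement for it; `add_slowly` and `call_count` are hypothetical names used only to make the cache behavior observable.

```python
from functools import lru_cache

call_count = 0  # Tracks how many times the function body actually runs.


@lru_cache(maxsize=128)
def add_slowly(a: int, b: int) -> int:
    """Stand-in for an expensive computation (hypothetical example)."""
    global call_count
    call_count += 1
    return a + b


add_slowly(5, 10)  # Miss: the body runs and the result is cached.
add_slowly(5, 10)  # Hit: the result is served from the cache.
add_slowly(3, 7)   # Miss: different arguments, so the body runs again.

print(call_count)               # 2
print(add_slowly.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
```

As with `simple_cache`, all arguments must be hashable. `cache_info()` is handy for checking whether a cache is actually being hit.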

## 🎉 Summary & Key Takeaways

Congratulations on completing this intensive review of essential Python concepts! You have revisited and reinforced the fundamental skills you will rely on when building robust Generative AI applications, from choosing the right data structure to managing resources and extending functions cleanly.

### Core Concepts Mastered:
-   **Data Structures**: You are now proficient with lists, dictionaries, sets, and tuples, and you have a clear understanding of when to use each one for maximum efficiency. You can write clean, efficient list and dictionary comprehensions to transform and filter data.
-   **Control Flow**: You can confidently use `if/elif/else` statements and `for/while` loops to direct the logic of your programs, from simple decision-making to complex training loops.
-   **Functions & Lambdas**: You can write modular, reusable functions with clear type hints and docstrings, making your code more maintainable and easier to debug. You know how and when to use lambda functions for concise, on-the-fly operations.
-   **Object-Oriented Programming**: You understand how to structure complex systems using classes and dataclasses, encapsulating data and behavior into logical, reusable units like models, data loaders, and trainers.
-   **Pythonic Best Practices**: You can write robust code that handles errors gracefully using `try...except` blocks, manages resources safely with context managers (`with` statements), and leverages advanced features like decorators to add functionality cleanly.

---

### 📚 Next Steps

With your Python foundations solidified, you are now fully prepared to move on to the specialized libraries that form the backbone of data science and machine learning in Python. The skills you've honed in this notebook will be applied in every subsequent module of this program.

<div align="center">
    <h3>Great job! You're ready for the next step in your AI journey.</h3>
    <p>The next notebook in this module will cover <strong>NumPy and Pandas</strong>, the essential libraries for high-performance numerical computing and data manipulation. Get ready to work with large datasets efficiently!</p>
</div>