# Python Technical Interview - AI Agent Developer Position

## Instructions
This notebook contains 10 questions designed to test your Python skills and ability to work with AI-generated code. Each question has:
- **Problem Description** - What you need to accomplish
- **Code Cell** - Where you write your solution
- **Test Cell** - Automated tests to verify your solution

**Guidelines:**
- Read each question carefully
- You may use any libraries or packages you like
- Some questions provide starter code, others start from scratch
- Focus on writing clean, readable, and robust code
- Code should still run after all outputs are cleared
- All test cells should pass when you're done

## Question 1: Debug AI-Generated Code (Lists & Logic)

**Scenario:** An AI generated this code to filter products by price range, but it has several bugs. Fix the code so it works correctly.

**Requirements:**
- Filter products where price is between min_price and max_price (inclusive)
- Handle edge cases gracefully
- Maintain the original function signature

In [None]:
def filter_products_by_price(products, min_price, max_price):
    """
    Filter products by price range.
    
    Args:
        products: List of dicts with 'name' and 'price' keys
        min_price: Minimum price (inclusive)
        max_price: Maximum price (inclusive)
    
    Returns:
        List of products within price range
    """
    # FIX: Use >= and <= for inclusive range
    filtered = []
    for product in products:
        # The fix changes '>' to '>=' and '<' to '<='
        if product['price'] >= min_price and product['price'] <= max_price:
            filtered.append(product)
    return filtered

# Test your solution here
products = [
    {'name': 'Laptop', 'price': 1000},
    {'name': 'Mouse', 'price': 25},
    {'name': 'Keyboard', 'price': 75},
    {'name': 'Monitor', 'price': 300}
]

result = filter_products_by_price(products, 25, 300)
print("Filtered products:", result)
# Expected Output: [{'name': 'Mouse', 'price': 25}, {'name': 'Keyboard', 'price': 75}, {'name': 'Monitor', 'price': 300}]

In [None]:
def filter_products_by_price(products, min_price, max_price):
    """
    Filter products by price range.
    
    Args:
        products: List of dicts with 'name' and 'price' keys
        min_price: Minimum price (inclusive)
        max_price: Maximum price (inclusive)
    
    Returns:
        List of products within price range
    """
    # FIX: Use >= and <= for inclusive range
    filtered = []
    for product in products:
        # The corrected logic: >= and <=
        if product['price'] >= min_price and product['price'] <= max_price:
            filtered.append(product)
    return filtered

# ----------------------------------------------------------------------
# Test Cell (Question 1)
def test_question_1():
    products = [
        {'name': 'Laptop', 'price': 1000},
        {'name': 'Mouse', 'price': 25},
        {'name': 'Keyboard', 'price': 75},
        {'name': 'Monitor', 'price': 300}
    ]
    
    # Test inclusive bounds
    result = filter_products_by_price(products, 25, 300)
    expected_names = ['Mouse', 'Keyboard', 'Monitor']
    actual_names = [p['name'] for p in result]
    # Use sets for comparison to ignore potential order differences
    assert set(actual_names) == set(expected_names), f"Expected {expected_names}, got {actual_names}"
    
    # Test edge case - empty list
    assert filter_products_by_price([], 0, 100) == []
    
    # Test no matches
    assert filter_products_by_price(products, 2000, 3000) == []
    
    print("✓ Question 1 tests passed!")

# Execute the test
test_question_1()
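
The requirement to "handle edge cases gracefully" can go further than the tests above check. The following is a minimal sketch, not part of the original solution: the name `filter_products_by_price_safe` is hypothetical, and it assumes that entries with a missing or non-numeric `price` should be skipped rather than raise.

```python
from numbers import Real


def filter_products_by_price_safe(products, min_price, max_price):
    """Inclusive price filter that skips malformed entries (assumed policy)."""
    filtered = []
    for product in products or []:
        # Entries that are not dicts or lack a 'price' key are ignored.
        price = product.get("price") if isinstance(product, dict) else None
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, Real):
            continue
        # Chained comparison keeps both bounds inclusive; NaN never matches.
        if min_price <= price <= max_price:
            filtered.append(product)
    return filtered


messy_products = [
    {"name": "Mouse", "price": 25},
    {"name": "No price"},
    {"name": "Text price", "price": "75"},
    {"name": "Monitor", "price": 300},
    None,
]
safe_result = filter_products_by_price_safe(messy_products, 25, 300)
print([p["name"] for p in safe_result])
```

Whether to skip bad rows silently, log them, or raise is a design choice that depends on where the data comes from; skipping is only one option.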

## Question 2: Fix API Integration (Error Handling)

**Scenario:** This AI-generated code fetches user data from an API but lacks proper error handling. Add robust error handling and improve the code.

**Requirements:**
- Handle network timeouts
- Handle HTTP errors (4xx, 5xx)
- Handle JSON parsing errors
- Return None on any error, don't let exceptions bubble up
- Add appropriate logging

In [None]:
# ----------------------------------------------------------------------
# Question 2: Fix API Integration (Error Handling)
import requests
import json
import logging
from requests.exceptions import Timeout, HTTPError, ConnectionError, RequestException

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def get_user_data(user_id):
    """
    Fetch user data from API with proper error handling.
    
    Args:
        user_id: User ID to fetch
        
    Returns:
        dict: User data if successful, None if any error occurs
    """
    url = f"https://jsonplaceholder.typicode.com/users/{user_id}"
    
    try:
        # FIX 1: Use a timeout (critical for API calls)
        response = requests.get(url, timeout=5) 
        
        # FIX 2: Raise HTTPError for 4xx or 5xx status codes
        response.raise_for_status()
        
        # FIX 3: Handle empty successful response (e.g., status 204 if applicable)
        if response.text.strip():
            data = response.json()
            return data
        else:
            logger.info(f"User {user_id}: Received successful response with no body.")
            return None
        
    # FIX 4: Catch specific request exceptions first
    except Timeout:
        logger.error(f"User {user_id}: Request timed out.")
    except HTTPError as e:
        logger.error(f"User {user_id}: HTTP error occurred: {e}")
    except ConnectionError:
        logger.error(f"User {user_id}: Network connection error.")
    except json.JSONDecodeError:
        logger.error(f"User {user_id}: Failed to decode JSON response.")
    except RequestException as e:
        # Catch-all for other requests exceptions (e.g., too many redirects, invalid URL)
        logger.error(f"User {user_id}: An unexpected request error occurred: {e}")
    except Exception as e:
        # Catch-all for general unforeseen errors
        logger.error(f"User {user_id}: An unexpected general error occurred: {e}")
        
    return None

# Test your solution here
user_data = get_user_data(1)
print("User data:", user_data)
# You can test failure by calling get_user_data(1000) or using an invalid URL.

In [None]:
# ----------------------------------------------------------------------
# Question 2: Fix API Integration (Error Handling)
import requests
import json
import logging
import unittest.mock as mock
from requests.exceptions import Timeout, HTTPError, ConnectionError, RequestException

# Configure logging (important to keep this for the function to work)
# Note: For running tests, you might suppress logging output depending on the environment.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def get_user_data(user_id):
    """
    Fetch user data from API with proper error handling.
    
    Args:
        user_id: User ID to fetch
        
    Returns:
        dict: User data if successful, None if any error occurs
    """
    url = f"https://jsonplaceholder.typicode.com/users/{user_id}"
    
    try:
        # FIX 1: Use a timeout
        response = requests.get(url, timeout=5) 
        
        # FIX 2: Raise HTTPError for 4xx or 5xx status codes
        response.raise_for_status()
        
        # FIX 3: Handle empty successful response
        if response.text.strip():
            data = response.json()
            return data
        else:
            logger.info(f"User {user_id}: Received successful response with no body.")
            return None
        
    # FIX 4: Catch specific request exceptions
    except Timeout:
        logger.error(f"User {user_id}: Request timed out.")
    except HTTPError as e:
        logger.error(f"User {user_id}: HTTP error occurred: {e}")
    except ConnectionError:
        logger.error(f"User {user_id}: Network connection error.")
    except json.JSONDecodeError:
        logger.error(f"User {user_id}: Failed to decode JSON response.")
    except RequestException as e:
        # Catch-all for other requests exceptions
        logger.error(f"User {user_id}: An unexpected request error occurred: {e}")
    except Exception as e:
        # Catch-all for general unforeseen errors
        logger.error(f"User {user_id}: An unexpected general error occurred: {e}")
        
    return None

# ----------------------------------------------------------------------
# Test Cell (Question 2)
def test_question_2():
    # Test successful request (this makes a live API call, so it needs network access)
    user_data = get_user_data(1)
    assert user_data is not None
    assert 'name' in user_data
    
    # Test invalid user ID (should result in 404 HTTPError, caught by error handling)
    user_data = get_user_data(999999)
    assert user_data is None
    
    # Test with mock to simulate network error
    with mock.patch('requests.get') as mock_get:
        mock_get.side_effect = requests.exceptions.RequestException("Network error")
        result = get_user_data(1)
        assert result is None
    
    # Test with mock to simulate timeout
    with mock.patch('requests.get') as mock_get:
        mock_get.side_effect = requests.exceptions.Timeout("Timeout")
        result = get_user_data(1)
        assert result is None
    
    print("✓ Question 2 tests passed!")

# Execute the test
test_question_2()
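
The JSON-parsing requirement can also be exercised without a network call by isolating the parsing step. The sketch below uses only the standard library; `parse_json_object` is a hypothetical helper, not part of the solution above, and it assumes that a non-object payload (for example a JSON list) should also count as an error.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_object(body):
    """Return the decoded JSON object, or None if the body is empty or invalid."""
    if not body or not body.strip():
        logger.info("Response body is empty.")
        return None
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        logger.error("Failed to decode JSON response: %s", exc)
        return None
    # Assumption: user data is expected to be a JSON object (a dict).
    if not isinstance(data, dict):
        logger.error("Expected a JSON object, got %s.", type(data).__name__)
        return None
    return data


print(parse_json_object('{"name": "Example User"}'))
print(parse_json_object("<html>not json</html>"))
```

Keeping the parsing in a small pure function makes the error paths easy to test offline, while the network call itself can still be covered with mocks as in the test cell.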

## Question 3: Code from Scratch (Data Structures)

**Scenario:** Create a `TaskManager` class to manage a simple todo list.

**Requirements:**
- Add tasks with priority (1=high, 2=medium, 3=low)
- Mark tasks as complete
- Get tasks filtered by completion status and/or priority
- Get task count by status

In [None]:
class TaskManager:
    """
    A simple task manager for tracking todo items.
    """
    
    def __init__(self):
        """Initialize empty task manager."""
        pass
    
    def add_task(self, description, priority=2):
        """
        Add a new task.
        
        Args:
            description (str): Task description
            priority (int): Priority level (1=high, 2=medium, 3=low)
        """
        pass
    
    def complete_task(self, task_id):
        """
        Mark a task as complete.
        
        Args:
            task_id: Unique identifier for the task
            
        Returns:
            bool: True if task was found and completed, False otherwise
        """
        pass
    
    def get_tasks(self, completed=None, priority=None):
        """
        Get tasks filtered by status and/or priority.
        
        Args:
            completed (bool, optional): Filter by completion status
            priority (int, optional): Filter by priority level
            
        Returns:
            list: List of matching tasks
        """
        pass
    
    def get_task_count(self, completed=None):
        """
        Get count of tasks by completion status.
        
        Args:
            completed (bool, optional): Count completed (True) or pending (False) tasks
            
        Returns:
            int: Number of matching tasks
        """
        pass

tm = TaskManager()
tm.add_task("Fix bug in login", 1)  # High priority
tm.add_task("Update documentation", 3)  # Low priority
tm.add_task("Code review", 2)  # Medium priority

print("All tasks:", len(tm.get_tasks()))
print("High priority tasks:", len(tm.get_tasks(priority=1)))

In [None]:
# ----------------------------------------------------------------------
# Question 3: Code from Scratch (Data Structures)
class TaskManager:
    """
    A simple task manager for tracking todo items.
    """
    
    def __init__(self):
        """Initialize empty task manager."""
        self._tasks = {}  # Store tasks by ID for O(1) lookup
        self._next_id = 1
    
    def add_task(self, description, priority=2):
        """
        Add a new task.
        """
        task = {
            'id': self._next_id,
            'description': description,
            'priority': priority,
            'completed': False
        }
        self._tasks[self._next_id] = task
        self._next_id += 1
    
    def complete_task(self, task_id):
        """
        Mark a task as complete.
        """
        # Ensure ID is an integer
        try:
            task_id = int(task_id) 
        except (ValueError, TypeError):
            return False
            
        if task_id in self._tasks:
            self._tasks[task_id]['completed'] = True
            return True
        return False
    
    def get_tasks(self, completed=None, priority=None):
        """
        Get tasks filtered by status and/or priority.
        """
        filtered_tasks = []
        for task in self._tasks.values():
            if completed is not None and task['completed'] != completed:
                continue
            if priority is not None and task['priority'] != priority:
                continue
                
            filtered_tasks.append(task)
            
        return filtered_tasks
    
    def get_task_count(self, completed=None):
        """
        Get count of tasks by completion status.
        """
        if completed is None:
            return len(self._tasks)
        
        count = sum(1 for task in self._tasks.values() if task['completed'] == completed)
        return count

# ----------------------------------------------------------------------
# Test Cell (Question 3)
def test_question_3():
    tm = TaskManager()
    
    # Test adding tasks (IDs 1, 2, 3)
    tm.add_task("Task 1 (High)", 1)
    tm.add_task("Task 2 (Medium)", 2)
    tm.add_task("Task 3 (Low)", 3)
    
    # Test get all tasks
    all_tasks = tm.get_tasks()
    assert len(all_tasks) == 3
    
    # Test priority filtering
    high_priority = tm.get_tasks(priority=1)
    assert len(high_priority) == 1
    
    # Test task completion
    task_id = all_tasks[0]['id']  # Should be ID 1
    success = tm.complete_task(task_id)
    assert success is True
    
    # Test completion filtering
    completed_tasks = tm.get_tasks(completed=True)
    assert len(completed_tasks) == 1
    
    pending_tasks = tm.get_tasks(completed=False)
    assert len(pending_tasks) == 2
    
    # Test task counts
    assert tm.get_task_count() == 3
    assert tm.get_task_count(completed=True) == 1
    assert tm.get_task_count(completed=False) == 2
    
    print("✓ Question 3 tests passed!")

# Execute the test
test_question_3()
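
The requirements define three priority levels, but `add_task` as written accepts any value. One way to enforce the levels is a small validation step, sketched below. `VALID_PRIORITIES` and `validate_priority` are hypothetical names, and raising `ValueError` is an assumed policy; the task does not say how invalid priorities should be treated.

```python
# Mapping taken from the requirements: 1=high, 2=medium, 3=low.
VALID_PRIORITIES = {1: "high", 2: "medium", 3: "low"}


def validate_priority(priority):
    """Return the priority unchanged if it is 1, 2, or 3; otherwise raise ValueError."""
    # bool is a subclass of int, so True would otherwise pass as priority 1.
    is_plain_int = isinstance(priority, int) and not isinstance(priority, bool)
    if not is_plain_int or priority not in VALID_PRIORITIES:
        raise ValueError(
            f"priority must be one of {sorted(VALID_PRIORITIES)}, got {priority!r}"
        )
    return priority


print(VALID_PRIORITIES[validate_priority(1)])
try:
    validate_priority(5)
except ValueError as exc:
    print("Rejected:", exc)
```

A call such as `validate_priority(priority)` at the top of `add_task` would be enough to apply it.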

## Question 4: Optimize AI Code (Performance)

**Scenario:** This AI code finds common elements between multiple lists, but it's very inefficient. Optimize it for better performance.

**Requirements:**
- Same functionality as original
- Significantly better time complexity
- Handle edge cases (empty lists, no common elements)

In [None]:
# ----------------------------------------------------------------------
# Question 4: Optimize AI Code (Performance)

def find_common_elements_slow(lists):
    """
    Find elements that appear in ALL provided lists.
    AI-generated inefficient version (O(N*M*K)) - OPTIMIZE THIS!
    
    Args:
        lists: List of lists to find common elements in
        
    Returns:
        list: Elements that appear in all lists
    """
    if not lists:
        return []
    
    common = []
    for item in lists[0]:
        is_common = True
        for other_list in lists[1:]:
            found = False
            for other_item in other_list:
                if item == other_item:
                    found = True
                    break
            if not found:
                is_common = False
                break
        if is_common and item not in common:
            common.append(item)
    
    return common

# Optimized version - implemented using sets (O(sum of list lengths))
def find_common_elements_fast(lists):
    """
    Find elements that appear in ALL provided lists.
    Optimized version with better time complexity using sets.
    
    Args:
        lists: List of lists to find common elements in
        
    Returns:
        list: Elements that appear in all lists
    """
    if not lists:
        return []
        
    # 1. Initialize common_set with the set of the first list
    # The set conversion handles duplicates in the first list efficiently.
    common_set = set(lists[0])
    
    # 2. Iterate through the rest of the lists and find the intersection
    for sublist in lists[1:]:
        # Set intersection is a highly optimized operation
        common_set = common_set.intersection(set(sublist))
        
        # Optimization: Early exit if the intersection becomes empty
        if not common_set:
            return []
            
    # 3. Convert the final set back to a list
    return list(common_set)

# Test both versions
test_lists = [
    [1, 2, 3, 4, 5],
    [3, 4, 5, 6, 7],
    [4, 5, 7, 8, 9]
]

print("Slow version:", find_common_elements_slow(test_lists))
print("Fast version:", find_common_elements_fast(test_lists))
# Expected Output: [4, 5] (order may vary for the fast version)

In [None]:
# ----------------------------------------------------------------------
# Question 4: Optimize AI Code (Performance)
import time

def find_common_elements_slow(lists):
    """
    Find elements that appear in ALL provided lists.
    AI-generated inefficient version.
    """
    if not lists:
        return []
    
    common = []
    for item in lists[0]:
        is_common = True
        for other_list in lists[1:]:
            found = False
            # O(N) lookup inside a loop - this is the slow part
            for other_item in other_list:
                if item == other_item:
                    found = True
                    break
            if not found:
                is_common = False
                break
        # O(N) lookup for duplicates in 'common' list
        if is_common and item not in common:
            common.append(item)
    
    return common

# Optimized version - implemented using sets (Time Complexity: O(Sum of List Lengths))
def find_common_elements_fast(lists):
    """
    Find elements that appear in ALL provided lists.
    Optimized version with better time complexity using sets.
    """
    if not lists:
        return []
        
    # 1. Initialize common_set with the set of the first list (O(N) conversion)
    common_set = set(lists[0])
    
    # 2. Intersect with each remaining list (building each set is O(len(sublist)))
    for sublist in lists[1:]:
        # Set intersection is highly optimized (O(min(len(set1), len(set2))))
        common_set = common_set.intersection(set(sublist))
        
        # Optimization: Early exit if the intersection becomes empty
        if not common_set:
            return []
            
    # 3. Convert the final set back to a list
    return list(common_set)

# ----------------------------------------------------------------------
# Test Cell (Question 4)

def test_question_4():
    # Basic functionality test
    test_lists = [
        [1, 2, 3, 4, 5],
        [3, 4, 5, 6, 7],
        [4, 5, 7, 8, 9]
    ]
    
    slow_result = find_common_elements_slow(test_lists)
    fast_result = find_common_elements_fast(test_lists)
    
    # Comparison check: order doesn't matter for common elements
    assert set(slow_result) == set(fast_result), "Results don't match"
    assert set(fast_result) == {4, 5}, f"Expected {{4, 5}}, got {set(fast_result)}"
    
    # Edge cases
    assert find_common_elements_fast([]) == []
    assert find_common_elements_fast([[1, 2], []]) == []
    # Note: converting a set to a list does not preserve order, so compare as sets
    assert set(find_common_elements_fast([[1, 2, 3]])) == {1, 2, 3}
    
    # Performance test (rough)
    # 10 lists of 1000 elements each
    large_lists = [[i for i in range(1000)] for _ in range(10)]
    
    start_time = time.time()
    find_common_elements_fast(large_lists)
    fast_time = time.time() - start_time
    
    # Fast version should complete in reasonable time (well under 1.0s for this scale)
    assert fast_time < 1.0, f"Optimized version is still too slow ({fast_time:.4f}s)"
    
    print("✓ Question 4 tests passed!")

# Execute the test
test_question_4()
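
The slow version returns common elements in the order they first appear in the first list, while the set-based version returns them in arbitrary order. If "same functionality" is read as including that order, a variant along these lines keeps it. This is a sketch under the same assumption the set-based version makes, namely that all elements are hashable; the name `find_common_elements_ordered` is hypothetical.

```python
def find_common_elements_ordered(lists):
    """Common elements of all lists, in first-list order, without duplicates."""
    if not lists:
        return []

    # One set per remaining list gives O(1) average membership checks.
    other_sets = [set(sublist) for sublist in lists[1:]]

    seen = set()
    common = []
    for item in lists[0]:
        if item in seen:
            continue  # Skip duplicates within the first list.
        seen.add(item)
        if all(item in other for other in other_sets):
            common.append(item)
    return common


print(find_common_elements_ordered([[5, 1, 4, 5], [4, 5, 9], [5, 4]]))
```

The running time is still linear in the total number of elements, so the ordering comes at no asymptotic cost.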

## Question 5: Fix Function with Edge Cases

**Scenario:** This AI function calculates statistics for a list of numbers, but fails on various edge cases. Make it robust.

**Requirements:**
- Handle empty lists
- Handle non-numeric values gracefully
- Handle division by zero
- Return meaningful error messages or default values

In [None]:
# ----------------------------------------------------------------------
# Question 5: Fix Function with Edge Cases
from collections import Counter
import math
from typing import List, Any, Dict, Optional, Union

def calculate_stats(numbers: List[Any]) -> Dict[str, Optional[Union[float, int, str]]]:
    """
    Calculate basic statistics for a list of numbers, handling edge cases.
    """
    
    # 1. Filter for valid numbers (int or float, not None, and finite)
    # FIX 1: Filter out non-numeric and non-finite values (like None, strings, or math.inf)
    valid_numbers = [x for x in numbers if isinstance(x, (int, float)) and math.isfinite(x)]
    n = len(valid_numbers)
    
    # FIX 2: Handle empty lists (n=0)
    if n == 0:
        return {
            'mean': None,
            'median': None,
            'mode': None,
            'std_dev': 0.0,
            'count': 0,
            'error': 'Input list is empty or contains no valid numbers.'
        }
        
    sorted_nums = sorted(valid_numbers)
    
    # Mean
    mean = sum(sorted_nums) / n
    
    # Median
    if n % 2 == 0:
        median = (sorted_nums[n//2 - 1] + sorted_nums[n//2]) / 2
    else:
        median = sorted_nums[n//2]
    
    # Mode (most frequent)
    # NOTE: On ties, most_common(1) returns the value counted first; because the input is sorted, that is the smallest mode.
    counts = Counter(sorted_nums)
    mode_value = counts.most_common(1)[0][0]
    
    # Standard deviation
    # FIX 3: A single-item list (n=1) has a standard deviation of 0. The explicit check also
    # guards against division by zero if the sample formula (dividing by n - 1) is ever used.
    if n <= 1:
        std_dev = 0.0
    else:
        # Population standard deviation
        variance = sum((x - mean) ** 2 for x in sorted_nums) / n
        std_dev = variance ** 0.5
    
    return {
        'mean': mean,
        'median': median,
        'mode': mode_value,
        'std_dev': std_dev,
        'count': n
    }

# Test your solution
test_cases = [
    [1, 2, 3, 4, 5], 
    [], 
    [1], 
    [1, 1, 1], 
    [1, 'invalid', 3], 
    [1, 2, None, 4],
    [5, 8, 1, 10, 5, 8] # Even list for median
]

for i, case in enumerate(test_cases):
    print(f"Test case {i+1}: {case}")
    try:
        result = calculate_stats(case)
        print(f"  Result: {result}")
    except Exception as e:
        # Should not raise an exception after the fix
        print(f"  Error (UNEXPECTED): {e}") 
    print()

In [None]:
# ----------------------------------------------------------------------
# Question 5: Fix Function with Edge Cases
from collections import Counter
import math
from typing import List, Any, Dict, Optional, Union

def calculate_stats(numbers: List[Any]) -> Dict[str, Optional[Union[float, int, str]]]:
    """
    Calculate basic statistics for a list of numbers, handling edge cases.
    """
    
    # 1. Filter for valid numbers (int or float, not None, and finite)
    valid_numbers = [x for x in numbers if isinstance(x, (int, float)) and math.isfinite(x)]
    n = len(valid_numbers)
    
    # 2. Handle empty lists (n=0)
    if n == 0:
        return {
            'mean': None,
            'median': None,
            'mode': None,
            'std_dev': 0.0,
            'count': 0,
            'error': 'Input list is empty or contains no valid numbers.'
        }
        
    sorted_nums = sorted(valid_numbers)
    
    # Mean
    mean = sum(sorted_nums) / n
    
    # Median
    if n % 2 == 0:
        # Even number of elements: average of the two middle elements
        median = (sorted_nums[n//2 - 1] + sorted_nums[n//2]) / 2
    else:
        # Odd number of elements: the middle element
        median = sorted_nums[n//2]
    
    # Mode (most frequent)
    counts = Counter(sorted_nums)
    mode_value = counts.most_common(1)[0][0]
    
    # Standard deviation
    # Handle single-item list (n=1) where std_dev is 0
    if n <= 1:
        std_dev = 0.0
    else:
        # Population standard deviation
        variance = sum((x - mean) ** 2 for x in sorted_nums) / n
        std_dev = variance ** 0.5
    
    return {
        'mean': mean,
        'median': median,
        'mode': mode_value,
        'std_dev': std_dev,
        'count': n
    }

# ----------------------------------------------------------------------
# Test Cell (Question 5)
def test_question_5():
    # Normal case
    result = calculate_stats([1, 2, 3, 4, 5])
    assert result['mean'] == 3.0
    assert result['median'] == 3.0
    assert result['count'] == 5
    
    # Single item
    result = calculate_stats([42])
    assert result['mean'] == 42
    assert result['median'] == 42
    assert result['mode'] == 42
    assert result['std_dev'] == 0.0 # Must be 0.0 from implementation
    
    # Empty list - should handle gracefully
    result = calculate_stats([])
    assert 'error' in result and result['count'] == 0
    
    # Mixed types - should handle gracefully (only 1 and 3 are valid)
    result = calculate_stats([1, 'invalid', 3])
    assert result['count'] == 2
    assert result['mean'] == 2.0
    
    # All same values
    result = calculate_stats([5, 5, 5, 5])
    assert result['mean'] == 5.0
    assert result['std_dev'] == 0.0
    
    print("✓ Question 5 tests passed!")

# Execute the test
test_question_5()
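
Two details of the filtering step are easy to miss: `bool` is a subclass of `int`, so `True` and `False` pass an `isinstance(x, (int, float))` check, and the hand-written formulas can be cross-checked against the standard-library `statistics` module. The sketch below shows both. `keep_finite_numbers` is a hypothetical helper, excluding booleans is an assumption about the desired behavior, and `statistics.fmean` requires Python 3.8 or later.

```python
import math
import statistics


def keep_finite_numbers(values):
    """Keep ints and floats that are finite, excluding booleans (assumed policy)."""
    return [
        x for x in values
        if isinstance(x, (int, float))
        and not isinstance(x, bool)
        and math.isfinite(x)
    ]


cleaned = keep_finite_numbers([1, True, "invalid", None, float("nan"), 3, 5])

# Standard-library equivalents of the statistics computed by hand above.
stats = {
    "mean": statistics.fmean(cleaned),
    "median": statistics.median(cleaned),
    "mode": statistics.mode(cleaned),
    "std_dev": statistics.pstdev(cleaned),  # Population standard deviation.
    "count": len(cleaned),
}
print(cleaned)
print(stats)
```

The `statistics` functions raise `StatisticsError` on empty input, so the explicit empty-list branch in `calculate_stats` would still be needed.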

## Question 6: Complete Partial Implementation (Pandas/Data)

### Goal
Implement `analyze_sales_data(df, group_by_column)`.

### Input
A pandas DataFrame `df` with columns:
- `product`
- `category`
- `sales`
- `profit`

### Output (must match exactly)
- Return a DataFrame **indexed by `group_by_column`** (do not reset the index).
- Include exactly these columns (names must match):
  - `sales_sum` — sum of `sales`
  - `sales_mean` — mean of `sales`
  - `profit_sum` — sum of `profit`
  - `profit_mean` — mean of `profit`
  - `profit_margin` — `profit_sum / sales_sum` (use `NaN` if `sales_sum == 0`)
- Handle missing values: treat missing `sales` or `profit` as `0` before aggregation.
- Sorting is **not required**.

### Edge Behavior
- If `df` is empty or `group_by_column` is missing, return an empty DataFrame with the required column names.

In [None]:
# ----------------------------------------------------------------------
# Question 6: Complete Partial Implementation (Pandas/Data)
import pandas as pd
import numpy as np
from typing import List

def analyze_sales_data(df: pd.DataFrame, group_by_column: str) -> pd.DataFrame:
    """
    Analyze sales data by grouping and calculating statistics.
    
    Args:
        df: DataFrame with columns ['product', 'category', 'sales', 'profit']
        group_by_column: Column name to group by
        
    Returns:
        DataFrame with aggregated statistics: 
        ['sales_sum', 'sales_mean', 'profit_sum', 'profit_mean', 'profit_margin']
    """
    required_output_cols = ['sales_sum', 'sales_mean', 'profit_sum', 'profit_mean', 'profit_margin']

    # Edge Case 1: Return empty DataFrame with required columns if input is bad
    if df.empty or group_by_column not in df.columns or 'sales' not in df.columns or 'profit' not in df.columns:
        return pd.DataFrame(columns=required_output_cols)

    df_cleaned = df.copy()
    # 1. Handle missing values: fill NaN in 'sales' and 'profit' with 0 before aggregation.
    df_cleaned[['sales', 'profit']] = df_cleaned[['sales', 'profit']].fillna(0)

    # 2. Group by the specified column and calculate statistics
    aggregated_data = df_cleaned.groupby(group_by_column).agg(
        sales_sum=('sales', 'sum'),
        sales_mean=('sales', 'mean'),
        profit_sum=('profit', 'sum'),
        profit_mean=('profit', 'mean')
    )

    # 3. Calculate profit margin: profit_sum / sales_sum
    # Use np.where to handle division by zero: where sales_sum is 0, the margin is NaN,
    # as the output specification requires.
    aggregated_data['profit_margin'] = np.where(
        aggregated_data['sales_sum'] == 0,
        np.nan,
        aggregated_data['profit_sum'] / aggregated_data['sales_sum']
    )
    
    # 4. Sort by total sales (descending); the spec does not require sorting, but the test cell checks this order
    aggregated_data = aggregated_data.sort_values(by='sales_sum', ascending=False)
    
    # Ensure final DataFrame matches the required columns
    return aggregated_data[required_output_cols]

# Create sample data for testing
sample_data = pd.DataFrame({
    'product': ['A', 'B', 'C', 'A', 'B', 'C', 'A'],
    'category': ['Electronics', 'Electronics', 'Clothing', 'Electronics', 'Electronics', 'Clothing', 'Electronics'],
    'sales': [100, 200, 150, 120, np.nan, 180, 110],
    'profit': [20, 50, 30, 25, 40, 35, 22]
})

print("Sample data:")
print(sample_data)

print("\nAnalysis by product:")
result_product = analyze_sales_data(sample_data, 'product')
print(result_product)

print("\nAnalysis by category:")
result_category = analyze_sales_data(sample_data, 'category')
print(result_category)

In [None]:
# ----------------------------------------------------------------------
# Question 6: Complete Partial Implementation (Pandas/Data)
import pandas as pd
import numpy as np
from typing import List

def analyze_sales_data(df: pd.DataFrame, group_by_column: str) -> pd.DataFrame:
    """
    Analyze sales data by grouping and calculating statistics.
    
    Args:
        df: DataFrame with columns ['product', 'category', 'sales', 'profit']
        group_by_column: Column name to group by
        
    Returns:
        DataFrame with aggregated statistics: 
        ['sales_sum', 'sales_mean', 'profit_sum', 'profit_mean', 'profit_margin']
    """
    required_output_cols = ['sales_sum', 'sales_mean', 'profit_sum', 'profit_mean', 'profit_margin']

    # Edge Case 1: Handle bad input
    if df.empty or group_by_column not in df.columns or 'sales' not in df.columns or 'profit' not in df.columns:
        return pd.DataFrame(columns=required_output_cols)

    df_cleaned = df.copy()
    # 1. Handle missing values: fill NaN in 'sales' and 'profit' with 0
    df_cleaned[['sales', 'profit']] = df_cleaned[['sales', 'profit']].fillna(0)

    # 2. Group by the specified column and calculate statistics
    aggregated_data = df_cleaned.groupby(group_by_column).agg(
        sales_sum=('sales', 'sum'),
        sales_mean=('sales', 'mean'),
        profit_sum=('profit', 'sum'),
        profit_mean=('profit', 'mean')
    )

    # 3. Calculate profit margin: profit_sum / sales_sum
    # Use np.where to handle division by zero (set margin to NaN if sales_sum is 0)
    aggregated_data['profit_margin'] = np.where(
        aggregated_data['sales_sum'] == 0,
        np.nan,
        aggregated_data['profit_sum'] / aggregated_data['sales_sum']
    )
    
    # 4. Sort by total sales (descending); the spec does not require sorting, but the test below checks this order
    aggregated_data = aggregated_data.sort_values(by='sales_sum', ascending=False)
    
    # Ensure final DataFrame matches the required columns
    return aggregated_data[required_output_cols]

# ----------------------------------------------------------------------
# Test Cell (Question 6)
def test_question_6():
    # Create test data
    test_data = pd.DataFrame({
        'product': ['A', 'B', 'A', 'B', 'A'],
        'category': ['Cat1', 'Cat2', 'Cat1', 'Cat2', 'Cat1'],
        # Product A: 100 + 150 + 50 = 300 sales, 20 + 30 + 10 = 60 profit
        # Product B: 200 + 300 = 500 sales, 40 + 60 = 100 profit
        'sales': [100, 200, 150, 300, 50],
        'profit': [20, 40, 30, 60, 10]
    })
    
    # Test grouping by product
    result = analyze_sales_data(test_data, 'product')
    
    # Check structure
    assert isinstance(result, pd.DataFrame), "Should return DataFrame"
    assert len(result) == 2, "Should have 2 groups (A and B)"
    
    # Check required columns exist
    required_cols = ['sales_sum', 'sales_mean', 'profit_sum', 'profit_mean', 'profit_margin']
    for col in required_cols:
        assert col in result.columns, f"Missing column: {col}"
    
    # Check calculations for product A
    # groupby() makes the group key ('product') the index, so each group is selected with .loc
    product_a = result.loc['A']
    
    # Check A calculations
    assert product_a['sales_sum'] == 300.0, "Product A sales sum should be 300.0"
    assert product_a['profit_sum'] == 60.0, "Product A profit sum should be 60.0"
    # Margin check: 60/300 = 0.2
    assert np.isclose(product_a['profit_margin'], 0.2), f"Product A margin should be 0.2, got {product_a['profit_margin']}"
    
    # Check B calculations
    product_b = result.loc['B']
    # Margin check: 100/500 = 0.2
    assert np.isclose(product_b['profit_margin'], 0.2), f"Product B margin should be 0.2, got {product_b['profit_margin']}"
    
    # Check sorting (B has 500 sales, A has 300 sales)
    assert result.index.tolist() == ['B', 'A'], "Result should be sorted by sales_sum descending (B then A)"

    print("✓ Question 6 tests passed!")

# Execute the test
test_question_6()

## Question 7: Refactor Messy AI Code (Clean Code)

**Scenario:** This AI code works but is poorly structured and hard to maintain. Refactor it following clean code principles.

**Requirements:**
- Improve readability and maintainability
- Add proper documentation
- Follow naming conventions
- Break down large functions
- Add type hints if possible

In [None]:
# ----------------------------------------------------------------------
# Question 7: Refactor Messy AI Code (Clean Code)
import json
from typing import List, Dict, Any, Optional

# --- Helper Functions for Clean Code ---

def _is_valid_user(item: Dict[str, Any]) -> bool:
    """Checks if an item is an active, adult-aged user with a valid email."""
    
    age = item.get('age')
    email = item.get('email')
    
    # Use a single return statement with combined logical conditions
    return (
        item.get('type') == 'user' and
        bool(item.get('active')) and  # truthy check, same as the original process_data
        isinstance(age, int) and age >= 18 and
        isinstance(email, str) and '@' in email
    )

def _get_age_category(age: int) -> str:
    """Classifies the user into an age category."""
    if age >= 65:
        return 'senior'
    elif age >= 25:
        return 'adult'
    else: # age is >= 18 and < 25 (validated by _is_valid_user)
        return 'young_adult'

# --- Original Messy Code (Kept for comparison/testing) ---
def process_data(data):
    """Messy AI-generated code that works but needs refactoring - CLEAN IT UP!"""
    result = {}
    for item in data:
        # Long, deeply nested conditional block
        if 'type' in item and item['type'] == 'user':
            if 'active' in item and item['active']:
                if 'age' in item and isinstance(item['age'], int):
                    if item['age'] >= 18:
                        if 'email' in item and '@' in item['email']:
                            
                            category = 'adult'
                            if item['age'] >= 65:
                                category = 'senior'
                            elif item['age'] >= 25:
                                category = 'adult'
                            else:
                                category = 'young_adult'
                            
                            if category not in result:
                                result[category] = {'count': 0, 'emails': [], 'total_age': 0}
                            
                            result[category]['count'] += 1
                            result[category]['emails'].append(item['email'])
                            result[category]['total_age'] += item['age']
    
    # Calculate averages
    for cat in result:
        result[cat]['avg_age'] = result[cat]['total_age'] / result[cat]['count']
        del result[cat]['total_age']
    
    return result

# --- Refactored Implementation (process_user_data_clean) ---

def process_user_data_clean(data: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """
    Refactored version: Processes user data and aggregates statistics by age category.
    """
    category_data: Dict[str, Dict[str, Any]] = {}

    for item in data:
        # Step 1: Validation using helper function
        if not _is_valid_user(item):
            continue
            
        user_age: int = item['age']
        user_email: str = item['email']
        
        # Step 2: Categorization using helper function
        category = _get_age_category(user_age)
        
        # Step 3: Clean Aggregation using setdefault
        stats = category_data.setdefault(category, {'count': 0, 'emails': [], 'total_age': 0})
            
        # Update statistics
        stats['count'] += 1
        stats['emails'].append(user_email)
        stats['total_age'] += user_age

    # Final Step: Calculate averages and clean up
    for cat in category_data:
        stats = category_data[cat]
        # Guard against ZeroDivisionError (defensive: count is always >= 1 for an existing category)
        if stats['count'] > 0: 
            stats['avg_age'] = stats['total_age'] / stats['count']
        else:
            stats['avg_age'] = 0.0
            
        del stats['total_age']
    
    return category_data

# Test data
test_data = [
    {'type': 'user', 'active': True, 'age': 25, 'email': 'user1@test.com'}, # adult
    {'type': 'user', 'active': True, 'age': 70, 'email': 'user2@test.com'}, # senior
    {'type': 'user', 'active': False, 'age': 30, 'email': 'user3@test.com'}, # inactive (excluded)
    {'type': 'admin', 'active': True, 'age': 35, 'email': 'admin@test.com'}, # wrong type (excluded)
    {'type': 'user', 'active': True, 'age': 20, 'email': 'invalid-email'}, # invalid email (excluded)
    {'type': 'user', 'active': True, 'age': 40, 'email': 'user4@test.com'}, # adult
    {'type': 'user', 'active': True, 'age': 19, 'email': 'young@test.com'}, # young_adult
]

print("Original result:")
original_result = process_data(test_data)
# sort_keys=True gives a stable key order, so the two printouts can be compared directly
print(json.dumps(original_result, indent=2, sort_keys=True))

print("\nClean result:")
clean_result = process_user_data_clean(test_data)
# Same formatting as above, so any difference between the two results is easy to spot
print(json.dumps(clean_result, indent=2, sort_keys=True)) 

# Test Cell (omitted for brevity, assume it passes)
# test_question_7()

In [None]:
# ----------------------------------------------------------------------
# Question 7: Refactor Messy AI Code (Clean Code)
import json
from typing import List, Dict, Any, Optional, Union

# --- Helper Functions for Clean Code ---

def _is_valid_user(item: Dict[str, Any]) -> bool:
    """Checks if an item is an active, adult-aged user with a valid email."""
    
    age = item.get('age')
    email = item.get('email')
    
    return (
        item.get('type') == 'user' and
        bool(item.get('active')) and  # truthy check, same as the original process_data
        isinstance(age, int) and age >= 18 and
        isinstance(email, str) and '@' in email
    )

def _get_age_category(age: int) -> str:
    """Classifies the user into an age category."""
    if age >= 65:
        return 'senior'
    elif age >= 25:
        return 'adult'
    else: # age is >= 18 and < 25
        return 'young_adult'

# --- Original Messy Code (Kept for test comparison) ---
def process_data(data):
    """Messy AI-generated code that works but needs refactoring - CLEAN IT UP!"""
    result = {}
    for item in data:
        if 'type' in item and item['type'] == 'user':
            if 'active' in item and item['active']:
                if 'age' in item and isinstance(item['age'], int):
                    if item['age'] >= 18:
                        if 'email' in item and '@' in item['email']:
                            
                            # Logic to determine category
                            category = 'adult'
                            if item['age'] >= 65:
                                category = 'senior'
                            elif item['age'] >= 25:
                                category = 'adult'
                            else:
                                category = 'young_adult'
                            
                            # Logic to initialize dictionary
                            if category not in result:
                                result[category] = {'count': 0, 'emails': [], 'total_age': 0}
                            
                            # Aggregation
                            result[category]['count'] += 1
                            result[category]['emails'].append(item['email'])
                            result[category]['total_age'] += item['age']
    
    # Calculate averages
    for cat in result:
        result[cat]['avg_age'] = result[cat]['total_age'] / result[cat]['count']
        del result[cat]['total_age']
    
    return result

# --- Refactored Implementation (process_user_data_clean) ---
def process_user_data_clean(data: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """
    Refactored version: Processes user data and aggregates statistics by age category.
    """
    category_data: Dict[str, Dict[str, Any]] = {}

    for item in data:
        # Step 1: Validation
        if not _is_valid_user(item):
            continue
            
        user_age: int = item['age']
        user_email: str = item['email']
        
        # Step 2: Categorization
        category = _get_age_category(user_age)
        
        # Step 3: Clean Aggregation using setdefault
        stats = category_data.setdefault(category, {'count': 0, 'emails': [], 'total_age': 0})
            
        stats['count'] += 1
        stats['emails'].append(user_email)
        stats['total_age'] += user_age

    # Final Step: Calculate averages and clean up
    for cat in category_data:
        stats = category_data[cat]
        stats['avg_age'] = stats['total_age'] / stats['count']
        del stats['total_age']
    
    return category_data

# ----------------------------------------------------------------------
# Test Cell (Question 7)
def test_question_7():
    test_data = [
        {'type': 'user', 'active': True, 'age': 25, 'email': 'user1@test.com'}, # adult
        {'type': 'user', 'active': True, 'age': 70, 'email': 'user2@test.com'}, # senior
        {'type': 'user', 'active': False, 'age': 30, 'email': 'user3@test.com'}, # inactive (excluded)
        {'type': 'user', 'active': True, 'age': 20, 'email': 'user4@test.com'}, # young_adult
    ]
    
    original_result = process_data(test_data)
    clean_result = process_user_data_clean(test_data)
    
    # Results should be functionally equivalent
    assert set(original_result.keys()) == set(clean_result.keys()), "Categories don't match"
    
    for category in original_result:
        assert original_result[category]['count'] == clean_result[category]['count'], f"Count mismatch for {category}"
        # Use abs difference for floating point comparison
        assert abs(original_result[category]['avg_age'] - clean_result[category]['avg_age']) < 0.01, f"Average age mismatch for {category}"
    
    print("✓ Question 7 tests passed!")

# Execute the test
test_question_7()
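
The test above compares only `count` and `avg_age`, so a refactor that changed the `emails` lists would still pass. A minimal sketch of a stricter check follows, assuming both versions return plain dicts that can be compared with `==`. The helper `assert_equivalent` and the two toy functions are hypothetical stand-ins, not part of the notebook.

```python
from typing import Any, Callable, Iterable, List


def assert_equivalent(
    old: Callable[[List[dict]], Any],
    new: Callable[[List[dict]], Any],
    cases: Iterable[List[dict]],
) -> int:
    """Run both functions on every case and compare the complete results.

    Hypothetical helper for illustration. Returns the number of cases checked.
    """
    checked = 0
    for case in cases:
        expected = old(case)
        actual = new(case)
        assert actual == expected, f"Mismatch for {case!r}: {expected!r} != {actual!r}"
        checked += 1
    return checked


# Toy stand-ins for an "original" and a "refactored" function.
def count_active_old(items):
    total = 0
    for item in items:
        if 'active' in item and item['active']:
            total += 1
    return total


def count_active_new(items):
    return sum(1 for item in items if item.get('active'))


cases = [
    [],
    [{'active': True}, {'active': False}],
    [{'active': 1}, {}],  # truthy-but-not-True values are a common divergence point
]
checked = assert_equivalent(count_active_old, count_active_new, cases)
print(f"{checked} cases matched")
```

Comparing whole results catches differences in fields the targeted assertions never look at. The trade-off is that it only works when the output types support equality comparison.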

## Question 8: Debug Complex Logic (Algorithms)

**Scenario:** This AI implementation of binary search has subtle bugs. Find and fix all the issues.

**Requirements:**
- Fix the binary search algorithm
- Handle edge cases properly
- Maintain O(log n) time complexity
- Return correct index or -1 if not found

In [None]:
# ----------------------------------------------------------------------
# Question 8: Debug Complex Logic (Algorithms)

def binary_search_fixed(arr, target):
    """
    Binary search with the bugs from the AI-generated version fixed.
    
    Args:
        arr: Sorted list of integers
        target: Value to search for
        
    Returns:
        int: Index of target if found, -1 otherwise
    """
    if not arr:
        return -1
        
    left = 0
    # FIX 1: Set right to the last valid index (len(arr) - 1)
    right = len(arr) - 1
    
    # FIX 2: Use left <= right to ensure the element at mid is checked 
    # even when left and right converge.
    while left <= right:
        # In fixed-width-integer languages, left + (right - left) // 2 avoids overflow;
        # Python ints are arbitrary precision, so the simple form is safe here.
        mid = (left + right) // 2
        
        if arr[mid] == target:
            return mid
        # FIX 3: If target is greater, search the right side (excluding mid)
        elif arr[mid] < target:
            left = mid + 1
        # FIX 4: If target is smaller, search the left side (excluding mid)
        else: # arr[mid] > target
            right = mid - 1
    
    return -1

# Test cases
test_arrays = [
    ([1, 3, 5, 7, 9, 11], 7),     # Should find at index 3
    ([1, 3, 5, 7, 9, 11], 1),     # Should find at index 0
    ([1, 3, 5, 7, 9, 11], 11),    # Should find at index 5
    ([1, 3, 5, 7, 9, 11], 6),     # Should return -1
    ([5], 5),                    # Single element found
    ([5], 3),                    # Single element not found
    ([], 5),                      # Empty array
]

for arr, target in test_arrays:
    # Use the fixed function name
    result = binary_search_fixed(arr, target)
    print(f"Searching for {target} in {arr}: {result}")

In [None]:
# ----------------------------------------------------------------------
# Question 8: Debug Complex Logic (Algorithms)

def binary_search_buggy(arr, target):
    """
    Binary search implementation with bugs - FIND AND FIX THEM!
    (The name binary_search_buggy is kept because the test cell calls it;
     the implementation below is the fixed one.)
    
    Args:
        arr: Sorted list of integers
        target: Value to search for
        
    Returns:
        int: Index of target if found, -1 otherwise
    """
    if not arr:
        return -1
        
    left = 0
    # FIX 1: Set right to the last valid index (len(arr) - 1)
    right = len(arr) - 1
    
    # FIX 2: Use left <= right for inclusive search bounds
    while left <= right:
        mid = (left + right) // 2
        
        if arr[mid] == target:
            return mid
        # FIX 3: If too small, search right side (mid + 1)
        elif arr[mid] < target:
            left = mid + 1
        # FIX 4: If too large, search left side (mid - 1)
        else: # arr[mid] > target
            right = mid - 1
    
    return -1

# ----------------------------------------------------------------------
# Test Cell (Question 8)
def test_question_8():
    # Test cases with expected results
    test_cases = [
        ([1, 3, 5, 7, 9, 11], 7, 3),        # Found at index 3
        ([1, 3, 5, 7, 9, 11], 1, 0),        # Found at index 0
        ([1, 3, 5, 7, 9, 11], 11, 5),       # Found at index 5
        ([1, 3, 5, 7, 9, 11], 6, -1),       # Not found
        ([1, 3, 5, 7, 9, 11], 0, -1),       # Less than min
        ([1, 3, 5, 7, 9, 11], 12, -1),      # Greater than max
        ([5], 5, 0),                        # Single element found
        ([5], 3, -1),                       # Single element not found
        ([], 5, -1),                        # Empty array
    ]
    
    for arr, target, expected in test_cases:
        # Calls the corrected function logic
        result = binary_search_buggy(arr, target) 
        assert result == expected, f"Failed for {target} in {arr}: expected {expected}, got {result}"
    
    # Check correctness on a larger array (this asserts the index only; it does not measure running time)
    large_array = list(range(0, 10000, 2))  # [0, 2, 4, 6, ..., 9998]
    # 5000 is the 2500th element (0th element is 0, 1st is 2, ..., 2500th is 5000)
    result = binary_search_buggy(large_array, 5000) 
    assert result == 2500, "Should find 5000 at index 2500"
    
    print("✓ Question 8 tests passed!")

# Execute the test
test_question_8()
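
The hand-picked cases above can be complemented by a randomized cross-check against the standard library. The sketch below re-implements the fixed search under the hypothetical name `binary_search` and compares it with a reference built on `bisect.bisect_left`. The array sizes, value ranges, and seed are arbitrary choices for illustration.

```python
import random
from bisect import bisect_left
from typing import List


def binary_search(arr: List[int], target: int) -> int:
    """Iterative binary search with inclusive bounds, mirroring the fixed version above."""
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1


def reference_search(arr: List[int], target: int) -> int:
    """Reference implementation built on the standard library's bisect_left."""
    i = bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1


rng = random.Random(0)  # fixed seed so the run is reproducible
for _ in range(1000):
    size = rng.randint(0, 50)
    # Sampling without replacement keeps values unique, so both searches
    # must agree on the exact index.
    arr = sorted(rng.sample(range(200), size))
    target = rng.randrange(-5, 205)
    assert binary_search(arr, target) == reference_search(arr, target)
print("binary_search agrees with bisect_left on 1000 random cases")
```

The arrays are kept free of duplicates on purpose: with repeated values, `bisect_left` returns the leftmost match, while the hand-written loop may return any matching index, so the two could legitimately differ.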

## Question 9: Add Missing Functionality

**Scenario:** This AI code provides a basic cache implementation but is missing several key features. Add the missing functionality to make it production-ready.

**Requirements:**
- Add TTL (time-to-live) support for automatic expiration
- Add size limit with LRU (Least Recently Used) eviction
- Add cache statistics tracking (hits, misses, evictions)
- Add methods for cache management (clear, size, cleanup)
- Handle thread safety considerations

In [None]:
import time
from typing import Any, Optional, Dict, Union
from collections import OrderedDict
import math

class SimpleCache:
    """
    Enhanced cache implementation with TTL, size limits, LRU eviction, and statistics.
    """
    
    def __init__(self, max_size: int = 100, default_ttl: Optional[int] = None):
        """
        Initialize cache with size limit and default TTL.
        """
        self.max_size = max_size
        self.default_ttl = default_ttl
        
        # Use OrderedDict for combined data storage and LRU tracking. 
        # The order represents insertion/access time (LRU is at the beginning).
        self._data: OrderedDict[str, Dict[str, Union[Any, float, int]]] = OrderedDict()
        
        # Statistics
        self._stats = {
            'hits': 0,
            'misses': 0,
            'evictions': 0,
            'expired_removals': 0
        }
        
    def _is_expired(self, key: str) -> bool:
        """Check if a cache entry has expired."""
        entry = self._data.get(key)
        if entry is None:
            return False
            
        # Expiration time is stored as a timestamp (float)
        expiry_time = entry.get('expiry')
        
        # If expiry is None or math.inf, it never expires
        if expiry_time is None or expiry_time == math.inf:
            return False
            
        return time.time() > expiry_time

    def get(self, key: str) -> Optional[Any]:
        """
        Get value from cache.
        """
        # 1. Check if key exists (Miss/Hit stat)
        if key not in self._data:
            self._stats['misses'] += 1
            return None
        
        # 2. Check if item has expired (TTL)
        if self._is_expired(key):
            self.delete(key, is_expired_removal=True) # Delete and update stat
            self._stats['misses'] += 1 # Treat as a miss
            return None
        
        # 3. Update LRU order (move to end)
        self._data.move_to_end(key)
        entry = self._data[key]
        
        # 4. Update hit statistics
        self._stats['hits'] += 1
        
        return entry['value']
    
    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        """
        Set value in cache.
        """
        current_ttl = ttl if ttl is not None else self.default_ttl
        
        # 1. Calculate expiration time if TTL provided
        expiry_time = None
        if current_ttl is not None and current_ttl > 0:
            expiry_time = time.time() + current_ttl
        elif current_ttl is not None and current_ttl <= 0:
            # If TTL is 0 or negative, treat it as immediate expiry or non-storage
            return 
        elif current_ttl is None:
            expiry_time = math.inf # Use a large value to signify permanent storage
        
        # Remove existing key to update LRU order and ensure new data is stored
        if key in self._data:
            self._data.pop(key)
        
        # 2. Check if cache is full and evict LRU items
        if len(self._data) >= self.max_size:
            # Evict the oldest (first) item
            self._evict_lru(count=1)
        
        # 3. Store value with metadata
        self._data[key] = {
            'value': value,
            'expiry': expiry_time
        }
        # 4. LRU order is automatically updated as it was popped/re-added or newly added
        
    def delete(self, key: str, is_expired_removal: bool = False) -> bool:
        """Delete key from cache."""
        if key in self._data:
            del self._data[key]
            
            # Update expired removal stat only if called from _is_expired check
            if is_expired_removal:
                self._stats['expired_removals'] += 1
                
            return True
        return False

    # --- Missing Management Methods ---

    def clear(self) -> None:
        """Clear all items from cache and reset statistics."""
        self._data.clear()
        self._stats = {
            'hits': 0,
            'misses': 0,
            'evictions': 0,
            'expired_removals': 0
        }
    
    def size(self) -> int:
        """Return current number of items in cache."""
        return len(self._data)
    
    def get_stats(self) -> Dict[str, int]:
        """
        Get cache statistics.
        
        Returns:
            Dict with keys: hits, misses, evictions, expired_removals, current_size
        """
        stats_copy = self._stats.copy()
        stats_copy['current_size'] = self.size()
        return stats_copy
    
    def cleanup_expired(self) -> int:
        """
        Remove expired items from cache.
        
        Returns:
            Number of items removed
        """
        keys_to_remove = [key for key in self._data if self._is_expired(key)]
        removed_count = 0
        
        for key in keys_to_remove:
            # Delete calls the internal deletion logic, updating stats
            self.delete(key, is_expired_removal=True) 
            removed_count += 1
            
        return removed_count
    
    def _evict_lru(self, count: int = 1) -> int:
        """
        Evict least recently used items (from the beginning of OrderedDict).
        """
        evicted_count = 0
        for _ in range(count):
            try:
                # popitem(last=False) removes and returns the first (LRU) item
                self._data.popitem(last=False) 
                self._stats['evictions'] += 1
                evicted_count += 1
            except KeyError:
                # Cache is empty
                break
                
        return evicted_count

# Test your enhanced implementation
if __name__ == "__main__":
    # Test TTL functionality
    cache = SimpleCache(max_size=3, default_ttl=1)  # 1 second TTL
    
    print("=== Testing TTL ===")
    cache.set("temp_key", "temp_value") # TTL is 1 sec
    print(f"Immediately after set (Hit): {cache.get('temp_key')}")
    time.sleep(1.1)
    # The get() call should delete the expired item and return None
    print(f"After TTL expired (Miss/Removal): {cache.get('temp_key')}") 
    
    print("\n=== Testing Size Limits & LRU ===")
    cache.clear()
    cache.set("a", 1, ttl=None) # ttl=None falls back to default_ttl (1 s here); the checks below run well within it
    cache.set("b", 2, ttl=None)
    cache.set("c", 3, ttl=None)
    print(f"Cache size after adding 3 items: {cache.size()}") # Expected: 3
    
    # Access 'a' to make it recently used (b, c, a order)
    cache.get("a") 
    
    # Add 'd' which should evict 'b' (least recently used)
    cache.set("d", 4, ttl=None)
    # Order should be c, a, d
    print(f"After adding 'd': a={cache.get('a')}, b={cache.get('b')}, c={cache.get('c')}, d={cache.get('d')}")
    assert cache.get('b') is None # 'b' should be evicted

    print("\n=== Testing Statistics ===")
    stats = cache.get_stats()
    print(f"Cache statistics: {stats}")
    
    # Expected stats at this point (clear() above reset the counters from the TTL test):
    # hits: 4 (a in the LRU step, then a, c, d in the print above)
    # misses: 2 (b in the print, b in the assert)
    # evictions: 1 (b)
    # expired_removals: 0
    
    print("\n=== Testing Cleanup ===")
    # Use a cache without a default TTL: on the cache above (default_ttl=1),
    # ttl=None falls back to the 1-second default, so "permanent" would expire too.
    cache = SimpleCache(max_size=3)
    cache.set("expire_me_1", "value", ttl=1)
    cache.set("expire_me_2", "value", ttl=1)
    cache.set("permanent", "value", ttl=None) # Will not expire
    
    time.sleep(1.1)
    removed_count = cache.cleanup_expired()
    print(f"Expired items removed: {removed_count}") # Expected: 2
    
    final_stats = cache.get_stats()
    print(f"Final statistics: {final_stats}")
    assert final_stats['current_size'] == 1 # Only 'permanent' should remain
    assert final_stats['expired_removals'] == 2 # Both removed by cleanup_expired()
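
The TTL checks above rely on `time.sleep(1.1)` and on `time.time()`, which is a wall clock and can jump if the system time is adjusted. One possible alternative is to inject the clock, which lets tests advance time instantly. This is a minimal sketch: `TTLStore`, `FakeClock`, and the `clock` parameter are hypothetical names for illustration and are not part of `SimpleCache`.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLStore:
    """Tiny TTL-only store used to illustrate clock injection (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self._clock() > expiry:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value


class FakeClock:
    """Manually advanced clock for tests."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
store = TTLStore(clock=clock)
store.set("session", "abc", ttl=1)
before = store.get("session")   # still within the TTL
clock.advance(1.1)              # no real waiting needed
after = store.get("session")    # expired and removed
print(before, after)
```

With the default `time.monotonic` the store behaves like a normal TTL cache, while tests can pass a `FakeClock` and skip the sleeps entirely.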

In [None]:
# ----------------------------------------------------------------------
# Question 9: Add Missing Functionality
import time
from typing import Any, Optional, Dict, Union
from collections import OrderedDict
import math

class SimpleCache:
    """
    Enhanced cache implementation with TTL, size limits, LRU eviction, and statistics.
    """
    
    def __init__(self, max_size: int = 100, default_ttl: Optional[int] = None):
        """
        Initialize cache with size limit and default TTL.
        """
        self.max_size = max_size
        self.default_ttl = default_ttl
        
        # Use OrderedDict for combined data storage and LRU tracking. 
        # The order represents insertion/access time (LRU is at the beginning).
        self._data: OrderedDict[str, Dict[str, Union[Any, float, int]]] = OrderedDict()
        
        # Statistics
        self._stats = {
            'hits': 0,
            'misses': 0,
            'evictions': 0,
            'expired_removals': 0
        }
        
    def _is_expired(self, key: str) -> bool:
        """Check if a cache entry has expired."""
        entry = self._data.get(key)
        if entry is None:
            return False
            
        # Expiration time is stored as a timestamp (float)
        expiry_time = entry.get('expiry')
        
        # If expiry is None or math.inf, it never expires
        if expiry_time is None or expiry_time == math.inf:
            return False
            
        return time.time() > expiry_time

    def get(self, key: str) -> Optional[Any]:
        """
        Get value from cache.
        """
        # 1. Check if key exists
        if key not in self._data:
            self._stats['misses'] += 1
            return None
        
        # 2. Check if item has expired (TTL)
        if self._is_expired(key):
            # Treat as miss and delete it
            self.delete(key, is_expired_removal=True) 
            self._stats['misses'] += 1 
            return None
        
        # 3. Update LRU order (move to end)
        self._data.move_to_end(key)
        entry = self._data[key]
        
        # 4. Update hit statistics
        self._stats['hits'] += 1
        
        return entry['value']
    
    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        """
        Set value in cache, handling LRU eviction if max_size is reached.
        """
        current_ttl = ttl if ttl is not None else self.default_ttl
        
        # 1. Calculate expiration time
        expiry_time = None
        if current_ttl is not None and current_ttl > 0:
            expiry_time = time.time() + current_ttl
        elif current_ttl is None:
            expiry_time = math.inf # Permanent storage
        else: # TTL is 0 or negative
            return
        
        # Remove existing key to update LRU order/data
        if key in self._data:
            self._data.pop(key)
        
        # 2. Check if cache is full and evict LRU items
        if len(self._data) >= self.max_size:
            self._evict_lru(count=1) # Evicts the oldest item
        
        # 3. Store value with metadata (and updates LRU order implicitly)
        self._data[key] = {
            'value': value,
            'expiry': expiry_time
        }
        
    def delete(self, key: str, is_expired_removal: bool = False) -> bool:
        """Delete key from cache."""
        if key in self._data:
            del self._data[key]
            
            if is_expired_removal:
                self._stats['expired_removals'] += 1
                
            return True
        return False

    def clear(self) -> None:
        """Clear all items from cache and reset statistics."""
        self._data.clear()
        self._stats = {
            'hits': 0,
            'misses': 0,
            'evictions': 0,
            'expired_removals': 0
        }
    
    def size(self) -> int:
        """Return current number of items in cache."""
        return len(self._data)
    
    def get_stats(self) -> Dict[str, int]:
        """
        Get cache statistics.
        """
        stats_copy = self._stats.copy()
        stats_copy['current_size'] = self.size()
        return stats_copy
    
    def cleanup_expired(self) -> int:
        """
        Remove expired items from cache.
        """
        keys_to_remove = [key for key in self._data if self._is_expired(key)]
        removed_count = 0
        
        for key in keys_to_remove:
            self.delete(key, is_expired_removal=True) 
            removed_count += 1
            
        return removed_count
    
    def _evict_lru(self, count: int = 1) -> int:
        """
        Evict least recently used items (from the beginning of OrderedDict).
        """
        evicted_count = 0
        for _ in range(count):
            try:
                # popitem(last=False) removes and returns the first (LRU) item
                self._data.popitem(last=False) 
                self._stats['evictions'] += 1
                evicted_count += 1
            except KeyError:
                break
                
        return evicted_count

# ----------------------------------------------------------------------
# Test Cell (Question 9)
import time

def test_question_9():
    print("Testing enhanced cache implementation...")
    
    # Test 1: Basic functionality
    cache = SimpleCache(max_size=3, default_ttl=60)
    
    cache.set("key1", "value1")
    cache.set("key2", "value2")
    
    assert cache.get("key1") == "value1", "Basic get/set failed"
    assert cache.get("key2") == "value2", "Basic get/set failed"
    assert cache.size() == 2, f"Expected size 2, got {cache.size()}"
    
    # Test 2: TTL expiration
    cache.clear()
    cache.set("ttl_key", "ttl_value", ttl=1)  # 1 second TTL
    assert cache.get("ttl_key") == "ttl_value", "TTL key should be accessible immediately"
    
    time.sleep(1.1)  # Wait for expiration
    assert cache.get("ttl_key") is None, "TTL key should be expired and return None"
    
    # Test 3: Size limits and LRU eviction
    cache.clear()
    cache.set("a", 1) # LRU: a
    cache.set("b", 2) # LRU: a, b
    cache.set("c", 3) # LRU: a, b, c. Cache is now full (max_size=3)
    
    # Access 'a' to make it recently used (b, c, a)
    cache.get("a")
    
    # Add 'd' which should evict 'b' (least recently used)
    cache.set("d", 4) # LRU: c, a, d
    
    assert cache.get("a") == 1, "Recently used 'a' should not be evicted"
    assert cache.get("b") is None, "Least recently used 'b' should be evicted"
    assert cache.get("c") == 3, "'c' should still be in cache"
    assert cache.get("d") == 4, "Newly added 'd' should be in cache"
    assert cache.size() == 3, "Cache size should remain at max_size"
    
    # Test 4: Statistics tracking
    cache.clear()
    cache.set("stat_key", "stat_value")
    cache.get("stat_key")  # Hit 1
    cache.get("nonexistent")  # Miss 1
    cache.set("evict_me", 0) # LRU: stat_key, evict_me
    cache.set("force_evict", 0) # LRU: stat_key, evict_me, force_evict
    cache.set("new", 0) # Evicts stat_key, Evictions 1. LRU: evict_me, force_evict, new

    stats = cache.get_stats()
    required_stats = ["hits", "misses", "evictions", "current_size", "expired_removals"]
    for stat in required_stats:
        assert stat in stats, f"Missing statistic: {stat}"
    
    assert stats["hits"] == 1, f"Expected 1 hit, got {stats['hits']}"
    assert stats["misses"] == 1, f"Expected 1 miss, got {stats['misses']}"
    assert stats["evictions"] == 1, f"Expected 1 eviction, got {stats['evictions']}"
    assert stats["current_size"] == 3, f"Expected current size 3, got {stats['current_size']}"
    
    # Test 5: Manual cleanup
    cache.clear()
    cache.set("expire1", "value1", ttl=1)
    cache.set("expire2", "value2", ttl=1)
    cache.set("keep", "value3", ttl=None)  # No expiration
    
    time.sleep(1.1)  # Wait for expiration
    removed_count = cache.cleanup_expired()
    
    assert removed_count == 2, f"Should have removed 2 expired items, removed {removed_count}"
    assert cache.get("keep") == "value3", "Non-expiring item should remain"
    assert cache.size() == 1, "Only one item should remain after cleanup"
    
    # Test 6: Edge cases
    cache.clear()
    assert cache.size() == 0, "Cache should be empty after clear"
    assert cache.get("nonexistent") is None, "Getting non-existent key should return None"
    assert cache.delete("nonexistent") == False, "Deleting non-existent key should return False"
    
    # Test delete functionality
    cache.set("delete_me", "value")
    assert cache.delete("delete_me") == True, "Deleting existing key should return True"
    assert cache.get("delete_me") is None, "Deleted key should not be accessible"
    
    print("✓ All Question 9 tests passed!")

# Execute the test
test_question_9()
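
The eviction order exercised in Test 3 can be traced with a minimal sketch built on `collections.OrderedDict`. This is not the cache implementation under test: `TinyLRU` is a hypothetical name, and TTL handling and statistics are deliberately left out so that only the recency ordering is visible.

```python
from collections import OrderedDict


class TinyLRU:
    """Hypothetical minimal LRU store; illustrates ordering only (no TTL, no stats)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()  # least recently used first, most recent last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # a read marks the key as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)  # an overwrite also refreshes recency
        elif len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
        self._data[key] = value

    def keys(self):
        return list(self._data)


lru = TinyLRU(max_size=3)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    lru.set(key, value)
lru.get("a")     # order becomes b, c, a
lru.set("d", 4)  # cache is full, so "b" is evicted
print(lru.keys())
```

Reading `a` before inserting `d` is what moves `b` to the front of the eviction queue, which is the behavior the Test 3 assertions check.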

## Question 10: Integration Challenge (Multiple Components)

**Scenario:** You have three separate AI-generated modules that need to work together in a data processing pipeline, but they have interface mismatches and compatibility issues. Your job is to create the integration layer that makes them work together seamlessly.

**Requirements:**
- Create adapter/wrapper functions to handle data format conversions
- Build a unified pipeline that chains all three components
- Add comprehensive error handling for the integration
- Handle edge cases and invalid data gracefully
- Create helper functions for data transformation
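
As a rough illustration of the adapter idea in the requirements above, the sketch below chains three stand-in stages whose interfaces do not line up (a dict comes out, a JSON string is expected, a list of tuples is expected). The names `stage_one`, `stage_two`, `stage_three`, and `run_pipeline` are hypothetical and far simpler than the components in this question.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def stage_one(values: List[float]) -> Dict[str, Any]:
    """Hypothetical producer: returns a dict."""
    return {"doubled": [v * 2 for v in values]}


def stage_two(payload: str) -> Tuple[Optional[str], Any]:
    """Hypothetical consumer: expects a JSON string, returns (summary, detail)."""
    data = json.loads(payload)
    return f"{len(data['doubled'])} values", sum(data["doubled"])


def stage_three(results: List[Tuple[Optional[str], Any]]) -> str:
    """Hypothetical reporter: expects a list of (summary, detail) tuples."""
    lines = []
    for summary, detail in results:
        if summary is None:
            lines.append(f"FAILED: {detail}")
        else:
            lines.append(f"{summary}: {detail}")
    return "\n".join(lines)


def run_pipeline(datasets: List[Any]) -> str:
    """Chain the stages, converting formats between them."""
    results = []
    for dataset in datasets:
        try:
            produced = stage_one(dataset)       # dict
            payload = json.dumps(produced)      # adapter: dict -> JSON string
            results.append(stage_two(payload))  # (summary, detail) tuple
        except Exception as exc:
            # One bad dataset is recorded as a failure and does not stop the rest.
            results.append((None, f"{type(exc).__name__}: {exc}"))
    return stage_three(results)


report = run_pipeline([[1, 2, 3], None])
print(report)
```

Recording a failure as a `(None, message)` tuple keeps every dataset in the final report, so the reporter never has to know that an exception occurred upstream.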


In [None]:
import json
from typing import List, Dict, Any, Tuple, Optional, Union

# Component 1: Data Processor (returns dict with specific structure)
class DataProcessor:
    """AI Component 1 - processes raw data and returns structured dict"""
    
    def process_data(self, raw_data: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Process raw data and return structured dict."""
        if not isinstance(raw_data, list):
            raise ValueError("Expected list input")
        
        result = {
            'total_items': len(raw_data),
            'processed_items': [],
            'metadata': {'processing_time': 0.1, 'timestamp': '2024-01-01T12:00:00Z'}
        }
        
        for item in raw_data:
            # Check for dict and valid numeric value before processing
            is_valid = isinstance(item, dict) and 'value' in item and isinstance(item['value'], (int, float))
            
            if is_valid:
                result['processed_items'].append({
                    'id': item.get('id', 'unknown'),
                    'processed_value': item['value'] * 2,
                    'original_value': item['value'],
                    'status': 'processed'
                })
            else:
                result['processed_items'].append({
                    'id': 'error',
                    'processed_value': 0,
                    'original_value': item.get('value', None) if isinstance(item, dict) else None,
                    'status': 'failed'
                })
        
        return result

# Component 2: Analytics Engine (expects JSON string, returns tuple)
class AnalyticsEngine:
    """AI Component 2 - performs analytics on data, expects JSON string input"""
    
    def analyze(self, json_data_string: str) -> Tuple[Optional[str], Union[Dict[str, float], str]]:
        """Analyze data from JSON string, return (summary, metrics) tuple."""
        try:
            data = json.loads(json_data_string)
        except json.JSONDecodeError:
            return None, "Invalid JSON format"
        
        if not isinstance(data, dict) or 'processed_items' not in data:
            return None, "Missing processed_items in data structure"
        
        items = data['processed_items']
        if not isinstance(items, list):
            return None, "processed_items must be a list"
        
        # Extract numeric values for analysis
        values = []
        failed_count = 0
        
        for item in items:
            if isinstance(item, dict) and item.get('status') == 'processed':
                if 'processed_value' in item and isinstance(item['processed_value'], (int, float)):
                    values.append(item['processed_value'])
            else:
                failed_count += 1
        
        if not values:
            total_items = len(items)
            return None, f"No valid numeric data found for analysis (Total items: {total_items})"
        
        summary = f"Analyzed {len(items)} items ({len(values)} successful, {failed_count} failed)"
        metrics = {
            'avg_value': sum(values) / len(values),
            'max_value': max(values),
            'min_value': min(values),
            'total_value': sum(values),
            'success_rate': len(values) / len(items) if items else 0.0
        }
        
        return summary, metrics

# Component 3: Report Generator (expects list of tuples, returns formatted string)
class ReportGenerator:
    """AI Component 3 - generates reports from analytics results"""
    
    def generate_report(self, analytics_results_list: List[Tuple[Optional[str], Union[Dict, str]]]) -> str:
        """Generate report from list of (summary, metrics) tuples."""
        if not isinstance(analytics_results_list, list):
            return "Error: Expected list input for report generation"
        
        if not analytics_results_list:
            return "Error: No data provided for report generation"
        
        report_lines = [
            "=" * 50,
            "           ANALYSIS REPORT",
            "=" * 50
        ]
        
        for i, result in enumerate(analytics_results_list):
            report_lines.append(f"\n--- DATASET {i+1} ---")

            if not isinstance(result, tuple) or len(result) != 2:
                report_lines.append("Analysis failed: Invalid data format - expected (summary, metrics) tuple")
                continue
            
            summary, metrics = result
            
            if summary is None:
                report_lines.append("Analysis failed")
                report_lines.append(f"  Error: {metrics}")
                continue
            
            report_lines.append(f"Summary: {summary}")
            
            if isinstance(metrics, dict):
                report_lines.append("Metrics:")
                for key, value in metrics.items():
                    if isinstance(value, float):
                        report_lines.append(f"    {key}: {value:.2f}")
                    else:
                        report_lines.append(f"    {key}: {value}")
            else:
                report_lines.append(f"Metrics: {metrics}")
        
        report_lines.append("\n" + "=" * 50)
        return "\n".join(report_lines)

# ----------------------------------------------------------------------
# INTEGRATION FUNCTIONS (Solution)

def dict_to_json_adapter(data_dict: Dict[str, Any]) -> str:
    """
    Convert dictionary to JSON string for AnalyticsEngine.
    """
    try:
        # Use json.dumps to safely serialize the dictionary
        return json.dumps(data_dict)
    except TypeError as exc:
        # Handle cases where the dict contains unserializable types;
        # chain the original error so the root cause stays in the traceback
        raise ValueError("Data dictionary contains un-serializable types for JSON conversion.") from exc

def validate_and_clean_raw_data(raw_data: Any) -> List[Dict[str, Any]]:
    """
    Validate and clean raw input data.
    Ensures the output is always a list of dictionaries.
    """
    # If the input is not a list, wrap it in a list if it's a dict, otherwise return an empty list.
    if not isinstance(raw_data, list):
        if isinstance(raw_data, dict):
            raw_data = [raw_data]
        else:
            return []
    
    # Simple cleaning: ensure every item in the list is a dictionary.
    cleaned_list = [item if isinstance(item, dict) else {} for item in raw_data]
    return cleaned_list

def integrated_pipeline(raw_data_list: List[Any]) -> str:
    """
    Integrate all three components to process data end-to-end.
    """
    # Step 1: Initialize components
    processor = DataProcessor()
    analytics = AnalyticsEngine()
    reporter = ReportGenerator()
    
    analytics_results = []
    
    for i, raw_data in enumerate(raw_data_list):
        try:
            # Step 2: Validate and clean each raw dataset
            cleaned_data = validate_and_clean_raw_data(raw_data)
            
            # Step 3: Process through DataProcessor
            processed_dict = processor.process_data(cleaned_data)
            
            # Step 4: Convert results to JSON string (Adapter)
            json_data = dict_to_json_adapter(processed_dict)
            
            # Step 5: Run analytics
            analysis_result = analytics.analyze(json_data)
            
        except Exception as e:
            # Gracefully handle component errors
            analysis_result = (None, f"Pipeline Error (Dataset {i+1}): {type(e).__name__}: {str(e)}")
            
        # Step 6: Collect results
        analytics_results.append(analysis_result)
        
    # Step 7: Generate final report
    return reporter.generate_report(analytics_results)

# ----------------------------------------------------------------------
# Test and Execution

def create_sample_data() -> List[Any]:
    """Create sample test data for the pipeline."""
    return [
        # Dataset 1: Normal data (List[Dict])
        [
            {'id': 'A1', 'value': 10},
            {'id': 'A2', 'value': 20},
            {'id': 'A3', 'value': 15}
        ],
        # Dataset 2: Smaller dataset (List[Dict])
        [
            {'id': 'B1', 'value': 5},
            {'id': 'B2', 'value': 25}
        ],
        # Dataset 3: Mixed data with issues (List[Dict])
        [
            {'id': 'C1', 'value': 30},
            {'id': 'C2'},  # Missing value
            {'value': 40},  # Missing id
            {'id': 'C4', 'value': 'invalid'},  # Invalid value type (will fail in DataProcessor)
            'not_a_dict' # Invalid item type (will be cleaned to {})
        ],
        # Dataset 4: Error test - raw input is not a list/dict (should be cleaned to [])
        "this is a string, not data", 
        # Dataset 5: Single dictionary input (should be wrapped by validate)
        {'id': 'D1', 'value': 100} 
    ]

# Test the integration
if __name__ == "__main__":
    import traceback
    print("Testing component integration...")
    
    print("\n=== Testing Individual Components (Sanity Check) ===")
    
    processor = DataProcessor()
    analytics = AnalyticsEngine()
    reporter = ReportGenerator()
    
    # Test DataProcessor
    test_data = [{'id': 'test', 'value': 10}, {'id': 'test2', 'value': 'bad'}]
    processed = processor.process_data(test_data)
    print(f"DataProcessor output: {processed}")
    
    # Test AnalyticsEngine
    json_data = json.dumps(processed)
    analysis_result = analytics.analyze(json_data)
    print(f"AnalyticsEngine output: {analysis_result}")
    
    # Test ReportGenerator
    report = reporter.generate_report([analysis_result])
    print(f"ReportGenerator output:\n{report}")
    
    print("\n=== Testing Integrated Pipeline (Full End-to-End) ===")
    
    sample_datasets = create_sample_data()
    
    try:
        final_report = integrated_pipeline(sample_datasets)
        print("Integration successful!")
        print(final_report)
    except Exception as e:
        print(f"Integration failed: {e}")
        traceback.print_exc()

In [None]:
import json
from typing import List, Dict, Any, Tuple, Optional, Union
import traceback

# ----------------------------------------------------------------------
# Component 1: Data Processor (returns dict with specific structure)
class DataProcessor:
    """AI Component 1 - processes raw data and returns structured dict"""
    
    def process_data(self, raw_data: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Process raw data and return structured dict."""
        if not isinstance(raw_data, list):
            raise ValueError("Expected list input")
        
        result = {
            'total_items': len(raw_data),
            'processed_items': [],
            'metadata': {'processing_time': 0.1, 'timestamp': '2024-01-01T12:00:00Z'}
        }
        
        for item in raw_data:
            # Check for dict and valid numeric value before processing
            is_valid = isinstance(item, dict) and 'value' in item and isinstance(item['value'], (int, float))
            
            if is_valid:
                result['processed_items'].append({
                    'id': item.get('id', 'unknown'),
                    'processed_value': item['value'] * 2,
                    'original_value': item['value'],
                    'status': 'processed'
                })
            else:
                result['processed_items'].append({
                    'id': 'error',
                    'processed_value': 0,
                    'original_value': item.get('value', None) if isinstance(item, dict) else None,
                    'status': 'failed'
                })
        
        return result

# Component 2: Analytics Engine (expects JSON string, returns tuple)
class AnalyticsEngine:
    """AI Component 2 - performs analytics on data, expects JSON string input"""
    
    def analyze(self, json_data_string: str) -> Tuple[Optional[str], Union[Dict[str, float], str]]:
        """Analyze data from JSON string, return (summary, metrics) tuple."""
        try:
            data = json.loads(json_data_string)
        except json.JSONDecodeError:
            return None, "Invalid JSON format"
        
        if not isinstance(data, dict) or 'processed_items' not in data:
            return None, "Missing processed_items in data structure"
        
        items = data['processed_items']
        if not isinstance(items, list):
            return None, "processed_items must be a list"
        
        # Extract numeric values for analysis
        values = []
        failed_count = 0
        
        for item in items:
            if isinstance(item, dict) and item.get('status') == 'processed':
                if 'processed_value' in item and isinstance(item['processed_value'], (int, float)):
                    values.append(item['processed_value'])
            else:
                failed_count += 1
        
        if not values:
            total_items = len(items)
            return None, f"No valid numeric data found for analysis (Total items: {total_items})"
        
        summary = f"Analyzed {len(items)} items ({len(values)} successful, {failed_count} failed)"
        metrics = {
            'avg_value': sum(values) / len(values),
            'max_value': max(values),
            'min_value': min(values),
            'total_value': sum(values),
            'success_rate': len(values) / len(items) if items else 0.0
        }
        
        return summary, metrics

# Component 3: Report Generator (expects list of tuples, returns formatted string)
class ReportGenerator:
    """AI Component 3 - generates reports from analytics results"""
    
    def generate_report(self, analytics_results_list: List[Tuple[Optional[str], Union[Dict, str]]]) -> str:
        """Generate report from list of (summary, metrics) tuples."""
        if not isinstance(analytics_results_list, list):
            return "Error: Expected list input for report generation"
        
        if not analytics_results_list:
            return "Error: No data provided for report generation"
        
        report_lines = [
            "=" * 50,
            "           ANALYSIS REPORT",
            "=" * 50
        ]
        
        for i, result in enumerate(analytics_results_list):
            report_lines.append(f"\n--- DATASET {i+1} ---")

            if not isinstance(result, tuple) or len(result) != 2:
                report_lines.append("Analysis failed: Invalid data format - expected (summary, metrics) tuple")
                continue
            
            summary, metrics = result
            
            if summary is None:
                report_lines.append("Analysis failed")
                report_lines.append(f"  Error: {metrics}")
                continue
            
            report_lines.append(f"Summary: {summary}")
            
            if isinstance(metrics, dict):
                report_lines.append("Metrics:")
                for key, value in metrics.items():
                    if isinstance(value, float):
                        report_lines.append(f"    {key}: {value:.2f}")
                    else:
                        report_lines.append(f"    {key}: {value}")
            else:
                report_lines.append(f"Metrics: {metrics}")
        
        report_lines.append("\n" + "=" * 50)
        return "\n".join(report_lines)

# ----------------------------------------------------------------------
# INTEGRATION FUNCTIONS (Solution)

def dict_to_json_adapter(data_dict: Dict[str, Any]) -> str:
    """
    Convert dictionary to JSON string for AnalyticsEngine.
    """
    try:
        return json.dumps(data_dict)
    except TypeError as exc:
        # Chain the original error so the root cause stays in the traceback
        raise ValueError("Data dictionary contains un-serializable types for JSON conversion.") from exc

def validate_and_clean_raw_data(raw_data: Any) -> List[Dict[str, Any]]:
    """
    Validate and clean raw input data.
    Ensures the output is always a list of dictionaries.
    """
    if not isinstance(raw_data, list):
        if isinstance(raw_data, dict):
            raw_data = [raw_data]
        else:
            return []
    
    cleaned_list = [item if isinstance(item, dict) else {} for item in raw_data]
    return cleaned_list

def integrated_pipeline(raw_data_list: List[Any]) -> str:
    """
    Integrate all three components to process data end-to-end.
    """
    processor = DataProcessor()
    analytics = AnalyticsEngine()
    reporter = ReportGenerator()
    
    analytics_results = []
    
    for i, raw_data in enumerate(raw_data_list):
        analysis_result: Tuple[Optional[str], Union[Dict[str, float], str]]
        try:
            # Step 1: Validate and clean
            cleaned_data = validate_and_clean_raw_data(raw_data)
            
            # Step 2: Process
            processed_dict = processor.process_data(cleaned_data)
            
            # Step 3: Convert to JSON (Adapter)
            json_data = dict_to_json_adapter(processed_dict)
            
            # Step 4: Run analytics
            analysis_result = analytics.analyze(json_data)
            
        except Exception as e:
            # Step 5: Gracefully handle errors
            analysis_result = (None, f"Pipeline Error (Dataset {i+1}): {type(e).__name__}: {str(e)}")
            
        analytics_results.append(analysis_result)
        
    # Step 6: Generate final report
    return reporter.generate_report(analytics_results)

# ----------------------------------------------------------------------
# Test Cell (Question 10)
def test_question_10():
    print("Testing integrated pipeline...")
    
    # Test 1: Individual component functionality (Sanity check)
    processor = DataProcessor()
    analytics = AnalyticsEngine()
    reporter = ReportGenerator()
    
    test_data = [{'id': 'test1', 'value': 10}, {'id': 'test2', 'value': 20}]
    processed = processor.process_data(test_data)
    
    assert isinstance(processed, dict), "DataProcessor should return dict"
    assert processed['total_items'] == 2, "Should count items correctly"
    
    json_data = json.dumps(processed)
    summary, metrics = analytics.analyze(json_data)
    
    assert summary is not None, "Analytics should return valid summary"
    assert isinstance(metrics, dict), "Analytics should return metrics dict"
    assert metrics['avg_value'] == 30.0, "Average processed value should be (20+40)/2 = 30"
    
    report = reporter.generate_report([(summary, metrics)])
    assert "ANALYSIS REPORT" in report, "Report should contain header"
    
    # Test 2: Data validation and cleaning
    cleaned_data = validate_and_clean_raw_data([
        {'id': 'valid', 'value': 10},
        {'value': 20},  # Missing id
        {'id': 'invalid'},  # Missing value (will be converted to failed item by DataProcessor)
        'invalid_format'  # Wrong format (will be converted to {} by clean func)
    ])
    
    assert isinstance(cleaned_data, list), "Should return list"
    assert len(cleaned_data) == 4, "Should keep all items but clean non-dicts"
    assert cleaned_data[3] == {}, "Invalid item format should be cleaned to {}"
    
    # Test 3: Integration adapters
    test_dict = {'processed_items': [{'processed_value': 10}]}
    json_str = dict_to_json_adapter(test_dict)
    
    assert isinstance(json_str, str), "Should return JSON string"
    parsed = json.loads(json_str)
    assert parsed == test_dict, "Should preserve data structure"
    
    # Test 4: Full pipeline integration
    sample_datasets = [
        [{'id': 'A1', 'value': 10}, {'id': 'A2', 'value': 20}], # Avg=30
        [{'id': 'B1', 'value': 5}], # Avg=10
        []  # Empty dataset (Analysis failed)
    ]
    
    final_report = integrated_pipeline(sample_datasets)
    
    assert isinstance(final_report, str), "Pipeline should return string report"
    assert "ANALYSIS REPORT" in final_report, "Should contain report header"
    assert "DATASET 1" in final_report, "Should have first section"
    assert "DATASET 3" in final_report, "Should have third section"
    assert "avg_value: 30.00" in final_report, "Dataset 1 average check"
    assert "Analysis failed" in final_report, "Dataset 3 (empty) should fail analysis"
    
    # Test 5: Error handling (a malformed item mixed into an otherwise valid dataset)
    malformed_report = integrated_pipeline([
        [{'id': 'valid', 'value': 10}, 'invalid_format']
    ])
    
    # DataProcessor will receive [{'id': 'valid', 'value': 10}, {}]
    # DataProcessor output: 1 processed, 1 failed.
    # Analytics should pass.
    assert "Analyzed 2 items (1 successful, 1 failed)" in malformed_report, "Should handle malformed data via cleaning"
    
    # Test 6: Edge cases
    edge_cases = [
        [{'id': 'only_id'}],     # Missing value -> failed item. No valid values for analytics.
    ]
    
    edge_report = integrated_pipeline(edge_cases)
    assert "No valid numeric data found for analysis" in edge_report, "Missing value should lead to analysis failure"
    
    print("✓ All Question 10 tests passed!")

# Run the test
test_question_10()
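
Test 3 only covers the success path of `dict_to_json_adapter`. A failure-path check might look like the sketch below. It uses a local copy of the adapter logic so that the snippet runs by itself, and `adapter_rejects` is a hypothetical helper introduced only for this illustration. A `set` is used as the bad payload because `json.dumps` raises `TypeError` for it.

```python
import json
from typing import Any, Dict


def dict_to_json_adapter(data_dict: Dict[str, Any]) -> str:
    """Local copy of the adapter logic so this snippet is self-contained."""
    try:
        return json.dumps(data_dict)
    except TypeError as exc:
        raise ValueError(
            "Data dictionary contains un-serializable types for JSON conversion."
        ) from exc


def adapter_rejects(value: Any) -> bool:
    """Hypothetical helper: True if the adapter raises ValueError for this payload."""
    try:
        dict_to_json_adapter({"processed_items": value})
    except ValueError:
        return True
    return False


assert adapter_rejects({1, 2, 3})      # sets are not JSON-serializable
assert not adapter_rejects([1, 2, 3])  # lists serialize without error
```

Inside `integrated_pipeline`, such a `ValueError` is caught by the per-dataset `except` block, so the affected dataset would appear in the report as a pipeline error and the run would continue.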

## Final Submission Instructions

### Before You Submit:

**Code Quality Checklist:**
- All test cells pass without errors
- Code follows Python best practices and conventions  
- Functions include appropriate documentation
- Error handling is implemented where required
- Edge cases are handled appropriately
- Code is clean, readable, and maintainable

**Save Your Work:**
- **Save all code outputs** - Run all cells and keep the output visible
- Save the notebook file (Ctrl+S / Cmd+S)
- Verify all your implementations are in the correct code cells
- Double-check that test cells show "tests passed!" messages

### Submission Format:
Submit your completed `firstname_lastname.ipynb` file with **all outputs preserved**. We want to see:
- Your code implementations
- Test results (passed/failed)
- Any debugging output or print statements
- Cell execution numbers


