# List Comprehension and Dictionary Comprehension

## Learning Objectives
By the end of this notebook, you will be able to:
- Create list comprehensions for data transformation and filtering
- Build dictionary comprehensions for key-value data processing
- Use nested comprehensions for complex data structures
- Apply comprehensions to real-world data engineering scenarios
- Understand when to use comprehensions vs traditional loops

## 1. Basic List Comprehensions

In [None]:
from typing import List, Dict, Any, Set

# Basic syntax: [expression for item in iterable]

# Traditional loop approach
numbers: List[int] = [1, 2, 3, 4, 5]
squares_traditional: List[int] = []
for num in numbers:
    squares_traditional.append(num ** 2)

# List comprehension approach
squares_comprehension: List[int] = [num ** 2 for num in numbers]

print(f"Original numbers: {numbers}")
print(f"Squares (traditional): {squares_traditional}")
print(f"Squares (comprehension): {squares_comprehension}")

# More examples of basic transformations
temperatures_celsius: List[float] = [0, 10, 20, 25, 30, 35]

# Convert to Fahrenheit
temperatures_fahrenheit: List[float] = [(temp * 9/5) + 32 for temp in temperatures_celsius]
print(f"\nCelsius: {temperatures_celsius}")
print(f"Fahrenheit: {temperatures_fahrenheit}")

# String transformations
names: List[str] = ["alice", "bob", "charlie", "diana"]
capitalized_names: List[str] = [name.capitalize() for name in names]
name_lengths: List[int] = [len(name) for name in names]

print(f"\nOriginal names: {names}")
print(f"Capitalized: {capitalized_names}")
print(f"Name lengths: {name_lengths}")

# Working with ranges
even_numbers: List[int] = [num for num in range(0, 21, 2)]
cubes: List[int] = [i ** 3 for i in range(1, 6)]

print(f"\nEven numbers (0-20): {even_numbers}")
print(f"Cubes (1-5): {cubes}")

## 2. List Comprehensions with Filtering

In [None]:
# Syntax: [expression for item in iterable if condition]

# Filter positive numbers and square them
mixed_numbers: List[int] = [-5, -2, 0, 3, 7, -1, 8, 12]
positive_squares: List[int] = [num ** 2 for num in mixed_numbers if num > 0]

print(f"Mixed numbers: {mixed_numbers}")
print(f"Positive squares: {positive_squares}")

# Filter and transform strings
words: List[str] = ["apple", "banana", "cherry", "date", "elderberry", "fig"]
long_words_upper: List[str] = [word.upper() for word in words if len(word) > 5]
short_words: List[str] = [word for word in words if len(word) <= 4]

print(f"\nAll words: {words}")
print(f"Long words (uppercase): {long_words_upper}")
print(f"Short words: {short_words}")

# Working with data records
sales_data: List[Dict[str, Any]] = [
    {"product": "Laptop", "price": 999.99, "quantity": 2},
    {"product": "Mouse", "price": 29.99, "quantity": 10},
    {"product": "Keyboard", "price": 79.99, "quantity": 5},
    {"product": "Monitor", "price": 299.99, "quantity": 3},
    {"product": "Webcam", "price": 89.99, "quantity": 8}
]

# Extract product names for high-value items (price > 100)
expensive_products: List[str] = [item["product"] for item in sales_data if item["price"] > 100]

# Calculate total value for items with quantity > 5
high_quantity_totals: List[float] = [
    item["price"] * item["quantity"] 
    for item in sales_data 
    if item["quantity"] > 5
]

print(f"\nExpensive products: {expensive_products}")
print(f"High quantity totals: {high_quantity_totals}")

# Multiple conditions
mid_range_products: List[str] = [
    item["product"] 
    for item in sales_data 
    if 50 <= item["price"] <= 200 and item["quantity"] >= 3
]

print(f"Mid-range products (price 50-200, qty >= 3): {mid_range_products}")
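
A trailing `if` and a conditional expression are easy to confuse, so the following minimal sketch contrasts the two. The `readings` list and the result names are hypothetical and chosen only for illustration.

```python
from typing import List

readings: List[int] = [-5, -2, 0, 3, 7]  # hypothetical sample values

# A trailing `if` filters: items that fail the test are dropped.
positives: List[int] = [n for n in readings if n > 0]

# A conditional expression transforms: every item is kept, some are replaced.
clamped: List[int] = [n if n > 0 else 0 for n in readings]

print(positives)  # [3, 7]
print(clamped)    # [0, 0, 0, 3, 7]
```

The filtered list can be shorter than the input, while the transformed list always has the same length.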

## 3. Dictionary Comprehensions

In [None]:
# Basic syntax: {key_expression: value_expression for item in iterable}

# Create a dictionary of squares
numbers: List[int] = [1, 2, 3, 4, 5]
squares_dict: Dict[int, int] = {num: num ** 2 for num in numbers}

print(f"Numbers to squares: {squares_dict}")

# Create dictionary from two lists
products: List[str] = ["Laptop", "Mouse", "Keyboard", "Monitor"]
prices: List[float] = [999.99, 29.99, 79.99, 299.99]

product_prices: Dict[str, float] = {product: price for product, price in zip(products, prices)}
print(f"\nProduct prices: {product_prices}")

# Transform existing dictionary
original_scores: Dict[str, int] = {"Alice": 85, "Bob": 92, "Charlie": 78, "Diana": 96}

# Convert scores to letter grades
def get_letter_grade(score: int) -> str:
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"

letter_grades: Dict[str, str] = {name: get_letter_grade(score) for name, score in original_scores.items()}
print(f"\nOriginal scores: {original_scores}")
print(f"Letter grades: {letter_grades}")

# Dictionary comprehension with filtering
high_performers: Dict[str, int] = {name: score for name, score in original_scores.items() if score >= 90}
print(f"High performers (>= 90): {high_performers}")

# Invert dictionary (swap keys and values)
# Note: this assumes the values are unique; duplicate values would overwrite each other
score_to_name: Dict[int, str] = {score: name for name, score in original_scores.items()}
print(f"Score to name mapping: {score_to_name}")

# Working with string data
employee_data: List[str] = ["Alice:Engineering:75000", "Bob:Marketing:65000", "Charlie:Sales:70000"]

# Parse and create dictionary
employee_salaries: Dict[str, int] = {
    parts[0]: int(parts[2]) 
    for line in employee_data 
    for parts in [line.split(":")]
}

print(f"\nEmployee salaries: {employee_salaries}")
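
Inverting a dictionary only works cleanly when its values are unique. The sketch below uses a hypothetical `scores` dictionary with a repeated value to show the overwrite, then one possible way to group names per score instead.

```python
from typing import Dict, List

# Hypothetical scores in which two people share the same value.
scores: Dict[str, int] = {"Alice": 85, "Bob": 92, "Charlie": 85}

# Naive inversion: a later entry overwrites an earlier one with the same score.
inverted: Dict[int, str] = {score: name for name, score in scores.items()}
print(inverted)  # {85: 'Charlie', 92: 'Bob'}

# Grouping keeps every name by collecting them into a list per score.
grouped: Dict[int, List[str]] = {
    score: [name for name, s in scores.items() if s == score]
    for score in sorted(set(scores.values()))
}
print(grouped)  # {85: ['Alice', 'Charlie'], 92: ['Bob']}
```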

## 4. Nested Comprehensions

In [None]:
# Nested list comprehensions for 2D data

# Create a multiplication table
multiplication_table: List[List[int]] = [[i * j for j in range(1, 6)] for i in range(1, 6)]

print("Multiplication table (1-5):")
for row in multiplication_table:
    print(row)

# Flatten a 2D list
matrix: List[List[int]] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened: List[int] = [num for row in matrix for num in row]

print(f"\nOriginal matrix: {matrix}")
print(f"Flattened: {flattened}")

# Process nested data structures
departments: List[Dict[str, Any]] = [
    {
        "name": "Engineering",
        "employees": [
            {"name": "Alice", "salary": 75000},
            {"name": "Bob", "salary": 80000},
            {"name": "Charlie", "salary": 85000}
        ]
    },
    {
        "name": "Marketing",
        "employees": [
            {"name": "Diana", "salary": 65000},
            {"name": "Eve", "salary": 70000}
        ]
    }
]

# Extract all employee names across departments
all_employee_names: List[str] = [
    employee["name"] 
    for dept in departments 
    for employee in dept["employees"]
]

# Extract high-salary employees across departments
high_salary_employees: List[str] = [
    employee["name"] 
    for dept in departments 
    for employee in dept["employees"] 
    if employee["salary"] > 70000
]

print(f"\nAll employees: {all_employee_names}")
print(f"High salary employees (>70k): {high_salary_employees}")

# Create nested dictionary from data
dept_salary_summary: Dict[str, Dict[str, float]] = {
    dept["name"]: {
        "total_salary": sum(emp["salary"] for emp in dept["employees"]),
        "avg_salary": sum(emp["salary"] for emp in dept["employees"]) / len(dept["employees"]),
        "employee_count": len(dept["employees"])
    }
    for dept in departments
}

print(f"\nDepartment salary summary: {dept_salary_summary}")
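
The order of the `for` clauses in a flattening comprehension matches the order of the equivalent nested loops, outermost first. A small sketch with a hypothetical `grid` makes the correspondence explicit.

```python
from typing import List

grid: List[List[int]] = [[1, 2], [3, 4], [5, 6]]  # hypothetical sample data

# Comprehension form: clauses read left to right, outer loop first.
flat_comp: List[int] = [value for row in grid for value in row if value % 2 == 0]

# Equivalent explicit loops, written in the same clause order.
flat_loop: List[int] = []
for row in grid:
    for value in row:
        if value % 2 == 0:
            flat_loop.append(value)

print(flat_comp)  # [2, 4, 6]
print(flat_loop)  # [2, 4, 6]
```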

## 5. Set Comprehensions

In [None]:
# Set comprehensions: {expression for item in iterable}
# Useful for getting unique values

# Get unique word lengths
words: List[str] = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
unique_lengths: Set[int] = {len(word) for word in words}

print(f"Words: {words}")
print(f"Unique word lengths: {sorted(unique_lengths)}")

# Get unique categories from sales data
sales_records: List[Dict[str, Any]] = [
    {"product": "Laptop", "category": "Electronics", "region": "North"},
    {"product": "Mouse", "category": "Electronics", "region": "South"},
    {"product": "Desk", "category": "Furniture", "region": "North"},
    {"product": "Chair", "category": "Furniture", "region": "East"},
    {"product": "Monitor", "category": "Electronics", "region": "West"},
    {"product": "Table", "category": "Furniture", "region": "South"}
]

unique_categories: Set[str] = {record["category"] for record in sales_records}
unique_regions: Set[str] = {record["region"] for record in sales_records}

print(f"\nUnique categories: {sorted(unique_categories)}")
print(f"Unique regions: {sorted(unique_regions)}")

# Get unique first letters of product names
first_letters: Set[str] = {record["product"][0].lower() for record in sales_records}
print(f"Unique first letters: {sorted(first_letters)}")

# Filter and get unique values
electronics_regions: Set[str] = {
    record["region"] 
    for record in sales_records 
    if record["category"] == "Electronics"
}

print(f"Regions selling electronics: {sorted(electronics_regions)}")
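
Sets do not keep insertion order, which is why the examples above sort before printing. When first-seen order matters, one option is a dictionary comprehension, since dictionary keys are unique and ordered (Python 3.7+). The `regions` list below repeats the region values from the sales records; the result names are illustrative.

```python
from typing import List, Set

regions: List[str] = ["North", "South", "North", "East", "West", "South"]

# A set comprehension removes duplicates but gives no ordering guarantee.
unique_unordered: Set[str] = {region for region in regions}

# Dictionary keys are unique and keep insertion order,
# so this de-duplicates while preserving first-seen order.
unique_ordered: List[str] = list({region: None for region in regions})

print(sorted(unique_unordered))  # ['East', 'North', 'South', 'West']
print(unique_ordered)            # ['North', 'South', 'East', 'West']
```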

## 6. Practical Data Engineering Examples

In [None]:
# Example 1: Processing log data
log_entries: List[str] = [
    "2024-01-15 10:30:00 INFO User login: user123",
    "2024-01-15 10:31:00 ERROR Database connection failed",
    "2024-01-15 10:32:00 INFO User logout: user123",
    "2024-01-15 10:33:00 WARNING High memory usage detected",
    "2024-01-15 10:34:00 ERROR API timeout occurred",
    "2024-01-15 10:35:00 INFO User login: user456"
]

# Parse log entries into structured data
parsed_logs: List[Dict[str, str]] = [
    {
        "timestamp": parts[0] + " " + parts[1],
        "level": parts[2],
        "message": " ".join(parts[3:])
    }
    for entry in log_entries
    for parts in [entry.split(" ", 3)]
]

print("Parsed log entries:")
for log in parsed_logs[:2]:  # Show first 2
    print(f"  {log}")

# Extract error messages only
error_messages: List[str] = [
    log["message"] 
    for log in parsed_logs 
    if log["level"] == "ERROR"
]

print(f"\nError messages: {error_messages}")

# Count log levels
from collections import Counter
level_counts: Dict[str, int] = dict(Counter(log["level"] for log in parsed_logs))
print(f"Log level counts: {level_counts}")

# Example 2: Data validation and cleaning
raw_customer_data: List[str] = [
    "John Doe,30,john@email.com,New York",
    "Jane Smith,25,jane@email.com,",  # Missing city
    "Bob Johnson,,bob@email.com,Chicago",  # Missing age
    "Alice Brown,35,alice@email.com,Boston",
    "Charlie Wilson,28,invalid-email,Seattle",  # Invalid email
    "Diana Davis,32,diana@email.com,Miami"
]

# Parse and validate customer data
# (simplified check for demonstration, not a full email validator)
def is_valid_email(email: str) -> bool:
    return "@" in email and "." in email

valid_customers: List[Dict[str, str]] = [
    {
        "name": parts[0],
        "age": parts[1],
        "email": parts[2],
        "city": parts[3]
    }
    for line in raw_customer_data
    for parts in [line.split(",")]
    if len(parts) == 4 and all(parts) and is_valid_email(parts[2])
]

print(f"\nValid customers: {len(valid_customers)} out of {len(raw_customer_data)}")
for customer in valid_customers:
    print(f"  {customer['name']} ({customer['age']}) - {customer['city']}")

# Example 3: Data aggregation
transaction_data: List[Dict[str, Any]] = [
    {"customer_id": "C001", "product": "Laptop", "amount": 999.99, "date": "2024-01-15"},
    {"customer_id": "C002", "product": "Mouse", "amount": 29.99, "date": "2024-01-15"},
    {"customer_id": "C001", "product": "Keyboard", "amount": 79.99, "date": "2024-01-16"},
    {"customer_id": "C003", "product": "Monitor", "amount": 299.99, "date": "2024-01-16"},
    {"customer_id": "C002", "product": "Webcam", "amount": 89.99, "date": "2024-01-17"}
]

# Calculate total spending per customer
# (sorted() gives a deterministic key order; set iteration order is not guaranteed)
customer_totals: Dict[str, float] = {
    customer_id: sum(t["amount"] for t in transaction_data if t["customer_id"] == customer_id)
    for customer_id in sorted({t["customer_id"] for t in transaction_data})
}

print(f"\nCustomer spending totals: {customer_totals}")

# Get high-value transactions (> $100)
high_value_transactions: List[Dict[str, Any]] = [
    {"customer": t["customer_id"], "product": t["product"], "amount": t["amount"]}
    for t in transaction_data
    if t["amount"] > 100
]

print(f"High-value transactions: {high_value_transactions}")

# Group transactions by date
transactions_by_date: Dict[str, List[str]] = {
    date: [t["product"] for t in transaction_data if t["date"] == date]
    for date in sorted({t["date"] for t in transaction_data})  # sorted for a stable, chronological order
}

print(f"\nTransactions by date: {transactions_by_date}")
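
The per-customer and per-date aggregations above re-scan the full list once for every distinct key. For larger datasets, a single pass with `collections.defaultdict` may be preferable. The sketch below uses a shortened, hypothetical set of records, and the names `transactions`, `totals`, and `by_date` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List

# A shortened, hypothetical set of transaction records.
transactions: List[Dict[str, Any]] = [
    {"customer_id": "C001", "amount": 100.0, "date": "2024-01-15"},
    {"customer_id": "C002", "amount": 25.0, "date": "2024-01-15"},
    {"customer_id": "C001", "amount": 50.0, "date": "2024-01-16"},
]

totals: DefaultDict[str, float] = defaultdict(float)
by_date: DefaultDict[str, List[str]] = defaultdict(list)

# One pass over the data updates both aggregates.
for t in transactions:
    totals[t["customer_id"]] += t["amount"]
    by_date[t["date"]].append(t["customer_id"])

print(dict(totals))   # {'C001': 150.0, 'C002': 25.0}
print(dict(by_date))  # {'2024-01-15': ['C001', 'C002'], '2024-01-16': ['C001']}
```

This is a case where a plain loop is arguably clearer than a comprehension, because each iteration updates two structures.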

## 7. Performance Considerations

In [None]:
import time

# Compare performance: comprehension vs traditional loop
large_numbers: List[int] = list(range(100000))

# Traditional loop approach
# perf_counter() is a monotonic, high-resolution clock intended for timing code
start_time = time.perf_counter()
squares_loop: List[int] = []
for num in large_numbers:
    squares_loop.append(num ** 2)
loop_time = time.perf_counter() - start_time

# List comprehension approach
start_time = time.perf_counter()
squares_comp: List[int] = [num ** 2 for num in large_numbers]
comp_time = time.perf_counter() - start_time

print(f"Traditional loop time: {loop_time:.4f} seconds")
print(f"List comprehension time: {comp_time:.4f} seconds")
# A single run is noisy, so treat this ratio as a rough indication only
print(f"Loop time / comprehension time: {loop_time/comp_time:.2f}x")

# Memory efficiency example
# Generator expression (memory efficient for large datasets)
squares_generator = (num ** 2 for num in range(1000000))  # Note: parentheses instead of brackets

# Items are computed lazily as next() is called, so the full sequence is never stored in memory
first_10_squares = [next(squares_generator) for _ in range(10)]
print(f"\nFirst 10 squares from generator: {first_10_squares}")

# When to use comprehensions vs loops
print("\nWhen to use comprehensions:")
print("✓ Simple transformations and filtering")
print("✓ Creating new data structures from existing ones")
print("✓ When readability is maintained")
print("\nWhen to use traditional loops:")
print("✓ Complex logic that would make comprehensions hard to read")
print("✓ When you need to perform multiple operations")
print("✓ When you need error handling within the loop")
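
A single timed run can be dominated by noise. As a sketch of a more repeatable measurement, the standard library's `timeit.repeat` can run each variant several times, and `sys.getsizeof` gives a rough view of the memory difference between a list and a generator. The function names below are hypothetical, and actual timings will vary by machine and Python version.

```python
import sys
import timeit
from typing import List


def squares_with_loop(n: int) -> List[int]:
    result: List[int] = []
    for num in range(n):
        result.append(num ** 2)
    return result


def squares_with_comprehension(n: int) -> List[int]:
    return [num ** 2 for num in range(n)]


# Take the best of several repeats to reduce noise from other processes.
loop_best = min(timeit.repeat(lambda: squares_with_loop(10_000), number=20, repeat=3))
comp_best = min(timeit.repeat(lambda: squares_with_comprehension(10_000), number=20, repeat=3))
print(f"loop: {loop_best:.4f}s, comprehension: {comp_best:.4f}s")

# getsizeof() reports the container itself, not the integers it references.
# The generator object stays small no matter how many items it will yield.
as_list = [num ** 2 for num in range(10_000)]
as_generator = (num ** 2 for num in range(10_000))
print(sys.getsizeof(as_list), sys.getsizeof(as_generator))
```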

## Practice Exercises

Complete the following exercises to reinforce your understanding:

In [None]:
# Exercise 1: Data transformation with filtering
sensor_readings: List[str] = [
    "23.5", "24.1", "invalid", "22.8", "25.0", "error", "23.9", "24.3", "22.5"
]

# TODO: Create a list comprehension that:
# 1. Filters out invalid readings (non-numeric strings)
# 2. Converts valid readings to float
# 3. Only includes readings between 22.0 and 25.0 (inclusive)

# Hint: s.replace('.', '', 1).isdigit() checks whether s is a non-negative decimal number
# valid_readings = [your comprehension here]
# print(f"Valid readings: {valid_readings}")
# Expected: [23.5, 24.1, 22.8, 25.0, 23.9, 24.3, 22.5]
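
String checks built on `isdigit()` have blind spots: stripping every dot accepts text such as `1.2.3`, and any digit-only test rejects negative values. One alternative, sketched here with a hypothetical `is_float` helper and made-up samples, is to let `float()` decide. Note that `float()` also accepts strings such as `nan` and `inf`, which may or may not be wanted.

```python
from typing import List


def is_float(text: str) -> bool:
    """Return True if float() can parse the string (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


samples: List[str] = ["23.5", "-4.0", "1.2.3", "invalid", ""]

# Digit-based check: remove every dot, then require digits only.
digit_check: List[bool] = [s.replace(".", "").isdigit() for s in samples]

# float()-based check.
float_check: List[bool] = [is_float(s) for s in samples]

print(digit_check)  # [True, False, True, False, False]
print(float_check)  # [True, True, False, False, False]
```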

In [None]:
# Exercise 2: Dictionary comprehension for data grouping
employee_records: List[Dict[str, Any]] = [
    {"name": "Alice", "department": "Engineering", "salary": 75000},
    {"name": "Bob", "department": "Marketing", "salary": 65000},
    {"name": "Charlie", "department": "Engineering", "salary": 80000},
    {"name": "Diana", "department": "Sales", "salary": 70000},
    {"name": "Eve", "department": "Marketing", "salary": 68000}
]

# TODO: Create a dictionary comprehension that maps department names to average salaries
# Hint: You'll need to get unique departments first, then calculate averages

# Step 1: Get unique departments
# departments = {record["department"] for record in employee_records}

# Step 2: Create dictionary with average salaries
# dept_avg_salaries = {your comprehension here}
# print(f"Department average salaries: {dept_avg_salaries}")
# Expected: {'Engineering': 77500.0, 'Marketing': 66500.0, 'Sales': 70000.0}

In [None]:
# Exercise 3: Nested comprehension for matrix operations
matrix_a: List[List[int]] = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

matrix_b: List[List[int]] = [
    [9, 8, 7],
    [6, 5, 4],
    [3, 2, 1]
]

# TODO: Create a nested list comprehension that adds corresponding elements
# from matrix_a and matrix_b

# matrix_sum = [your nested comprehension here]
# print(f"Matrix sum: {matrix_sum}")
# Expected: [[10, 10, 10], [10, 10, 10], [10, 10, 10]]

# TODO: Create a comprehension that transposes matrix_a
# (swap rows and columns)

# transposed = [your comprehension here]
# print(f"Transposed matrix: {transposed}")
# Expected: [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

In [None]:
# Exercise 4: Complex data processing
order_data: List[str] = [
    "ORD001,CUST001,Laptop,2,999.99,0.10",
    "ORD002,CUST002,Mouse,5,29.99,0.00",
    "ORD003,CUST001,Keyboard,1,79.99,0.05",
    "ORD004,CUST003,Monitor,1,299.99,0.15",
    "ORD005,CUST002,Webcam,2,89.99,0.00"
]

# Data format: order_id,customer_id,product,quantity,unit_price,discount_rate

# TODO: Create a list comprehension that:
# 1. Parses each order string
# 2. Calculates the total amount (quantity * unit_price * (1 - discount_rate))
# 3. Only includes orders with total > $100
# 4. Returns a dictionary with order_id, customer_id, and total

# high_value_orders = [your comprehension here]
# print(f"High value orders: {high_value_orders}")

# TODO: Create a dictionary comprehension that maps customer_id to total spending
# across all their orders

# customer_spending = {your comprehension here}
# print(f"Customer spending: {customer_spending}")

## Summary

In this notebook, you learned:

1. **List Comprehensions**: Concise syntax for creating lists with transformations and filtering
2. **Dictionary Comprehensions**: Efficient way to create dictionaries from iterables
3. **Set Comprehensions**: Getting unique values with transformations
4. **Nested Comprehensions**: Handling complex data structures and matrix operations
5. **Filtering**: Using conditions to select specific elements
6. **Performance**: Comprehensions are often somewhat faster than an equivalent `append` loop, though the difference depends on the workload and Python version
7. **Real-world Applications**: Log processing, data validation, and aggregation

**Key Benefits of Comprehensions:**
- More concise and readable code
- Often faster than an equivalent `append` loop
- Functional programming style
- Generator expressions are memory efficient because they produce items lazily (list comprehensions still build the full list)

**When to Use Comprehensions:**
- ✅ Simple transformations and filtering
- ✅ Creating new data structures from existing ones
- ✅ When the logic fits in one short, readable expression

**When to Use Traditional Loops:**
- ✅ Complex logic that would make a comprehension hard to read
- ✅ Multiple operations or side effects needed
- ✅ Error handling required within the iteration
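
As an illustration of the error-handling case, the sketch below parses a hypothetical `raw_ages` list and records which entries failed. Collecting two outputs with a `try`/`except` per item is awkward in a comprehension but straightforward in a loop.

```python
from typing import List, Tuple

raw_ages: List[str] = ["30", "25", "", "thirty", "41"]  # hypothetical input

ages: List[int] = []
rejected: List[Tuple[int, str]] = []  # (position, original text)

for position, text in enumerate(raw_ages):
    try:
        ages.append(int(text))
    except ValueError:
        # Record the failure instead of silently dropping the entry.
        rejected.append((position, text))

print(ages)      # [30, 25, 41]
print(rejected)  # [(2, ''), (3, 'thirty')]
```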

**Next Steps**: Practice the exercises above and move on to Zip and Unpacking to learn about combining and separating data structures.