# Module 2: Data Structures

## Topics Covered
1. Lists - Creation, Indexing, Slicing
2. List Methods and Operations
3. Tuples and When to Use Them
4. Dictionaries - Key-Value Pairs
5. Dictionary Methods and Nested Dictionaries
6. Sets and Set Operations
7. Choosing the Right Data Structure

## Learning Objectives

By the end of this module, you will be able to:
- Create and manipulate lists to store ordered collections of data
- Use list methods to add, remove, and transform data
- Understand when to use tuples for immutable sequences
- Work with dictionaries to store key-value pairs
- Use sets for unique collections and set operations
- Choose the appropriate data structure for different scenarios

---
# Section 1: Lists - Creation, Indexing, Slicing
---

## What is a List?

A **list** is an ordered, mutable collection of items. Think of it like a shopping list or a spreadsheet column – you can add items, remove them, and change their order.

Lists are defined using square brackets `[]` and can hold any type of data, including mixed types.

### Why This Matters in Data Science

Lists are fundamental for:
- Storing rows of data before loading into pandas
- Collecting results from loops and iterations
- Holding column names, file paths, or configuration values
- Building datasets programmatically

## Syntax

```python
# Creating a list
my_list = [item1, item2, item3]

# Empty list
empty_list = []

# Accessing items by index
first_item = my_list[0]      # First item (index 0)
last_item = my_list[-1]      # Last item

# Slicing
subset = my_list[start:end]  # Items from start to end-1
```
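
Indexing and slicing treat out-of-range positions differently, which the syntax above does not show. The sketch below uses a made-up `letters` list (not one of the module's datasets) to illustrate the difference: a slice is quietly clamped to the list's bounds, while a single index past the end raises `IndexError`.

```python
# Illustrative data, not from the module's datasets
letters = ["a", "b", "c"]

# A slice that runs past the end is clamped to the list's length
print(letters[1:10])   # ['b', 'c']
print(letters[5:])     # []

# A single out-of-range index raises IndexError instead
try:
    letters[5]
except IndexError as err:
    print(f"IndexError: {err}")
```

This is why slicing is a safe way to take "up to N items" from a list that may be shorter than N.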

In [None]:
# Example: Creating lists

# List of numbers (like a column of sales data)
monthly_sales = [12500, 15000, 18200, 14800, 22000, 19500]
print(f"Monthly sales: {monthly_sales}")

# List of strings (like product names)
products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"]
print(f"Products: {products}")

# Mixed types (less common, but possible)
customer_record = ["John Smith", 35, "New York", True, 75000.50]
print(f"Customer record: {customer_record}")

# Empty list (useful for collecting data)
results = []
print(f"Empty results list: {results}")

In [None]:
# Example: List indexing

products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"]

# Positive indexing (from the start)
#             0         1          2          3           4
print(f"First product (index 0): {products[0]}")
print(f"Third product (index 2): {products[2]}")

# Negative indexing (from the end)
#            -5        -4         -3         -2          -1
print(f"Last product (index -1): {products[-1]}")
print(f"Second to last (index -2): {products[-2]}")

In [None]:
# Example: List slicing

monthly_sales = [12500, 15000, 18200, 14800, 22000, 19500]
#   Index:          0       1       2       3       4       5
#   Month:         Jan     Feb     Mar     Apr     May     Jun

# Basic slicing [start:end] - end is exclusive
q1_sales = monthly_sales[0:3]  # Jan, Feb, Mar
print(f"Q1 Sales: {q1_sales}")

q2_sales = monthly_sales[3:6]  # Apr, May, Jun
print(f"Q2 Sales: {q2_sales}")

# Omitting start or end
first_three = monthly_sales[:3]   # From the beginning up to (not including) index 3
last_three = monthly_sales[3:]    # From index 3 to end
print(f"First three: {first_three}")
print(f"Last three: {last_three}")

# Negative slicing
last_two = monthly_sales[-2:]     # Last 2 items
print(f"Last two months: {last_two}")

In [None]:
# Example: Slicing with step

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# [start:end:step]
every_second = numbers[::2]    # Every 2nd item
print(f"Every second: {every_second}")

every_third = numbers[::3]     # Every 3rd item
print(f"Every third: {every_third}")

# Reverse a list
reversed_list = numbers[::-1]
print(f"Reversed: {reversed_list}")

In [None]:
# Example: Modifying list items

prices = [29.99, 49.99, 19.99, 99.99]
print(f"Original prices: {prices}")

# Change a single item
prices[1] = 44.99  # Update second item
print(f"After updating index 1: {prices}")

# Change multiple items with slicing
prices[0:2] = [24.99, 39.99]  # Update first two items
print(f"After updating slice: {prices}")

## List Length and Membership

In [None]:
# Example: len() and membership testing

products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"]

# Get the number of items
num_products = len(products)
print(f"Number of products: {num_products}")

# Check if item exists in list
print(f"Is 'Mouse' in products? {'Mouse' in products}")
print(f"Is 'Tablet' in products? {'Tablet' in products}")

# Check if item is NOT in list
print(f"Is 'Tablet' NOT in products? {'Tablet' not in products}")

## Practice Exercise 1.1

**Task:** Create a list of daily temperatures for a week (7 values). Then:
1. Print the temperature for Wednesday (index 2)
2. Print the weekend temperatures (last 2 days)
3. Print the weekday temperatures (first 5 days)
4. Check if 75 degrees appears in your data

**Expected Output:** (values will vary based on your data)
```
Wednesday temp: 72
Weekend temps: [68, 70]
Weekday temps: [70, 73, 72, 75, 71]
75 in temperatures: True
```

In [None]:
# Your code here


In [None]:
# Solution 1.1

# Daily temperatures (Mon-Sun)
temperatures = [70, 73, 72, 75, 71, 68, 70]

# Wednesday temperature (index 2)
print(f"Wednesday temp: {temperatures[2]}")

# Weekend temperatures (Saturday and Sunday)
print(f"Weekend temps: {temperatures[-2:]}")

# Weekday temperatures (Monday through Friday)
print(f"Weekday temps: {temperatures[:5]}")

# Check if 75 appears
print(f"75 in temperatures: {75 in temperatures}")

## Practice Exercise 1.2

**Task:** You have quarterly revenue data for 2 years:
```python
revenue = [45000, 52000, 48000, 61000, 55000, 63000, 58000, 72000]
```
Each value represents a quarter (Q1 2023, Q2 2023, ..., Q4 2024).

Extract and print:
1. Year 1 (2023) revenue - first 4 values
2. Year 2 (2024) revenue - last 4 values
3. All Q4 values (index 3 and 7)
4. Total number of quarters

**Expected Output:**
```
Year 1 Revenue: [45000, 52000, 48000, 61000]
Year 2 Revenue: [55000, 63000, 58000, 72000]
Q4 Values: 61000, 72000
Total Quarters: 8
```

In [None]:
# Your code here


In [None]:
# Solution 1.2

revenue = [45000, 52000, 48000, 61000, 55000, 63000, 58000, 72000]

# Year 1 (first 4 quarters)
year1_revenue = revenue[:4]
print(f"Year 1 Revenue: {year1_revenue}")

# Year 2 (last 4 quarters)
year2_revenue = revenue[4:]
print(f"Year 2 Revenue: {year2_revenue}")

# Q4 values (indices 3 and 7)
q4_2023 = revenue[3]
q4_2024 = revenue[7]
print(f"Q4 Values: {q4_2023}, {q4_2024}")

# Total quarters
print(f"Total Quarters: {len(revenue)}")

---
# Section 2: List Methods and Operations
---

## Adding Items to Lists

Python provides several methods to add items to a list:

| Method | Description | Example |
|--------|-------------|--------|
| `append()` | Add item to end | `my_list.append(item)` |
| `insert()` | Add item at position | `my_list.insert(index, item)` |
| `extend()` | Add multiple items | `my_list.extend([item1, item2])` |
| `+` | Concatenate lists (returns a new list) | `list1 + list2` |
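
A common source of confusion in the table above is the difference between `append()` and `extend()` when the argument is itself a list. The following sketch uses hypothetical values (`batch`, `appended`, `extended`) to show both behaviours side by side.

```python
batch = [4, 5]  # hypothetical new values

appended = [1, 2, 3]
appended.append(batch)    # the whole list becomes ONE new item
print(appended)           # [1, 2, 3, [4, 5]]

extended = [1, 2, 3]
extended.extend(batch)    # each element is added individually
print(extended)           # [1, 2, 3, 4, 5]
```

As a rule of thumb, use `append()` for one item and `extend()` when you want the items of another collection merged in.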

In [None]:
# Example: Adding items to lists

# Start with a list of customers
customers = ["Alice", "Bob", "Charlie"]
print(f"Original: {customers}")

# append() - add to the end
customers.append("Diana")
print(f"After append: {customers}")

# insert() - add at specific position
customers.insert(1, "Eve")  # Insert at index 1
print(f"After insert at 1: {customers}")

# extend() - add multiple items
new_customers = ["Frank", "Grace"]
customers.extend(new_customers)
print(f"After extend: {customers}")

In [None]:
# Example: Concatenating lists with +

q1_sales = [15000, 18000, 22000]
q2_sales = [19000, 21000, 25000]

# Combine into half-year data
h1_sales = q1_sales + q2_sales
print(f"H1 Sales: {h1_sales}")

# Note: Original lists are unchanged
print(f"Q1 still: {q1_sales}")

## Removing Items from Lists

| Method | Description | Example |
|--------|-------------|--------|
| `remove()` | Remove first matching value | `my_list.remove(value)` |
| `pop()` | Remove by index (returns item) | `my_list.pop(index)` |
| `del` | Delete by index/slice | `del my_list[index]` |
| `clear()` | Remove all items | `my_list.clear()` |
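
One detail the table does not spell out: `remove()` raises a `ValueError` when the value is not in the list. A minimal sketch of guarding against that with a membership check follows; the `cart` data and `item` name are invented for illustration.

```python
cart = ["Laptop", "Mouse"]  # hypothetical data

item = "Tablet"
if item in cart:
    cart.remove(item)
else:
    print(f"{item!r} not in cart; nothing removed")

print(cart)   # ['Laptop', 'Mouse']
```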

In [None]:
# Example: Removing items from lists

products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Mouse", "Headphones"]
print(f"Original: {products}")

# remove() - removes first occurrence of value
products.remove("Mouse")  # Only removes the first "Mouse"
print(f"After remove('Mouse'): {products}")

# pop() - removes and returns item at index
removed_item = products.pop(2)  # Remove item at index 2
print(f"Popped item: {removed_item}")
print(f"After pop(2): {products}")

# pop() without index removes last item
last_item = products.pop()
print(f"Popped last: {last_item}")
print(f"After pop(): {products}")

In [None]:
# Example: Using del and clear

numbers = [10, 20, 30, 40, 50, 60, 70]
print(f"Original: {numbers}")

# del - delete by index
del numbers[0]  # Delete first item
print(f"After del [0]: {numbers}")

# del - delete a slice
del numbers[1:3]  # Delete indices 1 and 2
print(f"After del [1:3]: {numbers}")

# clear() - remove all items
numbers.clear()
print(f"After clear(): {numbers}")

## Sorting and Reversing

In [None]:
# Example: Sorting lists

prices = [99.99, 29.99, 149.99, 49.99, 79.99]
print(f"Original: {prices}")

# sort() - modifies the list in place
prices.sort()
print(f"Sorted ascending: {prices}")

prices.sort(reverse=True)
print(f"Sorted descending: {prices}")

# sorted() - returns a new sorted list (original unchanged)
names = ["Charlie", "Alice", "Bob", "Diana"]
sorted_names = sorted(names)
print(f"Original names: {names}")
print(f"Sorted names: {sorted_names}")

In [None]:
# Example: Reversing lists

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
print(f"Original: {months}")

# reverse() - modifies in place
months.reverse()
print(f"After reverse(): {months}")

# reversed() - returns iterator (use list() to convert)
numbers = [1, 2, 3, 4, 5]
reversed_numbers = list(reversed(numbers))
print(f"Original numbers: {numbers}")
print(f"Reversed copy: {reversed_numbers}")
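
A frequent pitfall with the in-place methods above is assigning their result to a variable. `sort()` and `reverse()` change the list and return `None`, whereas `sorted()` returns a new list. The sketch below uses a made-up `scores` list to show the difference.

```python
scores = [88, 95, 72]  # hypothetical data

result = scores.sort()     # sorts in place and returns None
print(result)              # None
print(scores)              # [72, 88, 95]

ranked = sorted(scores, reverse=True)  # new list; scores is unchanged
print(ranked)              # [95, 88, 72]
print(scores)              # [72, 88, 95]
```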

## Finding Items and Counting

In [None]:
# Example: Finding and counting

sales_reps = ["Alice", "Bob", "Alice", "Charlie", "Alice", "Bob"]

# index() - find position of first occurrence
bob_index = sales_reps.index("Bob")
print(f"First 'Bob' at index: {bob_index}")

# count() - count occurrences
alice_count = sales_reps.count("Alice")
bob_count = sales_reps.count("Bob")
print(f"Alice appears: {alice_count} times")
print(f"Bob appears: {bob_count} times")

## Useful Built-in Functions for Lists

In [None]:
# Example: Built-in functions for numeric lists

sales = [45000, 52000, 48000, 61000, 55000, 63000]

print(f"Sales data: {sales}")
print(f"Number of entries: {len(sales)}")
print(f"Total: ${sum(sales):,}")
print(f"Minimum: ${min(sales):,}")
print(f"Maximum: ${max(sales):,}")
print(f"Average: ${sum(sales) / len(sales):,.2f}")

## Practice Exercise 2.1

**Task:** You're managing a to-do list for data projects:
1. Start with: `["Clean data", "Build model", "Create report"]`
2. Add "Gather requirements" at the beginning
3. Add "Present findings" at the end
4. Remove "Build model" (it's been reassigned)
5. Print the final list and its length

**Expected Output:**
```
Final to-do list: ['Gather requirements', 'Clean data', 'Create report', 'Present findings']
Number of tasks: 4
```

In [None]:
# Your code here


In [None]:
# Solution 2.1

# Start with initial list
todo_list = ["Clean data", "Build model", "Create report"]

# Add "Gather requirements" at the beginning
todo_list.insert(0, "Gather requirements")

# Add "Present findings" at the end
todo_list.append("Present findings")

# Remove "Build model"
todo_list.remove("Build model")

# Print results
print(f"Final to-do list: {todo_list}")
print(f"Number of tasks: {len(todo_list)}")

## Practice Exercise 2.2

**Task:** Analyze this list of daily website visitors:
```python
visitors = [1250, 1480, 1320, 1890, 1650, 1420, 1780]
```

Calculate and print:
1. Total visitors for the week
2. Average daily visitors
3. The highest single-day visitor count
4. The lowest single-day visitor count
5. The visitors list sorted from highest to lowest

**Expected Output:**
```
Total visitors: 10,790
Average daily: 1,541.43
Best day: 1,890 visitors
Worst day: 1,250 visitors
Sorted (high to low): [1890, 1780, 1650, 1480, 1420, 1320, 1250]
```

In [None]:
# Your code here


In [None]:
# Solution 2.2

visitors = [1250, 1480, 1320, 1890, 1650, 1420, 1780]

# Calculate metrics
total = sum(visitors)
average = total / len(visitors)
best_day = max(visitors)
worst_day = min(visitors)

# Sort (use sorted() to not modify original)
sorted_visitors = sorted(visitors, reverse=True)

# Print results
print(f"Total visitors: {total:,}")
print(f"Average daily: {average:,.2f}")
print(f"Best day: {best_day:,} visitors")
print(f"Worst day: {worst_day:,} visitors")
print(f"Sorted (high to low): {sorted_visitors}")

## Practice Exercise 2.3

**Task:** You're tracking which products a customer has viewed:
```python
viewed = ["Laptop", "Mouse", "Keyboard", "Laptop", "Monitor", "Mouse", "Laptop"]
```

Find:
1. How many total product views?
2. How many times was "Laptop" viewed?
3. What index is "Monitor" at?
4. Was "Headphones" viewed? (True/False)

**Expected Output:**
```
Total views: 7
Laptop views: 3
Monitor first seen at index: 4
Headphones viewed: False
```

In [None]:
# Your code here


In [None]:
# Solution 2.3

viewed = ["Laptop", "Mouse", "Keyboard", "Laptop", "Monitor", "Mouse", "Laptop"]

# Total views
print(f"Total views: {len(viewed)}")

# Laptop view count
print(f"Laptop views: {viewed.count('Laptop')}")

# Monitor index
print(f"Monitor first seen at index: {viewed.index('Monitor')}")

# Check for Headphones
print(f"Headphones viewed: {'Headphones' in viewed}")

---
# Section 3: Tuples and When to Use Them
---

## What is a Tuple?

A **tuple** is like a list, but **immutable** – once created, you cannot change its contents. Tuples are usually written with parentheses `()` instead of square brackets, although it is the commas, not the parentheses, that make a tuple.

Think of tuples as "read-only" lists – perfect for data that shouldn't change.

### Why This Matters in Data Science

Tuples are useful for:
- Storing fixed records (like database rows)
- Returning multiple values from functions
- Dictionary keys (lists can't be keys)
- Geographic coordinates (latitude, longitude)
- RGB color values
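
To illustrate the dictionary-key point from the list above, here is a small sketch that uses coordinate tuples as keys. The `city_by_location` mapping is a hypothetical example, and dictionaries themselves are covered in Section 4.

```python
# Hypothetical city coordinates used as dictionary keys
city_by_location = {
    (40.7128, -74.0060): "New York",
    (51.5074, -0.1278): "London",
}
print(city_by_location[(40.7128, -74.0060)])   # New York

# A list cannot be a key because it is mutable (unhashable)
try:
    bad = {[40.7128, -74.0060]: "New York"}
except TypeError as err:
    print(f"TypeError: {err}")
```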

## Syntax

```python
# Creating a tuple
my_tuple = (item1, item2, item3)

# Single item tuple (note the comma!)
single_tuple = (item,)

# Tuple without parentheses (tuple packing)
my_tuple = item1, item2, item3

# Accessing items (same as lists)
first = my_tuple[0]
```

In [None]:
# Example: Creating tuples

# Tuple of coordinates
location = (40.7128, -74.0060)  # NYC lat, long
print(f"Location: {location}")
print(f"Latitude: {location[0]}")
print(f"Longitude: {location[1]}")

# Tuple for a data record
employee = ("John Smith", "Engineering", 75000, True)
print(f"\nEmployee record: {employee}")

# Single-item tuple (comma required!)
single = (42,)
not_a_tuple = (42)  # This is just an integer!
print(f"\nSingle tuple: {single}, type: {type(single)}")
print(f"Not a tuple: {not_a_tuple}, type: {type(not_a_tuple)}")

In [None]:
# Example: Tuple immutability

coordinates = (10, 20, 30)
print(f"Coordinates: {coordinates}")

# This would cause an error:
# coordinates[0] = 15  # TypeError: 'tuple' object does not support item assignment

# You can access items, just can't change them
x = coordinates[0]
print(f"X coordinate: {x}")

In [None]:
# Example: Tuple unpacking

# Unpacking is very useful for working with structured data
employee = ("Alice", "Data Science", 85000)

# Unpack into separate variables
name, department, salary = employee
print(f"Name: {name}")
print(f"Department: {department}")
print(f"Salary: ${salary:,}")

# Swap variables easily
a = 10
b = 20
print(f"\nBefore swap: a={a}, b={b}")
a, b = b, a  # Tuple unpacking for swap
print(f"After swap: a={a}, b={b}")

In [None]:
# Example: Functions returning tuples

def calculate_stats(numbers):
    """Calculate and return multiple statistics."""
    total = sum(numbers)
    average = total / len(numbers)
    minimum = min(numbers)
    maximum = max(numbers)
    return total, average, minimum, maximum  # Returns a tuple

sales = [1500, 2200, 1800, 2500, 1900]

# Get all values at once
stats = calculate_stats(sales)
print(f"Stats tuple: {stats}")

# Or unpack directly
total, avg, min_val, max_val = calculate_stats(sales)
print(f"\nTotal: {total}")
print(f"Average: {avg}")
print(f"Min: {min_val}")
print(f"Max: {max_val}")

In [None]:
# Example: Lists vs Tuples - when to use each

# Use LISTS when data might change
shopping_cart = ["Laptop", "Mouse"]  # Items can be added/removed
shopping_cart.append("Keyboard")
print(f"Cart (list): {shopping_cart}")

# Use TUPLES for fixed data
rgb_red = (255, 0, 0)  # Color values shouldn't change
date_of_birth = (1990, 5, 15)  # Historical date won't change
db_connection = ("localhost", 5432, "mydb")  # Config that shouldn't change

print(f"Red color (tuple): {rgb_red}")
print(f"DOB (tuple): {date_of_birth}")

## Practice Exercise 3.1

**Task:** Create a tuple representing a data record for a sensor reading:
- Sensor ID: "TEMP-001"
- Timestamp: "2024-01-15 10:30:00"
- Temperature: 72.5
- Humidity: 45.2
- Is Valid: True

Then unpack it into individual variables and print each.

**Expected Output:**
```
Sensor: TEMP-001
Time: 2024-01-15 10:30:00
Temperature: 72.5°F
Humidity: 45.2%
Valid Reading: True
```

In [None]:
# Your code here


In [None]:
# Solution 3.1

# Create the sensor reading tuple
sensor_reading = ("TEMP-001", "2024-01-15 10:30:00", 72.5, 45.2, True)

# Unpack into variables
sensor_id, timestamp, temperature, humidity, is_valid = sensor_reading

# Print each value
print(f"Sensor: {sensor_id}")
print(f"Time: {timestamp}")
print(f"Temperature: {temperature}°F")
print(f"Humidity: {humidity}%")
print(f"Valid Reading: {is_valid}")

## Practice Exercise 3.2

**Task:** Create a function called `get_min_max` that takes a list of numbers and returns both the minimum and maximum as a tuple. Test it with sales data.

```python
sales = [4500, 6200, 3800, 7100, 5500]
```

**Expected Output:**
```
Sales range: (3800, 7100)
Minimum: $3,800
Maximum: $7,100
```

In [None]:
# Your code here


In [None]:
# Solution 3.2

def get_min_max(numbers):
    """Return the minimum and maximum values as a tuple."""
    return (min(numbers), max(numbers))

sales = [4500, 6200, 3800, 7100, 5500]

# Get the result
result = get_min_max(sales)
print(f"Sales range: {result}")

# Unpack for formatted output
min_sale, max_sale = get_min_max(sales)
print(f"Minimum: ${min_sale:,}")
print(f"Maximum: ${max_sale:,}")

---
# Section 4: Dictionaries - Key-Value Pairs
---

## What is a Dictionary?

A **dictionary** is a collection of key-value pairs. Instead of accessing items by position (index), you access them by their key (name).

Think of it like a real dictionary: you look up a word (key) to find its definition (value). Or like a contact list: you look up a name (key) to find a phone number (value).

### Why This Matters in Data Science

Dictionaries are essential for:
- Storing structured records (like JSON data)
- Counting occurrences of items
- Mapping categories to values
- Configuring parameters for functions
- Working with APIs (most return JSON, which Python parses into dictionaries and lists)

## Syntax

```python
# Creating a dictionary
my_dict = {
    "key1": value1,
    "key2": value2,
    "key3": value3
}

# Empty dictionary
empty_dict = {}

# Accessing values
value = my_dict["key1"]
value = my_dict.get("key1")  # Safer: returns None instead of raising KeyError if the key is missing

# Adding/updating
my_dict["new_key"] = new_value
```

In [None]:
# Example: Creating dictionaries

# Customer record
customer = {
    "name": "Alice Johnson",
    "email": "alice@example.com",
    "age": 32,
    "is_premium": True,
    "balance": 1250.50
}

print("Customer record:")
print(customer)

In [None]:
# Example: Accessing dictionary values

customer = {
    "name": "Alice Johnson",
    "email": "alice@example.com",
    "age": 32,
    "is_premium": True
}

# Using bracket notation
print(f"Name: {customer['name']}")
print(f"Email: {customer['email']}")

# Using .get() - safer for possibly missing keys
phone = customer.get("phone")  # Returns None if missing
print(f"Phone: {phone}")

# .get() with default value
phone = customer.get("phone", "Not provided")
print(f"Phone: {phone}")
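
The `.get()` default also covers the "counting occurrences" use case listed earlier. The sketch below is one simple way to do it, using an invented `responses` list; the standard library's `collections.Counter` is a common alternative.

```python
responses = ["yes", "no", "yes", "maybe", "yes", "no"]  # hypothetical data

counts = {}
for answer in responses:
    # .get() returns 0 the first time a key is seen
    counts[answer] = counts.get(answer, 0) + 1

print(counts)   # {'yes': 3, 'no': 2, 'maybe': 1}
```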

In [None]:
# Example: Adding and updating values

product = {
    "name": "Laptop",
    "price": 999.99
}
print(f"Original: {product}")

# Add new key-value pair
product["category"] = "Electronics"
product["in_stock"] = True
print(f"After adding: {product}")

# Update existing value
product["price"] = 899.99  # Price drop!
print(f"After update: {product}")

In [None]:
# Example: Removing items from dictionaries

inventory = {
    "laptops": 50,
    "mice": 150,
    "keyboards": 75,
    "monitors": 30
}
print(f"Original: {inventory}")

# pop() - remove and return value
mice_count = inventory.pop("mice")
print(f"Removed mice: {mice_count}")
print(f"After pop: {inventory}")

# del - delete key-value pair
del inventory["monitors"]
print(f"After del: {inventory}")

In [None]:
# Example: Checking if key exists

user = {
    "username": "data_analyst",
    "email": "analyst@company.com"
}

# Check with 'in'
print(f"Has 'email': {'email' in user}")
print(f"Has 'phone': {'phone' in user}")

# Conditional access
if "email" in user:
    print(f"Email: {user['email']}")
else:
    print("No email on file")

## Practice Exercise 4.1

**Task:** Create a dictionary for a movie with:
- title: "The Matrix"
- year: 1999
- rating: 8.7
- genres: ["Sci-Fi", "Action"] (a list!)

Then:
1. Print the title and year
2. Add a "director" key with value "Wachowskis"
3. Update the rating to 9.0
4. Print the first genre

**Expected Output:**
```
Title: The Matrix (1999)
Director added: Wachowskis
Updated rating: 9.0
First genre: Sci-Fi
```

In [None]:
# Your code here


In [None]:
# Solution 4.1

# Create the movie dictionary
movie = {
    "title": "The Matrix",
    "year": 1999,
    "rating": 8.7,
    "genres": ["Sci-Fi", "Action"]
}

# Print title and year
print(f"Title: {movie['title']} ({movie['year']})")

# Add director
movie["director"] = "Wachowskis"
print(f"Director added: {movie['director']}")

# Update rating
movie["rating"] = 9.0
print(f"Updated rating: {movie['rating']}")

# Access first genre (list inside dict)
print(f"First genre: {movie['genres'][0]}")

## Practice Exercise 4.2

**Task:** Create a dictionary to track product inventory:
```python
inventory = {"apples": 50, "bananas": 30, "oranges": 45}
```

1. Print the count of bananas
2. Add "grapes" with count 25
3. Someone bought 10 apples - update the count
4. Check if "mangoes" is in inventory
5. Get the count of "mangoes" using .get() with default 0

**Expected Output:**
```
Bananas in stock: 30
Added grapes: 25
Apples after sale: 40
Mangoes in inventory: False
Mangoes count: 0
```

In [None]:
# Your code here


In [None]:
# Solution 4.2

inventory = {"apples": 50, "bananas": 30, "oranges": 45}

# Print bananas count
print(f"Bananas in stock: {inventory['bananas']}")

# Add grapes
inventory["grapes"] = 25
print(f"Added grapes: {inventory['grapes']}")

# Update apples (sold 10)
inventory["apples"] = inventory["apples"] - 10
print(f"Apples after sale: {inventory['apples']}")

# Check for mangoes
print(f"Mangoes in inventory: {'mangoes' in inventory}")

# Get mangoes count with default
print(f"Mangoes count: {inventory.get('mangoes', 0)}")

---
# Section 5: Dictionary Methods and Nested Dictionaries
---

## Essential Dictionary Methods

| Method | Description | Returns |
|--------|-------------|--------|
| `.keys()` | All keys | dict_keys object |
| `.values()` | All values | dict_values object |
| `.items()` | All key-value pairs | dict_items object |
| `.update()` | Merge another dict | None (modifies in place) |
| `.copy()` | Shallow copy | New dictionary |
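
The "shallow copy" entry in the table deserves a short illustration. A shallow copy duplicates the top-level dictionary, but nested lists or dictionaries are still shared with the original. The sketch below uses a hypothetical `original` record and contrasts `.copy()` with `copy.deepcopy()` from the standard library.

```python
import copy

original = {"name": "Alice", "tags": ["new", "premium"]}  # hypothetical record

shallow = original.copy()
shallow["name"] = "Bob"            # top-level key: only the copy changes
shallow["tags"].append("vip")      # nested list: shared with the original

print(original["name"])   # Alice
print(original["tags"])   # ['new', 'premium', 'vip']

deep = copy.deepcopy(original)     # fully independent copy
deep["tags"].append("archived")
print(original["tags"])   # ['new', 'premium', 'vip']
```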

In [None]:
# Example: Dictionary methods

sales = {
    "January": 15000,
    "February": 18000,
    "March": 22000
}

# Get all keys
print(f"Months: {list(sales.keys())}")

# Get all values
print(f"Sales values: {list(sales.values())}")

# Get key-value pairs
print(f"Items: {list(sales.items())}")

# Sum of all values
total_sales = sum(sales.values())
print(f"Total Q1 sales: ${total_sales:,}")

In [None]:
# Example: Looping through dictionaries

scores = {
    "Alice": 95,
    "Bob": 87,
    "Charlie": 92
}

# Loop through keys
print("Students:")
for name in scores:
    print(f"  {name}")

# Loop through key-value pairs
print("\nScores:")
for name, score in scores.items():
    print(f"  {name}: {score}")

In [None]:
# Example: Merging dictionaries with update()

user_defaults = {
    "theme": "light",
    "language": "en",
    "notifications": True
}

user_preferences = {
    "theme": "dark",
    "font_size": 14
}

# Start with defaults
settings = user_defaults.copy()  # Make a copy first!
print(f"Default settings: {settings}")

# Update with user preferences (overwrites existing keys)
settings.update(user_preferences)
print(f"Final settings: {settings}")

## Nested Dictionaries

Dictionaries can contain other dictionaries, creating hierarchical data structures. This is very common when working with JSON data from APIs.
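
As a rough sketch of the JSON connection, the standard library's `json.loads()` turns a JSON string into nested dictionaries. The `payload` string here is a made-up stand-in for an API response, not output from a real service.

```python
import json

# Hypothetical API response body
payload = '{"user": {"name": "Alice", "address": {"city": "Boston"}}}'

data = json.loads(payload)   # JSON object -> nested Python dicts
print(data["user"]["address"]["city"])   # Boston

# Chained .get() with {} defaults avoids a KeyError when a level is missing
zip_code = data.get("user", {}).get("address", {}).get("zip", "unknown")
print(zip_code)   # unknown
```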

In [None]:
# Example: Nested dictionaries

# Company data with departments
company = {
    "name": "TechCorp",
    "departments": {
        "engineering": {
            "head": "Alice Smith",
            "employees": 50,
            "budget": 2000000
        },
        "marketing": {
            "head": "Bob Johnson",
            "employees": 25,
            "budget": 1000000
        },
        "sales": {
            "head": "Carol Williams",
            "employees": 40,
            "budget": 1500000
        }
    }
}

print(f"Company: {company['name']}")

In [None]:
# Example: Accessing nested data

company = {
    "name": "TechCorp",
    "departments": {
        "engineering": {
            "head": "Alice Smith",
            "employees": 50,
            "budget": 2000000
        },
        "marketing": {
            "head": "Bob Johnson",
            "employees": 25,
            "budget": 1000000
        }
    }
}

# Access nested values by chaining keys
eng_head = company["departments"]["engineering"]["head"]
print(f"Engineering Head: {eng_head}")

eng_budget = company["departments"]["engineering"]["budget"]
print(f"Engineering Budget: ${eng_budget:,}")

# Calculate total employees across departments
total_employees = 0
for dept_name, dept_info in company["departments"].items():
    total_employees += dept_info["employees"]
    print(f"{dept_name.title()}: {dept_info['employees']} employees")

print(f"Total employees: {total_employees}")

In [None]:
# Example: Modifying nested dictionaries

# Student records
students = {
    "S001": {
        "name": "Alice",
        "grades": {"math": 95, "science": 88}
    },
    "S002": {
        "name": "Bob",
        "grades": {"math": 82, "science": 90}
    }
}

# Update Alice's math grade
students["S001"]["grades"]["math"] = 97
print(f"Alice's new math grade: {students['S001']['grades']['math']}")

# Add a new subject for Bob
students["S002"]["grades"]["english"] = 85
print(f"Bob's grades: {students['S002']['grades']}")

# Add a new student
students["S003"] = {
    "name": "Charlie",
    "grades": {"math": 91, "science": 89, "english": 94}
}
print(f"New student added: {students['S003']['name']}")

## Practice Exercise 5.1

**Task:** Given this sales dictionary:
```python
sales = {"Mon": 1500, "Tue": 2200, "Wed": 1800, "Thu": 2500, "Fri": 3000}
```

1. Print all the days (keys)
2. Calculate the total weekly sales
3. Find the average daily sales
4. Print each day with its sales amount

**Expected Output:**
```
Days: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
Total weekly sales: $11,000
Average daily sales: $2,200.00

Daily breakdown:
  Mon: $1,500
  Tue: $2,200
  Wed: $1,800
  Thu: $2,500
  Fri: $3,000
```

In [None]:
# Your code here


In [None]:
# Solution 5.1

sales = {"Mon": 1500, "Tue": 2200, "Wed": 1800, "Thu": 2500, "Fri": 3000}

# Print all days
print(f"Days: {list(sales.keys())}")

# Calculate total
total = sum(sales.values())
print(f"Total weekly sales: ${total:,}")

# Calculate average
average = total / len(sales)
print(f"Average daily sales: ${average:,.2f}")

# Print breakdown
print("\nDaily breakdown:")
for day, amount in sales.items():
    print(f"  {day}: ${amount:,}")

## Practice Exercise 5.2

**Task:** Work with this nested employee data:
```python
employees = {
    "E001": {"name": "Alice", "department": "Engineering", "salary": 85000},
    "E002": {"name": "Bob", "department": "Marketing", "salary": 72000},
    "E003": {"name": "Charlie", "department": "Engineering", "salary": 90000}
}
```

1. Print Alice's salary
2. Give Bob a 10% raise (update his salary)
3. Calculate and print the average salary
4. Print all Engineering department employees

**Expected Output:**
```
Alice's salary: $85,000
Bob's new salary: $79,200.00
Average salary: $84,733.33

Engineering employees:
  Alice
  Charlie
```

In [None]:
# Your code here


In [None]:
# Solution 5.2

employees = {
    "E001": {"name": "Alice", "department": "Engineering", "salary": 85000},
    "E002": {"name": "Bob", "department": "Marketing", "salary": 72000},
    "E003": {"name": "Charlie", "department": "Engineering", "salary": 90000}
}

# Alice's salary
print(f"Alice's salary: ${employees['E001']['salary']:,}")

# Bob's raise (10%)
employees["E002"]["salary"] = employees["E002"]["salary"] * 1.10
print(f"Bob's new salary: ${employees['E002']['salary']:,.2f}")

# Average salary
total_salary = 0
for emp_id, emp_info in employees.items():
    total_salary += emp_info["salary"]
average_salary = total_salary / len(employees)
print(f"Average salary: ${average_salary:,.2f}")

# Engineering employees
print("\nEngineering employees:")
for emp_id, emp_info in employees.items():
    if emp_info["department"] == "Engineering":
        print(f"  {emp_info['name']}")

---
# Section 6: Sets and Set Operations
---

## What is a Set?

A **set** is an unordered collection of **unique** items. Sets automatically remove duplicates and don't maintain any order.

Think of a set like a bag of unique items – you either have something or you don't, and you can't have duplicates.

### Why This Matters in Data Science

Sets are perfect for:
- Finding unique values in data
- Removing duplicates quickly
- Comparing datasets (finding common items, differences)
- Membership testing (fast: a set lookup takes roughly constant time on average, while a list is scanned item by item)
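
Because sets are unordered, removing duplicates with `set()` may not keep items in their original sequence. If order matters, one common approach (sketched here with an invented `visits` list) is `dict.fromkeys()`, which keeps the first occurrence of each item in insertion order on Python 3.7 and later.

```python
visits = ["home", "pricing", "home", "docs", "pricing"]  # hypothetical page views

unique_any_order = set(visits)                  # order not guaranteed
unique_in_order = list(dict.fromkeys(visits))   # first-seen order kept

print(len(unique_any_order))   # 3
print(unique_in_order)         # ['home', 'pricing', 'docs']
```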

## Syntax

```python
# Creating a set
my_set = {item1, item2, item3}

# Empty set (note: {} creates empty dict!)
empty_set = set()

# Set from list (removes duplicates)
my_set = set([1, 2, 2, 3, 3, 3])
```

In [None]:
# Example: Creating sets

# Set of unique product categories
categories = {"Electronics", "Clothing", "Books", "Electronics"}  # Duplicate ignored
print(f"Categories: {categories}")
print(f"Number of unique categories: {len(categories)}")

# Create set from list to remove duplicates
customer_ids = [101, 102, 103, 101, 104, 102, 105]
unique_customers = set(customer_ids)
print(f"\nOriginal list: {customer_ids}")
print(f"Unique customers: {unique_customers}")
print(f"Unique count: {len(unique_customers)}")

In [None]:
# Example: Adding and removing items

skills = {"Python", "SQL", "Excel"}
print(f"Original skills: {skills}")

# Add single item
skills.add("Tableau")
print(f"After add: {skills}")

# Adding duplicate has no effect
skills.add("Python")
print(f"After adding 'Python' again: {skills}")

# Remove item (raises KeyError if not found)
skills.remove("Excel")
print(f"After remove: {skills}")

# Discard - like remove but no error if missing
skills.discard("Java")  # No error even though Java isn't there
print(f"After discard 'Java': {skills}")

## Set Operations

Sets support mathematical operations like union, intersection, and difference.

In [None]:
# Example: Set operations - data analysis scenario

# Customers who bought in January vs February
jan_customers = {"Alice", "Bob", "Charlie", "Diana"}
feb_customers = {"Bob", "Diana", "Eve", "Frank"}

print(f"January customers: {jan_customers}")
print(f"February customers: {feb_customers}")

In [None]:
# Union - all unique customers from both months
all_customers = jan_customers | feb_customers  # or: jan_customers.union(feb_customers)
print(f"All customers (union): {all_customers}")

In [None]:
# Intersection - customers who bought in BOTH months
repeat_customers = jan_customers & feb_customers  # or: .intersection()
print(f"Repeat customers (intersection): {repeat_customers}")

In [None]:
# Difference - customers ONLY in January (didn't return in Feb)
jan_only = jan_customers - feb_customers  # or: .difference()
print(f"January only (churned): {jan_only}")

# New customers in February
feb_only = feb_customers - jan_customers
print(f"February only (new): {feb_only}")

In [None]:
# Symmetric difference - customers in one month but not both
one_month_only = jan_customers ^ feb_customers  # or: .symmetric_difference()
print(f"One month only: {one_month_only}")
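
The operator and method forms are not fully interchangeable. As a minimal sketch (with February's customers deliberately kept as a plain list), the methods accept any iterable, while the operators require a set on both sides:

```python
jan_customers = {"Alice", "Bob", "Charlie", "Diana"}
feb_list = ["Bob", "Diana", "Eve", "Frank"]  # a plain list, not a set

# Method forms accept any iterable
repeat_customers = jan_customers.intersection(feb_list)
print(repeat_customers == {"Bob", "Diana"})  # True

# Operator forms require a set on both sides
try:
    jan_customers & feb_list
except TypeError:
    print("Operators need sets on both sides; convert with set(feb_list)")

# Subset check: is every repeat customer also a January customer?
print(repeat_customers <= jan_customers)  # True
```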

In [None]:
# Example: Practical use - finding unique values

# Transaction categories with many duplicates
transactions = [
    "Food", "Transport", "Food", "Entertainment", "Food",
    "Bills", "Transport", "Food", "Shopping", "Bills"
]

# Get unique categories
unique_categories = set(transactions)
print(f"Transaction count: {len(transactions)}")
print(f"Unique categories: {unique_categories}")
print(f"Number of categories: {len(unique_categories)}")
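
Because a set does not keep order, converting a list to a set loses the original sequence. When the order matters, two common alternatives are sketched below: `dict.fromkeys()` keeps the first occurrence of each item in its original position, and `sorted()` returns an alphabetical list.

```python
transactions = [
    "Food", "Transport", "Food", "Entertainment", "Food",
    "Bills", "Transport", "Food", "Shopping", "Bills"
]

# dict.fromkeys() keeps the first occurrence of each key, in insertion order
ordered_unique = list(dict.fromkeys(transactions))
print(ordered_unique)
# ['Food', 'Transport', 'Entertainment', 'Bills', 'Shopping']

# sorted() builds an alphabetical list from the set
sorted_unique = sorted(set(transactions))
print(sorted_unique)
# ['Bills', 'Entertainment', 'Food', 'Shopping', 'Transport']
```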

## Practice Exercise 6.1

**Task:** You have two lists of email subscribers from different campaigns:
```python
campaign_a = ["alice@email.com", "bob@email.com", "charlie@email.com", "diana@email.com"]
campaign_b = ["charlie@email.com", "diana@email.com", "eve@email.com", "frank@email.com"]
```

Using sets, find:
1. All unique subscribers across both campaigns
2. Subscribers who are in BOTH campaigns
3. Subscribers ONLY in Campaign A
4. Subscribers ONLY in Campaign B

**Expected Output** (the order of items inside each set may differ when you run it):
```
Total unique subscribers: 6
In both campaigns: {'charlie@email.com', 'diana@email.com'}
Only in Campaign A: {'alice@email.com', 'bob@email.com'}
Only in Campaign B: {'eve@email.com', 'frank@email.com'}
```

In [None]:
# Your code here


In [None]:
# Solution 6.1

campaign_a = ["alice@email.com", "bob@email.com", "charlie@email.com", "diana@email.com"]
campaign_b = ["charlie@email.com", "diana@email.com", "eve@email.com", "frank@email.com"]

# Convert to sets
set_a = set(campaign_a)
set_b = set(campaign_b)

# All unique subscribers (union)
all_subscribers = set_a | set_b
print(f"Total unique subscribers: {len(all_subscribers)}")

# In both campaigns (intersection)
in_both = set_a & set_b
print(f"In both campaigns: {in_both}")

# Only in Campaign A (difference)
only_a = set_a - set_b
print(f"Only in Campaign A: {only_a}")

# Only in Campaign B (difference)
only_b = set_b - set_a
print(f"Only in Campaign B: {only_b}")

## Practice Exercise 6.2

**Task:** Remove duplicates from this list of product IDs and find how many duplicates there were:
```python
product_ids = ["P001", "P002", "P003", "P001", "P004", "P002", "P005", "P001", "P003"]
```

**Expected Output** (the order of items inside the set may differ when you run it):
```
Original count: 9
Unique products: 5
Duplicate entries: 4
Unique IDs: {'P001', 'P002', 'P003', 'P004', 'P005'}
```

In [None]:
# Your code here


In [None]:
# Solution 6.2

product_ids = ["P001", "P002", "P003", "P001", "P004", "P002", "P005", "P001", "P003"]

original_count = len(product_ids)
unique_ids = set(product_ids)
unique_count = len(unique_ids)
duplicate_count = original_count - unique_count

print(f"Original count: {original_count}")
print(f"Unique products: {unique_count}")
print(f"Duplicate entries: {duplicate_count}")
print(f"Unique IDs: {unique_ids}")

---
# Section 7: Choosing the Right Data Structure
---

## Quick Reference Guide

| Data Structure | Ordered | Mutable | Duplicates | Best For |
|----------------|---------|---------|------------|----------|
| **List** | Yes | Yes | Allowed | Ordered collections, sequences |
| **Tuple** | Yes | ❌ No | Allowed | Fixed records, function returns |
| **Dictionary** | Yes* | Yes | Keys: No, Values: Yes | Key-value mappings, records |
| **Set** | ❌ No | Yes | ❌ No | Unique items, membership testing |

*Dictionaries preserve insertion order as of Python 3.7, but items are still accessed by key, not by position.
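
Three of the table's entries can be checked in a few lines. This is a small sketch with made-up data (`point`, `inventory`, and `tags` are illustrative names):

```python
# Tuple: ordered but immutable, so item assignment fails
point = (40.7128, -74.0060)
try:
    point[0] = 0.0
except TypeError:
    print("Tuples cannot be modified in place")

# Dictionary: keys come back in insertion order (Python 3.7+)
inventory = {"pears": 3, "apples": 5}
inventory["bananas"] = 2
inventory["pears"] = 10  # updating a value keeps the key's position
print(list(inventory))   # ['pears', 'apples', 'bananas']

# Set: no positions, so indexing is not supported
tags = {"python", "tutorial"}
try:
    tags[0]
except TypeError:
    print("Sets cannot be indexed")
```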

## Decision Guide

**Use a LIST when:**
- Order matters
- You need to modify the collection
- Duplicates are allowed/expected
- Examples: time series data, rankings, sequences

**Use a TUPLE when:**
- Data should not change
- You are returning multiple values from a function
- You need a hashable value, for example to use as a dictionary key
- Examples: coordinates, RGB values, database records

**Use a DICTIONARY when:**
- You need to look up values by key
- Data has labels/names
- Working with JSON/structured data
- Examples: configuration, user profiles, counters

**Use a SET when:**
- You need unique values only
- Order doesn't matter
- Checking membership frequently
- Examples: unique IDs, tags, comparing groups
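
The dictionary-key point in the tuple guidance is easy to overlook. A minimal sketch, assuming a hypothetical `store_locations` lookup keyed by coordinates:

```python
# Tuples are hashable (when their items are), so they can be dictionary keys
store_locations = {
    (40.7128, -74.0060): "New York",
    (34.0522, -118.2437): "Los Angeles",
}
print(store_locations[(40.7128, -74.0060)])  # New York

# Lists are mutable and therefore unhashable
try:
    bad_lookup = {[40.7128, -74.0060]: "New York"}
except TypeError:
    print("Lists cannot be used as dictionary keys")
```

The same restriction applies to set elements: a set can hold tuples but not lists.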

In [None]:
# Example: Choosing the right structure for different scenarios

# Scenario 1: Daily stock prices (ordered, can change)
# Best choice: LIST
stock_prices = [142.50, 145.25, 143.75, 148.00, 146.50]
stock_prices.append(149.25)  # Add new day's price
print(f"Stock prices (list): {stock_prices}")

# Scenario 2: Geographic coordinates (fixed, shouldn't change)
# Best choice: TUPLE
nyc_coords = (40.7128, -74.0060)
# Can't accidentally modify latitude/longitude
print(f"NYC coordinates (tuple): {nyc_coords}")

# Scenario 3: User profile with named fields
# Best choice: DICTIONARY
user = {
    "username": "data_ninja",
    "email": "ninja@data.com",
    "premium": True
}
print(f"User profile (dict): {user}")

# Scenario 4: Unique tags on a blog post
# Best choice: SET
tags = {"python", "data-science", "tutorial", "python"}  # Duplicate ignored
print(f"Blog tags (set): {tags}")

In [None]:
# Example: Combining data structures

# List of dictionaries (very common for tabular data)
employees = [
    {"id": 1, "name": "Alice", "department": "Engineering"},
    {"id": 2, "name": "Bob", "department": "Marketing"},
    {"id": 3, "name": "Charlie", "department": "Engineering"}
]

# Dictionary with list values
sales_by_region = {
    "North": [15000, 18000, 22000],
    "South": [12000, 14000, 16000],
    "East": [20000, 25000, 23000]
}

# Print Engineering employees
print("Engineering team:")
for emp in employees:
    if emp["department"] == "Engineering":
        print(f"  {emp['name']}")

# Calculate total sales by region
print("\nTotal sales by region:")
for region, sales in sales_by_region.items():
    print(f"  {region}: ${sum(sales):,}")

## Practice Exercise 7.1

**Task:** For each scenario below, choose the best data structure and explain why:

1. Storing the names of students in a class (need to add/remove students)
2. A phone book mapping names to phone numbers
3. The RGB values for a specific color
4. A collection of unique product categories in a store
5. Monthly revenue figures for the past year

In [None]:
# Write your answers as comments and create example data structures


In [None]:
# Solution 7.1

# 1. Student names - LIST (ordered, mutable, and two students may share a name)
students = ["Alice", "Bob", "Charlie"]
students.append("Diana")  # Can add new students

# 2. Phone book - DICTIONARY (key-value pairs, lookup by name)
phone_book = {
    "Alice": "555-1234",
    "Bob": "555-5678"
}

# 3. RGB color - TUPLE (fixed values that shouldn't change)
ocean_blue = (0, 105, 148)

# 4. Unique categories - SET (uniqueness required, order doesn't matter)
categories = {"Electronics", "Clothing", "Books", "Home"}

# 5. Monthly revenue - LIST (ordered sequence, values may repeat)
monthly_revenue = [45000, 52000, 48000, 61000, 55000, 63000, 58000, 72000, 68000, 75000, 82000, 95000]

print("1. Students (list):", students)
print("2. Phone book (dict):", phone_book)
print("3. Ocean blue (tuple):", ocean_blue)
print("4. Categories (set):", categories)
print("5. Revenue (list):", monthly_revenue)

## Practice Exercise 7.2

**Task:** Create a small customer database using appropriate data structures:

Create data for 3 customers, each with:
- Customer ID
- Name
- Email
- Purchase history (list of amounts)
- Tags (unique labels like "premium", "new", "frequent")

Then:
1. Print the first customer's name
2. Calculate total purchases for the second customer
3. Check if the third customer has the "premium" tag

**Hint:** Use a list of dictionaries, with a list for purchases and a set for tags.

In [None]:
# Your code here


In [None]:
# Solution 7.2

# Customer database
customers = [
    {
        "id": "C001",
        "name": "Alice Johnson",
        "email": "alice@email.com",
        "purchases": [150.00, 75.50, 220.00],
        "tags": {"premium", "frequent"}
    },
    {
        "id": "C002",
        "name": "Bob Smith",
        "email": "bob@email.com",
        "purchases": [50.00, 125.00],
        "tags": {"new"}
    },
    {
        "id": "C003",
        "name": "Charlie Brown",
        "email": "charlie@email.com",
        "purchases": [300.00, 450.00, 125.00, 89.99],
        "tags": {"frequent", "bulk-buyer"}
    }
]

# 1. First customer's name
print(f"First customer: {customers[0]['name']}")

# 2. Total purchases for second customer
total_purchases = sum(customers[1]["purchases"])
print(f"Bob's total purchases: ${total_purchases:.2f}")

# 3. Check if third customer has "premium" tag
has_premium = "premium" in customers[2]["tags"]
print(f"Charlie is premium: {has_premium}")

---
# Module Summary

## Key Takeaways

### Lists
- Ordered, mutable collections using `[]`
- Access by index: `my_list[0]`, `my_list[-1]`
- Slicing: `my_list[start:end:step]`
- Key methods: `append()`, `insert()`, `remove()`, `pop()`, `sort()`
- Built-ins: `len()`, `sum()`, `min()`, `max()`

### Tuples
- Immutable sequences using `()`
- Perfect for fixed data that shouldn't change
- Tuple unpacking: `a, b, c = my_tuple`
- Great for returning multiple values from functions

### Dictionaries
- Key-value pairs using `{}`
- Access: `my_dict["key"]` (raises `KeyError` if the key is missing) or `my_dict.get("key")` (returns `None` if the key is missing)
- Key methods: `.keys()`, `.values()`, `.items()`, `.update()`
- Can be nested for complex data structures

### Sets
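- Unordered collections of unique items using `{item1, item2}` (use `set()` for an empty set, because `{}` creates an empty dictionary)
- Automatically removes duplicates
- Operations: union `|`, intersection `&`, difference `-`, symmetric difference `^`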
- Perfect for membership testing and finding unique values

### Choosing the Right Structure
- **List**: Ordered data that may change
- **Tuple**: Fixed data that shouldn't change
- **Dictionary**: Named/labeled data, key-value mappings
- **Set**: Unique items, membership testing, comparing groups

---

## Next Module

In **Module 3: Control Flow**, we'll learn about:
- **Conditional statements** (if, elif, else)
- **Loops** (for and while)
- **List comprehensions** for elegant data transformations
- **Exception handling** for robust code

These tools will let you write code that makes decisions and processes collections of data!

---

## Additional Practice

For extra practice, try these challenges:

1. **Shopping Cart**: Create a program that manages a shopping cart. Allow adding/removing items, track quantities using a dictionary, and calculate totals.

2. **Contact Manager**: Build a simple contact book using a list of dictionaries. Include name, phone, email, and tags for each contact. Implement search by name.

3. **Survey Analysis**: Given a list of survey responses with duplicates, find unique responses, count how many times each appears, and identify the most common response.

4. **Team Comparison**: Create two sets of team member skills. Find what skills are common, unique to each team, and the total unique skills combined.