# Chapter 4: Core Data Structures

Data structures are the containers that hold your application's information. Choosing the right data structure is one of the most critical decisions in software engineering—it affects the readability, performance, and maintainability of your code. Python provides a rich set of built-in data structures that are optimized, well-tested, and tightly integrated with the language syntax.

In this chapter, we will explore Python's fundamental collections: **Lists**, **Tuples**, **Dictionaries**, and **Sets**. We will examine their time complexities (Big O notation), mutability characteristics, and memory implications. By the end, you will understand not just *how* to use these structures, but *when* to use each one based on your specific requirements.

## 4.1 Lists

The **list** is Python's most versatile sequence type. It is an ordered, mutable collection that can hold items of any type (though homogeneous lists—containing one type—are preferred for clarity).

### Creation and Basic Operations

```python
# Creating lists
empty_list: list = []
empty_list_alt: list = list()  # Less common, but useful for converting iterables
numbers: list[int] = [1, 2, 3, 4, 5]
mixed_types: list = [1, "hello", 3.14, True]  # Possible but discouraged

# Pre-allocating with repetition (caution with mutable objects)
zeros: list[int] = [0] * 5  # [0, 0, 0, 0, 0]
# Dangerous with mutable objects:
matrix_wrong: list[list[int]] = [[0] * 3] * 3
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]] - but all inner lists are the SAME object!
matrix_wrong[0][0] = 1
print(matrix_wrong)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]] - unexpected!

# Correct way to create a matrix:
matrix_correct: list[list[int]] = [[0 for _ in range(3)] for _ in range(3)]
```
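One way to confirm the aliasing described in the comments above is to compare row identities with `is`. The following is a minimal sketch; the names `aliased` and `independent` are illustrative only.

```python
# Repetition copies references to one inner list;
# a comprehension builds a new inner list for every row.
aliased = [[0] * 3] * 3
independent = [[0] * 3 for _ in range(3)]

print(aliased[0] is aliased[1])          # True: every row is the same object
print(independent[0] is independent[1])  # False: each row is its own list

independent[0][0] = 1
print(independent)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```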

### Indexing, Slicing, and Striding

Lists support the same indexing and slicing syntax as strings (see Chapter 2), but unlike strings, lists are mutable—you can change their contents.

```python
fruits: list[str] = ["apple", "banana", "cherry", "date", "elderberry"]

# Indexing
first: str = fruits[0]      # "apple"
last: str = fruits[-1]      # "elderberry"

# Slicing [start:stop:step]
first_three: list[str] = fruits[:3]     # ["apple", "banana", "cherry"]
last_two: list[str] = fruits[-2:]       # ["date", "elderberry"]
every_other: list[str] = fruits[::2]    # ["apple", "cherry", "elderberry"]
reversed_list: list[str] = fruits[::-1] # New reversed list; the original is unchanged

# Slice assignment - replacing sections
fruits[1:3] = ["blueberry", "cranberry"]  # Replaces indices 1 and 2
print(fruits)  # ["apple", "blueberry", "cranberry", "date", "elderberry"]

# Deleting with slices
del fruits[::2]  # Removes indices 0, 2, and 4, leaving ["blueberry", "date"]
```
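Slice assignment has one subtlety worth seeing in action: a plain slice can be replaced by a sequence of any length, while an extended slice (one with a step) must receive exactly as many items as it selects. A small sketch, using a throwaway list named `letters`:

```python
letters = ["a", "b", "c", "d", "e"]

# A plain slice can be replaced by a sequence of a different length.
letters[1:4] = ["X"]
print(letters)  # ['a', 'X', 'e']

# An extended slice (step != 1) needs exactly as many new items as it selects.
letters[::2] = ["first", "last"]
print(letters)  # ['first', 'X', 'last']

try:
    letters[::2] = ["only one"]  # Selects 2 positions but supplies 1 item
except ValueError as exc:
    print(f"ValueError: {exc}")
```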

### List Methods and Algorithmic Complexity

Understanding the time complexity (Big O notation) of list operations is essential for writing performant code, especially with large datasets.

| Operation | Method | Time Complexity | Notes |
|-----------|--------|----------------|-------|
| Access by index | `lst[i]` | O(1) | Direct pointer arithmetic |
| Append | `lst.append(x)` | O(1) amortized | Occasionally needs to resize array |
| Pop from end | `lst.pop()` | O(1) | |
| Pop from beginning | `lst.pop(0)` | O(n) | Must shift all elements left |
| Insert | `lst.insert(i, x)` | O(n) | May need to shift elements |
| Delete | `del lst[i]` | O(n) | Must shift elements left |
| Search (contains) | `x in lst` | O(n) | Linear search |
| Sort | `lst.sort()` | O(n log n) | Timsort algorithm |
| Reverse | `lst.reverse()` | O(n) | In-place |
| Copy | `lst.copy()` | O(n) | Shallow copy |

```python
# Demonstrating key methods
data: list[int] = [3, 1, 4, 1, 5]

# Adding elements
data.append(9)           # [3, 1, 4, 1, 5, 9] - O(1)
data.insert(0, 0)        # [0, 3, 1, 4, 1, 5, 9] - O(n), slow for large lists
data.extend([2, 6])      # [0, 3, 1, 4, 1, 5, 9, 2, 6] - O(k) where k is length of extension

# Removing elements
last: int = data.pop()   # Returns 6, removes from end - O(1)
first: int = data.pop(0) # Returns 0, removes from beginning - O(n), inefficient!
data.remove(1)           # Removes first occurrence of 1 - O(n)

# Ordering
data.sort()              # In-place sort, returns None - O(n log n)
sorted_data: list[int] = sorted(data)  # Returns new sorted list, original unchanged

# Finding
index: int = data.index(5)  # Returns index of first 5, raises ValueError if not found - O(n)
count: int = data.count(1)  # Returns number of occurrences - O(n)
```

**Industry Performance Tip**: If you frequently add or remove items at the beginning of a collection, use `collections.deque` (see section 4.5) instead of a list; it provides O(1) operations at both ends.
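To make the tip concrete, here is a rough sketch that drains the same queue two ways. The task names are invented for illustration, and the example shows only that the results match; the difference is the cost of each removal.

```python
from collections import deque

tasks = ["parse", "validate", "store"]  # Hypothetical work items

# List as a queue: each pop(0) shifts the remaining elements left, O(n) per call.
list_queue = list(tasks)
from_list = []
while list_queue:
    from_list.append(list_queue.pop(0))

# deque as a queue: popleft() removes from the front in O(1).
deque_queue = deque(tasks)
from_deque = []
while deque_queue:
    from_deque.append(deque_queue.popleft())

print(from_list == from_deque)  # True: same order, different cost per operation
```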

### Shallow vs. Deep Copying

When you copy a list containing mutable objects, you must understand the difference between shallow and deep copies.

```python
import copy

# Original list with nested mutable object
original: list = [[1, 2, 3], ["a", "b", "c"]]

# Assignment (reference, not copy)
reference: list = original
reference[0][0] = 999
print(original[0][0])  # 999 - original modified!

# Shallow copy: copies the list container, but not the nested objects
shallow: list = original.copy()  # or original[:], or list(original)
shallow[0] = [7, 8, 9]        # Only affects shallow - replaces entire sublist
shallow[1][0] = "CHANGED"     # Affects both! - modifies nested object in place
print(original[1][0])         # "CHANGED"

# Deep copy: recursively copies all nested objects
deep: list = copy.deepcopy(original)
deep[1][1] = "DEEP"
print(original[1][1])         # "b" - unchanged
```

## 4.2 Tuples

**Tuples** are immutable, ordered sequences. Once created, they cannot be modified. A tuple whose elements are all hashable is itself hashable (and therefore usable as a dictionary key), and immutability makes tuples a safer choice when data integrity matters.
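As a quick illustration of tuples serving as dictionary keys, consider a sparse grid indexed by `(row, column)` pairs. The `grid` name and its contents are hypothetical.

```python
# Hypothetical sparse grid: (row, column) tuples as dictionary keys.
grid: dict[tuple[int, int], str] = {}
grid[(0, 0)] = "start"
grid[(2, 3)] = "goal"

print(grid[(2, 3)])    # goal
print((1, 1) in grid)  # False: no cell stored at that coordinate
```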

### Creation and Characteristics

```python
# Creating tuples
empty: tuple = ()
singleton: tuple = (42,)  # Note the comma! Without it, it's just an integer in parentheses
coordinates: tuple[float, float] = (10.5, 20.8)
person: tuple[str, int, bool] = ("Alice", 30, True)

# Parentheses are optional (tuple packing)
implicit_tuple: tuple[int, int, int] = 1, 2, 3  # Valid but less readable, parentheses preferred
```

### Immutability Implications

Immutability means the tuple's structure cannot change, but if it contains mutable objects, those objects can still be modified.

```python
# Tuple with mutable list inside
data: tuple[int, list[int]] = (1, [2, 3])
# data[0] = 10  # TypeError: 'tuple' object does not support item assignment
# data[1] = [4, 5]  # TypeError

data[1].append(4)  # Valid! The list inside is mutable
print(data)  # (1, [2, 3, 4])
```
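A practical consequence is that a tuple holding a mutable object cannot be hashed, so it cannot be a dictionary key or set member. The sketch below checks this with the built-in `hash()`; the variable names are illustrative.

```python
hashable_point = (1, 2)
with_list = (1, [2, 3])

# Equal tuples of hashable items produce equal hashes.
print(hash(hashable_point) == hash((1, 2)))  # True

# Hashing a tuple hashes its elements, so an inner list makes it fail.
try:
    hash(with_list)
    with_list_hashable = True
except TypeError:
    with_list_hashable = False

print(with_list_hashable)  # False
```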

### Tuple Unpacking

Tuple unpacking is a powerful Python feature for assigning multiple variables simultaneously.

```python
# Basic unpacking
x, y = (10, 20)
print(x)  # 10
print(y)  # 20

# Swapping variables (Pythonic, no temp variable needed)
a, b = 5, 10
a, b = b, a  # a is now 10, b is now 5

# Extended unpacking (Python 3+)
first, *rest = [1, 2, 3, 4, 5]
print(first)  # 1
print(rest)   # [2, 3, 4, 5]

first, *middle, last = [1, 2, 3, 4, 5]
print(middle)  # [2, 3, 4]

# Ignoring values with underscore convention
x, _, y = (1, 2, 3)  # Convention: _ means "don't care"

# Unpacking in function arguments
def process_point(x: int, y: int, z: int) -> None:
    print(f"Point: ({x}, {y}, {z})")

point: tuple[int, int, int] = (1, 2, 3)
process_point(*point)  # Unpacks tuple into positional arguments
```
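Unpacking also works in `for` loops, and the number of targets must always match the number of values. A brief sketch with made-up data:

```python
pairs = [("Alice", 30), ("Bob", 25)]

# Each element is unpacked into two loop variables.
for name, age in pairs:
    print(f"{name} is {age}")

# enumerate() yields (index, item) tuples; the item can be unpacked in place.
for position, (name, age) in enumerate(pairs, start=1):
    print(position, name, age)

# The number of targets must match the number of values.
try:
    a, b = (1, 2, 3)
except ValueError as exc:
    print(f"ValueError: {exc}")  # too many values to unpack (expected 2)
```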

### Named Tuples

Standard tuples are accessed by index, which reduces readability. `namedtuple` (from `collections`) or `typing.NamedTuple` (modern approach) creates tuple subclasses with named fields.

```python
from typing import NamedTuple

# Modern approach (Python 3.6+)
class Point(NamedTuple):
    x: float
    y: float
    z: float = 0.0  # Default values supported

p = Point(10.0, 20.0)
print(p.x)      # 10.0 - readable access
print(p[0])     # 10.0 - index access still works
x, y, z = p     # Unpacking works

# Immutability maintained
# p.x = 15.0  # AttributeError: can't set attribute

# Use case: returning multiple values from a function
def get_user() -> tuple[str, int]:
    # Unclear what these represent without documentation
    return "Alice", 30

# Define the named tuple once at module level, then use it as the return type
class User(NamedTuple):
    name: str
    age: int

def get_user_named() -> User:
    return User("Alice", 30)
```

**When to use Tuples vs Lists:**
*   **Tuple**: Fixed data (coordinates, RGB values), dictionary keys, function returns, data integrity requirements.
*   **List**: Dynamic collections that need to grow/shrink or be modified in place.

## 4.3 Dictionaries

**Dictionaries** are Python's implementation of a hash map or associative array. They store key-value pairs with O(1) average time complexity for lookup, insertion, and deletion. Since Python 3.7, dictionaries are guaranteed to preserve insertion order (in CPython 3.6 this was only an implementation detail).
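The ordering guarantee is easy to observe. In this small sketch (the `inventory` data is invented), updating a value leaves the key where it was, while deleting and re-inserting moves it to the end.

```python
inventory = {"pears": 3, "apples": 5}
inventory["kiwis"] = 7    # A new key goes to the end
inventory["pears"] = 10   # Updating a value keeps the key's original position

print(list(inventory))    # ['pears', 'apples', 'kiwis']

# Deleting and re-inserting a key moves it to the end.
del inventory["pears"]
inventory["pears"] = 10
print(list(inventory))    # ['apples', 'kiwis', 'pears']
```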

### Creation and Basic Operations

```python
# Creating dictionaries
empty: dict = {}
empty_alt: dict = dict()
user: dict[str, str | int] = {
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com"
}

# From keyword arguments (keys must be valid identifiers)
person: dict[str, str] = dict(name="Bob", age="25", city="NYC")

# From sequences of pairs
pairs: list[tuple[str, int]] = [("a", 1), ("b", 2)]
from_pairs: dict[str, int] = dict(pairs)

# Dictionary comprehension
squares: dict[int, int] = {x: x**2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```

### Accessing and Modifying Data

```python
scores: dict[str, int] = {"Alice": 95, "Bob": 87, "Charlie": 92}

# Accessing
alice_score: int = scores["Alice"]  # 95
# scores["David"]  # KeyError: 'David' - crashes if key missing

# Safe access methods
bob_score: int | None = scores.get("Bob")        # 87
david_score: int | None = scores.get("David")    # None (no error)
david_score_alt: int = scores.get("David", 0)    # 0 (default value)

# Adding/Updating
scores["David"] = 88          # Adds new key
scores["Alice"] = 98          # Updates existing key

# Modern update (Python 3.9+)
scores |= {"Eve": 91}         # Merge operator (in-place update)
new_scores: dict[str, int] = scores | {"Frank": 85}  # Creates new merged dict

# Removing
del scores["Bob"]             # Removes key, raises KeyError if missing
charlie_score: int = scores.pop("Charlie")  # Removes and returns value
last_item: tuple[str, int] = scores.popitem()  # Removes and returns the most recently inserted (key, value) pair (LIFO, Python 3.7+)
```

### Dictionary Views

Dictionaries provide dynamic views of their keys, values, and items. These views reflect dictionary changes in real-time.

```python
data: dict[str, int] = {"a": 1, "b": 2, "c": 3}

keys = data.keys()      # dict_keys(['a', 'b', 'c'])
values = data.values()  # dict_values([1, 2, 3])
items = data.items()    # dict_items([('a', 1), ('b', 2), ('c', 3)])

# Views are dynamic (references, not copies)
del data["a"]
print(keys)  # dict_keys(['b', 'c']) - automatically reflects deletion

# Iteration (most common use)
for key in data:  # Equivalent to for key in data.keys():
    print(key)

for key, value in data.items():  # Unpacking tuples
    print(f"{key}: {value}")
```
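Key views are also set-like, so they support operators such as `&` and `-` directly. The sketch below compares two hypothetical configuration dictionaries without building intermediate sets by hand.

```python
defaults = {"host": "localhost", "port": 8080, "debug": False}
overrides = {"port": 9090, "debug": True, "verbose": True}

shared = defaults.keys() & overrides.keys()        # Keys present in both
only_default = defaults.keys() - overrides.keys()  # Keys only in defaults

print(sorted(shared))        # ['debug', 'port']
print(sorted(only_default))  # ['host']
```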

### Hashing and Key Requirements

Dictionary keys must be **hashable**: they must have a hash value that never changes during their lifetime, and objects that compare equal must have the same hash. Built-in immutable types (strings, numbers, and tuples containing only hashable items) are hashable. Lists, dictionaries, and sets are not.

```python
# Valid keys
valid: dict = {
    "string": 1,
    42: 2,
    (1, 2, 3): 3,  # Tuple of immutable items
}

# Invalid keys
# invalid = {[1, 2, 3]: "list"}  # TypeError: unhashable type: 'list'
```
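Two consequences of these rules are sketched below. First, keys that compare equal and hash equally are treated as a single key. Second, a list can be converted to a tuple when its contents are needed as a key. The `route` and `distances` names are illustrative.

```python
# 1, 1.0, and True compare equal and share a hash, so they are one key.
# The first key object is kept; the last assigned value wins.
collapsed = {1: "int", 1.0: "float", True: "bool"}
print(collapsed)       # {1: 'bool'}
print(len(collapsed))  # 1

# A list cannot be a key, but its contents can be frozen into a tuple first.
route = ["A", "B", "C"]
distances = {tuple(route): 42}  # 42 is an arbitrary illustrative value
print(distances[("A", "B", "C")])  # 42
```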

### Advanced Dictionary Techniques

**Setdefault**: Get a value, or set it if missing (useful for grouping).

```python
# Without setdefault (verbose)
groups: dict[str, list[int]] = {}
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    if key not in groups:
        groups[key] = []
    groups[key].append(value)

# With setdefault
groups = {}
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    groups.setdefault(key, []).append(value)
# Result: {'a': [1, 3], 'b': [2]}
```

**Note**: In production code, `collections.defaultdict` is often cleaner than `setdefault` (see section 4.5).

## 4.4 Sets

**Sets** are unordered collections of unique, hashable elements. They model mathematical sets and support operations such as union, intersection, and difference.

### Creation and Properties

```python
# Creating sets
empty_set: set = set()  # {} creates an empty dict, not set!
numbers: set[int] = {1, 2, 3, 4, 5}
duplicates_removed: set[int] = {1, 2, 2, 3, 3, 3}  # {1, 2, 3}

# From iterable
from_list: set[int] = set([1, 2, 3, 2, 1])  # {1, 2, 3}

# Set comprehension
evens: set[int] = {x for x in range(10) if x % 2 == 0}
```

### Set Operations

Sets implement the standard set-theory operations efficiently: on average, union takes O(len(a) + len(b)) time and intersection takes O(min(len(a), len(b))).

```python
a: set[int] = {1, 2, 3, 4}
b: set[int] = {3, 4, 5, 6}

# Union: elements in either set
union: set[int] = a | b           # {1, 2, 3, 4, 5, 6}
union_method: set[int] = a.union(b)

# Intersection: elements in both sets
intersection: set[int] = a & b    # {3, 4}
inter_method: set[int] = a.intersection(b)

# Difference: elements in a but not in b
diff: set[int] = a - b            # {1, 2}
diff_method: set[int] = a.difference(b)

# Symmetric Difference: elements in exactly one set (XOR)
sym_diff: set[int] = a ^ b        # {1, 2, 5, 6}
sym_method: set[int] = a.symmetric_difference(b)

# Subset/Superset checks
is_subset: bool = {1, 2}.issubset(a)      # True
is_superset: bool = a.issuperset({1, 2})  # True
is_disjoint: bool = {7, 8}.isdisjoint(a)  # True (no common elements)
```
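The operator and method forms are not fully interchangeable: the methods accept any iterable, while the operators require both operands to be sets. A short sketch with hypothetical permission names:

```python
allowed = {"read", "write"}
requested = ["read", "delete"]  # A list, not a set

# Methods accept any iterable.
granted = allowed.intersection(requested)
print(granted)  # {'read'}

# Operators require both operands to be sets (or frozensets).
try:
    allowed & requested
except TypeError as exc:
    print(f"TypeError: {exc}")

# Methods also accept several iterables at once.
combined = allowed.union(requested, ("admin",))
print(sorted(combined))  # ['admin', 'delete', 'read', 'write']
```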

### Modifying Sets

```python
s: set[int] = {1, 2, 3}

# Adding/Removing
s.add(4)           # {1, 2, 3, 4} - O(1)
s.remove(2)        # {1, 3, 4} - raises KeyError if missing
s.discard(5)       # No error if 5 not present
popped: int = s.pop()  # Removes and returns arbitrary element (sets are unordered)

# Update operations (in-place)
a: set[int] = {1, 2}
a.update({3, 4})   # a is now {1, 2, 3, 4} (union in-place)
a.intersection_update({2, 3, 5})  # a is now {2, 3}
```

### Frozensets

A **frozenset** is an immutable version of a set. Like tuples, they are hashable and can be used as dictionary keys or elements of other sets.

```python
frozen: frozenset[int] = frozenset([1, 2, 3])
# frozen.add(4)  # AttributeError: 'frozenset' object has no attribute 'add'

# Usage as dictionary key
cache: dict[frozenset[int], str] = {
    frozenset([1, 2, 3]): "group A",
    frozenset([4, 5, 6]): "group B"
}
```

**Practical Use Cases for Sets:**
1.  **Removing duplicates** from a list while preserving order (using `dict.fromkeys()` trick or manual set tracking).
2.  **Membership testing** (`x in collection`): O(1) on average for sets vs O(n) for lists.
3.  **Finding common items** between two datasets efficiently.
4.  **Filtering**: Ensuring processed items are unique.

```python
# Efficient duplicate removal preserving order (Python 3.7+)
items: list[int] = [1, 2, 2, 3, 3, 3, 4]
unique_ordered: list[int] = list(dict.fromkeys(items))  # [1, 2, 3, 4]
```
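The "manual set tracking" alternative and the common-items use case can be sketched as follows. The helper `unique_in_order` and the sample data are hypothetical names chosen for this example.

```python
def unique_in_order(values):
    """Return the first occurrence of each value, preserving order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:  # O(1) average membership test
            seen.add(value)
            result.append(value)
    return result

print(unique_in_order([3, 1, 3, 2, 1]))  # [3, 1, 2]

# Finding common items between two datasets.
site_a = ["ann", "bob", "cy"]
site_b = ["cy", "dee", "ann"]
print(sorted(set(site_a) & set(site_b)))  # ['ann', 'cy']
```

Unlike `dict.fromkeys()`, the loop version leaves room for extra per-item logic, such as skipping values that fail a validation check.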

## 4.5 The `collections` Module

The standard library's `collections` module provides specialized container datatypes that extend the functionality of built-in structures.

### `defaultdict`: Automatic Default Values

Eliminates the need to check if a key exists before appending or accumulating.

```python
from collections import defaultdict

# Grouping words by length
words: list[str] = ["apple", "bat", "bar", "atom", "book"]
by_length: defaultdict[int, list[str]] = defaultdict(list)

for word in words:
    by_length[len(word)].append(word)
    # No need for: if len(word) not in by_length: by_length[len(word)] = []

print(dict(by_length))  # {5: ['apple'], 3: ['bat', 'bar'], 4: ['atom', 'book']}

# Default value factory examples
counter: defaultdict[str, int] = defaultdict(int)  # Defaults to 0
counter["new_key"] += 1  # Automatically starts at 0, then becomes 1

# Custom default factory
from datetime import datetime
last_access: defaultdict[str, datetime] = defaultdict(datetime.now)
```
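One behavior to keep in mind: indexing a missing key on a `defaultdict` does more than return the default, it also stores it. The sketch below (with an invented `visits` counter) contrasts this with `.get()` and `in`, which leave the dictionary untouched.

```python
from collections import defaultdict

visits: defaultdict[str, int] = defaultdict(int)
visits["home"] += 1

# Indexing a missing key calls the factory and stores the result.
print(visits["about"])    # 0
print("about" in visits)  # True: the lookup created the key

# .get() and `in` do not trigger the factory.
print(visits.get("contact"))  # None
print("contact" in visits)    # False
```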

### `Counter`: Hashable Object Counting

A specialized dictionary for counting hashable objects.

```python
from collections import Counter

colors: list[str] = ["red", "blue", "red", "green", "blue", "blue"]
color_count: Counter = Counter(colors)
print(color_count)  # Counter({'blue': 3, 'red': 2, 'green': 1})

# Most common elements
print(color_count.most_common(2))  # [('blue', 3), ('red', 2)]

# Arithmetic operations between counters
c1: Counter = Counter(a=3, b=1)
c2: Counter = Counter(a=1, b=2)
print(c1 + c2)  # Counter({'a': 4, 'b': 3})
print(c1 - c2)  # Counter({'a': 2}) - negative counts removed
```
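`Counter` differs from `defaultdict(int)` in one respect: looking up a missing key returns 0 without adding it. The following sketch also shows `update()`, which adds to existing counts rather than replacing them.

```python
from collections import Counter

letters = Counter("banana")
print(letters)  # Counter({'a': 3, 'n': 2, 'b': 1})

# Missing keys report 0 and are not added.
print(letters["z"])    # 0
print("z" in letters)  # False

# update() adds counts instead of replacing them.
letters.update("bandana")
print(letters["a"])  # 6
```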

### `deque`: Double-Ended Queue

Optimized for fast appends and pops from both ends (O(1)), unlike lists which are O(n) for left operations.

```python
from collections import deque

# Creating a deque
d: deque[int] = deque([1, 2, 3])

# Efficient operations at both ends
d.append(4)       # Right side: deque([1, 2, 3, 4]) - O(1)
d.appendleft(0)   # Left side: deque([0, 1, 2, 3, 4]) - O(1)
right: int = d.pop()      # Returns 4 - O(1)
left: int = d.popleft()   # Returns 0 - O(1) (vs O(n) for list.pop(0))

# Useful for sliding windows and BFS algorithms
d.rotate(1)       # Rotate right by 1: deque([3, 1, 2])
d.rotate(-2)      # Rotate left by 2: deque([2, 3, 1])

# Max length (circular buffer)
recent: deque[int] = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(recent)  # deque([2, 3, 4], maxlen=3) - automatically discards oldest
```
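The comment above mentions BFS; a minimal breadth-first traversal using a `deque` as the frontier might look like this. The graph and the `bfs_order` helper are hypothetical and exist only for this example.

```python
from collections import deque

# Hypothetical graph as an adjacency mapping.
example_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()  # O(1) removal from the front
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs_order(example_graph, "A"))  # ['A', 'B', 'C', 'D']
```

With a plain list, the `pop(0)` call in the loop would cost O(n) per node, which adds up quickly on large graphs.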

### `OrderedDict` (Legacy Note)

Prior to Python 3.7, `OrderedDict` was necessary to guarantee insertion order. Since 3.7, built-in `dict` maintains order, making `OrderedDict` primarily useful for its specialized methods like `move_to_end()` or when equality must consider order (regular dicts consider `{'a': 1, 'b': 2}` equal to `{'b': 2, 'a': 1}`).
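Both remaining use cases can be shown in a few lines. This is a small sketch with throwaway data:

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")              # Move "a" to the last position
print(list(od))                  # ['b', 'c', 'a']
od.move_to_end("c", last=False)  # Move "c" to the first position
print(list(od))                  # ['c', 'b', 'a']

# Equality between two OrderedDicts is order-sensitive; plain dicts ignore order.
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print({"a": 1, "b": 2} == {"b": 2, "a": 1})            # True
```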

## Summary

You now possess a comprehensive understanding of Python's built-in data structures. You understand that **Lists** are dynamic arrays ideal for ordered collections that change frequently, but you must respect their O(n) performance characteristics for front insertions. **Tuples** provide data integrity and hashability through immutability, with named tuples offering readable field access. **Dictionaries** offer lightning-fast O(1) lookups via hashing, serving as the backbone for most Python data organization, while **Sets** provide mathematical set operations and efficient uniqueness validation.

We also explored the `collections` module's specialized containers—`defaultdict` for cleaner accumulation logic, `Counter` for frequency analysis, and `deque` for high-performance queue operations. These tools separate novice Python programmers from professionals who can select the optimal structure based on algorithmic complexity and data access patterns.

However, data structures alone do not make a program. To perform meaningful operations on this data, to avoid repetition, and to organize our code logically, we need to encapsulate behavior into reusable blocks. In the next chapter, we will explore the cornerstone of modular programming: functions.

**Next Chapter**: Chapter 5: Functions and Modular Programming.