In [None]:
# === Environment Setup ===
import os, sys, math, time, random, json, textwrap, warnings, timeit
from collections import defaultdict, Counter, ChainMap
from types import MappingProxyType
import numpy as np, pandas as pd, matplotlib.pyplot as plt

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams["figure.dpi"] = 130

# --- Utility Functions ---
def note(msg, **kwargs):
    """Prints a formatted message with a notebook icon."""
    formatted_msg = textwrap.fill(msg, width=100, subsequent_indent='   ')
    print(f"\n📝 {formatted_msg}", **kwargs)
def sec(title):
    """Prints a formatted section title for code blocks."""
    print(f"\n{100*'='}\n| {title.upper()} |\n{100*'='}")

note("Environment initialized.")

# Part 1: Foundations
## Chapter 1.7: Dictionaries and Hash Maps: The Engine of Modern Python

### Introduction: The Most Important Data Structure

The dictionary, `dict`, is arguably Python's most important data structure. It underpins many core features of the language: object attributes are stored in an instance's `__dict__`, module namespaces are dictionaries, and methods are looked up in a class's `__dict__`. Its defining characteristic is exceptional performance: insertion, deletion, and lookup operations have an **O(1) average time complexity**, meaning their expected cost does not grow with the dictionary's size. This efficiency is the result of its underlying implementation as a **hash map** (also known as a hash table).

This chapter examines the `dict`, moving beyond basic key-value usage to explore the internal mechanics of its hash map implementation. We will cover the critical requirement of key immutability, modern usage patterns, and the specialized, high-performance dictionary subclasses provided by the `collections` module. A firm grasp of dictionary mechanics is essential for writing efficient, robust, and idiomatic Python for computational research, especially when dealing with the sparse representations of objects (e.g., utility functions defined over a sparse grid) that are common in economics.

### 1. The Hash Map Mechanism

A hash map achieves its speed by using a **hash function** to convert a key into an integer. This integer is then used to compute an index (a "slot" or "bucket") in an underlying array where the associated value is stored. This direct computation of a storage location, rather than a linear search through the keys, is the source of its O(1) performance.

The process for `my_dict[key] = value` is illustrated below:
1.  **Hashing:** Python computes `hash(key)`, which returns an integer (the hash value).
2.  **Indexing:** This hash value is reduced to an index into an internal array of slots. Conceptually this is a modulo operation, `index = hash(key) % array_size`; CPython keeps the array size a power of two, so it can use an equivalent and cheaper bit mask instead.
3.  **Storage:** The key-value pair (or a reference to it) is stored at that index. If that index is already occupied (a **hash collision**), Python uses a probing algorithm to find the next available slot.

![Hash Map Mechanism](../images/png/1.7-hash-map.png)

A lookup for `my_dict[key]` follows the same process: hash the key, compute the initial index, and if the key stored there doesn't match, probe for the next slot until the key is found or an empty slot is encountered.
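The hash, index, and probe steps above can be sketched with a toy class. This is a teaching sketch only: `ToyHashMap` is a hypothetical name, it uses simple linear probing and a fixed-size table, and it is not how CPython's `dict` is actually implemented.

```python
class ToyHashMap:
    """Teaching sketch: fixed-size open addressing with linear probing.
    CPython's real dict uses a different probe sequence and resizes itself."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, size=8):
        self._slots = [self._EMPTY] * size

    def _probe(self, key):
        """Yield the slot indices to try, starting at hash(key) % size."""
        size = len(self._slots)
        start = hash(key) % size
        for step in range(size):
            yield (start + step) % size

    def __setitem__(self, key, value):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is self._EMPTY or slot[0] == key:
                self._slots[i] = (key, value)
                return
        raise RuntimeError("table full (a real dict would resize)")

    def __getitem__(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is self._EMPTY:
                break  # an empty slot ends the probe: the key is absent
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)


table = ToyHashMap()
table[1] = "one"
table[9] = "nine"      # hash(9) % 8 == 1, so this collides with key 1
table["usa"] = 25.46
print(table[1], table[9], table["usa"])
```

Keys `1` and `9` share the same initial slot, so the second insertion probes forward to the next free slot, and the later lookup for `9` follows the same path to find it.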

#### 1.1 The Hashable Requirement and Immutability
For this system to work, a dictionary key must be **hashable**. An object is hashable if it meets two criteria:
1.  It has a `__hash__` method that returns an integer. **This hash value must remain constant throughout the object's lifetime.**
2.  It has an `__eq__` method to compare for value equality.

The fundamental invariant of hashing is: **if `a == b`, then it must be true that `hash(a) == hash(b)`**. 

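This is why dictionary keys are, in practice, **immutable** objects. If a key's value could change after being inserted into a dictionary, its hash value would change with it, and the key would become "lost": unfindable at the location computed from its original hash. Python's built-in immutable types (`str`, `int`, `float`, `bool`) are hashable, and a `tuple` is hashable as long as every element it contains is hashable. Built-in mutable containers such as `list`, `dict`, and `set` are not hashable. Strictly speaking, the requirement is hashability rather than immutability: instances of user-defined classes are hashable by default, using identity-based equality and hashing.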
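The "lost key" failure can be reproduced with a class that is hashable but mutable. The sketch below uses a hypothetical `MutablePoint` class written purely for illustration; deriving a hash from mutable state like this is an anti-pattern.

```python
class MutablePoint:
    """Hypothetical class whose hash depends on mutable state (an anti-pattern)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, MutablePoint) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


p = MutablePoint(1, 2)
registry = {p: "stored value"}
found_before = MutablePoint(1, 2) in registry        # True: equal value, equal hash

p.x = 99  # mutate the key while it sits inside the dictionary

found_by_old_value = MutablePoint(1, 2) in registry  # False: the stored key no longer equals (1, 2)
entry_still_exists = len(registry) == 1              # True: the entry is there, but unreachable by value

print(found_before, found_by_old_value, entry_still_exists)
```

A lookup by the new value `(99, 2)` is not expected to succeed either, because the dictionary still remembers the hash computed at insertion time.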

In [None]:
sec("Why Lists Cannot Be Dictionary Keys")
mutable_key = [1, 2] # A list is mutable
try:
    my_dict = {mutable_key: 'value'}
except TypeError as e:
    note(f"Python correctly raises an error for mutable keys: {e}")

immutable_key = (1, 2) # A tuple is immutable
my_dict = {immutable_key: 'value'}
note(f"Tuples, being immutable, are valid keys: {my_dict}")

##### The `frozenset`: A Hashable Set
Just as a `tuple` is an immutable version of a `list`, a `frozenset` is an immutable version of a `set`. Because they are immutable and their contents cannot change, frozensets are hashable and can be used as dictionary keys. This is useful when you need to use a collection of unique items as a key, for example, to map a set of model features to a specific coefficient.

In [None]:
feature_set = frozenset(['age', 'education', 'income'])
coefficient_map = {feature_set: 0.75}

note("A frozenset can be used as a dictionary key:")
print(coefficient_map)

#### 1.2 Hash Collisions, Load Factor, and Performance Degradation

A **hash collision** occurs when two different keys map to the same initial slot in the array, either because they produce the same hash value or because their distinct hashes reduce to the same index. Python resolves this through a process called **open addressing** with a pseudo-random probing sequence. If the initial slot is already occupied by a different key, Python applies a deterministic perturbation to the hash value to calculate a new slot, and probes that one. This continues until an empty slot is found.

The **load factor** of a hash table is the ratio of occupied slots to the total number of slots. As the load factor increases, the probability of collisions rises, and lookup times can degrade. To maintain O(1) average performance, Python's dictionaries are resized to a larger array once the load factor exceeds a certain threshold (typically 2/3). This resizing is an O(n) operation, but because it happens infrequently, the **amortized** cost of insertion remains O(1).
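The occasional resizing can be observed indirectly by watching the dictionary's own memory footprint as items are added. This is a rough sketch: the exact byte counts and the sizes at which growth happens are CPython implementation details and vary between versions.

```python
import sys

d = {}
last_size = sys.getsizeof(d)
resize_points = [(0, last_size)]  # (number of items, size in bytes)

for i in range(1_000):
    d[i] = None
    size = sys.getsizeof(d)
    if size != last_size:  # the underlying table was reallocated
        resize_points.append((len(d), size))
        last_size = size

for n_items, n_bytes in resize_points:
    print(f"{n_items:>5} items -> {n_bytes:>6} bytes")
```

Only a handful of the 1,000 insertions trigger a reallocation, which is the intuition behind the amortized O(1) cost.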

However, in the worst-case scenario (e.g., a deliberately crafted set of keys that all collide), the probing sequence can become long, and performance for those keys can degrade towards O(n). This is why Python, by default, randomizes the hashes of `str` and `bytes` objects with a per-process seed, which makes such collision attacks more difficult.
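The worst case can be simulated by forcing every key onto the same hash value and counting equality comparisons. `CountingKey` and `comparisons_needed` are hypothetical helpers written for this illustration, and the exact counts depend on the interpreter.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, value, collide):
        self.value = value
        self.collide = collide

    def __hash__(self):
        # collide=True forces every key onto the same hash value
        return 42 if self.collide else hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


def comparisons_needed(n, collide):
    """Build a dict of n keys and return how many __eq__ calls that took."""
    CountingKey.eq_calls = 0
    _table = {CountingKey(i, collide): i for i in range(n)}
    return CountingKey.eq_calls


spread = comparisons_needed(200, collide=False)
colliding = comparisons_needed(200, collide=True)
print(f"well-spread hashes: {spread} comparisons")
print(f"identical hashes:   {colliding} comparisons")
```

With identical hashes, each new key has to be compared against the keys already stored along its probe path, so the total work grows roughly quadratically with the number of insertions.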

**The Modern, Ordered Dictionary (Python 3.7+):**
A major evolution in Python was making dictionaries **insertion-ordered** by default (this was formalized as a language feature in Python 3.7). This was achieved by changing the internal structure. A modern dictionary uses two arrays, which provides the best of both worlds:
1.  A **sparse `indices` array** (the hash table proper) that stores indices into the second array. This allows for O(1) lookups.
2.  A **dense `entries` array** that stores the actual `(hash, key, value)` tuples in the order they were inserted. This allows for ordered iteration.

![Modern Dictionary Internals](../images/png/1.7-ordered-dict.png)

As the diagram shows, a lookup hashes the key to find a position in the sparse `indices` array, which then points to the correct location in the dense `entries` array. Iteration, however, simply walks down the dense `entries` array, preserving the original insertion order. This hybrid structure provides both fast lookups and predictable ordering, and in CPython it is also more memory-efficient than the earlier single-table layout; ordering proved valuable enough to become a core part of the language.
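The ordering guarantee has a few practical consequences that are easy to check. The short sketch below (parameter names are arbitrary) shows that updating an existing key keeps its position, while deleting and re-inserting a key moves it to the end.

```python
params = {}
params["gamma"] = 0.5
params["alpha"] = 0.3
params["beta"] = 0.99
order_after_insert = list(params)    # ['gamma', 'alpha', 'beta']

params["alpha"] = 0.35               # updating a value does not move the key
order_after_update = list(params)    # ['gamma', 'alpha', 'beta']

del params["gamma"]
params["gamma"] = 0.5                # re-insertion appends at the end
order_after_reinsert = list(params)  # ['alpha', 'beta', 'gamma']

print(order_after_insert, order_after_update, order_after_reinsert)
```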

#### 1.3 Performance: O(1) vs. O(n) in Practice
The theoretical difference between constant time O(1) and linear time O(n) is not academic; it has dramatic real-world consequences for large datasets. A `dict` lookup takes roughly the same amount of time whether the dictionary has 10 items or 10 million. A `list` lookup (`value in my_list`) requires scanning the list from the beginning, so its time scales linearly with the size of the list.

In [None]:
sec("Timing List vs. Dictionary Lookups")
N = 1_000_000
search_term = N - 1 # Search for the last element (worst case for list)

# Each container is built once in timeit's setup, so construction time is not measured
list_setup = f"data = list(range({N}))"
dict_setup = f"data = {{k: k for k in range({N})}}"
search_code = f"term = {search_term}; term in data"

note("Timing list lookup (O(n)):")
list_time = timeit.timeit(search_code, setup=list_setup, number=10)
print(f"  -> Time taken: {list_time:.6f} seconds")

note("Timing dict lookup (O(1)):")
dict_time = timeit.timeit(search_code, setup=dict_setup, number=10)
print(f"  -> Time taken: {dict_time:.6f} seconds")

### 2. Idiomatic Dictionary Usage

Fluent Python programmers leverage several powerful patterns for creating and manipulating dictionaries.

#### 2.1 Dictionary Comprehensions
Similar to list comprehensions, **dictionary comprehensions** provide a concise and readable syntax for creating dictionaries from iterables. They are often used with `zip` to combine separate lists of keys and values.

In [None]:
sec("Creating Dictionaries with Comprehensions")
country_codes = ['USA', 'DEU', 'JPN']
gdp_data = [25.46, 4.07, 4.23] # In trillions USD

# Create a dictionary mapping country codes to GDP
gdp_map = {code: gdp for code, gdp in zip(country_codes, gdp_data)}
print(f"Created dictionary: {gdp_map}")

# Use a comprehension to create an inverted dictionary (assumes the values are unique and hashable)
gdp_to_country = {gdp: code for code, gdp in gdp_map.items()}
print(f"Inverted dictionary: {gdp_to_country}")

#### 2.2 Merging Dictionaries
As of Python 3.9, the union operators `|` (merge) and `|=` (update-in-place) provide a clean, readable syntax for combining dictionaries. When keys overlap, the value from the right-hand dictionary prevails.
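As a brief sketch of the variants (dictionary contents are made up for illustration): `|` builds a new dictionary, `|=` updates in place, and the older unpacking form `{**a, **b}` gives the same result on interpreters that predate 3.9.

```python
defaults = {"solver": "newton", "tolerance": 1e-6}
overrides = {"tolerance": 1e-8}

merged_new = defaults | overrides          # new dict; `defaults` is untouched
merged_unpack = {**defaults, **overrides}  # equivalent spelling for Python 3.5+

settings = dict(defaults)  # copy first so the original survives
settings |= overrides      # in-place update, like settings.update(overrides)

print(merged_new, merged_unpack, settings, defaults)
```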

In [None]:
sec("Modern Dictionary Merging (Python 3.9+)")
default_params = {'solver': 'newton', 'tolerance': 1e-6, 'verbose': False}
user_params = {'tolerance': 1e-8, 'max_iterations': 1000}

# The | operator creates a new dictionary
final_params = default_params | user_params
print(f"Default parameters: {default_params}")
print(f"User parameters:    {user_params}")
print(f"Final parameters:   {final_params}")
note("The 'tolerance' value from user_params (the right-hand side) overwrote the default value.")

#### 2.3 Handling Missing Keys
A common task is to read or update a dictionary key that may not exist. Attempting to access a missing key directly with `my_dict['key']` raises a `KeyError`. There are several idiomatic ways to handle this, each suited to a different situation.

- **`d.get(key, default)`**: The preferred way to look up a key that might be missing. It returns the corresponding value if the key exists, or a specified default value (which itself defaults to `None`) if it doesn't. It never raises a `KeyError` and never modifies the dictionary.

- **`d.setdefault(key, default)`**: This method is more specialized. It also retrieves the key's value, but if the key is missing, it **inserts** the key with the specified default value into the dictionary and then returns that default value. It is useful when you want to both retrieve a value and ensure it exists in the dictionary afterward.

- **`collections.defaultdict`**: This dictionary subclass is the most powerful tool for aggregating or grouping data. You provide a factory function (e.g., `list`, `int`, `set`) during its creation. When a missing key is accessed for the first time, the factory is automatically called to create and insert a default value for that key, which is then returned. This avoids the need for manual checks when building up collections.

In [None]:
sec("Idiomatic Ways to Handle Missing Keys")

model_config = {'learning_rate': 0.01}

note("Use .get() for safe lookups without modification.")
batch_size = model_config.get('batch_size', 32) # Safely get 'batch_size', defaulting to 32
print(f"  Batch size is: {batch_size}")
print(f"  Original config is unchanged: {model_config}")

note("Use .setdefault() to retrieve a key and ensure it exists.")
optimizer = model_config.setdefault('optimizer', 'Adam') # Get 'optimizer', setting it to 'Adam' if absent
print(f"  Optimizer is: {optimizer}")
print(f"  Config was modified: {model_config}")

note("Use defaultdict for efficient grouping/aggregation.")
# Application: Inverting a dictionary to map grades to students
grades = {'Alice': 'A', 'Bob': 'C', 'Charlie': 'A', 'David': 'B'}
students_by_grade = defaultdict(list)
for student, grade in grades.items():
    # If `grade` is a new key, defaultdict calls list() to create an empty list first.
    # This avoids a manual `if grade not in students_by_grade: ...` check.
    students_by_grade[grade].append(student)

print(f"Inverted mapping: {json.dumps(dict(students_by_grade), indent=2)}")

#### 2.4 Dictionary Views and Set Operations
The dictionary methods `.keys()`, `.values()`, and `.items()` do not return lists. They return special **view objects**. A view is a dynamic window into the dictionary's entries. If the dictionary changes, the view reflects these changes immediately. 

Key views behave like sets, and item views do too as long as all values are hashable. This allows for efficient set-based operations like intersection (`&`), union (`|`), and difference (`-`), which is very useful for comparing the parameters or features of different models or datasets.
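Two of these properties are worth seeing in isolation: a view tracks later changes to its dictionary, and item views compare whole key-value pairs. The sketch below uses made-up parameter dictionaries.

```python
model_a = {"alpha": 0.3, "beta": 0.99}
model_b = {"alpha": 0.3, "beta": 0.95}

keys_view = model_a.keys()
model_a["gamma"] = 0.5                    # mutate after creating the view
view_sees_new_key = "gamma" in keys_view  # True: the view is live, not a snapshot

# Item views compare (key, value) pairs, so they reveal which settings agree
identical_settings = model_a.items() & model_b.items()  # {('alpha', 0.3)}
changed_settings = model_a.items() - model_b.items()    # {('beta', 0.99), ('gamma', 0.5)}

print(view_sees_new_key, identical_settings, changed_settings)
```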

In [None]:
sec("Dictionary Views and Set Operations")
model1_params = {'alpha': 0.3, 'beta': 0.99, 'gamma': 0.5}
model2_params = {'alpha': 0.3, 'beta': 0.95, 'delta': 0.025}

keys1 = model1_params.keys()
keys2 = model2_params.keys()

print(f"Model 1 Keys: {keys1}")
print(f"Model 2 Keys: {keys2}")

note("Using set operations on key views:")
print(f"  Common parameters (intersection): {keys1 & keys2}")
print(f"  All unique parameters (union):    {keys1 | keys2}")
print(f"  Parameters only in Model 1:       {keys1 - keys2}")

### 3. Specialized Dictionaries in `collections`

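The `collections` module provides several specialized mapping types. Some, such as `Counter` and `defaultdict`, are `dict` subclasses; others, such as `ChainMap`, are separate mapping classes that wrap ordinary dictionaries.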

#### 3.1 `collections.Counter`: Frequency Maps

A `Counter` is a `dict` subclass specifically designed for counting hashable objects. It provides a convenient way to create frequency distributions and includes useful methods like `most_common()`. It also supports multiset arithmetic, allowing you to combine counters in meaningful ways.

**Economic Application:** Analyzing the frequency of different industry sectors in a dataset of firms, or counting word frequencies in a central bank policy document to measure textual similarity using techniques like cosine similarity.
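A short sketch of `most_common()` and the multiset arithmetic mentioned above, using made-up sector counts. Note that subtraction and intersection drop any entry whose resulting count is zero or negative.

```python
from collections import Counter

q1_sectors = Counter({"energy": 3, "finance": 1})
q2_sectors = Counter({"energy": 1, "finance": 2, "retail": 1})

combined = q1_sectors + q2_sectors    # counts are added per key
net_change = q1_sectors - q2_sectors  # only positive differences are kept
overlap = q1_sectors & q2_sectors     # per-key minimum
top_sector = combined.most_common(1)  # [('energy', 4)]

print(combined, net_change, overlap, top_sector)
```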

In [None]:
sec("Frequency Counting and Cosine Similarity with Counter")

doc1_words = ['inflation', 'risk', 'growth', 'inflation', 'labor', 'risk']
doc2_words = ['growth', 'jobs', 'inflation', 'risk', 'growth']

counts1 = Counter(doc1_words)
counts2 = Counter(doc2_words)

note("Use Counter to create term-frequency vectors:")
print(f"Document 1 Counts: {counts1}")
print(f"Document 2 Counts: {counts2}")

def cosine_similarity(c1, c2):
    """Calculates the cosine similarity between two Counter vectors."""
    # Get the set of all unique words across both documents
    all_words = c1.keys() | c2.keys()
    
    # Compute dot product and magnitudes
    dot_product = sum(c1[word] * c2[word] for word in all_words)
    mag1 = math.sqrt(sum(c1[word]**2 for word in all_words))
    mag2 = math.sqrt(sum(c2[word]**2 for word in all_words))
    
    return dot_product / (mag1 * mag2) if (mag1 * mag2) != 0 else 0

similarity = cosine_similarity(counts1, counts2)
note(f"The cosine similarity between the two documents is {similarity:.4f}. This provides a quantitative measure of their textual similarity.")

#### 3.2 `collections.ChainMap`: Hierarchical Configurations
A `ChainMap` groups multiple dictionaries into a single, updatable view. Lookups search the underlying mappings sequentially until a key is found. This is perfect for managing hierarchical configurations.

**Crucial Behavior:** Writes, updates, and deletions (`config['key'] = value`) operate *only* on the **first** mapping in the chain. This allows you to temporarily override settings without modifying the underlying default dictionaries.

**Economic Application:** Managing parameters for a complex economic model. You can have a dictionary of global default parameters, another with scenario-specific parameters, and a third with temporary runtime overrides. `ChainMap` provides a clean, logical view of this parameter stack.
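The "first mapping only" rule also applies to deletion, which is easy to overlook. In this small sketch (parameter values are illustrative), deleting an overridden key exposes the default again, while deleting a key that lives only in a later mapping raises `KeyError`.

```python
from collections import ChainMap

defaults = {"beta": 0.99, "delta": 0.025}
scenario = {"beta": 0.90}
config = ChainMap(scenario, defaults)

del config["beta"]                  # removes the override from `scenario` only
beta_after_delete = config["beta"]  # 0.99: the default shows through again

delete_error = None
try:
    del config["delta"]             # 'delta' is not in the first mapping
except KeyError as exc:
    delete_error = exc

print(beta_after_delete, scenario, defaults, repr(delete_error))
```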

In [None]:
sec("Hierarchical Configurations with ChainMap")

def run_model(params):
    """A dummy function that simulates running a model with a given config."""
    print(f"  Running model with beta={params['beta']:.2f}, delta={params['delta']:.3f}, rho={params['rho']:.2f}")

global_defaults = {'beta': 0.99, 'delta': 0.025, 'rho': 0.95, 'grid_size': 100}
high_impatience_scenario = {'beta': 0.90} # Scenario where agents are very impatient

# The search order is: scenario -> defaults
config = ChainMap(high_impatience_scenario, global_defaults)

note("Running the 'high impatience' scenario:")
run_model(config)
print(f"- Beta is read from the first map in the chain: {config['beta']}")
print(f"- Delta is not in the first map, so it's read from the second: {config['delta']}")

note("To run a sensitivity test, we can add a temporary, empty map to the front.")
# .new_child() creates a new ChainMap with a new empty dict at the front.
sensitivity_config = config.new_child()
print(f"Current config maps: {sensitivity_config.maps}")

note("Writes and updates ONLY affect the first mapping.")
sensitivity_config['rho'] = 0.5 # A temporary override
print("Running sensitivity analysis with rho=0.5:")
run_model(sensitivity_config)

print(f"- The new rho value is only in the first map: {sensitivity_config.maps[0]}")
print(f"- The underlying scenario map is unchanged: {sensitivity_config.maps[1]}")

note("After the sensitivity run, we can discard the temporary config.")
print("The original config is unaffected:")
run_model(config)

### 4. Advanced Patterns for Robust Code

Beyond the basics, several advanced patterns are crucial for building large-scale, maintainable computational projects.

#### 4.1 Encapsulation with Read-Only Views
A common challenge in object-oriented design is exposing an object's internal state (e.g., a configuration dictionary) without allowing external code to modify it accidentally. The `types.MappingProxyType` is the standard Python solution. It wraps a dictionary in a read-only view. Any attempt to modify the proxy will raise a `TypeError`, protecting the integrity of the original object.

In [None]:
sec("Protecting Internal State with MappingProxyType")
class Model:
    def __init__(self, params):
        self._params = dict(params)
        # Expose a read-only proxy, not the actual dictionary
        self.params = MappingProxyType(self._params)

    def update_beta(self, new_beta):
        """A controlled method for updating an internal parameter."""
        self._params['beta'] = new_beta
        print(f"[Model internal] Beta updated to {new_beta}")

m = Model({'alpha': 0.3, 'beta': 0.99})
print(f"Initial parameters (read-only view): {m.params}")
print(f"Can read beta: {m.params['beta']}")

note("Attempting to modify the exposed parameters directly fails:")
try:
    m.params['beta'] = 0.95
except TypeError as e:
    print(f"-> Caught expected error: {e}")

note("The proxy is dynamic: it reflects changes made through controlled methods.")
m.update_beta(0.98)
print(f"Parameters view now shows: {m.params}")

#### 4.2 Structural Pattern Matching (Python 3.10+)
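The `match...case` statement provides a powerful way to deconstruct objects, including dictionaries, based on their structure. It is often more expressive and easier to read than a complex chain of `if...elif...else` statements. Python does not check that the cases are exhaustive, so a final `case _:` branch is the usual way to make sure unexpected inputs are handled explicitly. Note also that a mapping pattern matches any dictionary that contains the listed keys; extra keys are ignored unless they are captured with `**rest`.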

**Economic Application:** Parsing complex configuration dictionaries for a simulation, where different solvers might require different sets of parameters.

In [None]:
sec("Pattern Matching for Config Dispatch")

def setup_solver(config: dict):
    match config:
        case {"solver": "VFI", "grid_size": g, "tolerance": t}:
            print(f"Setting up Value Function Iteration with grid={g}, tol={t}")
        case {"solver": "EGM", "grid_size": g, "max_iter": m, **rest}:
            print(f"Setting up Endogenous Grid Method with grid={g}, max_iter={m}")
            print(f"  (Other params: {rest})")
        case {"solver": s}:
            print(f"Warning: Unknown solver '{s}'. Using defaults.")
        case _:
            print("Error: Invalid configuration object.")

note("Dispatching based on different config dictionaries:")
setup_solver({"solver": "VFI", "grid_size": 200, "tolerance": 1e-7})
setup_solver({"solver": "EGM", "grid_size": 500, "max_iter": 1000, 'verbose': True})
setup_solver({"solver": "PolicyIteration"})
setup_solver({'model_type': 'RBC'})

#### 4.3 Dictionaries and Object Internals: `__dict__` and `__slots__`

As mentioned, dictionaries are central to Python's object model. By default, the attributes of a class instance are stored in a special dictionary called `__dict__`. This is what makes Python so flexible, as you can add new attributes to an instance at any time. 

However, for performance-critical applications where you might create millions of objects (e.g., agents in a simulation), the memory overhead of a `__dict__` for every object can be substantial. The `__slots__` feature provides a memory optimization by telling Python *not* to use a `__dict__` and to only allocate space for a fixed set of attributes. This trades flexibility for a smaller per-instance memory footprint and, typically, slightly faster attribute access.
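Because `sys.getsizeof` reports only an object's own shallow size, a measurement over many live instances gives a more realistic picture. The following is a rough sketch using `tracemalloc`; `DictAgent`, `SlotAgent`, and `bytes_per_instance` are hypothetical names, and the absolute numbers depend heavily on the Python version.

```python
import tracemalloc


class DictAgent:
    def __init__(self, wealth, age):
        self.wealth = wealth
        self.age = age


class SlotAgent:
    __slots__ = ("wealth", "age")

    def __init__(self, wealth, age):
        self.wealth = wealth
        self.age = age


def bytes_per_instance(cls, n=20_000):
    """Approximate traced bytes per object while n instances are kept alive."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    agents = [cls(float(i), 40) for i in range(n)]  # keep every instance alive
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / n


dict_bytes = bytes_per_instance(DictAgent)
slot_bytes = bytes_per_instance(SlotAgent)
print(f"with __dict__:  ~{dict_bytes:.0f} bytes per agent")
print(f"with __slots__: ~{slot_bytes:.0f} bytes per agent")
```

Both figures include the list entry and the `float` stored in each agent, so the difference between them is more informative than either number alone.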

In [None]:
sec("__dict__ vs. __slots__ Memory Optimization")

class AgentWithDict:
    def __init__(self, wealth, age):
        self.wealth = wealth
        self.age = age

class AgentWithSlots:
    __slots__ = ['wealth', 'age']
    def __init__(self, wealth, age):
        self.wealth = wealth
        self.age = age

a_dict = AgentWithDict(100.0, 40)
a_slots = AgentWithSlots(100.0, 40)

note(f"Instance with __dict__ has an attribute dictionary: {a_dict.__dict__}")
note(f"Memory for instance with __dict__: {sys.getsizeof(a_dict) + sys.getsizeof(a_dict.__dict__)} bytes")
note(f"Memory for instance with __slots__: {sys.getsizeof(a_slots)} bytes")

note("Attempting to add a new attribute to the slotted instance fails:")
try:
    a_slots.is_employed = True
except AttributeError as e:
    print(f"-> Caught expected error: {e}")

### 5. Exercises

#### Exercise 1: Choosing the Right Tool for Default Values
You are processing a list of sales transactions `transactions = [{'id': 1, 'region': 'NA'}, {'id': 2, 'region': 'EU'}, {'id': 3, 'region': 'NA'}]`. You want to create a dictionary mapping each region to a count of its transactions. This requires initializing the count for a new region to 0.
- **Task:** Implement this counting logic three times:
  a. Using a standard `dict` and the `.get()` method.
  b. Using a standard `dict` and the `.setdefault()` method.
  c. Using a `defaultdict(int)`.
- **Analysis:** Briefly explain which approach you find most elegant and why. Which is likely most performant for a very large list of transactions?

#### Exercise 2: `ChainMap` for Model Scenarios
You have a `baseline_params` dictionary for a macroeconomic model. Create two alternative scenario dictionaries, `high_inflation_shock` and `low_productivity_shock`, each changing just one or two parameters from the baseline. Use `ChainMap` to create `config_high_inf` and `config_low_prod` that represent the full parameter sets for each scenario without duplicating the baseline parameters. Demonstrate that you can access both an overridden and an unchanged parameter correctly from one of the configurations.

#### Exercise 3: Comparing Word Frequencies with `Counter`
- **Task:** Using the `counts1` and `counts2` `Counter` objects from Section 3.1, find the words that are common to both documents (`intersection`) and the words that appear in either document but not both (`symmetric difference`).
- **Hint:** `Counter` keys behave like sets. You can use the `&` operator for intersection and `^` for symmetric difference on the keys of the counters.

#### Exercise 4: Performance Deep Dive
- **Task:** Re-run the list vs. dict lookup timing experiment from Section 1.3, but this time search for an element at the *beginning* of the list (`search_term = 0`). 
- **Analysis:** Does the relative performance change? Explain your results in terms of the Big-O complexity of the lookup operations for each data structure.

--- 

### Challenge Exercise: Building an Inverted Index

An **inverted index** is a core data structure used in search engines. It maps content, such as words, to a list of locations where they can be found. This exercise simulates building one for a small corpus of economic policy documents.

**Objective:** Create an inverted index that maps each unique word to a list of document IDs where that word appears.

1.  **The Data:**
    ```python
    corpus = {
        'doc1': "The Fed raised interest rates due to inflation concerns.",
        'doc2': "Labor market remains strong, but inflation is a primary risk.",
        'doc3': "Geopolitical risk could impact economic growth."
    }
    ```

2.  **Build the Index:**
    - **Task:** Create an inverted index using a `defaultdict(list)`. Iterate through each document in the corpus. For each document, you'll need to:
        a. Convert the text to lowercase.
        b. Remove punctuation (e.g., using `str.replace('.', '').replace(',', '')`).
        c. Split the text into a list of words.
        d. Iterate through the words. For each word, append the current `doc_id` (e.g., 'doc1') to the list associated with that word in your inverted index.

3.  **Analyze the Index:**
    - Print the inverted index for the words 'inflation', 'risk', and 'growth'.

4.  **Bonus: Add Word Counts:**
    - Modify your inverted index. Instead of a list of document IDs, the value for each word should be another dictionary that maps a `doc_id` to the *frequency* of that word within that document. For the corpus above, the final structure should look like: `{'inflation': {'doc1': 1, 'doc2': 1}, ...}`.
    - **Hint:** You might find a `defaultdict(Counter)` to be a particularly elegant tool for this task.