# Module 2: Masterclass in Python Data Structures

**The Scenario:** You are the Lead Architect for a high-frequency trading platform. You are moving beyond "making it work" to "making it scale." 

This notebook is a comprehensive reference guide. We will dissect the Python language features used to process financial data, handling everything from string parsing to high-performance caching.

**Comprehensive Roadmap:**
1.  **Strings & Numbers:** Advanced parsing, formatting, and precision.
2.  **Lists:** Slicing, comprehensions, and memory management.
3.  **Tuples:** Unpacking, immutability, and hashing.
4.  **Sets:** Venn diagram logic and O(1) membership.
5.  **Dictionaries:** The engine of Python (Hash Maps).
6.  **Internals:** Mutability traps, memory overhead, and Big-O.
7.  **The `collections` Module:** `defaultdict`, `Counter`, `deque`, `namedtuple`.

## 1. Strings & Numbers (The Raw Layer)
Data rarely arrives clean. We need to master string manipulation to normalize inputs.

In [4]:
# Raw Feed: Symbol, Price, Date, Tags (Messy formatting)
raw_feed = "  $AAPL : 150.2544 : 2023-10-01 : tech,hardware  "

# --- String Normalization ---
# 1. Cleaning: strip() removes leading/trailing whitespace
clean = raw_feed.strip()

# 2. Removal: replace() or slicing to remove '$'
clean = clean.replace('$', '')

# 3. Parsing: split() creates a list based on delimiter
parts = clean.split(' : ')
print(f"Split Parts: {parts}")

# --- Advanced formatting & Typing ---
symbol = parts[0].upper()
price = float(parts[1])

# f-strings allow embedded expressions and formatting
# .2f formats the value with 2 decimal places for display; `price` itself is unchanged
print(f"formatted: {symbol} trading at ${price:.2f}")

# --- Inspection Methods ---
# These are critical for validation pipelines
print(f"Is Alpha-Numeric? {symbol.isalnum()}")
print(f"Starts with 'A'? {symbol.startswith('A')}")
print(f"Index of 'PL': {symbol.find('PL')}")

Split Parts: ['AAPL', '150.2544', '2023-10-01', 'tech,hardware']
formatted: AAPL trading at $150.25
Is Alpha-Numeric? True
Starts with 'A'? True
Index of 'PL': 2


In [5]:
# isdigit() is True only when every character is a digit; "14.5" and "-14" return False
"14".isdigit()

True
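
The roadmap lists precision alongside parsing, so here is a minimal sketch that wraps the cleaning steps above in one function and keeps the price as a `Decimal` instead of a `float`. The helper name `parse_feed` and the choice of `ROUND_HALF_UP` are assumptions made for illustration, not part of the original notebook.

```python
from decimal import Decimal, ROUND_HALF_UP

def parse_feed(line):
    """Parse 'SYMBOL : PRICE : DATE : TAGS' into a dict (hypothetical helper)."""
    symbol, price, date, tags = line.strip().replace("$", "").split(" : ")
    return {
        "symbol": symbol.upper(),
        # Build the Decimal from the string, not from a float, so no binary
        # rounding error is introduced
        "price": Decimal(price),
        "date": date,
        "tags": tags.split(","),
    }

record = parse_feed("  $AAPL : 150.2544 : 2023-10-01 : tech,hardware  ")

# quantize() rounds to a fixed number of places with an explicit rounding rule
cents = record["price"].quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(record["symbol"], cents, record["tags"])  # AAPL 150.25 ['tech', 'hardware']

# Why it matters: binary floats cannot represent 0.1 exactly
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

Unlike the `.2f` format spec, which only changes how a float is displayed, `quantize()` returns a new exact value that can be stored or summed.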

## 2. Lists (The Workhorse)
Lists are dynamic arrays. Beyond basic appending, you need to understand **Slicing** and **Comprehensions**.

In [17]:
prices = [100, 105, 102, 108, 110, 115]

# --- 1. Slicing [start:stop:step] ---
print(f"First 3 prices: {prices[:3]}")
print(f"Last 3 prices:  {prices[-3:]}")
print(f"Every 2nd day:  {prices[::2]}")
print(f"Reversed:       {prices[::-1]}")

# --- 2. Advanced Modification ---

prices.insert(2, 99)     # Insert at index 2 (O(n) operation!)

val = prices.pop(0)      # Remove from front (O(n) operation!)

idx = prices.index(110)  # Find index of a value

# --- 3. List Comprehensions (The Pythonic Way) ---
# Build a new list from prices above 105, applying a 1% fee to each
# Syntax: [expression for item in iterable if condition]

adjusted_prices = [p * 1.01 for p in prices if p > 105]

print(f"Adjusted High Prices: {adjusted_prices}")

# --- 4. Copying (Crucial for Mutable types) ---
prices_backup = prices.copy() # Shallow copy
prices.clear() # Wipes original, backup remains
print(f"Backup size: {len(prices_backup)}")

First 3 prices: [100, 105, 102]
Last 3 prices:  [108, 110, 115]
Every 2nd day:  [100, 102, 110]
Reversed:       [115, 110, 108, 102, 105, 100]
Adjusted High Prices: [109.08, 111.1, 116.15]
Backup size: 6
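
To make the comprehension syntax concrete, the sketch below spells out the equivalent loop, then shows two related points: a generator expression avoids building an intermediate list, and a full slice creates a new list object. The starting list is the state of `prices` after the `insert` and `pop` calls above; the variable names are illustrative.

```python
prices = [105, 99, 102, 108, 110, 115]

# Loop form of: [p * 1.01 for p in prices if p > 105]
adjusted_loop = []
for p in prices:
    if p > 105:
        adjusted_loop.append(p * 1.01)

adjusted_comp = [p * 1.01 for p in prices if p > 105]
print(adjusted_loop == adjusted_comp)  # True

# A generator expression feeds sum() one item at a time, with no temporary list
total_high = sum(p for p in prices if p > 105)
print(total_high)  # 333

# A full slice builds a new list object, just like .copy()
snapshot = prices[:]
snapshot.append(120)
print(len(prices), len(snapshot))  # 6 7
```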


## 3. Tuples (Immutability & Hashing)
Tuples are not just "read-only lists." They are immutable, and a tuple is hashable as long as every element it contains is hashable, which means it can be used as a dictionary key or a set member.

In [None]:
# A composite key for a dictionary (e.g., (Symbol, Date))
trade_key = ("AAPL", "2023-10-01") 
data = {trade_key: 150.25}

# Try doing this with a list -> TypeError: unhashable type: 'list'
# data = {["AAPL", "2023-10-01"]: 150.25} 

# --- Unpacking Patterns ---
row = (1, "GOOG", 2500, "USD", "NASDAQ")

# Standard unpacking
id_num, ticker, price, currency, exchange = row

# Star unpacking (Python 3+)
# Captures the "rest" of the items as a list, even though `row` is a tuple
id_num, ticker, *metadata = row
print(f"Metadata captured: {metadata}")
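
Hashability depends on what the tuple holds, not only on the tuple type. The sketch below shows a tuple of strings hashing normally and a tuple that contains a list failing; the variable names are illustrative.

```python
flat_key = ("AAPL", "2023-10-01")
nested_key = ("AAPL", ["tech", "hardware"])  # contains a mutable list

# Equal tuples produce equal hashes, which is what dict lookups rely on
print(hash(flat_key) == hash(("AAPL", "2023-10-01")))  # True

error_message = ""
try:
    hash(nested_key)
except TypeError as exc:
    error_message = str(exc)
    print(f"Cannot hash: {error_message}")

# Star unpacking yields a list, and an empty one when nothing is left over
first, second, *rest = flat_key
print(rest)  # []
```

If a key needs to carry a collection, converting the inner list to a tuple (or a `frozenset`) makes the whole key hashable again.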

## 4. Sets (Logic & Math)
Sets are unordered collections of unique, hashable elements. They are implemented as hash tables (conceptually, dictionaries that store only keys), which makes membership testing `O(1)` on average.

In [None]:
portfolio_a = {"AAPL", "GOOG", "MSFT"}
portfolio_b = {"GOOG", "AMZN", "NFLX"}

# --- Set Math ---
print(f"Union (|): {portfolio_a | portfolio_b}")
print(f"Intersection (&): {portfolio_a & portfolio_b}")
print(f"Difference (-): {portfolio_a - portfolio_b} (In A but not B)")
print(f"Symmetric Difference (^): {portfolio_a ^ portfolio_b} (In one, not both)")

# --- Advanced Checks ---
tech_stocks = {"AAPL", "GOOG", "MSFT"}
print(f"Is A a subset of Tech? {portfolio_a.issubset(tech_stocks)}")

# --- Safe Removal ---
portfolio_a.discard("TSLA") # Does nothing if missing (Safe)
# portfolio_a.remove("TSLA") # Raises KeyError if missing (Risky)
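
As a small illustration of the membership test in practice, the sketch below filters an incoming stream of tickers against a watchlist held in a set. The names `watchlist` and `incoming` and their contents are made up for the example.

```python
watchlist = {"AAPL", "GOOG", "MSFT"}
incoming = ["AAPL", "TSLA", "GOOG", "AAPL", "NFLX"]

# Each `in watchlist` check is a hash lookup, so its average cost does not
# grow with the size of the watchlist
matched = [sym for sym in incoming if sym in watchlist]
print(matched)  # ['AAPL', 'GOOG', 'AAPL']

# Converting to a set drops duplicates; sorted() gives a stable display order
unique_incoming = sorted(set(incoming))
print(unique_incoming)  # ['AAPL', 'GOOG', 'NFLX', 'TSLA']
```

The same filter against a list would scan the list on every check, which is `O(n)` per lookup.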

## 5. Dictionaries (The Engine)
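Mastering dicts is mastering Python. Dictionaries are hash maps with average `O(1)` lookup, and Python itself relies on them for module namespaces, instance attributes, and keyword arguments.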

In [None]:
market = {"AAPL": 150, "GOOG": 2800}

# --- 1. Access Methods ---
# setdefault: Get value, OR set it if missing (Crucial for grouping)
val = market.setdefault("TSLA", 700) 
print(f"Market after setdefault: {market}")

# --- 2. Iteration ---
# .items() yields (key, value) pairs in insertion order
for k, v in market.items():
    print(f"{k}: {v}")

# --- 3. Dictionary Comprehension ---
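# Create a new dict with keys and values swapped (inverts the map)
# Only safe when the values are unique and hashable; duplicate values would overwrite each other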
price_to_ticker = {v: k for k, v in market.items()}
print(f"Inverted Map: {price_to_ticker}")

# --- 4. Merging (Python 3.9+) ---
updates = {"AAPL": 155, "NVDA": 400}
merged = market | updates # Right-hand values win on shared keys; neither input is modified
print(f"Merged: {merged}")
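
The `setdefault` comment above calls the method crucial for grouping. A minimal sketch of that pattern follows, using made-up trade tuples, together with `.get()` for reads that should not insert a key.

```python
trades = [("AAPL", 100), ("GOOG", 50), ("AAPL", 25)]

by_symbol = {}
for symbol, amount in trades:
    # Returns the existing list, or stores and returns a new empty one
    by_symbol.setdefault(symbol, []).append(amount)
print(by_symbol)  # {'AAPL': [100, 25], 'GOOG': [50]}

# .get() reads with a fallback and does not insert the key
print(by_symbol.get("TSLA", []))  # []
print("TSLA" in by_symbol)        # False
```

Note that `setdefault` evaluates its default argument on every call, even when the key already exists, so it suits cheap defaults such as an empty list.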

## 6. Deep Dive: Mutability, References, and Memory
This is where beginner code breaks in production.

In [None]:
import sys

# --- 1. The Mutable Default Argument Trap ---
# NEVER do this: def add_item(item, lst=[])
# The default list is created once, when the function is defined, and every call reuses it
def risky_function(item, container=[]):
    container.append(item)
    return container

print(risky_function("A")) # ['A']
print(risky_function("B")) # ['A', 'B'] -> The list persists across calls!

# --- 2. Memory Overhead ---
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)

print(f"List Size: {sys.getsizeof(my_list)} bytes")
print(f"Tuple Size: {sys.getsizeof(my_tuple)} bytes (Leaner)")

# --- 3. Mutability Reference ---
nested = [1, [2, 3], 4]
copy_nested = nested.copy() # Shallow copy

nested[1][0] = 999
print(f"Original: {nested}")
print(f"Copy:     {copy_nested} (Affected! Inner list is the same object)")
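
The cell above shows two traps without their usual remedies. A common fix for the first is a `None` sentinel, and for the second `copy.deepcopy`. The sketch below is one way to write both; `safe_function` is a hypothetical counterpart to `risky_function`.

```python
import copy

def safe_function(item, container=None):
    # A fresh list is created on every call that omits `container`
    if container is None:
        container = []
    container.append(item)
    return container

print(safe_function("A"))  # ['A']
print(safe_function("B"))  # ['B']

nested = [1, [2, 3], 4]
deep = copy.deepcopy(nested)  # recursively copies the inner list too

nested[1][0] = 999
print(nested)  # [1, [999, 3], 4]
print(deep)    # [1, [2, 3], 4]
```

`deepcopy` walks the whole structure, so it costs more than a shallow copy; it is worth reaching for only when nested mutable objects must stay independent.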

## 7. The `collections` Module (Standard Library Gems)
Don't reinvent the wheel. These specialized containers ship with the standard library; they are not built-ins, so they must be imported from `collections`.

In [None]:
from collections import defaultdict, Counter, deque, namedtuple

# --- 1. defaultdict ---
# Automatically creates values for missing keys
# Task: Group trades by Symbol
trades = [("AAPL", 100), ("GOOG", 50), ("AAPL", 25)]
portfolio = defaultdict(list) # Default value is an empty list

for symbol, amount in trades:
    portfolio[symbol].append(amount)
print(f"Grouped Portfolio: {dict(portfolio)}")

# --- 2. Counter ---
# High-speed frequency counting
orders = ['buy', 'sell', 'buy', 'hold', 'buy', 'sell']
counts = Counter(orders)
print(f"Order Distribution: {counts}")
print(f"Most Common: {counts.most_common(1)}")

# --- 3. namedtuple ---
# Immutable, memory-efficient records
Tick = namedtuple('Tick', ['symbol', 'price', 'time'])
t = Tick('MSFT', 310.5, '14:00')
print(f"Tick Symbol: {t.symbol} (Access by name, not index)")

# --- 4. deque (Double-Ended Queue) ---
# O(1) appends and pops from both ends (lists are O(n) at the front)
history = deque(maxlen=3)
history.extend([10, 20, 30])
history.append(40) # 10 is pushed out automatically
print(f"Rolling Window: {history}")
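
The rolling window above extends naturally to a moving average. The generator below is a minimal sketch; the name `rolling_means` and the window size are assumptions chosen for illustration.

```python
from collections import deque

def rolling_means(prices, window=3):
    """Yield the mean of the last `window` prices once the window is full."""
    recent = deque(maxlen=window)
    for price in prices:
        recent.append(price)  # the oldest value drops out automatically
        if len(recent) == window:
            yield sum(recent) / window

print(list(rolling_means([10, 20, 30, 40])))  # [20.0, 30.0]
```

Summing the deque on each step costs `O(window)`; for large windows, keeping a running total and subtracting the evicted value would avoid the repeated sum.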