# Orderbook Optimization Workshop

## Goal
Start with a **working but slow** orderbook for asks, then **optimize it together** to achieve O(1) best ask lookups!

## API: 3 Core Functions
```python
submit(order_id, price, quantity)  # Add a sell order
cancel(order_id)                   # Cancel a sell order
get_best_price()                   # Get lowest ask price
```

## Your Mission
Optimize to reach these target complexities:

| Operation | Current | Target |
|-----------|---------|--------|
| **submit()** | O(N) ❌ | O(log P) ✅ |
| **cancel()** | O(N) ❌ | O(P) ✅ |
| **get_best_price()** | **O(N)** ❌ | **O(1)** ✅ |

Where:
- N = total orders (could be 1000s!)
- P = number of unique price levels (~10-100)

## Setup

In [57]:
from enum import Enum
from typing import Optional

class OrderStatus(Enum):
    OPEN = "OPEN"
    CANCELLED = "CANCELLED"

class Order:
    def __init__(self, order_id: str, price: float, quantity: int):
        self.order_id = order_id
        self.price = price
        self.quantity = quantity
        self.status = OrderStatus.OPEN
    
    def __repr__(self):
        return f"Order({self.order_id}, ${self.price}, qty={self.quantity}, {self.status.value})"

print("✓ Setup complete!")

✓ Setup complete!


## Suboptimal Orderbook (Asks Only)

### Current Data Structure: Simple List ❌
```python
orders = [order1, order2, order3, ...]  # Just a flat list!
```

**Problems:**
- `get_best_price()` scans entire list for min price: **O(N)** 🐌
- `cancel()` requires linear search: **O(N)** 🐌
- `submit()` must check for duplicates: **O(N)** 🐌

How can we make this faster?

In [58]:
class SuboptimalOrderBook:
    """A working but inefficient orderbook for asks."""
    
    def __init__(self):
        # SUBOPTIMAL: Just a flat list of all sell orders!
        self.orders = []
    
    def submit(self, order_id: str, price: float, quantity: int) -> bool:
        """
        Submit a new sell order (ask).
        
        Current: O(N) - scan all orders to check for duplicate ID
        Target: O(log P)

        """
        # Check for duplicate (O(N) scan - SLOW!)
        for order in self.orders:
            if order.order_id == order_id:
                return False
        
        # Create and add order
        order = Order(order_id, price, quantity)
        self.orders.append(order)
        return True
    
    def cancel(self, order_id: str) -> bool:
        """
        Cancel a sell order by ID.
        
        Current: O(N) - linear search
        Target: O(P) - O(1) dict lookup, plus an O(P) heap rebuild when a price level empties
        
        Optimization: a dict {order_id: Order} for instant access
        """
        # Linear search (SLOW!)
        for order in self.orders:
            if order.order_id == order_id:
                if order.status == OrderStatus.CANCELLED:
                    return False
                order.status = OrderStatus.CANCELLED
                return True
        return False
    
    def get_best_price(self) -> Optional[float]:
        """
        Get the lowest ask price.
        
        Current: O(N) - scan ALL orders to find min
        Target: O(1)
        
        """
        best_price = None
        
        # SUPER SLOW: Check every single order!
        for order in self.orders:
            if order.status == OrderStatus.OPEN:
                if best_price is None or order.price < best_price:
                    best_price = order.price
        
        return best_price
    
    def __repr__(self):
        """Display the orderbook."""
        lines = ["=" * 60, "ASK ORDERBOOK (Suboptimal)", "=" * 60]
        
        # Group by price (inefficient!)
        prices = {}
        for order in self.orders:
            if order.status == OrderStatus.OPEN:
                prices[order.price] = prices.get(order.price, 0) + order.quantity
        
        # Show asks (low to high)
        for price in sorted(prices.keys()):
            lines.append(f"ASK: ${price:>8.2f} | Qty: {prices[price]}")
        
        lines.append("=" * 60)
        best = self.get_best_price()
        if best is not None:
            lines.append(f"Best Ask (Lowest): ${best:.2f}")
        lines.append(f"Total orders: {len(self.orders)}")
        
        return "\n".join(lines)

print("✓ Suboptimal OrderBook created!")
print("✓ Try: ob = SuboptimalOrderBook()")

✓ Suboptimal OrderBook created!
✓ Try: ob = SuboptimalOrderBook()


## Test the Suboptimal Implementation

It works... but watch it slow down!

In [59]:
def test_orderbook(orderbook_class, name="OrderBook"):
    """Test basic orderbook functionality."""
    print(f"\n{'='*50}")
    print(f"Testing: {name}")
    print(f"{'='*50}\n")
    
    # Create orderbook and add orders
    ob = orderbook_class()
    ob.submit("A1", 101.0, 50)
    ob.submit("A2", 101.5, 30)
    ob.submit("A3", 102.0, 20)
    
    print(ob)
    print(f"\nBest ask: ${ob.get_best_price()}")
    
    # Test cancel
    print("\nCancelling A1...")
    ob.cancel("A1")
    print(f"New best ask: ${ob.get_best_price()}")
    
    # Test duplicate
    result = ob.submit("A2", 99.0, 100)
    print(f"Duplicate order: {result} (should be False)\n")


In [60]:
def test_orderbook_latency(orderbook_class, name="OrderBook"):
    """Measure get_best_price() performance."""
    import time
    
    print(f"\n{'='*50}")
    print(f"Latency Test: {name}")
    print(f"{'='*50}\n")
    
    for n in [10, 50, 100, 500, 1000]:
        # Create orderbook with n orders
        ob = orderbook_class()
        for i in range(n):
            ob.submit(f"A{i}", 100.0 + i * 0.01, 10)
        
        # Time 1000 get_best_price() calls (perf_counter is the high-resolution timer)
        start = time.perf_counter()
        for _ in range(1000):
            ob.get_best_price()
        elapsed = time.perf_counter() - start
        
        avg_us = elapsed / 1000 * 1e6
        print(f"{n:4d} orders: {avg_us:6.2f} µs/call")
    
    print()


In [61]:
# Try it with SuboptimalOrderBook
test_orderbook(SuboptimalOrderBook, "SuboptimalOrderBook")
print("\n\n")
test_orderbook_latency(SuboptimalOrderBook, "SuboptimalOrderBook")



Testing: SuboptimalOrderBook

ASK ORDERBOOK (Suboptimal)
ASK: $  101.00 | Qty: 50
ASK: $  101.50 | Qty: 30
ASK: $  102.00 | Qty: 20
Best Ask (Lowest): $101.00
Total orders: 3

Best ask: $101.0

Cancelling A1...
New best ask: $101.5
Duplicate order: False (should be False)





Latency Test: SuboptimalOrderBook

  10 orders:   0.43 µs/call
  50 orders:   1.91 µs/call
 100 orders:   3.78 µs/call
 500 orders:  19.53 µs/call
1000 orders:  38.43 µs/call



## Discussion: How to Fix This? 🤔

### Problem 1: get_best_price() is O(N)
Currently scanning all N orders to find the minimum price.

**Q:** What data structure gives instant access to the minimum?

**A:** A **MIN HEAP**!
- Python's `heapq` is a min heap by default
- Store prices: `[101, 102, 103]`
- `heap[0]` always gives the lowest → O(1)!
- No negation needed (unlike max heap for bids)

### Problem 2: cancel() is O(N)
Linear search to find order by ID.

**Q:** What gives O(1) lookup by key?

**A:** **Dictionary!**
```python
orders = {order_id: Order}
order = orders["A1"]  # O(1) lookup!
```

### Problem 3: submit() is O(N)
Must scan to check for duplicate ID.

**Q:** If we use dict, what's the new complexity?

**A:** **O(log P)**
- Check `if order_id in orders` → O(1)
- Push the price to the heap only when it opens a new price level → O(log P), where P = number of price levels
- Total: O(log P)

## Min Heap Explained 🧠

### Python's heapq = Min Heap
```python
import heapq
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 3)
heapq.heappush(heap, 8)
# heap[0] = 3 (smallest) ✓
```

### Perfect for Asks (Lowest Price)
```python
# Ask prices: $101, $102, $103
ask_heap = []
heapq.heappush(ask_heap, 101.0)
heapq.heappush(ask_heap, 102.0)
heapq.heappush(ask_heap, 103.0)

# ask_heap = [101.0, 102.0, 103.0]
# Min heap automatically puts 101.0 at top

# To get best ask (lowest):
best_ask = ask_heap[0]  # Just peek: $101 ✓
```

### Visual:
```
Prices: [101, 102, 103]
         ↓
Push to heap: [101, 102, 103]
         ↓
heap[0] = 101  ← Lowest ask! ✓
```

**Result:** O(1) access to lowest ask! 🎉

**No negation, no tricks** - just use heap directly!
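One detail the examples above gloss over: several orders often share the same price. If every `submit()` pushes its price, the heap ends up with one entry per order (size N) instead of one per price level (size P), and the O(log P) bound no longer holds. This small check with the standard `heapq` module compares pushing per order against pushing only when a price level is first seen; the price list is made-up illustration data.

```python
import heapq

order_prices = [101.0, 101.0, 102.0, 101.0, 103.0, 102.0]  # six orders, three price levels

# Push every order's price: the heap grows with N.
per_order_heap = []
for p in order_prices:
    heapq.heappush(per_order_heap, p)

# Push a price only when its level is new: the heap stays at size P.
levels = set()
per_level_heap = []
for p in order_prices:
    if p not in levels:
        levels.add(p)
        heapq.heappush(per_level_heap, p)

print(len(per_order_heap), per_order_heap[0])  # 6 101.0
print(len(per_level_heap), per_level_heap[0])  # 3 101.0
```

Both heaps report the same best ask, but only the per-level heap keeps pushes at O(log P).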

## Next Steps: Let's Optimize Together! 🚀

### Target Structure:
```python
class OptimizedOrderBook:
    def __init__(self):
        self.asks = {}         # {price: [orders at that price]}
        self.ask_heap = []     # min heap of distinct live prices, e.g. [101, 102, 103]
        self.orders = {}       # {order_id: Order}
```

### Implementation Plan:
1. **submit()**:
   - Check dict for duplicate: O(1)
   - Add to orders dict: O(1)
   - Push the price to the heap only if it starts a new price level: O(log P)
   - **Total: O(log P)**

2. **cancel()**:
   - Dict lookup: O(1)
   - Mark as cancelled: O(1)
   - Remove the order from its price level and check if the level is now empty: O(orders at price)
   - If the level is empty, remove its price from the heap and rebuild the heap: O(P)
   - **Total: O(P) (could make this faster with lazy deletion)**

3. **get_best_price()**:
   - Peek at heap top: O(1)
   - No negation needed: O(1)
   - **Total: O(1)** ⚡
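The `cancel()` step mentions lazy deletion as a way to avoid the O(P) rebuild. A minimal sketch of that idea, assuming we only track how many open orders sit at each price: cancelling just decrements a counter, and stale prices are discarded from the heap top the next time the best price is requested. The class name `LazyAskLevels` and its methods are hypothetical, not part of the workshop code. The trade-off: `get_best_price()` is then no longer a strict O(1) peek, because it may pop stale entries first (that cost is amortized over the cancels that created them), and the heap can temporarily hold more than P entries.

```python
import heapq
from typing import Dict, List, Optional


class LazyAskLevels:
    """Hypothetical helper: live ask price levels with lazy removal from the heap."""

    def __init__(self):
        self.level_orders: Dict[float, int] = {}  # price -> number of open orders
        self.heap: List[float] = []               # min heap; may contain stale prices

    def add(self, price: float) -> None:
        if price not in self.level_orders:        # new (or revived) price level
            self.level_orders[price] = 0
            heapq.heappush(self.heap, price)      # O(log heap size)
        self.level_orders[price] += 1

    def remove(self, price: float) -> None:
        self.level_orders[price] -= 1
        if self.level_orders[price] == 0:
            del self.level_orders[price]          # O(1); heap entry is now stale

    def best(self) -> Optional[float]:
        # Pop stale prices until the top of the heap is a live level again.
        while self.heap and self.heap[0] not in self.level_orders:
            heapq.heappop(self.heap)
        return self.heap[0] if self.heap else None
```

If a level empties and is re-added before its stale entry is popped, the price appears twice in the heap; `best()` still returns the right answer, and both copies are discarded once the level empties again.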

### Expected Performance:
- From **~38 µs per call** at 1,000 orders (measured above) → **<1 µs**, independent of N 🚀
- Roughly a **40x+ speedup** at 1,000 orders, growing with N

### Key Advantages of Using Asks:
- ✓ No negative price trick needed
- ✓ Simpler to understand
- ✓ Direct heap usage
- ✓ Same O(1) performance!

**Now let's code the optimal version together or write pseudocode!**

In [62]:
import heapq

class OptimizedOrderBook:
    """Optimized orderbook for asks using heap and dict."""
    
    def __init__(self):
        # heapq is imported at cell level (not inside __init__) so every method can use it.
        # Optimized data structures:
        self.orders = {}           # {order_id: Order} - O(1) lookup by ID
        self.asks = {}             # {price: [orders at that price]} - group orders by price
        self.ask_heap = []         # Min heap of prices - O(1) access to best ask
    
    
    def submit(self, order_id: str, price: float, quantity: int) -> bool:
        """
        Submit a new sell order (ask) - optimized version.
        Hint: useful operations: heapq.heappush(heap, price)
        Target: O(log P) 
        """
        # Placeholder for optimized implementation
        pass

    def cancel(self, order_id: str) -> bool:
        """
        Cancel a sell order by ID - optimized version.
        
        Target: O(P)
        """
        # Placeholder for optimized implementation
        pass
        
    def get_best_price(self) -> Optional[float]:
        """
        Get the lowest ask price - optimized version.
        
        Target: O(1)
        """
        # Placeholder for optimized implementation
        pass
    
    def __repr__(self):
        """Display the orderbook."""
        lines = ["=" * 60, "ASK ORDERBOOK (Optimized)", "=" * 60]
        
        # Aggregate open quantity per price level from self.orders
        prices = {}
        for order_id, order in self.orders.items():
            if order.status == OrderStatus.OPEN:
                prices[order.price] = prices.get(order.price, 0) + order.quantity
        
        # Show asks (low to high)
        for price in sorted(prices.keys()):
            lines.append(f"ASK: ${price:>8.2f} | Qty: {prices[price]}")
        
        lines.append("=" * 60)
        best = self.get_best_price()
        if best is not None:
            lines.append(f"Best Ask (Lowest): ${best:.2f}")
        lines.append(f"Total orders: {len([o for o in self.orders.values() if o.status == OrderStatus.OPEN])}")
        
        return "\n".join(lines)

print("✓ Optimized OrderBook created!")
print("✓ Try: ob = OptimizedOrderBook()")

✓ Optimized OrderBook created!
✓ Try: ob = OptimizedOrderBook()


In [63]:
test_orderbook(OptimizedOrderBook, "OptimizedOrderBook")
print("\n\n")
test_orderbook_latency(OptimizedOrderBook, "OptimizedOrderBook")


Testing: OptimizedOrderBook

ASK ORDERBOOK (Optimized)
Total orders: 0

Best ask: $None

Cancelling A1...
New best ask: $None
Duplicate order: None (should be False)





Latency Test: OptimizedOrderBook

  10 orders:   0.02 µs/call
  50 orders:   0.02 µs/call
 100 orders:   0.02 µs/call
 500 orders:   0.02 µs/call
1000 orders:   0.02 µs/call
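For reference, here is one possible way to fill in the placeholders, following the implementation plan above (push a price only for a new level; rebuild the heap when a level empties). It is a sketch, not the workshop's official solution. The class name `HeapOrderBook` is hypothetical, and `Order`/`OrderStatus` are redefined with the same fields as in the Setup cell so the block runs on its own.

```python
import heapq
from enum import Enum
from typing import Dict, List, Optional


class OrderStatus(Enum):
    OPEN = "OPEN"
    CANCELLED = "CANCELLED"


class Order:
    def __init__(self, order_id: str, price: float, quantity: int):
        self.order_id = order_id
        self.price = price
        self.quantity = quantity
        self.status = OrderStatus.OPEN


class HeapOrderBook:
    """Hypothetical asks-only book: dict by ID, dict of price levels, min heap of live prices."""

    def __init__(self):
        self.orders: Dict[str, Order] = {}        # order_id -> Order
        self.asks: Dict[float, List[Order]] = {}  # price -> open orders at that price
        self.ask_heap: List[float] = []           # exactly one entry per live price level

    def submit(self, order_id: str, price: float, quantity: int) -> bool:
        if order_id in self.orders:               # O(1) duplicate check
            return False
        order = Order(order_id, price, quantity)
        self.orders[order_id] = order
        if price not in self.asks:                # new price level
            self.asks[price] = []
            heapq.heappush(self.ask_heap, price)  # O(log P)
        self.asks[price].append(order)
        return True

    def cancel(self, order_id: str) -> bool:
        order = self.orders.get(order_id)         # O(1) lookup
        if order is None or order.status == OrderStatus.CANCELLED:
            return False
        order.status = OrderStatus.CANCELLED
        level = self.asks[order.price]
        level.remove(order)                       # O(orders at this price)
        if not level:                             # level emptied: drop its price
            del self.asks[order.price]
            self.ask_heap.remove(order.price)     # O(P)
            heapq.heapify(self.ask_heap)          # O(P)
        return True

    def get_best_price(self) -> Optional[float]:
        # Every heap entry is a live level, so the top is the best ask: O(1) peek.
        return self.ask_heap[0] if self.ask_heap else None
```

Because it exposes the same `submit`/`cancel`/`get_best_price` methods, it can be passed to `test_orderbook` and `test_orderbook_latency` in place of `OptimizedOrderBook` (without a custom `__repr__`, the book display falls back to the default object repr). Keeping exactly one heap entry per live price level is what makes `get_best_price()` a plain peek; the cost is the O(P) `list.remove` plus `heapify` when a level empties.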

