# **Chapter 24: Edge Computing & IoT**

The Internet of Things (IoT) transforms everyday objects into data-generating endpoints—from smart thermostats and fitness trackers to industrial sensors and autonomous vehicles. Unlike traditional cloud computing where data travels to centralized data centers, edge computing processes data near its source. This chapter explores architectures for handling millions of devices, intermittent connectivity, and the unique challenges of distributed computing at the network edge.

---

## **24.1 Edge Architecture Patterns**

Edge computing sits between devices and the cloud, reducing latency, bandwidth costs, and dependency on constant connectivity. The architecture you choose depends on your latency requirements, compute constraints, and data privacy needs.

### **The Three-Tier Architecture**

Most IoT systems use a three-layer approach:

```
┌─────────────────────────────────────────────────────────────────┐
│                           Cloud Layer                           │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │ Data Lake  │  │ ML         │  │ Business   │  │ Device     │ │
│  │ (S3)       │  │ Training   │  │ Logic      │  │ Management │ │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                                 ↑
                                 │ Internet
┌─────────────────────────────────────────────────────────────────┐
│                        Edge Layer (Fog)                         │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │ Local      │  │ Stream     │  │ Device     │  │ Cache      │ │
│  │ DB         │  │ Processing │  │ Gateway    │  │            │ │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘ │
│          Raspberry Pi / Industrial PC / AWS Greengrass          │
└─────────────────────────────────────────────────────────────────┘
                                 ↑
                                 │ Local Network (WiFi/Bluetooth/LoRa)
┌─────────────────────────────────────────────────────────────────┐
│                          Device Layer                           │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │ Sensors    │  │ Actuators  │  │ Cameras    │  │ Micro-     │ │
│  │ (Temp)     │  │ (Valves)   │  │            │  │ controllers│ │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘ │
│               Arduino / ESP32 / Raspberry Pi Pico               │
└─────────────────────────────────────────────────────────────────┘
```

**Data Flow**:
1. **Device Layer**: Collects raw data (temperature readings, motion detection)
2. **Edge Layer**: Aggregates, filters, and processes locally (average temperature over 5 minutes, trigger local alarms)
3. **Cloud Layer**: Long-term storage, machine learning, global dashboards

**Example: Smart Factory**
```
Machine Sensors (1000 devices):
  ↓ Every 100ms (10 readings/second per device)
Edge Gateway:
  ↓ Aggregate to 1 reading/second (90% data reduction)
  ↓ Detect anomalies locally (emergency stop in <10ms)
  ↓ Send hourly summaries to cloud
Cloud:
  ↓ Predictive maintenance ML models
  ↓ Global efficiency dashboards
```
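The reduction figures in this example can be checked with a few lines of arithmetic. The sketch below assumes every reading and every summary counts as one message, and that the gateway sends one hourly summary per device; the variable names are illustrative only.

```python
# Message volume at each tier of the smart factory example.
# Assumption: one reading or summary equals one message.
DEVICES = 1000

raw_per_hour = DEVICES * 10 * 3600        # 10 readings/second per device
aggregated_per_hour = DEVICES * 1 * 3600  # 1 aggregated reading/second per device
cloud_per_hour = DEVICES * 1              # 1 hourly summary per device

edge_reduction = 1 - aggregated_per_hour / raw_per_hour
cloud_reduction = 1 - cloud_per_hour / raw_per_hour

print(f"raw:        {raw_per_hour:>12,} msg/hour")
print(f"aggregated: {aggregated_per_hour:>12,} msg/hour ({edge_reduction:.0%} fewer)")
print(f"to cloud:   {cloud_per_hour:>12,} msg/hour ({cloud_reduction:.4%} fewer)")
```

The point of the exercise is that most of the saving comes from deciding what the cloud actually needs, not from the first aggregation step.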

### **Edge Computing Patterns**

**Pattern 1: Local Processing with Cloud Sync**
```python
# Edge device code (Raspberry Pi)
import time

import requests


class LocalProcessor:
    def __init__(self, device_id):
        self.device_id = device_id
        self.local_buffer = []
        self.last_cloud_sync = time.time()
        # load_tflite_model() is a placeholder for your own loader. It is
        # assumed to return an object that exposes a predict() method.
        self.anomaly_model = load_tflite_model()  # Lightweight ML
    
    def process_sensor_data(self, reading):
        # Immediate local action (no cloud round trip)
        if self.detect_anomaly(reading):
            self.trigger_alarm()  # Placeholder for a local actuator call
        
        # Buffer for batch upload
        self.local_buffer.append({
            'timestamp': time.time(),
            'value': reading,
            'device_id': self.device_id
        })
        
        # Sync to cloud every 5 minutes or when buffer full
        if (time.time() - self.last_cloud_sync > 300 or 
            len(self.local_buffer) > 1000):
            self.sync_to_cloud()
    
    def detect_anomaly(self, reading):
        # Run inference locally (e.g. a TensorFlow Lite model behind predict())
        prediction = self.anomaly_model.predict([reading])
        return prediction[0] > 0.95  # Anomaly score threshold
    
    def sync_to_cloud(self):
        try:
            response = requests.post('https://api.cloud.com/metrics', 
                                     json=self.local_buffer, timeout=10)
            response.raise_for_status()  # Treat HTTP errors as failures too
            self.local_buffer = []  # Clear only after a confirmed upload
            self.last_cloud_sync = time.time()
        except requests.exceptions.RequestException:
            # Offline or server error: keep the buffer and retry on a later call.
            # In production, cap the buffer size or spill it to disk.
            pass
```

**Pattern 2: Function as a Service at the Edge**
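AWS IoT Greengrass lets you deploy Lambda functions to edge devices; Azure IoT Edge offers a comparable model built on containerized modules. The example below uses the Greengrass Core SDK (Greengrass v1):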

```python
# AWS Lambda function deployed to edge gateway (Greengrass v1 Core SDK)
import json
import time

import greengrasssdk

client = greengrasssdk.client('iot-data')

def function_handler(event, context):
    """
    Triggered when temperature exceeds threshold
    Runs locally on edge device, not in AWS cloud
    """
    temperature = event['temperature']
    device_id = event['device_id']
    
    # Local decision (no internet required)
    if temperature > 100:  # Celsius
        # Trigger local actuator immediately
        client.publish(
            topic=f'actuators/{device_id}/cooling',
            payload=json.dumps({'action': 'activate', 'duration': 60})
        )
        
        # Also notify cloud for logging
        client.publish(
            topic='cloud/alerts',
            payload=json.dumps({
                'severity': 'high',
                'message': f'Overheating detected on {device_id}',
                # The Lambda context object has no timestamp attribute,
                # so take the time from the local clock
                'timestamp': time.time()
            })
        )
    
    return {'status': 'processed_locally'}
```

**Pattern 3: Stream Processing at the Edge**
Apache Flink and Kafka can run on edge devices to process data streams before sending to cloud:

```python
# Edge stream processing with a plain Kafka consumer/producer (kafka-python).
# Kafka Streams itself is a Java library; this loop hand-rolls the windowing.
from kafka import KafkaConsumer, KafkaProducer
import json

consumer = KafkaConsumer(
    'sensor-raw-data',
    bootstrap_servers=['localhost:9092'],  # Local edge broker
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

producer = KafkaProducer(
    bootstrap_servers=['cloud-kafka.example.com:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Process locally, send aggregates to cloud
windowed_data = {}

for message in consumer:
    sensor_id = message.value['sensor_id']
    timestamp = message.value['timestamp']
    value = message.value['value']
    
    # 1-minute tumbling window (timestamp is assumed to be epoch seconds)
    window_key = (sensor_id, int(timestamp) // 60)
    
    if window_key not in windowed_data:
        windowed_data[window_key] = []
    
    windowed_data[window_key].append(value)
    
    # Treat the window as complete after 60 samples. This assumes exactly one
    # reading per second per sensor; a production job would close windows on
    # event time instead of sample count.
    values = windowed_data[window_key]
    if len(values) >= 60:
        stats = {
            'sensor_id': sensor_id,
            'window_start': window_key[1] * 60,
            'avg': sum(values) / len(values),
            'max': max(values),
            'min': min(values),
            'sample_count': len(values)
        }
        
        # Send only the summary to the cloud: 1 message instead of 60
        # (roughly 98% fewer messages)
        producer.send('sensor-aggregates', stats)
        del windowed_data[window_key]
```
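Closing a window after a fixed number of samples breaks as soon as a sensor skips or repeats a reading. One alternative is to close a window when the first reading from a later window arrives. The sketch below shows that idea without any Kafka dependency; `TumblingWindowAggregator` and its method names are hypothetical, and late readings are not handled specially (a reading that arrives after its window was emitted produces a second summary for that window).

```python
from collections import defaultdict


class TumblingWindowAggregator:
    """Groups readings into fixed windows and emits a summary when a window closes."""

    def __init__(self, window_seconds: int = 60):
        self.window_seconds = window_seconds
        self.open_windows = defaultdict(list)  # (sensor_id, window_index) -> values

    def add(self, sensor_id: str, timestamp: float, value: float) -> list:
        """Record one reading; return summaries for the windows this reading closes."""
        window_index = int(timestamp) // self.window_seconds
        # Any open window of this sensor that is older than the current one is done.
        closed_keys = sorted(
            key for key in self.open_windows
            if key[0] == sensor_id and key[1] < window_index
        )
        summaries = [self._summarize(key, self.open_windows.pop(key)) for key in closed_keys]
        self.open_windows[(sensor_id, window_index)].append(value)
        return summaries

    def _summarize(self, key, values):
        sensor_id, window_index = key
        return {
            'sensor_id': sensor_id,
            'window_start': window_index * self.window_seconds,
            'avg': sum(values) / len(values),
            'max': max(values),
            'min': min(values),
            'sample_count': len(values),
        }


aggregator = TumblingWindowAggregator(window_seconds=60)
aggregator.add('s1', 0, 20.0)
aggregator.add('s1', 30, 22.0)
closed = aggregator.add('s1', 61, 25.0)  # first reading of the next window
print(closed)
```

Because the summary carries its own `sample_count`, the cloud side can tell a sparse window from a full one.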

---

## **24.2 IoT Protocols: MQTT, CoAP, and LoRaWAN**

IoT devices have constrained resources (battery, bandwidth, CPU) and use specialized protocols optimized for these constraints.

### **MQTT: The Standard for IoT Messaging**

**MQTT (Message Queuing Telemetry Transport)** is a lightweight publish-subscribe protocol designed for unreliable networks and low-power devices.

**Key Concepts**:
- **Broker**: Central server that receives all messages and routes them to subscribers
- **Topic**: Hierarchical string (e.g., `factory/line1/machine3/temperature`)
- **QoS Levels**: Quality of Service guarantees
  - QoS 0: At most once (fire and forget)
  - QoS 1: At least once (acknowledged, may duplicate)
  - QoS 2: Exactly once (two-phase handshake, highest overhead)

**Topic Hierarchy Best Practices**:
```
Level 1: Site          factory/
Level 2: Line                  line1/
Level 3: Machine                     machine3/
Level 4: Device                               sensor/
Level 5: Measurement                                 temperature

Full topic: factory/line1/machine3/sensor/temperature
Wildcards (+ matches exactly one level, # matches all remaining levels):
  - factory/+/machine3/#  (machine3 on every line, all devices and measurements)
  - factory/line1/+/+/temperature  (All machines and devices on line1, temp only)
```
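To make the wildcard rules concrete, here is a minimal sketch of how a filter is matched against a topic, level by level. The function name `topic_matches` is hypothetical, and the sketch ignores the special handling of topics that start with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter."""
    filter_levels = topic_filter.split('/')
    topic_levels = topic.split('/')

    for i, level in enumerate(filter_levels):
        if level == '#':
            # Multi-level wildcard: only valid as the last level; it matches
            # the parent level and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # Filter is longer than the topic
        if level != '+' and level != topic_levels[i]:
            return False  # Literal level differs
    # No '#' seen, so both must have the same number of levels
    return len(filter_levels) == len(topic_levels)


topic = 'factory/line1/machine3/sensor/temperature'
print(topic_matches('factory/+/machine3/#', topic))
print(topic_matches('factory/line1/+/+/temperature', topic))
print(topic_matches('factory/line1/+/temperature', topic))
```

The last call fails to match because `+` covers exactly one level, which is why the depth of the hierarchy should be fixed before devices ship.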

**Python Implementation** (Paho-MQTT):
```python
import paho.mqtt.client as mqtt
import json
import time

class IoTDevice:
    def __init__(self, device_id, broker_host):
        self.device_id = device_id
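        # paho-mqtt 1.x constructor. On paho-mqtt 2.x, pass
        # mqtt.CallbackAPIVersion.VERSION1 as the first argument to keep the
        # callback signatures used in this class.
        self.client = mqtt.Client(client_id=device_id)
        self.client.on_connect = self.on_connect
        self.client.on_message = self.on_command
        
        # Enable TLS for security
        self.client.tls_set(ca_certs="ca.crt")
        # Placeholder credentials: load real ones from secure storage
        self.client.username_pw_set("device", "password")
        
        self.client.connect(broker_host, 8883, 60)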
    
    def on_connect(self, client, userdata, flags, rc):
        print(f"Connected with result code {rc}")
        # Subscribe to commands for this specific device
        client.subscribe(f"commands/{self.device_id}/#")
    
    def on_command(self, client, userdata, msg):
        """Handle incoming commands from cloud"""
        payload = json.loads(msg.payload)
        command = payload.get('command')
        
        if command == 'reboot':
            self.reboot()
        elif command == 'update_config':
            self.update_config(payload['config'])
        elif command == 'calibrate':
            self.calibrate_sensor()
    
    def publish_telemetry(self, temperature, humidity):
        payload = {
            'device_id': self.device_id,
            'timestamp': time.time(),
            'temperature': temperature,
            'humidity': humidity,
            'battery': self.get_battery_level()
        }
        
        # QoS 1: Ensure delivery, allow duplicates
        self.client.publish(
            f"telemetry/{self.device_id}",
            json.dumps(payload),
            qos=1,
            retain=False  # Don't retain last message
        )
    
    def run(self):
        self.client.loop_start()
        while True:
            temp, hum = self.read_sensors()
            self.publish_telemetry(temp, hum)
            time.sleep(60)  # Send every minute

# Usage
device = IoTDevice("sensor_001", "mqtt.example.com")
device.run()
```

**MQTT Broker Comparison** (figures are indicative orders of magnitude; benchmark with your own workload):
```
Broker        Best For                  Max Connections        Performance
───────────────────────────────────────────────────────────────────────────
Mosquitto     Small deployments         10,000+                100k msg/sec
EMQX          Enterprise/Cloud          1M+                    5M msg/sec
HiveMQ        Enterprise with SLA       10M+                   10M msg/sec
AWS IoT Core  AWS integration           Managed (quota-bound)  Pay per message
Azure IoT Hub Azure integration         Managed (tier-bound)   Pay per message
```

### **CoAP: REST for IoT**

**CoAP (Constrained Application Protocol)** is a REST-style protocol modeled on HTTP that runs over UDP and targets constrained devices (sensors with only tens of kilobytes of RAM).

**Comparison with HTTP**:
```
HTTP:  GET /api/temperature HTTP/1.1
        Host: sensor.example.com
        Accept: application/json
        [Request line + typical headers: ~200 bytes]
        
CoAP:  CON GET /temperature
        [4-byte header + 1-byte token + 12-byte Uri-Path option = 17 bytes]
        
Overhead reduction: roughly 90%
```
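The size of the CoAP request can be verified by encoding it by hand. Counting the `Uri-Path` option that carries `temperature`, the whole request comes to 17 bytes. The sketch below follows the CoAP message layout (4-byte fixed header, token, then options); the function name `encode_coap_get` is hypothetical, and it only handles a single short path segment.

```python
import struct


def encode_coap_get(path_segment: str, message_id: int, token: bytes) -> bytes:
    """Encode a confirmable CoAP GET with a single short Uri-Path option."""
    segment = path_segment.encode('utf-8')
    if len(token) > 8:
        raise ValueError("token must be 0-8 bytes")
    if len(segment) > 12:
        raise ValueError("this sketch only handles path segments up to 12 bytes")

    version, msg_type = 1, 0  # CoAP version 1, type 0 = CON (confirmable)
    first_byte = (version << 6) | (msg_type << 4) | len(token)
    code = 0x01               # 0.01 = GET
    header = struct.pack('!BBH', first_byte, code, message_id)

    uri_path = 11  # Option number of Uri-Path
    # Option header byte: high nibble = delta from the previous option number
    # (0 here), low nibble = option length. Both fit in 4 bits in this case.
    option = bytes([(uri_path << 4) | len(segment)]) + segment

    return header + token + option


message = encode_coap_get('temperature', message_id=0x1234, token=b'\xab')
print(len(message), message.hex())
```

Longer paths or extra options (for example `Accept`) add a few bytes each, so the saving relative to HTTP depends on the request.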

**CoAP Features**:
- **UDP based**: No connection overhead (unlike TCP)
- **Observe pattern**: Subscribe to resources (like MQTT topics)
- **Block-wise transfer**: Send large data in chunks for memory-constrained devices

**Python Example** (Aiocoap):
```python
import asyncio
import json

from aiocoap import CHANGED, GET, Context, Message
from aiocoap.resource import Resource, Site

# read_temperature_sensor() and update_device_config() are placeholders
# for your own hardware and configuration code.

class TemperatureResource(Resource):
    async def render_get(self, request):
        temp = read_temperature_sensor()
        payload = json.dumps({'temperature': temp}).encode('utf-8')
        
        response = Message(payload=payload)
        response.opt.content_format = 50  # application/json
        return response
    
    async def render_put(self, request):
        # Update configuration
        config = json.loads(request.payload)
        update_device_config(config)
        return Message(code=CHANGED)

# Server setup (runs on the device)
async def run_server():
    root = Site()
    root.add_resource(['temperature'], TemperatureResource())
    await Context.create_server_context(root)
    await asyncio.get_running_loop().create_future()  # Serve until cancelled

# Client usage (runs as a separate program, e.g. on the gateway)
async def get_temperature():
    protocol = await Context.create_client_context()
    request = Message(code=GET, uri='coap://sensor.local/temperature')
    response = await protocol.request(request).response
    return json.loads(response.payload)

# asyncio.run(run_server())       # on the device
# asyncio.run(get_temperature())  # on the client
```

### **LoRaWAN: Long Range, Low Power**

For devices kilometers away from infrastructure (agriculture sensors, smart meters), LoRaWAN provides:
- **Range**: 2-15 km (urban to rural)
- **Power**: Multi-year battery life (up to roughly 10 years for devices that transmit infrequently)
- **Data rate**: 0.3-50 kbps (not for video, perfect for telemetry)

**Architecture**:
```
End Device (Sensor) → Gateway → Network Server → Application Server
     ↑                      ↓           ↓                ↓
  Sends every          Receives all  Manages        Your code
  10 minutes           local traffic join process   (AWS IoT)
  (low power)          forwards to                  processes data
                       internet
```

**Duty Cycle Limitations**:
Regional radio regulations limit how often devices can transmit (for example, a 1% duty cycle on most EU868 sub-bands). If your message takes 1 second of airtime, the device must stay silent on that sub-band for the next 99 seconds.
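The waiting time follows directly from the duty cycle: off-time = airtime × (1 / duty cycle − 1). A minimal sketch of that arithmetic, with hypothetical function names:

```python
def min_off_time(airtime_seconds: float, duty_cycle: float) -> float:
    """Silent time required after one transmission to stay within the duty cycle."""
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    return airtime_seconds * (1 / duty_cycle - 1)


def max_messages_per_hour(airtime_seconds: float, duty_cycle: float) -> int:
    """How many messages of a given airtime fit into one hour's airtime budget."""
    budget_seconds = 3600 * duty_cycle
    return int(budget_seconds // airtime_seconds)


print(min_off_time(1.0, 0.01))           # 1 s airtime at 1% duty cycle
print(max_messages_per_hour(1.0, 0.01))  # airtime budget is 36 s per hour
```

Since airtime grows with payload size and spreading factor, shrinking the payload is usually the cheapest way to send more often.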

**Adaptive Data Rate (ADR)**:
```python
class LoRaDevice:
    def __init__(self):
        self.spreading_factor = 7  # 7-12 (higher = longer range, slower)
        self.tx_power = 14         # dBm
    
    def optimize_for_conditions(self, snr):
        """
        Simplified ADR: adjust data rate based on signal quality.
        (In a real LoRaWAN network the network server computes these
        settings from recent uplinks and sends them to the device.)
        High SNR (good signal): Lower spreading factor (faster, less power)
        Low SNR (weak signal): Higher spreading factor (slower, more range)
        """
        if snr > 10:
            self.spreading_factor = max(7, self.spreading_factor - 1)
        elif snr < -10:
            self.spreading_factor = min(12, self.spreading_factor + 1)
        
        # Higher spreading factor = lower data rate
        # At 125 kHz bandwidth: SF7 ~5.5 kbps, SF12 ~0.25 kbps
```
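The data rates quoted in the comment can be approximated from the LoRa modulation parameters. The sketch below assumes a 125 kHz channel, a 4/5 coding rate, and low data rate optimization for SF11 and SF12 (which carries two fewer bits per symbol); `lora_bit_rate` is a hypothetical helper, and the result is a nominal physical-layer rate that ignores preamble and header overhead.

```python
def lora_bit_rate(spreading_factor: int,
                  bandwidth_hz: float = 125_000,
                  coding_rate: float = 4 / 5) -> float:
    """Approximate LoRa physical bit rate in bits per second."""
    if not 7 <= spreading_factor <= 12:
        raise ValueError("spreading_factor must be between 7 and 12")

    # Each symbol takes 2**SF chips at one chip per Hz of bandwidth.
    symbols_per_second = bandwidth_hz / (2 ** spreading_factor)

    # Assumption: low data rate optimization is active for SF11/SF12 at
    # 125 kHz, which reduces the bits carried per symbol by two.
    low_data_rate_optimize = spreading_factor >= 11 and bandwidth_hz <= 125_000
    bits_per_symbol = spreading_factor - 2 if low_data_rate_optimize else spreading_factor

    return bits_per_symbol * symbols_per_second * coding_rate


for sf in range(7, 13):
    print(f"SF{sf}: {lora_bit_rate(sf):8.1f} bit/s")
```

Each step up in spreading factor roughly halves the data rate, which is why a distant device on SF12 spends far more airtime per message than a nearby one on SF7.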

---

## **24.3 Time-Series Databases**

IoT devices generate time-stamped data: temperature every minute, vibration every millisecond, GPS every second. Traditional relational databases struggle with high ingestion rates and time-based queries. Time-series databases (TSDB) are optimized for this workload.

### **InfluxDB: The Purpose-Built TSDB**

**Data Model**:
```
Measurement: temperature
Tags (indexed): location=factory1, machine=compressor_a, sensor_id=temp_01
Fields (data): value=75.5, unit=celsius
Timestamp: 2024-01-15T10:30:00Z
```
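On the wire, this data model is written as InfluxDB line protocol: measurement and tags, then fields, then a timestamp. A minimal sketch of that serialization follows; `to_line_protocol` is a hypothetical helper that covers only the common escaping cases, so prefer the client library's `Point` class in real code.

```python
from datetime import datetime, timezone


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Serialize one point as a line of InfluxDB line protocol."""
    def escape_key(text: str) -> str:
        # Tag keys, tag values, and field keys escape commas, equals signs, and spaces
        return text.replace(',', r'\,').replace('=', r'\=').replace(' ', r'\ ')

    def format_field(value) -> str:
        if isinstance(value, bool):
            return 'true' if value else 'false'
        if isinstance(value, int):
            return f'{value}i'  # Integers carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        escaped = str(value).replace('\\', '\\\\').replace('"', '\\"')
        return f'"{escaped}"'   # Strings are double-quoted

    name = measurement.replace(',', r'\,').replace(' ', r'\ ')
    tag_part = ''.join(f',{escape_key(k)}={escape_key(v)}' for k, v in sorted(tags.items()))
    field_part = ','.join(f'{escape_key(k)}={format_field(v)}' for k, v in fields.items())
    return f'{name}{tag_part} {field_part} {timestamp_ns}'


moment = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
line = to_line_protocol(
    'temperature',
    tags={'location': 'factory1', 'machine': 'compressor_a', 'sensor_id': 'temp_01'},
    fields={'value': 75.5, 'unit': 'celsius'},
    timestamp_ns=int(moment.timestamp()) * 1_000_000_000,
)
print(line)
```

Tags are indexed and stored as strings, so values with many distinct settings (such as a raw reading) belong in fields rather than tags.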

**Key Optimizations**:
1. **High write throughput**: Up to roughly 1M points/second on a single node, depending on hardware, write batching, and series cardinality
2. **Compression**: 10:1 ratio vs. CSV (gorilla compression for floats)
3. **Retention policies**: Auto-delete old data or downsample

**Code Example**:
```python
from datetime import datetime, timezone

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(
    url="http://localhost:8086",
    token="my-token",
    org="my-org"
)

write_api = client.write_api(write_options=SYNCHRONOUS)

# Write sensor data
point = Point("temperature") \
    .tag("location", "factory1") \
    .tag("machine", "compressor_a") \
    .field("value", 75.5) \
    .field("unit", "celsius") \
    .time(datetime.now(timezone.utc))

write_api.write(bucket="sensors", record=point)

# Query with time range and aggregation
query_api = client.query_api()
query = '''
    from(bucket: "sensors")
        |> range(start: -1h)                    // Last hour
        |> filter(fn: (r) => r._measurement == "temperature")
        |> filter(fn: (r) => r.machine == "compressor_a")
        |> aggregateWindow(every: 5m, fn: mean) // 5-minute averages
        |> yield(name: "mean")
'''

tables = query_api.query(query)
for table in tables:
    for record in table.records:
        print(f"Time: {record.get_time()}, Temp: {record.get_value()}")

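# Task (the InfluxDB 2.x replacement for continuous queries): downsample raw
# data to hourly averages. Register this Flux script as a task via the UI,
# CLI, or API.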
downsample_query = '''
    option task = {
        name: "downsample_temperature",
        every: 1h,
    }

    from(bucket: "sensors_raw")
        |> range(start: -task.every)
        |> filter(fn: (r) => r._measurement == "temperature")
        |> aggregateWindow(every: 1h, fn: mean)
        |> to(bucket: "sensors_hourly")
'''
```

### **TimescaleDB: PostgreSQL for Time-Series**

If you already use PostgreSQL, TimescaleDB adds time-series capabilities as an extension.

**Hypertables**: Automatically partition data by time:
```sql
-- Create a regular table, then convert it to a hypertable
CREATE TABLE sensor_data (
    time TIMESTAMPTZ NOT NULL,
    device_id TEXT,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION
);

-- Partition by time (chunks of 1 day)
SELECT create_hypertable('sensor_data', 'time', chunk_time_interval => INTERVAL '1 day');

-- Automatic partitioning:
-- Chunk 1: Jan 1-2, Chunk 2: Jan 2-3, etc.
-- Recent chunks stay fast for writes; old chunks are compressed only
-- once you enable compression and add a compression policy
```

**Continuous Aggregation** (like materialized views for time-series):
```sql
-- Create 1-hour rollups for dashboard queries
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    device_id,
    AVG(temperature) as avg_temp,
    MAX(temperature) as max_temp,
    MIN(temperature) as min_temp,
    COUNT(*) as sample_count
FROM sensor_data
GROUP BY bucket, device_id;

-- Auto-refresh every hour
SELECT add_continuous_aggregate_policy('sensor_hourly',
    start_offset => INTERVAL '1 month',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);
```

**Query Performance**:
```sql
-- Fast time-range queries (uses chunk exclusion)
SELECT * FROM sensor_data 
WHERE time > NOW() - INTERVAL '1 hour'
  AND device_id = 'sensor_001';

-- EXPLAIN shows only recent chunks scanned, not entire table
```

### **Choosing Your TSDB**

```
Database       Best For                Query Language     Scaling         Compression
──────────────────────────────────────────────────────────────────────────────────────
InfluxDB       Pure IoT/Monitoring     Flux / InfluxQL    Clustering      High
TimescaleDB    Existing Postgres       Full SQL           Partitioning    Medium
Prometheus     Metrics/Monitoring      PromQL             Federation      High
OpenTSDB       Hadoop ecosystem        HTTP API (no SQL)  HBase-based     Medium
QuestDB        Fast SQL analytics      Full SQL           Horizontal      Medium
```

---

## **24.4 Offline-First Architecture and Data Synchronization**

IoT devices lose connectivity. Ships at sea, sensors in tunnels, devices in rural areas—they must continue working offline and sync when connection returns.

### **The CAP Theorem at the Edge**

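Like any distributed system, an edge deployment faces the CAP trade-off: during a network partition it cannot guarantee both consistency and availability.
- **Consistency**: All copies of data are the same
- **Availability**: Device keeps working offline
- **Partition tolerance**: Network failures are inevitable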

**Edge Choice**: Prioritize **Availability** (offline operation) and **Partition tolerance**, accept **Eventual Consistency**.

### **Conflict-Free Replicated Data Types (CRDTs)**

CRDTs are data structures that can be modified independently on different devices and merged automatically without conflicts.

**Example: G-Counter (Grow-only Counter)**
```python
class GCounter:
    """Increment-only counter that merges correctly"""
    def __init__(self, device_id):
        self.device_id = device_id
        self.payload = {device_id: 0}  # Each device tracks own count
    
    def increment(self):
        self.payload[self.device_id] += 1
    
    def value(self):
        return sum(self.payload.values())
    
    def merge(self, other):
        """Merge another device's counter"""
        for device, count in other.payload.items():
            self.payload[device] = max(
                self.payload.get(device, 0),
                count
            )

# Device A counts 5 events offline
device_a = GCounter("A")
for _ in range(5):
    device_a.increment()

# Device B counts 3 events offline  
device_b = GCounter("B")
for _ in range(3):
    device_b.increment()

# Later, they sync
device_a.merge(device_b)
print(device_a.value())  # 8 (correct, no conflicts)
```
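The merge above converges because taking the per-device maximum is commutative, associative, and idempotent, so replicas can exchange state in any order and any number of times. A small sketch that checks these three properties, working on plain payload dictionaries rather than the class above (the function name `merge` and the sample values are illustrative):

```python
def merge(a: dict, b: dict) -> dict:
    """G-Counter merge on raw payloads: per-device maximum."""
    return {device: max(a.get(device, 0), b.get(device, 0)) for device in a.keys() | b.keys()}


a = {'A': 5}
b = {'B': 3}
c = {'A': 2, 'C': 7}  # A stale view of device A plus a third device

commutative = merge(a, b) == merge(b, a)
associative = merge(merge(a, b), c) == merge(a, merge(b, c))
idempotent = merge(a, a) == a

merged = merge(merge(a, b), c)
print(commutative, associative, idempotent)
print(merged, sum(merged.values()))
```

Note that the stale count for device A in `c` does not lower the merged value, which is exactly why each device may only ever increase its own entry.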

**Example: LWW-Element-Set (Last-Write-Wins Set)**
```python
class LWWSet:
    """Set with add/remove timestamps. Last operation wins."""
    def __init__(self):
        self.add_set = {}  # {element: timestamp}
        self.remove_set = {}  # {element: timestamp}
    
    def add(self, element, timestamp):
        self.add_set[element] = max(
            self.add_set.get(element, 0),
            timestamp
        )
    
    def remove(self, element, timestamp):
        self.remove_set[element] = max(
            self.remove_set.get(element, 0),
            timestamp
        )
    
    def contains(self, element):
        add_time = self.add_set.get(element, 0)
        remove_time = self.remove_set.get(element, 0)
        return add_time > remove_time  # Added after last removal
    
    def merge(self, other):
        for elem, ts in other.add_set.items():
            self.add(elem, ts)
        for elem, ts in other.remove_set.items():
            self.remove(elem, ts)
```

### **Delta Sync Protocols**

Instead of sending entire datasets, send only changes (deltas).
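As a minimal illustration of the idea, the sketch below computes the delta between two snapshots of a device's state and applies it on the receiving side. The function names and the `{'set': ..., 'delete': ...}` delta format are assumptions made for this example, not a standard wire format.

```python
def compute_delta(old: dict, new: dict) -> dict:
    """Describe how to turn `old` into `new` using only the changed keys."""
    changed = {key: value for key, value in new.items() if key not in old or old[key] != value}
    removed = sorted(key for key in old if key not in new)
    return {'set': changed, 'delete': removed}


def apply_delta(state: dict, delta: dict) -> dict:
    """Return a new state with the delta applied; the input is left untouched."""
    result = dict(state)
    result.update(delta['set'])
    for key in delta['delete']:
        result.pop(key, None)
    return result


last_synced = {'temp': 21.5, 'humidity': 40, 'mode': 'auto'}
current = {'temp': 22.0, 'humidity': 40, 'fan': 'on'}

delta = compute_delta(last_synced, current)
print(delta)
print(apply_delta(last_synced, delta) == current)
```

Only `temp`, `fan`, and the removal of `mode` cross the network; the unchanged `humidity` value is never sent.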

**Operational Transformation (like Google Docs)**:
```python
import time

import requests


class DocumentSync:
    def __init__(self, device_id, server_url):
        self.device_id = device_id
        self.server_url = server_url  # e.g. 'https://sync.example.com'
        self.operations = []  # Log of all changes
        self.version = 0
        self.last_synced_version = 0
    
    def local_change(self, operation):
        """User makes a change offline"""
        operation['timestamp'] = time.time()
        operation['version'] = self.version
        self.operations.append(operation)
        self.apply_operation(operation)
        self.version += 1
    
    def pending_operations(self):
        """Local operations the server has not acknowledged yet"""
        return [op for op in self.operations if not op.get('synced')]
    
    def sync_with_server(self):
        """When online, send pending operations"""
        unsynced = self.pending_operations()
        
        if not unsynced:
            return
        
        try:
            response = requests.post(f'{self.server_url}/api/sync', json={
                'device_id': self.device_id,
                'base_version': self.last_synced_version,
                'operations': unsynced
            }, timeout=10)
            response.raise_for_status()
            result = response.json()
            
            # Transform server operations against local pending ones,
            # then apply the transformed version (not the original)
            for server_op in result['operations']:
                transformed = self.transform_against_pending(server_op)
                self.apply_operation(transformed)
            
            # Mark local as synced
            for op in unsynced:
                op['synced'] = True
            
            self.last_synced_version = result['new_version']
            
        except requests.exceptions.RequestException:
            # Stay offline, retry later
            pass
    
    def transform_against_pending(self, server_op):
        """Adjust server operation to account for local pending changes"""
        # If server says "insert 'x' at position 5" 
        # but locally we inserted 2 chars at position 3
        # Transform to "insert 'x' at position 7"
        # transform() and apply_operation() are application-specific
        # and not shown here.
        for local_op in self.pending_operations():
            server_op = transform(server_op, local_op)
        return server_op
```
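The transformation step is the heart of the approach. A minimal sketch for the insert-versus-insert case described in the comments follows; the operation format (`position`, `text`) and the function names are assumptions for illustration, and ties at the same position would need a deterministic rule (such as comparing device IDs) that is omitted here.

```python
def transform_insert(op: dict, applied: dict) -> dict:
    """Shift an insert so it lands where intended after `applied` has already run."""
    shifted = dict(op)
    if applied['position'] <= op['position']:
        # Text inserted at or before our position pushes our target to the right
        shifted['position'] += len(applied['text'])
    return shifted


def apply_insert(text: str, op: dict) -> str:
    return text[:op['position']] + op['text'] + text[op['position']:]


base = '0123456789'
server_op = {'position': 5, 'text': 'x'}
local_op = {'position': 3, 'text': 'ab'}

# Device: local change first, then the transformed server operation
on_device = apply_insert(apply_insert(base, local_op), transform_insert(server_op, local_op))
# Server: server change first, then the transformed local operation
on_server = apply_insert(apply_insert(base, server_op), transform_insert(local_op, server_op))

print(transform_insert(server_op, local_op)['position'])
print(on_device, on_server)
```

Both replicas end up with the same text even though they applied the two operations in opposite orders.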

### **Store-and-Forward Queues**

For devices that are intermittently connected (delivery drones, agricultural sensors):

```python
import json
import sqlite3
import time

class PersistentQueue:
    """SQLite-backed queue for offline message buffering"""
    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self._create_table()
    
    def _create_table(self):
        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS messages (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                topic TEXT NOT NULL,
                payload TEXT NOT NULL,
                qos INTEGER DEFAULT 1,
                timestamp REAL NOT NULL,
                retry_count INTEGER DEFAULT 0
            )
        ''')
        self.conn.commit()
    
    def enqueue(self, topic, payload, qos=1):
        """Store message for later transmission"""
        self.conn.execute(
            'INSERT INTO messages (topic, payload, qos, timestamp) VALUES (?, ?, ?, ?)',
            (topic, json.dumps(payload), qos, time.time())
        )
        self.conn.commit()
    
    def dequeue_for_sync(self, batch_size=100):
        """Get messages to send when online"""
        cursor = self.conn.execute(
            'SELECT id, topic, payload, qos FROM messages ORDER BY timestamp LIMIT ?',
            (batch_size,)
        )
        return cursor.fetchall()
    
    def ack(self, message_id):
        """Remove successfully sent message"""
        self.conn.execute('DELETE FROM messages WHERE id = ?', (message_id,))
        self.conn.commit()
    
    def retry_later(self, message_id):
        """Increment retry count, will be retried on next sync"""
        self.conn.execute(
            'UPDATE messages SET retry_count = retry_count + 1 WHERE id = ?',
            (message_id,)
        )
        self.conn.commit()
    
    def size(self):
        """Number of pending messages"""
        cursor = self.conn.execute('SELECT COUNT(*) FROM messages')
        return cursor.fetchone()[0]

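# Usage in device (mqtt_client is assumed to be a connected paho-mqtt client,
# and is_online() a connectivity check you provide)
queue = PersistentQueue('/data/messages.db')

def publish_sensor_data(data):
    if is_online():
        # MQTT payloads must be str or bytes, so serialize the dict first
        mqtt_client.publish('sensors/data', json.dumps(data), qos=1)
    else:
        queue.enqueue('sensors/data', data)
        print(f"Buffered message. Queue size: {queue.size()}")

def sync_when_online():
    if not is_online():
        return
    
    messages = queue.dequeue_for_sync()
    for msg_id, topic, payload, qos in messages:
        # payload is already a JSON string, so send it as stored
        result = mqtt_client.publish(topic, payload, qos=qos)
        # rc == 0 (MQTT_ERR_SUCCESS) means the client accepted the message;
        # use result.wait_for_publish() if you need broker confirmation
        if result.rc == 0:
            queue.ack(msg_id)
        else:
            print(f"Failed to send {msg_id}, will retry")
            queue.retry_later(msg_id)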
```
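The queue records a `retry_count` for each message, but nothing above uses it yet. One common way to use it is exponential backoff with jitter, so that a fleet of devices coming back online does not retry in lockstep. A minimal sketch follows; `backoff_delay` and its default values are assumptions, not part of the queue above.

```python
import random


def backoff_delay(retry_count: int,
                  base_seconds: float = 1.0,
                  cap_seconds: float = 300.0,
                  rng=random.random) -> float:
    """Seconds to wait before the next attempt (exponential backoff, full jitter)."""
    # Clamp the exponent so very large retry counts cannot overflow
    exponent = min(max(retry_count, 0), 32)
    ceiling = min(cap_seconds, base_seconds * (2 ** exponent))
    # Full jitter: pick a random delay between 0 and the ceiling
    return rng() * ceiling


for attempt in range(6):
    print(attempt, round(backoff_delay(attempt), 2))
```

A sync loop could skip any message whose last attempt is more recent than `backoff_delay(retry_count)`, which would require storing the time of the last attempt alongside the counter.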

---

## **24.5 Security at the Edge**

Edge devices are physically accessible and often deployed in insecure locations. Security requires defense in depth.

### **Device Identity and Attestation**

Each device needs a unique identity, established during manufacturing:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

# load_from_secure_element(), load_certificate(), and load_ca_certificate()
# are placeholders for platform-specific code.

class SecureDevice:
    def __init__(self):
        # Generated during manufacturing, stored in secure element (TPM).
        # With real secure hardware the key never leaves the chip; the object
        # returned here would be a handle that signs on the device's behalf.
        self.private_key = load_from_secure_element()
        self.device_cert = load_certificate()
        self.ca_cert = load_ca_certificate()
    
    def generate_session_token(self, challenge):
        """Prove identity to server using private key"""
        signature = self.private_key.sign(
            challenge.encode(),
            padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
            hashes.SHA256()
        )
        return {
            'device_id': self.get_device_id(),
            'certificate': self.device_cert,
            'signature': signature.hex()
        }
    
    def verify_server(self, server_cert):
        """Verify server certificate against CA"""
        try:
            # Checks issuer name and signature only (cryptography >= 40);
            # validity dates and hostname must be checked separately
            server_cert.verify_directly_issued_by(self.ca_cert)
            return True
        except Exception:
            return False  # Possible man-in-the-middle attack
```
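Signing a challenge only proves identity if the challenge is fresh: a server that accepts the same challenge twice can be fooled by a replayed response. The sketch below shows one way the server side might issue single-use, short-lived challenges. `ChallengeStore` is a hypothetical helper that keeps state in memory; a real deployment would use shared storage and purge expired entries.

```python
import secrets
import time


class ChallengeStore:
    """Issues random challenges that are valid once and only for a short time."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock   # Injectable so expiry can be tested
        self._issued = {}    # challenge -> expiry time

    def issue(self) -> str:
        challenge = secrets.token_hex(32)  # 256 bits of randomness
        self._issued[challenge] = self.clock() + self.ttl_seconds
        return challenge

    def consume(self, challenge: str) -> bool:
        """Return True only the first time a known, unexpired challenge is presented."""
        expiry = self._issued.pop(challenge, None)  # pop makes it single-use
        return expiry is not None and self.clock() <= expiry


store = ChallengeStore(ttl_seconds=30)
challenge = store.issue()
print(store.consume(challenge))  # first use
print(store.consume(challenge))  # replay attempt
```

The server would call `consume()` before verifying the device's signature over that challenge, and reject the request if either check fails.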

### **Over-the-Air (OTA) Updates**

Secure firmware updates are critical but dangerous—bricked devices in remote locations are expensive to fix.

**Safe Update Process**:
```python
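import hashlib
import hmac
import logging

import requests

logger = logging.getLogger(__name__)

# read_current_version(), download(), write_to_partition() and the partition
# and boot helpers below are placeholders for platform-specific code.


class SecurityError(Exception):
    """Raised when a firmware image fails verification"""


def parse_version(version):
    """'1.10.0' -> (1, 10, 0), so versions compare numerically, not as strings.
    Assumes dotted numeric version strings."""
    return tuple(int(part) for part in version.split('.'))


class OTAUpdater:
    def __init__(self):
        self.current_version = read_current_version()
        self.update_partition = '/dev/mmcblk0p2'  # A/B partitioning
    
    def check_for_update(self):
        """Poll for updates when bandwidth available"""
        try:
            metadata = requests.get('https://updates.example.com/latest', timeout=10).json()
            
            if parse_version(metadata['version']) > parse_version(self.current_version):
                self.download_update(metadata)
        except Exception as e:
            logger.error(f"Update check failed: {e}")
    
    def download_update(self, metadata):
        """Download to inactive partition"""
        firmware = download(metadata['url'])
        
        # Integrity check before writing. A hash served next to the firmware
        # only detects corruption; also verify a vendor signature over the
        # metadata before trusting it.
        expected_hash = metadata['sha256']
        actual_hash = hashlib.sha256(firmware).hexdigest()
        
        if not hmac.compare_digest(expected_hash, actual_hash):
            raise SecurityError("Firmware integrity check failed")
        
        # Write to inactive partition (B while A is running)
        write_to_partition(firmware, self.update_partition)
        mark_partition_bootable(self.update_partition)
        
        # Schedule reboot (during low-activity window)
        schedule_reboot()
    
    def rollback_on_failure(self):
        """If new firmware crashes, automatic rollback"""
        if boot_count_since_update() > 3 and not health_check_passed():
            # Mark current partition bad, revert to previous
            mark_partition_bad(current_partition())
            switch_to_previous_partition()
            reboot()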
```
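The rollback rule is easier to reason about as a small state machine that the bootloader evaluates on every boot. The sketch below models it with plain data; `BootState`, `next_boot_state`, and the limit of three failed boots are assumptions for illustration, and a real bootloader would persist this state in flash or a boot environment.

```python
from dataclasses import dataclass

MAX_BOOT_ATTEMPTS = 3  # Assumed limit before giving up on new firmware


@dataclass(frozen=True)
class BootState:
    active_slot: str            # Slot that will be booted next: 'A' or 'B'
    previous_slot: str          # Last known-good slot
    pending_verification: bool  # True until the new firmware passes a health check
    failed_boots: int = 0


def next_boot_state(state: BootState, health_check_passed: bool) -> BootState:
    """Decide what to boot next after one boot of the active slot."""
    if not state.pending_verification:
        return state  # Nothing to verify; keep running the current firmware
    if health_check_passed:
        # Commit the update: the new slot becomes the known-good one
        return BootState(state.active_slot, state.previous_slot, False, 0)
    failed = state.failed_boots + 1
    if failed >= MAX_BOOT_ATTEMPTS:
        # Roll back: boot the previous slot and stop retrying the new one
        return BootState(state.previous_slot, state.active_slot, False, 0)
    return BootState(state.active_slot, state.previous_slot, True, failed)


state = BootState(active_slot='B', previous_slot='A', pending_verification=True)
for _ in range(3):
    state = next_boot_state(state, health_check_passed=False)
print(state)
```

Keeping the decision pure (state in, state out) makes it possible to test every path on a workstation before the logic ever runs on a device.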

---

## **24.6 Key Takeaways**

1. **Edge computing reduces latency and bandwidth**: Process data locally, send only summaries to cloud. Critical for real-time control systems.

2. **MQTT is the lingua franca of IoT**: Lightweight pub-sub with QoS levels for unreliable networks. Use topic hierarchies for clean architecture.

3. **Time-series databases handle high ingestion**: InfluxDB and TimescaleDB optimize for time-range queries and high write throughput with compression.

4. **Design for offline operation**: CRDTs and delta sync allow devices to work disconnected and merge changes seamlessly when reconnected.

5. **Security starts at manufacturing**: Device identity in hardware secure elements, signed firmware updates, and mutual TLS prevent physical and network attacks.

6. **Protocol choice depends on constraints**: MQTT for reliable delivery, CoAP for ultra-low power, LoRaWAN for long-range low-bandwidth scenarios.

---

## **Chapter Summary**

This chapter explored the unique challenges of edge computing and IoT. We architected three-tier systems that process data locally while maintaining cloud visibility, implemented MQTT and CoAP for device communication, and optimized storage with time-series databases. We solved the offline synchronization problem using CRDTs and persistent queues, and established security practices for physically vulnerable devices.

The edge is the new frontier of computing—billions of devices generating exabytes of data. Success requires respecting constraints (power, bandwidth, connectivity) while maintaining system reliability and security.

**Coming up next**: In Chapter 25, we'll cover the final section—The System Design Interview—focusing on interview strategy, communication techniques, and mock interview walkthroughs to help you demonstrate these architectural skills in high-pressure interview settings.

---

## **Exercises**

1. **Edge Architecture Design**: Design a system for smart street lights:
   - 10,000 lights across a city
   - Must respond to motion sensors in <50ms (local processing)
   - Report energy usage hourly to cloud
   - Continue operating if internet down
   - Draw the architecture diagram and specify protocols (MQTT topics, database choice)

2. **MQTT Topic Design**: Create a topic hierarchy for a multi-tenant building automation system with:
   - 50 buildings
   - 20 floors per building
   - 100 rooms per floor
   - 5 sensors per room (temp, humidity, occupancy, light, CO2)
   Design topics that allow:
   - Subscribing to all sensors in one room
   - Subscribing to all temperature sensors in one building
   - Commands to specific actuators (HVAC, lights)

3. **Time-Series Optimization**: You have 1 million sensors sending data every 10 seconds. Calculate:
   - Daily data points (8.64 billion)
   - Storage required for 1 year at 100 bytes per point (raw)
   - Storage with InfluxDB compression (10:1 ratio)
   - Cost difference between keeping 1 year raw vs. 1 week raw + 1 year downsampled (hourly averages only)

4. **CRDT Implementation**: Implement a PN-Counter (Positive-Negative Counter) that supports both increments and decrements, merging correctly across devices. Demonstrate with Device A incrementing 5 times and decrementing 2, while Device B increments 3 times and decrements 1.

5. **Offline Sync Strategy**: Design a synchronization protocol for a smart inventory system where warehouse scanners operate offline in the basement (no WiFi) and sync when brought upstairs. Handle conflicts where two scanners update the same item's count while offline.

