# Chapter 82: Event-Driven Architecture

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the principles of event‑driven architecture (EDA) and how it complements microservices.
- Distinguish between event notifications, event‑carried state transfer, and event sourcing.
- Design events and topics for a time‑series prediction system (e.g., NEPSE data updates, feature changes, model retraining triggers).
- Implement event producers and consumers using Apache Kafka.
- Apply stream processing to transform event streams in real time.
- Understand the concepts of event sourcing and Command Query Responsibility Segregation (CQRS).
- Handle event ordering, idempotency, and exactly‑once processing.
- Integrate event‑driven patterns with the monitoring and alerting system from Chapter 73.
- Evaluate the benefits and challenges of event‑driven architectures for time‑series systems.

---

## **82.1 Introduction to Event‑Driven Architecture**

Event‑driven architecture (EDA) is a software architecture pattern in which components communicate by producing and consuming events. An event is a significant change in state – for example, “new NEPSE data arrived”, “feature vector computed”, “model retrained”. Events are captured as messages and published to an event broker (like Apache Kafka, RabbitMQ, or AWS Kinesis). Other services subscribe to relevant events and react accordingly.

EDA is a natural fit for time‑series prediction systems because:

- Data arrives continuously (e.g., daily CSV, real‑time sensor readings).
- Multiple downstream processes depend on the same data (feature engineering, monitoring, alerting).
- Different components have different latency requirements (e.g., real‑time predictions vs. batch model retraining).
- It decouples producers and consumers, making the system more scalable and resilient.

In the context of our NEPSE system, we can use events to:

- Notify the feature service when new raw data is available.
- Trigger model retraining when enough new data has accumulated.
- Send predictions to a monitoring service for drift detection.
- Alert operators when anomalies are detected (as in Chapter 73).

In this chapter, we will design an event‑driven version of the NEPSE system, building on the microservices introduced in Chapter 81.

---

## **82.2 Core Concepts of Event‑Driven Architecture**

### **82.2.1 Events**
An event is a record of something that happened. It typically contains:

- **Event type** (e.g., `DataIngested`, `FeaturesComputed`, `PredictionMade`).
- **Timestamp** (when the event occurred).
- **Payload** (data relevant to the event, e.g., symbol, date, values).
- **Metadata** (e.g., version, producer ID).

Events are immutable facts. Once recorded, they should not be changed.
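
The four fields above map naturally onto a small immutable record. The following is a minimal sketch rather than a reference implementation: the `Event` class, its field names, and the `to_json` helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen=True rejects attribute reassignment after creation
class Event:
    event_type: str
    payload: dict
    producer: str
    # Default to the current UTC time in ISO 8601 format
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise the event to a JSON string for publishing."""
        return json.dumps(asdict(self))


event = Event(
    event_type="DataIngested",
    payload={"symbol": "NEPSE", "date": "2023-06-01"},
    producer="ingestion-service",
    metadata={"version": 1},
)
```

Note that `frozen=True` only prevents reassigning fields; the `payload` dictionary itself can still be mutated, so treating events as immutable remains a convention the code has to respect.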

### **82.2.2 Event Broker**
The event broker is the backbone of EDA. It receives events from producers, stores them durably, and makes them available to consumers. Apache Kafka is a popular choice because it provides:

- High throughput, low latency.
- Persistent storage (events can be replayed).
- Partitioning for scalability.
- Exactly‑once semantics (with appropriate configuration).

### **82.2.3 Producers and Consumers**
- **Producer**: A service that creates events and publishes them to the broker.
- **Consumer**: A service that subscribes to certain event types and processes them.

A service can be both a producer and a consumer (e.g., the feature service consumes `DataIngested` events and produces `FeaturesComputed` events).

### **82.2.4 Topics and Partitions**
Events are organised into **topics** (e.g., `raw-data`, `features`, `predictions`). Topics are divided into **partitions** to allow parallel processing. Events with the same key (e.g., symbol) are sent to the same partition, preserving order.
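
To see why a shared key preserves per‑symbol order, it helps to look at how a key is mapped to a partition. Kafka's default partitioner hashes the key bytes and takes the result modulo the partition count. The sketch below uses `zlib.crc32` as a stand‑in hash, so the partition numbers it produces will not match a real cluster; `partition_for` and the extra symbols are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a deterministic hash (crc32 stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every event keyed by the same symbol lands in the same partition,
# so a single consumer sees that symbol's events in publish order.
symbols = ["NEPSE", "NABIL", "NEPSE", "NTC", "NEPSE"]
assignments = [(s, partition_for(s, num_partitions=3)) for s in symbols]
```

Because the mapping depends on the partition count, changing the number of partitions later can move a key to a different partition.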

### **82.2.5 Event Patterns**
There are several common patterns:

- **Event Notification**: A simple signal that something happened; the consumer may need to fetch more data (e.g., “new data available”, after which the consumer calls an API to get it).
- **Event‑Carried State Transfer**: The event contains all the data needed; the consumer can update its own state without further requests (e.g., “feature vector computed” with the vector included).
- **Event Sourcing**: All changes to application state are stored as a sequence of events. The current state can be reconstructed by replaying events.

For the NEPSE system, we will use a mix: event‑carried state transfer for features (so the prediction service doesn’t need to recompute them), and event notification for new raw data and retraining triggers.
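
The difference between the first two patterns shows up in what a consumer has to do on receipt. The handlers below are a sketch under assumptions: `fetch_raw_data` is a hypothetical stand‑in for an API or object‑store read, and the handler names are illustrative.

```python
def fetch_raw_data(location: str) -> dict:
    """Hypothetical stand-in for an API call or object-store read."""
    return {"location": location, "rows": []}


def on_data_ingested(event: dict) -> dict:
    """Event notification: the event only points at the data, so fetch it."""
    return fetch_raw_data(event["payload"]["data_location"])


def on_features_computed(event: dict, feature_cache: dict) -> None:
    """Event-carried state transfer: the payload alone updates local state."""
    payload = event["payload"]
    feature_cache[(payload["symbol"], payload["date"])] = payload["features"]
```

The notification handler keeps events small but adds a round trip and a runtime dependency on the data source; the state‑transfer handler avoids both at the cost of larger messages.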

---

## **82.3 Designing Events for the NEPSE System**

Let's define the key events in our system.

### **82.3.1 `DataIngested`**
Produced by the Data Ingestion Service when new raw data is available.

```json
{
  "event_type": "DataIngested",
  "timestamp": "2023-06-01T12:00:00Z",
  "producer": "ingestion-service",
  "payload": {
    "symbol": "NEPSE",
    "date": "2023-06-01",
    "data_location": "s3://nepse-raw/2023-06-01.parquet",
    "row_count": 350
  }
}
```

### **82.3.2 `FeaturesComputed`**
Produced by the Feature Service after computing features for a given symbol and date. Contains the full feature vector (event‑carried state transfer).

```json
{
  "event_type": "FeaturesComputed",
  "timestamp": "2023-06-01T12:05:00Z",
  "producer": "feature-service",
  "payload": {
    "symbol": "NEPSE",
    "date": "2023-06-01",
    "features": {
      "Close_Lag_1": 1250.5,
      "SMA_20": 1245.2,
      "RSI": 62.3,
      "Volume_Z_Score": 1.2,
      ...
    }
  }
}
```

### **82.3.3 `PredictionMade`**
Produced by the Prediction Service each time a prediction is made. Used for monitoring and drift detection.

```json
{
  "event_type": "PredictionMade",
  "timestamp": "2023-06-01T12:10:00Z",
  "producer": "prediction-service",
  "payload": {
    "symbol": "NEPSE",
    "date": "2023-06-01",
    "predicted_close": 1275.3,
    "model_version": "v2.3.1",
    "features_used": ["Close_Lag_1", "SMA_20", "RSI", ...]
  }
}
```

### **82.3.4 `ActualAvailable`**
Produced later when the actual closing price becomes known (e.g., next day). Used to compute error.

```json
{
  "event_type": "ActualAvailable",
  "timestamp": "2023-06-02T12:00:00Z",
  "producer": "ingestion-service",
  "payload": {
    "symbol": "NEPSE",
    "date": "2023-06-01",
    "actual_close": 1260.2
  }
}
```

### **82.3.5 `ModelRetrainingTriggered`**
Produced by a scheduler or a monitoring service when conditions for retraining are met (e.g., on a weekly schedule or after 100 new data points).

```json
{
  "event_type": "ModelRetrainingTriggered",
  "timestamp": "2023-06-01T12:00:00Z",
  "producer": "retraining-scheduler",
  "payload": {
    "reason": "weekly_schedule",
    "data_end_date": "2023-06-01"
  }
}
```

### **82.3.6 `ModelTrained`**
Produced by the Training Service after a new model is trained and registered.

```json
{
  "event_type": "ModelTrained",
  "timestamp": "2023-06-01T14:00:00Z",
  "producer": "training-service",
  "payload": {
    "model_version": "v2.4.0",
    "performance": {"mae": 12.3, "rmse": 18.7},
    "features_used": ["Close_Lag_1", "SMA_20", "RSI", "Volume_Z_Score"],
    "artifact_location": "s3://nepse-models/v2.4.0/model.pkl"
  }
}
```
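
Since every event above shares the same envelope, a consumer can reject malformed messages before processing them. The check below is a minimal sketch: `validate_event` is a hypothetical helper, and the choice of which payload fields count as required is an assumption based on the examples in this section.

```python
REQUIRED_ENVELOPE_FIELDS = ("event_type", "timestamp", "producer", "payload")

# Assumed required payload fields, taken from the example events above
REQUIRED_PAYLOAD_FIELDS = {
    "DataIngested": ("symbol", "date", "data_location"),
    "FeaturesComputed": ("symbol", "date", "features"),
    "PredictionMade": ("symbol", "date", "predicted_close", "model_version"),
    "ActualAvailable": ("symbol", "date", "actual_close"),
    "ModelRetrainingTriggered": ("reason", "data_end_date"),
    "ModelTrained": ("model_version", "artifact_location"),
}


def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event looks valid."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_ENVELOPE_FIELDS
        if name not in event
    ]
    if problems:
        return problems
    event_type = event["event_type"]
    if event_type not in REQUIRED_PAYLOAD_FIELDS:
        return [f"unknown event_type: {event_type}"]
    payload = event["payload"]
    return [
        f"missing payload field: {name}"
        for name in REQUIRED_PAYLOAD_FIELDS[event_type]
        if name not in payload
    ]
```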

---

## **82.4 Implementing an Event Broker with Apache Kafka**

We'll use Apache Kafka as the event broker. For local development, we can run Kafka using Docker Compose. Here's a minimal `docker-compose.yml`:

```yaml
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0  # pinned so the setup stays reproducible
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 2181:2181

  kafka:
    image: confluentinc/cp-kafka:7.5.0  # pinned to a ZooKeeper-compatible release; Confluent Platform 8.x images are KRaft-only
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```

Start with `docker-compose up -d`.

### **82.4.1 Producing Events in Python**

We'll use the `kafka-python` library. Install with `pip install kafka-python`.

```python
from kafka import KafkaProducer
import json

class EventProducer:
    def __init__(self, bootstrap_servers='localhost:9092'):
        self.producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,
            value_serializer=lambda v: json.dumps(v).encode('utf-8'),
            key_serializer=lambda k: k.encode('utf-8') if k else None
        )
    
    def publish(self, topic, event, key=None):
        """
        Publish an event to a topic.
        key is optional; if provided, ensures events with same key go to same partition.
        """
        future = self.producer.send(topic, value=event, key=key)
        result = future.get(timeout=10)  # wait for acknowledgement
        return result

# Example usage
producer = EventProducer()
event = {
    "event_type": "DataIngested",
    "timestamp": "2023-06-01T12:00:00Z",
    "producer": "ingestion-service",
    "payload": {
        "symbol": "NEPSE",
        "date": "2023-06-01",
        "data_location": "s3://nepse-raw/2023-06-01.parquet"
    }
}
producer.publish("raw-data", event, key="NEPSE")
```

### **82.4.2 Consuming Events**

```python
from kafka import KafkaConsumer
import json

class EventConsumer:
    def __init__(self, topics, bootstrap_servers='localhost:9092', group_id='my-group'):
        self.consumer = KafkaConsumer(
            *topics,
            bootstrap_servers=bootstrap_servers,
            group_id=group_id,
            value_deserializer=lambda m: json.loads(m.decode('utf-8')),
            key_deserializer=lambda m: m.decode('utf-8') if m else None,
            auto_offset_reset='earliest',  # start from beginning if no offset
            enable_auto_commit=True
        )
    
    def consume(self, handler):
        for message in self.consumer:
            handler(message.topic, message.key, message.value)

# Example handler
def handle_event(topic, key, event):
    print(f"Received event on {topic}: key={key}, type={event['event_type']}")
    # Process event...

consumer = EventConsumer(topics=['raw-data', 'features'])
consumer.consume(handle_event)
```
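
When one consumer subscribes to several topics, a single handler quickly turns into a chain of `if` statements. One way to avoid that is to route on `event_type`. The `EventDispatcher` below is a hypothetical helper, sketched so that its `dispatch` method has the same `(topic, key, event)` signature as the handler shown above.

```python
from collections import defaultdict


class EventDispatcher:
    """Route events to the handlers registered for their event_type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def dispatch(self, topic, key, event):
        """Call every matching handler and return how many were invoked."""
        handlers = self._handlers.get(event.get("event_type"), [])
        for handler in handlers:
            handler(topic, key, event)
        return len(handlers)


dispatcher = EventDispatcher()
dispatcher.register(
    "DataIngested",
    lambda topic, key, event: print(f"New raw data for {key} on {topic}"),
)
```

Under these assumptions, the dispatcher could be passed to the consumer as `consumer.consume(dispatcher.dispatch)`; events with no registered handler are silently ignored.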

---

## **82.5 Stream Processing**

Sometimes we need to transform or enrich event streams in real time. This is **stream processing**. For example, we might want to compute a running average of prediction errors from `PredictionMade` and `ActualAvailable` events.

Apache Kafka provides **Kafka Streams** (Java), but for Python we can use **Faust** (now maintained by the community as `faust-streaming`) or a simpler approach with consumers and producers.
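
The running average of prediction errors mentioned above does not require keeping the full error history: it can be updated incrementally as each event arrives. The `RunningMean` class is an illustrative sketch, not part of any Kafka library.

```python
class RunningMean:
    """Incrementally track the mean of a stream of values."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        """Fold one new value into the mean and return the updated mean."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


mae_tracker = RunningMean()
for abs_error in (10.0, 20.0, 30.0):  # illustrative absolute errors
    mae_tracker.update(abs_error)
```

The only state is a count and a mean, which keeps the service's memory use constant and makes the state cheap to checkpoint.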

### **82.5.1 Example: Computing Prediction Error Stream**

We'll create a service that consumes `PredictionMade` and `ActualAvailable` events, joins them by symbol and date, and produces an `ErrorComputed` event.

```python
# error_computation_service.py
import json
from datetime import datetime, timezone

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    'predictions', 'actuals',
    bootstrap_servers='localhost:9092',
    group_id='error-computation',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    key_deserializer=lambda m: m.decode('utf-8') if m else None,
    enable_auto_commit=False  # offsets are committed manually below
)

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    # Keys are strings here, so they must be encoded before sending
    key_serializer=lambda k: k.encode('utf-8') if k else None
)

# Store predictions and actuals in a simple dictionary (in production, use a state store)
predictions = {}
actuals = {}

for msg in consumer:
    topic = msg.topic
    event = msg.value
    key = msg.key
    payload = event['payload']
    pair = (payload.get('symbol'), payload.get('date'))  # join key shared by both topics
    if topic == 'predictions':
        predictions[pair] = payload['predicted_close']
    elif topic == 'actuals':
        actuals[pair] = payload['actual_close']

    # If we have both for a given key, compute error
    if pair in predictions and pair in actuals:
        # Remove the matched pair from the buffers as it is consumed
        pred = predictions.pop(pair)
        actual = actuals.pop(pair)
        error = pred - actual
        abs_error = abs(error)
        pct_error = (error / actual) * 100 if actual != 0 else None

        symbol, date = pair
        error_event = {
            'event_type': 'ErrorComputed',
            # ISO 8601 timestamp, consistent with the other events
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'producer': 'error-computation',
            'payload': {
                'symbol': symbol,
                'date': date,
                'predicted': pred,
                'actual': actual,
                'error': error,
                'abs_error': abs_error,
                'pct_error': pct_error
            }
        }
        producer.send('errors', value=error_event, key=key)
        producer.flush()  # block until the event has been sent before committing

    # Commit only after the message has been fully handled (at-least-once delivery)
    consumer.commit()
```

**Explanation:**

- This service consumes from two topics, `predictions` and `actuals`.
- It stores each event in a local dictionary (in a real system, you'd use a persistent state store like RocksDB to handle restarts).
- When both a prediction and actual are available for the same (symbol, date), it computes the error and publishes an `ErrorComputed` event.
- This event can then be consumed by the monitoring service to track model performance and trigger alerts.
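
One gap in the dictionary approach is that a prediction whose actual never arrives stays in memory indefinitely. A size‑capped buffer is one way to bound that state. The `BoundedBuffer` class below is a hypothetical sketch, and evicting the oldest entry first is an assumed policy; a time‑based expiry would work equally well.

```python
from collections import OrderedDict


class BoundedBuffer:
    """Keep at most max_entries pending values, evicting the oldest first."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)  # treat an overwrite as the newest entry
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the oldest pending entry

    def pop(self, key, default=None):
        return self._items.pop(key, default)

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)
```

Evicted predictions can no longer be joined, so the cap should comfortably exceed the number of pairs expected to be pending at once.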

---

## **82.6 Event Sourcing and CQRS**

**Event Sourcing** is a pattern where state changes are stored as a sequence of events. Instead of storing the current state, you store all events, and the current state is derived by replaying them. This provides a complete audit log and allows rebuilding state at any point in time.

**CQRS (Command Query Responsibility Segregation)** separates the write side (commands) from the read side (queries). Often used with event sourcing: commands produce events, and the read side builds projections from those events.

For the NEPSE system, we might use event sourcing for the feature store. Instead of storing the latest feature vector, we store every `FeaturesComputed` event. When we need the current features, we replay the events for that symbol to get the latest. This adds complexity but gives a complete history.

A simpler approach is to use **event‑carried state transfer** (as we did with `FeaturesComputed`) and store the latest state in a database. This is more practical for most use cases.
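
To make the replay idea concrete, the sketch below rebuilds a read‑side projection (the latest feature vector per symbol) from a list of `FeaturesComputed` events. The function name, the `as_of` parameter, and the feature values are illustrative assumptions, and the events are assumed to arrive in log order.

```python
def project_latest_features(events, as_of=None):
    """Replay events in log order and keep the newest feature vector per symbol."""
    latest = {}
    for event in events:
        if event["event_type"] != "FeaturesComputed":
            continue
        payload = event["payload"]
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings
        if as_of is not None and payload["date"] > as_of:
            continue  # ignore events after the requested point in time
        latest[payload["symbol"]] = {
            "date": payload["date"],
            "features": payload["features"],
        }
    return latest


event_log = [
    {"event_type": "FeaturesComputed",
     "payload": {"symbol": "NEPSE", "date": "2023-05-31", "features": {"SMA_20": 1243.9}}},
    {"event_type": "FeaturesComputed",
     "payload": {"symbol": "NEPSE", "date": "2023-06-01", "features": {"SMA_20": 1245.2}}},
]
current = project_latest_features(event_log)
historical = project_latest_features(event_log, as_of="2023-05-31")
```

In CQRS terms, the event log is the write side and the dictionary returned here is a read‑side projection that can be discarded and rebuilt at any time.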

---

## **82.7 Handling Event Ordering, Idempotency, and Exactly‑Once Processing**

### **82.7.1 Ordering**
Kafka preserves order **within a partition**, and events that share a key are routed to the same partition. For example, if we set the key to `symbol`, all events for NEPSE will go to the same partition, and consumers will see them in the order they were written. There is no ordering guarantee across partitions. Per‑key ordering is important for stateful operations (e.g., computing features from raw data, where order matters).

### **82.7.2 Idempotency**
Consumers may receive the same event multiple times (e.g., after a rebalance). Processing must be idempotent: applying the same event twice should have the same effect as applying it once. For example, when storing a feature vector, use an upsert operation keyed by (symbol, date). If the event is replayed, it simply overwrites the existing value.
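
The upsert keyed by (symbol, date) can be sketched with the standard‑library `sqlite3` module. The table layout and the `store_features` helper are assumptions for illustration; a production feature store would likely use a different database, but the principle is the same.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute(
    "CREATE TABLE features ("
    "symbol TEXT, date TEXT, vector TEXT, PRIMARY KEY (symbol, date))"
)


def store_features(event: dict) -> None:
    """Insert or overwrite the feature vector for (symbol, date)."""
    payload = event["payload"]
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO features (symbol, date, vector) VALUES (?, ?, ?)",
            (payload["symbol"], payload["date"], json.dumps(payload["features"])),
        )


event = {"payload": {"symbol": "NEPSE", "date": "2023-06-01", "features": {"RSI": 62.3}}}
store_features(event)
store_features(event)  # a redelivered event leaves the table unchanged
```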

### **82.7.3 Exactly‑Once Processing**
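Exactly‑once semantics guarantee that each event is processed exactly once, even in the face of failures. Kafka supports exactly‑once semantics for streams through idempotent producers and transactions, but it requires careful configuration (`enable.idempotence`, `transactional.id`) and covers only reads and writes within Kafka; side effects in external systems still need idempotent handling. For many applications, at‑least‑once with idempotent processing is sufficient.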
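
When the handler itself cannot be made idempotent, at‑least‑once delivery can be combined with duplicate suppression. Offsets increase monotonically within a partition, so remembering the highest processed offset per partition is enough to detect a redelivery. The `OffsetGuard` class below is a hypothetical sketch that keeps this state in memory; a real service would need to persist it together with the handler's side effects.

```python
class OffsetGuard:
    """Skip messages whose offset was already processed for their partition."""

    def __init__(self):
        self._last_offset = {}  # (topic, partition) -> highest processed offset

    def process(self, topic, partition, offset, event, handler):
        """Run handler(event) unless this offset was seen before; return True if run."""
        position = (topic, partition)
        if offset <= self._last_offset.get(position, -1):
            return False  # duplicate delivery, already handled
        handler(event)
        self._last_offset[position] = offset
        return True
```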

---

## **82.8 Integrating with the Alerting System (Chapter 73)**

Our event‑driven system can feed directly into the alerting framework. For example, the monitoring service can consume `ErrorComputed` events and, if the error exceeds a threshold, trigger an alert via the `AlertManager`.

```python
# monitoring_service.py (partial)
from alerting import AlertManager, AlertRule

alert_manager = AlertManager()
# ... register channels and rules ...

def handle_error_event(event):
    payload = event['payload']
    abs_error = payload['abs_error']
    symbol = payload['symbol']
    if abs_error > 50:  # threshold
        alert_manager.process_row({
            'symbol': symbol,
            'abs_error': abs_error,
            'timestamp': event['timestamp']
        })

# In the consumer loop, call the handler for each ErrorComputed event, e.g.:
# handle_error_event(message.value)
```

---

## **82.9 Case Study: Event‑Driven NEPSE System**

Let's sketch the complete event flow for the NEPSE system:

1. **Data Ingestion Service** reads the new CSV and publishes `DataIngested` events to the `raw-data` topic.
2. **Feature Service** consumes `raw-data`, computes features, and publishes `FeaturesComputed` to the `features` topic.
3. **Prediction Service** consumes `features` (event‑carried state transfer), retrieves the current model from the registry, predicts, and publishes `PredictionMade` to the `predictions` topic.
4. **Monitoring Service** consumes `predictions` and, later, `ActualAvailable` events (published by ingestion to the `actuals` topic), computes errors, and publishes `ErrorComputed` to the `errors` topic.
5. **Alerting** consumes `errors` and triggers notifications when thresholds are breached.
6. **Retraining Scheduler** (a cron job) periodically publishes `ModelRetrainingTriggered` to the `training` topic.
7. **Training Service** consumes `training` and `features` (for training data), trains a new model, and publishes `ModelTrained` to the `models` topic.
8. **Prediction Service** subscribes to `models` to be notified when a new model is available; it can then load the new model for future predictions.

All services are loosely coupled, communicating only through events. This makes the system scalable and resilient.
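
The flow above can be captured as data, which makes it easy to check who publishes where and who listens. The mapping below is a sketch: the topic names come from this chapter, while the consumer service names (`monitoring-service`, `alerting-service`, and so on) and the helper functions are assumptions.

```python
# Which topic each event type is published to
TOPIC_FOR_EVENT = {
    "DataIngested": "raw-data",
    "FeaturesComputed": "features",
    "PredictionMade": "predictions",
    "ActualAvailable": "actuals",
    "ErrorComputed": "errors",
    "ModelRetrainingTriggered": "training",
    "ModelTrained": "models",
}

# Which topics each service subscribes to (service names are illustrative)
SUBSCRIPTIONS = {
    "feature-service": ["raw-data"],
    "prediction-service": ["features", "models"],
    "monitoring-service": ["predictions", "actuals"],
    "alerting-service": ["errors"],
    "training-service": ["training", "features"],
}


def topic_for(event: dict) -> str:
    """Look up the destination topic for an event by its type."""
    return TOPIC_FOR_EVENT[event["event_type"]]


def consumers_of(topic: str) -> list:
    """List the services subscribed to a topic, in alphabetical order."""
    return sorted(name for name, topics in SUBSCRIPTIONS.items() if topic in topics)
```

Keeping this routing in one place means a producer never needs to know which services consume its events.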

---

## **82.10 Benefits and Challenges of Event‑Driven Architecture**

### **Benefits**
- **Decoupling**: Services evolve independently.
- **Scalability**: Each component can scale based on its load; event partitioning allows parallel processing.
- **Resilience**: If a consumer fails, events are still in the broker and can be replayed.
- **Auditability**: The event log provides a complete history of what happened.
- **Real‑time reactions**: Events can trigger immediate downstream processing.

### **Challenges**
- **Complexity**: More moving parts; requires good monitoring and debugging tools.
- **Eventual consistency**: Systems are eventually consistent, which may be hard to reason about.
- **Message ordering**: Ensuring correct order across partitions requires careful design.
- **Idempotency**: Consumers must handle duplicate events gracefully.
- **Schema evolution**: Events change over time; need to manage versions (e.g., with Avro or Protobuf).
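
For the schema evolution challenge, one common technique is to "upcast" old events to the newest shape when they are read, so handlers only deal with one version. The sketch below is purely illustrative: it assumes a hypothetical version 1 of `PredictionMade` that lacked `model_version`, and it assumes the schema version is stored under `metadata.version`.

```python
import copy


def upcast_prediction_made(event: dict) -> dict:
    """Upgrade a hypothetical v1 PredictionMade event to a hypothetical v2 shape."""
    version = event.get("metadata", {}).get("version", 1)
    if version >= 2:
        return event  # already current, nothing to do
    upgraded = copy.deepcopy(event)  # never mutate the stored event
    upgraded["payload"].setdefault("model_version", "unknown")
    upgraded.setdefault("metadata", {})["version"] = 2
    return upgraded
```

Upcasting keeps old events replayable without rewriting the log, at the cost of maintaining one small conversion function per schema change.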

---

## **82.11 Best Practices**

- **Define clear event schemas** and version them. Use schema registries (e.g., Confluent Schema Registry) to enforce compatibility.
- **Use keys for ordering** when needed (e.g., symbol).
- **Make events self‑contained** (event‑carried state transfer) when possible to reduce coupling.
- **Design for idempotency** – every consumer should be able to process the same event twice without ill effect.
- **Monitor consumer lag** – if consumers fall behind, alerts should fire.
- **Test with failure scenarios** – kill a consumer, restart a broker, and ensure the system recovers.
- **Use dead letter queues** for events that cannot be processed.
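
The dead letter queue practice can be sketched as a small wrapper around any handler. Everything here is an assumption made for illustration: the function name, the `dead-letter` topic name, the retry count, and the `publish` callable, which is expected to accept `(topic, event)` in the same order as `EventProducer.publish`.

```python
def process_with_dead_letter(event, handler, publish,
                             max_attempts=3, dlq_topic="dead-letter"):
    """Try the handler up to max_attempts times, then route the event to the DLQ."""
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(event)
            return True  # processed successfully
        except Exception as exc:  # broad on purpose: any failure counts as a retry
            last_error = exc
    # Keep the original event and the failure reason for later inspection
    publish(dlq_topic, {
        "original_event": event,
        "error": repr(last_error),
        "attempts": max_attempts,
    })
    return False
```

Routing a failing event aside lets the consumer keep making progress on the rest of the partition instead of blocking on one bad message.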

---

## **Chapter Summary**

In this chapter, we explored event‑driven architecture and its application to time‑series prediction systems. We defined key events for the NEPSE system, implemented producers and consumers with Apache Kafka, and demonstrated stream processing to compute prediction errors. We discussed event sourcing, CQRS, and the importance of idempotency and ordering. By integrating events with our alerting framework, we outlined a loosely coupled, scalable system. Event‑driven architecture is a powerful pattern for systems that need to react to continuous data streams, and it complements the microservices approach from Chapter 81.

In the next chapter, we will delve into **Multi‑Model Systems**, where multiple prediction models are combined or routed dynamically.

---

**End of Chapter 82**