# Chapter 81: Microservices Architecture

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the principles of microservices architecture and how they apply to time‑series prediction systems.
- Decompose a monolithic prediction system into loosely coupled, independently deployable services.
- Design service boundaries based on domain‑driven design and business capabilities.
- Implement inter‑service communication using synchronous (REST, gRPC) and asynchronous (message queues, events) patterns.
- Manage data consistency and transactions across microservices.
- Handle service discovery, load balancing, and resilience (circuit breakers, retries, timeouts).
- Apply containerisation and orchestration (Docker, Kubernetes) to microservices.
- Monitor and trace requests across distributed services.
- Evaluate the trade‑offs between microservices and monoliths for time‑series systems.

---

## **81.1 Introduction to Microservices Architecture**

Microservices architecture is an approach to software development in which a single application is composed of many loosely coupled, independently deployable services. Each service runs in its own process and communicates with others through well‑defined APIs. This contrasts with a monolithic architecture, where all components are bundled together.

For a time‑series prediction system like the NEPSE stock predictor, a monolithic design might bundle all of the following into one codebase:

- Data ingestion
- Feature engineering
- Model training
- Prediction API
- Monitoring and alerting

As the system grows, a single codebase becomes difficult to maintain, scale, and deploy. Microservices allow each component to be developed, scaled, and deployed independently, which is especially valuable when different parts have different resource requirements (e.g., model training needs GPUs, while the prediction API needs low latency).

However, microservices introduce complexity: network latency, distributed data management, service discovery, and fault tolerance must be handled explicitly.

In this chapter, we will design a microservices architecture for the NEPSE prediction system and implement key patterns.

---

## **81.2 Core Principles of Microservices**

Before diving into implementation, let's review the guiding principles:

- **Single Responsibility**: Each service should do one thing well (e.g., a Feature Service that only serves feature vectors).
- **Loose Coupling**: Services should have minimal knowledge of each other; changes to one should not require changes to others.
- **High Cohesion**: Related functionality should be grouped together.
- **Independent Deployability**: Each service can be deployed, scaled, and updated independently.
- **Decentralised Data Management**: Each service owns its database; no shared database.
- **Infrastructure Automation**: Continuous integration and deployment pipelines are essential.
- **Design for Failure**: Services must handle failures gracefully (retries, circuit breakers, fallbacks).
- **Observability**: Logs, metrics, and traces must be aggregated to understand system behaviour.

For the NEPSE system, we can identify candidate services:

- **Data Ingestion Service**: Responsible for fetching raw NEPSE data from CSV/APIs and storing it.
- **Feature Service**: Computes and serves feature vectors for a given timestamp and symbol.
- **Model Training Service**: Trains models on historical data, registers them in a model registry.
- **Prediction Service**: Accepts requests for predictions, retrieves features, loads the model, and returns predictions.
- **Model Registry Service**: Manages model versions and metadata.
- **Monitoring Service**: Collects metrics and triggers alerts.
- **User Interface Service**: Serves dashboards and visualisations.

These services communicate via APIs or asynchronous messages.

---

## **81.3 Designing Service Boundaries**

The most critical step is defining the right boundaries. A common approach is **Domain‑Driven Design (DDD)**, where we identify bounded contexts. In the NEPSE system, we might have:

- **Data Acquisition Context**: Raw data collection, validation, and storage.
- **Feature Engineering Context**: Transforming raw data into features.
- **Model Management Context**: Training, validation, and versioning of models.
- **Prediction Context**: Serving predictions in real time.
- **Monitoring Context**: Observability and alerting.

Each context becomes a service or a set of services.

Let's outline the responsibilities and APIs for each service.

### **81.3.1 Data Ingestion Service**

- **Responsibilities**:
  - Periodically fetch new NEPSE data (e.g., daily CSV).
  - Validate schema and data quality.
  - Store raw data in a data lake (Parquet files) or a time‑series database.
  - Publish events when new data arrives (e.g., to Kafka).

- **API**:
  - `POST /ingest` – manually trigger ingestion (for testing).
  - `GET /health` – health check.

- **Data Storage**: Raw data in Parquet (e.g., on S3) or a database like InfluxDB.
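
The validation responsibility above can be sketched with the standard library alone. This is a minimal illustration, not the ingestion service itself: the column names and the price sanity rule are assumptions, since the real NEPSE export format is not specified here.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the real NEPSE export may differ.
REQUIRED_COLUMNS = ("Symbol", "Date", "Open", "High", "Low", "Close", "Volume")


def validate_rows(csv_text):
    """Split CSV rows into (valid, rejected), with a reason for each rejection."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    valid, rejected = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            datetime.strptime(row["Date"], "%Y-%m-%d")
            low, high, close = float(row["Low"]), float(row["High"]), float(row["Close"])
            if not (0 < low <= close <= high):
                raise ValueError("prices violate 0 < low <= close <= high")
            if int(row["Volume"]) < 0:
                raise ValueError("negative volume")
            valid.append(row)
        except (ValueError, TypeError) as exc:
            rejected.append((line_no, str(exc)))
    return valid, rejected


sample = (
    "Symbol,Date,Open,High,Low,Close,Volume\n"
    "NEPSE,2023-01-01,100,110,95,105,1000\n"
    "NEPSE,2023-01-02,105,100,110,104,1200\n"  # high below low: rejected
)
valid_rows, rejected_rows = validate_rows(sample)
```

Rejected rows are collected with their line numbers rather than aborting the whole file, so one bad record does not block the rest of the day's data.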

### **81.3.2 Feature Service**

- **Responsibilities**:
  - Read raw data and compute features on demand or in batch.
  - Store feature vectors for quick retrieval (feature store).
  - Provide an API to get features for a given symbol and timestamp.

- **API**:
  - `POST /features` – accept a JSON body with `symbol` and `date` (e.g., `{"symbol": "NEPSE", "date": "2023-01-01"}`) and return the feature vector.
  - `POST /features/batch` – accept list of requests and return batch features.

- **Data Storage**: Feature store (e.g., Redis for online, Parquet for offline).
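
To make the single and batch lookups concrete, here is a toy in‑memory stand‑in for the online store. The class name and the feature names (`return_1d`, `sma_5`) are hypothetical; a real deployment would back this with Redis or similar.

```python
from datetime import date


class InMemoryFeatureStore:
    """Toy online feature store keyed by (symbol, day)."""

    def __init__(self):
        self._rows = {}

    def put(self, symbol, day, features):
        self._rows[(symbol, day)] = dict(features)

    def get(self, symbol, day):
        # Returns None when no features exist; the API layer would map this to a 404.
        return self._rows.get((symbol, day))

    def get_batch(self, requests):
        # One result per request, in request order; missing entries stay None.
        return [self.get(symbol, day) for symbol, day in requests]


store = InMemoryFeatureStore()
store.put("NEPSE", date(2023, 1, 1), {"return_1d": 0.012, "sma_5": 2051.3})
batch = store.get_batch([("NEPSE", date(2023, 1, 1)), ("NEPSE", date(2023, 1, 2))])
```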

### **81.3.3 Model Training Service**

- **Responsibilities**:
  - Periodically trigger training (e.g., weekly) using historical features.
  - Load data from feature store, train models, evaluate.
  - Register models in the model registry.
  - Optionally, perform hyperparameter tuning.

- **API**: Minimal (perhaps only a trigger endpoint). Typically runs as a scheduled job.

### **81.3.4 Model Registry Service**

- **Responsibilities**:
  - Store model metadata: version, creation date, performance metrics, feature list, artifact location.
  - Allow querying for the latest production model or a specific version.
  - Manage model stages (staging, production, archived).

- **API**:
  - `POST /models` – register a new model.
  - `GET /models/latest?stage=production` – get latest production model metadata.
  - `GET /models/{version}` – get metadata for a specific version.

- **Data Storage**: Database (PostgreSQL) for metadata; model artifacts stored in blob storage.
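
The registry's core behaviour (register, promote, query by stage) can be sketched in memory. This is an illustration under assumptions: the class names, the sequential version numbers, and the policy of archiving the previous production model on promotion are all hypothetical choices, not something the chapter prescribes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelRecord:
    version: int
    artifact_uri: str
    metrics: dict
    stage: str = "staging"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class InMemoryModelRegistry:
    STAGES = ("staging", "production", "archived")

    def __init__(self):
        self._records = {}

    def register(self, artifact_uri, metrics):
        version = len(self._records) + 1  # simple sequential versions
        record = ModelRecord(version, artifact_uri, dict(metrics))
        self._records[version] = record
        return record

    def get(self, version):
        return self._records[version]

    def promote(self, version, stage):
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if stage == "production":
            # Assumed policy: only one production model at a time.
            for record in self._records.values():
                if record.stage == "production":
                    record.stage = "archived"
        self._records[version].stage = stage

    def latest(self, stage="production"):
        candidates = [r for r in self._records.values() if r.stage == stage]
        return max(candidates, key=lambda r: r.version) if candidates else None


registry = InMemoryModelRegistry()
registry.register("s3://models/v1.pkl", {"rmse": 12.4})
registry.register("s3://models/v2.pkl", {"rmse": 11.1})
registry.promote(1, "production")
registry.promote(2, "production")
```

In the real service these methods would sit behind the three endpoints listed above, with PostgreSQL replacing the dictionary.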

### **81.3.5 Prediction Service**

- **Responsibilities**:
  - Accept prediction requests.
  - Query feature service for the required features.
  - Retrieve the current production model from the registry.
  - Run inference and return result.
  - Log predictions for monitoring.

- **API**:
  - `POST /predict` – with symbol and date, return predicted close price.
  - `POST /predict/batch` – batch predictions.

- **Data Storage**: None (stateless), but may cache models locally.
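
The request flow described above (fetch features, load the cached model, run inference, log) can be sketched with the collaborators injected as plain callables. All names here are hypothetical, and the "model" is a stand‑in function rather than a trained artefact.

```python
class PredictionService:
    def __init__(self, fetch_features, fetch_production_model, log_prediction):
        self._fetch_features = fetch_features
        self._fetch_production_model = fetch_production_model
        self._log_prediction = log_prediction
        self._cached_model = None  # (version, callable), filled on first use

    def predict(self, symbol, day):
        features = self._fetch_features(symbol, day)
        if self._cached_model is None:
            # Only the first request pays the cost of a registry round trip.
            self._cached_model = self._fetch_production_model()
        version, model = self._cached_model
        result = {
            "symbol": symbol,
            "date": day,
            "model_version": version,
            "predicted_close": model(features),
        }
        self._log_prediction(result)  # recorded for later monitoring
        return result


registry_calls = []


def fake_registry():
    registry_calls.append(1)
    # Stand-in model: last close adjusted by the one-day return.
    return 3, lambda f: f["last_close"] * (1 + f["return_1d"])


prediction_log = []
service = PredictionService(
    fetch_features=lambda symbol, day: {"last_close": 2000.0, "return_1d": 0.01},
    fetch_production_model=fake_registry,
    log_prediction=prediction_log.append,
)
first = service.predict("NEPSE", "2023-01-02")
second = service.predict("NEPSE", "2023-01-03")
```

Injecting the dependencies keeps the service testable without a running feature service or registry.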

### **81.3.6 Monitoring Service**

- **Responsibilities**:
  - Collect logs and metrics from all services.
  - Compute performance metrics (e.g., prediction error) by comparing predictions with actuals.
  - Trigger alerts (see Chapter 73) when thresholds are breached.

- **API**: Typically internal; may expose metrics for Prometheus.
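
As a rough sketch of the error computation, the function below joins logged predictions with actual closes and raises an alert flag when the mean absolute percentage error exceeds a threshold. The choice of MAPE and the 5% threshold are assumptions for illustration only.

```python
def evaluate_predictions(predictions, actuals, threshold=0.05):
    """Compare predictions with actuals, both keyed by (symbol, date).

    Assumes actual prices are non-zero. Returns None when no actuals
    have arrived yet for any logged prediction.
    """
    matched = [(predictions[key], actuals[key]) for key in predictions if key in actuals]
    if not matched:
        return None
    mape = sum(abs(p - a) / abs(a) for p, a in matched) / len(matched)
    return {"mape": mape, "n": len(matched), "alert": mape > threshold}


predictions = {
    ("NEPSE", "2023-01-01"): 2100.0,
    ("NEPSE", "2023-01-02"): 2200.0,
    ("NEPSE", "2023-01-03"): 2150.0,  # actual not yet available
}
actuals = {
    ("NEPSE", "2023-01-01"): 2000.0,
    ("NEPSE", "2023-01-02"): 2000.0,
}
report = evaluate_predictions(predictions, actuals)
```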

### **81.3.7 User Interface Service**

- **Responsibilities**:
  - Serve web dashboard for visualisation and manual intervention.
  - Communicate with prediction service and feature service.

- **API**: Serves HTML/JavaScript, calls backend services.

---

## **81.4 Inter‑Service Communication**

Services need to communicate, and there are two main styles:

- **Synchronous**: HTTP/REST, gRPC. Simple but can introduce coupling and cascading failures.
- **Asynchronous**: Message queues (RabbitMQ, Kafka), events. Decouples services but adds complexity.

For the NEPSE system, we can mix both:

- **Prediction Service → Feature Service**: Synchronous (REST) because predictions need features immediately.
- **Data Ingestion → Feature Service**: Asynchronous (event) – when new data arrives, the feature service can be notified to pre‑compute features.
- **Prediction Service → Model Registry**: Synchronous (REST) on startup to fetch model, then cached.
- **Prediction Service → Monitoring**: Asynchronous (log or message) – send prediction events for later analysis.

### **81.4.1 Synchronous Communication with REST**

REST is simple and widely understood. We'll use FastAPI for our services. Example: Feature Service endpoint.

```python
# feature_service.py (simplified)
from datetime import date

import pandas as pd
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Simulated feature store, loaded once at startup rather than on every request.
# In practice, retrieve raw data from a database and compute features,
# or read pre-computed features from an online store such as Redis.
feature_store = pd.read_parquet("feature_store.parquet")
# Normalise the date column so it compares cleanly with the request's date
feature_store["date"] = pd.to_datetime(feature_store["date"]).dt.date


class FeatureRequest(BaseModel):
    symbol: str
    date: date


@app.post("/features")
def get_features(request: FeatureRequest):
    mask = (feature_store["symbol"] == request.symbol) & (feature_store["date"] == request.date)
    row = feature_store[mask]
    if row.empty:
        raise HTTPException(status_code=404, detail="Features not found")
    return row.to_dict(orient="records")[0]
```

### **81.4.2 Asynchronous Communication with Kafka**

We'll use Kafka to publish events when new data is ingested. Services can subscribe to topics.

```python
# data_ingestion_service.py (publisher)
import json

import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092',
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

def ingest_and_publish(file_path):
    # Parse the Date column so that .isoformat() is available below
    df = pd.read_csv(file_path, parse_dates=['Date'])
    # ... save raw data ...
    # Publish one event per symbol-date row
    for _, row in df.iterrows():
        event = {
            'symbol': row['Symbol'],
            'date': row['Date'].date().isoformat(),
            'event_type': 'NEW_DATA'
        }
        producer.send('raw-data-events', value=event)
    # send() is asynchronous; flush so buffered events are delivered before returning
    producer.flush()
```

The feature service can consume these events and trigger feature computation.

```python
# feature_service_consumer.py
import json

from kafka import KafkaConsumer


def compute_features(symbol, date):
    """Placeholder: compute and store features for one symbol and date."""
    ...


consumer = KafkaConsumer('raw-data-events',
                         bootstrap_servers='localhost:9092',
                         value_deserializer=lambda m: json.loads(m.decode('utf-8')))

# Blocks and processes events as they arrive
for message in consumer:
    event = message.value
    if event['event_type'] == 'NEW_DATA':
        symbol = event['symbol']
        date = event['date']
        # Trigger feature computation for this symbol and date
        compute_features(symbol, date)
```
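
The per‑event logic can be pulled out of the consumer loop into a plain function, which makes it testable without a broker. The sketch below is one possible shape: the function name, the return labels, the `HEARTBEAT` event type, and the in‑memory `seen` set are all hypothetical, and the de‑duplication assumes that the same event may occasionally be delivered more than once.

```python
def handle_event(event, compute, seen):
    """Process one decoded event and return a label describing the outcome."""
    if event.get("event_type") != "NEW_DATA":
        return "ignored"
    symbol, day = event.get("symbol"), event.get("date")
    if not symbol or not day:
        return "invalid"
    key = (symbol, day)
    if key in seen:
        return "duplicate"  # already computed; skip the repeated work
    compute(symbol, day)
    seen.add(key)
    return "processed"


computed = []
seen_keys = set()
events = [
    {"event_type": "NEW_DATA", "symbol": "NEPSE", "date": "2023-01-01"},
    {"event_type": "NEW_DATA", "symbol": "NEPSE", "date": "2023-01-01"},  # redelivery
    {"event_type": "HEARTBEAT"},
    {"event_type": "NEW_DATA", "symbol": "NEPSE"},  # missing date
]
outcomes = [
    handle_event(e, lambda s, d: computed.append((s, d)), seen_keys) for e in events
]
```

In a long‑running service the `seen` set would need to live in durable storage, since an in‑memory set is lost on restart.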

---

## **81.5 Data Management in Microservices**

Each service should own its database. Sharing a database between services creates coupling. For the NEPSE system:

- **Data Ingestion Service**: Writes raw data to a data lake (Parquet) and possibly a time‑series DB.
- **Feature Service**: Owns the feature store (could be Redis for online, and Parquet for offline).
- **Model Registry Service**: Owns metadata database (PostgreSQL).
- **Prediction Service**: Stateless; may cache models locally.

This decentralisation means that queries spanning multiple services must be handled at the application level (e.g., API composition). For example, a dashboard that needs to show both a prediction and the features that drove it would call the prediction service, which in turn calls the feature service.
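
API composition can be sketched as a function that calls each owning service and merges the results. This variation uses a thin composition layer that calls both services directly; the function names and response fields are hypothetical, and the two services are replaced by fakes.

```python
def compose_dashboard_view(symbol, day, get_prediction, get_features):
    """Merge data owned by two services into a single response."""
    prediction = get_prediction(symbol, day)
    try:
        features = get_features(symbol, day)
    except LookupError:
        features = None  # degrade gracefully: show the prediction alone
    return {
        "symbol": symbol,
        "date": day,
        "prediction": prediction,
        "features": features,
        "partial": features is None,
    }


def fake_prediction(symbol, day):
    return {"predicted_close": 2020.0, "model_version": 3}


def fake_features(symbol, day):
    if day != "2023-01-02":
        raise LookupError("features not found")
    return {"return_1d": 0.01}


full_view = compose_dashboard_view("NEPSE", "2023-01-02", fake_prediction, fake_features)
partial_view = compose_dashboard_view("NEPSE", "2023-01-03", fake_prediction, fake_features)
```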

### **81.5.1 Eventual Consistency**

Because services have separate databases, we must accept eventual consistency. For example, when new data is ingested, the feature service may take some time to compute features. During that window, prediction requests for that date might fail or use stale features. This is acceptable as long as the system is designed for it (e.g., by returning a 404 or a warning).
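
One way to design for that window is to make staleness explicit in the response. The sketch below returns exact features when present, falls back to the most recent earlier features within a tolerance, and otherwise reports a 404. The three‑day tolerance and the response shape are assumptions.

```python
from datetime import date


def lookup_features(store, symbol, day, max_staleness_days=3):
    """store maps (symbol, date) to a feature dict."""
    exact = store.get((symbol, day))
    if exact is not None:
        return {"status": 200, "features": exact, "stale": False}
    earlier = [
        d for (s, d) in store
        if s == symbol and d < day and (day - d).days <= max_staleness_days
    ]
    if earlier:
        latest = max(earlier)
        return {
            "status": 200,
            "features": store[(symbol, latest)],
            "stale": True,  # the caller can surface this as a warning
            "as_of": latest.isoformat(),
        }
    return {"status": 404, "detail": "Features not found"}


feature_rows = {("NEPSE", date(2023, 1, 1)): {"sma_5": 2050.0}}
fresh = lookup_features(feature_rows, "NEPSE", date(2023, 1, 1))
stale = lookup_features(feature_rows, "NEPSE", date(2023, 1, 2))
missing = lookup_features(feature_rows, "NEPSE", date(2023, 1, 10))
```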

### **81.5.2 Transactions Across Services**

Distributed transactions (e.g., two‑phase commit) are generally avoided in microservices. Instead, use the **Saga pattern**: a sequence of local transactions with compensating actions. For instance, if model training requires updating both the model registry and the prediction service's cache, we could:

1. Register model in registry (local transaction).
2. Send a message to prediction service to update its cache.
3. If cache update fails, send a compensation to roll back the registry (or mark model as invalid).

This is complex; often we accept that services are eventually consistent.
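
The three steps above can be sketched as a small saga runner: each step pairs an action with a compensating action, and a failure triggers the compensations of the completed steps in reverse order. The runner and the step functions are hypothetical, and the registry is reduced to a dictionary.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation) tuples."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # Undo the steps that already succeeded, most recent first.
            for _, undo in reversed(completed):
                undo()
            return {"ok": False, "failed_step": name, "error": str(exc)}
        completed.append((name, compensate))
    return {"ok": True, "failed_step": None, "error": None}


model_stages = {}


def register_model():
    model_stages["v2"] = "production"


def invalidate_model():
    model_stages["v2"] = "invalid"


def update_cache():
    raise TimeoutError("prediction service did not acknowledge")


outcome = run_saga([
    ("register_model", register_model, invalidate_model),
    ("update_prediction_cache", update_cache, lambda: None),
])
```

Here the cache update fails, so the registry entry ends up marked invalid rather than left pointing at a model the prediction service never loaded.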

---

## **81.6 Service Discovery and Load Balancing**

In a dynamic environment where services are scaled up/down, we need a way for services to find each other. Tools like **Kubernetes** provide built‑in service discovery via DNS. Alternatively, we can use a service registry like **Consul** or **Eureka**.

For the NEPSE system deployed on Kubernetes, each service gets a DNS name (e.g., `feature-service.default.svc.cluster.local`). The prediction service can use that to call the feature service.

Load balancing is handled by Kubernetes Services or by client‑side load balancing (e.g., Netflix Ribbon in the Java ecosystem). In our FastAPI services, we can rely on the Kubernetes Service for load balancing and use a simple HTTP client with retries.

```python
import httpx
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

class FeatureServiceClient:
    def __init__(self, base_url="http://feature-service:8000"):
        self.base_url = base_url

    # Retry only transport-level failures (timeouts, connection errors);
    # an HTTP error status such as 404 is raised immediately.
    @retry(
        retry=retry_if_exception_type(httpx.TransportError),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        reraise=True,
    )
    async def get_features(self, symbol, date):
        async with httpx.AsyncClient(timeout=5.0) as client:
            resp = await client.post(f"{self.base_url}/features", json={"symbol": symbol, "date": date.isoformat()})
            resp.raise_for_status()
            return resp.json()
```
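
To see what an exponential backoff schedule looks like, the sketch below computes the waits explicitly and applies them in a hand‑written retry loop. It assumes the delay doubles with each attempt and is clamped between a minimum and a maximum; it illustrates the idea and is not a description of any library's internals. The function names are hypothetical.

```python
import time


def backoff_delays(attempts, multiplier=1.0, minimum=2.0, maximum=10.0):
    """Delays in seconds to wait between consecutive attempts."""
    return [min(max(multiplier * 2 ** n, minimum), maximum) for n in range(attempts - 1)]


def retry_call(func, attempts=3, sleep=time.sleep):
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])


calls = {"n": 0}


def flaky_feature_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("feature service unreachable")
    return {"sma_5": 2050.0}


waited = []  # record the waits instead of actually sleeping
result = retry_call(flaky_feature_call, attempts=3, sleep=waited.append)
```

Passing `sleep` as a parameter lets tests record the waits instead of pausing for real.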

---

## **81.7 Resilience Patterns**

Microservices must be resilient to failures. Key patterns:

- **Retries** with exponential backoff (as above).
- **Circuit Breaker**: Prevents calling a failing service repeatedly. Implemented with libraries like `pybreaker`.
- **Timeouts**: Always set timeouts on external calls.
- **Bulkhead**: Isolate resources so that failure in one part doesn't cascade.
- **Fallback**: Provide default responses when a service is unavailable.

Example circuit breaker for the feature service client:

```python
import httpx
import pybreaker

breaker = pybreaker.CircuitBreaker(fail_max=5, reset_timeout=60)

# Note: @breaker wraps an ordinary function call, so it cannot observe an
# exception raised later inside an awaited coroutine. This example therefore
# uses a synchronous httpx.Client.
class FeatureServiceClientWithBreaker:
    def __init__(self, base_url):
        self.base_url = base_url
        self.client = httpx.Client(timeout=5.0)

    @breaker
    def get_features(self, symbol, date):
        # Any exception raised here (timeout, HTTP error status, ...) counts as a failure
        resp = self.client.post(f"{self.base_url}/features",
                                json={"symbol": symbol, "date": date.isoformat()})
        resp.raise_for_status()
        return resp.json()

    def get_features_with_fallback(self, symbol, date):
        try:
            return self.get_features(symbol, date)
        except pybreaker.CircuitBreakerError:
            # Circuit is open. Fallback: return cached features or a default
            return {"error": "Feature service unavailable", "fallback": True}
```
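
The states a circuit breaker moves through (closed, open, half‑open) are easier to follow in a hand‑written version. The sketch below is a simplified illustration with an injectable clock so the timeout can be simulated; the class and exception names are hypothetical and it is not a substitute for a maintained library.

```python
import time


class CircuitOpenError(Exception):
    pass


class SimpleCircuitBreaker:
    def __init__(self, fail_max=5, reset_timeout=60.0, clock=time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # allow one trial call through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self._failures >= self.fail_max:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None
        return result


def failing_call():
    raise ConnectionError("feature service down")


now = [0.0]  # fake clock, in seconds
demo_breaker = SimpleCircuitBreaker(fail_max=2, reset_timeout=60.0, clock=lambda: now[0])
states = []
for _ in range(2):
    try:
        demo_breaker.call(failing_call)
    except ConnectionError:
        pass
states.append(demo_breaker.state)  # two failures reached fail_max
now[0] = 61.0
states.append(demo_breaker.state)  # reset timeout has elapsed
demo_breaker.call(lambda: "ok")
states.append(demo_breaker.state)  # successful trial call closes the circuit
```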

---

## **81.8 Containerisation and Orchestration**

Microservices are typically packaged as Docker containers and orchestrated with Kubernetes.

### **81.8.1 Dockerising a Service**

Example Dockerfile for the prediction service:

```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "prediction_service:app", "--host", "0.0.0.0", "--port", "8000"]
```

### **81.8.2 Kubernetes Deployment**

A simple deployment and service for the prediction service:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prediction-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prediction
  template:
    metadata:
      labels:
        app: prediction
    spec:
      containers:
      - name: prediction
        image: nepse/prediction-service:latest
        ports:
        - containerPort: 8000
        env:
        - name: FEATURE_SERVICE_URL
          value: "http://feature-service:8000"
        - name: MODEL_REGISTRY_URL
          value: "http://model-registry:8000"
---
apiVersion: v1
kind: Service
metadata:
  name: prediction-service
spec:
  selector:
    app: prediction
  ports:
  - port: 80
    targetPort: 8000
```

Kubernetes provides service discovery: the prediction service can reach `feature-service` via its DNS name.
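
Inside the container, the service can read the URLs injected by the Deployment from its environment. A minimal sketch follows; the localhost defaults and the `REQUEST_TIMEOUT_SECONDS` variable are assumptions added for illustration and are not part of the manifest above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    feature_service_url: str
    model_registry_url: str
    request_timeout_seconds: float


def load_config(env=os.environ):
    """Build the configuration from environment variables, with local defaults."""
    return ServiceConfig(
        feature_service_url=env.get("FEATURE_SERVICE_URL", "http://localhost:8001").rstrip("/"),
        model_registry_url=env.get("MODEL_REGISTRY_URL", "http://localhost:8002").rstrip("/"),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "5")),
    )


# Simulate the variables set by the Deployment instead of reading the real environment.
config = load_config({
    "FEATURE_SERVICE_URL": "http://feature-service:8000/",
    "MODEL_REGISTRY_URL": "http://model-registry:8000",
})
```

Keeping addresses out of the code means the same image can run locally and in the cluster without changes.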

---

## **81.9 Observability in Microservices**

With many services, monitoring becomes challenging. We need:

- **Centralised logging**: All services log to a central system (e.g., ELK stack, Loki).
- **Metrics aggregation**: Prometheus scrapes metrics from each service; Grafana for dashboards.
- **Distributed tracing**: Follow a request across services using tools like Jaeger or Zipkin.

### **81.9.1 Logging**

Each service should log in a structured format (JSON) to facilitate aggregation. In FastAPI, we can use `python-json-logger`.

```python
import logging
from pythonjsonlogger import jsonlogger

logger = logging.getLogger()
logHandler = logging.StreamHandler()
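# Include timestamp, level, and logger name alongside the message in each JSON record
formatter = jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(name)s %(message)s")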
logHandler.setFormatter(formatter)
logger.addHandler(logHandler)
logger.setLevel(logging.INFO)
```

### **81.9.2 Metrics**

Prometheus client libraries can expose metrics. Example for prediction service:

```python
from datetime import date

from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    symbol: str
    date: date

predictions_total = Counter('predictions_total', 'Total number of predictions', ['symbol'])
prediction_duration = Histogram('prediction_duration_seconds', 'Prediction duration')

@app.get("/metrics")
def metrics():
    # Serve metrics in the Prometheus text exposition format
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)

@app.post("/predict")
@prediction_duration.time()
def predict(request: PredictionRequest):
    predictions_total.labels(symbol=request.symbol).inc()
    # ... prediction logic ...
```

### **81.9.3 Distributed Tracing**

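We can instrument our services with OpenTelemetry. For FastAPI, use the `opentelemetry-instrumentation-fastapi` package, which creates a span for each incoming request and extracts the trace context from its headers. To propagate that context on outgoing calls, the HTTP client must be instrumented as well (e.g., with `opentelemetry-instrumentation-httpx`). Note that recent OpenTelemetry releases deprecate the Jaeger Thrift exporter used here in favour of OTLP, which Jaeger can ingest directly, so check the versions you install.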

```python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

trace.set_tracer_provider(TracerProvider())
jaeger_exporter = JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(jaeger_exporter)
)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)
```

Now, when a request flows from prediction service to feature service, we can see the entire path in Jaeger.
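
Under the hood, trace context usually travels between services as an HTTP header. The sketch below builds and forwards a header in the W3C Trace Context `traceparent` layout (version, trace ID, parent span ID, flags). It is only meant to show what gets propagated; the instrumentation libraries do this for you, and the helper names are hypothetical.

```python
import secrets


def new_traceparent():
    """Start a new trace: 16-byte trace ID, 8-byte span ID, sampled flag set."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming):
    """Keep the trace ID from the incoming header but use a fresh span ID."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


# The prediction service receives a request and calls the feature service.
incoming = new_traceparent()
outgoing_headers = {"traceparent": child_traceparent(incoming)}
```

Because both spans share one trace ID, the tracing backend can stitch them into a single request path.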

---

## **81.10 Case Study: Refactoring the NEPSE Monolith into Microservices**

Let's walk through a concrete refactoring of the NEPSE prediction system we built in Chapter 74.

**Original Monolith**: A single Python application that:
- Loads CSV daily (ingestion).
- Computes features.
- Trains a model weekly.
- Serves predictions via FastAPI.
- Logs and alerts.

**Step 1: Identify services** – As identified in Section 81.2: ingestion, feature, training, registry, prediction, monitoring.

**Step 2: Extract ingestion service** – Move data fetching and storage to a separate service. It writes raw Parquet to shared storage (e.g., S3). It also publishes events to Kafka.

**Step 3: Extract feature service** – This service listens to Kafka events, computes features, and stores them in Redis (for online) and Parquet (for offline). It exposes a REST API for feature retrieval.

**Step 4: Extract model registry** – A simple service with a database to store model metadata. It exposes APIs to register and retrieve models.

**Step 5: Extract training service** – A batch job (could be a Kubernetes CronJob) that pulls features from the feature store, trains a model, and registers it.

**Step 6: Extract prediction service** – Stateless service that calls feature service and model registry (caching the model locally). It returns predictions.

**Step 7: Extract monitoring service** – Consumes prediction logs, computes errors, and triggers alerts.

**Step 8: Deploy with Kubernetes** – Each service gets its own deployment and service. Use ConfigMaps for configuration (e.g., Kafka brokers, database URLs).

**Step 9: Set up observability** – Deploy Prometheus, Grafana, Loki, and Jaeger in the cluster. Instrument services accordingly.

**Step 10: Implement CI/CD** – Each service has its own build pipeline; changes to one do not require rebuilding others.

---

## **81.11 Trade‑offs and When to Use Microservices**

Microservices are not a silver bullet. They add complexity in development, testing, and operations. Consider them when:

- The team is large enough to own multiple services.
- Different parts have different scalability requirements (e.g., prediction service needs many instances, training needs GPUs).
- Independent deployment cycles are needed.
- Technology heterogeneity is desired (e.g., use a different language for some services).

For the NEPSE system, if it's a small project, a well‑structured monolith may be sufficient. However, as the system grows to include multiple models, real‑time and batch predictions, and a larger team, microservices become beneficial.

---

## **81.12 Best Practices**

- **Start with a monolith** and extract services incrementally as needed (Strangler Fig pattern).
- **Define clear API contracts** and version them.
- **Use API gateways** to handle cross‑cutting concerns (authentication, rate limiting).
- **Automate everything** – build, test, deploy.
- **Design for failure** – assume networks will fail.
- **Keep services small but not too small** – a service should be manageable by a small team.
- **Document service boundaries and dependencies**.

---

## **Chapter Summary**

In this chapter, we explored microservices architecture in the context of time‑series prediction systems, using the NEPSE system as a running example. We identified candidate services, discussed inter‑service communication patterns, data management, resilience, containerisation, and observability. We walked through a refactoring of the NEPSE monolith into microservices and highlighted the trade‑offs. Microservices can bring scalability and independence but require significant investment in infrastructure and operations. For many systems, a well‑modularised monolith remains a pragmatic choice.

In the next chapter, we will dive deeper into **Event‑Driven Architecture**, a complementary pattern that fits well with microservices.

---

**End of Chapter 81**