# Chapter 84: Real-Time Learning Systems

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the difference between batch learning and real‑time (online) learning.
- Identify scenarios where real‑time learning is beneficial, especially in time‑series prediction.
- Implement incremental learning algorithms using libraries like `scikit‑learn` (with `partial_fit`) and `river`.
- Handle streaming feature engineering and maintain state across time.
- Detect and adapt to concept drift using statistical tests and adaptive models.
- Build a real‑time learning system for the NEPSE stock prediction problem, where the model updates as new daily data arrives.
- Evaluate model performance in a streaming context using prequential (test‑then‑train) evaluation.
- Deploy real‑time learning systems with considerations for latency, state management, and monitoring.

---

## **84.1 Introduction to Real‑Time Learning**

In traditional machine learning, models are trained on a fixed historical dataset and then deployed. This is **batch learning**. However, many time‑series applications, including stock prediction, demand forecasting, and IoT analytics, generate data continuously. Batch learning has several limitations:

- Models become stale as new data arrives: when the underlying data distribution shifts (concept drift), predictive performance degrades.
- Retraining from scratch periodically is computationally expensive and may not keep up with the data rate.
- The model cannot adapt to recent patterns until the next retraining cycle.

**Real‑time learning** (also called online learning or incremental learning) addresses these issues by updating the model incrementally as each new data point arrives. The model learns continuously, adapting to changes in the underlying data distribution. This is particularly valuable in financial markets like NEPSE, where market conditions can shift rapidly due to news, regulations, or macroeconomic events.

In this chapter, we will build a real‑time learning system for the NEPSE prediction task. The model will be updated daily after each new trading day's data becomes available, allowing it to adapt to recent market behaviour.

---

## **84.2 Online Learning vs. Batch Learning**

### **84.2.1 Key Differences**

| Aspect | Batch Learning | Online Learning |
|--------|----------------|-----------------|
| Training data | All historical data at once | One instance at a time (or mini‑batches) |
| Model update | Infrequent, expensive | Incremental, cheap |
| Adaptation to change | Only after retraining | Continuous |
| Memory usage | High (stores all data) | Low (only current model state) |
| Latency | High for retraining | Low for each update |
| Evaluation | Train/validation/test split | Prequential (test‑then‑train) |

### **84.2.2 When to Use Online Learning**

Online learning is ideal when:

- Data arrives in a stream (e.g., sensor readings, stock ticks).
- The underlying concept may change over time (concept drift).
- Low‑latency predictions are required.
- Storage is limited (cannot keep all historical data).
- You want the model to personalise quickly to user behaviour.

For the NEPSE system, we receive one new data point per day per stock. This is a low‑velocity stream, but concept drift (e.g., due to a market crash) can happen suddenly. Online learning allows the model to adapt quickly without waiting for a weekly retraining job.

---

## **84.3 Incremental Learning Algorithms**

Many machine learning algorithms have online versions that support incremental updates. We'll focus on those available in Python.

### **84.3.1 `partial_fit` in scikit‑learn**

Several scikit‑learn estimators implement the `partial_fit` method, which allows incremental training. These include:

- `SGDRegressor` / `SGDClassifier` (Stochastic Gradient Descent)
- `PassiveAggressiveRegressor` / `PassiveAggressiveClassifier`
- `Perceptron`
- `MiniBatchKMeans`
- `IncrementalPCA`

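The first three entries are linear models trained by stochastic updates; `MiniBatchKMeans` and `IncrementalPCA` are unsupervised estimators that expose the same `partial_fit` interface. All of them can be updated efficiently without revisiting earlier data.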
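
To make concrete what a `partial_fit`-style update does, here is a from-scratch sketch of an online linear regressor. `TinySGDRegressor` is a hypothetical teaching class, not scikit-learn's implementation: it takes one gradient step on the squared error per sample and stores no history.

```python
import random


class TinySGDRegressor:
    """Hypothetical minimal online linear regressor (squared loss, plain SGD)."""

    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def partial_fit(self, x, y):
        """Take one gradient step on a single (x, y) pair; nothing is stored."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return self


random.seed(0)
model = TinySGDRegressor(n_features=2, lr=0.05)
for _ in range(5000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    y = 3.0 * x[0] - 2.0 * x[1] + 0.5  # noiseless target, for the demo only
    model.partial_fit(x, y)

print(model.w, model.b)  # approaches [3.0, -2.0] and 0.5
```

Memory use stays constant however long the stream runs, because only the weights are kept. Real estimators add regularization, learning-rate schedules, and other loss functions on top of this basic step.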

### **84.3.2 River Library**

[River](https://github.com/online-ml/river) is a Python library specifically designed for online machine learning. It provides a wide range of algorithms, including linear models, decision trees (Hoeffding trees), neural networks, and preprocessing tools. River is well‑suited for streaming data because it maintains state and supports incremental learning natively.

### **84.3.3 Example: Online Linear Regression with SGD**

Let's start with a simple online linear regression using `SGDRegressor` on synthetic NEPSE‑like data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline

# Generate synthetic NEPSE data (as before)
def generate_nepse_data(days=1500):
    dates = pd.date_range(start='2020-01-01', periods=days, freq='B')
    prices = 1000 + np.cumsum(np.random.randn(days) * 5)
    df = pd.DataFrame({
        'date': dates,
        'close': prices,
        'volume': np.random.lognormal(12, 1, days)
    })
    # Add some features
    df['lag_1'] = df['close'].shift(1)
    df['lag_5'] = df['close'].shift(5)
    df['sma_10'] = df['close'].rolling(10).mean()
    df['volatility'] = df['close'].rolling(20).std()
    df = df.dropna().reset_index(drop=True)
    return df

df = generate_nepse_data(days=1500)

# Features and target (predict next day close)
feature_cols = ['lag_1', 'lag_5', 'sma_10', 'volatility', 'volume']
X = df[feature_cols]
y = df['close'].shift(-1).dropna()
X = X.iloc[:-1]  # align

# Online learning simulation: iterate through the data in chronological order.
# SGD is sensitive to feature scale, so the features are standardized first.
# A scikit-learn Pipeline does not expose partial_fit, so the scaler and the
# regressor are kept separate and updated by hand.
scaler = StandardScaler()
sgd = SGDRegressor(learning_rate='adaptive', eta0=0.01, random_state=42)

# Prequential evaluation: test on the current sample, then train on it
predictions = []
true_values = []
for i in range(len(X)):
    X_i = X.iloc[i:i+1].to_numpy()  # 2-D array of shape (1, n_features)
    y_i = y.iloc[i]

    if i > 0:
        # Predict before training (test), using only statistics seen so far.
        # The first sample is skipped because nothing has been fitted yet.
        pred = sgd.predict(scaler.transform(X_i))[0]
        predictions.append(pred)
        true_values.append(y_i)

    # Train on this sample: update the running mean/variance, then the model
    scaler.partial_fit(X_i)
    sgd.partial_fit(scaler.transform(X_i), [y_i])

# Compute metrics
mae = mean_absolute_error(true_values, predictions)
print(f"Online SGD MAE: {mae:.2f}")
```

**Explanation:**

- We simulate a stream by iterating through the DataFrame in chronological order.
- For each new sample, we first predict (test) and then update the model (train). This is **prequential evaluation** (test‑then‑train). The very first sample is used only for training, because neither the scaler nor the regressor has seen any data at that point.
- The `SGDRegressor` is updated with `partial_fit`. scikit‑learn's `StandardScaler` also implements `partial_fit`, which maintains a running mean and variance, but `Pipeline` does not expose `partial_fit`, so the two steps have to be updated separately. Fitting the scaler on the entire dataset beforehand would leak future information and violate the online principle.

This works, but the scaling and the model have to be wired together by hand. Let's now use **River**, whose pipelines are incremental end to end.

### **84.3.4 Online Learning with River**

River provides estimators that are fully incremental, including preprocessing.

```python
# pip install river
from river import linear_model, preprocessing, metrics
import numpy as np

# Prepare data as stream of dictionaries
X_stream = X.to_dict(orient='records')
y_stream = y.values

# Build a pipeline with an incremental scaler and a linear regression
model = preprocessing.StandardScaler() | linear_model.LinearRegression()

# Initialize metric
metric = metrics.MAE()

# Prequential evaluation
predictions = []
for xi, yi in zip(X_stream, y_stream):
    # Predict
    y_pred = model.predict_one(xi)
    if y_pred is not None:
        predictions.append(y_pred)
        metric.update(yi, y_pred)
    # Train
    model.learn_one(xi, yi)

print(f"River Linear Regression MAE: {metric.get():.2f}")
```

**Explanation:**

- River's `StandardScaler` maintains running mean and standard deviation, updating incrementally.
- `LinearRegression` in River uses stochastic gradient descent with a configurable optimizer.
- `predict_one` and `learn_one` operate on single dictionaries.
- The metric is updated after each prediction, giving us a running estimate of MAE.
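
The running mean and standard deviation that an incremental scaler relies on can be maintained with Welford's algorithm. The sketch below is a hypothetical `RunningStats` helper written for illustration; River's own implementation may differ in details such as the variance estimator it uses.

```python
import math


class RunningStats:
    """Welford's algorithm: running mean and variance in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until two values have been seen
        return math.sqrt(self._m2 / self.n) if self.n > 1 else 0.0

    def scale(self, x):
        """Standardize x with the statistics seen so far."""
        s = self.std
        return (x - self.mean) / s if s > 0 else 0.0


stats = RunningStats()
for price in [1000.0, 1004.0, 998.0, 1002.0]:
    stats.update(price)

print(stats.mean)  # 1001.0
print(stats.std)   # sqrt(5), about 2.236
```

No past value is stored: each update touches three numbers, which is why the scaler can keep up with a stream of any length.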

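River also provides more sophisticated models such as `HoeffdingTreeRegressor`, adaptive random forests (`forest.ARFRegressor` in recent releases, `ensemble.AdaptiveRandomForestRegressor` in older ones), and neural networks. It also includes drift detection methods.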

---

## **84.4 Streaming Feature Engineering**

In a real‑time learning system, features must be computed incrementally from the stream. For example, lag features require storing recent values; rolling statistics require maintaining a window. Whatever computes the features therefore has to be stateful, and that state has to live alongside the model.

### **84.4.1 Lag and Rolling Features with River**

At the time of writing, River's `feature_extraction` module does not provide `Lag` or `Rolling` transformers for raw values. For a handful of lags and windows, a small helper built on `collections.deque` is enough, and the dictionaries it returns can be fed straight into a River pipeline. The `StreamingFeatures` class below is our own helper, not part of River.

```python
from collections import deque
from statistics import mean, stdev

from river import linear_model, metrics, preprocessing


class StreamingFeatures:
    """Keep the last `maxlen` raw observations and derive lag/rolling features."""

    def __init__(self, maxlen=20):
        self.closes = deque(maxlen=maxlen)
        self.volumes = deque(maxlen=maxlen)

    def update(self, x):
        """Add one raw observation; return features, or None during warm-up."""
        self.closes.append(x['close'])
        self.volumes.append(x['volume'])
        if len(self.closes) < self.closes.maxlen:
            return None  # not enough history yet
        closes = list(self.closes)
        return {
            'close': closes[-1],
            'volume': self.volumes[-1],
            'close_lag_1': closes[-2],
            'close_lag_5': closes[-6],
            'volume_lag_1': self.volumes[-2],
            'sma_10': mean(closes[-10:]),
            'volatility_20': stdev(closes),
        }


# Simulate a stream of raw dicts with only 'close' and 'volume',
# paired with the next day's close as the target.
raw_stream = df[['close', 'volume']].to_dict(orient='records')[:-1]
next_closes = df['close'].shift(-1).dropna().values

extractor = StreamingFeatures(maxlen=20)
model = preprocessing.StandardScaler() | linear_model.LinearRegression()

# Process the stream: build features, predict, score, then learn
metric = metrics.MAE()
for x_raw, y_next in zip(raw_stream, next_closes):
    feats = extractor.update(x_raw)
    if feats is None:
        continue  # still filling the window
    y_pred = model.predict_one(feats)
    metric.update(y_next, y_pred)
    model.learn_one(feats, y_next)

print(f"MAE with streaming features: {metric.get():.2f}")
```

**Explanation:**

- `StreamingFeatures` keeps the last 20 observations in `deque`s and derives lag features and rolling statistics from them on the fly.
- It returns `None` until the window is full, so the first 19 observations are skipped instead of producing features from incomplete history.
- This allows us to feed raw data directly and get engineered features incrementally.
- The model learns from each sample after prediction.

---

## **84.5 Handling Concept Drift**

Concept drift occurs when the relationship between the input features and the target changes over time, so that a model fitted to past data no longer describes the present. In financial markets, drift is common due to changing volatility, new regulations, or market sentiment shifts. Online learning systems must detect and adapt to drift.

### **84.5.1 Types of Drift**

- **Sudden drift**: abrupt change (e.g., after a news event).
- **Gradual drift**: slow change over time (e.g., evolving market trends).
- **Recurring concepts**: patterns that reappear (e.g., seasonal effects).
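
To see how the first two drift types differ, it helps to generate them synthetically. The generators below are hypothetical helpers for experimentation only: the "concept" is reduced to the mean of a Gaussian signal, and all parameter values are arbitrary.

```python
import random


def sudden_drift_stream(n, change_at, mean_before=0.0, mean_after=3.0, seed=0):
    """The mean jumps from mean_before to mean_after at step change_at."""
    rng = random.Random(seed)
    for t in range(n):
        mu = mean_before if t < change_at else mean_after
        yield rng.gauss(mu, 1.0)


def gradual_drift_stream(n, start, end, mean_before=0.0, mean_after=3.0, seed=0):
    """Between start and end, each sample comes from the new concept with a
    probability that rises linearly from 0 to 1."""
    rng = random.Random(seed)
    for t in range(n):
        if t < start:
            p_new = 0.0
        elif t >= end:
            p_new = 1.0
        else:
            p_new = (t - start) / (end - start)
        mu = mean_after if rng.random() < p_new else mean_before
        yield rng.gauss(mu, 1.0)


sudden = list(sudden_drift_stream(1000, change_at=500))
gradual = list(gradual_drift_stream(1000, start=400, end=600))
```

Streams like these are useful for checking how quickly a drift detector reacts before trusting it on real market data, where the true change points are unknown.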

### **84.5.2 Drift Detection Methods**

River provides several drift detectors:

- **ADWIN** (Adaptive Windowing): keeps a sliding window and detects changes when the mean of two sub‑windows differs significantly.
- **Page-Hinkley**: sequential test for change in the mean of a signal.
- **DDM** (Drift Detection Method): monitors the error rate of a model. It was designed for classification, where each prediction is either right or wrong.

We can use these detectors to trigger model adaptation, such as resetting the model or switching to a new model.
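
As an illustration of how such a detector works internally, here is a from-scratch sketch of a one-sided Page-Hinkley test that flags an increase in the mean of a signal. The class, its parameter names, and the default values are assumptions made for this example; River's `drift.PageHinkley` may be parameterized differently.

```python
import random


class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean."""

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta            # tolerated magnitude of change
        self.threshold = threshold    # alarm level (often written lambda)
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0       # cumulative deviation from the running mean
        self.cum_min = 0.0   # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one value; return True if an upward shift is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n < self.min_samples:
            return False
        return (self.cum - self.cum_min) > self.threshold


# Simulated absolute errors: the mean jumps from 1.0 to 3.0 at step 200
rng = random.Random(1)
errors = ([rng.gauss(1.0, 0.2) for _ in range(200)]
          + [rng.gauss(3.0, 0.2) for _ in range(200)])

ph = PageHinkley(delta=0.05, threshold=20.0)
detected_at = None
for t, e in enumerate(errors):
    if ph.update(e):
        detected_at = t
        break

print(f"Drift flagged at step {detected_at}")
```

The threshold trades detection delay against false alarms: a lower value reacts sooner but fires more often on noise. After handling an alarm, call `reset()` so the test starts again from the new regime.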

### **84.5.3 Adaptive Models**

River also includes models that are inherently adaptive, such as:

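- `HoeffdingAdaptiveTreeRegressor` (a Hoeffding tree that monitors its branches for drift and replaces those whose performance degrades; the plain `HoeffdingTreeRegressor` grows incrementally but has no drift mechanism of its own)
- `AdaptiveRandomForestRegressor` (ensemble that adapts to drift; named `ARFRegressor` in the `forest` module of recent releases)
- `EWARegressor` (Exponentially Weighted Average: combines the predictions of several regressors, giving more weight to those with lower cumulative loss)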

### **84.5.4 Example: Using ADWIN to Detect Drift in Prediction Errors**

We can monitor the prediction error (e.g., absolute error) with ADWIN. If drift is detected, we might reset the model or switch to a more recent model.

```python
from river import drift, linear_model, preprocessing

# Initialize ADWIN detector
adwin = drift.ADWIN()

# Run online learning and check for drift
model = preprocessing.StandardScaler() | linear_model.LinearRegression()
n_seen = 0

for xi, yi in zip(X_stream, y_stream):
    y_pred = model.predict_one(xi)
    if y_pred is not None:
        n_seen += 1
        error = abs(yi - y_pred)
        adwin.update(error)
        # Recent River releases expose `drift_detected`;
        # older ones used `change_detected`.
        if adwin.drift_detected:
            print(f"Drift detected at step {n_seen}: mean error changed.")
            # Optionally reset the model or adapt, e.g.:
            # model = preprocessing.StandardScaler() | linear_model.LinearRegression()
    model.learn_one(xi, yi)
```

**Explanation:**

- ADWIN monitors the stream of absolute errors. If it detects a significant change in the mean error, `drift_detected` becomes `True` for that step.
- When it flags a change, ADWIN discards the outdated part of its window on its own, so the detector does not need a manual reset.
- We can then take action, such as resetting the model (starting fresh) or switching to a different model.

---

## **84.6 Implementing a Real‑Time Learning System for NEPSE**

Let's put it all together into an end‑to‑end real‑time learning system for NEPSE. The system will:

1. Receive daily raw data (open, high, low, close, volume).
2. Compute features incrementally (lags, rolling stats, technical indicators).
3. Update the online model (e.g., an adaptive random forest).
4. Make predictions for the next day.
5. Monitor for drift and adjust.

We'll use River's adaptive random forest regressor (`forest.ARFRegressor` in recent releases, `ensemble.AdaptiveRandomForestRegressor` in older ones), which handles drift internally by replacing trees whose errors signal a change.

```python
from collections import deque
from statistics import mean, stdev

import numpy as np
from river import metrics, preprocessing

try:
    from river.forest import ARFRegressor  # recent River releases
except ImportError:  # older releases
    from river.ensemble import AdaptiveRandomForestRegressor as ARFRegressor


class DailyFeatures:
    """Stateful extractor: keeps the last bars and derives features from them."""

    def __init__(self, window=10):
        self.closes = deque(maxlen=window)
        self.volumes = deque(maxlen=window)

    def update(self, bar):
        """Add one raw bar; return a feature dict, or None during warm-up."""
        self.closes.append(bar['close'])
        self.volumes.append(bar['volume'])
        if len(self.closes) < self.closes.maxlen:
            return None
        c, v = list(self.closes), list(self.volumes)
        return {
            'close': c[-1], 'volume': v[-1],
            'range': bar['high'] - bar['low'],
            'body': bar['close'] - bar['open'],
            'close_lag_1': c[-2], 'close_lag_2': c[-3], 'close_lag_5': c[-6],
            'volume_lag_1': v[-2],
            'close_mean_5': mean(c[-5:]),
            'close_std_10': stdev(c),
            'volume_mean_5': mean(v[-5:]),
            'return': (c[-1] - c[-2]) / c[-2] if c[-2] else 0.0,
        }


# The synthetic df has no open/high/low columns, so we fabricate them here
# purely for demonstration. Real data comes from your ingestion service.
rng = np.random.default_rng(42)
bars = df[['close', 'volume']].copy()
bars['open'] = bars['close'].shift(1).fillna(bars['close'])
bars['high'] = bars[['open', 'close']].max(axis=1) + rng.uniform(0, 5, len(bars))
bars['low'] = bars[['open', 'close']].min(axis=1) - rng.uniform(0, 5, len(bars))

# Pair each day's bar with the next day's close (the prediction target)
raw_stream = bars.to_dict(orient='records')[:-1]
next_closes = bars['close'].shift(-1).dropna().values

features = DailyFeatures(window=10)
model = preprocessing.StandardScaler() | ARFRegressor(n_models=10, max_depth=10, seed=42)

# Run online learning
metric = metrics.MAE()
n_scored = 0
for bar, y_next in zip(raw_stream, next_closes):
    x = features.update(bar)
    if x is None:
        continue  # still warming up the windows
    # Predict, score, then learn
    y_pred = model.predict_one(x)
    metric.update(y_next, y_pred)
    model.learn_one(x, y_next)
    n_scored += 1
    # Optionally log progress every N steps
    # if n_scored % 100 == 0:
    #     print(f"Step {n_scored}: MAE = {metric.get():.2f}")

print(f"Final MAE: {metric.get():.2f}")
```

**Explanation:**

- `DailyFeatures` keeps the last ten bars in `deque`s and derives lagged values, rolling means and standard deviations, the daily range, and the daily return from them. It returns `None` until the window is full, so the first nine days are skipped.
- The adaptive random forest is an ensemble of Hoeffding trees. Each tree monitors its own errors with a drift detector; when a warning is raised the forest starts training a background tree, and if drift is confirmed the background tree replaces the original.
- We use a standard scaler that normalizes features incrementally.
- The loop processes one day at a time: build features, predict, update the metric, then train.

This pipeline continuously adapts to the latest NEPSE data. Before deploying it, add the persistence and monitoring pieces discussed in Section 84.8.

---

## **84.7 Evaluation in Streaming Context**

In batch learning, we split data into train/test. In online learning, we use **prequential evaluation** (test‑then‑train) as shown above. This gives a realistic estimate of how the model would perform if deployed.
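
The test‑then‑train loop can be wrapped in a small reusable helper, which also makes it easy to compare a model against a naive baseline. The sketch below assumes only that a model exposes `predict_one` and `learn_one`, as River estimators do; `prequential_mae` and `PersistenceBaseline` are hypothetical names introduced for this example.

```python
def prequential_mae(stream, model):
    """Test-then-train: score each prediction before the model sees the label."""
    abs_err_sum, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)
        if y_pred is not None:
            abs_err_sum += abs(y - y_pred)
            n += 1
        model.learn_one(x, y)
    return abs_err_sum / n if n else float('nan')


class PersistenceBaseline:
    """Hypothetical baseline: predict that tomorrow's close equals today's."""

    def predict_one(self, x):
        return x['close']

    def learn_one(self, x, y):
        pass  # nothing to learn


closes = [100.0, 102.0, 101.0, 105.0, 104.0]
stream = [({'close': c}, nxt) for c, nxt in zip(closes, closes[1:])]
baseline_mae = prequential_mae(stream, PersistenceBaseline())
print(baseline_mae)  # 2.0
```

For price series, the persistence baseline is a demanding reference point: an online model whose prequential MAE does not beat it is not adding predictive value, however well it appears to track the series.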

We can also compute metrics over a sliding window to track performance over time, which helps detect degradation.

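```python
from river import linear_model, metrics, preprocessing, utils

# Use a windowed MAE to see recent performance.
# (Older River releases exposed this wrapper as metrics.Rolling.)
window_mae = utils.Rolling(metrics.MAE(), window_size=30)

# Start from a fresh model so that no prediction is scored on data
# the model has already learned from.
model = preprocessing.StandardScaler() | linear_model.LinearRegression()

for day, (xi, yi) in enumerate(zip(X_stream, y_stream), start=1):
    y_pred = model.predict_one(xi)
    window_mae.update(yi, y_pred)
    # Print every 30 days
    if day % 30 == 0:
        print(f"Day {day}: Rolling MAE = {window_mae.get():.2f}")
    model.learn_one(xi, yi)
```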

**Explanation:**

- `Rolling` computes the metric over the last `window_size` samples, giving a local view of performance.
- This is useful for detecting when the model starts to fail (e.g., due to drift).

---

## **84.8 Deployment Considerations**

Deploying a real‑time learning system requires careful design.

### **84.8.1 State Persistence**
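The model's internal state (e.g., trees, scaler means, feature windows) must be persisted so that after a restart, the system can resume learning. River models can be serialized with `pickle` or `joblib`, provided the pipeline contains no lambdas or other objects that cannot be pickled. Any state kept outside the pipeline, such as the rolling windows of a custom feature extractor, must be saved in the same checkpoint, or the restored model will receive features computed from an empty history. Only load checkpoint files you trust, since unpickling can execute arbitrary code.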

```python
import joblib

# Save the entire pipeline
joblib.dump(model, 'online_model.pkl')

# Later, load
model = joblib.load('online_model.pkl')
```
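
A checkpoint that is half written when the process crashes is worse than no checkpoint. One common safeguard, sketched here with the standard library only, is to write to a temporary file and then rename it over the target. The `state` dictionary is a hypothetical stand-in for the real model and feature windows, and the helper names are our own.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write to a temporary file, then rename, so a crash mid-write never
    leaves a truncated checkpoint at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            pickle.dump(state, f)
        os.replace(tmp_path, path)  # replaces the old checkpoint in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    with open(path, 'rb') as f:
        return pickle.load(f)


# Hypothetical bundle: model state plus feature-extractor state, saved together
state = {
    'model': {'weights': [0.1, -0.2]},
    'recent_closes': [1001.5, 1003.2],
    'n_seen': 2,
}
ckpt_path = os.path.join(tempfile.mkdtemp(), 'online_state.pkl')
save_checkpoint(state, ckpt_path)
restored = load_checkpoint(ckpt_path)
```

Bundling the model and the feature windows in one object guarantees that they are restored from the same point in the stream.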

### **84.8.2 Microservice Integration**
In a microservices architecture (Chapter 81), the real‑time learner could be a service that:

- Consumes raw data events (e.g., from Kafka).
- Maintains model state in memory or a fast database.
- Exposes a prediction endpoint that returns the latest prediction and optionally updates the model.
- Periodically checkpoints the model to object storage.
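
The responsibilities above can be sketched as a single class, independent of any particular message broker or web framework. Everything here is hypothetical: `OnlineLearnerService`, the event layout, and the stub model are illustrations only. The point worth noticing is the label delay: today's close is the label for yesterday's features, so the service learns first and predicts second.

```python
class OnlineLearnerService:
    """Hypothetical event handler around a model with predict_one/learn_one."""

    def __init__(self, model, checkpoint_fn=None, checkpoint_every=20):
        self.model = model
        self.checkpoint_fn = checkpoint_fn
        self.checkpoint_every = checkpoint_every
        self._pending_features = None  # features still waiting for their label
        self.latest_prediction = None
        self.n_events = 0

    def handle_event(self, event):
        """Process one daily bar, e.g. {'close': 1012.5, 'volume': 150000}."""
        # 1. Yesterday's features now have a label: today's close.
        if self._pending_features is not None:
            self.model.learn_one(self._pending_features, event['close'])
        # 2. Predict tomorrow's close from today's bar.
        features = dict(event)
        self.latest_prediction = self.model.predict_one(features)
        self._pending_features = features
        # 3. Checkpoint periodically.
        self.n_events += 1
        if self.checkpoint_fn and self.n_events % self.checkpoint_every == 0:
            self.checkpoint_fn(self)
        return self.latest_prediction


class LastCloseModel:
    """Stub model used only to exercise the service."""

    def __init__(self):
        self.n_updates = 0

    def predict_one(self, x):
        return x['close']

    def learn_one(self, x, y):
        self.n_updates += 1


checkpoints = []
service = OnlineLearnerService(
    LastCloseModel(),
    checkpoint_fn=lambda s: checkpoints.append(s.n_events),
    checkpoint_every=2,
)
for close in [1000.0, 1005.0, 1003.0, 1008.0]:
    service.handle_event({'close': close, 'volume': 100000.0})
```

In a real deployment `handle_event` would be called by the stream consumer, and the prediction endpoint would simply return `latest_prediction`.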

### **84.8.3 Latency and Throughput**
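For daily data, latency is not an issue. For high‑frequency streams, ensure that the model's update time is less than the inter‑arrival time; otherwise events queue up faster than they can be processed. River is designed for one‑sample‑at‑a‑time processing, but the cost of an update still varies widely between, say, a linear model and a forest, so measure it under realistic load.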
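
A minimal way to check the update budget is to time each call and look at the tail of the distribution, not only the average. In this sketch `dummy_update` is a placeholder for something like `model.learn_one`; the function name and the reported percentiles are our own choices.

```python
import time


def measure_update_latency(update_fn, samples):
    """Time update_fn on each sample and summarize the latencies in seconds."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        update_fn(sample)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    n = len(latencies)
    return {
        'median_s': latencies[n // 2],
        'p99_s': latencies[min(n - 1, int(0.99 * n))],
        'max_s': latencies[-1],
    }


acc = [0.0]


def dummy_update(x):
    # Placeholder for a real model update
    acc[0] += x * x


report = measure_update_latency(dummy_update, [float(i) for i in range(1000)])
print(report)
```

If the 99th percentile approaches the inter‑arrival time, the system will fall behind during bursts even when the median looks comfortable.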

### **84.8.4 Monitoring**
Monitor:

- Prediction error (rolling metric)
- Drift detection alerts
- Model update latency
- Feature distribution shifts

Integrate with the alerting system from Chapter 73.

### **84.8.5 Model Versioning**
You may want to keep snapshots of the model at different times (e.g., daily) for debugging or rollback. This can be done by saving the model after each update (or periodically) to a versioned store.
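
One simple scheme, sketched below with the standard library, writes each snapshot under a sortable version label and prunes old ones. The file naming pattern, the `keep_last` policy, and the helper names are assumptions for illustration; a production system might use object storage or a model registry instead.

```python
import pickle
import tempfile
from pathlib import Path


def save_snapshot(model, directory, version, keep_last=5):
    """Save `model` as model_<version>.pkl and keep only the newest files.
    `version` must sort chronologically as text, e.g. an ISO date."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"model_{version}.pkl"
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    snapshots = sorted(directory.glob('model_*.pkl'))
    for old in snapshots[:-keep_last]:
        old.unlink()
    return path


def load_snapshot(path):
    with open(path, 'rb') as f:
        return pickle.load(f)


snap_dir = Path(tempfile.mkdtemp())
for day in range(1, 8):
    # The dict stands in for a real model object
    save_snapshot({'trained_through': day}, snap_dir,
                  version=f"2024-01-{day:02d}", keep_last=3)

remaining = sorted(p.name for p in snap_dir.glob('model_*.pkl'))
rollback = load_snapshot(snap_dir / 'model_2024-01-05.pkl')
```

Rolling back then amounts to loading an earlier file and replaying the events that arrived after it.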

---

## **84.9 Best Practices**

1. **Start with a simple model**: Online linear regression is a good baseline. Add complexity only if needed.
2. **Monitor for drift**: Use drift detectors to alert you when performance changes.
3. **Use incremental preprocessing**: Avoid scalers fitted on the full dataset, which leak future information; use incremental statistics instead.
4. **Test in production carefully**: Shadow deploy the online model alongside a batch model before fully switching.
5. **Handle missing values**: In streaming, you may need to impute or skip samples.
6. **Consider mini‑batches**: If data arrives in bursts, you can update in mini‑batches for efficiency.
7. **Regularly evaluate on hold‑out data**: Even though you're learning online, maintain a separate test set from an earlier time period to check for catastrophic forgetting.
8. **Document the learning process**: Record model updates, detected drifts, and any interventions.
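
As an illustration of practice 5, missing values can be filled from running statistics, and a sample can be skipped when there is nothing to impute with yet. `StreamingImputer` below is a hypothetical helper that uses the running mean; for prices, carrying the last observed value forward may be a better choice.

```python
import math


class StreamingImputer:
    """Replace missing numeric values with the running mean of that feature."""

    def __init__(self):
        self._sums = {}
        self._counts = {}

    @staticmethod
    def _is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    def transform_one(self, x):
        """Return a filled copy of x, or None if a missing feature has no
        history yet (the caller should then skip the sample)."""
        filled = {}
        for key, value in x.items():
            if self._is_missing(value):
                if self._counts.get(key, 0) == 0:
                    return None
                value = self._sums[key] / self._counts[key]
            else:
                # Only observed values contribute to the running mean
                self._sums[key] = self._sums.get(key, 0.0) + value
                self._counts[key] = self._counts.get(key, 0) + 1
            filled[key] = value
        return filled


imputer = StreamingImputer()
rows = [
    {'close': 1000.0, 'volume': 200.0},
    {'close': 1010.0, 'volume': None},
    {'close': float('nan'), 'volume': 400.0},
]
cleaned = [imputer.transform_one(row) for row in rows]
print(cleaned[1]['volume'])  # 200.0
print(cleaned[2]['close'])   # 1005.0
```

Imputed values are not fed back into the running mean, so a long gap does not gradually pull the statistics toward their own estimates.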

---

## **Chapter Summary**

In this chapter, we explored real‑time (online) learning systems and applied them to the NEPSE stock prediction problem. We contrasted online learning with batch learning, introduced incremental algorithms using scikit‑learn and River, and implemented streaming feature engineering. We discussed concept drift and demonstrated how to detect and adapt using River's drift detectors and adaptive models. We built a complete online learning pipeline for NEPSE using `AdaptiveRandomForestRegressor`. Finally, we covered deployment considerations, including state persistence, monitoring, and best practices.

Real‑time learning enables models to stay current with the latest data, adapt to changing conditions, and provide up‑to‑date predictions. It is a powerful addition to the time‑series practitioner's toolkit.

In the next chapter, we will delve into **Distributed Systems**, exploring how to scale time‑series prediction across multiple machines for large‑scale applications.

