# 05: ProToPhen Deployment Infrastructure

**Session 10 Documentation Notebook**

This notebook demonstrates the production-serving components of ProToPhen:

1. **Inference Pipeline** — End-to-end sequence to phenotype prediction
2. **Checkpoint Compatibility** — Loading Trainer, Callback, and Registry checkpoints
3. **Model Registry** — Version tracking, promotion, rollback
4. **Monitoring & Drift Detection** — Latency, throughput, and distribution shift
5. **Feedback & Quality Tracking** — Closing the active-learning loop
6. **REST API** — FastAPI service for real-time and batch inference
7. **Docker Deployment** — Containerised serving quick-start

---

**Prerequisites:**
```bash
pip install 'protophen[serving]'  # adds fastapi, uvicorn, httpx, prometheus-client
```

**Contents**

0. [Setup & Synthetic Checkpoint](#0-setup--synthetic-checkpoint)
1. [Inference Pipeline](#1-inference-pipeline)
2. [Checkpoint Compatibility](#2-checkpoint-compatibility)
3. [Model Registry](#3-model-registry)
4. [Monitoring & Drift Detection](#4-monitoring--drift-detection)
5. [Feedback & Quality Tracking](#5-feedback--quality-tracking)
6. [REST API](#6-rest-api)
7. [Docker Deployment](#7-docker-deployment)
8. [Configuration Reference](#8-configuration-reference)
9. [Summary](#9-summary)

## 0. Setup & Synthetic Checkpoint

All examples in this notebook use a **tiny synthetic model** so that
no real ESM-2 weights or GPU are required.  The same patterns apply
to full-scale models — only the checkpoint path changes.

In [1]:
import tempfile
import shutil
from pathlib import Path

import numpy as np
import torch

# Create a temporary working directory for this notebook
WORK_DIR = Path(tempfile.mkdtemp(prefix="protophen_deploy_"))
CHECKPOINT_DIR = WORK_DIR / "checkpoints"
REGISTRY_DIR = WORK_DIR / "model_registry"
FEEDBACK_DIR = WORK_DIR / "feedback"

CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
print(f"Working directory: {WORK_DIR}")

Working directory: C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8


In [2]:
from protophen.models.protophen import ProToPhenConfig, ProToPhenModel

# Build a tiny model for demonstration
model_config = ProToPhenConfig(
    protein_embedding_dim=32,
    encoder_hidden_dims=[16],
    encoder_output_dim=8,
    decoder_hidden_dims=[16],
    cell_painting_dim=10,
    predict_viability=True,
    predict_transcriptomics=False,
    mc_dropout=True,
)

model = ProToPhenModel(config=model_config)
print(f"Model: {model.n_parameters:,} parameters")
print(f"Tasks: {model.task_names}")

2026-02-18 16:38:22 | INFO     | protophen.models.protophen:__init__:170 | Initialised ProToPhenModel: 10,915 parameters, tasks=['cell_painting', 'viability']


Model: 10,915 parameters
Tasks: ['cell_painting', 'viability']


In [3]:
# Save in the format Trainer.save_checkpoint() produces - this is the most common real-world checkpoint format.
from dataclasses import asdict
from protophen.training.trainer import TrainerConfig

trainer_config = TrainerConfig(
    epochs=100,
    learning_rate=1e-4,
    weight_decay=0.01,
    optimiser="adamw",
    scheduler="cosine",
    tasks=["cell_painting", "viability"],
    task_weights={"cell_painting": 1.0, "viability": 0.5},
    seed=42,
)

trainer_ckpt_path = CHECKPOINT_DIR / "trainer_best.pt"
torch.save(
    {
        "epoch": 75,
        "global_step": 3750,
        "model_state_dict": model.state_dict(),
        "optimiser_state_dict": {},
        "config": asdict(trainer_config),  # TrainerConfig, NOT ProToPhenConfig
        "best_val_loss": 0.0312,
    },
    trainer_ckpt_path,
)

# Also save a pipeline-style checkpoint (with ProToPhenConfig)
pipeline_ckpt_path = CHECKPOINT_DIR / "pipeline_v1.pt"
torch.save(
    {
        "model_state_dict": model.state_dict(),
        "config": {
            "protein_embedding_dim": 32,
            "encoder_hidden_dims": [16],
            "encoder_output_dim": 8,
            "decoder_hidden_dims": [16],
            "cell_painting_dim": 10,
            "predict_viability": True,
            "predict_transcriptomics": False,
            "mc_dropout": True,
        },
        "epoch": 75,
        "version": "v1.0",
        "metrics": {"val_r2": 0.72, "val_mse": 0.031},
    },
    pipeline_ckpt_path,
)

print(f"Trainer checkpoint: {trainer_ckpt_path}")
print(f"Pipeline checkpoint: {pipeline_ckpt_path}")

Trainer checkpoint: C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\trainer_best.pt
Pipeline checkpoint: C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\pipeline_v1.pt


## 1. Inference Pipeline

The `InferencePipeline` class encapsulates the full prediction pathway:

sequence ──► ESM-2 embedding ──► physicochemical features ──► fusion
         ──► ProToPhen model ──► task predictions (+ optional uncertainty)

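Heavy components (ESM-2 model, ProToPhen checkpoint) are **lazily loaded**
on first use, so construction is cheap. Until the checkpoint has actually
been loaded, `is_ready` reports `False` and `model_version` reads `unknown`.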
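The lazy-loading idea can be pictured with a small stand-alone sketch. `LazyModelHolder`, `fake_loader`, and `load_count` are hypothetical names used only for illustration and are not part of ProToPhen's API; the sketch assumes nothing more than "store a recipe at construction, run it on first access".

```python
from typing import Callable, Optional


class LazyModelHolder:
    """Defers an expensive load until the resource is first needed."""

    def __init__(self, loader: Callable[[], dict]) -> None:
        self._loader = loader            # cheap: only remember how to load
        self._model: Optional[dict] = None
        self.load_count = 0

    @property
    def is_ready(self) -> bool:
        return self._model is not None

    @property
    def model(self) -> dict:
        if self._model is None:          # first access triggers the load
            self._model = self._loader()
            self.load_count += 1
        return self._model


def fake_loader() -> dict:
    # Stand-in for reading a checkpoint from disk (hypothetical content).
    return {"version": "v1.0", "n_parameters": 10915}


holder = LazyModelHolder(fake_loader)
ready_before = holder.is_ready           # nothing has been loaded yet
version = holder.model["version"]        # this access performs the load
ready_after = holder.is_ready
_ = holder.model                         # cached: the loader is not called again
print(ready_before, version, ready_after, holder.load_count)
```

The trade-off of this pattern is that the first request pays the loading cost, which is why a service would usually warm the model up before accepting traffic.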

In [4]:
from protophen.serving.pipeline import InferencePipeline, PipelineConfig

config = PipelineConfig(
    device="cpu",
    use_fp16=False,
    # Disable physicochemical features for this demo
    # (in production, leave these as True)
    include_physicochemical=False,
    # Uncertainty defaults
    default_mc_samples=20,
    # Sequence limits
    max_sequence_length=2000,
    max_batch_size=64,
)

pipeline = InferencePipeline(
    checkpoint_path=pipeline_ckpt_path,
    config=config,
)

print(f"Pipeline ready: {pipeline.is_ready}")
print(f"Model version:  {pipeline.model_version}")
print(f"Device:         {pipeline.device}")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:__init__:421 | InferencePipeline initialised (device=cpu, checkpoint=C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\pipeline_v1.pt)


Pipeline ready: False
Model version:  unknown
Device:         cpu


In [5]:
# Inspect model metadata
info = pipeline.get_model_info()
for key, value in info.items():
    print(f"  {key}: {value}")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:245 | Loading checkpoint from C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\pipeline_v1.pt
2026-02-18 16:38:23 | INFO     | protophen.models.protophen:__init__:170 | Initialised ProToPhenModel: 10,915 parameters, tasks=['cell_painting', 'viability']
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:build_model_from_checkpoint:344 | Model restored (epoch 75), 10,915 params on cpu
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_model:512 | Model version 'v1.0' loaded successfully


  model_version: v1.0
  model_name: ProToPhen
  tasks: {'cell_painting': 10, 'viability': 1}
  latent_dim: 8
  protein_embedding_dim: 32
  n_parameters: 10915
  n_trainable_parameters: 10915
  encoder_hidden_dims: [16]
  decoder_hidden_dims: [16]
  esm_model: esm2_t33_650M_UR50D
  fusion_method: concatenate
  device: cpu
  loaded_at: 2026-02-18T05:38:23.215483+00:00


### 1.1 Mock the ESM Embedder

In production, the pipeline calls `ESMEmbedder.embed_sequence()` to
compute real ESM-2 embeddings.  For this demo we inject a mock that
returns deterministic 32-dimensional vectors.

In [6]:
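import zlib
from unittest.mock import MagicMock

mock_esm = MagicMock()
mock_esm.embedding_dim = 32
mock_esm.output_dim = 32
mock_esm._model_loaded = True

def _mock_embed(seq):
    # Seed from a stable CRC32 digest: built-in hash() of a str is salted per
    # interpreter run, so it would not give reproducible vectors across sessions.
    rng = np.random.default_rng(zlib.crc32(seq.encode("utf-8")))
    return rng.standard_normal(32).astype(np.float32)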

def _mock_embed_batch(seqs, **kw):
    return np.stack([_mock_embed(s) for s in seqs])

mock_esm.embed_sequence.side_effect = _mock_embed
mock_esm.embed_sequences.side_effect = _mock_embed_batch

# Inject mock
pipeline._esm_embedder = mock_esm

### 1.2 Single Prediction

In [7]:
sequence = "MKFLILLFNILCLFPVLAADNHGVGPQGAS"

result = pipeline.predict(sequence, protein_name="demo_protein_1")

print(f"Protein:          {result.protein_name}")
print(f"Sequence length:  {result.sequence_length}")
print(f"Hash:             {result.protein_hash}")
print(f"Model version:    {result.model_version}")
print(f"Inference time:   {result.inference_time_ms:.1f} ms")
print()

for pred in result.predictions:
    vals = pred.values[:5]  # first 5 values
    print(f"  Task: {pred.task_name} ({pred.dimension} dims)")
    print(f"    First 5 values: {[f'{v:.4f}' for v in vals]}")

2026-02-18 16:38:23 | INFO     | protophen.embeddings.fusion:__init__:423 | Initialised EmbeddingFusion: method=concatenate


Protein:          demo_protein_1
Sequence length:  30
Hash:             2af685c306feed93
Model version:    v1.0
Inference time:   30.3 ms

  Task: cell_painting (10 dims)
    First 5 values: ['0.3078', '-0.1771', '-0.0311', '-0.1844', '0.3945']
  Task: viability (1 dims)
    First 5 values: ['0.6807']


### 1.3 Prediction with Uncertainty (MC Dropout)

In [8]:
result_unc = pipeline.predict(
    sequence,
    protein_name="demo_protein_1",
    return_uncertainty=True,
    n_mc_samples=10,
)

print("Uncertainty estimates:")
for unc in result_unc.uncertainty:
    mean_std = np.mean(unc.std)
    max_std = np.max(unc.std)
    print(f"  Task: {unc.task_name}")
    print(f"    MC samples: {unc.n_samples}")
    print(f"    Mean σ: {mean_std:.4f}, Max σ: {max_std:.4f}")

Uncertainty estimates:
  Task: cell_painting
    MC samples: 10
    Mean σ: 0.1170, Max σ: 0.1941
  Task: viability
    MC samples: 10
    Mean σ: 0.0651, Max σ: 0.0651
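As a rough picture of what MC Dropout does, the sketch below keeps dropout active at inference time, repeats the forward pass, and summarises the spread of the outputs. It is a toy linear model in plain Python, not ProToPhen's implementation: `forward_with_dropout`, `mc_dropout_predict`, the dropout rate, and the use of the population standard deviation are all assumptions made for illustration.

```python
import random
import statistics


def forward_with_dropout(x, weights, p_drop, rng):
    """One stochastic pass: drop each input with probability p_drop, rescale the rest."""
    keep = 1.0 - p_drop
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:
            total += (xi / keep) * wi    # inverted-dropout scaling
    return total


def mc_dropout_predict(x, weights, n_samples, p_drop=0.2, seed=0):
    """Run n_samples stochastic passes and return (mean, std, samples)."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, weights, p_drop, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pstdev(samples), samples


x = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]

# With dropout disabled every pass is identical, so the spread collapses to zero.
mean_det, std_det, _ = mc_dropout_predict(x, weights, n_samples=10, p_drop=0.0)

# With dropout enabled the passes differ and the std acts as an uncertainty proxy.
mean_mc, std_mc, samples_mc = mc_dropout_predict(x, weights, n_samples=10, p_drop=0.2)
print(mean_det, std_det, len(samples_mc))
```

More samples give a steadier estimate of σ at a proportional cost in inference time, which is why `n_mc_samples` is exposed per request.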


### 1.4 Prediction with Latent Representation

In [9]:
result_lat = pipeline.predict(
    sequence,
    protein_name="demo_protein_1",
    return_latent=True,
)

print(f"Latent dimension: {len(result_lat.latent)}")
print(f"Latent vector:    {[f'{v:.3f}' for v in result_lat.latent]}")

Latent dimension: 8
Latent vector:    ['1.002', '-0.913', '-0.774', '0.711', '0.319', '0.368', '-1.867', '1.154']


### 1.5 Batch Prediction

In [10]:
sequences = [
    "MKFLILLFNILCLFPVLAADNHGVGPQGAS",
    "ACDEFGHIKLMNPQRSTVWY",
    "GGGGGGGGGGAAAAAAAAAA",
    "MTEYKLVVVGAGGVGKSALT",
    "FWKRHCQPLAGDELLHQRRL",
]
names = [f"protein_{i+1}" for i in range(len(sequences))]

batch_results = pipeline.predict_batch(
    sequences,
    protein_names=names,
    return_uncertainty=True,
    n_mc_samples=5,
)

print(f"Batch size: {len(batch_results)}")
print()

for res in batch_results:
    cp_vals = res.predictions[0].values[:3]
    # Format the uncertainty on its own so a missing value only replaces this field
    # (previously the conditional swallowed the whole concatenated string).
    if res.uncertainty:
        unc_str = f"mean_σ={np.mean(res.uncertainty[0].std):.4f}"
    else:
        unc_str = "mean_σ=N/A"
    print(
        f"  {res.protein_name}: "
        f"len={res.sequence_length}, "
        f"cp[:3]={[f'{v:.3f}' for v in cp_vals]}, "
        f"{unc_str} "
        f"time={res.inference_time_ms:.1f}ms"
    )

Batch size: 5

  protein_1: len=30, cp[:3]=['0.358', '-0.209', '0.003'], mean_σ=0.0934 time=13.4ms
  protein_2: len=20, cp[:3]=['-0.477', '-0.516', '-0.132'], mean_σ=0.1108 time=11.4ms
  protein_3: len=20, cp[:3]=['-0.567', '-0.611', '-0.306'], mean_σ=0.0585 time=10.8ms
  protein_4: len=20, cp[:3]=['0.151', '-0.568', '-0.100'], mean_σ=0.1105 time=9.6ms
  protein_5: len=20, cp[:3]=['-0.147', '-0.371', '-0.238'], mean_σ=0.2633 time=10.4ms


### 1.6 Health Check

In [11]:
health = pipeline.health_check()
for key, value in health.items():
    print(f"  {key}: {value}")

  status: healthy
  model_loaded: True
  esm_loaded: True
  uptime_seconds: 0.2
  version: v1.0
  device: cpu
  checks: {'model_loaded': True, 'esm_loaded': True, 'checkpoint_exists': True}


## 2. Checkpoint Compatibility

The serving pipeline handles **four checkpoint formats** transparently:

| Format | Source | `config` key contains: |
|--------|--------|-----------------------|
| Trainer | `Trainer.save_checkpoint()` | `TrainerConfig` |
| Callback | `CheckpointCallback` | `TrainerConfig.__dict__` + `best_value`/`monitor` |
| Pipeline | `InferencePipeline` / `ModelRegistry` | `ProToPhenConfig` |
| Raw | `torch.save(model.state_dict(), ...)` | (no config) |

When the config contains `TrainerConfig` fields (epochs, learning_rate, etc.)
instead of `ProToPhenConfig` fields, the pipeline **automatically infers**
the model architecture from the shapes of the state dict tensors.
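The two steps described above (detecting a `TrainerConfig`-style dict, then reading layer sizes off tensor shapes) can be sketched without PyTorch. Everything here is an assumption for illustration: the key sets, the state-dict key names such as `encoder.0.weight` and `heads.cell_painting.weight`, and both function names are hypothetical and need not match what `_is_trainer_config` or `_infer_model_config_from_state_dict` actually do. The only fact relied on is that a linear layer's weight has shape `(out_features, in_features)`.

```python
TRAINER_KEYS = {"epochs", "learning_rate", "weight_decay", "optimiser", "scheduler"}
MODEL_KEYS = {"protein_embedding_dim", "encoder_hidden_dims", "encoder_output_dim"}


def looks_like_trainer_config(config: dict) -> bool:
    """Heuristic: training hyperparameters present, architecture fields absent."""
    keys = set(config)
    return bool(keys & TRAINER_KEYS) and not (keys & MODEL_KEYS)


def infer_dims_from_shapes(shapes: dict) -> dict:
    """Recover layer sizes from weight shapes given as (out_features, in_features)."""
    # Lexicographic sort is enough for fewer than ten encoder layers.
    enc = sorted(k for k in shapes if k.startswith("encoder.") and k.endswith(".weight"))
    return {
        "protein_embedding_dim": shapes[enc[0]][1],
        "encoder_hidden_dims": [shapes[k][0] for k in enc[:-1]],
        "encoder_output_dim": shapes[enc[-1]][0],
        "cell_painting_dim": shapes["heads.cell_painting.weight"][0],
        "predict_viability": "heads.viability.weight" in shapes,
    }


# Hypothetical shapes for a model like the tiny demo network.
shapes = {
    "encoder.0.weight": (16, 32),
    "encoder.1.weight": (8, 16),
    "heads.cell_painting.weight": (10, 8),
    "heads.viability.weight": (1, 8),
}

inferred = infer_dims_from_shapes(shapes)
is_trainer = looks_like_trainer_config({"epochs": 100, "learning_rate": 1e-4})
is_model = looks_like_trainer_config({"protein_embedding_dim": 32})
print(inferred, is_trainer, is_model)
```

Shape inference cannot recover settings that leave no trace in the weights (dropout rate, activation choice), so those would fall back to defaults in any scheme of this kind.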

In [12]:
from protophen.serving.pipeline import (
    load_checkpoint,
    build_model_from_checkpoint,
    _is_trainer_config,
    _infer_model_config_from_state_dict,
)

In [13]:
# Load the Trainer-format checkpoint
ckpt = load_checkpoint(trainer_ckpt_path, device="cpu")

print("Checkpoint keys:", list(ckpt.keys()))
print(f"Epoch:           {ckpt['epoch']}")
print(f"Version:         {ckpt['version']}")
print(f"Metrics:         {ckpt['metrics']}")
print(f"TrainerConfig:   {'_trainer_config' in ckpt}")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:245 | Loading checkpoint from C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\trainer_best.pt
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:272 | Checkpoint 'config' appears to be TrainerConfig; inferring ProToPhenConfig from state dict shapes.


Checkpoint keys: ['epoch', 'global_step', 'model_state_dict', 'optimiser_state_dict', 'config', 'best_val_loss', '_trainer_config', 'version', 'metrics']
Epoch:           75
Version:         epoch_75
Metrics:         {'best_val_loss': 0.0312}
TrainerConfig:   True


In [14]:
# Demonstrate the heuristic
trainer_dict = asdict(trainer_config)
model_dict = {
    "protein_embedding_dim": 32,
    "encoder_hidden_dims": [16],
    "encoder_output_dim": 8,
}

print(f"Is TrainerConfig dict? {_is_trainer_config(trainer_dict)}")
print(f"Is model config dict?  {_is_trainer_config(model_dict)}")

Is TrainerConfig dict? True
Is model config dict?  False


In [15]:
# Demonstrate config inference from state dict
inferred_config = _infer_model_config_from_state_dict(model.state_dict())

print(f"Inferred protein_embedding_dim: {inferred_config.protein_embedding_dim}")
print(f"Inferred encoder_output_dim:    {inferred_config.encoder_output_dim}")
print(f"Inferred cell_painting_dim:     {inferred_config.cell_painting_dim}")
print(f"Inferred predict_viability:     {inferred_config.predict_viability}")
print(f"Inferred encoder_hidden_dims:   {inferred_config.encoder_hidden_dims}")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:_infer_model_config_from_state_dict:203 | Inferred ProToPhenConfig from state dict: input=32, latent=8, cp_dim=10


Inferred protein_embedding_dim: 32
Inferred encoder_output_dim:    8
Inferred cell_painting_dim:     10
Inferred predict_viability:     True
Inferred encoder_hidden_dims:   [16]


In [16]:
# Build model from the Trainer checkpoint and verify it works
restored_model = build_model_from_checkpoint(ckpt, device="cpu")

x = torch.randn(2, 32)
with torch.no_grad():
    outputs = restored_model(x)

print(f"Model restored successfully ({restored_model.n_parameters:,} params)")
for task, tensor in outputs.items():
    print(f"  {task}: {tensor.shape}")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:_infer_model_config_from_state_dict:203 | Inferred ProToPhenConfig from state dict: input=32, latent=8, cp_dim=10
2026-02-18 16:38:23 | INFO     | protophen.models.protophen:__init__:170 | Initialised ProToPhenModel: 10,915 parameters, tasks=['cell_painting', 'viability']
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:build_model_from_checkpoint:344 | Model restored (epoch 75), 10,915 params on cpu


Model restored successfully (10,915 params)
  cell_painting: torch.Size([2, 10])
  viability: torch.Size([2, 1])


In [17]:
# The pipeline can also load this checkpoint seamlessly.
# Note: include_physicochemical defaults to True in production; it is set to
# False here to match the mock embedder's 32 dims (otherwise we'd get a
# matrix multiplication error).
pipeline2 = InferencePipeline(
    config=PipelineConfig(device="cpu", use_fp16=False, include_physicochemical=False)
)
pipeline2.load_model(trainer_ckpt_path)
pipeline2._esm_embedder = mock_esm

# Trainer config is preserved for reproducibility
print(f"Pipeline model version: {pipeline2.model_version}")
print(f"Trainer config available: {pipeline2.trainer_config is not None}")
if pipeline2.trainer_config:
    tc = pipeline2.trainer_config
    print(f"  learning_rate: {tc['learning_rate']}")
    print(f"  optimiser:     {tc['optimiser']}")
    print(f"  tasks:         {tc['tasks']}")

print(f"Checkpoint metrics: {pipeline2.checkpoint_metrics}")

# Predictions work normally
result = pipeline2.predict("ACDEFGHIKL")
print(f"Prediction: {len(result.predictions)} tasks, {result.inference_time_ms:.1f}ms")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:__init__:421 | InferencePipeline initialised (device=cpu, checkpoint=None)
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:245 | Loading checkpoint from C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\trainer_best.pt
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:272 | Checkpoint 'config' appears to be TrainerConfig; inferring ProToPhenConfig from state dict shapes.
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:_infer_model_config_from_state_dict:203 | Inferred ProToPhenConfig from state dict: input=32, latent=8, cp_dim=10
2026-02-18 16:38:23 | INFO     | protophen.models.protophen:__init__:

Pipeline model version: epoch_75
Trainer config available: True
  learning_rate: 0.0001
  optimiser:     adamw
  tasks:         ['cell_painting', 'viability']
Checkpoint metrics: {'best_val_loss': 0.0312}


2026-02-18 16:38:23 | INFO     | protophen.embeddings.fusion:__init__:423 | Initialised EmbeddingFusion: method=concatenate


Prediction: 2 tasks, 5.3ms


## 3. Model Registry

The `ModelRegistry` provides filesystem-backed model versioning with
support for staging, production promotion, rollback, and comparison.

```text
model_registry/
├── registry.json         ← version index
├── v1/
│   └── model.pt          ← copied checkpoint
├── v2/
│   └── model.pt
└── ...
```
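A filesystem-backed registry of this shape can be sketched with the standard library alone. `TinyRegistry` and its methods are hypothetical and far simpler than `ModelRegistry`; the sketch only assumes the layout shown above (a JSON index plus one copied `model.pt` per version) and the rule that promoting a version archives the previous production one.

```python
import json
import shutil
import tempfile
from pathlib import Path


class TinyRegistry:
    """Minimal JSON-indexed registry: one directory and one model.pt per version."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.index_path = root / "registry.json"
        root.mkdir(parents=True, exist_ok=True)
        if not self.index_path.exists():
            self._write({})

    def _read(self) -> dict:
        return json.loads(self.index_path.read_text())

    def _write(self, index: dict) -> None:
        self.index_path.write_text(json.dumps(index, indent=2))

    def register(self, checkpoint: Path, version: str, metrics: dict) -> None:
        index = self._read()
        if version in index:
            raise ValueError(f"version {version!r} already registered")
        target = self.root / version / "model.pt"
        target.parent.mkdir(parents=True)
        shutil.copy2(checkpoint, target)      # the registry keeps its own copy
        index[version] = {"stage": "staging", "metrics": metrics, "path": str(target)}
        self._write(index)

    def promote(self, version: str) -> None:
        index = self._read()
        for entry in index.values():
            if entry["stage"] == "production":
                entry["stage"] = "archived"   # at most one production version
        index[version]["stage"] = "production"
        self._write(index)

    def production_version(self):
        for version, entry in self._read().items():
            if entry["stage"] == "production":
                return version
        return None


work = Path(tempfile.mkdtemp())
ckpt = work / "dummy.pt"
ckpt.write_bytes(b"not a real checkpoint")    # placeholder file for the demo

reg = TinyRegistry(work / "model_registry")
reg.register(ckpt, "v1.0", {"val_r2": 0.72})
reg.register(ckpt, "v2.0", {"best_val_loss": 0.0312})
reg.promote("v1.0")
reg.promote("v2.0")                           # archives v1.0 automatically

stages = {v: e["stage"] for v, e in reg._read().items()}
prod = reg.production_version()
copied_exists = (reg.root / "v1.0" / "model.pt").exists()
print(stages, prod, copied_exists)
shutil.rmtree(work)                           # clean up the temporary directory
```

Copying the checkpoint into the registry, rather than storing a reference to the original path, keeps a registered version usable even after the training output directory is cleaned up.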

In [18]:
from protophen.serving.registry import ModelRegistry, RegistryConfig

registry = ModelRegistry(
    config=RegistryConfig(
        registry_dir=str(REGISTRY_DIR),
        max_versions=20,
    )
)

print(registry)

2026-02-18 16:38:23 | INFO     | protophen.serving.registry:__init__:138 | ModelRegistry at C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry (0 versions)


ModelRegistry(dir=C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry, versions=0, production=none)


### 3.1 Register Models

In [19]:
# Register the pipeline-style checkpoint
mv1 = registry.register(
    checkpoint_path=pipeline_ckpt_path,
    version="v1.0",
    description="Baseline model, cosine scheduler, 75 epochs",
    metrics={"val_r2": 0.72, "val_mse": 0.031, "val_pearson": 0.85},
    tags=["baseline", "cosine"],
)

print(f"Registered: {mv1.version} (stage={mv1.stage})")

2026-02-18 16:38:23 | INFO     | protophen.serving.registry:register:254 | Registered model version 'v1.0' (stage=staging)


Registered: v1.0 (stage=staging)


In [20]:
# Register from a Trainer checkpoint
mv2 = registry.register_from_trainer_checkpoint(
    checkpoint_path=trainer_ckpt_path,
    version="v2.0",
    tags=["improved", "longer_training"],
)

print(f"Registered: {mv2.version} (stage={mv2.stage})")
print(f"  Epoch:           {mv2.epoch}")
print(f"  Metrics:         {mv2.metrics}")
print(f"  TrainerConfig:   {mv2.trainer_config is not None}")
print(f"  Auto-description: {mv2.description}")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:245 | Loading checkpoint from C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\trainer_best.pt
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:272 | Checkpoint 'config' appears to be TrainerConfig; inferring ProToPhenConfig from state dict shapes.
2026-02-18 16:38:23 | INFO     | protophen.serving.registry:register:254 | Registered model version 'v2.0' (stage=staging)


Registered: v2.0 (stage=staging)
  Epoch:           75
  Metrics:         {'best_val_loss': 0.0312}
  TrainerConfig:   True
  Auto-description: Trainer checkpoint at epoch 75, best_val_loss=0.0312


In [21]:
# List all versions
for v in registry.list_versions():
    print(f"  {v.version} | stage={v.stage} | metrics={v.metrics}")

  v2.0 | stage=staging | metrics={'best_val_loss': 0.0312}
  v1.0 | stage=staging | metrics={'val_r2': 0.72, 'val_mse': 0.031, 'val_pearson': 0.85}


### 3.2 Promote to Production

In [22]:
registry.set_stage("v1.0", "production")

print(f"Production checkpoint: {registry.get_production_checkpoint()}")
print(f"v1.0 stage: {registry.get_version('v1.0').stage}")

2026-02-18 16:38:23 | INFO     | protophen.serving.registry:set_stage:382 | Version 'v1.0' → stage=production


Production checkpoint: C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry\v1.0\model.pt
v1.0 stage: production


In [23]:
# Promote v2.0 → automatically archives v1.0
registry.set_stage("v2.0", "production")

print(f"v1.0 stage: {registry.get_version('v1.0').stage}")  # archived
print(f"v2.0 stage: {registry.get_version('v2.0').stage}")  # production
print(f"Production: {registry.get_production_checkpoint()}")

2026-02-18 16:38:23 | INFO     | protophen.serving.registry:set_stage:377 | Version 'v1.0' archived (replaced by 'v2.0')
2026-02-18 16:38:23 | INFO     | protophen.serving.registry:set_stage:382 | Version 'v2.0' → stage=production


v1.0 stage: archived
v2.0 stage: production
Production: C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry\v2.0\model.pt


### 3.3 Compare Versions

In [24]:
comparison = registry.compare_versions("v1.0", "v2.0")

print(f"Comparing {comparison['version_a']} vs {comparison['version_b']}:")
for metric, data in comparison["metrics"].items():
    delta = data.get("delta")
    delta_str = f"{delta:+.4f}" if delta is not None else "N/A"
    print(f"  {metric}: {data.get('v1.0', 'N/A')} → {data.get('v2.0', 'N/A')} (Δ={delta_str})")

Comparing v1.0 vs v2.0:
  best_val_loss: None → 0.0312 (Δ=N/A)
  val_mse: 0.031 → None (Δ=N/A)
  val_pearson: 0.85 → None (Δ=N/A)
  val_r2: 0.72 → None (Δ=N/A)


### 3.4 Rollback

In [25]:
# Roll back to the most recently archived version (v1.0)
rolled_back = registry.rollback()

print(f"Rolled back to: {rolled_back.version} (stage={rolled_back.stage})")
print(f"Production checkpoint: {registry.get_production_checkpoint()}")

2026-02-18 16:38:23 | INFO     | protophen.serving.registry:set_stage:377 | Version 'v2.0' archived (replaced by 'v1.0')
2026-02-18 16:38:23 | INFO     | protophen.serving.registry:set_stage:382 | Version 'v1.0' → stage=production
2026-02-18 16:38:23 | INFO     | protophen.serving.registry:rollback:465 | Rolled back to version 'v1.0'


Rolled back to: v1.0 (stage=production)
Production checkpoint: C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry\v1.0\model.pt


### 3.5 Best Version Selection

In [26]:
# Re-promote v2.0 so both are accessible
registry.set_stage("v2.0", "production")

best = registry.get_best_version("val_r2", higher_is_better=True)
print(f"Best by val_r2: {best.version} (val_r2={best.metrics.get('val_r2')})")

2026-02-18 16:38:23 | INFO     | protophen.serving.registry:set_stage:377 | Version 'v1.0' archived (replaced by 'v2.0')
2026-02-18 16:38:23 | INFO     | protophen.serving.registry:set_stage:382 | Version 'v2.0' → stage=production


Best by val_r2: v1.0 (val_r2=0.72)


In [27]:
# Registry summary
summary = registry.summary()
for key, value in summary.items():
    print(f"  {key}: {value}")

  registry_dir: C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry
  total_versions: 2
  stages: {'archived': 1, 'production': 1}
  production_version: v2.0
  latest_version: v2.0


### 3.6 Load Production Model into Pipeline

**This is the typical deployment pattern:**

1. Registry resolves the production checkpoint
2. Pipeline loads it


In [28]:
# Resolve checkpoint for production model from registry
prod_path = registry.get_production_checkpoint()

# Load pipeline and mock embedder.
# Note: include_physicochemical defaults to True in production; it is set to
# False here to match the mock embedder's 32 dims (otherwise we'd get a
# matrix multiplication error).
prod_pipeline = InferencePipeline(
    checkpoint_path=prod_path,
    config=PipelineConfig(device="cpu", use_fp16=False, include_physicochemical=False),
)
prod_pipeline._esm_embedder = mock_esm

# Perform inference with the production model
result = prod_pipeline.predict("MKFLILLFNILCLFPVLAADNHGVGPQGAS")
print(f"Production model prediction: {len(result.predictions)} tasks")

2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:__init__:421 | InferencePipeline initialised (device=cpu, checkpoint=C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry\v2.0\model.pt)
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:245 | Loading checkpoint from C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\model_registry\v2.0\model.pt
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:load_checkpoint:272 | Checkpoint 'config' appears to be TrainerConfig; inferring ProToPhenConfig from state dict shapes.
2026-02-18 16:38:23 | INFO     | protophen.serving.pipeline:_infer_model_config_from_state_dict:203 | Inferred ProToPhenConfig from state dict: input=32, latent=8, cp_dim=10
2026-02-18 1

Production model prediction: 2 tasks


## 4. Monitoring & Drift Detection

The `PredictionMonitor` tracks latency, throughput, and prediction
distribution statistics.

The `DriftDetector` uses the Kolmogorov-Smirnov test to flag distribution shift relative to a reference (typically the training/validation set).
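For intuition, a two-sample Kolmogorov-Smirnov check can be written in plain Python. This is a dependency-free sketch, not the `DriftDetector` implementation: it assumes scalar samples, uses the standard asymptotic approximation for the p-value, and the function names are hypothetical. In practice a library routine such as SciPy's `ks_2samp` would normally be preferred.

```python
import math


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] <= x:      # advance past all values <= x
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def ks_pvalue(d, na, nb, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (an approximation)."""
    n_eff = na * nb / (na + nb)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:                        # the series is numerically 1 in this range
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def drift_detected(reference, current, significance=0.05):
    """Flag drift when the KS p-value falls below the significance level."""
    d = ks_statistic(reference, current)
    p = ks_pvalue(d, len(reference), len(current))
    return p < significance, p


reference = [i / 100 for i in range(100)]
same_flag, same_p = drift_detected(reference, list(reference))
shifted_flag, shifted_p = drift_detected(reference, [x + 5.0 for x in reference])
print(same_flag, same_p, shifted_flag, shifted_p)
```

The KS test responds to any change in the shape of the distribution, not only to a shift in the mean, and with a 0.05 threshold roughly one window in twenty will be flagged even when nothing has changed.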

In [29]:
from protophen.serving.monitoring import (
    PredictionMonitor,
    MonitoringConfig,
    DriftDetector,
)

monitor = PredictionMonitor(
    config=MonitoringConfig(
        window_size=500,
        enable_drift_detection=True,
        drift_window_size=100,
        drift_significance=0.05,
        log_predictions=False,  # quiet for notebook
        track_regression_metrics=True,
    )
)

2026-02-18 16:38:23 | INFO     | protophen.serving.monitoring:__init__:245 | PredictionMonitor initialised


### 4.1 Record Predictions

In [30]:
# Simulate 200 prediction requests
rng = np.random.default_rng(42)

for i in range(200):
    pred = rng.standard_normal(10).astype(np.float32)
    monitor.record_request(
        latency_ms=15.0 + rng.standard_normal() * 3.0,
        sequence_length=100 + rng.integers(0, 200),
        predictions={"cell_painting": pred},
        protein_id=f"prot_{i:04d}",
    )

print("After 200 requests:")
summary = monitor.summary()
print(f"  Total requests:  {summary['total_requests']}")
print(f"  Total errors:    {summary['total_errors']}")
print(f"  Throughput:      {summary['throughput_rps']:.1f} req/s")
if "latency_ms" in summary:
    lat = summary["latency_ms"]
    print(f"  Latency p50:     {lat['p50']:.1f} ms")
    print(f"  Latency p99:     {lat['p99']:.1f} ms")

2026-02-18 16:38:23 | INFO     | protophen.serving.monitoring:add_observation:533 | Drift reference auto-set for 'cell_painting' from first 100 observations


After 200 requests:
  Total requests:  200
  Total errors:    0
  Throughput:      12500.0 req/s
  Latency p50:     15.1 ms
  Latency p99:     21.6 ms
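Latency percentiles over a sliding window, as reported in the summary, can be sketched with a bounded deque. `LatencyWindow` is a hypothetical class, and the nearest-rank percentile rule is an assumption; the monitor may interpolate between values instead.

```python
import math
from collections import deque


class LatencyWindow:
    """Keeps the most recent window_size latencies and reports percentiles."""

    def __init__(self, window_size: int) -> None:
        self._values = deque(maxlen=window_size)   # old entries fall off the left

    def record(self, latency_ms: float) -> None:
        self._values.append(latency_ms)

    def __len__(self) -> int:
        return len(self._values)

    def percentile(self, q: float) -> float:
        """Nearest-rank percentile for q in (0, 100]."""
        if not self._values:
            raise ValueError("no latencies recorded")
        ordered = sorted(self._values)
        rank = max(1, math.ceil(q / 100 * len(ordered)))
        return ordered[rank - 1]


window = LatencyWindow(window_size=5)
for latency in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]:
    window.record(latency)

# Only the last five values (4.0 to 8.0) remain in the window.
p50 = window.percentile(50)
p99 = window.percentile(99)
print(len(window), p50, p99)
```

A bounded window keeps memory constant and lets the percentiles track recent behaviour rather than the whole service lifetime.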


### 4.2 Drift Detection

The drift detector auto-sets its reference from the first `drift_window_size` observations (100 in this configuration), then compares subsequent windows against it.
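The warm-up-then-compare bookkeeping can be sketched independently of the statistical test. `WindowedReference` is a hypothetical class for illustration; it assumes the reference is frozen once the first window fills and that later observations go into a sliding window of the same size.

```python
from collections import deque


class WindowedReference:
    """First window_size values become the reference; later ones fill a sliding window."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.reference = None                       # frozen after warm-up
        self._warmup = []
        self.current = deque(maxlen=window_size)

    def add(self, value: float) -> None:
        if self.reference is None:
            self._warmup.append(value)
            if len(self._warmup) == self.window_size:
                self.reference = list(self._warmup)  # freeze the reference
            return
        self.current.append(value)

    @property
    def ready(self) -> bool:
        """True once a full current window can be compared with the reference."""
        return self.reference is not None and len(self.current) == self.window_size


buffer = WindowedReference(window_size=3)
for value in range(8):                               # observations 0..7
    buffer.add(float(value))

print(buffer.reference, list(buffer.current), buffer.ready)
```

An automatically chosen reference is only as representative as the first traffic the service happens to see, which is one reason to set it explicitly from validation data where possible.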


In [31]:

drift_report = summary.get("drift", {})
for task, info in drift_report.items():
    print(f"Task: {task}")
    for key, value in info.items():
        print(f"  {key}: {value}")

Task: cell_painting
  drift_detected: False
  p_value: 0.368188
  reference_set: True
  current_observations: 100


### 4.3 Explicit Reference from Trainer Predictions

In production, you'd typically set the reference from your validation set:

```python
trainer_output = trainer.predict(val_loader, return_targets=True)

monitor._drift_detector.set_reference_from_trainer(trainer_output)
```

In [32]:
# Demonstrate with synthetic data
detector = DriftDetector(window_size=50, significance=0.05)

# Set reference from "training" predictions
ref_predictions = rng.standard_normal((200, 10))
detector.set_reference_from_trainer({
    "cell_painting_predictions": ref_predictions,
})

print("Reference set:")
print(f"  {detector.report()}")

# Add observations from the same distribution (no drift expected)
for _ in range(60):
    detector.add_observation("cell_painting", rng.standard_normal(10))

print("\nAfter same-distribution observations:")
print(f"  Drift detected: {detector.report()['cell_painting']['drift_detected']}")

# Now shift the distribution
for _ in range(60):
    detector.add_observation("cell_painting", rng.standard_normal(10) + 5.0)

print("\nAfter shifted observations:")
report = detector.report()["cell_painting"]
print(f"  Drift detected: {report['drift_detected']}")
print(f"  KS p-value:     {report['p_value']:.2e}")

Reference set:
  {'cell_painting': {'drift_detected': False, 'p_value': 1.0, 'reference_set': True, 'current_observations': 0}}

After same-distribution observations:
  Drift detected: False

After shifted observations:
  Drift detected: True
  KS p-value:     0.00e+00


## 5. Feedback & Quality Tracking

Once wet-lab results are available, they can be fed back through the monitoring
system to track prediction quality over time.

The `PredictionQualityTracker` reuses the same regression metrics from `protophen.training.metrics` so that training and production evaluation are consistent.
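The kinds of regression metrics reported below can be written out in a few lines. This is a plain-Python sketch over flat lists, not the code in `protophen.training.metrics`; `regression_metrics` is a hypothetical helper, and how the library aggregates across features and samples is not assumed here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Common regression metrics for two equal-length, non-constant sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)          # total sum of squares
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    dot = sum(t * p for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum(t * t for t in y_true))
    norm_p = math.sqrt(sum(p * p for p in y_pred))
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,
        "pearson": cov / math.sqrt(ss_tot * var_p),
        "cosine_similarity": dot / (norm_t * norm_p),
    }


# Predictions offset by a constant: perfectly correlated but biased.
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(m)
```

The example shows why several metrics are tracked together: a constant bias leaves the Pearson correlation at 1 while R² drops sharply, so correlation alone would hide the error.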

In [33]:
from protophen.serving.monitoring import PredictionQualityTracker

tracker = PredictionQualityTracker(window_size=200)

# Simulate feedback: predictions with small noise
for i in range(20):
    true_phenotype = rng.standard_normal(10).astype(np.float32)
    predicted_phenotype = true_phenotype + rng.standard_normal(10).astype(np.float32) * 0.1

    tracker.add(f"prot_{i:04d}", predicted_phenotype, true_phenotype)

print(f"Quality pairs stored: {tracker.n_pairs}")

metrics = tracker.compute_metrics()
print("\nPrediction quality metrics (from training.metrics):")
for name, value in sorted(metrics.items()):
    print(f"  {name}: {value:.4f}")

Quality pairs stored: 20

Prediction quality metrics (from training.metrics):
  quality_cosine_similarity: 0.9947
  quality_mae: 0.0778
  quality_mse: 0.0104
  quality_pearson: 0.9947
  quality_r2: 0.9908
  quality_rmse: 0.1019
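
For readers who want to see what these numbers measure, the following is a minimal sketch of the standard definitions in plain Python. It is not the code in `protophen.training.metrics`: the function name `regression_metrics` is hypothetical, and pooling all features of all proteins into one flat list is an assumption (the real metrics may aggregate per feature).

```python
import math


def regression_metrics(predictions, targets):
    """Pooled metrics over equally shaped lists of per-protein vectors."""
    pred = [v for row in predictions for v in row]
    true = [v for row in targets for v in row]
    n = len(true)
    errors = [p - t for p, t in zip(pred, true)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(true) / n
    mean_pred = sum(pred) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    cov = sum((p - mean_pred) * (t - mean_true) for p, t in zip(pred, true))

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - (mse * n) / ss_tot,  # 1 - SS_res / SS_tot
        "pearson": cov / math.sqrt(ss_pred * ss_tot),
    }


targets = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
predictions = [[0.1, 0.9, 2.1], [2.9, 4.1, 4.9]]
for name, value in sorted(regression_metrics(predictions, targets).items()):
    print(f"{name}: {value:.4f}")
```

Note that R² and Pearson are undefined when the targets (or predictions) are constant, so a production tracker needs a guard for that case.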


### 5.1 Monitor-Integrated Feedback

In the full API workflow, the monitor automatically matches cached
predictions with incoming feedback:

In [34]:
# Create a fresh monitor
monitor2 = PredictionMonitor(
    config=MonitoringConfig(
        enable_drift_detection=False,
        log_predictions=False,
        track_regression_metrics=True,
    )
)

# Simulate: predict, then receive feedback
for i in range(10):
    pred = rng.standard_normal(10).astype(np.float32)
    obs = pred + rng.standard_normal(10).astype(np.float32) * 0.05

    # Step 1: Record prediction (caches prediction by protein_id)
    monitor2.record_request(
        latency_ms=12.0,
        sequence_length=150,
        predictions={"cell_painting": pred},
        protein_id=f"fb_prot_{i}",
    )

    # Step 2: Feed observation back (monitor matches to cached prediction)
    monitor2.record_feedback(
        protein_id=f"fb_prot_{i}",
        observation=obs,
    )

summary2 = monitor2.summary()
print(f"Total requests:  {summary2['total_requests']}")
print(f"Total feedback:  {summary2['total_feedback']}")

if "prediction_quality" in summary2:
    pq = summary2["prediction_quality"]
    print(f"\nPrediction quality ({pq['n_pairs']} pairs):")
    for name, value in sorted(pq.items()):
        if name != "n_pairs":
            print(f"  {name}: {value:.4f}")

2026-02-18 16:38:24 | INFO     | protophen.serving.monitoring:__init__:245 | PredictionMonitor initialised


Total requests:  10
Total feedback:  10

Prediction quality (10 pairs):
  quality_cosine_similarity: 0.9990
  quality_mae: 0.0395
  quality_mse: 0.0024
  quality_pearson: 0.9990
  quality_r2: 0.9982
  quality_rmse: 0.0492
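
The matching step can be pictured as a bounded cache keyed by `protein_id`. The sketch below is illustrative only: `PredictionCache` is a hypothetical class, and both the eviction policy (oldest first) and the choice to consume a prediction once it has been matched are assumptions, not a description of `PredictionMonitor`'s internals.

```python
from collections import OrderedDict


class PredictionCache:
    """Hypothetical bounded cache mapping protein_id -> latest prediction."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._cache = OrderedDict()
        self.pairs = []  # matched (protein_id, prediction, observation) triples

    def record_prediction(self, protein_id, prediction):
        self._cache[protein_id] = prediction
        self._cache.move_to_end(protein_id)  # mark as most recent
        while len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict the oldest entry

    def record_feedback(self, protein_id, observation):
        """Return True if the observation matched a cached prediction."""
        prediction = self._cache.pop(protein_id, None)
        if prediction is None:
            return False  # unknown or already evicted protein_id
        self.pairs.append((protein_id, prediction, observation))
        return True


cache = PredictionCache(max_entries=2)
cache.record_prediction("fb_prot_0", [0.1, 0.2])
cache.record_prediction("fb_prot_1", [0.3, 0.4])
cache.record_prediction("fb_prot_2", [0.5, 0.6])  # evicts fb_prot_0

print(cache.record_feedback("fb_prot_0", [0.1, 0.2]))  # evicted, so no match
print(cache.record_feedback("fb_prot_2", [0.5, 0.7]))  # matched
print(len(cache.pairs))
```

Bounding the cache matters in a long-running service: feedback can arrive days after the prediction, or never, so unmatched entries must not accumulate without limit.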


## 6. REST API

The `create_app()` factory produces a FastAPI application with
endpoints for prediction, health checks, monitoring, and feedback.

### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/predict` | Single protein prediction |
| `POST` | `/predict/batch` | Batch prediction (up to 1000) |
| `POST` | `/feedback` | Active-learning feedback ingestion |
| `GET` | `/health` | Readiness/liveness probe |
| `GET` | `/model/info` | Model metadata |
| `GET` | `/metrics` | Monitoring summary (JSON) |
| `GET` | `/metrics/prometheus` | Prometheus-formatted metrics |

### 6.1 Launch the Server (CLI)

```bash
# From a checkpoint:
python scripts/serve.py --checkpoint checkpoints/best.pt

# From the model registry:
python scripts/serve.py --registry ./model_registry

# With full config:
python scripts/serve.py \
    --checkpoint checkpoints/best.pt \
    --config configs/deployment.yaml \
    --host 0.0.0.0 --port 8000 \
    --device cuda
```
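
Once the server is running, any HTTP client can call it. As a small standard-library sketch, the helper below builds a `POST /predict` request whose body mirrors the one used in Section 6.5. The function name `build_predict_request` is hypothetical, and the base URL assumes the server listens on `localhost:8000` as in the CLI example above.

```python
import json
import urllib.request


def build_predict_request(sequence, name=None, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for /predict."""
    payload = {"protein": {"sequence": sequence}}
    if name is not None:
        payload["protein"]["name"] = name
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("MKFLILLFNILCLFPVLAADNHGVGPQGAS", name="demo_protein")
print(request.get_method(), request.full_url)

# With the server running, send the request and decode the JSON response:
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.load(response)
```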

### 6.2 In-Process Testing with httpx

For notebooks and tests, we can use FastAPI's `TestClient` (which is built on httpx)
to exercise the API in-process, without starting a real server.

In [35]:
try:
    from fastapi.testclient import TestClient
    from protophen.serving.api import create_app

    _FASTAPI_AVAILABLE = True
except ImportError:
    _FASTAPI_AVAILABLE = False
    print("FastAPI not installed — skipping API examples.")
    print("Install with: pip install 'protophen[serving]'")

In [36]:
if _FASTAPI_AVAILABLE:
    # Create the app with our demo checkpoint
    app = create_app(
        checkpoint_path=str(pipeline_ckpt_path),
        pipeline_config=PipelineConfig(
            device="cpu",
            use_fp16=False,
            include_physicochemical=False,
        ),
        monitoring_config=MonitoringConfig(
            enable_drift_detection=False,
            log_predictions=False,
        ),
        registry_dir=str(REGISTRY_DIR),
        feedback_dir=str(FEEDBACK_DIR),
    )

    # Enter the client context to trigger the lifespan handler (loads the model)
    client = TestClient(app)
    client.__enter__()

    # Inject mock ESM into the loaded pipeline
    state = app.state._protophen
    if state.pipeline is not None:
        state.pipeline._esm_embedder = mock_esm

    print("TestClient ready")

2026-02-18 16:38:24 | INFO     | protophen.serving.api:lifespan:144 | ProToPhen API starting up
2026-02-18 16:38:24 | INFO     | protophen.serving.pipeline:__init__:421 | InferencePipeline initialised (device=cpu, checkpoint=C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\pipeline_v1.pt)
2026-02-18 16:38:24 | INFO     | protophen.serving.pipeline:load_checkpoint:245 | Loading checkpoint from C:\Users\adou0002\AppData\Local\Temp\protophen_deploy_f2qccnb8\checkpoints\pipeline_v1.pt
2026-02-18 16:38:24 | INFO     | protophen.models.protophen:__init__:170 | Initialised ProToPhenModel: 10,915 parameters, tasks=['cell_painting', 'viability']
2026-02-18 16:38:24 | INFO     | protophen.serving.pipeline:build_model_from_checkpoint

TestClient ready


2026-02-18 16:38:24 | INFO     | protophen.serving.api:lifespan:172 | ProToPhen API shutting down


### 6.3 Health Check

In [37]:
if _FASTAPI_AVAILABLE:
    resp = client.get("/health")
    print(f"Status: {resp.status_code}")
    health = resp.json()
    for key, value in health.items():
        print(f"  {key}: {value}")

Status: 200
  status: healthy
  model_loaded: True
  esm_loaded: True
  uptime_seconds: 0.0
  version: v1.0
  device: cpu
  checks: {'model_loaded': True, 'esm_loaded': True, 'checkpoint_exists': True}


### 6.4 Model Info

In [38]:
if _FASTAPI_AVAILABLE:
    resp = client.get("/model/info")
    print(f"Status: {resp.status_code}")
    info = resp.json()
    for key, value in info.items():
        print(f"  {key}: {value}")

Status: 200
  model_version: v1.0
  model_name: ProToPhen
  tasks: {'cell_painting': 10, 'viability': 1}
  latent_dim: 8
  protein_embedding_dim: 32
  n_parameters: 10915
  n_trainable_parameters: 10915
  encoder_hidden_dims: [16]
  decoder_hidden_dims: [16]
  esm_model: esm2_t33_650M_UR50D
  fusion_method: concatenate
  device: cpu
  loaded_at: 2026-02-18T05:38:24.637569+00:00


### 6.5 Single Prediction

In [39]:
if _FASTAPI_AVAILABLE:
    resp = client.post("/predict", json={
        "protein": {
            "sequence": "MKFLILLFNILCLFPVLAADNHGVGPQGAS",
            "name": "demo_protein",
        },
        "return_latent": True,
        "return_uncertainty": True,
        "n_uncertainty_samples": 5,
    })

    print(f"Status: {resp.status_code}")
    data = resp.json()
    print(f"Protein:       {data['protein_name']}")
    print(f"Sequence len:  {data['sequence_length']}")
    print(f"Inference:     {data['inference_time_ms']:.1f} ms")
    print(f"Latent dim:    {len(data['latent'])}")
    print(f"N predictions: {len(data['predictions'])}")
    print(f"N uncertainty: {len(data['uncertainty'])}")

2026-02-18 16:38:24 | INFO     | protophen.embeddings.fusion:__init__:423 | Initialised EmbeddingFusion: method=concatenate


Status: 200
Protein:       demo_protein
Sequence len:  30
Inference:     18.5 ms
Latent dim:    8
N predictions: 2
N uncertainty: 2
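
The `n_uncertainty_samples` field (and the `default_mc_samples` key in the configuration reference) suggests that uncertainty comes from repeated stochastic forward passes, for example Monte Carlo dropout, although this notebook does not show the mechanism. Under that assumption, the aggregation step could look like the sketch below, where `mc_uncertainty` and `toy_model` are hypothetical stand-ins rather than ProToPhen functions.

```python
import random
import statistics


def mc_uncertainty(stochastic_predict, n_samples=5):
    """Per-output mean and standard deviation across repeated stochastic passes."""
    samples = [stochastic_predict() for _ in range(n_samples)]
    per_output = list(zip(*samples))  # transpose: one tuple per output feature
    mean = [statistics.fmean(values) for values in per_output]
    std = [statistics.stdev(values) for values in per_output]  # needs n_samples >= 2
    return mean, std


noise = random.Random(0)


def toy_model():
    # Stand-in for a forward pass with dropout left active (hypothetical).
    return [1.0 + noise.gauss(0.0, 0.1), -2.0 + noise.gauss(0.0, 0.3)]


mean, std = mc_uncertainty(toy_model, n_samples=200)
print("mean:", [round(v, 3) for v in mean])
print("std: ", [round(v, 3) for v in std])
```

More samples give a steadier estimate of the spread at the cost of proportionally more inference time, which is why the sample count is exposed per request.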


### 6.6 Batch Prediction

In [40]:
if _FASTAPI_AVAILABLE:
    resp = client.post("/predict/batch", json={
        "proteins": [
            {"sequence": "MKFLILLFNILCLFPVLAADNHGVGPQGAS", "name": "prot_a"},
            {"sequence": "ACDEFGHIKLMNPQRSTVWY", "name": "prot_b"},
            {"sequence": "MTEYKLVVVGAGGVGKSALT", "name": "prot_c"},
        ],
        "return_uncertainty": False,
    })

    print(f"Status: {resp.status_code}")
    data = resp.json()
    print(f"N proteins:     {data['n_proteins']}")
    print(f"Total time:     {data['total_inference_time_ms']:.1f} ms")
    print(f"Model version:  {data['model_version']}")
    for r in data["results"]:
        print(f"  {r['protein_name']}: len={r['sequence_length']}, "
              f"tasks={len(r['predictions'])}")

Status: 200
N proteins:     3
Total time:     7.5 ms
Model version:  v1.0
  prot_a: len=30, tasks=2
  prot_b: len=20, tasks=2
  prot_c: len=20, tasks=2


### 6.7 Feedback Endpoint

In [41]:
if _FASTAPI_AVAILABLE:
    resp = client.post("/feedback", json={
        "protein_id": "demo_protein",
        "sequence": "MKFLILLFNILCLFPVLAADNHGVGPQGAS",
        "observed_features": [0.1] * 10,
        "plate_id": "plate_001",
        "well_id": "A02",
        "cell_count": 1500,
        "trigger_reselection": False,
    })

    print(f"Status: {resp.status_code}")
    fb = resp.json()
    for key, value in fb.items():
        print(f"  {key}: {value}")

2026-02-18 16:38:24 | INFO     | protophen.serving.api:feedback:357 | Feedback received for protein 'demo_protein': 10 features (total stored: 1)


Status: 200
  status: accepted
  protein_id: demo_protein
  message: Feedback stored successfully (1 total entries).
  reselection_triggered: False
  next_candidates: None


### 6.8 Monitoring Endpoint

In [42]:
if _FASTAPI_AVAILABLE:
    resp = client.get("/metrics")
    print(f"Status: {resp.status_code}")
    metrics = resp.json()
    for key, value in metrics.items():
        if isinstance(value, dict):
            print(f"  {key}:")
            for k2, v2 in value.items():
                print(f"    {k2}: {v2}")
        else:
            print(f"  {key}: {value}")

Status: 200
  total_requests: 4
  total_errors: 0
  total_feedback: 0
  error_rate: 0.0
  throughput_rps: 32.0
  uptime_seconds: 0.1
  latency_ms:
    mean: 6.45
    p50: 2.71
    p95: 16.22
    p99: 18.03
    max: 18.48
  prediction_norm:
    mean: 0.784
    std: 0.2028
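
The latency block reports a mean plus tail percentiles over a window of recent requests. As an illustration of how such a summary can be maintained, here is a minimal sliding-window sketch. `LatencyWindow` is a hypothetical class, and the nearest-rank percentile definition is an assumption; the monitor may interpolate instead.

```python
import math
from collections import deque


class LatencyWindow:
    """Hypothetical sliding window of recent request latencies (ms)."""

    def __init__(self, window_size=1000):
        self._values = deque(maxlen=window_size)  # oldest samples drop off

    def add(self, latency_ms):
        self._values.append(latency_ms)

    def percentile(self, q):
        """Nearest-rank percentile for q in (0, 100]."""
        ordered = sorted(self._values)
        rank = max(1, math.ceil(q * len(ordered) / 100))
        return ordered[rank - 1]

    def summary(self):
        if not self._values:
            return {}  # nothing recorded yet
        return {
            "mean": sum(self._values) / len(self._values),
            "p50": self.percentile(50),
            "p95": self.percentile(95),
            "p99": self.percentile(99),
            "max": max(self._values),
        }


window = LatencyWindow(window_size=100)
for latency in range(1, 151):  # 150 samples; only the last 100 are kept
    window.add(float(latency))
print(window.summary())
```

Tail percentiles such as p95 and p99 are usually more informative than the mean for serving, because a few slow requests (long sequences, cold caches) dominate user-visible latency.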


### 6.9 Input Validation

The API rejects invalid input with HTTP 422 and a descriptive validation error:

In [43]:
if _FASTAPI_AVAILABLE:
    # Invalid characters
    resp = client.post("/predict", json={
        "protein": {"sequence": "MKFLIL123"},
    })
    print(f"Invalid chars → {resp.status_code}: {resp.json()['detail']}")

    # Empty sequence
    resp = client.post("/predict", json={
        "protein": {"sequence": "   "},
    })
    print(f"Empty seq     → {resp.status_code}")

    # Empty batch
    resp = client.post("/predict/batch", json={
        "proteins": [],
    })
    print(f"Empty batch   → {resp.status_code}")

Invalid chars → 422: [{'type': 'value_error', 'loc': ['body', 'protein', 'sequence'], 'msg': "Value error, Sequence contains invalid characters: {'3', '1', '2'}. Allowed: ACDEFGHIKLMNPQRSTVWY", 'input': 'MKFLIL123', 'ctx': {'error': {}}}]
Empty seq     → 422
Empty batch   → 422
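
The rule behind the first error message can be sketched as a plain function. This is an approximation for illustration, not the API's actual validator: `validate_sequence` is a hypothetical name, stripping whitespace and upper-casing are assumptions, and the length limit borrows the `max_sequence_length` default from the configuration reference.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence, max_length=2000):
    """Return the cleaned sequence, or raise ValueError describing the problem."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        raise ValueError("Sequence must not be empty")
    if len(cleaned) > max_length:
        raise ValueError(f"Sequence length {len(cleaned)} exceeds {max_length}")
    invalid = sorted(set(cleaned) - VALID_AMINO_ACIDS)
    if invalid:
        raise ValueError(
            f"Sequence contains invalid characters: {invalid}. "
            "Allowed: ACDEFGHIKLMNPQRSTVWY"
        )
    return cleaned


for candidate in ["MKFLIL", "MKFLIL123", "   "]:
    try:
        print(repr(candidate), "->", validate_sequence(candidate))
    except ValueError as error:
        print(repr(candidate), "-> rejected:", error)
```

Inside a Pydantic model, a check like this would typically live in a field validator, where a raised `ValueError` is turned into the 422 response shown above.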


## 7. Docker Deployment

ProToPhen includes Docker configurations for containerised deployment.

### File Structure

```text
docker/
├── Dockerfile           # CPU-only image
├── Dockerfile.gpu       # CUDA-enabled image
└── docker-compose.yml   # Service orchestration
```

### 7.1 Build & Run (CPU)

```bash
# Build
docker build -t protophen:latest -f docker/Dockerfile .

# Run with a checkpoint mounted from the host
docker run -p 8000:8000 \
    -v ./checkpoints:/app/checkpoints:ro \
    -v ./model_registry:/app/model_registry \
    protophen:latest \
    python scripts/serve.py \
        --checkpoint /app/checkpoints/best.pt \
        --config /app/configs/deployment.yaml
```

### 7.2 Build & Run (GPU)

```bash
docker build -t protophen:gpu -f docker/Dockerfile.gpu .

docker run --gpus all -p 8000:8000 \
    -v ./checkpoints:/app/checkpoints:ro \
    protophen:gpu \
    python scripts/serve.py \
        --checkpoint /app/checkpoints/best.pt \
        --device cuda
```

### 7.3 Docker Compose

```bash
# Start all services (API + optional Prometheus)
docker compose -f docker/docker-compose.yml up -d

# Check logs
docker compose -f docker/docker-compose.yml logs -f protophen-api

# Stop
docker compose -f docker/docker-compose.yml down
```
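
After `up -d` returns, the container may still be loading the model. A small polling helper against `/health` is a convenient way to wait for readiness in scripts or CI. The sketch below is hypothetical: the function names are ours, and the URL assumes the compose file publishes the API on `localhost:8000`. It relies only on the `status: healthy` field shown in Section 6.3.

```python
import json
import time
import urllib.request


def fetch_health(url="http://localhost:8000/health"):
    """Fetch and decode the /health response."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_healthy(fetch=fetch_health, timeout_s=60.0, interval_s=2.0):
    """Poll until the service reports 'healthy' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            pass  # connection refused or invalid JSON: still starting, retry
        time.sleep(interval_s)
    return False


# Usage once the stack is up:
# if not wait_until_healthy():
#     raise SystemExit("ProToPhen API did not become healthy in time")
```

Passing `fetch` as a parameter keeps the retry loop testable without a running container.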

### 7.4 Batch Inference in Docker

```bash
docker run \
    -v ./data:/app/data \
    -v ./checkpoints:/app/checkpoints:ro \
    protophen:latest \
    python scripts/batch_inference.py \
        --input /app/data/proteins.fasta \
        --checkpoint /app/checkpoints/best.pt \
        --output /app/data/predictions.parquet \
        --uncertainty
```

## 8. Configuration Reference

The deployment is configured via `configs/deployment.yaml`.

> **NOTE:** CLI flags override YAML values.

```yaml
logging:
  level: "INFO"
  log_file: null

pipeline:
  checkpoint_path: null
  esm_model_name: "esm2_t33_650M_UR50D"
  device: "auto"
  use_fp16: true
  include_physicochemical: true
  max_batch_size: 64
  max_sequence_length: 2000
  default_mc_samples: 20

api:
  host: "0.0.0.0"
  port: 8000
  workers: 1
  reload: false

monitoring:
  window_size: 1000
  enable_drift_detection: true
  drift_significance: 0.01
  track_regression_metrics: true

registry:
  registry_dir: "./model_registry"
  max_versions: 20

feedback:
  persist_dir: "./data/feedback"
```
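
To illustrate the override rule in the note above, here is a minimal sketch of layering CLI flags over the parsed YAML. The flag names match those used with `scripts/serve.py` earlier, but `merge_config` and the mapping from flags to YAML keys are assumptions made for this example, not the script's actual code.

```python
import argparse
import copy


def merge_config(yaml_config, cli_args):
    """Overlay CLI values that were actually provided onto the YAML config."""
    merged = copy.deepcopy(yaml_config)  # leave the loaded YAML untouched
    overrides = {
        ("pipeline", "checkpoint_path"): cli_args.checkpoint,
        ("pipeline", "device"): cli_args.device,
        ("api", "host"): cli_args.host,
        ("api", "port"): cli_args.port,
    }
    for (section, key), value in overrides.items():
        if value is not None:  # None means the flag was not given
            merged.setdefault(section, {})[key] = value
    return merged


parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint")
parser.add_argument("--device")
parser.add_argument("--host")
parser.add_argument("--port", type=int)

yaml_config = {  # stands in for the parsed deployment.yaml
    "pipeline": {"checkpoint_path": None, "device": "auto"},
    "api": {"host": "0.0.0.0", "port": 8000},
}
args = parser.parse_args(["--checkpoint", "checkpoints/best.pt", "--device", "cuda"])
config = merge_config(yaml_config, args)
print(config)
```

Leaving the argparse defaults as `None` is what makes "flag not given" distinguishable from "flag set to the default value", so YAML values survive unless explicitly overridden.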

## 9. Summary

| Component | Class / Function | Purpose |
|-----------|-----------------|---------|
| **Pipeline** | `InferencePipeline` | seq → embedding → prediction |
| **Checkpoint** | `load_checkpoint()` | Normalise any checkpoint format |
| **Checkpoint** | `build_model_from_checkpoint()` | Reconstruct model from checkpoint |
| **Registry** | `ModelRegistry` | Version, promote, rollback |
| **Registry** | `register_from_trainer_checkpoint()` | Auto-extract Trainer metadata |
| **Monitor** | `PredictionMonitor` | Latency, throughput, errors |
| **Drift** | `DriftDetector` | KS-test distribution shift |
| **Quality** | `PredictionQualityTracker` | Feedback-based R², MSE, Pearson |
| **API** | `create_app()` | FastAPI application factory |
| **CLI** | `scripts/serve.py` | Launch API server |
| **CLI** | `scripts/batch_inference.py` | Batch processing with resume |
| **Docker** | `docker/Dockerfile` | CPU container |
| **Docker** | `docker/Dockerfile.gpu` | GPU container |

**Next steps:**
- Add the phenotype autoencoder and two-phase pre-training.
- The autoencoder's latent space will then become the prediction target,
  which is intended to further improve the serving pipeline's output quality.

---
## Cleanup

In [44]:
if _FASTAPI_AVAILABLE:
    client.__exit__(None, None, None)

shutil.rmtree(WORK_DIR, ignore_errors=True)
print("Temporary files cleaned up.")

Temporary files cleaned up.
