# Chapter 38: From Development to Production

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the key differences between a development environment and a production environment
- Identify the challenges that arise when moving a machine learning model from a notebook to a live system
- Organize code into modular, reusable components for maintainability
- Manage configuration settings (e.g., file paths, model parameters) using environment variables and configuration files
- Handle dependencies and environment reproducibility using virtual environments and containerization
- Implement logging and monitoring to track model performance and system health in production
- Design robust error handling and retry logic for real‑world data pipelines
- Write unit tests and integration tests for data processing and model inference
- Document the system architecture and APIs for team collaboration
- Adopt best practices for version control, CI/CD, and deployment

---

## **38.1 Introduction: The Development‑Production Gap**

In the earlier chapters, we focused on building and evaluating models in a Jupyter notebook or a local Python script. This is the **development** phase: we explore data, engineer features, experiment with algorithms, and tune hyperparameters. However, a model that performs well in a notebook is not ready for live use. The **production** environment has different requirements:

- **Scalability:** The system must handle requests (e.g., predictions for many stocks) reliably and quickly.
- **Reliability:** It must run 24/7, handle errors gracefully, and recover from failures.
- **Maintainability:** Code must be organized, tested, and documented so that other team members (or your future self) can understand and modify it.
- **Reproducibility:** The exact environment (Python version, library versions) must be captured to avoid "it works on my machine" problems.
- **Monitoring:** You need to know if the model's performance degrades over time (data drift, concept drift) or if the system is down.

Bridging this gap requires engineering discipline. In this chapter, we will walk through the steps to productionize the NEPSE prediction system.

---

## **38.2 Production Requirements**

Before writing any production code, we must define the requirements. For the NEPSE system, typical requirements might be:

- **Input:** New data arrives daily (as a CSV file or via an API) at market close.
- **Output:** Generates predictions for the next day's return for each stock, stored in a database or delivered via API.
- **Frequency:** Batch job runs once per day after data arrives.
- **Latency:** Not critical for a batch job; a real‑time service might target under 1 second per prediction.
- **Accuracy:** Live performance must stay close to validation performance; raise an alert if it drops.
- **Failover:** If a step fails, retry it or fall back to a simple model.

These requirements guide the architecture.

---

## **38.3 Code Organization**

A Jupyter notebook is great for exploration but terrible for production. We need to refactor the code into a structured Python project.

A typical project layout might look like:

```
nepse_prediction/
├── config/
│   └── config.yaml          # configuration parameters
├── data/
│   ├── raw/                  # raw input data
│   ├── processed/             # cleaned/featured data
│   └── models/                # saved model artifacts
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── collector.py       # data ingestion (API, CSV)
│   │   ├── cleaner.py         # data cleaning
│   │   └── features.py        # feature engineering
│   ├── models/
│   │   ├── train.py           # model training script
│   │   ├── predict.py         # prediction script
│   │   └── utils.py           # common utilities (scaling, etc.)
│   ├── evaluation/
│   │   └── metrics.py         # custom metrics
│   └── monitoring/
│       └── drift_detection.py  # data drift checks
├── tests/
│   ├── test_data.py
│   ├── test_features.py
│   └── test_model.py
├── notebooks/                  # exploration notebooks (not for production)
├── requirements.txt            # dependencies
├── setup.py                    # installable package (optional)
├── Dockerfile                  # container definition
├── .env                        # environment variables (not committed)
└── README.md
```

**Explanation:**  
Separating concerns into modules makes the code easier to test, maintain, and scale. Each module has a clear responsibility.

---

## **38.4 Configuration Management**

Hard‑coding parameters (file paths, model hyperparameters, API keys) is a recipe for disaster. Instead, use configuration files and environment variables.

### **38.4.1 Using YAML Configuration**

Create a `config/config.yaml` file:

```yaml
data:
  raw_path: "data/raw/nepse_{date}.csv"
  processed_path: "data/processed/features_{date}.parquet"
  model_path: "data/models/model.pkl"

features:
  lag_list: [1, 2, 3, 5]
  windows: [5, 10, 20]
  technical_indicators: ["RSI", "MACD"]

model:
  name: "RandomForest"
  params:
    n_estimators: 100
    max_depth: 5
    random_state: 42

training:
  test_size: 0.2
  cv_splits: 3

api:
  host: "0.0.0.0"
  port: 8000
```

Load it in Python:

```python
import yaml

with open("config/config.yaml", "r") as f:
    config = yaml.safe_load(f)

raw_path = config["data"]["raw_path"]
model_params = config["model"]["params"]
```
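The paths in `config.yaml` contain a `{date}` placeholder that must be filled in before use. A minimal sketch of one way to do this with `str.format` is shown below; the `resolve_path` helper and the ISO date format (`YYYY-MM-DD`) are assumptions for illustration, not something fixed by the config file itself.

```python
from datetime import date

# Illustrative copy of the "data" section of config/config.yaml.
config = {
    "data": {
        "raw_path": "data/raw/nepse_{date}.csv",
        "processed_path": "data/processed/features_{date}.parquet",
        "model_path": "data/models/model.pkl",
    }
}


def resolve_path(template: str, run_date: date) -> str:
    """Fill the {date} placeholder with an ISO date (hypothetical helper)."""
    return template.format(date=run_date.isoformat())


raw_path = resolve_path(config["data"]["raw_path"], date(2024, 1, 15))
print(raw_path)  # data/raw/nepse_2024-01-15.csv

# Templates without a placeholder pass through unchanged.
model_path = resolve_path(config["data"]["model_path"], date(2024, 1, 15))
```

Resolving paths in one place keeps the date format consistent between the ingestion, training, and prediction steps.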

### **38.4.2 Environment Variables for Secrets**

Sensitive information (database passwords, API keys) should never be in the config file. Use environment variables.

```python
import os

DB_PASSWORD = os.getenv("DB_PASSWORD")
if DB_PASSWORD is None:
    raise ValueError("DB_PASSWORD environment variable not set")
```

You can use a `.env` file for local development (loaded via `python-dotenv`), but never commit it.
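When several settings come from the environment, a small helper can centralize the "required or default" logic and the type conversion. The sketch below uses only the standard library; `get_env`, `API_PORT`, and `DB_HOST` are hypothetical names chosen for illustration, and a plain dictionary stands in for `os.environ` so the example runs anywhere.

```python
import os


def get_env(name, default=None, cast=str, environ=None):
    """Read a variable, cast it, and fail loudly if it is required but missing.

    A variable is treated as required when no default is given.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        if default is None:
            raise RuntimeError(f"Required environment variable {name} is not set")
        return default
    return cast(raw)


# A plain dict stands in for os.environ in this example.
fake_env = {"DB_PASSWORD": "s3cret", "API_PORT": "9000"}

db_password = get_env("DB_PASSWORD", environ=fake_env)                     # required
api_port = get_env("API_PORT", default=8000, cast=int, environ=fake_env)   # overridden
db_host = get_env("DB_HOST", default="localhost", environ=fake_env)        # default used
```

Failing at startup when a required secret is missing is usually preferable to failing halfway through a pipeline run.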

---

## **38.5 Dependency and Environment Management**

Reproducibility is critical. You must capture the exact versions of all libraries.

### **38.5.1 Virtual Environments**

Use `venv` or `conda` to create an isolated environment.

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### **38.5.2 requirements.txt**

Generate a comprehensive list of dependencies:

```bash
pip freeze > requirements.txt
```

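This pins every installed package, including transitive dependencies, which makes the file hard to maintain by hand. A common alternative is to list only the top‑level packages with their versions and let a tool such as `pip-tools` generate the fully pinned `requirements.txt` from that list.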
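Pinned versions only help if the running environment actually matches them. The following sketch, using the standard library's `importlib.metadata`, compares `name==version` pins against what is installed. The helper names are hypothetical, and the parser deliberately ignores extras, environment markers, and other requirement syntax.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Parse 'name==version' lines, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return {package: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Illustrative input; a real check would read the lines of requirements.txt.
example = ["# illustrative pins", "definitely-not-a-real-package==1.0.0", ""]
print(check_pins(parse_pins(example)))
```

A check like this could run at application startup or as a CI step, so that a drifted environment is reported before it produces predictions.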

### **38.5.3 Docker**

Containers encapsulate the entire environment, including the operating system. A `Dockerfile` might look like:

```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY src/ ./src/
COPY config/ ./config/

# Run as a module from /app so that `from src...` imports resolve
CMD ["python", "-m", "src.models.predict"]
```

Then build and run:

```bash
docker build -t nepse-predictor .
docker run nepse-predictor
```

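This runs the same Python version and library versions on any host that has Docker. Note that the image above does not include the `data/` directory, so the model artifact and input files must be mounted at run time (for example with `docker run -v`) or copied into the image.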

---

## **38.6 Modular Architecture**

### **38.6.1 Data Ingestion Module**

`src/data/collector.py` handles loading raw data from CSV or API.

```python
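import pandas as pd
import requests


class DataCollector:
    """Load raw market data from a CSV file or an HTTP API."""

    def __init__(self, data_url=None, timeout=10):
        self.data_url = data_url
        self.timeout = timeout  # seconds; avoids hanging on a stalled connection

    def from_csv(self, filepath):
        return pd.read_csv(filepath)

    def from_api(self, date):
        # Example: fetch one day's data; the endpoint is assumed to return a JSON list of records
        if self.data_url is None:
            raise ValueError("data_url must be set to fetch from the API")
        response = requests.get(self.data_url, params={"date": date}, timeout=self.timeout)
        response.raise_for_status()
        return pd.DataFrame(response.json())

    def save_raw(self, df, path):
        df.to_csv(path, index=False)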
```

### **38.6.2 Feature Engineering Module**

`src/data/features.py` contains functions to create features from raw data.

```python
import pandas as pd
import numpy as np

class FeatureEngineer:
    def __init__(self, config):
        self.lag_list = config["features"]["lag_list"]
        self.windows = config["features"]["windows"]
    
    def create_features(self, df):
        df = df.copy()
        df['Return'] = df['Close'].pct_change() * 100
        for lag in self.lag_list:
            df[f'Return_Lag{lag}'] = df['Return'].shift(lag)
        for w in self.windows:
            df[f'MA_{w}'] = df['Close'].rolling(w).mean()
            df[f'Volatility_{w}'] = df['Return'].rolling(w).std()
        # ... more features
        return df.dropna()
```

### **38.6.3 Model Training Module**

`src/models/train.py` trains and saves the model.

```python
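import joblib
from sklearn.ensemble import RandomForestRegressor
from src.data.features import FeatureEngineer
from src.data.collector import DataCollector

def train_model(config, date):
    # Load data; raw_path contains a {date} placeholder (see config.yaml)
    collector = DataCollector()
    df_raw = collector.from_csv(config["data"]["raw_path"].format(date=date))
    
    # Engineer features
    engineer = FeatureEngineer(config)
    df_features = engineer.create_features(df_raw)
    
    # Target: the next day's return. The last row has no next day, so drop it.
    df_features['Target'] = df_features['Return'].shift(-1)
    df_features = df_features.dropna(subset=['Target'])
    
    # Prepare X, y (numeric columns only, so 'Date' and other labels are excluded)
    numeric_cols = df_features.select_dtypes(include='number').columns
    feature_cols = [c for c in numeric_cols if c != 'Target']
    X = df_features[feature_cols]
    y = df_features['Target']
    
    # Train
    model = RandomForestRegressor(**config["model"]["params"])
    model.fit(X, y)
    
    # Save
    joblib.dump(model, config["data"]["model_path"])
    return model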
```

### **38.6.4 Prediction Module**

`src/models/predict.py` loads the model and makes predictions on new data.

```python
import joblib
from src.data.features import FeatureEngineer

class Predictor:
    def __init__(self, model_path, config):
        self.model = joblib.load(model_path)
        self.engineer = FeatureEngineer(config)
        # scikit-learn (1.0+) records the training column names when the
        # model is fitted on a DataFrame, as train.py does
        self.feature_cols = list(self.model.feature_names_in_)
    
    def predict(self, df_raw):
        df_features = self.engineer.create_features(df_raw)
        # Ensure features are in the same order as training
        X = df_features[self.feature_cols]
        return self.model.predict(X)
```
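Section 38.2 lists storing predictions in a database as one possible output. As a rough sketch of that step, the example below writes predictions to SQLite using the standard library. The table layout, the `save_predictions` helper, and the symbols `AAA` and `BBB` are all hypothetical; a production system would likely use a server database instead of an in‑memory one.

```python
import sqlite3


def save_predictions(conn, run_date, predictions):
    """Insert or replace one row per (date, symbol) so reruns are idempotent."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS predictions (
            run_date TEXT NOT NULL,
            symbol TEXT NOT NULL,
            predicted_return REAL NOT NULL,
            PRIMARY KEY (run_date, symbol)
        )
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?)",
        [(run_date, symbol, value) for symbol, value in predictions.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save_predictions(conn, "2024-01-15", {"AAA": 0.42, "BBB": -0.10})
save_predictions(conn, "2024-01-15", {"AAA": 0.50})  # a rerun overwrites AAA

rows = conn.execute(
    "SELECT symbol, predicted_return FROM predictions ORDER BY symbol"
).fetchall()
print(rows)  # [('AAA', 0.5), ('BBB', -0.1)]
```

Keying rows on the run date and symbol means a retried or repeated batch job replaces its earlier output instead of duplicating it.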

---

## **38.7 Logging and Monitoring**

Production systems must log events and metrics for debugging and performance tracking.

### **38.7.1 Logging**

Use Python's `logging` module instead of `print`.

```python
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def predict(date):
    logger.info("Starting prediction for date %s", date)
    try:
        # ... prediction code
        logger.info("Prediction completed")
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Prediction failed for date %s", date)
        raise
```

Logs should be written to a file or sent to a centralized logging system (e.g., ELK stack, Splunk).
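For the file‑based option, the standard library's `RotatingFileHandler` keeps log files from growing without bound. The sketch below is one possible setup; the logger name, size limit, and backup count are illustrative assumptions, and a temporary directory is used only so the example runs anywhere.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_logger(name, log_path, max_bytes=1_000_000, backup_count=5):
    """Return a logger that writes to log_path and rotates at max_bytes."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


# A temporary directory stands in for a real log location such as logs/.
log_path = os.path.join(tempfile.mkdtemp(), "nepse.log")
logger = build_logger("nepse.predict", log_path)
logger.info("Prediction run started")
```

Once the file reaches `max_bytes`, it is renamed with a numeric suffix and a fresh file is started, keeping at most `backup_count` old files.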

### **38.7.2 Monitoring Model Performance**

Track metrics over time to detect drift. For example, after each batch prediction, if we receive actual returns the next day, we can compute error metrics and compare to a baseline.

```python
import numpy as np

def monitor_performance(y_true, y_pred, baseline_rmse, threshold=1.5):
    # baseline_rmse: RMSE measured on the validation set at training time
    rmse = np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    logger.info("Current RMSE: %.4f, Baseline: %.4f", rmse, baseline_rmse)
    degraded = rmse > threshold * baseline_rmse
    if degraded:
        logger.warning("Performance degradation detected!")
        # send alert (email, Slack, etc.)
    return degraded
```

We can also monitor feature distributions (data drift) using statistical tests (e.g., Kolmogorov‑Smirnov).
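To make the Kolmogorov‑Smirnov idea concrete, here is a standard‑library sketch of the two‑sample statistic: the largest gap between the empirical distributions of a reference sample (for example, training data) and a current sample. The function names are hypothetical, and the decision rule uses the usual large‑sample approximation for the critical value. In practice, `scipy.stats.ks_2samp` computes the statistic together with a p‑value.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)  # share of reference values <= x
        cdf_cur = bisect_right(cur, x) / len(cur)  # share of current values <= x
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def drift_detected(reference, current, alpha=0.05):
    """Compare the statistic with the large-sample critical value at level alpha."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Synthetic feature values: the second sample is shifted by half the range.
reference = list(range(100))
shifted = [x + 50 for x in reference]

print(drift_detected(reference, reference))  # False
print(drift_detected(reference, shifted))    # True
```

Running such a check per feature after each batch gives an early warning that does not depend on waiting for the actual returns.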

---

## **38.8 Error Handling and Retry Logic**

Real‑world data pipelines encounter failures: network timeouts, corrupted files, missing data. Build robustness with retries and fallbacks.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)

def retry(max_attempts=3, delay=1):
    """Retry the wrapped function with exponential backoff between attempts."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    logger.warning("Attempt %d of %d failed: %s", attempt + 1, max_attempts, e)
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: propagate the last error
                    time.sleep(delay * (2 ** attempt))  # exponential backoff
        return wrapper
    return decorator

class DataCollector:
    @retry(max_attempts=3, delay=2)
    def from_api(self, date):
        ...  # same body as in src/data/collector.py
```

For critical failures, implement a fallback (e.g., use yesterday's data, or a simple mean model).
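One way to express such a fallback is a small wrapper that tries the primary step and, if it raises, logs the failure and calls the alternative. This is a sketch only: `with_fallback`, `fetch_today`, and `load_yesterday` are hypothetical names, and the two functions are stand‑ins for a real API call and a real cache read.

```python
import logging

logger = logging.getLogger(__name__)


def with_fallback(primary, fallback):
    """Return (result, source); use fallback() only if primary() raises."""
    try:
        return primary(), "primary"
    except Exception as exc:
        logger.warning("Primary step failed (%s); using fallback", exc)
        return fallback(), "fallback"


def fetch_today():
    # Hypothetical stand-in for an API call that is currently failing.
    raise ConnectionError("API unreachable")


def load_yesterday():
    # Hypothetical stand-in for reading the most recent cached data.
    return [{"Symbol": "EXAMPLE", "Close": 100.0}]


rows, source = with_fallback(fetch_today, load_yesterday)
print(source)  # fallback
```

Returning the source alongside the result lets downstream steps and logs record that a prediction was made on stale data.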

---

## **38.9 Testing Strategies**

Testing ensures that code behaves as expected and that changes do not break existing functionality.

### **38.9.1 Unit Tests**

Test individual functions in isolation. For example, test that feature engineering produces expected columns.

```python
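import pandas as pd
from src.data.features import FeatureEngineer

def test_create_features():
    # Small lags and windows so that three rows are enough to produce output
    config = {"features": {"lag_list": [1], "windows": [2]}}
    df = pd.DataFrame({'Close': [100, 101, 102], 'Date': pd.date_range('2023-01-01', periods=3)})
    engineer = FeatureEngineer(config)
    result = engineer.create_features(df)
    assert 'Return' in result.columns
    assert 'Return_Lag1' in result.columns
    assert len(result) == 1  # the first two rows contain NaN and are dropped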
```

Run tests with `pytest`.

### **38.9.2 Integration Tests**

Test the entire pipeline end‑to‑end on a small sample of data.

```python
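import pandas as pd
from src.models.predict import Predictor

def test_prediction_pipeline(test_config):
    # test_config: pytest fixture with a small config dict (assumed to live in conftest.py)
    df = pd.read_csv('tests/sample_data.csv')  # tiny single-stock dataset without gaps
    predictor = Predictor(model_path='tests/test_model.pkl', config=test_config)
    preds = predictor.predict(df)
    # Leading rows lost to NaN: the largest window, or the longest lag plus one
    features = test_config["features"]
    min_lookback = max(max(features["windows"]), max(features["lag_list"]) + 1)
    assert len(preds) == len(df) - min_lookback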
```

### **38.9.3 Model Tests**

Test that the model's performance on a holdout set meets a minimum threshold. This can be part of CI/CD to prevent deploying a bad model.

```python
import joblib
import numpy as np

def test_model_performance():
    model = joblib.load('models/prod_model.pkl')
    X_test, y_test = load_test_data()  # project helper (not shown) that returns the holdout set
    rmse = np.sqrt(np.mean((model.predict(X_test) - y_test)**2))
    assert rmse < 1.0  # acceptable threshold
```

---

## **38.10 Documentation**

Good documentation is essential for team collaboration and future maintenance.

- **README.md:** Overview, setup instructions, how to run.
- **API documentation:** If exposing a REST API, document endpoints, request/response formats (e.g., using Swagger/OpenAPI).
- **Model card:** Describe the model, its intended use, performance, limitations (as in Chapter 66).
- **Code comments:** Explain why, not what (the code shows what). Use docstrings for modules, classes, and functions.

Example docstring:

```python
def create_features(df):
    """
    Generate time‑series features from raw OHLCV data.

    Parameters
    ----------
    df : pd.DataFrame
        Raw data with columns: Date, Open, High, Low, Close, Vol.

    Returns
    -------
    pd.DataFrame
        DataFrame with added features (lags, rolling stats, RSI, etc.),
        with rows containing NaN values dropped.
    """
```

---

## **38.11 Version Control and CI/CD**

### **38.11.1 Git**

Use Git for version control. Commit often, with meaningful messages. Use `.gitignore` to exclude sensitive files and large data.

### **38.11.2 Continuous Integration (CI)**

Automate testing on every push using GitHub Actions, GitLab CI, or Jenkins. Example GitHub Actions workflow:

```yaml
name: CI
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
```

### **38.11.3 Continuous Deployment (CD)**

After tests pass, automatically deploy to a staging or production environment. For a batch job, this might mean updating a Docker image or triggering a cloud function.

---

## **38.12 Deployment Options**

Depending on infrastructure, you can deploy as:

- **Batch job** scheduled via cron, Airflow, or cloud scheduler.
- **REST API** using Flask, FastAPI, or Django.
- **Serverless function** (AWS Lambda, Google Cloud Functions) for low‑frequency inference.
- **Containerized service** running on Kubernetes or a VM.

For the NEPSE system, a daily batch job is appropriate. We'll illustrate a simple scheduled script.

```python
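# run_daily.py
import logging
import time

import schedule  # third-party: pip install schedule
from src.models.predict import run_prediction_pipeline  # entry point assumed to exist in predict.py

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def job():
    logger.info("Starting daily prediction job")
    try:
        run_prediction_pipeline()
        logger.info("Job completed successfully")
    except Exception:
        logger.exception("Job failed")  # logs the traceback
        # send alert

schedule.every().day.at("18:00").do(job)

while True:
    schedule.run_pending()
    time.sleep(60)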
```
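If the job is triggered by cron or Airflow instead of a long‑running loop, the script only needs to run once and report success or failure through its exit code. The sketch below shows one possible shape for such an entry point; `run_prediction_pipeline` is a hypothetical placeholder here, and the `--date` option is an assumption that makes it easy to rerun a past day.

```python
import argparse
import logging
from datetime import date

logger = logging.getLogger("nepse.daily")


def run_prediction_pipeline(run_date):
    """Hypothetical placeholder for the real pipeline call."""
    logger.info("Running pipeline for %s", run_date)


def main(argv=None):
    """Run the job once; return 0 on success and 1 on failure."""
    parser = argparse.ArgumentParser(description="Run the daily prediction job once.")
    parser.add_argument(
        "--date", default=date.today().isoformat(), help="run date, YYYY-MM-DD"
    )
    args = parser.parse_args(argv)
    try:
        run_date = date.fromisoformat(args.date)
        run_prediction_pipeline(run_date)
    except Exception:
        logger.exception("Daily job failed")
        return 1
    return 0


# Example invocation with explicit arguments (a scheduler would pass none).
exit_code = main(["--date", "2024-01-15"])
print(exit_code)  # 0
```

In a real script, the return value of `main()` would be passed to `sys.exit` under the usual `if __name__ == "__main__":` guard, so that the scheduler can detect a non‑zero exit code and trigger its own retry or alert.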

---

## **38.13 Chapter Summary**

In this chapter, we covered the essential steps to transition a machine learning model from development to production, using the NEPSE prediction system as a guiding example.

- **Production requirements** differ from development: reliability, scalability, maintainability.
- **Code organization** into modules (data, features, models, monitoring) improves maintainability.
- **Configuration management** with YAML files and environment variables separates code from settings.
- **Environment reproducibility** via virtual environments, `requirements.txt`, and Docker ensures consistent execution.
- **Logging and monitoring** track system health and model performance, alerting on degradation.
- **Error handling and retries** make the pipeline robust to transient failures.
- **Testing** at unit, integration, and model levels prevents regressions.
- **Documentation** (README, API docs, model cards) facilitates collaboration.
- **Version control and CI/CD** automate testing and deployment.
- **Deployment options** range from batch jobs to APIs; choose based on requirements.

### **Practical Takeaways for the NEPSE System:**

- Refactor your notebook code into a structured Python package.
- Use configuration files for all parameters that may change.
- Containerize the application with Docker to ensure consistent runs.
- Set up logging and monitor daily predictions.
- Implement retry logic for data ingestion.
- Write tests for critical functions (feature engineering, data loading).
- Use Git and a CI pipeline to run tests automatically.
- Schedule the daily prediction job using cron or Airflow.

With these practices, your model is ready for reliable, maintainable production deployment. In the next chapter, **Chapter 39: Model Serialization and Storage**, we will dive deeper into saving and versioning models, including formats such as Pickle and ONNX, and the use of model registries.

---

**End of Chapter 38**

