# Chapter 89: Documentation Strategies

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the importance of documentation for long‑term maintainability and team collaboration.
- Distinguish between different types of documentation: code documentation, API documentation, architecture documentation, user documentation, model documentation, and operational runbooks.
- Write effective docstrings following standard formats (Google, NumPy, Sphinx).
- Use tools like Sphinx, MkDocs, and Jupyter Book to generate professional documentation.
- Document REST APIs using OpenAPI/Swagger, especially with FastAPI.
- Create high‑level architecture diagrams and maintain them as code (C4 model, PlantUML).
- Develop user guides and tutorials tailored to different audiences (e.g., traders, data scientists).
- Produce model cards and datasheets to document ML models and datasets (as introduced in Chapter 77).
- Write runbooks for incident response and operational tasks.
- Maintain a team knowledge base (wiki) that evolves with the project.
- Establish a documentation review process to ensure accuracy and freshness.

---

## **89.1 Introduction to Documentation**

Documentation is often the last thing on a developer's mind, but it is one of the most important investments a team can make. Good documentation:

- **Onboards new team members faster** – they can understand the system without pinging colleagues.
- **Reduces tribal knowledge** – prevents critical information from being locked in one person's head.
- **Improves quality** – writing forces you to think through design decisions and edge cases.
- **Enables users** – whether they are traders using the NEPSE predictions or other developers integrating with your API.
- **Supports compliance** – in regulated industries, documentation is often a legal requirement.

In the context of the NEPSE prediction system, documentation spans multiple audiences:

- **Developers**: need to understand the code, architecture, and how to contribute.
- **Data scientists**: need to know how features are computed, how models are trained, and how to reproduce experiments.
- **Operations**: need runbooks for deployment, monitoring, and incident response.
- **Business stakeholders**: need high‑level overviews and model performance reports.
- **End users (traders)**: need to know how to interpret predictions and what the system can/cannot do.

This chapter will guide you through creating and maintaining documentation for each of these audiences.

---

## **89.2 Code Documentation**

Code documentation lives closest to the source. It includes comments, docstrings, and READMEs.

### **89.2.1 Docstrings**

Every public module, class, function, and method should have a docstring. Docstrings should explain *what* the code does, not *how* (the code shows how). They should also document parameters, return values, and exceptions raised.

There are several popular docstring formats. We'll use the **Google style** for its readability.

```python
import pandas as pd


def calculate_rsi(prices: pd.Series, period: int = 14) -> pd.Series:
    """
    Calculate the Relative Strength Index (RSI) for a price series.

    RSI is a momentum oscillator that measures the speed and change of price
    movements. It oscillates between 0 and 100. Traditionally, RSI is considered
    overbought when above 70 and oversold when below 30.

    Args:
        prices (pd.Series): Series of closing prices.
        period (int, optional): Lookback period for RSI calculation. Defaults to 14.

    Returns:
        pd.Series: RSI values with the same index as input, first `period` values NaN.

    Raises:
        ValueError: If `prices` contains fewer than `period` non-NaN values.

    Example:
        >>> prices = pd.Series([100, 105, 103, 108, 107, 110, 109, 112, 115, 113, 116, 118, 117, 119, 120])
        >>> rsi = calculate_rsi(prices)
        >>> round(float(rsi.iloc[-1]), 2)
        79.41
    """
    if len(prices.dropna()) < period:
        raise ValueError(f"Need at least {period} non-NaN values")
    # One possible implementation (an assumption, not the only RSI variant):
    # simple moving averages of gains and losses over `period` days.
    delta = prices.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)
```

**Tools**: Use `pydocstyle` to enforce docstring conventions; its `--convention=google` option checks against the Google style used here.
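Docstring examples drift out of date just like any other documentation, so it helps to execute them. The sketch below uses the standard library's `doctest` module to run the example embedded in a docstring and report failures. The function `pct_change` and the helper `check_docstring_examples` are hypothetical names chosen for illustration; they are not part of the NEPSE codebase.

```python
import doctest


def pct_change(old: float, new: float) -> float:
    """Return the percentage change from ``old`` to ``new``.

    Example:
        >>> pct_change(100.0, 105.0)
        5.0
    """
    return (new - old) * 100.0 / old


def check_docstring_examples(func) -> doctest.TestResults:
    """Run every ``>>>`` example found in ``func``'s docstring."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples a namespace in which the function itself is visible.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = check_docstring_examples(pct_change)
print(f"attempted={results.attempted}, failed={results.failed}")
```

In a real project you would more likely run `python -m doctest` or a test runner's doctest integration over whole modules; the point is that a docstring example that no longer matches the code fails the build.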

### **89.2.2 Inline Comments**

Use comments sparingly, only to explain *why* a particular approach was taken, or to clarify complex logic. Avoid obvious comments.

```python
# We use a while loop instead of recursion to avoid stack overflow on large datasets
while idx < len(data):
    ...
```

### **89.2.3 README Files**

Every repository should have a `README.md` at the top level. It should include:

- Project title and brief description.
- Badges (CI status, code coverage, license).
- Quick start: how to install and run a minimal example.
- Links to more detailed documentation (e.g., `docs/` folder).
- Contribution guidelines.
- License information.

For the NEPSE prediction system, the README might look like:

````markdown
# NEPSE Stock Prediction System

[![CI](https://github.com/yourorg/nepse-predictor/actions/workflows/ci.yml/badge.svg)](https://github.com/yourorg/nepse-predictor/actions/workflows/ci.yml)
[![Documentation](https://img.shields.io/badge/docs-sphinx-blue)](https://yourorg.github.io/nepse-predictor)

A machine learning system that predicts stock prices on the Nepal Stock Exchange (NEPSE) using daily OHLCV data.

## Quick Start

```bash
git clone https://github.com/yourorg/nepse-predictor
cd nepse-predictor
poetry install
poetry run python scripts/download_sample_data.py
poetry run python scripts/train_baseline.py
```

See the [documentation](https://yourorg.github.io/nepse-predictor) for detailed instructions.
````

---

## **89.3 API Documentation**

If your system exposes a REST API (as we did in Chapter 74), you must document it for users.

### **89.3.1 OpenAPI / Swagger**

FastAPI generates an OpenAPI (Swagger) specification automatically from your route definitions. To make that specification useful to readers, add descriptive docstrings to your endpoints and use Pydantic models with field descriptions.

```python
import datetime

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()


class PredictionRequest(BaseModel):
    symbol: str = Field(..., description="NEPSE symbol, e.g., 'NABIL'")
    # Annotate with datetime.date rather than a bare `date`: a field named
    # `date` that is assigned a Field would otherwise shadow the imported type.
    date: datetime.date = Field(..., description="Date for which to predict close price")


class PredictionResponse(BaseModel):
    symbol: str
    date: datetime.date
    predicted_close: float = Field(..., description="Predicted closing price in NPR")
    model_version: str = Field(..., description="Model version used")


@app.post("/predict", response_model=PredictionResponse, summary="Predict next day close price")
async def predict(request: PredictionRequest):
    """
    Predict the closing price for a given symbol and date.

    The model uses features up to the day before the requested date.
    If data for that symbol and date is insufficient, returns an error.
    """
    # ...
```

The resulting Swagger UI (at `/docs`) provides an interactive interface for users to test the API and see documentation.

### **89.3.2 API Reference in Static Docs**

For users who prefer static documentation, export the OpenAPI spec (FastAPI serves it at `/openapi.json` by default) and render it with a tool such as ReDoc, or integrate it into your Sphinx site using `sphinxcontrib-openapi`.
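Because the OpenAPI spec is plain JSON, you can also post-process it yourself. The following is a minimal sketch, using only the standard library, that turns the `paths` section of a spec into a Markdown endpoint table suitable for a static docs page. The embedded spec is a hand-written fragment for illustration, and the `/health` endpoint in it is hypothetical.

```python
import json

# A hand-written fragment shaped like an OpenAPI 3 document (illustrative only).
SPEC_JSON = """
{
  "openapi": "3.1.0",
  "info": {"title": "NEPSE Prediction API", "version": "0.1.0"},
  "paths": {
    "/predict": {"post": {"summary": "Predict next day close price"}},
    "/health": {"get": {"summary": "Liveness check"}}
  }
}
"""

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def endpoints_table(spec: dict) -> str:
    """Render one Markdown table row per (method, path) operation in the spec."""
    rows = ["| Method | Path | Summary |", "| --- | --- | --- |"]
    for path in sorted(spec.get("paths", {})):
        for method, operation in sorted(spec["paths"][path].items()):
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            summary = operation.get("summary", "")
            rows.append(f"| {method.upper()} | `{path}` | {summary} |")
    return "\n".join(rows)


table = endpoints_table(json.loads(SPEC_JSON))
print(table)
```

A script like this could run in the documentation build, reading the exported spec file instead of the inline string, so the static reference never lags behind the running API.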

---

## **89.4 Architecture Documentation**

High‑level architecture documentation helps new team members understand how the system fits together.

### **89.4.1 C4 Model**

The C4 model (Context, Containers, Components, Code) is a lightweight approach to documenting software architecture. You can create diagrams as code using tools like **PlantUML** or **Structurizr**.

**Context diagram** (Level 1): Shows the system in its environment: its users (e.g., traders) and the external systems it depends on (e.g., the NEPSE data feed).

```plantuml
@startuml
!include <C4/C4_Context>

Person(trader, "Trader", "A person who uses predictions to inform trading decisions")
System(nepse_system, "NEPSE Prediction System", "Predicts stock prices")
System_Ext(nepse_data, "NEPSE Data Feed", "Daily CSV with OHLCV data")

Rel(trader, nepse_system, "Requests predictions")
Rel(nepse_system, nepse_data, "Pulls data")
@enduml
```

**Container diagram** (Level 2): Shows the high‑level technical building blocks: web app, API, database, etc.

```plantuml
@startuml
!include <C4/C4_Container>

Person(trader, "Trader", "A person who uses predictions")

System_Boundary(nepse_system, "NEPSE Prediction System") {
    Container(api, "API Application", "FastAPI", "Handles prediction requests")
    Container(worker, "Feature Worker", "Python", "Computes features from raw data")
    ContainerDb(feature_store, "Feature Store", "Redis", "Stores feature vectors")
    ContainerDb(model_store, "Model Store", "S3", "Stores serialized models")
}

Rel(trader, api, "Uses", "HTTPS")
Rel(api, feature_store, "Reads features", "Redis protocol")
Rel(worker, feature_store, "Writes features", "Redis protocol")
Rel(api, model_store, "Reads models", "S3 API")
@enduml
```

Component and code diagrams can drill down further.

### **89.4.2 Maintaining Diagrams as Code**

Store diagram source files in your repository (e.g., in a `docs/diagrams` folder) and generate images as part of your documentation build. This keeps them in sync with code changes.
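If rendered images are committed alongside their sources, a small check can flag diagrams whose image is missing or older than the `.puml` file. This is a sketch under stated assumptions: the function name `find_stale_diagrams` is hypothetical, images are assumed to be `.png` files with the same stem as the source, and the demo runs against a temporary directory rather than a real `docs/diagrams` folder.

```python
import os
import tempfile
from pathlib import Path


def find_stale_diagrams(diagram_dir: Path, image_suffix: str = ".png") -> list[Path]:
    """Return .puml sources whose rendered image is missing or older than the source."""
    stale = []
    for source in sorted(diagram_dir.glob("*.puml")):
        image = source.with_suffix(image_suffix)
        if not image.exists() or image.stat().st_mtime < source.stat().st_mtime:
            stale.append(source)
    return stale


# Demo in a throwaway directory: one diagram has a fresh image, one has none.
with tempfile.TemporaryDirectory() as tmp:
    diagrams = Path(tmp)
    (diagrams / "context.puml").write_text("@startuml\n@enduml\n")
    (diagrams / "container.puml").write_text("@startuml\n@enduml\n")
    (diagrams / "context.png").write_bytes(b"")
    # Force the image to be newer than its source so the comparison is unambiguous.
    source_mtime = (diagrams / "context.puml").stat().st_mtime
    os.utime(diagrams / "context.png", (source_mtime + 10, source_mtime + 10))
    stale_names = [path.name for path in find_stale_diagrams(diagrams)]

print(stale_names)  # ['container.puml']
```

File timestamps are not reliable after a fresh CI checkout, so a check like this suits local pre-commit use; in CI it is usually simpler to regenerate every image on each build.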

---

## **89.5 User Documentation**

User documentation is aimed at people who use the system, such as traders or analysts.

### **89.5.1 Getting Started Guide**

A step‑by‑step guide for a new user:

- How to sign up (if applicable).
- How to make their first prediction (e.g., via API or web interface).
- An explanation of the output: what the predicted price means and what error range to expect.

### **89.5.2 Tutorials**

Walk through common use cases:

- “Predicting the next day's price for a specific stock.”
- “Comparing model performance across different stocks.”
- “Using the API to batch predict for a portfolio.”

### **89.5.3 FAQs**

Answer common questions:

- “How accurate is the model?”
- “How often are predictions updated?”
- “What should I do if the prediction seems off?”

### **89.5.4 Format**

User documentation can be hosted on a website (e.g., using MkDocs) or as PDFs. It should be written in plain language, avoiding jargon where possible.

---

## **89.6 Model Documentation**

As introduced in Chapter 77, each model should have a **model card**. This is a structured document that describes the model's purpose, performance, limitations, and ethical considerations.

### **89.6.1 Model Card Template**

```markdown
# Model Card: NEPSE Close Price Predictor

## Model Details
- **Version**: 2.3.0
- **Type**: XGBoost Regressor
- **Date trained**: 2024-06-01
- **Training data**: NEPSE daily data 2018-2023
- **Features**: 15 features including lags, moving averages, RSI, volume Z‑score
- **License**: Proprietary

## Intended Use
- Predict next‑day closing price for NEPSE stocks.
- Should be used as one input among many in trading decisions, not as a sole basis.

## Factors
- Model was trained on data from a single exchange (NEPSE). It may not generalise to other exchanges.
- Performance may vary across stocks with different liquidity and volatility.

## Metrics
- **Overall MAE**: 12.34
- **MAE by volatility quartile**:
  - Low: 8.21
  - Medium: 11.45
  - High: 18.56

## Evaluation Data
- Test period: Jan‑Jun 2024 (out‑of‑time)
- Distribution matches training period (no significant drift detected)

## Ethical Considerations
- The model does not use sensitive attributes (e.g., race, gender).
- Predictions are uncertain; users should be aware of limitations.

## Caveats and Recommendations
- Model performance degrades during extreme market events (e.g., circuit breaker days).
- Retraining is recommended monthly.
```

Store model cards in a central location, perhaps alongside the model in the registry (e.g., MLflow).
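Model cards go stale quickly when they are edited by hand after every retraining. One option is to render the card from structured metadata at training time. The sketch below is an illustration only: the `ModelCard` dataclass and its fields are hypothetical, it covers just a subset of the template's sections, and the values are copied from the template above rather than from a real training run.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Structured metadata for a subset of the model card template."""
    name: str
    version: str
    model_type: str
    intended_use: list[str]
    metrics: dict[str, float]
    caveats: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the card using the same headings as the template."""
        lines = [
            f"# Model Card: {self.name}",
            "",
            "## Model Details",
            f"- **Version**: {self.version}",
            f"- **Type**: {self.model_type}",
            "",
            "## Intended Use",
        ]
        lines += [f"- {item}" for item in self.intended_use]
        lines += ["", "## Metrics"]
        lines += [f"- **{label}**: {value:.2f}" for label, value in self.metrics.items()]
        if self.caveats:
            lines += ["", "## Caveats and Recommendations"]
            lines += [f"- {caveat}" for caveat in self.caveats]
        return "\n".join(lines) + "\n"


card = ModelCard(
    name="NEPSE Close Price Predictor",
    version="2.3.0",
    model_type="XGBoost Regressor",
    intended_use=["Predict next-day closing price for NEPSE stocks."],
    metrics={"Overall MAE": 12.34},
    caveats=["Retraining is recommended monthly."],
)
markdown = card.to_markdown()
print(markdown)
```

The rendered Markdown can then be written next to the model artifact or attached to the registry entry, so the card and the model version always travel together.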

---

## **89.7 Data Documentation**

Similarly, datasets should be documented with **datasheets** (Gebru et al., 2018). A datasheet includes:

- **Motivation**: Why was the dataset created?
- **Composition**: What instances does it contain? (e.g., stocks, dates, features)
- **Collection process**: How was the data collected? (e.g., from NEPSE CSV)
- **Preprocessing**: Any cleaning, imputation, or feature engineering applied.
- **Uses**: What tasks is the dataset suitable for? What should it not be used for?
- **Distribution**: How is the data shared? (e.g., private S3 bucket)
- **Maintenance**: Who maintains it? How are updates handled?

For the NEPSE system, a datasheet for the raw OHLCV dataset would be valuable for anyone using the data for analysis.
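The sections listed above can be treated as required fields, which makes an incomplete datasheet easy to detect automatically. The following sketch assumes a hypothetical `Datasheet` dataclass with one free-text field per section; the example answers are placeholders, not a real datasheet for the NEPSE data.

```python
from dataclasses import dataclass, fields


@dataclass
class Datasheet:
    """One free-text answer per datasheet section; empty means not yet written."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    uses: str = ""
    distribution: str = ""
    maintenance: str = ""


def missing_sections(sheet: Datasheet) -> list[str]:
    """Return the names of sections that are still empty, in template order."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name).strip()]


draft = Datasheet(
    motivation="Train and evaluate next-day close price models.",
    composition="Daily OHLCV rows per NEPSE symbol.",
    collection_process="Pulled from the NEPSE daily CSV.",
)
missing = missing_sections(draft)
print(missing)  # ['preprocessing', 'uses', 'distribution', 'maintenance']
```

A check like this could run in CI and fail the build while any section is empty, which nudges the dataset owner to finish the datasheet before the data is shared.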

---

## **89.8 Operational Documentation (Runbooks)**

Runbooks are step‑by‑step guides for handling operational tasks and incidents. They are essential for on‑call engineers.

### **89.8.1 Common Runbook Sections**

- **Title**: e.g., “Data Ingestion Failure”
- **Symptoms**: What alerts or user reports might indicate this issue?
- **Severity**: P1, P2, etc.
- **Initial checks**: Quick things to verify (e.g., “Check if the CSV file is present in S3”)
- **Resolution steps**: Detailed instructions to fix the problem.
- **Escalation**: Who to contact if the issue persists.

### **89.8.2 Example Runbook**

```markdown
# Runbook: Data Ingestion Failure

**Symptoms**:
- Alert: "No new data for 24 hours"
- Prediction service returns 404 for recent dates
- Grafana dashboard shows flat line in data freshness metric

**Severity**: P2 (if >24h stale) → P1 (if >48h)

**Initial Checks**:
1. Log into the ingestion server: `ssh ingestion-server`
2. Check the ingestion logs: `journalctl -u nepse-ingestion -n 50`
3. Verify that the source CSV exists in the configured S3 bucket: `aws s3 ls s3://nepse-raw/ --recursive | grep $(date +%Y-%m-%d)`

**Resolution**:
- If source file missing, contact NEPSE data provider (see contacts in wiki).
- If source file present but not processed, restart the ingestion service: `sudo systemctl restart nepse-ingestion`
- If service fails to start, check for Python errors in logs and fix as needed.

**Escalation**: If unresolved after 1 hour, page the data engineering team (#data-eng on Slack).
```

Store runbooks in a version‑controlled repository (e.g., a `runbooks/` folder) and link them from your monitoring alerts.
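Since runbooks are plain Markdown in version control, their structure can be checked automatically. The sketch below looks for the bold section labels used in the example runbook (`**Symptoms**:`, `**Severity**:`, and so on) and reports any that are absent. The helper name `missing_runbook_sections` and the shortened sample runbook are assumptions made for illustration.

```python
import re

REQUIRED_SECTIONS = ["Symptoms", "Severity", "Initial Checks", "Resolution", "Escalation"]

# Matches a bold label at the start of a line followed by a colon, e.g. "**Severity**:".
SECTION_LABEL = re.compile(r"^\*\*(.+?)\*\*\s*:", re.MULTILINE)


def missing_runbook_sections(markdown_text: str) -> list[str]:
    """Return required section names that do not appear as bold labels."""
    found = {match.group(1).strip().lower() for match in SECTION_LABEL.finditer(markdown_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


# A deliberately incomplete runbook used only to exercise the check.
RUNBOOK = """\
# Runbook: Data Ingestion Failure

**Symptoms**:
- Alert: "No new data for 24 hours"

**Severity**: P2

**Resolution**:
- Restart the ingestion service.
"""

missing = missing_runbook_sections(RUNBOOK)
print(missing)  # ['Initial Checks', 'Escalation']
```

Running such a check over every file in the `runbooks/` folder during CI helps ensure that an on-call engineer never opens a runbook that lacks an escalation path.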

---

## **89.9 Knowledge Base / Wiki**

A team wiki (Confluence, Notion, GitHub Wiki) serves as the central repository for all non‑code documentation that doesn't fit elsewhere.

### **89.9.1 Suggested Wiki Structure**

- **Home**: Overview and links to key documents.
- **Architecture**: Diagrams, design decisions (ADRs), technology stack.
- **Development**: Setup guide, coding standards, pull request process.
- **Operations**: Deployment process, monitoring, runbooks.
- **Data**: Data dictionary, data sources, feature definitions.
- **Models**: Model cards, experiment results, retraining schedule.
- **Project Management**: Roadmap, meeting notes, OKRs.
- **Onboarding**: Step‑by‑step guide for new team members.

### **89.9.2 Keeping the Wiki Alive**

- Make it easy to update (anyone can edit).
- Encourage linking to the wiki from pull requests and Slack discussions.
- Regularly review and archive outdated pages.

---

## **89.10 Documentation Tools and Automation**

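### **89.10.1 Sphinx for Python Projects**

Sphinx is the de facto standard for Python documentation. With the `autodoc` extension it pulls docstrings directly from your modules and renders them as HTML or PDF; the `napoleon` extension lets it parse the Google-style docstrings used in this chapter.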

```bash
# In the docs/ directory: scaffold conf.py, index.rst, and a Makefile
sphinx-quickstart
# Then edit conf.py (a Python file, not a shell command) to include the extensions:
#   extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx.ext.viewcode']
# Generate HTML
make html
```

Host the generated documentation on Read the Docs or GitHub Pages.
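For reference, a minimal `conf.py` for this setup might look like the sketch below. The project name and the `../src` source layout are assumptions; adjust the path to wherever your package actually lives. The theme matches the `sphinx-rtd-theme` package, which must be installed separately.

```python
# docs/conf.py (minimal sketch; paths and project name are assumptions)
import os
import sys

# Make the package importable so autodoc can load its docstrings.
# Assumes the code lives in a `src/` folder next to `docs/` (hypothetical layout).
sys.path.insert(0, os.path.abspath("../src"))

project = "NEPSE Predictor"

extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # parse Google and NumPy style docstrings
    "sphinx.ext.viewcode",  # link documented objects to highlighted source
]

# Only Google-style docstrings are used in this project.
napoleon_google_docstring = True
napoleon_numpy_docstring = False

html_theme = "sphinx_rtd_theme"
```

With this in place, an `.. automodule::` directive in any `.rst` page renders the docstrings of the named module.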

### **89.10.2 MkDocs for User Documentation**

MkDocs is simpler to set up than Sphinx and well suited to user‑facing docs. You write pages in Markdown, and it generates a static site.

```yaml
# mkdocs.yml
site_name: NEPSE Predictor
nav:
  - Home: index.md
  - User Guide: user-guide.md
  - API Reference: api.md
  - Model Cards: models.md
theme: readthedocs
```

### **89.10.3 Jupyter Book**

If you have many Jupyter notebooks (e.g., analysis, tutorials), Jupyter Book can compile them into a book‑like structure.

### **89.10.4 Continuous Documentation**

Integrate documentation building into your CI pipeline. For example, build the docs on every push (optionally deploying them to a staging site) so that broken builds are caught early, and publish to production only when merging to main. The workflow below does this for the Sphinx site.

```yaml
# .github/workflows/docs.yml
name: Build Docs

on: [push]

jobs:
  docs:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # lets the deploy step push to the gh-pages branch
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install sphinx sphinx-rtd-theme
      - name: Build Sphinx docs
        run: |
          cd docs
          make html
      - name: Deploy to GitHub Pages
        if: github.ref == 'refs/heads/main'
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs/_build/html
```

---

## **89.11 Maintenance and Review**

Documentation, like code, needs maintenance. Set up a process:

- **Documentation sprints**: Occasionally dedicate a sprint to updating and improving docs.
- **Review documentation in PRs**: When code changes, require updates to relevant docs.
- **Stale content detection**: Use tools like `markdown-link-check` to find broken links, which are often the first sign of an outdated page.
- **Feedback mechanism**: Allow users to suggest improvements (e.g., “Was this page helpful?” buttons).
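To make the link-checking idea concrete, here is a simplified, standard-library sketch that scans Markdown files for relative links pointing at files that do not exist. It is not a replacement for `markdown-link-check`: it ignores external URLs and in-page anchors, and the function name `broken_relative_links` and the sample pages are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of an inline Markdown link: [text](target)
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(docs_dir: Path) -> list[tuple[str, str]]:
    """Return (page name, link target) pairs whose target file is missing."""
    broken = []
    for page in sorted(docs_dir.rglob("*.md")):
        for target in LINK_PATTERN.findall(page.read_text(encoding="utf-8")):
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue  # external links and in-page anchors are out of scope here
            path_part = target.split("#", 1)[0]
            if not (page.parent / path_part).exists():
                broken.append((page.name, target))
    return broken


# Demo against a throwaway docs folder with one valid and one broken link.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "index.md").write_text(
        "See the [user guide](user-guide.md) and the [runbooks](runbooks/ingestion.md).\n"
        "Project site: [NEPSE](https://example.com)\n",
        encoding="utf-8",
    )
    (docs / "user-guide.md").write_text("# User Guide\n", encoding="utf-8")
    broken = broken_relative_links(docs)

print(broken)  # [('index.md', 'runbooks/ingestion.md')]
```

Failing the CI build when this list is non-empty turns link rot from a silent problem into one that is fixed in the same pull request that caused it.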

---

## **Chapter Summary**

In this chapter, we explored the multifaceted world of documentation for a time‑series prediction system. We covered:

- **Code documentation** with docstrings and READMEs.
- **API documentation** using OpenAPI and tools like FastAPI.
- **Architecture documentation** with the C4 model and diagrams as code.
- **User documentation** including guides, tutorials, and FAQs.
- **Model documentation** with model cards.
- **Data documentation** with datasheets.
- **Operational documentation** (runbooks) for incident response.
- **Team wikis** as a central knowledge repository.
- **Automation tools** (Sphinx, MkDocs, Jupyter Book) and CI integration.
- **Maintenance** practices to keep documentation fresh.

Good documentation is an investment that pays off in faster onboarding, fewer incidents, and better collaboration. By treating documentation as a first‑class citizen, you ensure that the NEPSE prediction system remains understandable and maintainable for years to come.

In the next chapter, we will discuss **Quality Assurance**, focusing on how to test and validate both the software and the models in your system.

---

**End of Chapter 89**