# Reusable Jupyter Notebook Template
**Author:** Cazandra Aporbo  
**Date:** 2025-09-17

This notebook is a teaching template designed to help you write **reusable**, **readable**, and **professional** notebooks. It includes a modular utilities file, lightweight tests, a small data pipeline, and guidance on styling, documentation, and packaging for reuse.


In [None]:

%%html
<style>
:root{
  --fg: #2b2b2b;
  --muted: #6b7280;
  --purple: #6B5B95;
  --pink: #F6E6F9;
  --mint: #B9F5D0;
  --lav: #E9D8FD;
}
body, .jp-Notebook, .jp-NotebookPanel{
  color: var(--fg);
  font-family: ui-sans-serif, system-ui, -apple-system, Segoe UI, Roboto, Helvetica, Arial, "Apple Color Emoji","Segoe UI Emoji";
}
.jp-Notebook h1, .jp-Notebook h2, .jp-Notebook h3{
  letter-spacing: 0.2px;
}
.jp-Notebook h1{
  background: linear-gradient(90deg, var(--purple), var(--lav), var(--mint));
  -webkit-background-clip: text;
  -webkit-text-fill-color: transparent;
}
.callout {
  border-left: 4px solid var(--purple);
  background: #faf9ff;
  padding: 0.75rem 1rem;
  margin: 1rem 0;
  border-radius: 6px;
}
.callout .title{
  font-weight: 700;
  color: var(--purple);
  margin-bottom: 0.25rem;
}
.badge {
  display: inline-block;
  padding: 0.25rem 0.5rem;
  border-radius: 9999px;
  background: var(--mint);
  font-size: 0.85rem;
  font-weight: 600;
  color: #0b5137;
  margin-right: 0.5rem;
}
.jp-RenderedHTMLCommon table {
  border-collapse: collapse;
}
.jp-RenderedHTMLCommon table, 
.jp-RenderedHTMLCommon th, 
.jp-RenderedHTMLCommon td{
  border: 1px solid #eee;
  padding: 6px 10px;
}
.jp-RenderedHTMLCommon th{
  background: var(--pink);
  text-align: left;
}
</style>



## How to Use This Template

This notebook is both **guide** and **backbone**. Treat it as a starting point for all new analyses.

1. **Duplicate** this file and rename it per project: `YYYYMMDD_proj-shortname.ipynb`.
2. **Fill Project Metadata** (next section) to declare purpose, inputs, outputs, and decisions.
3. **Use the `notebook_utils.py` module** for reusable functions; extend it rather than copying code.
4. **Write lightweight tests** (assertions) whenever you introduce a new function or assumption.
5. **Keep cells small** and **name variables descriptively**. Add short docstrings and rationale.
6. **Prefer pipe-based workflows**: build pure, modular steps that compose (`df.pipe(step)`).
7. **Version your outputs** and make runs reproducible (seed, environment, pinned deps when needed).
8. **If logic generalizes**, move it to the utils module or a tiny local package for reuse across notebooks.
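
Steps 1 and 7 above both depend on a predictable naming scheme. As a minimal sketch, assuming a hypothetical helper `versioned_output_path` (it is not part of `notebook_utils.py`) and a `YYYYMMDD_name_seedN` convention chosen here purely for illustration:

```python
from datetime import date
from pathlib import Path
from typing import Optional


def versioned_output_path(
    base_dir: str,
    stem: str,
    suffix: str = ".png",
    seed: int = 42,
    run_date: Optional[date] = None,
) -> Path:
    """Build an output path that records the run date and the seed used."""
    run_date = run_date or date.today()
    filename = f"{run_date:%Y%m%d}_{stem}_seed{seed}{suffix}"
    return Path(base_dir) / filename


# Passing the date explicitly keeps the example deterministic.
path = versioned_output_path(
    "reports/figures", "example_plot", seed=42, run_date=date(2025, 9, 17)
)
print(path.as_posix())  # reports/figures/20250917_example_plot_seed42.png
```

Encoding the seed in the file name is one option among several; a run ID or a git commit hash would serve the same purpose.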

<div class="callout">
  <div class="title">Why “reusable” matters</div>
  Reusable code reduces duplication and clarifies intent. A bug fixed in one shared function is fixed everywhere that function is used, which makes you faster and the work more robust.
</div>

### Six Principles of Professional, Reusable Notebooks
- <span class="badge">Modular</span> small, single-purpose functions; centralize shared logic
- <span class="badge">Correct</span> verify behavior with asserts/tests; validate assumptions
- <span class="badge">Readable</span> clear names, docstrings, comments; concise cells
- <span class="badge">Stylish</span> stick to one style (PEP 8); consistent imports/spacing
- <span class="badge">Versatile</span> anticipate variation; parameterize; pure transforms
- <span class="badge">Creative</span> only build new things when they improve on existing solutions


In [None]:

# --- Project Metadata (declare intent up-front) --------------------------------
from dataclasses import dataclass, field
from typing import List, Optional
from pathlib import Path

@dataclass
class ProjectMeta:
    project_name: str
    owner: str
    description: str
    data_sources: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    seed: int = 42
    notebook_path: Optional[str] = None

META = ProjectMeta(
    project_name="Example Analysis – Reusable Template",
    owner="Cazandra Aporbo",
    description="Demonstrates reusable, modular, and testable patterns in a Jupyter workflow.",
    data_sources=["./data/raw/example.csv (optional)"],
    outputs=["./reports/figures/example_plot.png (generated)"],
    seed=42,
    notebook_path=str(Path.cwd()),  # directory the kernel runs in, not the .ipynb file itself
)
META
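
Declared metadata is most useful when it travels with the outputs. The sketch below shows one possible way to serialize it to JSON with `dataclasses.asdict`. The class is repeated in compact form only so the block runs on its own, and the `meta` values are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProjectMeta:
    """Compact copy of the metadata class above, repeated so this runs alone."""
    project_name: str
    owner: str
    description: str
    data_sources: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    seed: int = 42
    notebook_path: Optional[str] = None


meta = ProjectMeta(
    project_name="Example Analysis",
    owner="Cazandra Aporbo",
    description="Metadata round-trip demo.",
    outputs=["./reports/figures/example_plot.png"],
)

# asdict() converts the dataclass (including its lists) to plain Python types.
meta_json = json.dumps(asdict(meta), indent=2, sort_keys=True)

# Round-trip check: loading the JSON rebuilds an equal object.
restored = ProjectMeta(**json.loads(meta_json))
assert restored == meta
print(meta_json)
```

Writing `meta_json` to a file next to the generated figures is left out here so the example has no I/O side effects.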


In [None]:

# --- Environment & Reproducibility -------------------------------------------
# One import per line (PEP 8); the unused `math` import is dropped.
import os
import platform
import random
import sys
import numpy as np

def set_all_seeds(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    # Extend here for torch, tf, jax, etc.

set_all_seeds(META.seed)

print("Python:", sys.version.split()[0], "| Platform:", platform.platform())
print("CWD:", os.getcwd())
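
The "extend here" comment in `set_all_seeds` can be filled in several ways. The sketch below is one assumption-laden option: the name `set_all_seeds_extended` is hypothetical, and it only seeds libraries that the session has already imported, so it should be called after your imports. JAX is deliberately left out because it uses explicit PRNG keys rather than a global seed.

```python
import random
import sys
from typing import List


def set_all_seeds_extended(seed: int = 42) -> List[str]:
    """Seed the stdlib RNG plus any supported library already imported.

    Returns the names of the RNGs that were seeded, for logging.
    """
    random.seed(seed)
    seeded = ["random"]

    # Look in sys.modules so that heavy frameworks are never imported
    # just to be seeded. Assumption: this runs after your imports.
    np = sys.modules.get("numpy")
    if np is not None:
        np.random.seed(seed)
        seeded.append("numpy")

    torch = sys.modules.get("torch")
    if torch is not None:
        torch.manual_seed(seed)
        seeded.append("torch")

    tf = sys.modules.get("tensorflow")
    if tf is not None:
        tf.random.set_seed(seed)
        seeded.append("tensorflow")

    return seeded


# Re-seeding with the same value should reproduce the same draw.
set_all_seeds_extended(7)
first_draw = random.random()
set_all_seeds_extended(7)
second_draw = random.random()
assert first_draw == second_draw
```

Returning the list of seeded libraries makes it easy to print alongside the Python and platform versions.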


In [None]:

# --- Local Utilities Module ---------------------------------------------------
# We write a tiny module next to the notebook for reuse across projects.
# Re-run this cell after editing the source below to refresh the module in the current session.

UTILS_PATH = "notebook_utils.py"

# The outer delimiter is ''' so the """ docstrings inside the module source stay intact.
utils_src = '''
"""
Reusable utilities for data notebooks.

Principles:
- Functions are pure: no hidden state, no I/O side effects unless explicitly documented.
- Typed signatures and docstrings clarify intent.
- Small surface area: compose tiny steps with pandas .pipe or function chaining.
"""
from typing import Iterable, Tuple, List, Any, Dict, Callable
from collections import Counter
import logging
import math

logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
logger = logging.getLogger("notebook_utils")

def find_most_common(values: Iterable[Any]) -> Any:
    """
    Return the single most-common value in an iterable.
    Raises:
        AssertionError: if there's a tie for most common.
    """
    counts = Counter(values)
    top_two: List[Tuple[Any,int]] = counts.most_common(2)
    assert len(top_two) >= 1, "Empty iterable."
    if len(top_two) == 1:
        return top_two[0][0]
    assert top_two[0][1] != top_two[1][1], "There's a tie for most common value."
    return top_two[0][0]

def safe_mean(x: Iterable[float]) -> float:
    """Mean with basic validation and helpful errors."""
    x = list(x)
    assert len(x) > 0, "safe_mean() received an empty sequence."
    return sum(x) / len(x)

def zscore(seq: Iterable[float]) -> List[float]:
    """Standardize a sequence to mean 0, std 1. Returns a new list."""
    data = list(seq)
    mu = safe_mean(data)
    var = sum((v - mu)**2 for v in data) / len(data) if len(data) > 0 else 0.0
    sd = math.sqrt(var) if var > 0 else 1.0
    return [(v - mu)/sd for v in data]

def compose(*funcs: Callable) -> Callable:
    """Functional composition: compose(f, g, h)(x) -> f(g(h(x)))"""
    def _inner(x):
        out = x
        for fn in reversed(funcs):
            out = fn(out)
        return out
    return _inner
'''
with open(UTILS_PATH, "w", encoding="utf-8") as f:
    f.write(utils_src)

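import importlib
importlib.invalidate_caches()  # the file was just written; make sure the import system sees it
utils = importlib.import_module("notebook_utils")
importlib.reload(utils)  # picks up edits when this cell is re-run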

# Quick smoke tests (Correctness via assertions)
assert utils.find_most_common([1,2,2,3]) == 2
try:
    utils.find_most_common([1,1,2,2])
except AssertionError as e:
    print("Expected tie caught:", e)
assert round(utils.safe_mean([1,2,3]), 3) == 2.0
zs = utils.zscore([1,2,3])
assert abs(sum(zs)) < 1e-6

print("Utilities ready.")
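
The smoke tests above do not exercise `compose`, whose right-to-left order is easy to get backwards. The sketch below repeats the definition so it runs on its own; the three string helpers are hypothetical and exist only for the demonstration.

```python
from typing import Callable


def compose(*funcs: Callable) -> Callable:
    """Same definition as in notebook_utils.py: compose(f, g)(x) == f(g(x))."""
    def _inner(x):
        out = x
        for fn in reversed(funcs):
            out = fn(out)
        return out
    return _inner


def strip_text(s: str) -> str:
    return s.strip()


def to_upper(s: str) -> str:
    return s.upper()


def add_bang(s: str) -> str:
    return s + "!"


# The rightmost function runs first: strip, then upper-case, then append "!".
shout = compose(add_bang, to_upper, strip_text)
print(shout("  hello "))  # HELLO!

# Here add_bang runs first, so the space before "!" survives the strip.
other = compose(strip_text, add_bang)
print(repr(other("  hi ")))  # 'hi !'
```

If left-to-right reading order feels more natural for a pipeline, `df.pipe(step)` chaining expresses the same idea in execution order.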



## Readability & Style

- Prefer descriptive names: `sales_data_jan` over `data2`.
- Keep cells short; one idea per cell.
- Docstrings state *what* the function does; comments state *why this implementation choice*.
- Follow **PEP 8** (indentation, spaces around operators, max line length).

Example of a good docstring and type hints:
```python
def percent_change(old: float, new: float) -> float:
    """Return the fractional change from `old` to `new` (0.10 means +10%)."""
    assert old != 0, "old must be non-zero."
    return (new - old) / old
```
If you need strict style checks, consider adding a pre-commit hook with a linter/formatter (e.g., ruff + black).


In [None]:

# --- Mini Pipeline Demonstration ---------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(META.seed)
n = 200
df = pd.DataFrame({
    "group": rng.choice(list("ABC"), size=n, p=[0.4, 0.4, 0.2]),
    "x": rng.normal(0, 1, size=n),
})
df["y"] = 2.5*df["x"] + rng.normal(0, 0.75, size=n) + df["group"].map({"A":0,"B":0.5,"C":-0.5})

def drop_na(d: pd.DataFrame) -> pd.DataFrame:
    return d.dropna(axis=0).copy()

def add_zscores(d: pd.DataFrame) -> pd.DataFrame:
    d = d.copy()
    d["x_z"] = utils.zscore(d["x"].tolist())
    d["y_z"] = utils.zscore(d["y"].tolist())
    return d

def encode_group(d: pd.DataFrame) -> pd.DataFrame:
    # dtype=float keeps the indicator columns numeric rather than boolean.
    return pd.get_dummies(d, columns=["group"], drop_first=True, dtype=float)

clean = (
    df
    .pipe(drop_na)
    .pipe(add_zscores)
    .pipe(encode_group)
)

# Lightweight test: every expected column should be present after the pipeline
expected_cols = {"x","y","x_z","y_z","group_B","group_C"}
assert expected_cols.issubset(set(clean.columns)), f"Missing columns: {expected_cols - set(clean.columns)}"

# Simple plot (kept basic for portability)
plt.figure()
plt.scatter(clean["x"], clean["y"], alpha=0.6)
plt.title("Scatter: x vs y")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
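
Two properties of the `zscore` step are worth pinning down with assertions: it divides by the population standard deviation (not the sample one), and a constant column falls back to a divisor of 1.0. The sketch below mirrors the helper using only the standard library; the input values are made up for the check.

```python
import math
import statistics
from typing import Iterable, List


def zscore(seq: Iterable[float]) -> List[float]:
    """Population z-score, mirroring notebook_utils.zscore (empty check omitted)."""
    data = list(seq)
    mu = sum(data) / len(data)
    var = sum((v - mu) ** 2 for v in data) / len(data)
    sd = math.sqrt(var) if var > 0 else 1.0
    return [(v - mu) / sd for v in data]


zs = zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# The mean is 5 and the population variance is 4, so the divisor is 2.
assert zs[0] == -1.5 and zs[-1] == 2.0
assert math.isclose(statistics.pstdev(zs), 1.0)     # population std is 1
assert not math.isclose(statistics.stdev(zs), 1.0)  # sample std is not

# A constant column has zero variance; the fallback divisor yields zeros.
assert zscore([3.0, 3.0, 3.0]) == [0.0, 0.0, 0.0]
```

This matters when comparing against pandas, whose `Series.std()` defaults to the sample standard deviation (`ddof=1`).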


In [None]:

# --- Simple Model + Sanity Checks --------------------------------------------
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X = clean[["x","group_B","group_C"]].values
y = clean["y"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=META.seed)

model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)

score = r2_score(y_test, pred)
print("R^2:", round(score, 3))

# Sanity tests (Correct)
assert score > 0.5, "Model underperforming; check features or data generation."
assert len(pred) == len(y_test), "Prediction length mismatch."
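
The `score > 0.5` threshold is easier to judge with the reference points of R² in mind. The sketch below computes it by hand in plain Python (the function name `r2` and the toy data are illustrative, and this is not scikit-learn's implementation) to show that predicting the mean scores 0 and that a poor model can score below 0.

```python
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    assert len(y_true) == len(y_pred), "Length mismatch."
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    assert ss_tot > 0, "This sketch treats R^2 as undefined for a constant target."
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

# Predicting the mean everywhere is the baseline and scores exactly 0.
baseline = [2.5] * len(y_true)
assert r2(y_true, baseline) == 0.0

# Perfect predictions score 1; a worse-than-baseline model goes negative.
assert r2(y_true, y_true) == 1.0
assert r2(y_true, [4.0, 3.0, 2.0, 1.0]) < 0.0
```

A threshold of 0.5 therefore asks the model to explain at least half of the variance that the mean-only baseline leaves unexplained.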



## Creative Reuse Patterns

When a notebook proves useful beyond one analysis, extract the logic:

1. **Move functions** from this notebook into `notebook_utils.py` (or a tiny `src/` package).
2. **Write minimal tests** (asserts) next to each function. If logic grows, promote to `pytest`.
3. **Parameterize**: add a single configuration cell (paths, seeds, toggles). Keep it declarative.
4. **Document decisions**: capture *why* you chose methods or thresholds. Future you will thank present you.
5. **Publish patterns**: internal docs, a template repo, or snippets library.
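
The single declarative configuration cell from step 3 could look like the sketch below. `RunConfig` and all of its field names are hypothetical, chosen to echo the paths and seed used earlier in this template.

```python
from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical configuration object; field names are illustrative."""
    raw_data: Path = Path("data/raw/example.csv")
    figure_dir: Path = Path("reports/figures")
    seed: int = 42
    test_size: float = 0.25
    save_figures: bool = False


CONFIG = RunConfig()

# frozen=True makes accidental mutation fail loudly instead of silently.
try:
    CONFIG.seed = 7
except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
    print("Config is read-only:", exc)

# To vary a run, derive a new config rather than editing the old one.
quick_run = replace(CONFIG, test_size=0.5, save_figures=True)
assert CONFIG.test_size == 0.25 and quick_run.test_size == 0.5
```

Keeping the object frozen means every downstream cell sees the same values, which supports the reproducibility goal stated at the top of the notebook.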

> Only build a new tool if it clearly improves on existing options; otherwise compose existing libraries.



## Reusability Checklist

- [ ] Centralized utilities used (no copy-paste)
- [ ] Functions are small, typed, and documented
- [ ] Assertions guard assumptions and tie-cases
- [ ] Variable names are descriptive
- [ ] Style is consistent (PEP 8)
- [ ] Pipeline uses `.pipe` or composition
- [ ] Outputs and seeds are declared
- [ ] Decisions are recorded in prose
- [ ] Any general logic promoted to utils/module


In [None]:

# --- Appendix: find_most_common tie-handling demo -----------------------------
print("Most common in [1,2,2,3]:", utils.find_most_common([1,2,2,3]))

try:
    utils.find_most_common([1,1,2,2])
except AssertionError as e:
    print("Tie detected as expected:", e)
