# Reusable Notebook Master Template
**Author:** Cazzy Aporbo  
**Date:** 2025-09-17

A single-file, professional template that **teaches and enforces reusability**.  
It includes: a reusable utilities module, correctness checks, readability/style guidance, a compact data pipeline, and creative reuse patterns—wrapped in a **pastel ombré** visual theme.


In [None]:

%%html
<style>
:root{
  --fg: #243037;
  --muted: #667085;
  --lilac: #E9D8FD;
  --blush: #FDE2E4;
  --mint:  #D1FAE5;
  --sky:   #DBEAFE;
  --ink:   #2B283A;
  --accent: #7C3AED;
}
body, .jp-Notebook, .jp-NotebookPanel {
  color: var(--fg);
  font-family: ui-sans-serif, system-ui, -apple-system, Segoe UI, Roboto, Helvetica, Arial;
  line-height: 1.45;
}
.jp-Notebook h1, .jp-Notebook h2, .jp-Notebook h3{ letter-spacing: .2px; }
.jp-Notebook h1 {
  background: linear-gradient(90deg, var(--lilac), var(--blush), var(--mint), var(--sky));
  -webkit-background-clip: text;
  -webkit-text-fill-color: transparent;
}
.jp-Notebook h2 {
  padding-bottom: .3rem;
  border-image: linear-gradient(90deg, var(--lilac), var(--mint)) 1;
  border-bottom: 2px solid transparent;
}
.callout {
  border-left: 5px solid var(--accent);
  background: linear-gradient(90deg, rgba(233,216,253,.35), rgba(209,250,229,.35));
  padding: .9rem 1rem;
  margin: 1rem 0;
  border-radius: 8px;
}
.callout .title{ font-weight: 700; color: var(--ink); margin-bottom: .35rem; }
.badge {
  display: inline-block; padding: 0.25rem 0.6rem; border-radius: 9999px;
  background: var(--mint); color: #064E3B; font-size: .85rem; font-weight: 600;
  margin: 0 .35rem .35rem 0; border: 1px solid rgba(6,78,59,.15);
}
.jp-RenderedHTMLCommon table { border-collapse: collapse; margin: .75rem 0; }
.jp-RenderedHTMLCommon th, .jp-RenderedHTMLCommon td{ border: 1px solid #eee; padding: 8px 10px; }
.jp-RenderedHTMLCommon th{ background: var(--sky); text-align: left; }
.jp-Cell-inputWrapper, .jp-Cell-outputWrapper { border-radius: 6px; }
</style>



## What this template enforces

<div class="callout">
  <div class="title">The single best upgrade: make code reusable</div>
  Reusable code prevents duplication, shortens future projects, and elevates quality.
  This template demonstrates patterns that make reuse the default.
</div>

### Six principles
- <span class="badge">Modular</span> small, single-purpose functions; centralize shared logic
- <span class="badge">Correct</span> verify behavior with asserts and tests
- <span class="badge">Readable</span> names, docstrings, comments; concise cells
- <span class="badge">Stylish</span> one style (PEP 8); consistent imports and spacing
- <span class="badge">Versatile</span> anticipate variation in data; parameterize
- <span class="badge">Creative</span> only build new things that clearly improve on existing tools



## How to use this template

1. **Duplicate** this notebook per project.
2. Update **Project Metadata** at the top.
3. Extend the local **utils** module instead of pasting code across notebooks.
4. Add small **assertions** whenever you introduce a new function or assumption.
5. Keep cells short; keep narrative close to the code.
6. If logic generalizes, **extract** to a package and add lightweight tests.
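
Step 4 can be as small as the sketch below. `clip_to_range` is a hypothetical helper (it is not part of this template's `notebook_utils.py`), shown only to illustrate pairing a new function with an input guard and an immediate check in the same cell.

```python
from typing import List, Sequence


def clip_to_range(values: Sequence[float], low: float, high: float) -> List[float]:
    """Return a new list with every value limited to [low, high]."""
    # Guard the assumption the function relies on, with a message that names it.
    assert low <= high, f"low ({low}) must not exceed high ({high})."
    return [min(max(v, low), high) for v in values]


# Check the new function right where it is introduced.
clipped = clip_to_range([-5.0, 0.5, 12.0], low=0.0, high=1.0)
assert clipped == [0.0, 0.5, 1.0]
```

Keeping the check next to the definition means a later edit that breaks the function fails in the same cell, not several cells downstream.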


In [None]:

# --- Project Metadata ---------------------------------------------------------
from dataclasses import dataclass, field
from typing import List, Optional
from pathlib import Path

@dataclass
class ProjectMeta:
    project_name: str
    owner: str
    description: str
    data_sources: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    seed: int = 42
    notebook_dir: Optional[str] = None

META = ProjectMeta(
    project_name="Reusable Notebook Master Template",
    owner="Cazzy Aporbo",
    description=(
        "A teaching + production template for reusable, readable, and tested notebooks. "
        "It demonstrates a local utils module, lightweight tests, and a small pipeline."
    ),
    data_sources=["./data/raw/example.csv (optional)"],
    outputs=["./reports/figures/example_plot.png (generated)"],
    seed=42,
    notebook_dir=str(Path.cwd())
)

META


In [None]:

# --- Environment & Reproducibility -------------------------------------------
import os
import platform
import random
import sys

import numpy as np

def set_all_seeds(seed: int = 42) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # Add: torch.manual_seed(seed), etc., if used.

set_all_seeds(META.seed)

print("Python:", sys.version.split()[0], "| Platform:", platform.platform())
print("Working dir:", os.getcwd())


In [None]:

# --- Local Utilities Module (modular, typed, documented) ---------------------
# Re-run this cell after editing to refresh in-session imports.
UTILS_PATH = "notebook_utils.py"

utils_src = '''"""Reusable utilities for data notebooks.

Principles:
- Functions are pure when possible.
- Typed signatures and docstrings clarify intent.
- Keep the surface small; compose steps (pandas .pipe or function composition).
"""
from typing import Iterable, Tuple, List, Any, Callable
from collections import Counter
import logging
import math

logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
logger = logging.getLogger("notebook_utils")

def find_most_common(values: Iterable[Any]) -> Any:
    """Return the single most-common value in an iterable.

    Raises:
        AssertionError: if there's a tie for most common or the iterable is empty.
    """
    counts = Counter(values)
    top_two: List[Tuple[Any, int]] = counts.most_common(2)
    assert len(top_two) >= 1, "Empty iterable."
    if len(top_two) == 1:
        return top_two[0][0]
    assert top_two[0][1] != top_two[1][1], "There's a tie for most common value."
    return top_two[0][0]

def safe_mean(x: Iterable[float]) -> float:
    """Mean with basic validation and helpful errors."""
    x = list(x)
    assert len(x) > 0, "safe_mean() received an empty sequence."
    return sum(x) / len(x)

def zscore(seq: Iterable[float]) -> List[float]:
    """Standardize a sequence to mean 0, std 1 (population std). Returns a new list.

    A constant sequence has zero variance and maps to all zeros.
    """
    data = list(seq)
    mu = safe_mean(data)  # also rejects empty input
    var = sum((v - mu) ** 2 for v in data) / len(data)
    sd = math.sqrt(var) if var > 0 else 1.0  # avoid dividing by zero
    return [(v - mu) / sd for v in data]

def compose(*funcs: Callable) -> Callable:
    """Functional composition: compose(f, g, h)(x) -> f(g(h(x)))"""
    def _inner(x):
        out = x
        for fn in reversed(funcs):
            out = fn(out)
        return out
    return _inner
'''
with open(UTILS_PATH, "w", encoding="utf-8") as f:
    f.write(utils_src)

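import importlib
importlib.invalidate_caches()  # the file was just written; let the import system see it
utils = importlib.import_module("notebook_utils")
importlib.reload(utils)  # pick up edits when this cell is re-run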

# Smoke tests (Correctness)
assert utils.find_most_common([1,2,2,3]) == 2
try:
    utils.find_most_common([1,1,2,2])
except AssertionError as e:
    print("Expected tie caught:", e)
else:
    raise AssertionError("find_most_common() should reject a tie.")
assert round(utils.safe_mean([1,2,3]), 3) == 2.0
zs = utils.zscore([1,2,3])
assert abs(sum(zs)) < 1e-6

print("Utilities ready.")



## Readability and Style (PEP 8 in practice)

- Prefer descriptive names: `sales_data_jan` over `data2`.
- Keep cells short; one idea per cell.
- Docstrings: describe *what*; comments: explain *why this approach*.
- Use one style toolchain consistently (e.g., **ruff + black**).

**Example**
```python
def percent_change(old: float, new: float) -> float:
    """Return the fractional change from `old` to `new` (0.25 means +25%)."""
    assert old != 0, "old must be non-zero."
    return (new - old) / old
```


In [None]:

# --- Mini Pipeline Demo (pandas .pipe) ---------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(META.seed)
n = 240
df = pd.DataFrame({
    "group": rng.choice(list("ABC"), size=n, p=[0.45, 0.35, 0.20]),
    "x": rng.normal(0, 1, size=n),
})
df["y"] = 2.5*df["x"] + rng.normal(0, 0.75, size=n) + df["group"].map({"A":0,"B":0.5,"C":-0.5})

def drop_na(d: pd.DataFrame) -> pd.DataFrame:
    return d.dropna(axis=0).copy()

def add_zscores(d: pd.DataFrame) -> pd.DataFrame:
    d = d.copy()
    d["x_z"] = utils.zscore(d["x"].tolist())
    d["y_z"] = utils.zscore(d["y"].tolist())
    return d

def encode_group(d: pd.DataFrame) -> pd.DataFrame:
    # dtype=float keeps the indicator columns numeric (newer pandas defaults to bool).
    return pd.get_dummies(d, columns=["group"], drop_first=True, dtype=float)

clean = (
    df
    .pipe(drop_na)
    .pipe(add_zscores)
    .pipe(encode_group)
)

# Guardrails (Correctness)
expected = {"x","y","x_z","y_z","group_B","group_C"}
assert expected.issubset(set(clean.columns)), f"Missing columns: {expected - set(clean.columns)}"

# Simple scatter (no custom colors, portable)
plt.figure()
plt.scatter(clean["x"], clean["y"], alpha=0.6)
plt.title("Scatter: x vs y")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


In [None]:

# --- Quick Model + Sanity Checks ---------------------------------------------
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X = clean[["x","group_B","group_C"]].values
y = clean["y"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=META.seed)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
score = r2_score(y_test, pred)
print("R^2:", round(score, 3))

# Sanity tests
assert score > 0.5, "Model underperforming; check features or data generation."
assert len(pred) == len(y_test), "Prediction length mismatch."



## Versatility & Creative Reuse

**When to generalize**: As soon as you paste a block twice, promote it into `notebook_utils.py` and import it.  
**Parameterize**: Create a single configuration cell (paths, seeds, toggles) and read from there.  
**Data variation**: Validate assumptions—column names, types, ranges—*before* modeling.
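
As a minimal sketch of the last two points, assume the data can be viewed as a plain mapping of column names to value lists (for a pandas DataFrame, `df.to_dict("list")` gives that shape). `RunConfig` and `validate_columns` are hypothetical names, not part of `notebook_utils.py`, and the path and ranges are placeholders.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical single source of truth for paths, seeds, and toggles."""
    raw_path: Path = Path("data/raw/example.csv")  # placeholder path
    seed: int = 42
    make_plots: bool = True
    # column name -> (min, max) allowed range; illustrative values only
    numeric_ranges: Dict[str, Tuple[float, float]] = field(
        default_factory=lambda: {"x": (-10.0, 10.0), "y": (-50.0, 50.0)}
    )


def validate_columns(
    columns: Mapping[str, Sequence[float]],
    ranges: Mapping[str, Tuple[float, float]],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for name, (low, high) in ranges.items():
        if name not in columns:
            problems.append(f"missing column: {name}")
            continue
        values = columns[name]
        if not all(isinstance(v, (int, float)) for v in values):
            problems.append(f"non-numeric values in column: {name}")
            continue
        # NaN fails both comparisons, so it is counted as out of range.
        out_of_range = [v for v in values if not low <= v <= high]
        if out_of_range:
            problems.append(
                f"{len(out_of_range)} value(s) outside [{low}, {high}] in column: {name}"
            )
    return problems


CONFIG = RunConfig()
data = {"x": [0.1, -0.4, 2.0], "y": [1.0, 99.0, -3.0]}
problems = validate_columns(data, CONFIG.numeric_ranges)
print(problems)  # ['1 value(s) outside [-50.0, 50.0] in column: y']
```

Collecting every problem in a list, instead of asserting on the first one, lets the notebook report all violated assumptions in a single run before any modeling starts.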

**Packaging pattern**
1. Move utilities into `src/yourpkg/` with `__init__.py`.
2. Add minimal `pyproject.toml` (build + deps) and tests (`pytest`).
3. Document decisions (why a threshold or method) in Markdown cells.
4. Publish internally; reuse across notebooks or pipelines.
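
The tests in step 2 can start very small. The sketch below assumes a hypothetical file `tests/test_utils.py`. In a real package the function would be imported (for example `from yourpkg.utils import safe_mean`, where `yourpkg` is the placeholder name from step 1); it is defined inline here only so the example runs on its own. pytest collects functions named `test_*` and treats a failing `assert` as a test failure.

```python
from typing import Iterable


# Stand-in for `from yourpkg.utils import safe_mean` (hypothetical package name).
def safe_mean(x: Iterable[float]) -> float:
    x = list(x)
    assert len(x) > 0, "safe_mean() received an empty sequence."
    return sum(x) / len(x)


def test_safe_mean_returns_average() -> None:
    assert safe_mean([1, 2, 3]) == 2.0


def test_safe_mean_accepts_generators() -> None:
    assert safe_mean(v for v in (2.0, 4.0)) == 3.0


def test_safe_mean_rejects_empty_input() -> None:
    try:
        safe_mean([])
    except AssertionError:
        return  # expected: empty input is rejected
    raise AssertionError("safe_mean([]) should have raised.")
```

With pytest installed, the last test is usually written with `pytest.raises(AssertionError)`; the plain `try`/`except` form is used here to keep the sketch free of third-party imports.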



## Reusability Checklist

- [ ] Centralized utilities used (no copy-paste)
- [ ] Functions are small, typed, and documented
- [ ] Assertions guard assumptions and tie-cases
- [ ] Variable names are descriptive
- [ ] Style is consistent (PEP 8 or your chosen style)
- [ ] Pipeline uses `.pipe` or function composition
- [ ] Seeds and outputs are declared
- [ ] Decisions are recorded in Markdown
- [ ] General logic promoted to utils or a package
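
For the `.pipe` or function composition item, here is a minimal sketch of composing plain functions. `compose` is re-declared so the example is self-contained; it follows the same right-to-left contract as the version in `notebook_utils.py`. The three step functions are hypothetical.

```python
from typing import Callable, List


def compose(*funcs: Callable) -> Callable:
    """compose(f, g, h)(x) -> f(g(h(x))), applied right to left."""
    def _inner(x):
        out = x
        for fn in reversed(funcs):
            out = fn(out)
        return out
    return _inner


# Hypothetical single-purpose steps.
def drop_blanks(rows: List[str]) -> List[str]:
    return [r.strip() for r in rows if r.strip()]


def to_floats(rows: List[str]) -> List[float]:
    return [float(r) for r in rows]


def center(values: List[float]) -> List[float]:
    mu = sum(values) / len(values)
    return [v - mu for v in values]


# Reads right to left: drop blanks, then parse, then center.
clean_and_center = compose(center, to_floats, drop_blanks)
centered = clean_and_center([" 1.0", "", "3.0 ", "5.0"])
assert centered == [-2.0, 0.0, 2.0]
```

Each step stays small and testable on its own, and the composed function can be reused across notebooks like any other helper.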


In [None]:

# --- Appendix: Parity Demo for find_most_common ------------------------------
print("Most common in [1,2,2,3]:", utils.find_most_common([1,2,2,3]))
try:
    utils.find_most_common([1,1,2,2])
except AssertionError as e:
    print("Tie detected as expected:", e)
