Skip to content

cycling-data-lab/structural-bounds-framework

Repository files navigation

structural-bounds-framework

License: MIT Python 3.12+ Status: working draft v0.5 Target: JMLR / FoCM DOI

Manuscript: A universal spectral lower bound on the leave-node-out generalisation error of graph-supervised learning. Rohan Fossé and Gaël Pallares, CESI LINEACT, 2026. In preparation for submission to Journal of Machine Learning Research (JMLR) or Foundations of Computational Mathematics (FoCM).

This repository develops the foundational theoretical paper of the cycling-data-lab structural-bounds research program: a regressor-free, universal lower bound on the expected leave-node-out generalisation error of any graph-supervised learner, together with a matching Berry–Esseen minimax tightness statement. The framework contains the materials-applicability-bound (MLST, in submission) and mobility-applicability-bound (TR-B, in preparation) results as one-paragraph corollaries.

Headline result

Theorem Statement Status
Universal lower bound (Thm 1) E[L_{T^c}(f̂)] ≥ (1 - R²_spec(H_d, y)) · Var(y) — exact in expectation, no concentration slack proved (notes/01)
Universal upper bound (Thm 2, sharp transductive form) ERM saturates the lower bound to 2·R^trans_n(H_d) + 5.05·M²·√((n+u)·log(2/η)/(n·u)); rho factor eliminated via El-Yaniv-Pechyony 2006 (notes/04) proved (notes/02, 04)
Berry–Esseen minimax tightness (Thm 3) No estimator can drive the slack below Ω(M²/√(N-n)) in the worst case, even with oracle access to y on V proved (notes/02)
Cramér–Rao analog (Cor 1) The pair of bounds is the graph-supervised-learning analog of Var(θ̂) ≥ 1/I(θ); rates and constants match to a factor ≤ 10 direct corollary
Negative transfer (Thm 6, Cor 2 = C2) Under spectral disjointness, E[NT] ≥ (‖P_{H_d}y_src‖² + ‖P_{H_d}y_tgt‖²)/n_t; transfer is unavoidably harmful proved (paper §C2)
Active learning (Thm 7, Cor C3) Importance-weighted leverage-score sampling achieves rate O(M²·√(d·N·log(1/η))/n) — parametric 1/n instead of passive 1/√n proved (paper §C3)

The bound is learner-specific: R²_spec(H_d, y) := 1 - L_V(f*_{H_d}) / Var(y) where f*_{H_d} := argmin_{f ∈ H_d} L_V(f) is the population-risk minimiser of the learner's hypothesis class. For closed linear H_d, this reduces to the projection- of y on H_d in the eigenbasis of the graph Laplacian.

What's in here

structural-bounds-framework/
├── paper.tex                              # Main manuscript (working draft v0.5, 21 pages, C1+C2+C3 full, 2 figures, full empirical section, 4 P-polish edits from notes/06 applied)
├── paper.pdf                              # Compiled manuscript (regenerable via pdflatex)
├── paper_si.tex                           # Supplementary Information (5 appendices, 5 pages compiled)
├── paper_si.pdf                           # Compiled SI
├── paper_si.tex                           # Supplementary Information (stub)
├── notes/                                 # Research notes — formal proofs
│   ├── 01_universal_bound_proof.tex         # Theorem 1 (universal LB)
│   ├── 02_minimax_saturation.tex            # Theorem 2 (UB) + Theorem 3 (BE)
│   ├── 03_non_realisable_resolution.tex     # Closure: non-realisable regime is vacuous
│   ├── 04_sharp_transductive_constant.tex   # Sharp transductive constant via El-Yaniv-Pechyony
│   ├── 05_consolidated_theorem_manifesto.tex # One-page reference: all 7 theorems + status
│   ├── 06_adversarial_self_review.tex       # Gemini-Pro-2.5-style pre-empt of reviewer objections
│   └── README.md                            # Conventions and reading order
├── references/references.bib              # BibTeX
├── experiments/                           # d01 ... dNN reproducible scripts (stubs)
│   ├── _plot_style.py                       # Paul Tol palette plot helper
│   ├── d01_pilot.py                         # Smoke-test pilot (placeholder)
│   └── README.md                            # Script-numbering convention
├── figures/                               # Publication figures (PDF)
│   ├── fig1_spectral_projection.pdf         # Headline: R²_spec vs ΔR²_LSO scatter (8 MatBench tasks)
│   └── fig2_slack_comparison.pdf            # Slack-rate comparison: rho/sqrt(n) vs transductive vs Berry-Esseen
├── outputs/                               # Per-experiment JSON / CSV / NPZ
├── cover_letter.md                        # Cover letter draft
├── .zenodo.json                           # Zenodo deposit metadata
├── CITATION.cff                           # Citation File Format
└── README.md                              # This file

The notes/ directory contains the canonical formal proofs that feed into paper.tex. Reading order: notes/01 → notes/02 (errata to notes/01) → notes/03 (closure, withdraws notes/02 Conjecture 1). The manuscript paper.tex absorbs the post-closure statements.

Reproducing the manuscript

# Build the manuscript (run from repo root)
pdflatex paper.tex
bibtex   paper
pdflatex paper.tex
pdflatex paper.tex

# Build the Supplementary Information
pdflatex paper_si.tex
bibtex   paper_si
pdflatex paper_si.tex
pdflatex paper_si.tex

# Build the research notes (notes/01–03 standalone, notes/04 needs bibtex)
pdflatex notes/01_universal_bound_proof.tex
pdflatex notes/02_minimax_saturation.tex
pdflatex notes/03_non_realisable_resolution.tex
( cd notes && pdflatex 04_sharp_transductive_constant.tex \
            && bibtex 04_sharp_transductive_constant \
            && pdflatex 04_sharp_transductive_constant.tex \
            && pdflatex 04_sharp_transductive_constant.tex )

# Regenerate figures (uses matplotlib + Paul Tol palette in _plot_style.py)
python3.12 experiments/d01_spectral_projection_figure.py
python3.12 experiments/d02_slack_comparison_figure.py

The paper uses standard article class (not iopjournal.cls, since the target is JMLR / FoCM, not an IOP venue). A class swap at submission time is mechanical.

Sibling repos in the cycling-data-lab structural-bounds program

The organisation README presents the full program structure with mermaid diagram of corollary dependencies.

How to cite

A machine-readable citation is provided in CITATION.cff (rendered as a "Cite this repository" button on GitHub). Plain BibTeX:

@unpublished{FossePallares2026spectralBound,
  author = {Foss\'e, Rohan and Pallares, Ga\"el},
  title  = {A universal spectral lower bound on the leave-node-out
            generalisation error of graph-supervised learning},
  note   = {Manuscript in preparation, CESI LINEACT, 2026.
            \url{https://github.com/cycling-data-lab/structural-bounds-framework}},
  year   = {2026}
}

License

MIT.

Contact

Rohan Fossé — rfosse@cesi.frORCID Gaël Pallares — ORCID

About

Paper (in preparation): the universal spectral lower bound on graph-supervised learning. Unifies materials-applicability-bound (Theorem 1), mobility-applicability-bound, planned negative-transfer and active-learning corollaries under a single regressor-free Cramér-Rao-style framework. Target: JMLR / FoCM.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors