Skip to content

feat(MachineLearning/PACLearning): add VersionSpace abstraction#2

Open
Zetetic-Dhruv wants to merge 1 commit intoSamuelSchlesinger:feat/pac-learning-defsfrom
Zetetic-Dhruv:version-space
Open

feat(MachineLearning/PACLearning): add VersionSpace abstraction#2
Zetetic-Dhruv wants to merge 1 commit intoSamuelSchlesinger:feat/pac-learning-defsfrom
Zetetic-Dhruv:version-space

Conversation

@Zetetic-Dhruv
Copy link
Copy Markdown

@Zetetic-Dhruv Zetetic-Dhruv commented Apr 18, 2026

Adds the classical version space abstraction (Mitchell 1982, Angluin 1980) as a companion to the PAC learning definitions in leanprover#492.

Contents:

  • VersionSpace C S: the subset of C whose concepts agree with S on every sample point
  • versionSpace_subset, versionSpace_empty_sample: sanity lemmas
  • versionSpace_antitone: more data gives a smaller version space
  • IsConsistent A C: predicate on learners whose output always lies in the version space
  • IsConsistent.output_mem_conceptClass, IsConsistent.output_consistent: derived properties
  • mem_versionSpace_of_realizable, versionSpace_nonempty_of_realizable: realizable-case bridge

This is required for PAC lower bound, infinite NFL and Shauer Shelah proofs. I feel version space and complexity go hand in hand. If you feel this is not fully relevant, I will add this separately.
But this is a short addon and might be helpful to satisfy Shreyas. Let me know

Adds the classical version space abstraction (Mitchell 1982, Angluin 1980)
to the PAC learning module. Version space — the subset of a concept class
consistent with observed labeled data — is the structural substrate every
sample complexity theorem operates on (Baby PAC, Sauer-Shelah, PAC lower
bounds, NFL). This PR complements leanprover#492 by providing:

- VersionSpace C S: the consistent subset of C given sample S
- versionSpace_subset, versionSpace_empty_sample: sanity lemmas
- versionSpace_antitone: structural monotonicity (more data → smaller VS),
  dual to sample complexity monotonicity
- IsConsistent A C: predicate on learners (output always in version space)
- IsConsistent.output_mem_conceptClass, output_consistent: consistent learner
  properties
- mem_versionSpace_of_realizable: under realizable data, the target concept
  lies in the version space (the realizable-case bridge)
- versionSpace_nonempty_of_realizable: corollary

No measure theory, no new Mathlib dependencies, ~150 lines, 0 sorry.

Together with leanprover#492 these suffice to state the Baby PAC theorem, Sauer-Shelah
sample complexity, PAC lower bounds, and NFL as structural statements rather
than ad-hoc computations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants