docs: add "Understanding the Data Pipeline" page #140

Merged: kinggongzilla merged 2 commits into Emmi-AI:main from tcapelle:docs/understanding-data-pipeline on Apr 3, 2026
Conversation

@tcapelle
Contributor

Claude and I put together a doc on the data pipeline. I'm not sure exactly where it should live.

Covers data flow from disk to batch, point subsampling mechanics, epoch semantics with subsampling, two-level subsampling strategy, shuffling, interleaved sampling, and performance characteristics.

@github-actions
Contributor

github-actions Bot commented Mar 31, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@tcapelle
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@tcapelle tcapelle force-pushed the docs/understanding-data-pipeline branch from 7af4e38 to 1e1499e Compare April 1, 2026 09:18
Collaborator

@HennerM HennerM left a comment

Thanks @tcapelle for the contribution. This is really nice to have because there are so many moving parts (as you noticed) in the data loading pipeline. I left just one comment; otherwise LGTM.

Comment thread docs/source/noether/understanding_the_data_pipeline.rst Outdated
@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

Coverage

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 1204 | 21 💤 | 0 ❌ | 0 🔥 | 26.767s ⏱️ |

tcapelle and others added 2 commits April 3, 2026 10:44
Covers data flow from disk to batch, point subsampling mechanics,
epoch semantics with subsampling, two-level subsampling strategy,
shuffling, interleaved sampling, and performance characteristics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

A sample can vary in ways beyond geometry (e.g. different inflow
parameters), so 'sample' is the more accurate term.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@HennerM HennerM force-pushed the docs/understanding-data-pipeline branch from 1f93375 to 0452178 Compare April 3, 2026 08:44
@kinggongzilla kinggongzilla merged commit 35b7578 into Emmi-AI:main Apr 3, 2026
9 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 3, 2026
3 participants