[ENH] Add tutorial hub checkpoints and simplify examples #985
Conversation
- Add `braindecode/_tutorial_hub.py` for loading/saving tutorial checkpoints from the Hugging Face Hub
- Add `scripts/train_and_push_tutorial_checkpoints.py` for offline training with wandb logging and loss-curve generation
- Simplify all 9 tutorial examples: always train with a small `n_epochs`, then load the pretrained checkpoint from the HF Hub to show longer training curves and final metrics
- Replace `braindecode.visualization.plot_confusion_matrix` with `sklearn.metrics.ConfusionMatrixDisplay.from_predictions` in all tutorials
- Eldele tutorial: 6 subjects, full hyperparameter setup matching offline training (74.6% balanced accuracy), wandb dashboard embed
- All checkpoints published to huggingface.co/braindecode/
Pull request overview
This PR introduces a small Hugging Face Hub helper module and an offline training/push script to publish tutorial checkpoints, then updates multiple gallery tutorials to run quickly in docs builds while still demonstrating full training curves and final metrics via downloaded pretrained artifacts.
Changes:
- Add `braindecode/_tutorial_hub.py` utilities for saving/uploading tutorial artifacts and (optionally) downloading them from the Hugging Face Hub.
- Add `scripts/train_and_push_tutorial_checkpoints.py` to train tutorial reference runs and push checkpoints/search artifacts to the Hub (with optional W&B logging).
- Update multiple tutorials to (a) train for a few epochs locally, (b) load pretrained params/history from the Hub, and (c) use `sklearn.metrics.ConfusionMatrixDisplay.from_predictions` instead of `braindecode.visualization.plot_confusion_matrix`.
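Since the published history file is a skorch-style `history.json` (assumed here to be a JSON list of per-epoch dicts, which matches skorch's serialization format), the longer training curve can be reconstructed without re-training. A minimal sketch with illustrative values:

```python
import json

# Illustrative history content; in the tutorials the real file is
# downloaded from the Hub via hf_hub_download.
history_json = (
    '[{"epoch": 1, "train_loss": 1.21, "valid_loss": 1.10},'
    ' {"epoch": 2, "train_loss": 0.93, "valid_loss": 1.02},'
    ' {"epoch": 3, "train_loss": 0.71, "valid_loss": 0.95}]'
)
history = json.loads(history_json)

# Extract the per-epoch losses to plot the full training curve.
train_loss = [row["train_loss"] for row in history]
valid_loss = [row["valid_loss"] for row in history]
print(train_loss)  # -> [1.21, 0.93, 0.71]
```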
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `braindecode/_tutorial_hub.py` | New internal helper for downloading/saving/uploading tutorial artifacts. |
| `scripts/train_and_push_tutorial_checkpoints.py` | New offline trainer/publisher for tutorial checkpoints and augmentation-search artifacts. |
| `braindecode/preprocessing/eegprep_preprocess.py` | Preserve/restore event durations across EEGLAB round-trips (notably around resampling). |
| `test/unit_tests/preprocessing/test_eegprep_preprocessor.py` | Add regression test ensuring EEGPrep resampling preserves MNE annotation durations. |
| `examples/model_building/plot_bcic_iv_2a_moabb_trial.py` | Short local training + load pretrained checkpoint/history from Hub; confusion matrix via sklearn. |
| `examples/model_building/plot_bcic_iv_2a_moabb_cropped.py` | Same tutorial flow updates + sklearn confusion matrix. |
| `examples/model_building/plot_bcic_iv_2a_eegprep_cleaning.py` | Switch to EEGNeX, short local training + Hub checkpoint/history, sklearn confusion matrix. |
| `examples/applied_examples/plot_sleep_staging_usleep.py` | Add early stopping, MPS support, Hub checkpoint/history loading, sklearn confusion matrix. |
| `examples/applied_examples/plot_sleep_staging_eldele2021.py` | Expand to 6 subjects, add more training callbacks, embed W&B iframe, Hub checkpoint/history loading, sklearn confusion matrix. |
| `examples/applied_examples/plot_sleep_staging_chambon2018.py` | Add early stopping + Hub checkpoint/history loading + sklearn confusion matrix. |
| `examples/applied_examples/bcic_iv_4_ecog_trial.py` | Add early stopping, Hub checkpoint/history loading, updated narrative around longer training. |
| `examples/advanced_training/bcic_iv_4_ecog_cropped.py` | Add early stopping + Hub checkpoint/history loading, updated narrative around longer training. |
| `examples/advanced_training/plot_data_augmentation_search.py` | Load published search results/metadata from Hub when available; otherwise run short local search; new plotting. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a283ae0999
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 14 comments.
Fixes from Copilot/Codex automated review on PR #985:
- `scripts/train_and_push_tutorial_checkpoints.py`:
  - Widen `TutorialArtifacts.clf` type to `EEGClassifier | EEGRegressor`
  - Set `use_safetensors=True` for USleep (tutorial downloads `.safetensors`)
  - Fix `_sleep_attnsleep` subject split to match tutorial (subjects 0-5, valid [4, 5])
  - Fix misindented closing paren in `_cropped_shallow` call
- Sleep tutorials (eldele, chambon): replace broken `y_true` comprehension with the `valid_sampler` iteration pattern used in USleep
- `_tutorial_hub.py`: make `metadata.json` truly optional in `load_tutorial_checkpoint_metadata` by splitting the required and best-effort downloads
- Tutorials: route Hub downloads through `load_tutorial_checkpoint_metadata` so the example falls back to the locally trained model when `huggingface_hub` is missing or unreachable
- `ConfusionMatrixDisplay`: pass explicit `labels=` alongside `display_labels` in all 6 tutorials to avoid shape mismatches when classes are absent
- `docs/whats_new.rst`: HuggingFace -> Hugging Face
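The `labels=` fix can be illustrated with plain `sklearn.metrics.confusion_matrix`, which accepts the same parameter that `ConfusionMatrixDisplay.from_predictions` forwards (class values below are illustrative): without explicit `labels`, a class absent from a validation fold shrinks the matrix, so it no longer lines up with `display_labels`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Validation fold in which class 2 never occurs.
y_true = np.array([0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1])

# Without labels=, only the classes present are included: shape (2, 2).
cm_implicit = confusion_matrix(y_true, y_pred)
# With explicit labels=, the absent class keeps its row/column: shape (3, 3).
cm_explicit = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm_implicit.shape, cm_explicit.shape)  # -> (2, 2) (3, 3)
```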
…b-checkpoints
# Conflicts:
#	docs/whats_new.rst
@bruAristimunha Related to this: I just fixed some missing channel embeddings from the sJEPA checkpoint #991
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Remove the private `braindecode/_tutorial_hub.py` helper and the reproducibility script `scripts/train_and_push_tutorial_checkpoints.py`. The helper was a premature abstraction for 8 nearly identical callers and added a public-looking `braindecode/_tutorial_*` module that tutorials do not need. Each tutorial now inlines a small try/except around `hf_hub_download` and `clf.load_params(...)` / `regressor.load_params(...)`. If `huggingface_hub` is missing or the download fails, the tutorial warns and continues with the locally trained short-run model. The offline training script is now hosted as a public gist: https://gist.github.com/bruAristimunha/27d74c8410fe9d0db258a03f42efa7c6. This also inlines the search-artifact download in `plot_data_augmentation_search.py`.
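A minimal sketch of the inlined pattern described above (the function name, repo id, and filenames are illustrative, not the exact tutorial code):

```python
import warnings


def load_pretrained_or_fallback(clf, repo_id):
    """Try to load a published checkpoint from the Hub; on any failure,
    keep the locally trained short-run model untouched."""
    try:
        from huggingface_hub import hf_hub_download  # optional dependency

        params_path = hf_hub_download(repo_id, "params.safetensors")
        history_path = hf_hub_download(repo_id, "history.json")
        clf.load_params(f_params=params_path, f_history=history_path)
        return True
    except Exception as exc:
        warnings.warn(
            f"Could not load pretrained checkpoint from {repo_id!r}: {exc}. "
            "Continuing with the locally trained model."
        )
        return False
```

Catching broad `Exception` is deliberate here: import errors, network failures, and missing files should all fall through to the same warn-and-continue path so the docs build never hard-fails.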
…/braindecode/braindecode into feature/tutorial-hub-checkpoints
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.
```python
clf.initialize()
clf.load_params(
    f_params=hf_hub_download(repo_id, "params.safetensors"),
    f_history=hf_hub_download(repo_id, "history.json"),
```
Same issue as above: calling `clf.initialize()` before attempting the HF Hub downloads means a failed download/load will leave the model reinitialized, so the warning about continuing with the locally trained model is no longer true. Avoid `initialize()` here, or load into a separate classifier instance after the downloads succeed.
Suggested change:

```python
params_path = hf_hub_download(repo_id, "params.safetensors")
history_path = hf_hub_download(repo_id, "history.json")
clf.initialize()
clf.load_params(
    f_params=params_path,
    f_history=history_path,
```
```python
try:
    from huggingface_hub import hf_hub_download

    clf.initialize()
    clf.load_params(
        f_params=hf_hub_download(repo_id, "params.pt"),
        f_history=hf_hub_download(repo_id, "history.json"),
        use_safetensors=False,
    )
except Exception as exc:
```
The HF Hub loading block re-runs `clf.initialize()` before the download/load. If any error occurs, the code warns that it will continue with the locally trained model, but initialization has already reset the network. To keep the fallback meaningful, avoid calling `initialize()` here (the net is already initialized after `fit`), or load into a separate classifier instance and only swap on a successful load.
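The reviewer's fix can be sketched as follows (a sketch with assumed names, not the tutorial's final code): complete both downloads before touching the estimator, so any failure leaves the fitted model intact.

```python
import warnings


def swap_in_pretrained(clf, repo_id):
    """Download first; only initialize/load once both files are on disk."""
    try:
        from huggingface_hub import hf_hub_download

        params_path = hf_hub_download(repo_id, "params.safetensors")
        history_path = hf_hub_download(repo_id, "history.json")
    except Exception as exc:
        # clf is untouched on this path: initialize() was never called,
        # so the locally trained weights from fit() are still in place.
        warnings.warn(f"Hub download failed ({exc}); keeping local model.")
        return clf
    clf.initialize()
    clf.load_params(f_params=params_path, f_history=history_path)
    return clf
```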
Summary
- `braindecode/_tutorial_hub.py` module for loading/saving tutorial checkpoints from the Hugging Face Hub
- `scripts/train_and_push_tutorial_checkpoints.py` for offline training with wandb logging and loss-curve generation
- Tutorials always train with a small `n_epochs`, then load the pretrained checkpoint from the HF Hub to show longer training curves and final metrics
- `braindecode.visualization.plot_confusion_matrix` replaced with `sklearn.metrics.ConfusionMatrixDisplay.from_predictions` in all tutorials

Tutorial flow (after this PR)
Each tutorial now follows a clean pattern: short local training, then `hf_hub_download` + `clf.load_params()`.

Published checkpoints
Test plan
- `make clean && make html` builds successfully with 0 errors