Hardware acceleration support and NSampler parallelization#23

Merged
ben-nowacki merged 8 commits into REIL-UConn:main from ben-nowacki:main
Mar 14, 2026

Conversation

Collaborator

@ben-nowacki ben-nowacki commented Mar 14, 2026

Description

NSampler (and Paired/Triplet subclasses) now support parallelization via an n_workers argument in __init__.

  • Anchor samples are batched across n_workers workers, and each worker independently finds valid pairs for its own anchor samples.
  • Worker batches are then joined on completion and re-sorted by anchor indices.
  • Fixed phase execution clearing sampler outputs on every call, even when the sample configuration had not changed. Sampler outputs are now effectively cached, and sampler computation is re-run only if the phase input bindings or sampler configuration change.
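
The worker-batching scheme above can be sketched roughly as follows. This is an illustrative sketch, not the library's actual implementation: `find_pairs_for_anchor` is a hypothetical stand-in for the real per-anchor pair search, and all other names are assumptions.

```python
# Sketch of worker-batched anchor sampling: anchors are split into
# n_workers batches, each worker finds pairs for its batch independently,
# and the joined results are re-sorted by anchor index.
from concurrent.futures import ThreadPoolExecutor


def find_pairs_for_anchor(anchor_idx, data):
    # Placeholder pair search: pair each anchor with the next item (wrapping).
    return (anchor_idx, (anchor_idx + 1) % len(data))


def parallel_sample(data, n_workers=4):
    anchors = list(range(len(data)))
    # Split anchors into n_workers contiguous batches (ceil division).
    batch_size = -(-len(anchors) // n_workers)
    batches = [anchors[i:i + batch_size]
               for i in range(0, len(anchors), batch_size)]

    def work(batch):
        return [find_pairs_for_anchor(a, data) for a in batch]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(work, batches)

    # Join worker outputs and re-sort by anchor index so the final
    # ordering is independent of worker completion order.
    pairs = [pair for batch in results for pair in batch]
    return sorted(pairs, key=lambda pair: pair[0])
```

Re-sorting after the join is what keeps the parallel output identical to the sequential output, so callers see no behavioral change from enabling workers.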

Hardware acceleration:

  • New Accelerator class provides a unified implementation of hardware acceleration for PyTorch and TensorFlow models.
  • Supports CPU (default), GPU (CUDA), and MPS (Apple Silicon; PyTorch only).
  • All BatchViews produced by a phase's sampler are now pre-materialized before any training/evaluation occurs. This adds a slight overhead at the start of each phase but removes redundant re-materialization on every epoch of training.
  • All allocation of data and models onto defined devices (e.g., GPU) now occurs during this materialization stage, prior to phase execution.
  • A new how-to notebook has been added to the documentation outlining Accelerator usage.
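
To illustrate how a unified accelerator might map one device spec to both frameworks, here is a minimal sketch. The real Accelerator's internals and method names are not shown in this PR; only the constructor call `Accelerator("gpu:0", pin_memory=True)` appears in the testing notes, so the `torch_device`/`tf_device` methods below are assumptions for illustration.

```python
# Hypothetical sketch: resolve a backend-agnostic device spec
# ("cpu", "gpu:N", "mps") into framework-specific identifiers.
class Accelerator:
    def __init__(self, device="cpu", pin_memory=False):
        self.device = device
        self.pin_memory = pin_memory

    def _gpu_index(self):
        # "gpu:1" -> "1"; bare "gpu" defaults to device 0.
        return self.device.split(":")[1] if ":" in self.device else "0"

    def torch_device(self):
        # PyTorch uses "cuda:N" for GPUs and "mps" for Apple Silicon.
        if self.device.startswith("gpu"):
            return f"cuda:{self._gpu_index()}"
        return self.device  # "cpu" and "mps" pass through unchanged

    def tf_device(self):
        # TensorFlow uses "/GPU:N" and "/CPU:0" device strings.
        if self.device.startswith("gpu"):
            return f"/GPU:{self._gpu_index()}"
        return "/CPU:0"
```

For example, `Accelerator("gpu:0", pin_memory=True).torch_device()` would yield `"cuda:0"`, while the same spec resolves to `"/GPU:0"` for TensorFlow, letting phase execution stay backend-agnostic.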

Related Issues

Closes #21
Closes #22

How Has This Been Tested?

  • Local pytest run (no new unit tests added, but the new example notebook executes).
  • CI passes (all nox tests pass)
  • Manual checks
    • GPU and MPS support tested locally with PyTorch ModelGraph.
    • NSampler with 4 workers reduced sampling time from ~50 seconds to <10 seconds.
    • Execution of same TrainingPhase and ModelGraph setup reduced from ~3 minutes to <30 seconds with Accelerator("gpu:0", pin_memory=True)

Checklist

  • My code follows project style guidelines: nox -s pre-commit
  • I have added tests that prove my fix is effective or my feature works: nox -s unit
  • I have updated documentation if needed: nox -s docs
  • I have linked related issues

@ben-nowacki ben-nowacki merged commit f2dd4dd into REIL-UConn:main Mar 14, 2026
16 of 17 checks passed


Development

Successfully merging this pull request may close these issues.

Multi-threading support for NSampler and subclasses
GPU Support
