Add fp16/fp32 SafeTensors export and inference loading support #9

Closed
IgorBaratta wants to merge 3 commits into main from igor/safetensor_support

@IgorBaratta
Collaborator

Convert a .pt checkpoint to a .safetensors file (fp16 or fp32).

    PYTHONPATH=code python code/export/checkpoint_to_safetensors.py \
        --checkpoint models/PreDecoderModelMemory_v1.0.94.pt \
        --model-id 1 [--fp16]

The exported file can then be used for inference:

    PREDECODER_SAFETENSORS_CHECKPOINT=models/PreDecoderModelMemory_v1.0.94_fp16.safetensors \
    PREDECODER_INFERENCE_NUM_SAMPLES=65650 WORKFLOW=inference DISTANCE=13 N_ROUNDS=13 \
    EXPERIMENT_NAME=predecoder_model_1 bash code/scripts/local_run.sh
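
For readers unfamiliar with the format, here is a minimal sketch of what an fp16 or fp32 SafeTensors file looks like on disk. It is not the code in `checkpoint_to_safetensors.py`, which is not shown in this conversation and presumably works on PyTorch tensors. The function names `write_safetensors` and `read_safetensors` are hypothetical, tensors are simplified to flat lists of floats, and only the Python standard library is used. The layout follows the published SafeTensors format: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes.

```python
import json
import struct


def write_safetensors(path, tensors, fp16=False):
    """Write {name: flat list of floats} as a minimal SafeTensors file.

    Hypothetical helper for illustration only. With fp16=True each value is
    stored as IEEE half precision (2 bytes) instead of float32 (4 bytes);
    values outside the fp16 range make struct.pack raise OverflowError.
    """
    dtype, fmt = ("F16", "e") if fp16 else ("F32", "f")
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        data = struct.pack(f"<{len(values)}{fmt}", *values)
        header[name] = {
            "dtype": dtype,
            "shape": [len(values)],
            # Offsets are relative to the start of the byte buffer,
            # i.e. the first byte after the JSON header.
            "data_offsets": [offset, offset + len(data)],
        }
        chunks.append(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    # Pad the header with spaces so the byte buffer starts 8-byte aligned.
    header_bytes += b" " * (-len(header_bytes) % 8)
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # header length, u64 LE
        f.write(header_bytes)
        f.write(b"".join(chunks))


def read_safetensors(path):
    """Read back a file written by write_safetensors as {name: list of floats}."""
    formats = {"F16": "e", "F32": "f"}
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        payload = f.read()
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        count = info["shape"][0]
        fmt = formats[info["dtype"]]
        tensors[name] = list(struct.unpack(f"<{count}{fmt}", payload[begin:end]))
    return tensors
```

Writing the same weights with `fp16=True` halves the size of the tensor payload, at the cost of precision for values that half precision cannot represent exactly.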

@IgorBaratta IgorBaratta requested a review from ivanbasov March 5, 2026 21:15
@copy-pr-bot

copy-pr-bot Bot commented Mar 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ivanbasov added a commit that referenced this pull request Mar 6, 2026
… matrix (#8)

* Consolidate CI test jobs: merge GPU smoke test and add Python version matrix

- Remove separate smoke-test-gpu job (was serial after gpu-tests, increasing
  pipeline time). Smoke training+inference now runs in the same gpu-tests job.
- Replace python-compat matrix (6 jobs, SKIP_TESTS=1) with two focused job
  groups that actually run tests:
  * gpu-tests: matrix over Python 3.11/3.12/3.13 on GPU runners — installs
    train deps, runs full test suite (CPU+GPU), then smoke training+inference.
  * inference-tests: matrix over Python 3.11/3.12/3.13 on CPU — installs
    inference deps, runs tests with pre-trained models (GPU tests auto-skip).

Reduces total jobs from 11 to 9 while increasing actual test coverage.

Made-with: Cursor

* Fix GPU CI: set DEBIAN_FRONTEND=noninteractive to prevent tzdata hang

The deadsnakes PPA pulls in tzdata as a dependency, which triggers an
interactive timezone configuration prompt in the container. This caused
all 3 GPU matrix jobs to hang for 45 minutes until timeout.

Made-with: Cursor

* Add pull_request trigger and gate GPU jobs to push/merge_group only

Without the pull_request trigger, CI never fires on PRs — checks aren't
even planned (e.g. PR #9 shows zero checks). GPU jobs are gated to
push/merge_group events to avoid consuming self-hosted GPU runners on
every PR update.

Made-with: Cursor

* Remove event gate on GPU jobs so they run on PRs too

GPU jobs complete in ~5-10 minutes and serve as a useful pre-merge check.

Made-with: Cursor

* Remove pull-request/[0-9]+ from push trigger to fix duplicate CI runs

The copy-pr-bot creates pull-request/N branches for each PR, which
matched the push trigger and caused every CI job to run twice (once
from pull_request, once from push). The pull_request trigger already
covers PRs targeting main, so the push pattern is redundant.

Made-with: Cursor

* Fix GPU CI: gate on event type, restore push trigger for copy-pr-bot

NVIDIA self-hosted runners block pull_request events outright. GPU CI
must run via push events — either to main or to pull-request/[0-9]+
branches created by copy-pr-bot for PR testing.

- Restore "pull-request/[0-9]+" in push trigger
- Gate gpu-tests with if: github.event_name != 'pull_request'
- CPU jobs (inference-tests, unit-tests, etc.) still run on pull_request

Made-with: Cursor

* Remove pull-request/[0-9]+ push pattern and pull_request gate on GPU jobs

Simplify triggers: all jobs (including GPU) run on pull_request, push
to main, and merge_group. The pull-request/[0-9]+ branch convention is
not used by contributors.

Made-with: Cursor

* Merge unit-tests + inference-tests, gate GPU jobs from pull_request

- Combine unit-tests (py3.12) and inference-tests (py3.11/3.12/3.13)
  into a single unit-tests matrix job across all three Python versions.
  Both ran identical test suites with inference requirements.
- Re-add if: github.event_name != 'pull_request' on gpu-tests since
  NVIDIA self-hosted runners block pull_request events entirely.
  GPU CI runs on push to main and merge_group.

Made-with: Cursor

* Split GPU tests into separate workflow to avoid skipped PR noise

NVIDIA self-hosted runners block pull_request events, so GPU jobs in
the main CI workflow always showed as a single "Skipped" entry with
unresolved matrix names on every PR.

Move GPU jobs to ci-gpu.yml (triggers: push to main, merge_group,
workflow_dispatch). The main ci.yml keeps CPU jobs only (triggers:
pull_request, push to main, merge_group, workflow_dispatch).

Made-with: Cursor

* Enable GPU CI on PRs via copy-pr-bot push trigger

Add pull-request/[0-9]+ to ci-gpu.yml push trigger so GPU tests run
when copy-pr-bot creates the corresponding branch for a PR.

Made-with: Cursor

* Fix smoke test step: use bash shell for source command

The container default shell is sh, which doesn't have the source
builtin. Explicitly set shell: bash for the venv activation step.

Made-with: Cursor

* Install gcc in GPU container for torch.compile/inductor

The smoke training step uses torch.compile which invokes the inductor
backend, requiring a C compiler. The ubuntu:22.04 container doesn't
ship with gcc.

Made-with: Cursor

* Switch CPU jobs to NVIDIA self-hosted linux-amd64-cpu4 runners

Use nv-cpu-general runner group instead of GitHub-hosted ubuntu-latest.
Also restore pull-request/[0-9]+ push trigger in case self-hosted CPU
runners block pull_request events (same as GPU runners).

Made-with: Cursor

* Remove pull_request trigger since all runners are NVIDIA self-hosted

NVIDIA self-hosted runners block pull_request events. All CI (CPU and
GPU) now runs via copy-pr-bot push to pull-request/[0-9]+ branches.

Made-with: Cursor
Signed-off-by: Igor Baratta <ialmeidabara@nvidia.com>
@ivanbasov
Collaborator

Created #11 with signed commits. Closing this one.

@ivanbasov ivanbasov closed this Mar 6, 2026
ivanbasov added a commit that referenced this pull request Apr 10, 2026
… matrix (#8)
