WIP: SAM2 Hiera backbone improvements #2

Merged
PABannier merged 4 commits into main from wip/sam2-improvements on Mar 31, 2026

@PABannier
Owner

Summary

  • SAM2 Hiera backbone refinements and bug fixes
  • Benchmark improvements for all model variants
  • Updated plans for SAM2 support and optimization

Test plan

  • Run benchmark suite with SAM2 models
  • Verify video tracking pipeline

🤖 Generated with Claude Code

Work-in-progress changes to SAM2 support including Hiera backbone
refinements, benchmark improvements, and plan updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@PABannier force-pushed the wip/sam2-improvements branch from b9399e9 to 52eab22 on March 31, 2026 16:30
PABannier and others added 3 commits March 31, 2026 18:33
The cached graph and tensor_copy_async optimizations in the WIP commit
caused tracking masks to degrade to zero within 5 frames. Reverting to
the working (non-cached) implementation until these optimizations can
be properly debugged.

The three video tracking bug fixes from the previous commit remain
intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds sam3_params.encode_img_size to override the model's native image
resolution at runtime (e.g. 512 instead of 1024). When set, the Hiera
backbone, FPN neck, SAM decoder, memory encoder, and all PE caches use
the effective resolution, which makes encoding ~4x faster when each
spatial dimension is halved.
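
The ~4x figure is consistent with simple grid arithmetic: halving each side of the input quarters the number of feature-grid positions the backbone has to process. A minimal sketch of that arithmetic, assuming a backbone stride of 16 and a convention where 0 means "use native resolution". The helper names below are hypothetical and are not taken from the codebase:

```cpp
#include <cstdint>

// Hypothetical helper: an override of 0 (or less) falls back to the native size.
inline int32_t effective_img_size(int32_t native_img_size, int32_t encode_img_size) {
    return encode_img_size > 0 ? encode_img_size : native_img_size;
}

// Side length of the feature grid. The stride of 16 is an assumption made
// for this sketch, not a value confirmed by the PR.
inline int32_t effective_feat_size(int32_t img_size, int32_t stride = 16) {
    return img_size / stride;
}

// Number of spatial positions in a square feature grid.
inline int64_t token_count(int32_t feat_size) {
    return static_cast<int64_t>(feat_size) * feat_size;
}
```

Under these assumptions, 1024 gives a 64x64 grid (4096 positions) and 512 gives a 32x32 grid (1024 positions), a 4x reduction. The measured speedup may differ, since not every stage scales linearly with the number of positions.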

All pipeline functions (propagate_single, encode_memory, segment_pvs,
build_prompt_and_pos, build_sam_dec_graph, populate_pe_cache,
ensure_tracker_pe_caches) now accept or derive effective feat_size from
state rather than hardcoding hp.feat_size().
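
One way to picture "derive feat_size from state" is a cache that remembers which grid size it was built for and rebuilds itself when the effective resolution changes. The sketch below is illustrative only: `tracker_state` and `ensure_pe_cache` are hypothetical stand-ins, and the zero-filled buffer is a placeholder for real positional encodings.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tracker state; field names are assumptions.
struct tracker_state {
    int feat_size = 0;            // effective feature-grid side length
    int pe_cache_feat_size = 0;   // grid size the cache was last built for
    std::vector<float> pe_cache;  // placeholder for positional-encoding data
};

// Rebuilds the cache only when the effective grid size changed.
// Returns true if a rebuild happened, false if the cache was reused.
inline bool ensure_pe_cache(tracker_state& st, int channels) {
    if (!st.pe_cache.empty() && st.pe_cache_feat_size == st.feat_size) {
        return false;
    }
    const std::size_t n = static_cast<std::size_t>(channels) *
                          static_cast<std::size_t>(st.feat_size) *
                          static_cast<std::size_t>(st.feat_size);
    st.pe_cache.assign(n, 0.0f);  // real code would compute encodings here
    st.pe_cache_feat_size = st.feat_size;
    return true;
}
```

Keying the cache on the size stored in state, rather than on a model constant, is what lets the same code path serve both the native and the overridden resolution.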

Verified: tracking works at both 1024 (native) and 512 (half-res) on
SAM2.1 tiny f32 with the bedroom video.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
main_video.cpp was not parsing --encode-img-size, so the parameter
always stayed at 0 (the default, meaning native resolution). Also fixes
the SAM3 ViT encode path to use sam3_eff_img_size() instead of the
hardcoded hp.img_size.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
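
For illustration, parsing a flag such as --encode-img-size could look roughly like the sketch below. This is not the code in main_video.cpp: `video_cli_params` and `parse_encode_img_size` are hypothetical names, and only the flag name and the "0 means native resolution" default come from the commit message.

```cpp
#include <climits>
#include <cstdlib>
#include <cstring>

// Hypothetical parameter struct; 0 keeps the model's native resolution.
struct video_cli_params {
    int encode_img_size = 0;
};

// Scans argv for "--encode-img-size <N>". Returns false when the value is
// missing, not a non-negative integer, or out of range for int.
inline bool parse_encode_img_size(int argc, const char* const* argv,
                                  video_cli_params& out) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--encode-img-size") != 0) {
            continue;  // other flags are ignored in this sketch
        }
        if (i + 1 >= argc) {
            return false;  // flag given without a value
        }
        char* end = nullptr;
        const long v = std::strtol(argv[i + 1], &end, 10);
        if (end == argv[i + 1] || *end != '\0' || v < 0 || v > INT_MAX) {
            return false;  // not a clean non-negative integer
        }
        out.encode_img_size = static_cast<int>(v);
        ++i;  // skip the consumed value
    }
    return true;
}
```

If the flag is absent, the struct keeps its default of 0, which matches the behaviour the commit describes as native resolution.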
@PABannier merged commit 9bfe6f5 into main on Mar 31, 2026
@PABannier deleted the wip/sam2-improvements branch on April 3, 2026 12:28