
update codec evaluation #36

Merged
FeilongTangmonash merged 5 commits into main from anxiang_v2 on Dec 30, 2025

Conversation

@anxiangsir (Collaborator)

No description provided.

- Rename Docker image to `onevision-encoder:2601`
- Add `--rm` flag to docker run command for cleaner cleanup
- Refactor Dockerfile to improve layer caching and reduce size:
  - Install system deps and static ffmpeg binary
  - Use requirements.txt for Python dependencies
  - Set environment variables for better container behavior
- Add `torchmetrics` to requirements.txt
- Minor formatting fixes in README.md
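The Dockerfile refactor is described above but not shown in this thread. A minimal sketch of what such a layout could look like, with hypothetical file paths and package choices; only the `nvcr.io/nvidia/pytorch:25.04-py3` base image, the `libgl1`/`libglib2.0-0` packages, and the use of `requirements.txt` are confirmed elsewhere in this PR:

```dockerfile
# Base image per this PR (replaces pytorch/pytorch:2.7.0)
FROM nvcr.io/nvidia/pytorch:25.04-py3

# Environment variables for better container behavior
# (non-interactive apt, unbuffered Python output)
ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1

# System deps named in the review (libgl1, libglib2.0-0); the extra tools
# here for fetching a static ffmpeg binary are illustrative, not from the PR
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 libglib2.0-0 curl xz-utils \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements.txt first so Python dependency installs sit in their
# own layer and stay cached when only source files change
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

COPY . /workspace/onevision-encoder
WORKDIR /workspace/onevision-encoder
```

With the `--rm` flag from the commit list, a run such as `docker run --rm -it onevision-encoder:2601 bash` leaves no stopped container behind after exit.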
Copilot AI review requested due to automatic review settings December 30, 2025 06:13
@anxiangsir changed the title from "update codec evalulation" to "update codec evaluation" on Dec 30, 2025

Copilot AI left a comment


Pull request overview

This PR updates the codec evaluation infrastructure by modernizing naming conventions and streamlining the codebase. The changes rename "llava_vit" references to more descriptive names ("chunk_wise_sampling" and "ov_encoder"), remove obsolete evaluation scripts, update the Docker base image, and improve documentation formatting.

Key Changes:

  • Renamed model family from "llava_vit_sampling" to "chunk_wise_sampling" and "llava_vit_codec" to "ov_encoder_codec" across all evaluation scripts and Python code
  • Removed 7 obsolete evaluation shell scripts for llava_vit variants (base and large models with various frame configurations)
  • Updated Docker base image from PyTorch 2.7.0 to NVIDIA PyTorch 25.04 and added required system dependencies
  • Improved README documentation formatting and updated evaluation examples to reflect new naming conventions

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `eval_encoder/shells_eval_ap/eval_ov_encoder_large_8frames.sh` | Updated `MODEL_FAMILY` from "llava_vit_sampling" to "chunk_wise_sampling" |
| `eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh` | Updated `MODEL_FAMILY` from "llava_vit_sampling" to "chunk_wise_sampling" |
| `eval_encoder/shells_eval_ap/eval_llava_vit_large_8frames_hf.sh` | Removed obsolete evaluation script for the HuggingFace llava_vit_large model with 8 frames |
| `eval_encoder/shells_eval_ap/eval_llava_vit_large_8frames.sh` | Removed obsolete evaluation script for the llava_vit_large model with 8 frames |
| `eval_encoder/shells_eval_ap/eval_llava_vit_large_16frames_hf.sh` | Removed obsolete evaluation script for the HuggingFace llava_vit_large model with 16 frames |
| `eval_encoder/shells_eval_ap/eval_llava_vit_base_8frames.sh` | Removed obsolete evaluation script for the llava_vit_base model with 8 frames |
| `eval_encoder/shells_eval_ap/eval_llava_vit_base_64frames.sh` | Removed obsolete evaluation script for the llava_vit_base model with 64 frames |
| `eval_encoder/shells_eval_ap/eval_llava_vit_base_32frames.sh` | Removed obsolete evaluation script for the llava_vit_base model with 32 frames |
| `eval_encoder/shells_eval_ap/eval_llava_vit_base_16frames.sh` | Removed obsolete evaluation script for the llava_vit_base model with 16 frames |
| `eval_encoder/attentive_probe_codec.py` | Updated model family names throughout, changed default arguments to ov_encoder variants, removed llava_vit_si from the supported models list, removed obsolete dataset configurations (COIN, jester), and cleaned up whitespace |
| `eval_encoder/attentive_probe.py` | Updated model family from "llava_vit_sampling" to "chunk_wise_sampling", reorganized imports, removed llava_vit_si from supported models, and removed an obsolete comment |
| `dockerfile` | Updated base image from pytorch/pytorch:2.7.0 to nvcr.io/nvidia/pytorch:25.04-py3 and added libgl1 and libglib2.0-0 dependencies |
| `README.md` | Reformatted video-processing pipeline documentation for readability, simplified evaluation commands to use shell scripts, and updated the codec evaluation example with the new model family names |
| `.gitignore` | Added ckpts directory patterns and a misspelled .gitginore entry |


Comment thread README.md
@@ -272,25 +276,14 @@ Training configurations and hyperparameters will be documented soon. For now, pl
To evaluate the encoder with uniform frame sampling, first navigate to the evaluation directory:

Copilot AI Dec 30, 2025


The parameter description is misleading. The section is titled "Chunk-wise Sampling Evaluation" but the description says "To evaluate the encoder with uniform frame sampling". Based on the code changes where "llava_vit_sampling" was renamed to "chunk_wise_sampling", this should describe "chunk-wise sampling" or the actual sampling strategy used, not "uniform frame sampling".

Suggested change
To evaluate the encoder with uniform frame sampling, first navigate to the evaluation directory:
To evaluate the encoder with chunk-wise sampling, first navigate to the evaluation directory:

Comment thread README.md

**Codec-Specific Parameters:**
- `K_keep`: Number of patches to keep.
- `cache_dir`: Directory for cached codec patches. This is where the codec-selected patches will be stored/loaded.

Copilot AI Dec 30, 2025


The documentation mentions cache_dir parameter as a codec-specific parameter, but this parameter is not present in the example command shown above (lines 303-319). Either the parameter should be added to the example command, or the documentation should clarify that it's optional.

Suggested change
- `cache_dir`: Directory for cached codec patches. This is where the codec-selected patches will be stored/loaded.
- `cache_dir` (optional): Directory for cached codec patches. Use this to specify where codec-selected patches are stored/loaded when you want to persist or reuse them.

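The `K_keep` parameter discussed in the thread above implies top-K patch selection on the codec side. The repo's actual selection code isn't shown in this PR; a minimal numpy sketch of the general idea (function and variable names are illustrative, not from the codebase):

```python
import numpy as np

def keep_top_k_patches(patches, scores, k_keep):
    """Keep the k_keep highest-scoring patches, preserving original order.

    patches: (N, D) array of patch embeddings
    scores:  (N,) array of per-patch importance scores
    """
    # Indices of the k_keep largest scores, then re-sorted so the
    # surviving patches keep their original (e.g. temporal) order
    top = np.argsort(scores)[-k_keep:]
    keep = np.sort(top)
    return patches[keep], keep
```

The kept indices (rather than the patches alone) are what a cache such as the `cache_dir` described above would plausibly store, so selection need not be recomputed per run.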
Comment thread README.md
--model_name hf_llava_vit_large_ln \
--embedding_size 1024 \
--frames_token_num 256
bash eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh

Copilot AI Dec 30, 2025


The command references a file path that doesn't match the actual location. The command shows bash eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh but since the instructions say to first navigate to the eval_encoder directory with cd eval_encoder, the path should be bash shells_eval_ap/eval_ov_encoder_large_16frames.sh (without the eval_encoder/ prefix).

Suggested change
bash eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh
bash shells_eval_ap/eval_ov_encoder_large_16frames.sh

Comment on lines +22 to +23
from model_factory.layers import (Siglip2MultiheadAttentionPoolingHead,
Siglip2TransformerAttentionPoolingHead)

Copilot AI Dec 30, 2025


Import of 'Siglip2TransformerAttentionPoolingHead' is not used.

Suggested change
from model_factory.layers import (Siglip2MultiheadAttentionPoolingHead,
Siglip2TransformerAttentionPoolingHead)
from model_factory.layers import Siglip2MultiheadAttentionPoolingHead

@FeilongTangmonash merged commit 944fa73 into main on Dec 30, 2025
6 checks passed