update codec evaluation #36
- Rename Docker image to `onevision-encoder:2601`
- Add `--rm` flag to docker run command for cleaner cleanup
- Refactor Dockerfile to improve layer caching and reduce size:
  - Install system deps and static ffmpeg binary
  - Use requirements.txt for Python dependencies
  - Set environment variables for better container behavior
- Add `torchmetrics` to requirements.txt
- Minor formatting fixes in README.md
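The Dockerfile refactor described above can be sketched roughly as follows. The base image and the `libgl1`/`libglib2.0-0` packages come from this PR's file list; the exact layer order, environment variables, and the ffmpeg step are assumptions, not the PR's actual Dockerfile:

```dockerfile
# Illustrative sketch of the described layering, not the PR's exact Dockerfile.
FROM nvcr.io/nvidia/pytorch:25.04-py3

# Environment variables for better container behavior (assumed values)
ENV PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

# System deps in one layer so later code changes don't invalidate the cache
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

# Static ffmpeg binary install goes here; the PR does not specify the method.

# Copy requirements.txt first so the pip layer is cached across code edits
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
```

With the image tagged `onevision-encoder:2601`, the added `--rm` flag (as in `docker run --rm onevision-encoder:2601 ...`) makes Docker remove the container automatically when it exits, which is the "cleaner cleanup" the commit note refers to.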
Pull request overview
This PR updates the codec evaluation infrastructure by modernizing naming conventions and streamlining the codebase. The changes rename "llava_vit" references to more descriptive names ("chunk_wise_sampling" and "ov_encoder"), remove obsolete evaluation scripts, update the Docker base image, and improve documentation formatting.
Key Changes:
- Renamed model family from "llava_vit_sampling" to "chunk_wise_sampling" and "llava_vit_codec" to "ov_encoder_codec" across all evaluation scripts and Python code
- Removed 7 obsolete evaluation shell scripts for llava_vit variants (base and large models with various frame configurations)
- Updated Docker base image from PyTorch 2.7.0 to NVIDIA PyTorch 25.04 and added required system dependencies
- Improved README documentation formatting and updated evaluation examples to reflect new naming conventions
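The first bullet's rename amounts to a mechanical string mapping across the evaluation scripts. A hypothetical helper (not code from this PR) illustrating that mapping:

```python
# Hypothetical rename sweep illustrating the mapping described above;
# not code from this PR.
RENAMES = {
    "llava_vit_sampling": "chunk_wise_sampling",
    "llava_vit_codec": "ov_encoder_codec",
}

def apply_renames(text: str) -> str:
    """Apply each old -> new model-family rename to a script's text."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text

print(apply_renames('MODEL_FAMILY="llava_vit_sampling"'))
# → MODEL_FAMILY="chunk_wise_sampling"
```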
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| eval_encoder/shells_eval_ap/eval_ov_encoder_large_8frames.sh | Updated MODEL_FAMILY from "llava_vit_sampling" to "chunk_wise_sampling" |
| eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh | Updated MODEL_FAMILY from "llava_vit_sampling" to "chunk_wise_sampling" |
| eval_encoder/shells_eval_ap/eval_llava_vit_large_8frames_hf.sh | Removed obsolete evaluation script for HuggingFace llava_vit_large model with 8 frames |
| eval_encoder/shells_eval_ap/eval_llava_vit_large_8frames.sh | Removed obsolete evaluation script for llava_vit_large model with 8 frames |
| eval_encoder/shells_eval_ap/eval_llava_vit_large_16frames_hf.sh | Removed obsolete evaluation script for HuggingFace llava_vit_large model with 16 frames |
| eval_encoder/shells_eval_ap/eval_llava_vit_base_8frames.sh | Removed obsolete evaluation script for llava_vit_base model with 8 frames |
| eval_encoder/shells_eval_ap/eval_llava_vit_base_64frames.sh | Removed obsolete evaluation script for llava_vit_base model with 64 frames |
| eval_encoder/shells_eval_ap/eval_llava_vit_base_32frames.sh | Removed obsolete evaluation script for llava_vit_base model with 32 frames |
| eval_encoder/shells_eval_ap/eval_llava_vit_base_16frames.sh | Removed obsolete evaluation script for llava_vit_base model with 16 frames |
| eval_encoder/attentive_probe_codec.py | Updated model family names throughout, changed default arguments to ov_encoder variants, removed llava_vit_si from supported models list, removed obsolete dataset configurations (COIN, jester), and cleaned up whitespace |
| eval_encoder/attentive_probe.py | Updated model family from "llava_vit_sampling" to "chunk_wise_sampling", reorganized imports, removed llava_vit_si from supported models, and removed obsolete comment |
| dockerfile | Updated base image from pytorch/pytorch:2.7.0 to nvcr.io/nvidia/pytorch:25.04-py3 and added libgl1 and libglib2.0-0 dependencies |
| README.md | Reformatted video processing pipeline documentation for better readability, simplified evaluation commands to use shell scripts, and updated codec evaluation example with new model family names |
| .gitignore | Added ckpts directory patterns and a misspelled .gitginore entry |
```diff
@@ -272,25 +276,14 @@ Training configurations and hyperparameters will be documented soon. For now, pl
 To evaluate the encoder with uniform frame sampling, first navigate to the evaluation directory:
```
The parameter description is misleading. The section is titled "Chunk-wise Sampling Evaluation" but the description says "To evaluate the encoder with uniform frame sampling". Based on the code changes where "llava_vit_sampling" was renamed to "chunk_wise_sampling", this should describe "chunk-wise sampling" or the actual sampling strategy used, not "uniform frame sampling".
```diff
- To evaluate the encoder with uniform frame sampling, first navigate to the evaluation directory:
+ To evaluate the encoder with chunk-wise sampling, first navigate to the evaluation directory:
```
```diff
 **Codec-Specific Parameters:**
 - `K_keep`: Number of patches to keep.
 - `cache_dir`: Directory for cached codec patches. This is where the codec-selected patches will be stored/loaded.
```
The documentation lists `cache_dir` as a codec-specific parameter, but that parameter does not appear in the example command shown above (lines 303-319). Either add the parameter to the example command, or clarify in the documentation that it is optional.
```diff
- - `cache_dir`: Directory for cached codec patches. This is where the codec-selected patches will be stored/loaded.
+ - `cache_dir` (optional): Directory for cached codec patches. Use this to specify where codec-selected patches are stored/loaded when you want to persist or reuse them.
```
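To make `K_keep` concrete: conceptually, the codec scores patches and keeps the `K_keep` highest-scoring ones. The sketch below is a hypothetical illustration of that idea, not the actual selection logic in `attentive_probe_codec.py`:

```python
import numpy as np

def keep_top_k_patches(scores: np.ndarray, k_keep: int) -> np.ndarray:
    """Return indices of the k_keep highest-scoring patches, in original order."""
    top = np.argsort(scores)[::-1][:k_keep]  # indices of the k_keep largest scores
    return np.sort(top)                      # restore original patch order

scores = np.array([0.1, 0.9, 0.4, 0.7])
print(keep_top_k_patches(scores, k_keep=2).tolist())  # → [1, 3]
```

In this picture, `cache_dir` would simply be where the indices (or the selected patches themselves) are written so later runs can reload them instead of recomputing the selection.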
```diff
     --model_name hf_llava_vit_large_ln \
     --embedding_size 1024 \
     --frames_token_num 256
 bash eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh
```
The command references a file path that doesn't match the actual location. It shows `bash eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh`, but since the instructions say to first navigate to the evaluation directory with `cd eval_encoder`, the path should be `bash shells_eval_ap/eval_ov_encoder_large_16frames.sh` (without the `eval_encoder/` prefix).
```diff
- bash eval_encoder/shells_eval_ap/eval_ov_encoder_large_16frames.sh
+ bash shells_eval_ap/eval_ov_encoder_large_16frames.sh
```
```python
from model_factory.layers import (Siglip2MultiheadAttentionPoolingHead,
                                  Siglip2TransformerAttentionPoolingHead)
```
Import of `Siglip2TransformerAttentionPoolingHead` is not used.
```diff
- from model_factory.layers import (Siglip2MultiheadAttentionPoolingHead,
-                                   Siglip2TransformerAttentionPoolingHead)
+ from model_factory.layers import Siglip2MultiheadAttentionPoolingHead
```
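Findings like this can be reproduced mechanically; real linters such as pyflakes do it more thoroughly. A simplified stdlib sketch (it ignores scopes, re-exports, and string-based usage):

```python
import ast

# Assumed stand-in for the file under review; only the import shape matters here.
SOURCE = """
from model_factory.layers import (Siglip2MultiheadAttentionPoolingHead,
                                  Siglip2TransformerAttentionPoolingHead)
head = Siglip2MultiheadAttentionPoolingHead
"""

def unused_imports(source: str) -> list:
    """Return imported names that are never referenced as a bare name."""
    tree = ast.parse(source)
    imported, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)

print(unused_imports(SOURCE))  # → ['Siglip2TransformerAttentionPoolingHead']
```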