
Add LMM Probe Results reproduction guide #79

Merged
anxiangsir merged 2 commits into main from copilot/train-model-on-lm-probe on Feb 6, 2026
Conversation

Contributor

Copilot AI commented Feb 6, 2026

The LMM Probe Results section displayed benchmark results but lacked reproduction instructions. Users needed clear guidance on using the llava_next folder for Stage-2 fine-tuning with the mixed dataset of 740K + 800K samples.

Changes

  • Added reproduction subsection under LMM Probe Results with prominent note directing users to llava_next folder
  • Included step-by-step workflow: Docker setup → data preparation → Stage-2 fine-tuning → evaluation
  • Cross-referenced the llava_next README for the detailed training data format and troubleshooting

The instructions follow the existing collapsible section pattern for consistency.

Example

```bash
cd llava_next
bash scripts/sft_ov_encoder.sh
TASKS="videomme" bash scripts/eval/eval_ov_encoder.sh
```
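The evaluation command above is driven by the TASKS environment variable. As a sketch (assuming the eval script consumes TASKS exactly as shown; the helper name and any task names beyond videomme are hypothetical), one invocation per benchmark can be generated like this:

```shell
#!/bin/sh
# build_eval_cmds: print one evaluation invocation per task name.
# Assumption: scripts/eval/eval_ov_encoder.sh reads the TASKS environment
# variable, as in the example above. This helper is illustrative only.
build_eval_cmds() {
  for task in "$@"; do
    printf 'TASKS="%s" bash scripts/eval/eval_ov_encoder.sh\n' "$task"
  done
}

build_eval_cmds videomme
```

Piping the printed commands through `sh` (or running them manually) would then evaluate each benchmark in turn.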
Original prompt

LMM Probe Results
We train the model on a mixed dataset comprising 740K samples from LLaVA-OneVision and 800K samples from LLaVA-Video SFT, proceeding directly to Stage-2 fine-tuning. Following a streamlined native-resolution strategy inspired by LLaVA-OneVision, input frames that match the model’s native resolution are fed directly into the network without tiling or cropping, allowing us to fully evaluate the ViT’s native-resolution modeling capability.

A note of emphasis here: use the contents of the llava-next folder to reproduce these results.



…folder

Co-authored-by: anxiangsir <31175974+anxiangsir@users.noreply.github.com>
Copilot AI changed the title [WIP] Train model with LLaVA mixed dataset for probing Add LMM Probe Results reproduction guide Feb 6, 2026
Copilot AI requested a review from anxiangsir February 6, 2026 06:11
@anxiangsir anxiangsir marked this pull request as ready for review February 6, 2026 06:13
Copilot AI review requested due to automatic review settings February 6, 2026 06:13
@anxiangsir anxiangsir merged commit c453abf into main Feb 6, 2026
3 checks passed

Copilot AI left a comment


Pull request overview

This PR adds reproduction instructions for the LMM Probe Results benchmarks that were previously displayed without guidance on how to reproduce them. The instructions guide users to use the llava_next folder for Stage-2 fine-tuning with a mixed dataset of 740K LLaVA-OneVision and 800K LLaVA-Video samples.

Changes:

  • Added a new "Reproducing LMM Probe Results" subsection with step-by-step instructions
  • Included Docker setup, data preparation, training, and evaluation commands
  • Cross-referenced the llava_next README for detailed documentation


Comment thread README.md
Comment on lines +147 to +151
4. **Run Stage-2 fine-tuning:**
```bash
# Configure the training script with your data paths
bash scripts/sft_ov_encoder.sh
```

Copilot AI Feb 6, 2026


Step 4 mentions running Stage-2 fine-tuning, but the training script (scripts/sft_ov_encoder.sh line 30) requires a pretrained projector checkpoint that is not mentioned in these reproduction instructions. Users will need to either:

  1. Run Stage-1 pretraining first to generate the mm_projector.bin file, or
  2. Download a pretrained projector checkpoint

Consider adding a note about this prerequisite or adding a step for Stage-1 pretraining before Stage-2 fine-tuning.
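The prerequisite flagged above can be made explicit with a guard before launching Stage-2. This is a sketch only: the checkpoint path is a hypothetical example, not the path the training script actually reads (see scripts/sft_ov_encoder.sh for the real location).

```shell
#!/bin/sh
# require_projector: refuse to start Stage-2 fine-tuning unless a pretrained
# projector checkpoint is present. The path passed in is supplied by the
# caller; the example path below is hypothetical.
require_projector() {
  if [ -f "$1" ]; then
    echo "projector found: $1"
    return 0
  else
    echo "missing pretrained projector: $1 (run Stage-1 first or download one)" >&2
    return 1
  fi
}
```

Usage would be along the lines of `require_projector checkpoints/projector/mm_projector.bin && bash scripts/sft_ov_encoder.sh`, so the script fails fast with an actionable message instead of deep inside training.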

Comment thread README.md
Comment on lines +136 to +138
-v "$(pwd)":/workspace/OV-Encoder-Llava \
-w /workspace/OV-Encoder-Llava \
ov_encoder_llava:26.01 bash

Copilot AI Feb 6, 2026


The Docker command here is simplified compared to the official llava_next/README.md and may cause issues with the DeepSpeed-based Stage-2 fine-tuning in step 4. The training script uses DeepSpeed for distributed training, which typically requires additional Docker flags for proper IPC and memory management.

Consider using the complete Docker command from llava_next/README.md (lines 34-40) which includes:

  • --cap-add IPC_LOCK for memory locking
  • --ulimit memlock=-1 --ulimit stack=67108864 for memory limits
  • bash -c "service ssh restart; bash" for multi-node training support

Alternatively, add a note that this simplified command is suitable for evaluation only, and users should refer to llava_next/README.md for the complete setup when running training.

Suggested change

```diff
-  -v "$(pwd)":/workspace/OV-Encoder-Llava \
-  -w /workspace/OV-Encoder-Llava \
-  ov_encoder_llava:26.01 bash
+  --cap-add IPC_LOCK \
+  --ulimit memlock=-1 --ulimit stack=67108864 \
+  -v "$(pwd)":/workspace/OV-Encoder-Llava \
+  -w /workspace/OV-Encoder-Llava \
+  ov_encoder_llava:26.01 bash -c "service ssh restart; bash"
```

Comment thread README.md
```bash
# Using Docker (recommended)
docker build -t ov_encoder_llava:26.01 .
docker run -it --gpus all --ipc host --net host --privileged \
```

Copilot AI Feb 6, 2026


The recommended docker run command uses --privileged and --net host, which effectively disables container isolation and grants the container near-complete control over the host system and network stack. If the image or any of its dependencies are ever compromised (e.g., via a supply-chain attack or malicious contribution), an attacker running inside this container could escape to the host, access host files, services, and credentials, and fully compromise the machine. Please drop --privileged and --net host here and instead document a minimal set of flags required for GPU access and networking (relying on default isolation wherever possible).

Suggested change

```diff
-docker run -it --gpus all --ipc host --net host --privileged \
+docker run -it --gpus all --ipc host \
```

@anxiangsir anxiangsir deleted the copilot/train-model-on-lm-probe branch March 2, 2026 04:10