Add LMM Probe Results reproduction guide #79
Conversation
…folder Co-authored-by: anxiangsir <31175974+anxiangsir@users.noreply.github.com>
Pull request overview
This PR adds reproduction instructions for the LMM Probe Results benchmarks that were previously displayed without guidance on how to reproduce them. The instructions guide users to use the llava_next folder for Stage-2 fine-tuning with a mixed dataset of 740K LLaVA-OneVision and 800K LLaVA-Video samples.
Changes:
- Added a new "Reproducing LMM Probe Results" subsection with step-by-step instructions
- Included Docker setup, data preparation, training, and evaluation commands
- Cross-referenced the llava_next README for detailed documentation
4. **Run Stage-2 fine-tuning:**

   ```bash
   # Configure the training script with your data paths
   bash scripts/sft_ov_encoder.sh
   ```
Step 4 runs Stage-2 fine-tuning, but the training script (scripts/sft_ov_encoder.sh line 30) requires a pretrained projector checkpoint that is not mentioned in these reproduction instructions. Users will need to either:
- run Stage-1 pretraining first to generate the mm_projector.bin file, or
- download a pretrained projector checkpoint.
Consider adding a note about this prerequisite, or a Stage-1 pretraining step before Stage-2 fine-tuning.
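One way to make the prerequisite explicit is a small guard before step 4. The sketch below is illustrative only: the helper name, default checkpoint path, and messages are assumptions, not taken from the repo, and should be adjusted to match what sft_ov_encoder.sh actually expects.

```shell
# Hedged sketch: fail fast if the Stage-1 projector checkpoint is missing.
# The default path below is an assumption; adjust it to the path used in
# scripts/sft_ov_encoder.sh.
check_projector() {
  ckpt="${1:-./checkpoints/projectors/mm_projector.bin}"
  if [ -f "$ckpt" ]; then
    echo "found projector: $ckpt"
  else
    echo "missing projector: $ckpt" >&2
    echo "run Stage-1 pretraining or download a pretrained checkpoint first" >&2
    return 1
  fi
}
```

Usage would then be something like `check_projector "$PROJECTOR_PATH" && bash scripts/sft_ov_encoder.sh`, so the run aborts with a clear message instead of failing mid-training.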
    -v "$(pwd)":/workspace/OV-Encoder-Llava \
    -w /workspace/OV-Encoder-Llava \
    ov_encoder_llava:26.01 bash
The Docker command here is simplified relative to the official llava_next/README.md and may cause issues with the DeepSpeed-based Stage-2 fine-tuning in step 4. The training script uses DeepSpeed for distributed training, which typically needs additional Docker flags for proper IPC and memory management.
Consider using the complete Docker command from llava_next/README.md (lines 34-40), which includes:
- `--cap-add IPC_LOCK` for memory locking
- `--ulimit memlock=-1 --ulimit stack=67108864` for memory limits
- `bash -c "service ssh restart; bash"` for multi-node training support
Alternatively, add a note that this simplified command is suitable for evaluation only, and users should refer to llava_next/README.md for the complete setup when running training.
Suggested change: replace

    -v "$(pwd)":/workspace/OV-Encoder-Llava \
    -w /workspace/OV-Encoder-Llava \
    ov_encoder_llava:26.01 bash

with

    --cap-add IPC_LOCK \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    -v "$(pwd)":/workspace/OV-Encoder-Llava \
    -w /workspace/OV-Encoder-Llava \
    ov_encoder_llava:26.01 bash -c "service ssh restart; bash"
```bash
# Using Docker (recommended)
docker build -t ov_encoder_llava:26.01 .
docker run -it --gpus all --ipc host --net host --privileged \
```
The recommended docker run command uses `--privileged` and `--net host`, which effectively disable container isolation and grant the container near-complete control over the host system and network stack. If the image or any of its dependencies is ever compromised (e.g., via a supply-chain attack or malicious contribution), an attacker inside this container could escape to the host; access host files, services, and credentials; and fully compromise the machine. Please drop `--privileged` and `--net host` here and instead document the minimal set of flags required for GPU access and networking, relying on default isolation wherever possible.
Suggested change: replace

    docker run -it --gpus all --ipc host --net host --privileged \

with

    docker run -it --gpus all --ipc host \
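As a hedged illustration of this advice (the helper name, flag list, and message wording below are mine, not from the PR), a docs check or CI script could flag the isolation-weakening options mechanically:

```shell
# Illustrative helper: warn about docker-run flags that weaken container
# isolation. The flag list here is a small example set, not exhaustive.
audit_docker_cmd() {
  cmd=" $1 "
  for flag in '--privileged' '--net host' '--pid host'; do
    case "$cmd" in
      *" $flag "*) echo "warning: $flag weakens container isolation" ;;
    esac
  done
}

# The command quoted in the review trips two of the checks:
audit_docker_cmd 'docker run -it --gpus all --ipc host --net host --privileged ov_encoder_llava:26.01 bash'
```

Running this on the reviewed command prints warnings for `--privileged` and `--net host`, while the suggested replacement passes clean.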
The LMM Probe Results section displayed benchmark results but lacked reproduction instructions. Users needed clear guidance on using the `llava_next` folder for Stage-2 fine-tuning with the 740K + 800K sample mixed dataset.
Changes
Added reproduction instructions pointing to the `llava_next` folder. The instructions follow the existing collapsible section pattern for consistency.