An open-source implementation for training LLaVA-NeXT.
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
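As a rough illustration of what finetuning code for these models typically looks like (a minimal sketch, not taken from the repository above), the snippet below attaches LoRA adapters to a LLaVA-1.5 checkpoint with Hugging Face transformers and peft and runs one toy training step. The checkpoint name, target modules, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: LoRA finetuning of a LLaVA-1.5 checkpoint (assumed model id,
# assumed hyperparameters); in practice you would load in lower precision on GPU
# and iterate over a real image-text dataset.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint; swap in your own
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# Attach LoRA adapters to the attention projections of the language model.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.05)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# One toy training step on a single (image, prompt, answer) example.
image = Image.new("RGB", (336, 336))
text = "USER: <image>\nWhat is in the picture? ASSISTANT: Nothing, it is blank."
inputs = processor(images=image, text=text, return_tensors="pt")
inputs["labels"] = inputs["input_ids"].clone()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(**inputs).loss
loss.backward()
optimizer.step()
```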
Matryoshka Multimodal Models
LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget.
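To make the general idea concrete (this is an illustrative sketch of fixed-budget visual-token pruning, not HiRED's actual scoring rule), the helper below ranks patch tokens by an importance score and keeps only the top-k before they would be passed to the language model; the feature-norm score and the tensor sizes are assumptions.

```python
# Illustrative sketch: keep only the `budget` highest-scoring visual tokens.
import torch

def prune_visual_tokens(patch_tokens: torch.Tensor,
                        scores: torch.Tensor,
                        budget: int) -> torch.Tensor:
    """patch_tokens: (batch, num_patches, dim); scores: (batch, num_patches)."""
    budget = min(budget, patch_tokens.shape[1])
    top = scores.topk(budget, dim=1).indices                      # (batch, budget)
    top, _ = top.sort(dim=1)                                      # preserve spatial order
    idx = top.unsqueeze(-1).expand(-1, -1, patch_tokens.shape[-1])
    return patch_tokens.gather(1, idx)                            # (batch, budget, dim)

# Example: keep 128 of 576 patch tokens, scored here by feature norm
# (a stand-in for an attention-based importance score).
tokens = torch.randn(2, 576, 1024)
scores = tokens.norm(dim=-1)
pruned = prune_visual_tokens(tokens, scores, budget=128)
print(pruned.shape)  # torch.Size([2, 128, 1024])
```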
[CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".
A vision-language model project focused on testing different techniques for parsing model-generated responses.
NoteMR enhances multimodal large language models for visual question answering by integrating structured notes. This implementation aims to reduce reasoning errors and improve visual feature perception. 🐙📚