EVE Series: Encoder-Free Vision-Language Models from BAAI
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
This repo is a live list of papers on game playing with large multimodal models, accompanying the survey "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
An up-to-date curated list of state-of-the-art research papers and resources on hallucinations in large vision-language models.
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, developed for high-resolution remote sensing image analysis with advanced multi-target pixel grounding capabilities.
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
[ICLR 2024 Spotlight 🔥] [Best Paper Award, SoCal NLP 2023 🏆] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
[CVPR 2025] Official PyTorch codebase for the paper "Assessing and Learning Alignment of Unimodal Vision and Language Models"
[ICML 2024] Official code repo for the paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
This is an official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
Awesome Vision-Language Compositionality, a comprehensive curation of research papers from the literature.
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
This is a curated list of "Continual Learning with Pretrained Models" research.
This is an official implementation of our work, Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models, accepted to ECCV'24