A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Overview of Japanese LLMs (日本語LLMまとめ)
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
This is the official implementation of the paper "Needle In A Multimodal Haystack"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multimodal chat model approaching GPT-4V performance.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
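A minimal sketch of what a local MLX-VLM run might look like, assuming the package exposes the `load`/`generate` helpers shown in its README; the exact signature varies across versions, and the model id below is illustrative, not guaranteed to exist:

```python
# Sketch: running a vision LLM locally on Apple silicon with MLX-VLM.
# Assumptions: `pip install mlx-vlm`, the load()/generate() API from the
# project README, and an illustrative quantized model id. Check the repo
# for the current interface before relying on this.
from mlx_vlm import load, generate

model, processor = load("mlx-community/llava-1.5-7b-4bit")  # hypothetical model id
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="photo.jpg",  # local image path (placeholder)
    max_tokens=128,
)
print(output)
```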
Famous Vision Language Models and Their Architectures
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Evaluating text-to-image/video/3D models with VQAScore
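A minimal sketch of scoring a generated image against its prompt with VQAScore, assuming the `t2v_metrics` package from the linked repo and the `VQAScore` class and model tag described in its README; file names are placeholders:

```python
# Sketch: VQAScore-based text-to-image evaluation.
# Assumptions: `pip install t2v-metrics` and that the package exposes
# VQAScore with a CLIP-FlanT5 backbone as in its README.
import t2v_metrics

scorer = t2v_metrics.VQAScore(model="clip-flant5-xxl")  # assumed model tag

# Higher score = the image is judged more likely to match the text.
scores = scorer(images=["generated.png"], texts=["a red cube on a blue sphere"])
print(scores)
```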
Docker image for LLaVA: Large Language and Vision Assistant
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
This repository contains the implementation for the paper "Revisiting Few-Shot Object Detection with Vision-Language Models"
Vision Language Model fine-tuning for TIL AI BrainHack - Advanced Track
An open-source implementation of LLaVA-NeXT.