# qwen2-vl

Here are 32 public repositories matching this topic...

modelscope / ms-swift

Use PEFT or full-parameter training to fine-tune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).

deploy llama lora embedding liger peft multimodal sft megatron distill rft llm internvl qwen2-vl qwen2-5 llama3-3 deepseek-r1 grpo open-r1

Updated Mar 21, 2025
Python

langmanus / langmanus

A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, crawling, and Python code execution, while giving back to the community that made this possible.

agent automation ai agi multi-agent agents multi-agent-systems llm langchain qwen deepseek langgraph qwen-vl qwen2-vl deepseek-r1 deep-research

Updated Mar 21, 2025
Python

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

transformers vqa objectdetection captioning fine-tuning multimodal vision-and-language phi-3-vision paligemma florence-2 qwen2-vl

Updated Mar 19, 2025
Python

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.

Updated Mar 21, 2025
Python

2U1 / Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

chatbot multimodal vision-language vision-language-model qwen2-vl qwen2-5

Updated Mar 16, 2025
Python

NetEase-Media / grps_trtllm

A higher-performance OpenAI LLM service than vLLM serve: a pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

openai multi-modal phi function-call qwq ai-agent llm llama-index chatglm internvideo tensorrt-llm qwen2 llama3 internvl2 qwen2-vl deepseek-r1 janus-pro olmocr

Updated Mar 20, 2025
Python

arcstep / illufly

✨🦋 illufly is a self-evolving Agent framework: create value quickly through self-evolution.

agent ai growth openai multiagent gpt rag llm longtext qwen qwen2 dashscope glm-4 zhipu qwen2-vl illufly

Updated Mar 20, 2025
Python

drive-bench / toolkit

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

autonomous-driving chatgpt vision-language-models phi-3 internvl qwen2-vl driving-with-language

Updated Feb 22, 2025
Python

soulteary / dify-with-qwen-vl

Video understanding: Qwen multimodal video model & Dify

dify qwen2 qwen2-vl

Updated Sep 2, 2024
Python

fireicewolf / wd-llm-caption-cli

A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.

image-caption wd14 llama3-vision florence-2 qwen2-vl joy-caption

Updated Mar 18, 2025
Python

shaadclt / Qwen2-VL-OCR-VQA

This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.

optical-character-recognition visual-question-answering qwen2-vl

Updated Oct 18, 2024
Jupyter Notebook

see2023 / autoXHS

An intelligent search assistant based on multimodal large models, using AI to enable smart information retrieval and knowledge integration on the Xiaohongshu platform.

spider selenium-webdriver xiaohongshu llm qwen2-vl

Updated Nov 6, 2024
Python

BUAADreamer / Qwen2-VL-History

A LLaMA-Factory fine-tuning case study for Qwen2-VL in the culture and tourism domain (historical literature and museums).

beauty museum history supervised-finetuning mllm multimodal-large-language-models llama-factory qwen2-vl

Updated Sep 17, 2024

ZachcZhang / Qwen2-VL-inference

An open-source server implementation for running inference with Qwen2-VL series models using FastAPI.

inference fastapi huggingface mllm qwen2-vl

Updated Nov 20, 2024
Python

Kazuhito00 / Qwen2-VL-Colaboratory-Sample

A sample for trying out QwenLM/Qwen2-VL on Colaboratory.

python vlm colaboratory qwen2-vl

Updated Sep 4, 2024
Jupyter Notebook

Valdanitooooo / chat_with_qwen2_vl_test

qwen2-vl

Updated Dec 27, 2024
Python

aws-samples / multi-modal-examples-for-amazon-sagemaker

A workshop collecting multi-modal LLM examples, samples, reference architectures, and demos on Amazon SageMaker.

sagemaker multi-modality sagemaker-example sagemaker-studio llm vllm video-llava internvl2 qwen2-vl

Updated Mar 16, 2025
Jupyter Notebook

Pavansomisetty21 / Qwen2-Vision-Finetuning-Unsloth---Maths-OCR-Formulae-Extraction-

We fine-tune an Unsloth Llama model to extract mathematical formulas from images with optical character recognition (OCR).

ocr llama maths optical-character-recognition vlm ocr-recognition llm vision-language-model qwen2 unsloth qwen2-vl

Updated Jan 8, 2025
Jupyter Notebook

aws-samples / sample-for-multi-modal-document-to-json-with-sagemaker-ai

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.

swift aws llama idp document-processing fine-tuning multimodal sagemaker sft huggingface qwen2-vl

Updated Mar 19, 2025
Jupyter Notebook

PRITHIVSAKTHIUR / Multimodal-OCR

OCR Vision Language Model

ocr vlm vision-transformer multimodal-large-language-models qwen2-vl

Updated Mar 8, 2025
Python
