Stars
[CVPR'25] Official Implementation of MambaIC: State Space Models for High-Performance Learned Image Compression
[ICLR 2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
Multimodal Large Language Models for Code Generation under Multimodal Scenarios
Awesome-Long2short-on-LRMs is a collection of state-of-the-art long2short methods for large reasoning models. It contains papers, code, datasets, evaluations, and analyses.
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
Solve Visual Understanding with Reinforced VLMs
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Latest Advances on System-2 Reasoning
Official code repo for the paper "ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving" (previously "Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving")
[ICLR 2025] Official Implementation of Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection
This repository collects awesome survey, resource, and paper for lifelong learning LLM agents
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you, the human researcher, in implementing your research ideas.
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
Learning to Use Medical Tools with Multi-modal Agent
The official implementation of S2TD-Face in ACM-MM 2024.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation
Reading list for research topics in multimodal machine learning
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.