Stars
Model merging is a highly efficient approach for long-to-short reasoning.
This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.
Understanding R1-Zero-Like Training: A Critical Perspective
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
Awesome-LLM: a curated list of Large Language Models
Solve Visual Understanding with Reinforced VLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Align Anything: Training All-modality Model with Feedback
FineReason: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
Latest Advances on System-2 Reasoning
Fully open reproduction of DeepSeek-R1
Efficient Triton implementation of Native Sparse Attention.
A curated list of awesome UI agent resources, encompassing Web, App, OS, and beyond (continually updated)
Official Repo for Open-Reasoner-Zero
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
🧑🚀 Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, MCP, small language models, vision-language models)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
A suite of image and video neural tokenizers
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
A fork to add multimodal model training to open-r1
Witness the aha moment of VLM with less than $3.
Frontier Multimodal Foundation Models for Image and Video Understanding