Stars
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
Frontier Multimodal Foundation Models for Image and Video Understanding
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
🙃 A delightful community-driven (with 2,400+ contributors) framework for managing your zsh configuration. Includes 300+ optional plugins (rails, git, macOS, hub, docker, homebrew, node, php, python…
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
Solve Visual Understanding with Reinforced VLMs
Embodied Reasoning Question Answer (ERQA) Benchmark
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 million real robot episodes.
⚡ Dynamically generated stats for your GitHub READMEs
Scripts for converting OpenX(rlds) dataset to LeRobot dataset.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Janus-Series: Unified Multimodal Understanding and Generation Models
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Fully open reproduction of DeepSeek-R1
A generative world for general-purpose robotics & embodied AI learning.