Embodied-AI-Guide: a beginner's guide to embodied AI

3,786 226 Updated Mar 21, 2025

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

Python 244 9 Updated Mar 26, 2025

Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)

Python 606 169 Updated Feb 17, 2025

The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"

Python 473 26 Updated Mar 26, 2025

Curated list of recent visual autoregressive (VAR) modeling works

29 Updated Mar 17, 2025
Python 313 15 Updated Jan 24, 2025

Latest Advances on System-2 Reasoning

Python 860 33 Updated Mar 27, 2025

🧑‍🚀 Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, MCP, small language models, vision-language models).

4,509 470 Updated Mar 28, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,399 1,442 Updated Mar 10, 2025

Solve Visual Understanding with Reinforced VLMs

Python 4,395 271 Updated Mar 24, 2025

[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs

Shell 48 1 Updated Feb 27, 2025

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Python 124 8 Updated Mar 23, 2025

DeepTimber Robotics Talent Call | DeepTimber community embodied AI talent board | A list of Embodied AI / Robotics jobs (PhD, RA, intern, full-time, etc.)

500 10 Updated Mar 28, 2025

Witness the aha moment of VLM with less than $3.

Python 3,422 267 Updated Mar 1, 2025

ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation

Python 41 1 Updated Mar 26, 2025

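"A Guide to Using Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment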

Jupyter Notebook 14,205 1,624 Updated Mar 22, 2025

🦄 Record your terminal and generate animated gif images or share a web player

JavaScript 15,569 508 Updated Aug 29, 2024

Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources

1,553 93 Updated Mar 18, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,924 2,212 Updated Feb 1, 2025

Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence

Python 781 46 Updated Jan 31, 2025

A hot-tempered AI that unexpectedly tells the truth after DPO fine-tuning on Qwen-2.5-1.5B

Jupyter Notebook 42 3 Updated Jan 18, 2025

[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

403 9 Updated Jan 17, 2025

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,962 112 Updated Jul 29, 2024

Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step

Python 121 7 Updated Feb 17, 2025

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,834 502 Updated Mar 27, 2025

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 5,056 451 Updated Jan 22, 2025

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 28,146 3,511 Updated Jul 23, 2024

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Python 1,375 63 Updated Mar 13, 2025