Stars
moojink / openvla-oft
Forked from openvla/openvla
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)
The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"
Curated list of recent visual autoregressive (VAR) modeling works
Latest Advances on System-2 Reasoning
🧑‍🚀 Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, MCP, small language models, vision-language models).
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Solve Visual Understanding with Reinforced VLMs
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
DeepTimber Robotics Talent Call | DeepTimber Community Embodied AI Talent Board | A list of Embodied AI / Robotics jobs (PhD, RA, intern, full-time, etc.)
Witness the aha moment of VLM with less than $3.
ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation
"Open-Source LLM Usage Guide": a tutorial, tailored for Chinese beginners, on quickly fine-tuning (full-parameter / LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment
🦄 Record your terminal and generate animated GIF images or share a web player
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Janus-Series: Unified Multimodal Understanding and Generation Models
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
A grumpy AI that unexpectedly tells the truth after DPO fine-tuning on Qwen-2.5-1.5B
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering