The Open All-in-One Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra.
Updated Jun 29, 2025 - TypeScript
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.
An open-source, end-to-end, VLM-based GUI agent.
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use" (ACL 2025 Oral).
Official implementation of "GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents"
Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
Create your self-hosted, open-source Operator model.
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
Official repository of the paper "Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms"
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent hierarchically across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
Code, datasets, and model release for "TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials"