CAT automates computer interactions using AI vision and natural language processing. It has two parts: a GPU-powered server component (OmniParser) that handles AI processing, and a local component (LocalExecutor) that performs GUI automation.
CAT_Demo.mp4
The system requires two separate installations:
- Server setup: OmniParser/README.md
- Client setup: LocalExecutor/README.md
- Start the server:

      # On the GPU server
      cd OmniParser
      python CAT_server.py

- Start the client:

      # On your local machine
      cd LocalExecutor
      python CAT_local.py

Once both components are running, the system works in a loop:
- LocalExecutor captures screen information and sends it to OmniParser
- OmniParser processes the image using AI vision models
- Natural language instructions are interpreted into computer actions
- LocalExecutor executes the determined actions
- The process repeats for continuous automation
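
The loop described above can be sketched roughly as follows. This is a minimal illustration, not CAT's actual code: the `Action` type, the `run_loop` function, and its `capture`, `parse_and_plan`, and `execute` parameters are all hypothetical names, and the real transport between LocalExecutor and OmniParser is not shown. The callables stand in for the screen capture, the server-side vision and language step, and the local GUI execution.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical GUI action; CAT's real action format may differ."""
    kind: str      # e.g. "click", "type", or "done" (assumed vocabulary)
    x: int = 0
    y: int = 0
    text: str = ""


def run_loop(
    instruction: str,
    capture: Callable[[], bytes],
    parse_and_plan: Callable[[bytes, str], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat capture -> parse/plan -> execute until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                            # local: grab the screen
        action = parse_and_plan(screenshot, instruction)  # server: vision + language
        if action.kind == "done":                         # planner signals completion
            break
        execute(action)                                   # local: perform the action
        history.append(action)
    return history


def demo() -> List[Action]:
    """Drive the loop with stubs instead of a real screen or server."""
    scripted = iter([
        Action("click", 100, 200),
        Action("type", text="hello"),
        Action("done"),
    ])
    performed: List[Action] = []
    return run_loop(
        "Type hello into the search box",
        capture=lambda: b"fake-screenshot",
        parse_and_plan=lambda image, instr: next(scripted),
        execute=performed.append,
    )
```

Passing the three stages in as callables keeps the sketch independent of any particular screenshot library or network protocol, and the `max_steps` cap is an assumed safeguard so that a planner that never reports completion cannot run indefinitely.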
This project builds upon the OmniParser framework by Microsoft. We thank the original authors for their groundbreaking work in computer vision and automation.