Prism

Windows Desktop AI Agent powered by Google Gemini 3

Overview

Prism is a Windows desktop AI agent that can understand what is on your screen and safely automate real tasks across applications. It combines an Electron overlay UI with a FastAPI backend and a tool-execution loop powered by the Google Gemini 3 API.

Prism supports two high-level behaviors:

- chat, for direct questions and summaries
- task execution, for multi-step desktop automation with streaming progress, verification, and recovery

How-it-works

Prism is built as two cooperating services:

electron-app
- always-on-top desktop shell
- tray integration and global shortcut
- chat UI with task bubbles
- communicates with the backend over HTTP and server-sent events at http://127.0.0.1:8000

python-backend
- FastAPI service for intent routing, planning, execution, streaming updates, and session capture
- desktop automation using PyAutoGUI plus Windows-specific actions
- browser automation using keyboard and URL flows and Playwright actions
- visual memory capture and visual-dom scanning endpoints

high level execution loop

1. accept a user command in the UI
2. route it into a fast path or a strict task path
3. plan actions with Gemini and structured outputs
4. execute actions locally on the desktop or in the browser
5. stream task progress to the UI
6. verify outcomes and retry when needed
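
The loop above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not Prism's actual code: `plan`, `execute`, and `verify` are hypothetical callables standing in for the Gemini planner, the local action runner, and the outcome check, and `max_retries` is an assumed parameter.

```python
from typing import Callable, Iterator, List


def run_task(
    command: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], None],
    verify: Callable[[str], bool],
    max_retries: int = 2,
) -> Iterator[str]:
    """Yield progress messages while planning, executing, and verifying actions."""
    actions = plan(command)  # hypothetical planner (Gemini in Prism)
    yield f"planned {len(actions)} action(s)"
    for action in actions:
        # One initial attempt plus up to max_retries retries.
        for attempt in range(1, max_retries + 2):
            execute(action)
            if verify(action):
                yield f"done: {action}"
                break
            yield f"attempt {attempt} failed: {action}"
        else:
            # Every attempt failed verification, so the task stops here.
            yield f"failed: {action}"
            return
    yield "task complete"
```

Writing the loop as a generator means each progress message can be forwarded to the UI as soon as it is produced, which is the shape a streaming endpoint needs.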

Features

core-execution

- natural-language command execution via /execute and /execute-stream
- real-time streaming of task progress and reply chunks over server-sent events
- preset workflow execution via /execute-preset
- stop and reset controls via /stop and /reset
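
A client that reads /execute-stream has to split the server-sent-event stream into individual events. The sketch below shows one way to do that in Python, following the standard SSE framing (events end at a blank line, `data:` lines are joined with newlines). The function name `iter_sse_data` is hypothetical, and the README does not document the payload inside each event, so the example treats it as opaque text.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.
    # An event that is not closed by a blank line is dropped, as in the SSE spec.


events = list(iter_sse_data(["data: step 1\n", "\n", ": keep-alive\n", "data: a\n", "data: b\n", "\n"]))
print(events)
```

The input can be any iterable of decoded text lines, for example the lines of a streaming HTTP response body.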

fast-and-strict-intent-paths

- fast direct-chat path for question-style prompts
- fast app-open path for open / launch / start <app>
- strict browser-navigation path for open <browser> and go to <site>
- strict flow for an attached file: summarize it, then open a Google Doc using https://docs.new
- strict flow for sending an attached file through WhatsApp Web (https://web.whatsapp.com)
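
To make the idea of fast and strict paths concrete, here is a minimal routing sketch. Everything in it is an assumption for illustration: the regular expressions, the browser list, and the path names are hypothetical, and Prism's real router may use different rules.

```python
import re

# Hypothetical patterns; Prism's real routing rules may differ.
BROWSER_NAV = re.compile(
    r"^open\s+(?P<browser>chrome|edge|firefox)\s+and\s+go\s+to\s+(?P<site>\S+)$",
    re.IGNORECASE,
)
APP_OPEN = re.compile(r"^(?:open|launch|start)\s+(?P<app>.+)$", re.IGNORECASE)
QUESTION = re.compile(
    r"^(?:what|why|how|who|when|where|is|are|can|does)\b|\?$",
    re.IGNORECASE,
)


def route(command: str) -> str:
    """Pick a handling path for a command; the most specific pattern wins."""
    text = command.strip()
    if BROWSER_NAV.match(text):
        return "strict_browser_navigation"
    if APP_OPEN.match(text):
        return "fast_app_open"
    if QUESTION.search(text):
        return "fast_chat"
    return "task_execution"  # fall back to the full planning loop
```

The order of the checks matters: the browser pattern also begins with "open", so it has to be tested before the generic app-open pattern. A production router would need stricter patterns, since this one sends any command starting with "open" to the app-open path.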

desktop-automation-actions

- open and focus apps and windows
- click, type, scroll, drag, and drop
- tab and window navigation, back, forward, refresh, and address-bar navigation
- system actions, including the Windows Bluetooth toggle
- file attachment helpers
  - select a file in a dialog
  - copy a file to the clipboard and paste it

file-workflows

- attach files from the UI using the picker or drag and drop
- document-centric flows for reading, summarizing, and downstream actions
- attachment bubble lifecycle in the UI

visual-memory

- periodic screenshot session capture (default: every 15 s)
- OCR and redaction integration
- endpoints
  - /memory-status
  - /memory-toggle
  - /memory-clear
- auto-wipe by time window (default: 180 min)
- auto-reset when the storage cap is exceeded (default: 500 MB)
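
The two retention defaults above can be expressed as a small policy function. This is a sketch under assumptions, not the backend's implementation: the `Capture` type and `apply_retention` name are hypothetical, 1 MB is taken to be 1024 * 1024 bytes, and "auto-reset" is read as clearing the whole store once the cap is exceeded.

```python
from dataclasses import dataclass
from typing import List

MAX_AGE_SECONDS = 180 * 60       # default auto-wipe window (180 min)
MAX_BYTES = 500 * 1024 * 1024    # default storage cap (500 MB, assuming binary megabytes)


@dataclass
class Capture:
    timestamp: float  # seconds since the epoch
    size_bytes: int


def apply_retention(captures: List[Capture], now: float) -> List[Capture]:
    """Return the captures that survive the age window and the storage cap."""
    fresh = [c for c in captures if now - c.timestamp <= MAX_AGE_SECONDS]
    if sum(c.size_bytes for c in fresh) > MAX_BYTES:
        return []  # cap exceeded: reset the whole session store (assumed behavior)
    return fresh
```

Passing `now` in as a parameter, instead of calling the clock inside the function, keeps the policy easy to test.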

visual-understanding and visual-dom

- screen analysis and summarization actions
- visual-dom scan and query endpoints
  - /visual-dom/scan
  - /visual-dom/elements
  - /visual-dom/interactive
  - /visual-dom/forms
  - /visual-dom/find
  - /visual-dom/overlay

electron-ux

- global shortcut support (default: Alt+Space)
- settings panel and API key setup flow
- task and status UI with streaming updates
- automation overlay support through main-process IPC

Quickstart-windows

prerequisites

- Windows 10 or 11
- Python 3.11 (recommended)
- Node.js and npm
- Google Gemini API key from Google AI Studio

1 clone-the-repo

```powershell
git clone https://github.com/Manoj7ar/Prism.git
cd Prism
```

2 configure-env

create a file at python-backend/.env and add:

GEMINI_API_KEY=your_key_here
GEMINI_MODEL=gemini-3.0-flash
CORS_ORIGINS=http://127.0.0.1:3000
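
For orientation, the three variables could be read on the Python side roughly as follows. This is a hedged sketch, not the backend's actual loader: `Settings` and `load_settings` are hypothetical names, treating `CORS_ORIGINS` as a comma-separated list is an assumption, and the fallback model name simply repeats the value shown above.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str
    cors_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, rejecting a placeholder key."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key or api_key == "your_key_here":
        raise ValueError("GEMINI_API_KEY is missing or still a placeholder")
    # Assumption: several origins may be given, separated by commas.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    model = env.get("GEMINI_MODEL", "gemini-3.0-flash")
    return Settings(api_key=api_key, model=model, cors_origins=origins)
```

Failing early on a placeholder key gives a clearer error than a rejected API call later in a task.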

3 install-and-run-backend

open terminal 1:

cd python-backend
pip install -r requirements.txt
python -u main.py

confirm the backend is running:

http://127.0.0.1:8000/health
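
The same check can be scripted instead of opened in a browser. The sketch below uses only the Python standard library; the function name `backend_is_up` is hypothetical, and it assumes that a healthy backend answers /health with HTTP 200, which the README implies but does not state.

```python
import urllib.error
import urllib.request


def backend_is_up(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False
```

Calling `backend_is_up()` before starting the Electron UI is a quick way to confirm that terminal 1 is serving requests.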

4 install-and-run-electron-ui

open terminal 2:

cd electron-app
npm install
npx electron .

5 quick-test

in Prism, try:

- open chrome and go to https://example.com
- summarize what is on my screen
- attach a PDF and run: summarize this file in five bullets

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
