A real-time voice assistant that captures screen context and provides intelligent responses using OpenAI's Realtime API and Supabase for data management.
- Node.js (v18 or higher)
- Python (3.8 or higher)
- Git
- Environment file (`.env`, request from the project owner)
**macOS:**
- Homebrew (recommended for package management)
- Xcode Command Line Tools (install with `xcode-select --install`)
**Windows:**
- Visual Studio Build Tools or Visual Studio Community
- Windows PowerShell or Git Bash
```bash
git clone https://github.com/akeildev/seek-clarity.git
cd Clarity
```

**Important:** Request the `.env` file from the project owner. This file contains the necessary API keys and configuration settings, including:
- OpenAI API keys
- Supabase credentials
- Other service configurations
Once received, place the `.env` file in the root directory of the project.
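Once the Python dependencies from the installation step below are in place, a quick way to confirm the file is being picked up is to load it with python-dotenv. The key names in this sketch (`OPENAI_API_KEY`, `SUPABASE_URL`, `SUPABASE_ANON_KEY`) are assumptions for illustration only; check them against the keys actually listed in the `.env` you receive.

```python
# Sanity-check that the .env file in the project root is readable.
# NOTE: the key names below are assumptions for illustration; use the names
# defined in the .env file provided by the project owner.
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the current working directory

for key in ("OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_ANON_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```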
```bash
npm install
pip install -r requirements.txt
```

If `requirements.txt` doesn't exist, install these packages manually:

```bash
pip install openai python-dotenv pyaudio numpy websockets aiohttp
```
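To confirm the Python packages installed cleanly, a small import check like the one below is enough; note that python-dotenv is imported as `dotenv`.

```python
# Verify that each Python dependency can be imported after installation.
import importlib

# python-dotenv installs the importable module named "dotenv".
for module in ("openai", "dotenv", "pyaudio", "numpy", "websockets", "aiohttp"):
    try:
        importlib.import_module(module)
        print(f"{module}: OK")
    except ImportError as exc:
        print(f"{module}: FAILED ({exc})")
```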
**macOS:**

- **Grant Permissions:** The app requires screen recording and microphone permissions.
  - Go to System Settings > Privacy & Security
  - Enable the following permissions for your Terminal/IDE:
    - Screen Recording
    - Microphone
- **Install Audio Dependencies** (if needed):

  ```bash
  brew install portaudio
  ```
**Windows:**

- **Grant Permissions:**
  - Windows will prompt for microphone access on first run
  - Grant the necessary permissions when prompted
- **Install Audio Dependencies** (if needed):
  - Download and install a pre-built PyAudio wheel
  - Or use:

    ```bash
    pip install pipwin && pipwin install pyaudio
    ```
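On either platform, once PyAudio installs, listing the available input devices is a quick way to confirm that PortAudio can see your microphone. This is a generic PyAudio sketch, not part of the Clarity codebase.

```python
# List audio input devices to confirm PyAudio/PortAudio can see a microphone.
import pyaudio

pa = pyaudio.PyAudio()
try:
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info.get("maxInputChannels", 0) > 0:
            print(f"[{i}] {info['name']} ({int(info['maxInputChannels'])} input channels)")
finally:
    pa.terminate()
```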
Start the Electron app and the Python voice agent:

```bash
npm start
python src/voice_agent.py
```

Project layout:

```
Clarity/
├── src/
│   ├── main/                  # Electron main process
│   │   └── services/          # Core services
│   │       ├── capture.js     # Desktop capture service
│   │       ├── websocket.js   # Screenshot WebSocket server
│   │       └── livekit.js     # Voice streaming service
│   ├── renderer/              # Electron renderer process
│   ├── preload/               # Electron preload scripts
│   ├── agent/                 # Python voice agent
│   │   └── voice_agent.py     # OpenAI Realtime integration
│   ├── mcp/                   # MCP server
│   │   ├── server.js          # STDIO MCP server
│   │   ├── tools.js           # Tool handlers
│   │   └── database.js        # Supabase integration
│   └── automation/            # Automation modules
│       └── screenshot.js      # Screenshot automation
├── .env                       # Environment variables (get from owner)
├── package.json               # Node.js dependencies
└── requirements.txt           # Python dependencies
```
- Screen Capture: Desktop capture service with OpenAI Vision API analysis
- Voice Interaction: Real-time voice conversations using OpenAI Realtime API
- WebSocket Server: Real-time screenshot bridge on port 8765 (a client sketch follows this list)
- MCP Server Integration: STDIO-based MCP server for Supabase database operations
- Automation: Screenshot automation with voice command support
- Cross-platform: Works on both macOS and Windows
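The screenshot bridge on port 8765 is served by `src/main/services/websocket.js`. The minimal client below is a sketch that assumes the server accepts plain WebSocket connections at `ws://localhost:8765` and pushes screenshot payloads to connected clients; treat the URL and message format as assumptions rather than a documented protocol.

```python
# Minimal sketch of a client for the screenshot WebSocket bridge (port 8765).
# The URL and message format are assumptions; the actual protocol is whatever
# src/main/services/websocket.js implements.
import asyncio
import websockets

async def listen() -> None:
    async with websockets.connect("ws://localhost:8765") as ws:
        async for message in ws:
            # Payloads are assumed to carry screenshot data (JSON or binary).
            print(f"received message of length {len(message)}")

if __name__ == "__main__":
    asyncio.run(listen())
```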
- "Permission denied" for screen recording: Restart the app after granting permissions in System Settings
- PyAudio installation fails: Install Xcode Command Line Tools and portaudio first
- PyAudio installation fails: Use the pre-built wheel file or pipwin
- Microphone not detected: Check Windows Sound Settings and ensure default input device is set
- Missing .env file: Contact the project owner for the environment configuration
- Port already in use: Check if another instance is running or change the port in the configuration
- API errors: Verify your API keys in the .env file are valid and have proper permissions
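For the "port already in use" case, a quick socket probe (assuming the default bridge port of 8765) shows whether something is already listening:

```python
# Probe whether the screenshot bridge port (8765 by default) is already in use.
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print("port 8765 in use:", port_in_use(8765))
```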
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
For issues or questions:
- Check the Issues page
- Contact the project owner for .env file and API access
MIT