zer0 is an AI-powered browser extension that acts as a bridge between your thoughts and actions. Using advanced language models (OpenAI/Azure OpenAI), zer0 can understand natural language instructions and autonomously perform tasks in your browser.
- AI-Powered Automation: Execute complex browser tasks using natural language commands
- Intelligent DOM Interaction: Automatically clicks, fills forms, and navigates websites
- Task History Tracking: Monitor and review all actions performed by the agent
- Voice Input Support: Control the extension using voice commands
- Multi-Model Support: Compatible with OpenAI and Azure OpenAI services
- Privacy-Focused: All processing happens locally in your browser
- Frontend: React 18, TypeScript, Chakra UI
- State Management: Zustand with Immer
- AI Integration: OpenAI API, Azure OpenAI
- Build Tools: Webpack, Babel
- Testing: Jest
- Extension Platform: Chrome Extension Manifest V3
browser-extension-master/
βββ src/
β βββ common/ # Reusable React components
β β βββ App.tsx
β β βββ TaskUI.tsx
β β βββ TaskHistory.tsx
β β βββ ModelDropdown.tsx
β βββ helpers/ # Core logic and utilities
β β βββ availableActions.ts # Defines browser actions
β β βββ determineNextAction.ts # AI decision logic
β β βββ domActions.ts # DOM manipulation
β β βββ parseResponse.ts # AI response parsing
β β βββ simplifyDom.ts # DOM simplification
β βββ pages/ # Extension pages
β β βββ Background/ # Service worker
β β βββ Content/ # Content script
β β βββ Popup/ # Extension popup
β β βββ Options/ # Settings page
β β βββ Panel/ # DevTools panel
β βββ state/ # Zustand state stores
β β βββ currentTask.ts
β β βββ settings.ts
β β βββ ui.ts
β βββ manifest.json # Extension manifest
βββ utils/ # Build scripts
βββ package.json
βββ webpack.config.js
- Led the architecture and development of the AI agent system
- Implemented OpenAI/Azure OpenAI integration
- Designed the task execution pipeline
- Developed the DOM interaction and manipulation system
- Built the content script injection mechanism
- Optimized the browser automation engine
- Implemented testing framework and wrote unit tests
- Focused on debugging and error handling
- Enhanced the deployment and build pipeline
- Created the UI components and user interface
- Developed the responsive layout using Chakra UI
- Implemented the task history and status displays
- Node.js (v14 or higher)
- pnpm (recommended) or npm
- A Chromium-based browser (Chrome, Edge, Brave, etc.)
- OpenAI API key or Azure OpenAI credentials
-
Clone the repository
git clone https://github.com/avnishsinha/Zero-.git cd Zero-/browser-extension-master -
Install dependencies
pnpm install # or npm install -
Build the extension
pnpm build # or npm run build -
Load the extension in Chrome
- Open Chrome and navigate to
chrome://extensions/ - Enable "Developer mode" (toggle in top-right corner)
- Click "Load unpacked"
- Select the
buildfolder from the project directory
- Open Chrome and navigate to
-
Configure API credentials
- Click the extension icon in your browser toolbar
- Go to Options/Settings
- Enter your OpenAI API key or Azure OpenAI credentials
- Select your preferred model
-
Activate the extension
- Use the keyboard shortcut:
Ctrl+Shift+Y(Windows/Linux) orCmd+Shift+Y(Mac) - Or click the extension icon in your toolbar
- Use the keyboard shortcut:
-
Enter a task
- Type a natural language instruction (e.g., "Fill out this form with my information")
- Or use voice input to describe the task
-
Watch zer0 work
- The AI agent will analyze the page
- Execute actions step-by-step
- Display progress in the task history
-
Review results
- Check the task history to see all actions performed
- Review any errors or completion messages
- "Click the login button"
- "Fill in the email field with example@email.com"
- "Navigate to the pricing page"
- "Search for AI tools on this website"
pnpm start
# or
npm startpnpm test
# or
npm testpnpm prettier
# or
npm run prettierpnpm lint
# or
npm run lintThe AI agent can perform the following actions on web pages:
- click: Click on any element
- setValue: Fill input fields with text
- finish: Mark the task as complete
- fail: Indicate inability to complete the task
More actions can be added by extending the availableActions array in the codebase.
- Add more browser actions (hover, scroll, drag-and-drop)
- Support for Firefox and Safari
- Enhanced error recovery and retry logic
- Local model support (Ollama, LM Studio)
- Multi-step task planning and execution
- Screenshot and vision capabilities
- Task templates and saved workflows
- Collaborative task sharing
- Data Privacy: All AI processing happens through your API keys; we don't store or access your data
- API Keys: Your API keys are stored locally in the browser's secure storage
- Permissions: The extension requires broad permissions to interact with web pages as requested
- Review Actions: Always review the task history to understand what actions were performed
Contributions are welcome! Please feel free to submit issues or pull requests.
- Fork the repository
- Create a feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
For inquiries or collaboration:
- Email: aks526@nau.edu
- LinkedIn: https://www.linkedin.com/in/avnishkumarsinha/
This project is licensed under the MIT License.
- OpenAI and Azure OpenAI for providing the AI capabilities
- The Chrome Extensions community for excellent documentation
- All contributors and testers who helped improve zer0
zer0 - Bridging the gap between your thoughts and browser actions π