WebWand is a tool that redefines web interaction, making complex online tasks as simple as uttering a single command.
Built on a multi-modal Large Language Model (GPT-4V), WebWand acts as a Web AI Partner. Imagine an intelligent companion that not only grasps your intent but also has a broad awareness of website content, so it can autonomously execute tasks on your behalf and augment your workflow. With WebWand, this vision becomes reality.
WebWand combines a multi-modal Large Language Model, DOM state awareness, and semantic understanding of HTML to focus on essential webpage elements while filtering out noise.
Here is an example of WebWand annotating the website to better understand the environment.
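To make the filtering and annotation idea concrete, here is a minimal TypeScript sketch. It is not WebWand's actual implementation: the `PageElement` shape, the tag and role lists, and the function names are all assumptions chosen for illustration. It keeps only visible, interactive elements and gives each a numeric label that a model could refer to.

```typescript
// Hypothetical, simplified record for a page element (not WebWand's real data model).
interface PageElement {
  tag: string;
  text: string;
  visible: boolean;
  role?: string; // ARIA role, if any
}

interface LabeledElement extends PageElement {
  label: number; // 1-based label an annotation overlay could display
}

// Assumed lists of what counts as "interactive"; a real agent would likely use more signals.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);
const INTERACTIVE_ROLES = new Set(["button", "link", "checkbox", "menuitem", "tab"]);

function isInteractive(el: PageElement): boolean {
  const tag = el.tag.toLowerCase();
  return (
    INTERACTIVE_TAGS.has(tag) ||
    (el.role !== undefined && INTERACTIVE_ROLES.has(el.role))
  );
}

// Drop hidden and non-interactive elements, then number the rest in document order.
function labelInteractiveElements(elements: PageElement[]): LabeledElement[] {
  return elements
    .filter((el) => el.visible && isInteractive(el))
    .map((el, index) => ({ ...el, label: index + 1 }));
}

const samplePage: PageElement[] = [
  { tag: "div", text: "Promotional banner", visible: true },
  { tag: "button", text: "Add to cart", visible: true },
  { tag: "a", text: "Hidden tracking link", visible: false },
  { tag: "span", text: "Open menu", visible: true, role: "menuitem" },
];

for (const el of labelInteractiveElements(samplePage)) {
  console.log(`[${el.label}] <${el.tag}> ${el.text}`);
}
```

In this sketch the banner and the hidden link are treated as noise, so only the button and the menu item receive labels.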
It also features a unique "Prior Knowledge Augmentation" system that allows WebWand to navigate websites with the wisdom of collective past experiences, crowdsourced from WebWand users.
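One way to picture prior knowledge augmentation is as a lookup of site-specific notes that are appended to the task before it is sent to the model. The TypeScript sketch below is only an illustration under that assumption: the `KnowledgeEntry` shape, the hostname matching rule, and the prompt layout are hypothetical and are not taken from WebWand's code.

```typescript
// Hypothetical shape for one piece of crowdsourced knowledge.
interface KnowledgeEntry {
  host: string; // e.g. "example.com"; also applied to its subdomains in this sketch
  notes: string[];
}

// Collect the notes of every entry whose host matches the current page's hostname.
function findNotes(hostname: string, entries: KnowledgeEntry[]): string[] {
  const notes: string[] = [];
  for (const entry of entries) {
    const matches =
      hostname === entry.host || hostname.endsWith("." + entry.host);
    if (matches) {
      notes.push(...entry.notes);
    }
  }
  return notes;
}

// Append the notes to the user's task; with no notes, the task is returned unchanged.
function augmentTask(task: string, notes: string[]): string {
  if (notes.length === 0) {
    return task;
  }
  const bullets = notes.map((note) => "- " + note).join("\n");
  return task + "\n\nNotes from past experience on this site:\n" + bullets;
}

const knowledgeBase: KnowledgeEntry[] = [
  { host: "example.com", notes: ["The search box is behind the magnifier icon."] },
];

console.log(
  augmentTask("Find a blue jacket", findNotes("shop.example.com", knowledgeBase)),
);
```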
We do NOT collect your screenshots, browsing information, or prompts. They stay in your browser and are sent directly to the LLM API of your choice.
- Go to the releases page, find the latest version of the extension and download "webwand.zip".
- Unzip the file.
- Load your extension on Chrome by doing the following:
  - Navigate to `chrome://extensions/`
  - Toggle `Developer mode`
  - Click on `Load unpacked extension`
  - Select the unzipped folder
Please note that you might need to refresh the page for the extension to work.
- Find the WebWand icon in the top right corner of your browser and click on it to open the sidepanel.
- Next, create an OpenAI API key, or use an existing one, and paste it into the provided box. The key is stored securely in your browser and is not uploaded to a third party.
- Finally, navigate to a webpage you want WebWand to work on and type in the task you want it to perform.
If you want to build the extension from source, follow these instructions:
- Ensure you have Node.js installed. Development was done on Node v20, but some lower versions should also work.
- Clone this repository
- Install `pnpm` globally: `npm install -g pnpm`
- Run `pnpm install`
- Run `pnpm dev` to start the development server, or `pnpm build` to build the extension.
When loading the extension, you will need to load the `dist` folder created by the build process.
- Expose API for easy integration with browser automation frameworks (e.g. Puppeteer, Playwright, Selenium)
- Evaluate the performance of WebWand in real-world scenarios
- Add support for more complex & cross-tab workflows
- Add support for more AI Models
- Add support for more browsing behaviors (select from dropdown, extract text etc.)
- Add support for saving workflows
- Add support for sharing workflows & knowledge with others
- Create a Wikipedia-like knowledge base where users can work together to create knowledge that can improve WebWand's performance
Check out our Troubleshooting Guide for help with common problems.
Interested in contributing to WebWand? We'd love your help! Check out our Contribution Guide for guidelines on how to contribute, report bugs, suggest enhancements, and more.
We also encourage everyone in the community to add new knowledge to the "Prior Knowledge Augmentation" system to make WebWand even smarter. For detailed instructions on what kind of knowledge we're looking for and how to test and submit it, please see our Contributing Knowledge Guide.
- WebWand's image annotation method was inspired by Microsoft's UFO paper.
- The idea of a web agent that lives in the browser sidepanel was inspired by TaxyAI's browser extension. We also used some of its UI code.
- The Chrome extension setup is based on an awesome boilerplate project, Jonghakseo/chrome-extension-boilerplate-react-vite.
- The Fuji logo is from the Toss Face Emoji design set.