
WebWand: Vision based Full Browser Automation 🪄


WebWand is a tool that redefines web interaction, making complex online tasks as simple as uttering a single command.

Built on multi-modal Large Language Models (GPT-4V), WebWand acts as a web AI partner. Imagine an intelligent companion that not only grasps your intent but also understands the content of the website you are on, so it can autonomously execute tasks on your behalf and augment your workflow. With WebWand, this vision becomes reality.

How does it work?

WebWand combines a multi-modal Large Language Model with awareness of the DOM state and a semantic understanding of HTML, which lets it focus on the essential elements of a webpage while filtering out noise.
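To make "focusing on essential elements while filtering out noise" concrete, here is a minimal sketch. It is not WebWand's implementation: it assumes a hypothetical, simplified node type (`SimpleNode`) instead of the real browser DOM, and an assumed list of interactive tags. It walks the tree depth-first, skips hidden subtrees, and keeps only elements a user could interact with.

```typescript
// Hypothetical, simplified DOM node (not a real browser type).
interface SimpleNode {
  tag: string;
  attributes: Record<string, string>;
  text: string;
  visible: boolean;
  children: SimpleNode[];
}

// Tags treated as interactive in this sketch (an assumption, not WebWand's list).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

function isInteractive(node: SimpleNode): boolean {
  return (
    INTERACTIVE_TAGS.has(node.tag) ||
    node.attributes["role"] === "button" ||
    "onclick" in node.attributes
  );
}

// Depth-first walk that keeps visible, interactive nodes and drops
// layout wrappers, decorative elements, and hidden subtrees as "noise".
function extractEssential(root: SimpleNode): SimpleNode[] {
  const kept: SimpleNode[] = [];
  const visit = (node: SimpleNode): void => {
    if (!node.visible) return; // a hidden parent hides all of its children
    if (isInteractive(node)) kept.push(node);
    node.children.forEach(visit);
  };
  visit(root);
  return kept;
}

// Usage: only the visible button survives the filter.
const page: SimpleNode = {
  tag: "div", attributes: {}, text: "", visible: true,
  children: [
    { tag: "button", attributes: {}, text: "Search", visible: true, children: [] },
    { tag: "span", attributes: {}, text: "decoration", visible: true, children: [] },
    { tag: "a", attributes: { href: "/hidden" }, text: "Hidden", visible: false, children: [] },
  ],
};
console.log(extractEssential(page).map((n) => n.text));
```

Reducing the page to a short list of candidate elements keeps the prompt small, which is the general motivation for this kind of filtering.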

WebWand also annotates the webpage it sees to better understand its environment.
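The annotation step can be illustrated with a labeling sketch. This is an assumption-driven illustration, not WebWand's code: it supposes each candidate element has a bounding box (the hypothetical `ElementBox` type), assigns numeric labels in reading order, and produces a text legend that could accompany a labeled screenshot so the model can refer to "element 2" and the agent can click that element's center. Drawing the labels onto the image is omitted.

```typescript
// Hypothetical element record: a bounding box plus a short description.
interface ElementBox {
  x: number;
  y: number;
  width: number;
  height: number;
  description: string;
}

interface Annotation {
  label: number;
  centerX: number; // where a click on this element would land
  centerY: number;
  description: string;
}

// Assign numeric labels in reading order: top-to-bottom, then left-to-right.
function annotate(boxes: ElementBox[]): Annotation[] {
  const sorted = [...boxes].sort((a, b) => a.y - b.y || a.x - b.x);
  return sorted.map((box, i) => ({
    label: i + 1,
    centerX: box.x + box.width / 2,
    centerY: box.y + box.height / 2,
    description: box.description,
  }));
}

// Text legend that could be sent alongside the labeled screenshot.
function legend(items: Annotation[]): string {
  return items.map((a) => `[${a.label}] ${a.description}`).join("\n");
}

const annotations = annotate([
  { x: 300, y: 20, width: 80, height: 30, description: "button: Sign in" },
  { x: 10, y: 20, width: 200, height: 30, description: "input: Search" },
  { x: 10, y: 200, width: 120, height: 40, description: "link: Docs" },
]);
console.log(legend(annotations));
// [1] input: Search
// [2] button: Sign in
// [3] link: Docs
```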

It also features a unique "Prior Knowledge Augmentation" system that allows WebWand to navigate websites with the wisdom of collective past experiences, crowdsourced from WebWand users.
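One way to picture Prior Knowledge Augmentation is a lookup keyed by the current site, whose results are added to the model's instructions. The sketch below is hypothetical: the `KnowledgeEntry` schema, the example hosts, and the prompt wording are all assumptions, not WebWand's actual knowledge format.

```typescript
// Hypothetical knowledge entry format (not WebWand's actual schema).
interface KnowledgeEntry {
  host: string; // e.g. "www.example.com"
  notes: string[];
}

// Illustrative, made-up entries.
const knowledgeBase: KnowledgeEntry[] = [
  { host: "www.example.com", notes: ["The search box only submits on Enter."] },
  { host: "docs.example.org", notes: ["Use the sidebar, not the top menu, to switch versions."] },
];

// Collect every note recorded for the host of the current page.
function lookupKnowledge(pageUrl: string, kb: KnowledgeEntry[]): string[] {
  const host = new URL(pageUrl).hostname;
  const notes: string[] = [];
  for (const entry of kb) {
    if (entry.host === host) notes.push(...entry.notes);
  }
  return notes;
}

// Append any matching notes to the base instructions; unchanged if none match.
function buildSystemPrompt(basePrompt: string, pageUrl: string, kb: KnowledgeEntry[]): string {
  const notes = lookupKnowledge(pageUrl, kb);
  if (notes.length === 0) return basePrompt;
  return `${basePrompt}\n\nKnown tips for this site:\n${notes.map((n) => `- ${n}`).join("\n")}`;
}

console.log(buildSystemPrompt("You are a web agent.", "https://www.example.com/search?q=shoes", knowledgeBase));
```

Keying by hostname is only the simplest choice; a real system might match URL paths or patterns instead.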

We do NOT collect your screenshots, browsing information, or prompts. WebWand lives in your browser, and this data is sent directly to the LLM API of your choice.
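To illustrate what "sent directly to the LLM API" means, this sketch builds (but does not send) a request in the shape documented for OpenAI's Chat Completions endpoint with an image content part. It is not WebWand's code; the `ChatRequest` type and the helper name are hypothetical, and the model name is an assumption you should replace with the one you use. The only network destination is the provider's endpoint, authenticated with your own key.

```typescript
// Hypothetical container for a ready-to-send request.
interface ChatRequest {
  endpoint: string;
  headers: Record<string, string>;
  body: string;
}

// Build a Chat Completions request pairing the task text with a PNG screenshot.
function buildVisionRequest(apiKey: string, task: string, screenshotBase64: string): ChatRequest {
  const payload = {
    model: "gpt-4-vision-preview", // assumed model name; substitute your own
    max_tokens: 1000,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: task },
          { type: "image_url", image_url: { url: `data:image/png;base64,${screenshotBase64}` } },
        ],
      },
    ],
  };
  return {
    endpoint: "https://api.openai.com/v1/chat/completions",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(payload),
  };
}

const request = buildVisionRequest("sk-test", "Open the pricing page", "AAAA");
console.log(request.endpoint);
```

In a browser extension, such a request could then be sent with `fetch(request.endpoint, { method: "POST", headers: request.headers, body: request.body })`.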

Installing and Running

Download and install the extension in your browser

1. Go to the releases page, find the latest version of the extension, and download "webwand.zip".
2. Unzip the file.
3. Load the extension in Chrome:
   1. Navigate to chrome://extensions/
   2. Turn on Developer mode
   3. Click "Load unpacked"
   4. Select the unzipped folder

Use the extension

Please note that you might need to refresh the page for the extension to work.

Find the WebWand icon in the top right corner of your browser and click it to open the side panel.
Next, create an OpenAI API key (or use an existing one) and paste it into the provided box. The key is stored securely in your browser and is not uploaded to any third party.
Finally, navigate to the webpage you want WebWand to work on and type in the task you want it to perform.

Build the extension

If you want to build the extension from source, follow these instructions:

1. Make sure Node.js is installed. Development was done on Node v20, but some lower versions may also work.
2. Clone this repository.
3. Install pnpm globally: npm install -g pnpm
4. Run pnpm install
5. Run pnpm dev to start the development server, or pnpm build to build the extension.

When loading an extension built from source, select the dist folder created by the build process instead of the unzipped release folder.

Roadmap

Expose API for easy integration with browser automation frameworks (e.g. Puppeteer, Playwright, Selenium)
Evaluate WebWand's performance in real-world scenarios
Add support for more complex & cross-tab workflows
Add support for more AI Models
Add support for more browsing behaviors (e.g. selecting from dropdowns, extracting text)
Add support for saving workflows
Add support for sharing workflows & knowledge with others
Create a Wikipedia-like knowledge base where users can work together on knowledge that improves WebWand's performance

Troubleshooting

Check out our Troubleshooting Guide for help with common problems.

Contributing

Interested in contributing to WebWand? We'd love your help! Check out our Contribution Guide for guidelines on how to contribute, report bugs, suggest enhancements, and more.

We also encourage everyone in the community to add new knowledge to the "Prior Knowledge Augmentation" system to make WebWand even smarter. For detailed instructions on what kind of knowledge we're looking for and how to test and submit it, please see our Contributing Knowledge Guide.

Credits

WebWand's image annotation method was inspired by Microsoft's UFO paper.
The idea of a web agent that lives in the browser side panel was inspired by TaxyAI's browser extension. We also used some of its UI code.
The Chrome extension setup is based on the excellent boilerplate project Jonghakseo/chrome-extension-boilerplate-react-vite.
The Fuji logo is from Toss Face Emoji design set.

normal-computing/fuji-web