ahadjawaid/webagent

Vimium is a Chrome extension that lets you navigate the web with only your keyboard.

Setup

Create environment with conda

conda create --name webagent

or

Create environment with venv

python3 -m venv venv

Install Python requirements

pip install -r requirements.txt

Download Vimium locally (the extension has to be loaded manually when running Playwright)

./setup.sh
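Loading the extension manually could look roughly like the sketch below. It is not taken from this repo: the `vimium` folder name is an assumption about where `setup.sh` puts the extension, and `extension_args` and `launch_with_vimium` are hypothetical helper names. It relies on Playwright's persistent context, since Chromium extensions only load in a headed persistent profile.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: setup.sh unpacks Vimium into ./vimium; adjust to the real location.
VIMIUM_DIR = Path("vimium")


def extension_args(extension_dir: Path) -> list[str]:
    """Build the Chromium flags that load one unpacked extension."""
    path = str(extension_dir.resolve())
    return [
        f"--disable-extensions-except={path}",
        f"--load-extension={path}",
    ]


def launch_with_vimium(user_data_dir: str = "user_data"):
    """Start Chromium with Vimium loaded (hypothetical helper)."""
    # Imported here so the flag builder above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    playwright = sync_playwright().start()
    context = playwright.chromium.launch_persistent_context(
        user_data_dir,
        headless=False,  # extensions need a headed browser
        args=extension_args(VIMIUM_DIR),
    )
    return playwright, context
```

The caller is responsible for calling `context.close()` and `playwright.stop()` when done.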

Other setup

A Hugging Face access token is needed to access Hugging Face pretrained models

Hugging Face guide: https://huggingface.co/docs/transformers.js/guides/private (tokens are created at https://huggingface.co/settings/tokens)

Install PyTorch

Environment variables

This is needed for the GPT Vision API and the TTS API (remove it if those are no longer used)

OPENAI_API_KEY=
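A small check like the one below can fail early when the key is missing, instead of failing at the first API call. This is only a sketch; `require_env` is a hypothetical helper, not something defined in this repo.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running")
    return value
```

For example, `api_key = require_env("OPENAI_API_KEY")` at startup.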

For PyAudio you may need to install PortAudio first (shown here with Homebrew)

brew install portaudio

To use the LLaVA model

  • Clone llama.cpp and run make in its root directory.
  • Download the LLaVA vision model. I'm using the quantized 4-bit version of the 7B-parameter model. Download it here. I saved the ggml-model-q4_k.gguf and mmproj-model-f16.gguf files in a models/llava folder inside the llama.cpp directory.
  • Run the llama.cpp server from the llama.cpp directory: ./server -m models/llava/ggml-model-q4_k.gguf --mmproj models/llava/mmproj-model-f16.gguf
  • Save a screenshot.png file in the root directory of this repo and run python3 llava.py to check that it's working.
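For reference, a request to the running server might look like the sketch below. It is not the contents of `llava.py`. It assumes the server listens on the default `127.0.0.1:8080` and that the llama.cpp build exposes the `/completion` endpoint with the `image_data` field, which may differ in newer builds. `build_payload` and `describe_image` are hypothetical names.

```python
import base64
import json
import urllib.request

# Assumption: the server runs locally on llama.cpp's default port.
SERVER_URL = "http://127.0.0.1:8080/completion"
IMAGE_ID = 10  # arbitrary id linking the prompt placeholder to the image


def build_payload(image_bytes: bytes, question: str) -> dict:
    """Build the JSON body: a prompt with an [img-N] placeholder plus the base64 image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": f"USER:[img-{IMAGE_ID}]{question}\nASSISTANT:",
        "image_data": [{"data": encoded, "id": IMAGE_ID}],
        "n_predict": 128,  # cap on generated tokens
    }


def describe_image(path: str = "screenshot.png", question: str = "Describe the image.") -> str:
    """Send the screenshot to the server and return the generated text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), question)
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]
```

Calling `describe_image()` with the server running should return the model's description of `screenshot.png`.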
