# LaVague

## Installing pre-requisites

### Selenium headless Chrome driver

We will drive a Chrome browser with Selenium. Because this notebook has no display, we install Chrome and the matching ChromeDriver and run them in headless mode.

In [None]:
#!/usr/bin/bash

!echo "Update the repository and any packages..."
!sudo apt update && sudo apt upgrade -y

!echo "Install prerequisite system packages..."
!sudo apt install wget curl unzip jq -y

# Download the Chrome for Testing build (version pinned to 122.0.6261.94)
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip

!echo "Install Chrome dependencies..."
!sudo apt install ca-certificates fonts-liberation \
    libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
    libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 \
    libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 \
    libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
    libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 \
    libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils -y

!echo "Unzip the binary file..."
!unzip chrome-linux64.zip

!echo "Downloading the matching Chromedriver..."
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip

!echo "Unzip the binary file..."
!unzip chromedriver-linux64.zip

!echo "Install Selenium..."
!python3 -m pip install selenium

!echo "Removing archive files"
!rm chrome-linux64.zip  chromedriver-linux64.zip

Update the repository and any packages...
Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Hit:6 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:9 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Get:10 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [1,951 kB]
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:12 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,349 kB]
Get:13 http:/
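
Before moving on, it can help to confirm that both archives unpacked where the later cells expect them. The following is a minimal sketch, not part of the original notebook: `find_chrome_binaries` is a hypothetical helper, and the directory names `chrome-linux64` and `chromedriver-linux64` are assumed from the archive names used above.

```python
from pathlib import Path


def find_chrome_binaries(base_dir):
    """Return (chrome, chromedriver) paths under base_dir, or raise if one is missing.

    Hypothetical helper: the sub-directory names are assumed from the
    zip archive names used in the install cell.
    """
    base = Path(base_dir)
    chrome = base / "chrome-linux64" / "chrome"
    chromedriver = base / "chromedriver-linux64" / "chromedriver"
    missing = [str(p) for p in (chrome, chromedriver) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing binaries: {missing}")
    return chrome, chromedriver
```

In Colab, where the cell above runs from `/content`, this could be called as `find_chrome_binaries("/content")`.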

### Requirements

In [None]:
!wget https://raw.githubusercontent.com/dhuynh95/LaVague/main/requirements.txt
!pip install -r requirements.txt

import locale
locale.getpreferredencoding = lambda: "UTF-8"

--2024-03-10 12:15:26--  https://raw.githubusercontent.com/dhuynh95/LaVague/main/requirements.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 190 [text/plain]
Saving to: ‘requirements.txt’


2024-03-10 12:15:27 (4.26 MB/s) - ‘requirements.txt’ saved [190/190]

Collecting llama_index (from -r requirements.txt (line 1))
  Downloading llama_index-0.10.18-py3-none-any.whl (5.6 kB)
Collecting llama-index-embeddings-huggingface (from -r requirements.txt (line 2))
  Downloading llama_index_embeddings_huggingface-0.1.4-py3-none-any.whl (7.7 kB)
Collecting llama-index-llms-huggingface (from -r requirements.txt (line 3))
  Downloading llama_index_llms_huggingface-0.1.3-py3-none-any.whl (7.2 kB)
Collecting llama-index-retrievers-bm25 (from -r requirements.txt (line

## Choosing inference engine

LaVague works with two kinds of inference:
- Local
- API with Hugging Face Inference API

A local model gives full control over the experience but can be slower to set up.
The Hugging Face Inference API is good for a quick start but offers less flexibility and control.

Both options work, but in this Colab notebook the local model may take longer to run because downloading the weights can take a while.

### Hugging Face Inference API

For a fast, low-cost experience, we will use [Hugging Face Inference for PRO users](https://huggingface.co/blog/inference-pro).
The cell below calls ``Nous-Hermes-2-Mixtral-8x7B-DPO``; you can swap in another model by changing ``model_id``.

You will need a Hugging Face Hub Token to use the ``Nous-Hermes-2-Mixtral-8x7B-DPO`` model from the Inference API. You can get one by signing up on the [Hugging Face Hub](https://huggingface.co/join).

If you prefer using a local model, skip to the next section [Local model](#Local-model).

In [None]:
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

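try:
  from google.colab import userdata
  HF_TOKEN = userdata.get('HF_TOKEN')
except Exception:
  # Not running in Colab, or the secret is unavailable: fall back to the environment.
  # Use .get() so a missing variable yields None instead of raising KeyError.
  import os
  HF_TOKEN = os.environ.get("HF_TOKEN")

# Last resort: ask for the token interactively
if not HF_TOKEN:
  from getpass import getpass
  HF_TOKEN = getpass('Enter your HF Token (https://huggingface.co/docs/hub/en/security-tokens): ')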

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
max_new_tokens = 512

# Monkey patch because stream_complete is not implemented in the current version of llama_index
from llama_index.core.base.llms.types import (
    CompletionResponse,
)
def stream_complete(
    self, prompt: str, formatted: bool = False, **kwargs
):
  def gen():
    text = ""
    for x in self._sync_client.text_generation(
            prompt, **{**{"max_new_tokens": self.num_output, "stream": True}, **kwargs}
        ):
      text += x
      yield CompletionResponse(text=text, delta=x)
  return gen()

HuggingFaceInferenceAPI.stream_complete = stream_complete

llm = HuggingFaceInferenceAPI(model_name=model_id, token=HF_TOKEN, num_output=max_new_tokens)
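
The streaming pattern used by the patch above (each yielded item carries the full text so far plus the newest piece) can be tried without any network call. This is a minimal sketch: `Chunk` and `accumulate` are hypothetical stand-ins for `CompletionResponse` and the patched method, and the token list is made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Chunk:
    """Stand-in for CompletionResponse: full text so far plus the newest piece."""
    text: str
    delta: str


def accumulate(tokens: Iterable[str]) -> Iterator[Chunk]:
    """Yield one Chunk per incoming token, growing the cumulative text."""
    text = ""
    for token in tokens:
        text += token
        yield Chunk(text=text, delta=token)


# Made-up token stream, for illustration only
chunks = list(accumulate(["driver", ".get(", "url)"]))
print(chunks[-1].text)  # driver.get(url)
```

A consumer that only wants the final answer reads the last `text`; a UI that renders incrementally can append each `delta`.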

### Local model

Here we will use the latest ``HuggingFaceH4/zephyr-7b-gemma-v0.1`` to do local inference.

In [None]:
!pip install accelerate



In [None]:
!pip install -i https://pypi.org/simple/ bitsandbytes

Looking in indexes: https://pypi.org/simple/
Collecting bitsandbytes
  Downloading bitsandbytes-0.43.0-py3-none-manylinux_2_24_x86_64.whl (102.2 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.43.0


In [None]:
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "HuggingFaceH4/zephyr-7b-gemma-v0.1"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)

# Stop generation as soon as the model emits a closing Markdown code fence, to make inference faster
stop_token_id = [tokenizer.convert_tokens_to_ids("```"), tokenizer.convert_tokens_to_ids("``")]
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer, max_new_tokens=1024, stopping_ids=stop_token_id)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
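
The effect of stopping at the code fence can be illustrated without loading a model. The sketch below is an assumption-laden toy: it works on strings rather than token ids, `generate_until` is a hypothetical helper, and the stream is invented.

```python
FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable


def generate_until(tokens, stop_tokens):
    """Yield tokens until one of stop_tokens appears; the stop token itself is dropped."""
    for token in tokens:
        if token in stop_tokens:
            break
        yield token


# Invented stream: code first, then a fence, then text we never need to generate
fake_stream = ["driver", ".get(url)", "\n", FENCE, "Explanation that is never needed"]
code = "".join(generate_until(fake_stream, {FENCE, "`" * 2}))
print(repr(code))  # 'driver.get(url)\n'
```

Everything after the fence is skipped, which is where the time saving comes from when the model would otherwise keep writing.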

## Setting up embedding model and prompt template

We will use ``bge-small-en-v1.5`` to perform semantic search.

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = "BAAI/bge-small-en-v1.5"
embedder = HuggingFaceEmbedding(model_name=embed_model, device="cuda")

We will use a prompt template that combines few-shot learning with chain-of-thought reasoning so that the model reliably generates Selenium code for our use case.

You can have a look at the template [here](https://github.com/dhuynh95/LaVague/blob/main/prompt_template.txt).

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!wget https://raw.githubusercontent.com/dhuynh95/LaVague/main/prompt_template.txt

with open("prompt_template.txt", "r") as file:
  PROMPT_TEMPLATE_STR = file.read()

--2024-03-10 13:18:31--  https://raw.githubusercontent.com/dhuynh95/LaVague/main/prompt_template.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5493 (5.4K) [text/plain]
Saving to: ‘prompt_template.txt.4’


2024-03-10 13:18:31 (61.6 MB/s) - ‘prompt_template.txt.4’ saved [5493/5493]
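
To see how such a template is filled in, here is a minimal sketch. `TOY_TEMPLATE` is an illustrative layout, not the contents of `prompt_template.txt`, and `fill_prompt` is a hypothetical helper; `context_str` and `query_str` are the variable names LlamaIndex text QA templates use for the retrieved context and the user query.

```python
FENCE = "`" * 3  # Markdown code fence

# Illustrative layout only; the real template also contains worked examples
TOY_TEMPLATE = (
    "HTML:\n{context_str}\n"
    "Query: {query_str}\n"
    "Completion:\n" + FENCE + "python\n"
)


def fill_prompt(template, context_str, query_str):
    """Substitute the retrieved HTML and the user instruction into the template."""
    return template.format(context_str=context_str, query_str=query_str)


prompt = fill_prompt(TOY_TEMPLATE, "<a href='/datasets'>Datasets</a>", "Click on Datasets")
print(prompt)
```

Ending the prompt on an opening code fence nudges the model to answer with code straight away, which pairs with stopping at the closing fence.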



## Preparing the action engine

Here we use llama_index to build an action engine that turns the instructions we give it into executable code.

For each instruction, it indexes the HTML of the current page, retrieves the most relevant chunks, and passes them to the LLM, which generates the Selenium code to execute.

In [None]:
from llama_index.core import Document
from llama_index.core.node_parser import CodeSplitter
from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer
from llama_index.core import PromptTemplate

MAX_CHARS = 1500
K = 3

class ActionEngine:
    def __init__(self, llm, embedding):
        self.llm = llm
        self.embedding = embedding

    def _get_index(self, html):
        text_list = [html]
        documents = [Document(text=t) for t in text_list]

        splitter = CodeSplitter(
            language="html",
            chunk_lines=40,  # lines per chunk
            chunk_lines_overlap=200,  # lines overlap between chunks
            max_chars=MAX_CHARS,  # max chars per chunk
        )
        nodes = splitter.get_nodes_from_documents(documents)
        nodes = [node for node in nodes if node.text]

        index = VectorStoreIndex(nodes, embed_model=self.embedding)

        return index

    def get_query_engine(self, state):
        html = state
        index = self._get_index(html)

        retriever = BM25Retriever.from_defaults(
            index=index,
            similarity_top_k=K,
        )

        response_synthesizer = get_response_synthesizer(streaming=True, llm=self.llm)

        # assemble query engine
        query_engine = RetrieverQueryEngine(
            retriever=retriever,
            response_synthesizer=response_synthesizer,
        )

        prompt_template = PromptTemplate(PROMPT_TEMPLATE_STR)

        query_engine.update_prompts(
            {"response_synthesizer:text_qa_template": prompt_template}
        )

        return query_engine
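
The retrieve-then-generate idea behind `ActionEngine` can be sketched in plain Python. This toy version is a deliberate simplification and not what the class does: it cuts the page by character count instead of using `CodeSplitter`, and it ranks chunks by word overlap instead of BM25. Both function names are hypothetical.

```python
import re


def split_into_chunks(html, max_chars=1500):
    """Cut the page into consecutive pieces of at most max_chars characters."""
    return [html[i:i + max_chars] for i in range(0, len(html), max_chars)]


def top_k_chunks(html, query, k=3, max_chars=1500):
    """Return the k chunks sharing the most distinct words with the query (toy scoring, not BM25)."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    chunks = split_into_chunks(html, max_chars)

    def score(chunk):
        chunk_terms = set(re.findall(r"\w+", chunk.lower()))
        return len(query_terms & chunk_terms)

    # sorted() is stable, so chunks with equal scores keep their page order
    return sorted(chunks, key=score, reverse=True)[:k]
```

Only the selected chunks are placed in the prompt, which keeps the context small even when the page HTML is large.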

## Code execution in action

In [None]:
!pip install gradio

Collecting gradio
  Downloading gradio-4.21.0-py3-none-any.whl (17.0 MB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.3.2.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... done
Collecting gradio-client==0.12.0 (from gradio)
  Downloading gradio_client-0.12.0-py3-none-any.whl (310 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Collecting python-multipart>=0.0.9 (from gradio)
  Downloading python_multipart-0.0.9-py3-none-any.whl (22 kB)
Collecting ruff>=0.2.2 (from gradio)
  Downloading ruff-0.3.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.9 MB)

In [None]:
action_engine = ActionEngine(llm, embedder)

In [None]:
import gradio as gr
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys

MAX_CHARS = 1500

action_engine = ActionEngine(llm, embedder)

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless") # Ensure GUI is off
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--window-size=1600,900")

# Set path to chrome/chromedriver as per your configuration

try:
    import google.colab
    chrome_options.binary_location = "/content/chrome-linux64/chrome"
    webdriver_service = Service("/content/chromedriver-linux64/chromedriver")
except ImportError:
    import os.path
    homedir = os.path.expanduser("~")
    chrome_options.binary_location = f"{homedir}/chrome-linux64/chrome"
    webdriver_service = Service(f"{homedir}/chromedriver-linux64/chromedriver")


title = """
<div align="center">
  <h1>🌊 Welcome to LaVague</h1>
  <p>Redefining internet surfing by transforming natural language instructions into seamless browser interactions.</p>
</div>
"""

# Choose Chrome Browser
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# action_engine = ActionEngine(llm, embedder)

def process_url(url):
    # Load the page, save a screenshot, and return its path for the image display
    driver.get(url)
    driver.save_screenshot("screenshot.png")
    return "screenshot.png"

def process_instruction(query):
    state = driver.page_source
    query_engine = action_engine.get_query_engine(state)
    streaming_response = query_engine.query(query)

    source_nodes = streaming_response.get_formatted_sources(MAX_CHARS)

    response = ""

    for text in streaming_response.response_gen:
        # Accumulate the streamed text and yield the partial result as it arrives
        response += text
        yield response, source_nodes

def exec_code(code):
    code = code.split("```")[0]
    try:
        exec(code)
        return "Successful code execution", code
    except Exception as e:
        output = f"Error in code execution: {str(e)}"
        return output, code

def update_image_display(img):
    driver.save_screenshot("screenshot.png")
    url = driver.current_url
    return "screenshot.png", url

def create_demo(base_url, instructions):
  with gr.Blocks() as demo:
      with gr.Row():
          gr.HTML(title)
      with gr.Row():
          url_input = gr.Textbox(value=base_url, label="Enter URL and press 'Enter' to load the page.")

      with gr.Row():
          with gr.Column(scale=8):
              image_display = gr.Image(label="Browser", interactive=False)

          with gr.Column(scale=2):
              text_area = gr.Textbox(label="Instructions")
              gr.Examples(examples=instructions, inputs=text_area)
              generate_btn = gr.Button(value="Execute")
              code_display = gr.Code(label="Generated code", language="python",
                                    lines=5, interactive=False)
              with gr.Accordion(label="Logs", open=False) as log_accordion:
                  log_display = gr.Textbox(interactive=False)
                  source_display = gr.Textbox(label="Retrieved nodes", interactive=False)
      # Linking components
      url_input.submit(process_url, inputs=url_input, outputs=image_display)
      generate_btn.click(process_instruction, inputs=text_area, outputs=[code_display, source_display]).then(
          exec_code, inputs=code_display, outputs=[log_display, code_display]
      ).then(
          update_image_display, inputs=image_display, outputs=[image_display, url_input]
      )
  demo.launch(share=True)
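
The execution step can be tried without a browser. The sketch below is a variation on `exec_code`, not a replacement for it: `FakeDriver` and `run_generated_code` are hypothetical, and the generated snippet is invented. It passes an explicit namespace to `exec`, so the generated code sees `driver` by name rather than relying on notebook globals.

```python
class FakeDriver:
    """Stand-in for a Selenium WebDriver so the sketch runs without a browser."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


def run_generated_code(code, driver):
    """Execute only the part before the first code fence, exposing `driver` by name."""
    fence = "`" * 3
    code = code.split(fence)[0]
    try:
        exec(code, {"driver": driver})
        return "Successful code execution", code
    except Exception as e:
        return f"Error in code execution: {e}", code


fake = FakeDriver()
generated = "driver.get('https://huggingface.co/')\n" + "`" * 3 + "\nignored text"
status, executed = run_generated_code(generated, fake)
print(status, fake.visited)
```

An explicit namespace makes the dependency on `driver` visible, but it is not a sandbox: the executed code can still import modules and use builtins.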


In [None]:
base_url = "https://huggingface.co/"

instructions = ["Click on the Datasets item on the menu, between Models and Spaces",
                "Click on the search bar 'Filter by name', type 'The Stack', and press 'Enter'",
                "Scroll by 500 pixels",]

create_demo(base_url, instructions)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://4b591cac74e73f8c34.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


In [None]:
base_url = "https://www.irs.gov/"

instructions = ["Click on the 'Pay' item on the menu, between 'File' and 'Refunds'",
                "Click on 'Pay Now with Direct Pay' just below 'Pay from your Bank Account'",
                "Click on 'Make a Payment', just above 'Answers to common questions'",]

create_demo(base_url, instructions)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://1de7fa38a8df824289.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
