# LaVague

## Choosing an inference engine

LaVague works with two kinds of inference:
- Local
- Remote, through the Hugging Face Inference API

A local model gives you full control over the experience but can be slower to set up.
The Hugging Face Inference API is the quickest way to get started, at the cost of flexibility and control.

Both options work, but in this Colab notebook the local model can take longer to run because downloading its weights takes a while.

### Hugging Face Inference API

For a fast, low-cost experience, we will use [Hugging Face Inference for PRO users](https://huggingface.co/blog/inference-pro).

You will need a Hugging Face Hub token to use the ``Nous-Hermes-2-Mixtral-8x7B-DPO`` model through the Inference API. You can get one by signing up on the [Hugging Face Hub](https://huggingface.co/join).
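LaVague handles the API calls for you, but it can help to see how the token is used. The sketch below builds (without sending) an authenticated text-generation request. It assumes the token is stored in an `HF_TOKEN` environment variable, that the model's full Hub ID is `NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO`, and that it is served at the serverless Inference API URL pattern shown; `build_inference_request` is a hypothetical helper, not part of LaVague.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumptions: the token lives in the HF_TOKEN environment variable, and the
# model is served at the serverless Inference API URL pattern below.
API_URL_TEMPLATE = "https://api-inference.huggingface.co/models/{model_id}"
MODEL_ID = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"


def build_inference_request(prompt: str, model_id: str = MODEL_ID,
                            token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated text-generation request."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN to your Hugging Face Hub token.")
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL_TEMPLATE.format(model_id=model_id),
        data=payload,
        # The Hub token is sent as a bearer token
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(build_inference_request("Hello"))` requires network access and a valid token.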

If you prefer a local model, you can pass a ``DefaultLocalLLM`` to the ``ActionEngine`` or use the Hugging Face model of your choice. The default local model is ``HuggingFaceH4/zephyr-7b-gemma-v0.1``.

We use the ``bge-small-en-v1.5`` embedding model to perform semantic search, but you can provide the embedder of your choice.
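Conceptually, semantic search ranks chunks of the page HTML by similarity to the instruction, so that presumably only the most relevant chunks reach the LLM (the demo below displays them as "Retrieved nodes"). The sketch illustrates that idea with a bag-of-words count vector standing in for ``bge-small-en-v1.5``, which produces dense neural embeddings instead. `toy_embed` and `top_k` are hypothetical names, not LaVague's API.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    # Stand-in for a real embedder: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    # Rank HTML chunks by similarity to the instruction and keep the k best.
    q = toy_embed(query)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]


html_chunks = [
    '<a href="/models">Models</a>',
    '<a href="/datasets">Datasets</a>',
    '<input placeholder="Filter by name">',
]
best = top_k("Click on the Datasets item", html_chunks, k=1)
```

Here `best` holds only the Datasets link, because it is the one chunk that shares a word with the instruction.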

We use a specific prompt template that combines few-shot learning with chain-of-thought reasoning so that the model reliably produces correct Selenium code for our use case.

You can have a look at the template [here](https://github.com/dhuynh95/LaVague/blob/main/prompt_template.txt).
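The linked file contains the actual template. The sketch below only illustrates how a few-shot, chain-of-thought prompt is typically assembled: worked examples (context, instruction, reasoning, code) come first, and the new task is left open at the reasoning step. The section labels (`HTML:`, `Query:`, `Thoughts:`, `Code:`) and the `FewShotExample`/`build_prompt` names are assumptions, not LaVague's format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FewShotExample:
    html: str       # page context shown to the model
    query: str      # natural-language instruction
    thoughts: str   # chain-of-thought: which element to target and why
    code: str       # Selenium code that carries out the instruction


def build_prompt(examples: List[FewShotExample], html: str, query: str) -> str:
    """Concatenate worked examples, then leave the new task open at 'Thoughts:'."""
    blocks = [
        f"HTML:\n{ex.html}\nQuery: {ex.query}\nThoughts: {ex.thoughts}\nCode:\n{ex.code}\n---"
        for ex in examples
    ]
    # The model continues from here: reasoning first, then code.
    blocks.append(f"HTML:\n{html}\nQuery: {query}\nThoughts:")
    return "\n".join(blocks)


example = FewShotExample(
    html='<a href="/models">Models</a>',
    query="Click on Models",
    thoughts="The link with text 'Models' has href '/models', so I target it by XPath.",
    code="driver.find_element(By.XPATH, \"//a[@href='/models']\").click()",
)
prompt = build_prompt([example], '<a href="/datasets">Datasets</a>', "Click on Datasets")
```

Ending the prompt at `Thoughts:` nudges the model to reason about the target element before writing code, which is the point of chain-of-thought prompting.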

# Set up

In [4]:
# !apt install ca-certificates fonts-liberation unzip \
#     libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
#     libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 \
#     libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 \
#     libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
#     libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 \
#     libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip
!unzip chrome-linux64.zip
!unzip chromedriver-linux64.zip
!rm chrome-linux64.zip chromedriver-linux64.zip
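Chrome and ChromeDriver are downloaded from the same Chrome for Testing release because ChromeDriver's major version must match the browser's. If you prefer to do this step from Python, here is a minimal sketch that reuses the version and URL layout from the commands above; the helper names are hypothetical. Note that Python's `zipfile.extractall` drops Unix permission bits (which `unzip` keeps), so the sketch restores them to leave the binaries executable.

```python
import os
import urllib.request
import zipfile

# Version and URL layout taken from the wget commands above.
VERSION = "122.0.6261.94"
BASE_URL = "https://storage.googleapis.com/chrome-for-testing-public"


def cft_urls(version=VERSION, platform="linux64"):
    """Return the Chrome and ChromeDriver zip URLs for one Chrome for Testing release."""
    return [
        f"{BASE_URL}/{version}/{platform}/{name}-{platform}.zip"
        for name in ("chrome", "chromedriver")
    ]


def extract_keeping_modes(zip_path, dest="."):
    """Unzip like `unzip`, restoring the Unix permission bits that zipfile ignores."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            path = zf.extract(info, dest)
            mode = (info.external_attr >> 16) & 0o777
            if mode:
                os.chmod(path, mode)


def download_and_extract(version=VERSION, dest="."):
    # Download both archives, unzip them, then delete the zip files.
    for url in cft_urls(version):
        zip_name = url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, zip_name)
        extract_keeping_modes(zip_name, dest)
        os.remove(zip_name)
```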


In [None]:
!pip install "lavague[dev]"

## Code execution in action

In [9]:
import gradio as gr
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
# By and Keys are not referenced directly below, but the generated code
# executed by exec_code() relies on them being in scope.
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from lavague.ActionEngine import ActionEngine
from lavague.defaults import DefaultLocalLLM, DefaultLLM
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

# Maximum number of characters shown per retrieved source node in the logs
MAX_CHARS = 1500

# The default engine uses the Hugging Face Inference API.
# For local inference, use this action_engine instead:
# action_engine = ActionEngine(llm=DefaultLocalLLM())
action_engine = ActionEngine()

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run without a GUI
chrome_options.add_argument("--no-sandbox")  # Needed when Chrome runs as root, as in Colab
chrome_options.add_argument("--window-size=1600,900")

# Paths to the Chrome binary and ChromeDriver unzipped in the setup step;
# adjust them to match your configuration.
chrome_options.binary_location = "chrome-linux64/chrome"
webdriver_service = Service("chromedriver-linux64/chromedriver")


title = """
<div align="center">
  <h1>🌊 Welcome to LaVague</h1>
  <p>Redefining internet surfing by transforming natural language instructions into seamless browser interactions.</p>
</div>
"""

# Start the Chrome browser
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# action_engine = ActionEngine(llm, embedder)

def process_url(url):
    # Load the page and return the path of a screenshot for the Image component.
    driver.get(url)
    driver.save_screenshot("screenshot.png")
    return "screenshot.png"

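def process_instruction(query):
    # Build a query engine over the current page HTML and stream the generated code.
    state = driver.page_source
    query_engine = action_engine.get_query_engine(state)
    streaming_response = query_engine.query(query)

    # Retrieved HTML chunks, each truncated to MAX_CHARS characters, shown in the logs
    source_nodes = streaming_response.get_formatted_sources(MAX_CHARS)

    response = ""
    for text in streaming_response.response_gen:
        # Yield the accumulated response so the UI updates as tokens arrive
        response += text
        yield response, source_nodes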

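def exec_code(code):
    # Keep only the text before the first ``` fence emitted by the model
    code = code.split("```")[0]
    try:
        # Run the generated Selenium code against the global `driver`
        exec(code)
        return "Successful code execution", code
    except Exception as e:
        output = f"Error in code execution: {str(e)}"
        return output, code

def update_image_display(img):
    # `img` is unused: the page is re-captured after the generated code runs.
    driver.save_screenshot("screenshot.png")
    url = driver.current_url
    return "screenshot.png", url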

def create_demo(base_url, instructions):
    with gr.Blocks() as demo:
        with gr.Row():
            gr.HTML(title)
        with gr.Row():
            url_input = gr.Textbox(value=base_url, label="Enter URL and press 'Enter' to load the page.")

        with gr.Row():
            with gr.Column(scale=8):
                image_display = gr.Image(label="Browser", interactive=False)

            with gr.Column(scale=2):
                text_area = gr.Textbox(label="Instructions")
                gr.Examples(examples=instructions, inputs=text_area)
                generate_btn = gr.Button(value="Execute")
                code_display = gr.Code(label="Generated code", language="python",
                                       lines=5, interactive=False)
                with gr.Accordion(label="Logs", open=False) as log_accordion:
                    log_display = gr.Textbox(interactive=False)
                    source_display = gr.Textbox(label="Retrieved nodes", interactive=False)

        # Link components: load URL, then generate code, run it, and refresh the screenshot
        url_input.submit(process_url, inputs=url_input, outputs=image_display)
        generate_btn.click(
            process_instruction, inputs=text_area, outputs=[code_display, source_display]
        ).then(
            exec_code, inputs=code_display, outputs=[log_display, code_display]
        ).then(
            update_image_display, inputs=image_display, outputs=[image_display, url_input]
        )
    demo.launch(share=True)
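`exec_code` keeps only the text before the first triple backtick, which appears to assume that the model's answer continues a code fence already opened by the prompt template. If a model instead opens its own fence (for example with `` ```python ``), that split returns an empty string and `exec` silently does nothing. A hypothetical `extract_code` helper that handles both shapes, sketched under that assumption:

```python
def extract_code(response: str) -> str:
    """Return the code in a model response, with or without an opening fence."""
    text = response.strip()
    if text.startswith("```"):
        # The model opened its own fence: drop the fence line (e.g. "```python")
        # and keep everything up to the closing fence.
        body = text.split("\n", 1)[1] if "\n" in text else ""
        return body.split("```")[0].strip()
    # Otherwise the model continued an already-open fence: keep what precedes
    # the first closing fence.
    return text.split("```")[0].strip()
```

Inside `exec_code`, `code = extract_code(code)` would replace the `split` line. Keep in mind that `exec` runs model-generated code with full access to the notebook's environment.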

In [10]:
base_url = "https://huggingface.co/"

instructions = ["Click on the Datasets item on the menu, between Models and Spaces",
                "Click on the search bar 'Filter by name', type 'The Stack', and press 'Enter'",
                "Scroll by 500 pixels",]

create_demo(base_url, instructions)

TypeError: Object of type Textbox is not JSON serializable

In [8]:
base_url = "https://www.irs.gov/"

instructions = ["Click on the 'Pay' item on the menu, between 'File' and 'Refunds'",
                "Click on 'Pay Now with Direct Pay' just below 'Pay from your Bank Account'",
                "Click on 'Make a Payment', just above 'Answers to common questions'",]

create_demo(base_url, instructions)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://1de7fa38a8df824289.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
