<div id="colab_button">
    <h1>LaVague: Quick-tour guide</h1>
    <a target="_blank" href="https://colab.research.google.com/github/lavague-ai/lavague/blob/command-center-module/docs/docs/get-started/quick-tour.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
    </div>

## Introduction

LaVague is an open-source framework that uses AI to turn natural language instructions into executable code that automates UI actions, such as filling in a form.

In this quick tour, we will show you step by step how to set up and use LaVague to perform a few example actions on webpages. At the end of the notebook, we will create and launch a Gradio demo where you can try LaVague interactively.

> Pre-requisites: if you are running the notebook locally, you will need Python (tested on Python >= 3.8) and pip installed.

> Note, this notebook uses remote inference with the HuggingFace API. For local inference, see the [local quick-tour](./local-quick-tour.ipynb) (coming soon).

## Initial set-up

### Installing driver for Selenium

In this example, we will generate code using [Selenium](https://www.selenium.dev/) to perform user interface actions.

Selenium requires a driver to interface with the chosen browser (Chrome, Firefox, etc.).

We therefore first need to download the Chrome driver.

⚠️ For instructions on how to install a driver for a different browser or instructions for downloading drivers on a different OS, [see the Selenium documentation](https://selenium-python.readthedocs.io/installation.html#drivers)

> Note that while we use Selenium for this example, it is possible to achieve the same results using a different automation tool such as Playwright.

In [None]:
# If you are missing any apt packages uncomment and run this command first:
# !sudo apt update

!sudo apt install -y ca-certificates fonts-liberation unzip \
libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 \
libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 \
libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 \
libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

In [None]:
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip
!unzip chrome-linux64.zip
!unzip chromedriver-linux64.zip
!rm chrome-linux64.zip chromedriver-linux64.zip
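If you need a different Chrome version or platform, note that the two download URLs above share a common layout. The sketch below rebuilds them from a version string. `chrome_for_testing_urls` is a hypothetical helper, not part of LaVague, and it assumes the same URL layout holds for the version and platform you choose.

```python
# Hypothetical helper (not part of LaVague): rebuilds the download URLs used
# above, assuming the same URL layout applies to other versions and platforms.
BASE_URL = "https://storage.googleapis.com/chrome-for-testing-public"


def chrome_for_testing_urls(version, platform="linux64"):
    """Return the browser and driver archive URLs for a version and platform."""
    prefix = f"{BASE_URL}/{version}/{platform}"
    return {
        "chrome": f"{prefix}/chrome-{platform}.zip",
        "chromedriver": f"{prefix}/chromedriver-{platform}.zip",
    }


urls = chrome_for_testing_urls("122.0.6261.94")
for name, url in urls.items():
    print(name, url)
```

The browser and the driver should come from the same version so that they stay compatible with each other.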

### Installing LaVague's Action Engine

We provide a PyPI package for LaVague which contains the `ActionEngine` module, dedicated to handling all the key AI operations behind the scenes.

You can install the PyPI package with the following command:

In [None]:
!pip install lavague

### Installing other PyPI dependencies

We also need to install `Gradio`, which we will use to quickly build an interactive example.

In [None]:
!pip install gradio

### HuggingFace set-up

⚠️ For remote inference with the Hugging Face Inference API, you will need to provide your HuggingFace API token.

Alternatively, you can run the notebook entirely locally (the model will be downloaded and run locally instead of via an API) with our [local quick-tour](./local-quick-tour.ipynb) (coming soon).

> A HuggingFace API token enables you to interact with models hosted by HuggingFace. If you don't have one, you will need to create a HuggingFace account and create one as detailed [here](https://huggingface.co/docs/hub/en/security-tokens).


In [8]:
import os
os.environ["HF_TOKEN"] = ""
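To avoid hardcoding the token in the notebook, one option is to read it from the environment and fall back to an interactive prompt. The sketch below is only an illustration: `resolve_hf_token` is a hypothetical helper, not part of LaVague or the Hugging Face libraries.

```python
import os
from getpass import getpass


def resolve_hf_token(env=os.environ, prompt=getpass):
    """Return a non-empty token from `env`, prompting the user if it is unset.

    Hypothetical helper for illustration only.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # getpass hides the typed value so the token is not echoed in the output
        token = prompt("Hugging Face API token: ").strip()
    if not token:
        raise ValueError("No Hugging Face API token provided.")
    return token


# Example usage (uncomment to run interactively):
# os.environ["HF_TOKEN"] = resolve_hf_token()
```

Failing early on an empty token gives a clearer error than a rejected request later in the notebook.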

## Building our Gradio demo

### Initializing the webdriver

First of all, we configure our webdriver settings to suit our use case (creating a Gradio demo).

> - We use `headless` mode to turn off the webdriver GUI, since we want these tasks to run in the background and will use Gradio as the visual display for this demo.
> - We turn off the Chrome `sandbox` security feature which restricts the browser's access to the system it's running on so that we can share this quick-tour as a Google Colab notebook.
> - We set the `window-size` for the screenshots we will later capture to show the user the before/after results of our demo queries.




In [9]:
from selenium.webdriver.chrome.options import Options
from selenium import webdriver

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless") # Ensure GUI is off
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--window-size=1600,900")

Next, we set the path to the ChromeDriver and initialize it, passing it the config options we just defined.

> If you are running the notebook locally and change the location of the driver, you will need to update the path here.

In [None]:
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from lavague.ActionEngine import ActionEngine
from lavague.defaults import DefaultLocalLLM, DefaultLLM
from llama_index.llms.huggingface import HuggingFaceInferenceAPI
import re

# Set path to chrome/chromedriver as per your configuration
try:
    import google.colab  # only importable when running on Google Colab
    chrome_options.binary_location = "/content/chrome-linux64/chrome"
    webdriver_service = Service("/content/chromedriver-linux64/chromedriver")
except ImportError:
    # Running locally: paths are relative to the current working directory
    chrome_options.binary_location = "chrome-linux64/chrome"
    webdriver_service = Service("chromedriver-linux64/chromedriver")

# Initialize Chrome Browser driver
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
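If the driver fails to start when running locally, a wrong path is a common cause. The sketch below checks that both files exist before the driver is initialized. `missing_binaries` is a hypothetical helper, and the relative paths assume you unzipped the archives into the current working directory as shown earlier.

```python
from pathlib import Path


def missing_binaries(chrome_path, driver_path):
    """Return the given paths that do not point to an existing file.

    Hypothetical helper for illustration only.
    """
    candidates = (Path(chrome_path), Path(driver_path))
    return [str(path) for path in candidates if not path.is_file()]


# Assumed locations: the archives were unzipped into the working directory
missing = missing_binaries("chrome-linux64/chrome", "chromedriver-linux64/chromedriver")
if missing:
    print("Missing files, update the paths accordingly:", ", ".join(missing))
```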

### Initializing ActionEngine with our chosen LLM

Next, we will initialize our ActionEngine which will perform all the key AI operations needed to generate the Selenium code to perform the desired action.

We will leverage the default `LLM` (Nous-Hermes-2-Mixtral-8x7B-DPO) and `embedding` (bge-small-en-v1.5) options. However, you can pass any LlamaIndex LLM or embedding model to the constructor at this point.

> To use our default local LLM, you can pass `DefaultLocalLLM()` to the constructor.

In [44]:
from lavague.ActionEngine import ActionEngine
from lavague.defaults import DefaultLocalLLM, DefaultLLM

action_engine = ActionEngine()

### The demo code

The demo code includes several auxiliary functions which we will not cover in detail, but one key function to understand is the `process_instruction` function.

This function takes the user's natural language instructions and the website URL, and returns the AI-generated Selenium Python code needed to perform the desired action on the website.

The function first gets the HTML source code for the website we wish to perform our web workflow on. 

Next, it calls the `get_query_engine()` method, which breaks down the HTML document into smaller, more manageable chunks, indexes them and retrieves the most relevant chunks of HTML code.

Then, it calls the `query_engine.query()` method to perform inference. This Action Engine method inserts the user’s instructions and the most relevant pieces of the HTML source code into our constructed prompt template and uses this to query the LLM. 

It returns the LLM response, i.e. the generated Selenium code to perform the user's desired action, as well as the source nodes (chunks of HTML source code) used in generating this response.

In [45]:
MAX_CHARS = 1500

def process_instruction(instructions, url_input):
    if url_input != driver.current_url:
        driver.get(url_input)
    source_code = driver.page_source
    query_engine = action_engine.get_query_engine(source_code)
    response = query_engine.query(instructions)
    source_nodes = response.get_formatted_sources(MAX_CHARS)
    return response.response, source_nodes
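To build intuition for the chunk-and-retrieve step described above, here is a toy sketch. It is not LaVague's implementation: the real query engine indexes chunks with an embedding model, whereas this example splits by lines and ranks chunks by simple word overlap. All names in it are hypothetical.

```python
import re


def chunk_lines(html, lines_per_chunk=1):
    """Split the document into chunks of `lines_per_chunk` consecutive lines."""
    lines = html.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def tokenize(text):
    """Return the set of lowercase alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(html, instruction, k=1, lines_per_chunk=1):
    """Return the k chunks sharing the most words with the instruction."""
    query = tokenize(instruction)
    chunks = chunk_lines(html, lines_per_chunk)
    # sorted() is stable, so chunks with equal scores keep document order
    ranked = sorted(chunks, key=lambda c: len(query & tokenize(c)), reverse=True)
    return ranked[:k]


toy_html = "\n".join([
    "<nav>",
    '<a href="/models">Models</a>',
    '<a href="/datasets">Datasets</a>',
    '<a href="/spaces">Spaces</a>',
    "</nav>",
    "<footer>Terms of service</footer>",
])

print(top_chunks(toy_html, "Click on the Datasets item"))
```

Only the retrieved chunks are passed to the LLM along with the instruction, which keeps the prompt small even when the full page source is large.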

The following `exec_code` function first extracts the Python Selenium code generated by the LLM, to ensure that no other text is included, and then executes the code to perform the desired action.

In [46]:
def extract_first_python_code(markdown_text):
    # Pattern to match the first ```python ``` code block
    pattern = r"```python(.*?)```"
    
    # Using re.DOTALL to make '.' match also newlines
    match = re.search(pattern, markdown_text, re.DOTALL)
    if match:
        # Return the first matched group, which is the code inside the ```python ```
        return match.group(1).strip()
    else:
        # Return None if no match is found
        return None

def exec_code(code, source_nodes, full_code):
    code = extract_first_python_code(code)
    html = driver.page_source
    try:
        exec(code)
        output = "Successful code execution"
        status = """<p style="color: green; font-size: 20px; font-weight: bold;">Success!</p>"""
        full_code += code
    except Exception as e:
        output = f"Error in code execution: {str(e)}"
        status = """<p style="color: red; font-size: 20px; font-weight: bold;">Failure! Open the Debug tab for more information</p>"""
    return output, code, html, status, full_code
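As a quick check of the extraction step, the sketch below runs the same regular expression on a made-up LLM response. The function is redefined so that the example is self-contained, and the fence marker is built from a constant only to keep the example readable; `sample_response` is invented for illustration.

```python
import re

FENCE = "`" * 3  # the three-backtick markdown code fence

# Same pattern as in the cell above: first python code block, '.' matches newlines
PATTERN = re.compile(FENCE + r"python(.*?)" + FENCE, re.DOTALL)


def extract_first_python_code(markdown_text):
    """Return the contents of the first python code block, or None."""
    match = PATTERN.search(markdown_text)
    return match.group(1).strip() if match else None


# Invented response, for illustration only
sample_response = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + "driver.find_element(By.LINK_TEXT, 'Datasets').click()\n"
    + FENCE + "\n"
    + "This clicks the Datasets link."
)

print(extract_first_python_code(sample_response))
print(extract_first_python_code("No code block in this response."))
```

When no code block is found the function returns `None`. In `exec_code`, `exec(None)` then raises a `TypeError`, which the `except` branch catches and reports as a failure in the Debug tab.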

The following functions capture screenshots of the website so that the demo can show before/after images of each automated action.

In [47]:
import gradio as gr

def process_url(url):
    driver.get(url)
    driver.save_screenshot("screenshot.png")
    return "screenshot.png"

def update_image_display(img):
    driver.save_screenshot("screenshot.png")
    url = driver.current_url
    return "screenshot.png", url

The rest of the code sets up and launches our Gradio demo. It sets up the visual elements of the Gradio demo and executes the above functions as per user interaction.

In [48]:
title = """
<div align="center">
  <h1>🌊 Welcome to LaVague</h1>
  <p>Redefining internet surfing by transforming natural language instructions into seamless browser interactions.</p>
</div>
"""

def show_processing_message():
    return "Processing..."

def create_demo(base_url, instructions):
  with gr.Blocks() as demo:
      with gr.Tab("LaVague"):
        with gr.Row():
            gr.HTML(title)
        with gr.Row():
            url_input = gr.Textbox(value=base_url, label="Enter URL and press 'Enter' to load the page.")
        
        with gr.Row():
            with gr.Column(scale=7):
                image_display = gr.Image(label="Browser", interactive=False)
            
            with gr.Column(scale=3):
                with gr.Accordion(label="Full code", open=False):
                    full_code = gr.Code(value="", language="python", interactive=False)
                code_display = gr.Code(label="Generated code", language="python",
                                        lines=5, interactive=True)
                
                status_html = gr.HTML()
        with gr.Row():
            with gr.Column(scale=8):
                text_area = gr.Textbox(label="Enter instructions and press 'Enter' to generate code.")
                gr.Examples(examples=instructions, inputs=text_area)
      with gr.Tab("Debug"):
        with gr.Row():
            with gr.Column():
                log_display = gr.Textbox(interactive=False, lines=20)
            with gr.Column():
                source_display = gr.Code(language="html", label="Retrieved nodes", interactive=False, lines=20)
        with gr.Row():
            with gr.Accordion(label="Full HTML", open=False):
                full_html = gr.Code(language="html", label="Full HTML", interactive=False, lines=20)
  
      # Linking components
      url_input.submit(process_url, inputs=url_input, outputs=image_display)
      text_area.submit(show_processing_message, outputs=[status_html]).then(
          process_instruction, inputs=[text_area, url_input], outputs=[code_display, source_display]
          ).then(
          exec_code, inputs=[code_display, source_display, full_code], 
          outputs=[log_display, code_display, full_html, status_html, full_code]
      ).then(
          update_image_display, inputs=image_display, outputs=[image_display, url_input]
      )
  demo.launch(share=True, debug=True)

You can now try the demo where we use natural language instructions to automate an action on the Hugging Face website.

⚠️ You will need to interact with these examples in the Gradio interface: click on the URL field and press Enter to load the page, then select one of the example natural language instructions, click on the instruction field and press Enter again. The action should then be visibly executed in the visual interface.

> Note you can open the Gradio interface in your browser using the URL displayed in the cell output below.

In [None]:
base_url = "https://huggingface.co/"

instructions = ["Click on the Datasets item on the menu, between Models and Spaces",
                "Click on the search bar 'Filter by name', type 'The Stack', and press 'Enter'",
                "Scroll by 500 pixels",]

create_demo(base_url, instructions)

Below you can explore a second example with the IRS website.

In [None]:
base_url = "https://www.irs.gov/"

instructions = ["Click on the 'Pay' item on the menu, between 'File' and 'Refunds'",
                "Click on 'Pay Now with Direct Pay' just below 'Pay from your Bank Account'",
                "Click on 'Make a Payment', just above 'Answers to common questions'",]

create_demo(base_url, instructions)

That brings us to the end of this quick-tour. If you have any questions, join us on the LaVague Discord [here](https://discord.com/invite/SDxn9KpqX9).