<div id="colab_button">
    <h1>LaVague: Quick-tour guide</h1>
    <a target="_blank" href="https://colab.research.google.com/github/lavague-ai/lavague/blob/main/docs/docs/get-started/quick-tour.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
    </div>

## Introduction

LaVague is an open-source framework that uses AI to turn natural language instructions into executable code that automates UI actions, such as filling in a form.

In this quick tour, we show you step by step how to set up and use LaVague to perform a few example actions on webpages. At the end of the notebook, we create and launch a Gradio demo where you can try LaVague interactively.

> Prerequisites: if you are running the notebook locally, you will need Python (tested on Python >= 3.8) and pip installed.

> Note: this notebook uses remote inference with the Hugging Face or OpenAI API. For other LLM integrations, such as local inference or Azure OpenAI, see the scripts in the [examples](https://github.com/lavague-ai/LaVague/tree/main/examples) folder.

However, you will still need to install the webdriver that Selenium requires; instructions are given in the next section.

## Installation

### Installing driver for Selenium

In this example, we will generate code using [Selenium](https://www.selenium.dev/) to perform user interface actions.

Selenium requires a driver to interface with the chosen browser (Chrome, Firefox, etc.).

We therefore first need to download Chrome and the matching Chrome driver (ChromeDriver).

⚠️ The commands below target Linux systems that use `apt`. For instructions on how to install a driver on a different OS, [see the Selenium documentation](https://selenium-python.readthedocs.io/installation.html#drivers).

> Note that while we use Selenium for this example, we hope to integrate other automation tools such as Playwright at a later date.

```python
# If apt cannot locate some of these packages, uncomment and run this command first:
# !sudo apt update

!sudo apt install -y ca-certificates fonts-liberation unzip \
libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 \
libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 \
libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 \
libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils
```

```python
# Download Chrome for Testing and the ChromeDriver build of the same version
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip
# Extract both archives, then remove the zip files
!unzip chrome-linux64.zip
!unzip chromedriver-linux64.zip
!rm chrome-linux64.zip chromedriver-linux64.zip
```
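ChromeDriver needs to match the installed Chrome version, which is why both archives above share the version `122.0.6261.94`. If you switch versions, a small helper such as the hypothetical `cft_download_urls` below keeps the two URLs in sync. It is a sketch that assumes other versions and platforms follow the same URL layout as the `wget` commands above; confirm this against the Chrome for Testing listings before relying on it.

```python
from typing import Dict

# Base URL taken from the wget commands above.
CFT_BASE = "https://storage.googleapis.com/chrome-for-testing-public"


def cft_download_urls(version: str, platform: str = "linux64") -> Dict[str, str]:
    """Hypothetical helper: matching Chrome and ChromeDriver archive URLs for one version.

    Assumes the layout <base>/<version>/<platform>/<artifact>-<platform>.zip.
    """
    return {
        artifact: f"{CFT_BASE}/{version}/{platform}/{artifact}-{platform}.zip"
        for artifact in ("chrome", "chromedriver")
    }


urls = cft_download_urls("122.0.6261.94")
print(urls["chrome"])
print(urls["chromedriver"])
```

Deriving both URLs from a single `version` string makes it hard to download mismatched builds by accident.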

### Installing LaVague

We now need to install the LaVague PyPI package, which contains the `ActionEngine` module, dedicated to handling all the key AI operations, and the `CommandCenter` module, which orchestrates the whole workflow.

```python
!pip install lavague
```
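To confirm that the installation succeeded, you can query the installed package metadata with the standard library's `importlib.metadata` (available from Python 3.8). The `installed_version` helper below is our own illustration, not part of LaVague.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of distribution `dist_name`, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("lavague:", installed_version("lavague") or "not installed")
```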

## Running LaVague

To use LaVague, we first need to prepare the LLM that will generate the Selenium actions.

### LLM backend setup

LaVague requires an LLM that implements [LlamaIndex](https://docs.llamaindex.ai/en/stable/index.html)'s [LLM](https://docs.llamaindex.ai/en/stable/api_reference/llms.html#ref-llms) interface. In this section, we show how to set one up for use with LaVague.

For this quick tour, we will use a managed LLM API, although local models can also be used. We present two options:
- OpenAI API
- Hugging Face Inference API
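If you keep credentials in environment variables, a hypothetical helper such as `pick_backend` can decide which of the two options to use. This sketch assumes the OpenAI key is stored in `OPENAI_API_KEY` (the variable the LlamaIndex OpenAI client reads by default) and the Hugging Face token in `HF_TOKEN` (as in the Hugging Face cell); preferring OpenAI when both are set is an arbitrary choice.

```python
import os
from typing import Mapping, Optional


def pick_backend(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: choose a backend from whichever credential is set.

    Returns "openai", "huggingface", or None when neither variable is set (or both are empty).
    """
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("HF_TOKEN"):
        return "huggingface"
    return None


print("Backend:", pick_backend() or "no credentials found")
```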

#### OpenAI API

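Using OpenAI is relatively straightforward. By default, the LlamaIndex `OpenAI` class reads your key from the `OPENAI_API_KEY` environment variable: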

```python
# If the LlamaIndex OpenAI integration is not installed, uncomment and run:
# !pip install llama-index-llms-openai
```

```python
from llama_index.llms.openai import OpenAI

max_new_tokens = 512

# To pass the key explicitly instead of using OPENAI_API_KEY:
# api_key = "YOUR_API_KEY"
# llm = OpenAI(api_key=api_key, max_tokens=max_new_tokens)

llm = OpenAI(max_tokens=max_new_tokens)
```
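To avoid pasting keys into the notebook, you can prompt for a missing key with the standard library's `getpass`, which hides the input, and store it in the environment where the client will look for it. `ensure_secret` is a hypothetical helper sketched under that assumption; its `ask` parameter exists only so the prompt can be swapped out in tests.

```python
import getpass
import os
from typing import Callable


def ensure_secret(name: str, ask: Callable[[str], str] = getpass.getpass) -> str:
    """Hypothetical helper: return env var `name`, prompting (hidden input) if it is unset.

    A prompted value is stored in os.environ so later client code can read it.
    """
    value = os.environ.get(name)
    if not value:
        value = ask(f"{name}: ")
        os.environ[name] = value
    return value


# Usage (prompts only if the variable is not already set):
# ensure_secret("OPENAI_API_KEY")
# llm = OpenAI(max_tokens=max_new_tokens)
```

The same helper can supply `HF_TOKEN` before running the Hugging Face cell.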

#### Hugging Face Inference API

For remote inference with the Hugging Face Inference API, you will need a HuggingFace user access token with `read` access. The code block below reads it from the `HF_TOKEN` environment variable.

Here we will use [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), one of the strongest open-source models available at the time of writing.

> If you don't have a HuggingFace user access token, you can get one for free by creating a HuggingFace account and following the instructions [here](https://huggingface.co/docs/hub/en/security-tokens).

```python
import os

from llama_index.llms.huggingface import HuggingFaceInferenceAPI

model = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
token = os.environ["HF_TOKEN"]  # reads your token from the HF_TOKEN environment variable
max_new_tokens = 512

llm = HuggingFaceInferenceAPI(model_name=model, token=token, num_output=max_new_tokens)
```

### Preparing the CommandCenter

Now we are ready to initialize a `CommandCenter` with the following arguments:

- An instance of `ActionEngine` with a LlamaIndex LLM, an embedding model and a prompt template. For this example, we pass whichever `llm` you defined in the previous step (OpenAI or the Hugging Face Inference API), the default `embedding` (bge-small-en-v1.5) and the default prompt template.
- The path to the Chrome binary, `chrome-linux64/chrome`
- The path to the ChromeDriver binary, `chromedriver-linux64/chromedriver`
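A common failure at this step is a path that does not exist or a binary without execute permission. The hypothetical `check_executables` helper below is a sketch using only `os`; it reports both problems for the two paths listed above so you can fix them before building the `CommandCenter`.

```python
import os
from typing import List


def check_executables(*paths: str) -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means all paths are executable files."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: not found (did the unzip step succeed?)")
        elif not os.access(path, os.X_OK):
            problems.append(f"{path}: not executable (try `chmod +x {path}`)")
    return problems


for problem in check_executables("chrome-linux64/chrome", "chromedriver-linux64/chromedriver"):
    print(problem)
```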

```python
from lavague import ActionEngine, CommandCenter
from lavague.defaults import DefaultEmbedder

action_engine = ActionEngine(
    llm=llm,
    embedding=DefaultEmbedder(),
)

commandCenter = CommandCenter(
    action_engine,
    chromePath="chrome-linux64/chrome",
    chromedriverPath="chromedriver-linux64/chromedriver",
)
```
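The `ActionEngine` above takes an embedding model in addition to the LLM. We assume, since this guide does not spell it out, that the embeddings are used to pick the parts of the page's HTML most relevant to an instruction before prompting the LLM. The sketch below illustrates that retrieval idea with a plain bag-of-words cosine similarity standing in for bge-small-en-v1.5 embeddings; `top_k_chunks` and the HTML chunks are hypothetical, and this is not LaVague's code.

```python
import math
import re
from collections import Counter
from typing import List


def bow_vector(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding: counts of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(instruction: str, chunks: List[str], k: int = 2) -> List[str]:
    """Hypothetical retriever: the k HTML chunks most similar to the instruction."""
    query = bow_vector(instruction)
    return sorted(chunks, key=lambda c: cosine(query, bow_vector(c)), reverse=True)[:k]


html_chunks = [
    '<a href="/models">Models</a>',
    '<a href="/datasets">Datasets</a>',
    '<a href="/spaces">Spaces</a>',
    '<input placeholder="Filter by name">',
]
print(top_k_chunks("Click on the Datasets item on the menu", html_chunks, k=1))
# → ['<a href="/datasets">Datasets</a>']
```

A real embedding model captures meaning beyond exact word overlap, but the ranking step works the same way: score every chunk against the instruction and keep the best few.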

### Launching LaVague

We are now ready to launch an interactive Gradio demo which will allow us to execute natural language instructions on a site of our choice.

To do this, we call the `commandCenter.run()` method, passing it the URL of the website we want to act on and a list of three default instructions, which will appear in the generated Gradio page.

```python
commandCenter.run(
    "https://huggingface.co",
    [
        "Click on the Datasets item on the menu, between Models and Spaces",
        "Click on the search bar 'Filter by name', type 'The Stack', and press 'Enter'",
        "Scroll by 500 pixels",
    ],
)
```

### How it works

⚠️ You will need to interact with the generated Gradio demo to perform automated actions. 

First, click in the URL textbox and press Enter. Then select one of the default natural language instructions or write your own, click in the instruction textbox, and press Enter again.

At this point, the LLM generates Python Selenium code, which is then executed to perform the desired action on the website.

You can watch the action being performed in the visual interface and inspect the code LaVague executed on the right-hand side of the Gradio page.

> Note: you can open the Gradio interface in your browser using the URL displayed in the output of the `commandCenter.run()` cell.
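The generate-then-execute loop described above can be sketched as follows. This is a simplified illustration, not LaVague's implementation: `FakeLLM`, `FakeDriver`, `build_prompt` and `run_instruction` are hypothetical stand-ins so the example runs without a browser or API key. In the real tool, the LLM is the LlamaIndex model configured earlier (whose `complete()` response exposes the generated text as `.text`) and `driver` is a Selenium WebDriver.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeResponse:
    """Mimics the `.text` attribute of a LlamaIndex completion response."""
    text: str


class FakeLLM:
    """Stand-in for the LLM: returns canned Selenium code for one known instruction."""

    def complete(self, prompt: str) -> FakeResponse:
        if "Scroll by 500 pixels" in prompt:
            return FakeResponse('driver.execute_script("window.scrollBy(0, 500)")')
        raise ValueError("FakeLLM has no canned answer for this prompt")


class FakeDriver:
    """Stand-in for a Selenium WebDriver: records the scripts it is asked to run."""

    def __init__(self) -> None:
        self.scripts: List[str] = []

    def execute_script(self, script: str) -> None:
        self.scripts.append(script)


def build_prompt(html: str, instruction: str) -> str:
    """Hypothetical prompt layout: page context first, then the user's instruction."""
    return (
        "Write Python Selenium code that uses the existing `driver` object.\n"
        f"HTML:\n{html}\n"
        f"Instruction: {instruction}\n"
        "Code:"
    )


def run_instruction(llm: FakeLLM, driver: FakeDriver, html: str, instruction: str) -> str:
    """Generate code for one instruction, execute it against `driver`, and return the code."""
    code = llm.complete(build_prompt(html, instruction)).text
    # exec() runs arbitrary code: only execute generated code you are willing to trust.
    namespace: Dict[str, object] = {"driver": driver}
    exec(code, namespace)
    return code


driver = FakeDriver()
code = run_instruction(FakeLLM(), driver, "<body>...</body>", "Scroll by 500 pixels")
print(code)            # driver.execute_script("window.scrollBy(0, 500)")
print(driver.scripts)  # ['window.scrollBy(0, 500)']
```

Returning the generated code alongside executing it mirrors how the Gradio page shows you exactly what ran.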


That brings us to the end of this quick tour. If you have any questions, join us on the LaVague Discord [here](https://discord.com/invite/SDxn9KpqX9).