<div id="colab_button">
    <h1>LaVague: Local inference Quick-tour</h1>
    <a target="_blank" href="https://colab.research.google.com/github/lavague-ai/lavague/blob/main/docs/docs/get-started/local-quick-tour.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
    </div>

## Introduction

LaVague is an open-source framework that uses AI to turn natural language instructions into executable code that automates UI actions, such as filling in a form.

In this quick tour, we will show you step by step how to set up and use LaVague to perform a few example actions on web pages. At the end of the notebook, we will create and launch a Gradio demo where you can try out LaVague interactively.

> Prerequisites: if you are running the notebook locally, you will need Python (tested on Python >= 3.8) and pip installed.

> If you prefer to run LaVague as a Python script, you can do so by executing the `local_gemma.py` script in the `gradio_demos` folder. However, you will still need to install the necessary webdriver for Selenium; instructions are given in the Initial set-up section below.
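
The prerequisites noted above can be checked from Python before going any further. This is a minimal sketch using only the standard library; the helper name `check_prerequisites` is our own and is not part of LaVague.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version this notebook was tested on


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]} or newer is required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # pip is importable as a module when it is installed for this interpreter.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not installed for this interpreter")
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```

If nothing is printed, both prerequisites are met for the interpreter running the notebook.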

### Local vs Remote inference

LaVague is compatible with two modes of inference: remote or local. Users can choose whichever best suits their needs.

With local inference, the LLM used by LaVague is downloaded and queried locally on the user's machine, whereas with remote inference, we query an LLM hosted by a third party (for example, through the Hugging Face Inference API, as we do in the [quick tour](./quick-tour.ipynb)).

Note that local inference can result in a longer wait the first time you use LaVague, because the LLM has to be downloaded. You will also need sufficient space on your device to download and run the model.
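
Since the model download needs free disk space, it can be worth checking how much is available first. The sketch below uses only the standard library. The `REQUIRED_FREE_GB` threshold is a hypothetical placeholder, not a figure published by LaVague, and the path to check is assumed to be the filesystem where the model will be stored.

```python
import shutil

# Hypothetical threshold: adjust it to the size of the model you plan to download.
REQUIRED_FREE_GB = 20


def free_disk_gb(path="."):
    """Return the free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path=".", required_gb=REQUIRED_FREE_GB):
    """True if at least `required_gb` GiB are free on the filesystem of `path`."""
    return free_disk_gb(path) >= required_gb


print(f"Free space: {free_disk_gb():.1f} GiB")
if not has_enough_space():
    print(f"Warning: less than {REQUIRED_FREE_GB} GiB free; the model download may fail.")
```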

If you prefer to perform remote inference, see the [quick tour](./quick-tour.ipynb).


## Initial set-up

### Installing driver for Selenium

LaVague will generate code using [Selenium](https://www.selenium.dev/) to perform user interface actions.

Selenium requires a driver to interface with the chosen browser (Chrome, Firefox, etc.).

We therefore first need to download Chrome and its matching driver, ChromeDriver.

⚠️ The commands below target Debian/Ubuntu-based Linux. For instructions on how to install a driver on a different OS, [see the Selenium documentation](https://selenium-python.readthedocs.io/installation.html#drivers).

> Note that while we use Selenium for this example, we hope to integrate other automation tools, such as Playwright, at a later date.

In [None]:
# If you are missing any apt packages uncomment and run this command first:
# !sudo apt update

!sudo apt install -y ca-certificates fonts-liberation unzip \
libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 \
libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 \
libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 \
libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

In [None]:
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip
!unzip chrome-linux64.zip
!unzip chromedriver-linux64.zip
!rm chrome-linux64.zip chromedriver-linux64.zip
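
After unzipping, it can help to confirm that both binaries are where the rest of the notebook expects them. This is a small sketch using only the standard library. The two paths are the ones used later in this notebook, and the helper name `is_executable_file` is our own.

```python
import os

# Paths produced by unzipping the two archives in the cell above.
CHROME_PATH = "chrome-linux64/chrome"
CHROMEDRIVER_PATH = "chromedriver-linux64/chromedriver"


def is_executable_file(path):
    """True if `path` is an existing regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


for name, path in [("Chrome", CHROME_PATH), ("ChromeDriver", CHROMEDRIVER_PATH)]:
    status = "OK" if is_executable_file(path) else "missing or not executable"
    print(f"{name}: {path} -> {status}")
```

If either line reports a problem, re-run the download cell from the same working directory.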

### Installing LaVague

We now need to install the LaVague PyPI package, which contains the `ActionEngine` module, dedicated to handling all the key AI operations, and the `CommandCenter` module, which orchestrates the whole workflow.

In [None]:
!pip install lavague
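
To confirm that the installation succeeded, one option is to query the installed package metadata. This sketch relies on the standard-library `importlib.metadata` module (available from Python 3.8); the helper name `installed_version` is our own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("lavague")
print("lavague version:", version if version else "not installed")
```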

## Running LaVague

### Initial config

Now we are ready to initialize our `CommandCenter` class.

We will pass the class three key values:
- An instance of `ActionEngine` with a LlamaIndex LLM and embedding model. For this example, we will select the default local LLM (`DefaultLocalLLM`, zephyr-7b-gemma-v0.1) and the default embedding model (`DefaultEmbedder`, bge-small-en-v1.5)
- The path to the Chrome binary, `chrome-linux64/chrome`
- The path to the ChromeDriver binary, `chromedriver-linux64/chromedriver`

In [9]:
from lavague import ActionEngine, CommandCenter
from lavague.defaults import DefaultLocalLLM, DefaultEmbedder

commandCenter = CommandCenter(
    ActionEngine(DefaultLocalLLM(), DefaultEmbedder()),
    chromePath="chrome-linux64/chrome",
    chromedriverPath="chromedriver-linux64/chromedriver",
)

### Launching LaVague

We are now ready to launch an interactive Gradio demo which will allow us to execute natural language instructions on a site of our choice.

To do this, we call the `commandCenter.run()` method, passing it the URL of the website we wish to perform actions on and a list of three default instructions, which will appear in the Gradio page it generates.

In [None]:
commandCenter.run(
    "https://huggingface.co",
    [
        "Click on the Datasets item on the menu, between Models and Spaces",
        "Click on the search bar 'Filter by name', type 'The Stack', and press 'Enter'",
        "Scroll by 500 pixels",
    ],
)

⚠️ You will need to interact with the generated Gradio demo to perform automated actions. 

First, click in the URL textbox and press Enter. Then, select one of the default natural language instructions or write your own, click within the instruction textbox, and press Enter again.

At this point, our LLM generates Python code that uses Selenium, and this code is then executed to perform the desired action on the website.

You will see the action being executed in the visual interface, and you can inspect the code LaVague ran to perform it on the right-hand side of the Gradio page.
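
To make the generate-then-execute step more concrete, here is a minimal illustration. It is not LaVague's actual implementation: the generated snippet is a hand-written example of what an LLM might return for "Scroll by 500 pixels", `run_generated_code` is a hypothetical helper, and `RecordingDriver` is a stand-in for a real Selenium WebDriver so that the sketch runs without a browser. Only `execute_script` mirrors a real Selenium method.

```python
class RecordingDriver:
    """Stand-in for a Selenium WebDriver that only records the scripts it is given."""

    def __init__(self):
        self.scripts = []

    def execute_script(self, script, *args):
        # A real Selenium driver would run this JavaScript in the browser.
        self.scripts.append(script)


# Hand-written example of code an LLM might return for "Scroll by 500 pixels".
generated_code = 'driver.execute_script("window.scrollBy(0, 500)")'


def run_generated_code(code, driver):
    """Execute a snippet of generated Python with `driver` available by name."""
    exec(code, {"driver": driver})


driver = RecordingDriver()
run_generated_code(generated_code, driver)
print(driver.scripts)
```

Because this pattern executes code produced by a model, it is best run in an environment you are comfortable exposing to arbitrary code, and it is worth reading the generated code shown in the Gradio page.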

> Note that you can open the Gradio interface in your browser using the URL displayed in the output of the `commandCenter.run()` cell.


## Conclusion

That brings us to the end of this quick tour. If you have any questions, join us on the LaVague Discord [here](https://discord.com/invite/SDxn9KpqX9).