In [2]:
# Import the necessary components
from lavague.drivers.selenium import SeleniumDriver
from lavague.core import ActionEngine, WorldModel
from lavague.core.agents import WebAgent

from llama_index.llms.gemini import Gemini
from llama_index.embeddings.fireworks import FireworksEmbedding
from llama_index.multi_modal_llms.gemini import GeminiMultiModal

embed_model = FireworksEmbedding()
mm_llm = GeminiMultiModal(model_name="models/gemini-1.5-pro-latest")
llm = Gemini(model_name="models/gemini-1.5-flash-latest")

from google.generativeai.types import HarmCategory, HarmBlockThreshold

llm._model._safety_settings = {
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

# Set up our three key components: Driver, Action Engine, World Model
driver = SeleniumDriver()
action_engine = ActionEngine(driver, llm=llm, embedding=embed_model)
world_model = WorldModel(mm_llm=mm_llm)

# Create Web Agent
agent = WebAgent(world_model, action_engine)

# Set URL
agent.get("https://www.wikijournalclub.org/wiki/Main_Page")

objective = """I have the question: "what evidence is there for beta blockers helping patients after the patients suffered a myocardial infarction"
Find the most relevant paper to answer this question. Don't scan the whole page as it would be a waste of time as all articles are on the same page but scroll down or up if needed.
When answering the question using a given paper, explain how the paper is relevant, what were the major data that supports the answer, and how the experiment was designed."""

# Run agent with a specific objective
agent.demo(objective=objective)

2024-06-23 15:06:17,796 - INFO - Screenshot folder cleared


Running on local URL:  http://127.0.0.1:7860
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.
--------
Running on public URL: https://827b732ecc7aa01798.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


2024-06-23 15:06:37,550 - INFO - Screenshot folder cleared
2024-06-23 15:06:40,993 - INFO - Thoughts:
- The current screenshot shows the main page of the Wiki Journal Club.
- The objective is to find the most relevant paper regarding the use of beta blockers in patients after a myocardial infarction.
- The page has categories for different medical specialties, which might help narrow down the search.
- The most relevant category for myocardial infarction would likely be "Cardiology".
- The next step should be to navigate to the "Cardiology" section to find relevant papers.

Next engine: Navigation Engine
Instruction: Click on the 'Cardiology' link under the 'Browse' section.
2024-06-23 15:06:58,399 - INFO - Thoughts:
- The current screenshot shows a list of articles under the 'Cardiology' section.
- The objective is to find the most relevant paper that provides evidence for beta blockers helping patients after a myocardial infarction.
- The titles of the articles are visible, and some 

Navigation error: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//h2[contains(text(), 'beta blockers') and contains(text(), 'myocardial infarction')]"}
  (Session info: chrome-headless-shell=123.0.6312.4); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
#0 0x559e77e0e863 <unknown>
#1 0x559e77b048c6 <unknown>
#2 0x559e77b4f618 <unknown>
#3 0x559e77b4f6d1 <unknown>
#4 0x559e77b92744 <unknown>
#5 0x559e77b715cd <unknown>
#6 0x559e77b8fc19 <unknown>
#7 0x559e77b71343 <unknown>
#8 0x559e77b42593 <unknown>
#9 0x559e77b42f5e <unknown>
#10 0x559e77dd284b <unknown>
#11 0x559e77dd67a5 <unknown>
#12 0x559e77dc0571 <unknown>
#13 0x559e77dd7332 <unknown>
#14 0x559e77da587f <unknown>
#15 0x559e77dfd728 <unknown>
#16 0x559e77dfd8fb <unknown>
#17 0x559e77e0d9b4 <unknown>
#18 0x7f3b812f7609 start_thread

2024-06-23 15:07:40,557 - INFO - Thoughts:
- The current screenshot shows a list of articles under the 'Cardiology' section.
- The objective is to find the most relevant paper that discusses the evidence for beta blockers helping patients after a myocardial infarction.
- The titles of the articles in the current view do not mention beta blockers or myocardial infarction.
- The next step is to scroll down to view more articles to find a relevant paper.

Next engine: Navigation Controls
Instruction: SCROLL_DOWN
Traceback (most recent call last):
  File "/home/dhuynh/miniconda3/envs/lavague/lib/python3.10/site-packages/gradio/queueing.py", line 566, in process_events
    response = await route_utils.call_process_api(
  File "/home/dhuynh/miniconda3/envs/lavague/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/dhuynh/miniconda3/envs/lavague/lib/python3.10/site-packages/gradio/blocks.py", line 1

Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://827b732ecc7aa01798.gradio.live
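
The navigation error in the log above comes from an XPath that requires one `h2` text node to contain both 'beta blockers' and 'myocardial infarction' exactly as written. A looser alternative is to rank candidate headings by case-insensitive keyword hits. The following is only a minimal sketch of that idea: `score_title` and `rank_titles` are hypothetical helpers, not part of LaVague or Selenium, and the titles are made-up placeholders rather than headings from the real page.

```python
def score_title(title, keywords):
    """Count how many keywords occur in the title, ignoring case."""
    lowered = title.lower()
    return sum(1 for kw in keywords if kw.lower() in lowered)


def rank_titles(titles, keywords):
    """Return titles with at least one keyword hit, best match first."""
    scored = [(score_title(t, keywords), t) for t in titles]
    hits = [(s, t) for s, t in scored if s > 0]
    # sort() is stable, so ties keep their original page order
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in hits]


# Placeholder headings (assumption): not taken from the real page
titles = [
    "Trial A: Statins in primary prevention",
    "Trial B: Beta-blocker after myocardial infarction",
    "Trial C: Anticoagulation in atrial fibrillation",
]
# Several spellings are listed because an exact phrase match is brittle
keywords = ["beta-blocker", "beta blocker", "myocardial infarction"]
best = rank_titles(titles, keywords)
print(best)
```

Note that the strict phrase 'beta blockers' would not match the hyphenated placeholder heading at all, which is the same kind of mismatch that makes the generated XPath fail.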


In [2]:
added_knowledge = """Objective: Extract relevant information about this page's first profile's professional experience
Previous instructions:
- Click on 'Ethan Caldwell'
Last engine: Navigation Engine
Current state:
external_observations:
  vision: '[SCREENSHOT]'
internal_state:
    agent_outputs: []
    user_inputs: []

Thoughts:
- The current screenshot shows a profile of an engineer working in finance.
- The objective is to find information about Ethan Caldwell's professional experience.
- We are only seeing the top of the page, so we need to scan the page for more information.
Next engine: Navigation Controls
Instruction: SCAN
-----
Objective: Extract relevant information about this page's first profile's professional experience
Previous instructions:
- Click on 'Ethan Caldwell'
- SCAN
Last engine: Navigation Controls
Current state:
external_observations:
  vision: '[SCREENSHOT]'
internal_state:
    agent_outputs: []
    user_inputs: []

Thoughts:
- The current screenshots show the whole profile of Ethan Caldwell.
- The objective is to find information about Ethan Caldwell's professional experience.
- The screenshots contain sufficient information about this profile's professional experience.
Next engine: COMPLETE
Instruction: Ethan has worked as a Data Scientist at BNP Paribas (June 2023 - Present, Paris, Hybrid), previously as a Business Intelligence Engineer Intern at Amazon (Oct 2022 - Feb 2023, Paris), a Data Scientist Intern at BNP Paribas Cardif (Apr 2022 - Oct 2022, Paris), and a Research Intern at ETIS Lab (Apr 2021 - Sep 2021, Paris), with skills in NumPy, SQL, Python, machine learning, R, databases, data analysis, and programming."""
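
The knowledge string above follows a repeating layout: an objective, the previous instructions, the state, a list of thoughts, then the chosen engine and instruction, with examples separated by a line of dashes. As a quick sanity check before handing such text to the world model, the examples can be split and their key fields inspected. This is a minimal sketch under those layout assumptions; `parse_examples` is a hypothetical helper, not a LaVague function, and the sample text is a shortened stand-in.

```python
def parse_examples(knowledge):
    """Split a knowledge string on '-----' and pull out the key fields."""
    keys = ("Objective", "Last engine", "Next engine", "Instruction")
    examples = []
    for block in knowledge.split("-----"):
        fields = {}
        for line in block.strip().splitlines():
            for key in keys:
                prefix = key + ":"
                # Only lines that start with the field name count
                if line.startswith(prefix):
                    fields[key] = line[len(prefix):].strip()
        if fields:
            examples.append(fields)
    return examples


# Shortened stand-in for the real knowledge text (assumption)
sample = """Objective: Extract X
Previous instructions:
- Click on 'Ethan Caldwell'
Last engine: Navigation Engine
Thoughts:
- Only the top of the page is visible.
Next engine: Navigation Controls
Instruction: SCAN
-----
Objective: Extract X
Last engine: Navigation Controls
Next engine: COMPLETE
Instruction: Done"""

examples = parse_examples(sample)
for example in examples:
    print(example["Next engine"], "->", example["Instruction"])
```

A check like this makes it easier to spot an example that is missing its `Next engine` or `Instruction` line.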

In [4]:
print(added_knowledge)

Objective: Extract relevant information about this page's first profile's professional experience
Previous instructions:
- Click on 'Ethan Caldwell'
Last engine: Navigation Engine
Current state:
external_observations:
  vision: '[SCREENSHOT]'
internal_state:
    agent_outputs: []
    user_inputs: []

Thoughts:
- The current screenshot shows a profile of an engineer working in finance.
- The objective is to find information about Ethan Caldwell's professional experience.
- We are only seeing the top of the page, so we need to scan the page for more information.
Next engine: Navigation Controls
Instruction: SCAN
-----
Objective: Extract relevant information about this page's first profile's professional experience
Previous instructions:
- Click on 'Ethan Caldwell'
- SCAN
Last engine: Navigation Controls
Current state:
external_observations:
  vision: '[SCREENSHOT]'
internal_state:
    agent_outputs: []
    user_inputs: []

Thoughts:
- The current screenshots show the whole profile of Etha

In [1]:
# Import the necessary components
from lavague.drivers.selenium import SeleniumDriver
from lavague.core import ActionEngine, WorldModel
from lavague.core.agents import WebAgent

from llama_index.llms.gemini import Gemini
from llama_index.embeddings.fireworks import FireworksEmbedding
from llama_index.multi_modal_llms.gemini import GeminiMultiModal

from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

embed_model = FireworksEmbedding()
mm_llm = GeminiMultiModal(model_name="models/gemini-1.5-pro-latest")
llm = Gemini(model_name="models/gemini-1.5-flash-latest")

from google.generativeai.types import HarmCategory, HarmBlockThreshold

llm._model._safety_settings = {
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

# Set up our three key components: Driver, Action Engine, World Model
driver = SeleniumDriver(options=chrome_options)
action_engine = ActionEngine(driver, llm=llm, embedding=embed_model)
world_model = WorldModel(mm_llm=mm_llm)
world_model.add_knowledge("knowledge.txt")

# Create Web Agent
agent = WebAgent(world_model, action_engine)

# Set URL
agent.get("https://www.linkedin.com/search/results/people/?keywords=polytechnique%20mva&origin=SWITCH_SEARCH_VERTICAL&sid=QG%2C")

2024-06-24 08:25:05,658 - INFO - Screenshot folder cleared
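
The `debuggerAddress` option in the cell above attaches Selenium to a Chrome instance that is already running with remote debugging enabled (for example, started with `--remote-debugging-port=9222`). If nothing is listening on that port, driver creation fails. A small pre-flight check can make that failure easier to diagnose. This is a minimal sketch; `debugger_port_open` is a hypothetical helper, and it only tests that the TCP port accepts connections, not that Chrome is the process behind it.

```python
import socket


def debugger_port_open(host="127.0.0.1", port=9222, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts
        return False


if not debugger_port_open():
    print("Nothing is listening on 127.0.0.1:9222; "
          "start Chrome with --remote-debugging-port=9222 first.")
```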


In [2]:
from langchain_community.document_loaders import UnstructuredHTMLLoader

import re

def clean_text(input_text):
    # Remove HTML-like tags
    text = re.sub(r'<[^>]+>', '', input_text)

    # Collapse runs of spaces and tabs only; newlines are kept so that
    # the line-based filters below can still match individual lines
    text = re.sub(r'[ \t]+', ' ', text)

    # Remove redundant punctuation
    text = re.sub(r'([.,!?])\1+', r'\1', text)

    # Remove lines that are just numbers (like follower counts)
    text = re.sub(r'^[ \t]*\d+(\.\d+)?[ \t]*$', '', text, flags=re.MULTILINE)

    # Remove lines that start with "Show all" or similar
    text = re.sub(r'^[ \t]*Show all.*$', '', text, flags=re.MULTILINE)

    # Remove lines with just "Follow" or "Following"
    text = re.sub(r'^[ \t]*(Follow|Following)[ \t]*$', '', text, flags=re.MULTILINE)

    # Remove empty lines
    text = re.sub(r'^\s*$\n', '', text, flags=re.MULTILINE)

    return text.strip()

def unstructured_html2text(file_name):
    documents = UnstructuredHTMLLoader(file_name).load()
    input_text = documents[0].page_content
    return input_text

def clean_html(html, html2text=None):
    if html2text is None:
        html2text = unstructured_html2text
        
    file_name = "page.html"
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(html)
        
    input_text = html2text(file_name)
    cleaned_text = clean_text(input_text)
    return cleaned_text
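
`clean_html` accepts any callable that takes a file name and returns text through its `html2text` parameter, so the Unstructured loader can be swapped out. As one possible alternative, the sketch below extracts visible text with Python's standard `html.parser` only. The names `html_to_text` and `stdlib_html2text` are hypothetical, the file is assumed to be UTF-8, and the output will differ from what `UnstructuredHTMLLoader` produces.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def stdlib_html2text(file_name):
    """Same call shape as unstructured_html2text: file name in, text out."""
    with open(file_name, encoding="utf-8", errors="replace") as f:
        return html_to_text(f.read())


sample_html = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Jane Doe</h1><script>var x=1;</script><p>Data Scientist</p>"
    "</body></html>"
)
print(html_to_text(sample_html))
```

With this in place, a call such as `clean_html(html, html2text=stdlib_html2text)` would avoid the dependency on `langchain_community`, at the cost of cruder text extraction.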

In [3]:
action_engine.python_engine.clean_html = clean_html

In [4]:
objective = """Explore all profiles. Click on each profile to see the details. 
Once on a given profile, use the Python Engine to extract the user's professional background, with each experience listed in bullet points with dates, location, and what they did. 
Then go back and select the next profile. Use information about the history of previous clicks to not click on the same profile again.
When mentioning information about the next profile to click on, say precisely the name of the profile and nothing more, like 'Click on Ethan Caldwell'."""

# Run agent with a specific objective
agent.run(objective=objective)

2024-06-24 08:25:18,596 - INFO - Thoughts:
- The current screenshot shows a LinkedIn search results page with multiple profiles listed.
- The objective is to explore each profile, extract the user's professional background, and then move to the next profile.
- The first step is to click on the first profile in the list to start the exploration process.

Next engine: Navigation Engine
Instruction: Click on 'David, Zouhair KHATOURI'.
2024-06-24 08:25:46,416 - INFO - Thoughts:
- The current screenshot shows the LinkedIn profile of David, Zouhair KHATOURI.
- The objective is to explore all profiles and extract the professional background of each user.
- The next step is to extract the professional background of David, Zouhair KHATOURI.
- After extracting the information, we will need to go back and select the next profile.

Next engine: Python Engine
Instruction: Extract the professional background of David, Zouhair KHATOURI, listing each experience in bullet points.
2024-06-24 08:26:04,97

ActionResult(instruction=None, code='from selenium import webdriver\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.chrome.options import Options\nfrom selenium.webdriver.common.keys import Keys\nfrom selenium.webdriver.common.action_chains import ActionChains\n\nelse:\nchrome_options = Options()\nchrome_options.add_argument("--headless")\nchrome_options.add_argument("--no-sandbox")\nchrome_options.page_load_strategy = (\n)\n\ndriver = webdriver.Chrome(options=chrome_options)\n\ndriver.set_window_size(1080, 1080)\nviewport_height = driver.execute_script("return window.innerHeight;")\nheight_difference = 1080 - viewport_height\ndriver.set_window_size(1080, 1080 + height_difference)\n\ndriver.get("https://www.linkedin.com/search/results/people/?keywords=polytechnique%20mva&origin=SWITCH_SEARCH_VERTICAL&sid=QG%2C")\n# Let\'s think step by step\n\n# First, we notice that the query asks us to click on the link "David, Zouhair KHATOURI".\n\n# In the provided HTML, we ca

In [None]:
df = agent.logger.return_pandas()
print(df.iloc[4]["output"])
df.to_csv("examples.csv", index=False)
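
The logged DataFrame carries a `world_model_prompt` and a `world_model_output` column, and the cells that follow cut each prompt at the marker "Here is the next objective:" using `split(...)[1]`, which raises `IndexError` if a prompt lacks the marker. The sketch below shows a more defensive variant using `str.partition`. `build_observation` is a hypothetical helper, and falling back to the whole prompt when the marker is missing is an assumption about the desired behavior.

```python
MARKER = "Here is the next objective:"


def build_observation(prompt, output, marker=MARKER):
    """Join the prompt text after the marker with the cleaned model output."""
    _, found, tail = prompt.partition(marker)
    if not found:
        # Marker missing: keep the whole prompt instead of raising
        tail = prompt
    return tail + output.replace("Thoughts:", "")


# Shortened stand-in strings (assumption), not real log rows
prompt = "System text. Here is the next objective:\nObjective: X\nThought:\n"
output = "Thoughts:\n- The page shows a list of profiles."
print(build_observation(prompt, output))
```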

In [16]:
world_model_prompt = df.world_model_prompt[0]
world_model_prompt.split("Here is the next objective:")[1]

"\nObjective: Explore all profiles. Click on each profile to see the details. \nOnce on a given profile, use the Python Engine to extract the user's professional background, with each experience listed in bullet points. \nThen go back and select the next profile. Use information about the history of previous clicks to not click on the same profile again.\nWhen mentioning information about the next profile to click on, say precisely the name of the profile and nothing more, like 'Click on Ethan Caldwell'.\nPrevious instructions:\n[NONE]\nLast engine: [NONE]\nCurrent state:\nexternal_observations:\n  vision: '[SCREENSHOT]'\ninternal_state:\n  agent_outputs: []\n  user_inputs: []\n\n\nThought:\n"

In [21]:
df.world_model_output[0].replace("Thoughts:", "")

for index, row in df.iterrows():
    world_model_prompt = row["world_model_prompt"]
    world_model_output = row["world_model_output"]
    observation = world_model_prompt.split("Here is the next objective:")[1] + world_model_output.replace("Thoughts:", "")
    print(observation)
    print("-----")


Objective: Explore all profiles. Click on each profile to see the details. 
Once on a given profile, use the Python Engine to extract the user's professional background, with each experience listed in bullet points. 
Then go back and select the next profile. Use information about the history of previous clicks to not click on the same profile again.
When mentioning information about the next profile to click on, say precisely the name of the profile and nothing more, like 'Click on Ethan Caldwell'.
Previous instructions:
[NONE]
Last engine: [NONE]
Current state:
external_observations:
  vision: '[SCREENSHOT]'
internal_state:
  agent_outputs: []
  user_inputs: []


Thought:

- The current screenshot shows a LinkedIn search results page with multiple profiles listed.
- The objective is to explore each profile, extract the user's professional background, and then move to the next profile.
- The first step is to click on the first profile in the list to start the exploration process.

Nex