<a href="https://colab.research.google.com/github/JSJeong-me/LGE-PRI-1st/blob/main/interface-agents/interface_agents_naver.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Interface Agents

Interface agents complete tasks by interacting with a user interface. They are useful for building systems around tasks that are not easily accessible through an API.


This notebook demonstrates how to use the `InterfaceAgent` package to complete entire tasks by interacting with a user interface. The package is built on top of the `Playwright` library, which provides a high-level API for interacting with web pages.





In [None]:
!pip install interfaceagent

In [None]:
!playwright install

In [None]:
from google.colab import userdata
import openai
import os

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
openai.api_key = os.environ["OPENAI_API_KEY"]

In [None]:
from interfaceagent import WebBrowser, Planner, OpenAIPlannerModel

In [None]:
browser = WebBrowser(start_url="http://naver.com/", headless=True)  # headless=True runs without a visible browser window
model = OpenAIPlannerModel(model="gpt-4o-mini-2024-07-18")

In [None]:
task = """
Here is the step-by-step English prompt to execute the described task:

Navigate to the URL:
Open the website [https://www.naver.com/].

Locate the Search Bar:
Find the search bar on the Naver homepage.

Input Search Query:
Type "선유도역 현재기온" (Seonyudo Station current temperature) into the search bar.

Execute Search:
Press the search button or hit Enter to submit the query.

Extract Current Temperature:
Wait for the search results to load, locate the section displaying the current temperature, and extract the temperature as text.
"""

In [None]:
planner = Planner(model=model, web_browser=browser, task=task)

In [None]:
result = await planner.run(task=task)

2024-12-02 05:13:39.974 | INFO     | interfaceagent.interface.planner:run:254 - WebBrowser not initialized. Initializing now.
2024-12-02 05:13:42.493 | INFO     | interfaceagent.interface.webbrowser:initialize:39 - WebBrowser successfully initialized.
2024-12-02 05:13:44.455 | INFO     | interfaceagent.interface.planner:generate_plan:58 - High-level plan: ['Navigate to the URL: Open the website https://www.naver.com/', 'Locate the Search Bar: Find the search bar on the Naver homepage.', "Input Search Query: Type '선유도역 현재기온' (Seonyudo Station current temperature) into the search bar.", 'Execute Search: Press the search button or hit Enter to submit the query.', 'Extract Current Temperature: Wait for the search results to load, locate the section displaying the current temperature, and extract the temperature as text.']
2024-12-0

In [None]:
print(result)

{'task': '\nHere is the step-by-step English prompt to execute the described task:\n\nNavigate to the URL:\nOpen the website [https://www.naver.com/].\n\nLocate the Search Bar:\nFind the search bar on the Naver homepage.\n\nInput Search Query:\nType "선유도역 현재기온" (Seonyudo Station current temperature) into the search bar.\n\nExecute Search:\nPress the search button or hit Enter to submit the query.\n\nExtract Current Temperature:\nWait for the search results to load, locate the section displaying the current temperature, and extract the temperature as text.\n', 'page_content': {'content': "메뉴 영역으로 바로가기\n본문 영역으로 바로가기\nNAVER\n한글 입력기\n자동완성 레이어\n검색\n사용자 링크\n로그인\n서비스 더보기\n블로그\n카페\n이미지\n지식iN\n인플루언서\n동영상\n쇼핑\n뉴스\n어학사전\n지도\n도서\n지식백과\n학술정보\n다음\n더보기\n공유\n선유도역 현재기온 검색 결과\n영등포구 양평동5가\n영등포구 양평동5가\n오늘\n \n내일\n \n모레\n \n월간\n \n과거\n날씨 제공사 설정\n 기상청\n \n 아큐웨더\n \n 웨더채널\n \n 웨더뉴스\n예보비교\n오늘의 날씨\n흐림\n현재 온도\n13.4°\n\n어제보다 3.9° \n높아요\n 흐림\n\n체감 13.4° 습도 70% 남서풍 1.9m/s\nCCTV\n날씨지도\n미세먼지\n보통\n \n초미세먼지\n보통\n \n자외
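The `page_content` text in the result above is a flat, newline-separated dump of the results page, so the temperature still has to be pulled out of it. The following is a minimal sketch, assuming the label `현재 온도` ("current temperature") is followed by a number and a degree sign, as in the output shown. The helper name `extract_current_temperature` and the `sample` string are illustrative and not part of `interfaceagent`.

```python
import re
from typing import Optional

# Hypothetical helper (not part of interfaceagent): find the number that
# follows the "현재 온도" (current temperature) label in the page text.
_TEMP_PATTERN = re.compile(r"현재 온도\s*(-?\d+(?:\.\d+)?)\s*°")


def extract_current_temperature(page_text: str) -> Optional[float]:
    """Return the temperature as a float, or None if the label is not found."""
    match = _TEMP_PATTERN.search(page_text)
    return float(match.group(1)) if match else None


# Excerpt copied from the output above.
sample = "오늘의 날씨\n흐림\n현재 온도\n13.4°\n\n어제보다 3.9° \n높아요\n 흐림"
print(extract_current_temperature(sample))  # prints 13.4
```

With the real result, the call would presumably be `extract_current_temperature(result["page_content"]["content"])`. A regular expression is brittle against layout changes on the results page, so a `None` return should be treated as "not found" rather than as an error in the agent.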

In [None]:
import base64
from IPython.display import HTML

def display_image(file_path):
    # Read the image file
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode()

    # Create the HTML to display the image
    html = f'<img src="data:image/png;base64,{encoded_string}" />'

    # Display the HTML
    return HTML(html)

# Usage
display_image('screenshot.png')

## Manually Stepping Through the Task

Rather than running the whole task with `planner.run`, we can step through it manually:

- Initialize a browser object
- Use the planner to plan the next steps to take
- Manually execute each step and view responses
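The control flow of these steps can be sketched without a browser. In the sketch below, `StubPlanner` is a stand-in written for illustration: it only mimics the two method names used in this notebook (`next_actions` and `execute_action`) and returns a canned plan, so nothing here should be read as the real `Planner` implementation.

```python
import asyncio
from typing import Any, Dict, List


class StubPlanner:
    """Stand-in for the notebook's Planner; mimics only the method names used below."""

    async def next_actions(self) -> List[Dict[str, Any]]:
        # Canned plan: type a query, then press Enter.
        return [
            {"action": "type", "selector": "input", "value": "query"},
            {"action": "press", "selector": "input", "value": "Enter"},
        ]

    async def execute_action(self, action: Dict[str, Any]) -> None:
        # A real planner would drive the browser here; the stub just yields control.
        await asyncio.sleep(0)


async def step_through(planner: Any) -> List[str]:
    """Plan once, then execute the planned actions one at a time, logging each."""
    log: List[str] = []
    actions = await planner.next_actions()
    for index, action in enumerate(actions):
        await planner.execute_action(action)
        log.append(f"step {index}: {action['action']} -> {action['value']}")
    return log


if __name__ == "__main__":
    # In a notebook cell, use `await step_through(StubPlanner())` instead,
    # because an event loop is already running there.
    for line in asyncio.run(step_through(StubPlanner())):
        print(line)
```

Executing one action per iteration leaves room to take a screenshot or inspect the page between steps, which is the point of stepping manually.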

In [None]:
browser = WebBrowser(start_url="http://bing.com/", headless=True)  # headless=True runs without a visible browser window
model = OpenAIPlannerModel(model="gpt-4o-mini-2024-07-18")
task = "What is the website for the Manning Book - Multi-Agent Systems with AutoGen"

planner = Planner(model=model, web_browser=browser, task=task)

next_actions = await planner.next_actions()
print(next_actions)

2024-12-02 05:00:01.490 | INFO     | interfaceagent.interface.planner:next_actions:70 - WebBrowser not initialized. Initializing now.
2024-12-02 05:00:02.614 | INFO     | interfaceagent.interface.webbrowser:initialize:39 - WebBrowser successfully initialized.
2024-12-02 05:00:03.820 | INFO     | interfaceagent.interface.webbrowser:get_interactive_elements:167 - Total interactive elements found: 1
2024-12-02 05:00:06.557 | INFO     | interfaceagent.interface.planner:next_actions:106 - Next actions: [{'action': 'type', 'selector': "input[type='text']", 'selector_type': 'css', 'value': 'Multi-Agent Systems with AutoGen Manning Book', 'url': ''}, {'action': 'press', 'selector': "input[type='text']", 'selector_type': 'css', 'value': 'Enter', 'url': ''}]


[{'action': 'type', 'selector': "input[type='text']", 'selector_type': 'css', 'value': 'Multi-Agent Systems with AutoGen Manning Book', 'url': ''}, {'action': 'press', 'selector': "input[type='text']", 'selector_type': 'css', 'value': 'Enter', 'url': ''}]
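Each planned action printed above is a plain dict, so it can be inspected or adjusted before it is executed. The sketch below copies an action with a different selector, which may help when the planner's guess (here `input[type='text']`) does not match the element on the page. The helper name `with_selector` and the replacement selector `#sb_form_q` are assumptions for illustration; check the actual page markup before relying on them.

```python
from typing import Any, Dict


def with_selector(action: Dict[str, Any], selector: str) -> Dict[str, Any]:
    """Return a copy of `action` that targets `selector`; the original is unchanged."""
    updated = dict(action)
    updated["selector"] = selector
    return updated


# First action from the output above.
planned = {
    "action": "type",
    "selector": "input[type='text']",
    "selector_type": "css",
    "value": "Multi-Agent Systems with AutoGen Manning Book",
    "url": "",
}

# "#sb_form_q" is an assumed id for the search box, used only as an example.
adjusted = with_selector(planned, "#sb_form_q")
print(adjusted["selector"])  # prints #sb_form_q
```

The adjusted dict could then be passed to `planner.execute_action` in place of `next_actions[0]`. Returning a copy keeps the planner's original output intact for comparison.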


In [None]:
await planner.execute_action(next_actions[0])
await browser.screenshot("screenshot.png")
display_image('screenshot.png')


2024-12-02 05:00:21.130 | INFO     | interfaceagent.interface.planner:execute_action:152 - Executing: action='type' selector="input[type='text']" value='Multi-Agent Systems with AutoGen Manning Book'
2024-12-02 05:00:51.181 | ERROR    | interfaceagent.interface.webbrowser:_handle_element_action:107 - Timeout occurred. Current URL: https://www.bing.com/?toWww=1&redig=58104824267047D884682FD54973943D, action: action='type' selector="input[type='text']" value='Multi-Agent Systems with AutoGen Manning Book'
2024-12-02 05:00:51.195 | ERROR    | interfaceagent.interface.webbrowser:_handle_element_action:109 - Page title: Bing


In [None]:
await planner.execute_action(next_actions[0])
await browser.screenshot("screenshot.png")
await planner.execute_action(next_actions[1])
display_image('screenshot.png')



2024-12-02 04:52:39.994 | INFO     | interfaceagent.interface.planner:execute_action:152 - Executing: action='type' selector="input[type='text']" value='Multi-Agent Systems with AutoGen'
2024-12-02 04:53:10.016 | ERROR    | interfaceagent.interface.webbrowser:_handle_element_action:107 - Timeout occurred. Current URL: https://www.bing.com/?toWww=1&redig=F3806696C6474091AA7946C65EBE0CF1, action: action='type' selector="input[type='text']" value='Multi-Agent Systems with AutoGen'
2024-12-02 04:53:10.033 | ERROR    | interfaceagent.interface.webbrowser:_handle_element_action:109 - Page title: Microsoft Bing 搜尋
2024-12-02 04:53:10.416 | INFO     | interfaceagent.interface.planner:execute_action:152 - Executing: action='press' selector="input[type='text']" value='Enter'

In [None]:
await browser.close()