Streamlit + Readme update: copy to cURL (#22)
suchintan committed Mar 4, 2024
1 parent 0495552 commit 3bf5671
Showing 6 changed files with 79 additions and 34 deletions.
69 changes: 45 additions & 24 deletions README.md
@@ -27,8 +27,35 @@
<img src="images/geico_shu_recording_cropped.gif"/>
</p>

Want to see more examples of Skyvern in action? Click [here](#real-world-examples-of-skyvern)!
Traditional approaches to browser automation required writing custom scripts for each website, often relying on DOM parsing and XPath-based interactions that would break whenever the website layout changed.

Instead of relying only on code-defined XPath interactions, Skyvern uses computer vision and LLMs to parse items in the viewport in real time, plan the interaction, and then interact with them.

This approach gives us a few advantages:

1. Skyvern can operate on websites it’s never seen before, as it’s able to map visual elements to actions necessary to complete a workflow, without any customized code
1. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
1. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
    1. If you wanted to get an auto insurance quote from Geico, the answer to a common question “Were you eligible to drive at 18?” could be inferred from the driver receiving their license at age 16
    1. If you were doing competitor analysis, it’s understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)


Want to see examples of Skyvern in action? Jump to [#real-world-examples-of-skyvern](#real-world-examples-of-skyvern)


# How it works
Skyvern was inspired by the Task-Driven autonomous agent design popularized by [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) -- with one major bonus: we give Skyvern the ability to interact with websites using browser automation libraries like [Playwright](https://playwright.dev/).

<picture>
<source media="(prefers-color-scheme: dark)" srcset="images/skyvern-system-diagram-dark.png" />
<img src="images/skyvern-system-diagram-light.png" />
</picture>

<!-- TODO (suchintan):
Expand the diagram above to go deeper into how:
1. We draw bounding boxes
2. We parse the HTML + extract the image to generate an interactable element map
-->

# Quickstart
This quickstart guide will walk you through getting Skyvern up and running on your local machine.
@@ -72,20 +99,26 @@ pre-commit install

## Running your first automation

### Executing tasks (UI)
Once you have the UI running, you can start an automation by filling out the fields shown in the UI and clicking "Execute".

# How it works
Skyvern was inspired by the Task-Driven autonomous agent design popularized by [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) -- with one major difference: we give Skyvern the ability to interact with websites using browser automation libraries like [Playwright](https://playwright.dev/).
<p align="center">
<img src="images/skyvern_visualizer_run_task.png"/>
</p>

<picture>
<source media="(prefers-color-scheme: dark)" srcset="images/skyvern-system-diagram-dark.png"/>
<img src="images/skyvern-system-diagram-light.png"/>
</picture>
### Executing tasks (cURL)

```
curl -X POST -H 'Content-Type: application/json' -H 'x-api-key: {Your local API key}' -d '{
"url": "https://www.geico.com",
"webhook_callback_url": "",
"navigation_goal": "Navigate through the website until you generate an auto insurance quote. Do not generate a home insurance quote. If this page contains an auto insurance quote, consider the goal achieved",
"data_extraction_goal": "Extract all quote information in JSON format including the premium amount, the timeframe for the quote.",
"navigation_payload": "{Your data here}",
"proxy_location": "NONE"
}' http://0.0.0.0:8000/api/v1/tasks
```
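
For readers who would rather script the call than paste cURL, here is a minimal Python sketch of the same request using only the standard library. It assumes the endpoint, header names, and payload fields are exactly those in the cURL example above; `build_task_request` and `BASE_URL` are hypothetical names introduced for illustration, not part of Skyvern.

```python
import json
import urllib.request

# Assumption: base URL taken from the cURL example above.
BASE_URL = "http://0.0.0.0:8000/api/v1"


def build_task_request(
    api_key: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (without sending) the POST request that creates a task."""
    return urllib.request.Request(
        url=f"{base_url}/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Same fields as the cURL body; placeholders are left as placeholders.
payload = {
    "url": "https://www.geico.com",
    "webhook_callback_url": "",
    "navigation_goal": (
        "Navigate through the website until you generate an auto insurance quote. "
        "Do not generate a home insurance quote. If this page contains an auto "
        "insurance quote, consider the goal achieved"
    ),
    "data_extraction_goal": (
        "Extract all quote information in JSON format including the premium "
        "amount, the timeframe for the quote."
    ),
    "navigation_payload": "{Your data here}",
    "proxy_location": "NONE",
}

request = build_task_request("{Your local API key}", payload)

# To submit the task against a running local server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the headers and body before anything goes over the network.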

<!-- > TODO (suchintan):
Expand the diagram above to go deeper into how:
1. We draw bounding boxes
2. We parse the HTML + extract the image to generate an interactable element map
-->

# Real-world examples of Skyvern
<!-- > TODO (suchintan):
@@ -123,18 +156,6 @@ More extensive documentation can be found on our [documentation website](https:/

Our focus is bringing stability to browser-based workflows. We leverage LLMs to create an AI Agent capable of interacting with websites like you or I would — all via a simple API call.

Traditional approaches required writing custom scripts for websites, often relying on DOM parsing and XPath-based interactions which would break whenever the website layouts changed.

Skyvern operates like a human — increasing reliability by not relying on fragile scripts, instead relying on computer vision to parse items in the viewport and interact with them the way a human would.

This approach gives us a few advantages:

1. Skyvern can operate on websites it’s never seen before, as it’s able to map visual elements to actions necessary to complete a workflow, without any customized code
1. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
1. Skyvern is able to circumvent or navigate through many bot detection methods as many of them rely on allowing people to access the websites
1. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
    1. If you wanted to get an auto insurance quote from Geico, the answer to a common question “Were you eligible to drive at 18?” could be inferred from the driver receiving their license at age 16
    1. If you were doing competitor analysis, it’s understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)


# Feature Roadmap
Binary file added images/skyvern_visualizer_run_task.png
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -43,6 +43,8 @@ asyncache = "^0.3.1"
orjson = "^3.9.10"
structlog = "^23.2.0"
plotly = "^5.18.0"
clipboard = "^0.0.4"
curlify = "^2.2.1"


[tool.poetry.group.dev.dependencies]
@@ -66,6 +68,7 @@ notebook = "^7.0.6"
freezegun = "^1.2.2"
snoop = "^0.4.3"
rich = {extras = ["jupyter"], version = "^13.7.0"}
clipboard = "^0.0.4"


[build-system]
16 changes: 15 additions & 1 deletion streamlit_app/visualizer/api.py
@@ -1,7 +1,9 @@
import json
from typing import Any

import curlify
import requests
from requests import PreparedRequest

from skyvern.forge.sdk.schemas.tasks import TaskRequest

@@ -11,19 +13,31 @@ def __init__(self, base_url: str, credentials: str):
        self.base_url = base_url
        self.credentials = credentials

    def create_task(self, task_request_body: TaskRequest) -> str | None:
    # Returns the (url, payload, headers) triple shared by create_task and copy_curl.
    def generate_curl_params(self, task_request_body: TaskRequest) -> tuple[str, dict[str, Any], dict[str, str]]:
        url = f"{self.base_url}/tasks"
        payload = task_request_body.model_dump()
        headers = {
            "Content-Type": "application/json",
            "x-api-key": self.credentials,
        }

        return url, payload, headers

    def create_task(self, task_request_body: TaskRequest) -> str | None:
        url, payload, headers = self.generate_curl_params(task_request_body)

        response = requests.post(url, headers=headers, data=json.dumps(payload))
        if "task_id" not in response.json():
            return None
        return response.json()["task_id"]

    def copy_curl(self, task_request_body: TaskRequest) -> str:
        url, payload, headers = self.generate_curl_params(task_request_body)

        req = requests.Request("POST", url, headers=headers, data=json.dumps(payload, indent=4))

        return curlify.to_curl(req.prepare())

    def get_task(self, task_id: str) -> dict[str, Any] | None:
        """Get a task by id."""
        url = f"{self.base_url}/internal/tasks/{task_id}"
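
To make the `copy_curl` path concrete, the sketch below renders a cURL command from the same kind of `url`, `payload`, `headers` triple that `generate_curl_params` returns, using only the standard library. It illustrates the idea only: it is not curlify's implementation and makes no claim to match its exact output format. `render_curl` is a hypothetical helper.

```python
import json
import shlex


def render_curl(url: str, payload: dict, headers: dict[str, str]) -> str:
    """Render a shell-safe cURL command for a JSON POST request."""
    parts = ["curl", "-X", "POST"]
    for name, value in headers.items():
        parts += ["-H", f"{name}: {value}"]
    # Pretty-print the body, as copy_curl does with indent=4.
    parts += ["-d", json.dumps(payload, indent=4), url]
    # shlex.join quotes each argument so the result can be pasted into a shell.
    return shlex.join(parts)


command = render_curl(
    "http://0.0.0.0:8000/api/v1/tasks",
    {"url": "https://www.geico.com", "proxy_location": "NONE"},
    {"Content-Type": "application/json", "x-api-key": "{Your local API key}"},
)
print(command)
```

Quoting every argument matters here because the pretty-printed JSON body contains spaces, newlines, and double quotes that a shell would otherwise split or interpret.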
11 changes: 3 additions & 8 deletions streamlit_app/visualizer/sample_data.py
@@ -1,16 +1,11 @@
from pydantic import BaseModel
from skyvern.forge.sdk.schemas.tasks import TaskRequest


class SampleData(BaseModel):
class SampleTaskRequest(TaskRequest):
    name: str
    url: str
    navigation_goal: str
    data_extraction_goal: str
    navigation_payload: dict
    extracted_information_schema: dict


geico_sample_data = SampleData(
geico_sample_data = SampleTaskRequest(
    name="Geico",
    url="https://www.geico.com",
    navigation_goal="Navigate through the website until you generate an auto insurance quote. Do not generate a home insurance quote. If this page contains an auto insurance quote, consider the goal achieved",
14 changes: 13 additions & 1 deletion streamlit_app/visualizer/streamlit.py
@@ -1,3 +1,4 @@
import clipboard
import pandas as pd
import streamlit as st

@@ -104,15 +105,26 @@ def select_step(step: dict) -> None:
st.markdown(f"### **{select_env} - {select_org}**")
execute_tab, visualizer_tab = st.tabs(["Execute", "Visualizer"])


def copy_curl_to_clipboard(task_request_body: TaskRequest) -> None:
    clipboard.copy(client.copy_curl(task_request_body=task_request_body))


with execute_tab:
    example_tabs = st.tabs([supported_example.name for supported_example in supported_examples])

    for i, example_tab in enumerate(example_tabs):
        with example_tab:
            create_column, explanation_column = st.columns([1, 2])
            with create_column:
                run_task, copy_curl = st.columns([3, 1])
                task_request_body = supported_examples[i]
                # Bind the current example via kwargs; a lambda here would
                # late-bind to the loop variable and always copy the last example.
                copy_curl.button(
                    "Copy cURL",
                    on_click=copy_curl_to_clipboard,
                    kwargs={"task_request_body": task_request_body},
                )
                with st.form("task_form"):
                    st.markdown("## Run a task")
                    run_task.markdown("## Run a task")

                    example = supported_examples[i]
                    # Create all the fields to create a TaskRequest object
                    st_url = st.text_input("URL*", value=example.url, key="url")
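
Callbacks registered inside a loop, such as the "Copy cURL" button's `on_click`, are sensitive to Python's late-binding closures: a `lambda` looks up the loop variable when it is called, not when it is created. The sketch below shows the pitfall and one way around it in plain Python, with no Streamlit involved. `describe` and the example names are hypothetical stand-ins for the clipboard helper and the sample task requests.

```python
from functools import partial


def describe(name: str) -> str:
    """Stand-in for a callback that acts on one specific example."""
    return f"copied {name}"


names = ["example-a", "example-b"]  # hypothetical example names

late_bound = []
early_bound = []
for name in names:
    # The lambda reads `name` only when called, after the loop has finished.
    late_bound.append(lambda: describe(name))
    # partial captures the current value of `name` immediately.
    early_bound.append(partial(describe, name))

late_results = [callback() for callback in late_bound]
early_results = [callback() for callback in early_bound]

print(late_results)   # ['copied example-b', 'copied example-b']
print(early_results)  # ['copied example-a', 'copied example-b']
```

In Streamlit, the `args` and `kwargs` parameters of `st.button` do the same job as `partial` here, because the value is bound when the widget is registered.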
