Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 4-char element ids instead of sequential integers #361

Merged
merged 8 commits into from
Jun 3, 2024

Conversation

alexzarbn
Copy link
Contributor

Problem

This issue with element ids is tightly related to the iframes support. When element tree is built by injecting domUtils.js into each iframe, the resulting ids are being messed up, because at first the page element tree is built and then element tree of each iframe on that page, resulting in iframe children ids duplicating some ids of elements on the page.

Solution

Use random 4-char ids for elements. This way the we can manipulate the tree without worrying about uniqueness of the element ids.

@alexzarbn alexzarbn mentioned this pull request May 24, 2024
Copy link
Contributor

@ykeremy ykeremy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Tested on my local as well, worked really well. I've couple of small comments.

skyvern/webeye/scraper/domUtils.js Outdated Show resolved Hide resolved
skyvern/webeye/scraper/domUtils.js Show resolved Hide resolved
skyvern/webeye/scraper/domUtils.js Outdated Show resolved Hide resolved
@ykeremy
Copy link
Contributor

ykeremy commented May 26, 2024

There are some prettier and mypy errors as well, would be perfect if you could take a look at those.

@alexzarbn
Copy link
Contributor Author

alexzarbn commented May 28, 2024

@ykeremy

There are some prettier and mypy errors as well, would be perfect if you could take a look at those.

Fixed those, but for this mypy error I had to add unique ids to options too. Not sure, if this is the right course of action in this case.

skyvern/webeye/actions/handler.py:487: error: Invalid index type "int" for "dict[str, str]"; expected type "str"  [index]

Please met know, what was the intention behind mapping index -> id in the prompt, whether it was a typo (also considering that there is no optionIndex included in the prompt as far as I can see) or it was intentional, and whether including options ids in the prompt is okay. Perhaps the implementation should rely solely on option indexes and match by xpath could be removed?

"index": int, // the id corresponding to the optionIndex under the the select element.

@alexzarbn alexzarbn requested a review from ykeremy May 28, 2024 07:44
"value": str // the value of the option. MAKE SURE YOU USE THIS VALUE TO SELECT THE OPTION. DO NOT PUT ANYTHING OTHER THAN A VALID OPTION VALUE HERE
"id": str // the id of the option.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the option.id to prevent LLM hallucinating.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can remove this. I don't think we're even passing the option id to the LLM.

@@ -484,7 +484,7 @@ async def handle_select_option_action(
return [ActionFailure(e)]

try:
option_xpath = scraped_page.id_to_xpath_dict[action.option.index]
option_xpath = scraped_page.id_to_xpath_dict[action.option.id]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is a legacy bug. We didn't store the option element as a single key in the scraped_page.id_to_xpath_dict. So the option.id here will never be found.

id_to_xpath_dict = {}
id_to_element_dict = {}
for element in elements:
element_id = element["id"]
# get_interactable_element_tree marks each interactable element with a unique_id attribute
id_to_xpath_dict[element_id] = f"//*[@{SKYVERN_ID_ATTR}='{element_id}']"
id_to_element_dict[element_id] = element

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah this is a big bug 😮 We should stop using id_to_xpath dict and deprecate it IMO

@LawyZheng
Copy link
Collaborator

this can trigger an error. Can you remove this?

element_id = int(element_id)

@LawyZheng
Copy link
Collaborator

LawyZheng commented May 31, 2024

@alexzarbn

Please met know, what was the intention behind mapping index -> id in the prompt, whether it was a typo (also considering that there is no optionIndex included in the prompt as far as I can see) or it was intentional, and whether including options ids in the prompt is okay. Perhaps the implementation should rely solely on option indexes and match by xpath could be removed?

"index": int, // the id corresponding to the optionIndex under the the select element.

nice catch. this is a legacy problem after we finished the HTML element tree.
can you also fix this in the extract-action-claude3-sonnet.j2 file?

@LawyZheng
Copy link
Collaborator

after you finish all suggestions, can you solve the conflict?
and we can start the testing.

@ykeremy
Copy link
Contributor

ykeremy commented Jun 3, 2024

@LawyZheng we took over this PR, feel free to merge this on a new branch and then take over.

"text": str, // Text for INPUT_TEXT action only
"file_url": str, // The url of the file to upload if applicable. This field must be present for UPLOAD_FILE but can also be present for CLICK only if the click is to upload the file. It should be null otherwise.
"download": bool, // Can only be true for CLICK actions. If true, the browser will trigger a download by clicking the element. If false, the browser will click the element without triggering a download.
"option": { // The option to select for SELECT_OPTION action only. null if not SELECT_OPTION action
"label": str, // the label of the option if any. MAKE SURE YOU USE THIS LABEL TO SELECT THE OPTION. DO NOT PUT ANYTHING OTHER THAN A VALID OPTION LABEL HERE
"index": int, // the id corresponding to the optionIndex under the the select element.
"index": int, // the index corresponding to the index under the select element.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"index": int, // the index corresponding to the index under the select element.
"index": int, // the index corresponding to the option index under the select element.

We add a specific option index field to the element tree. Let's keep this wording the same.

"value": str // the value of the option. MAKE SURE YOU USE THIS VALUE TO SELECT THE OPTION. DO NOT PUT ANYTHING OTHER THAN A VALID OPTION VALUE HERE
"id": str // the id of the option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can remove this. I don't think we're even passing the option id to the LLM.

@@ -93,9 +93,10 @@ class SelectOption(BaseModel):
label: str | None
value: str | None
index: int | None
id: str | None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LawyZheng let's remove this too

@@ -484,7 +484,7 @@ async def handle_select_option_action(
return [ActionFailure(e)]

try:
option_xpath = scraped_page.id_to_xpath_dict[action.option.index]
option_xpath = scraped_page.id_to_xpath_dict[action.option.id]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah this is a big bug 😮 We should stop using id_to_xpath dict and deprecate it IMO

@LawyZheng LawyZheng merged commit d5abfc9 into Skyvern-AI:main Jun 3, 2024
2 checks passed
jesalg pushed a commit to Thoughtful-Automation/skyvern that referenced this pull request Jun 13, 2024
Co-authored-by: LawyZheng <lawyzheng1106@gmail.com>
jesalg pushed a commit to Thoughtful-Automation/skyvern that referenced this pull request Jun 13, 2024
Co-authored-by: LawyZheng <lawyzheng1106@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants