Hello,
I am trying to map the Lumos WebAgent grounding dataset back onto the original Mind2Web dataset. Unfortunately, the ids (annotation_id, action_uid) were removed in the Lumos version, but by extracting the task queries and matching them against Mind2Web I can match 1001/1009 samples to their corresponding Mind2Web entries.
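For reference, this is roughly how I do the matching. It is only a sketch; the field names (`confirmed_task` on the Mind2Web side, `messages` with `role`/`content` on the Lumos side) are my assumptions about the released schemas and may need adjusting:

```python
import re

def normalize(text):
    # Lowercase and collapse whitespace so minor formatting differences do not break the match.
    return re.sub(r"\s+", " ", text.strip().lower())

def extract_lumos_query(lumos_sample):
    # Assumption: the task description appears on the first line of the first user message.
    first_user_msg = next(m["content"] for m in lumos_sample["messages"] if m["role"] == "user")
    return first_user_msg.split("\n", 1)[0]

def match_lumos_to_mind2web(lumos_samples, mind2web_samples):
    # Index Mind2Web entries by their normalized task query (assumed field: confirmed_task).
    index = {normalize(s["confirmed_task"]): s for s in mind2web_samples}
    matched, unmatched = {}, []
    for i, lumos_sample in enumerate(lumos_samples):
        key = normalize(extract_lumos_query(lumos_sample))
        if key in index:
            matched[i] = index[key]
        else:
            unmatched.append(i)
    return matched, unmatched
```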
The problem I am facing now is that Lumos must have applied some processing to the actions themselves: a Lumos sample sometimes contains more, sometimes fewer actions (i.e. user messages defining a grounding sentence) than the matched Mind2Web entry. Why is this the case? Which processing was applied?
For my work I need a mapping from the Lumos grounding steps (that is, the user messages in the Lumos dataset) to the HTML source code found in Mind2Web.
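Concretely, the mapping I am after would look roughly like the sketch below. The field names (e.g. raw_html) are my assumptions about the Mind2Web schema, and the per-step alignment is exactly the part I cannot reconstruct, because the action counts differ:

```python
from typing import Dict, List, TypedDict

class GroundingStepMapping(TypedDict):
    lumos_user_msg: str       # one grounding sentence (user message) from the Lumos dialogue
    mind2web_action_uid: str  # the Mind2Web action it should correspond to
    html_source: str          # the HTML of that action, e.g. raw_html / cleaned_html

# Desired result: one list of step mappings per matched Lumos sample, keyed by sample index.
Mapping = Dict[int, List[GroundingStepMapping]]
```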
Happy to receive any guidance or advice, and thanks for the great open-source work!