Skip to content

Understanding processing of Mind2Web dataset for Lumos grounding #5

@DanielRoeder1

Description

@DanielRoeder1

Hello,

I am trying to map the Lumos WebAgent grounding dataset onto the original Mind2Web dataset. Unfortunetly the ids (annotation_id, action_uid) were removed in the Lumos version but via query extraction and matching I can match 1001/1009 samples to their corresponding Mind2Web entries.

But the problem that I am facing now is that Lumos must have done some processing on the actions itself. Lumos appears to have sometimes more, sometimes less actions (i.e. user msgs defining a grounding sentence). Why is this the case? Which processing was applied?

For my work I need a mapping of the Lumos grounding steps (that is the user msgs in the Lumos dataset) to the html_source code found in Mind2Web.

Happy to receive and guidance or advice and thanks for the great open-source work!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions