Evaluate on OSWorld #642

Open
abrichr opened this issue Apr 29, 2024 · 0 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments


abrichr commented Apr 29, 2024

Feature request

We would like to test OpenAdapt's ability to perform the tasks in https://os-world.github.io/.

This may involve creating recordings of the tasks described in the benchmark, since (as per https://github.com/xlang-ai/OSWorld/tree/main/evaluation_examples) the data samples are formatted as:

{
    "id": "uid", # unique id
    "snapshot": "snapshot_id", # the snapshot id of the environment, with some data already there and apps already opened, or just the desktop
    "instruction": "natural_language_instruction", # the natural language instruction of the task, i.e. what we want the agent to do
    "source": "website_url", # where this example comes from: a forum, a website, or a paper
    "config": {xxx}, # the scripts that set up the initial state of the task, e.g. the download and open-files actions
    "trajectory": "trajectory_directory", # the trajectory directory, which contains the action sequence file, the screenshots, and the recording video
    "related_apps": ["app1", "app2", ...], # the related apps, which are opened during the task
    "evaluator": "evaluation_dir", # the directory of the evaluator, which contains the evaluation script for this example
…
}

The ./trajectories directory contains the annotated trajectories for completing the task for each data item in ./examples.

Unfortunately, these trajectories do not appear to be included in the repo. Completing this evaluation may therefore involve manually re-creating them via openadapt.record.
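For illustration, here is a minimal sketch of parsing an OSWorld-style example into the fields a recording run would need. It uses only the standard-library json module; the placeholder values and the load_example helper are hypothetical, not part of OSWorld or OpenAdapt:

```python
import json

# A minimal OSWorld-style example mirroring the fields shown above.
# The values are placeholders, not a real benchmark item.
example_json = """
{
    "id": "uid",
    "snapshot": "snapshot_id",
    "instruction": "natural_language_instruction",
    "source": "website_url",
    "trajectory": "trajectory_directory",
    "related_apps": ["app1", "app2"],
    "evaluator": "evaluation_dir"
}
"""

def load_example(raw: str) -> dict:
    """Parse one example and return the fields relevant to driving a recording."""
    example = json.loads(raw)
    return {
        # The natural-language task description an agent (or human recorder) would follow.
        "instruction": example["instruction"],
        # Apps that should be open during the task; may be absent in some examples.
        "related_apps": example.get("related_apps", []),
        # Where the annotated trajectory would live, if it were included.
        "trajectory": example.get("trajectory"),
    }

task = load_example(example_json)
print(task["instruction"])
```

Each parsed instruction could then serve as the task description for a manual openadapt.record session.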

Motivation

Evaluation
