Evaluate on OSWorld #642

Open
abrichr opened this issue Apr 29, 2024 · 0 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments


abrichr commented Apr 29, 2024

Feature request

We would like to test OpenAdapt's ability to perform the tasks in https://os-world.github.io/.

This may involve creating recordings of the tasks described in the benchmark, since (as per https://github.com/xlang-ai/OSWorld/tree/main/evaluation_examples) the data samples are formatted as:

{
    "id": "uid", # unique id
    "snapshot": "snapshot_id", # the snapshot id of the environment, with some data already there and apps already opened, or just the desktop
    "instruction": "natural_language_instruction", # the natural language instruction of the task, i.e. what we want the agent to do
    "source": "website_url", # where this example comes from: a forum, a website, or a paper
    "config": {xxx}, # the scripts that set up the initial state of the task, e.g. the download and open-files actions
    "trajectory": "trajectory_directory", # the trajectory directory, which contains the action sequence file, the screenshots, and the recording video
    "related_apps": ["app1", "app2", ...], # the related apps, which are opened during the task
    "evaluator": "evaluation_dir", # the directory of the evaluator, which contains the evaluation script for this example
…
}

The ./trajectories directory contains the annotated trajectories for completing the task for each data item in ./examples.

Unfortunately, these trajectories do not appear to be included in the repo. Completing this evaluation may therefore involve manually re-creating them via openadapt.record.
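For illustration, here is a minimal sketch of parsing an OSWorld-style example into the fields a recording run would need. It uses only the standard-library json module; the placeholder values and the load_example helper are hypothetical, not part of OSWorld or OpenAdapt:

```python
import json

# A minimal OSWorld-style example mirroring the fields shown above.
# The values are placeholders, not a real benchmark item.
example_json = """
{
    "id": "uid",
    "snapshot": "snapshot_id",
    "instruction": "natural_language_instruction",
    "source": "website_url",
    "trajectory": "trajectory_directory",
    "related_apps": ["app1", "app2"],
    "evaluator": "evaluation_dir"
}
"""

def load_example(raw: str) -> dict:
    """Parse one example and return the fields relevant to driving a recording."""
    example = json.loads(raw)
    return {
        # The natural-language task description an agent (or human recorder) would follow.
        "instruction": example["instruction"],
        # Apps that should be open during the task; may be absent in some examples.
        "related_apps": example.get("related_apps", []),
        # Where the annotated trajectory would live, if it were included.
        "trajectory": example.get("trajectory"),
    }

task = load_example(example_json)
print(task["instruction"])
```

Each parsed instruction could then serve as the task description for a manual openadapt.record session.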

Motivation

Evaluation
