[Enhancement]: N1 Task Instruction with Notification #179

@clairetsai1222

What feature or enhancement are you proposing?

Thank you so much for generously providing the entire training dataset! I believe there's an aspect of the N1 sub-dataset that could be further optimized to make it even more usable.

During my data inspection, I noticed an inconsistency in the ordering of task entries between two files:

In the file
InternData-N1-mini\vln_n1\traj_data\matterport3d_d435i\B6ByNegPMKs\trajectory_14\meta\tasks.jsonl,
the task instructions are structured as follows:

{"task_index": 0, "task": "{\"sub_instruction\": \"Walk straight ahead, passing the black office chair on your left and the whiteboard on your right. Stop at the end of the corridor where the wall meets the floor.\", \"sub_indexes\": [0, 49], \"revised_sub_instruction\": \"March forward with purpose, gliding past the sleek obsidian chair to your left and the chalk-clad board to your right. Arrive at the corridor’s terminus where the wall folds into the floor, signaling the endpoint.\"}"}
{"task_index": 1, "task": "{\"sum_instruction\": \"March forward with purpose, gliding past the sleek obsidian chair to your left and the chalk-clad board to your right. Arrive at the corridor's terminus where the wall folds into the floor, signaling the endpoint.\", \"sum_indexes\": [0, 49]}"}

Whereas in the file
InternData-N1-mini\vln_n1\traj_data\matterport3d_d435i\1LXtFkjw3qL\trajectory_2\meta\tasks.jsonl,
the order is reversed:

{"task_index": 0, "task": "{\"sum_instruction\": \"Maintain a straight course, with the sleek ebony chair drifting by to your left and flowing ivory drapes swaying on the right—halt where the hallway meets the gentle curve of the staircase.\", \"sum_indexes\": [0, 121]}"}
{"task_index": 1, "task": "{\"sub_instruction\": \"Walk straight ahead, passing the black armchair on your left and the white curtains on your right. Stop at the end of the hallway where the staircase begins.\", \"sub_indexes\": [0, 121], \"revised_sub_instruction\": \"Maintain a straight course, with the sleek ebony chair drifting by to your left and flowing ivory drapes swaying on the right—halt where the hallway meets the gentle curve of the staircase.\"}"}

Specifically, the entry with task_index 0 is the "sub_instruction" entry in the first file but the "sum_instruction" entry in the second. This inconsistent ordering may negatively affect fine-tuning pipelines that assume task_index maps to the same instruction type across scenes.

It would be very helpful if this inconsistency could be explicitly noted in the dataset documentation, enabling future users to perform appropriate preprocessing and avoid potential issues during training.
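
For anyone who runs into this before the documentation is updated, here is a minimal sketch of one possible preprocessing step: group the entries by which key they contain instead of relying on task_index. The helper names (classify_task, load_tasks_by_kind) are hypothetical and not part of any dataset tooling. The sketch assumes every "task" value is a JSON-encoded string containing either a "sum_instruction" or a "sub_instruction" key, as in the two files above; I have not verified that this holds for the whole dataset.

```python
import json


def classify_task(task_json: str) -> str:
    """Return "sum", "sub", or "unknown" based on the keys of the nested JSON."""
    payload = json.loads(task_json)  # the "task" field is itself a JSON string
    if "sum_instruction" in payload:
        return "sum"
    if "sub_instruction" in payload:
        return "sub"
    return "unknown"


def load_tasks_by_kind(lines):
    """Group tasks.jsonl entries by instruction kind, ignoring their order."""
    by_kind = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        kind = classify_task(entry["task"])
        record = {"task_index": entry["task_index"], **json.loads(entry["task"])}
        by_kind.setdefault(kind, []).append(record)
    return by_kind


# Shortened placeholder entries in the "reversed" order (sum first, then sub).
sample_lines = [
    json.dumps({"task_index": 0, "task": json.dumps(
        {"sum_instruction": "Maintain a straight course.", "sum_indexes": [0, 121]})}),
    json.dumps({"task_index": 1, "task": json.dumps(
        {"sub_instruction": "Walk straight ahead.", "sub_indexes": [0, 121],
         "revised_sub_instruction": "Maintain a straight course."})}),
]

tasks = load_tasks_by_kind(sample_lines)
summary_text = tasks["sum"][0]["sum_instruction"]
sub_steps = [t["sub_instruction"] for t in tasks["sub"]]
```

For a real file, the same function can be called as load_tasks_by_kind(open(path, encoding="utf-8")). Looking entries up by kind means the result is the same whichever entry happens to have task_index 0.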

Motivation

My intention is to help future users of the N1 sub-dataset apply it more effectively. (This might be a bug; I'm not sure, so I'm posting it under the Enhancement category.)

Additional information

No response
