Some sensor-selection scenarios expect failure modes instead of sensors #311

@lentil32

Hi, I noticed two sensor-selection scenarios where the prompt asks the agent to list sensors, but characteristic_form describes the expected answer as one or more failure modes.

Because characteristic_form is used as expected behavior during evaluation, this could make a correct sensor-list answer look wrong to the judge.

Source checked:

Affected rows:

| id | type | Prompt asks for | characteristic_form currently expects |
|---|---|---|---|
| 111 | FMSA | sensors of Chiller 6 potentially relevant to Compressor Overheating | one or more failure modes of Chiller 6 |
| 609 | multiagent | sensors of Chiller 6 at MAIN site potentially relevant to Compressor Overheating | one or more failure modes of Chiller 6 |

Nearby rows suggest these should use sensor-oriented expected behavior:

  • id 112 asks which sensor should be prioritized for compressor overheating and expects sensors from the Chiller 6 sensor list.
  • id 610 is the multiagent version of that same sensor-prioritization task.
  • The project guideline also lists "List all sensors of Chiller 6 that are potentially relevant to Compressor Overheating." under Sensor-Failure Mode Mapping examples.

Possible fix:

  • Update ids 111 and 609 so characteristic_form expects Chiller 6 sensor names, not failure modes.
  • A replacement could use the same installed Chiller 6 sensor list already used by ids 112 and 610.
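To find any other rows with the same mismatch, a quick scan along these lines might help. This is only a sketch: the `text` field name, the sample rows, and the "whichever term appears first" heuristic are my assumptions, not the actual scenario schema or an existing project tool.

```python
# Hypothetical rows for illustration; real field names and wording may differ.
SAMPLE_ROWS = [
    {
        "id": 111,
        "text": "List all sensors of Chiller 6 that are potentially relevant to Compressor Overheating.",
        "characteristic_form": "The expected response should be one or more failure modes of Chiller 6.",
    },
    {
        "id": 112,
        "text": "Which sensor should be prioritized for Compressor Overheating?",
        "characteristic_form": "The expected response should be one or more sensors of Chiller 6.",
    },
]

KINDS = ("sensor", "failure mode")


def first_kind(text):
    """Guess the subject of a sentence from whichever term appears first."""
    lower = text.lower()
    found = {kind: lower.find(kind) for kind in KINDS if kind in lower}
    return min(found, key=found.get) if found else None


def find_mismatches(rows):
    """Return ids where the prompt and characteristic_form name different kinds."""
    mismatched = []
    for row in rows:
        asked = first_kind(row["text"])
        expected = first_kind(row["characteristic_form"])
        if asked and expected and asked != expected:
            mismatched.append(row["id"])
    return mismatched


mismatched_ids = find_mismatches(SAMPLE_ROWS)
```

The heuristic is crude and will likely need manual review of whatever it flags, but it should be enough to confirm whether ids 111 and 609 are the only affected rows.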

Minor adjacent cleanup: ids 112, 113, 610, and 611 already have sensor-oriented expected forms, but use the phrase "one of more sensors"; I think that should be "one or more sensors".

While checking related task configs, I also noticed possible candidate-list cleanup items in data/task/failure_mapping_senarios.jsonl. Ids 6 and 12 ask for failure modes, but some entries in their candidate failure-mode lists look like sensor/measurement terms (speed, power, pressure or vacuum, oil debris, etc.). That may be worth auditing separately.
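For that separate audit, a first pass could flag candidate entries that exactly match known measurement terms. Again a sketch only: the term set is seeded from the examples above, and the candidate list shown is made up for illustration, not copied from ids 6 or 12.

```python
# Seeded from the terms noted above; extend as the audit turns up more.
MEASUREMENT_TERMS = {"speed", "power", "pressure or vacuum", "oil debris"}


def measurement_like(candidates, terms=MEASUREMENT_TERMS):
    """Return candidates whose normalized text exactly matches a measurement term."""
    normalized = {term.lower() for term in terms}
    return [c for c in candidates if c.strip().lower() in normalized]


# Hypothetical candidate failure-mode list, for illustration only.
candidates = ["Compressor Overheating", "Speed", "oil debris", "Bearing wear"]
flagged = measurement_like(candidates)
```

Exact matching keeps false positives low; entries that merely contain a measurement word would still need a manual look.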
