Skip to content

sft阶段:Exception: image start token != image end tokens #96

@Yukang-Lin

Description

@Yukang-Lin

您好,我尝试基于agent-cpm进行sft,完全按照readme的构造格式,我的数据如下:
{'id': '0', 'image': 'GUI_Agent/data/data_source/ac/screenshot_10115_0.png', 'conversations': [{'role': 'system', 'content': '# Role\n你是一名熟悉安卓系统触屏GUI操作的智能体,将根据用户的问题,分析当前界面的GUI元素和布局,生成相应的操作。\n\n# Task\n针对用户问题,根据输入的当前屏幕截图,输出下一步的操作。\n\n# Rule\n- 以紧凑JSON格式输出\n- 输出操作必须遵循Schema约束\n\n# Schema\n{"type":"object","description":"执行操作并决定当前任务状态","additionalProperties":false,"required":["thought"],"properties":{"thought":{"type":"string","description":"智能体的思维过程"},"POINT":{"$ref":"#/$defs/Location","description":"点击屏幕上的指定位置"},"to":{"description":"移动,组合手势参数","oneOf":[{"enum":["up","down","left","right"],"description":"从当前点(POINT)出发,执行滑动手势操作,方向包括向上、向下、向左、向右"},{"$ref":"#/$defs/Location","description":"移动到某个位置"}]},"duration":{"type":"integer","description":"动作执行的时间或等待时间,毫秒","minimum":0,"default":200},"PRESS":{"type":"string","description":"触发特殊按键,HOME为回到主页按钮,BACK为返回按钮,ENTER为回车按钮","enum":["HOME","BACK","ENTER"]},"TYPE":{"type":"string","description":"输入文本"},"INTERACT":{"type":"string","description":"与用户发起交互的内容"},"STATUS":{"type":"string","description":"当前任务的状态。特殊情况:satisfied,无需操作;impossible,任务无法完成;interrupt,任务中断;need_feedback,需要用户反馈;","enum":["continue","finish","satisfied","impossible","interrupt","need_feedback"],"default":"continue"}},"$defs":{"Location":{"type":"array","description":"坐标为相对于屏幕左上角位原点的相对位置,并且按照宽高比例缩放到0~1000,数组第一个元素为横坐标x,第二个元素为纵坐标y","items":{"type":"integer","minimum":0,"maximum":1000},"minItems":2,"maxItems":2}}}'}, {'role': 'user', 'content': '<Question>Open the California Pizza app, then add a Miami Beast pizza in large size with a thin crust and make sure it is gluten-free, and add to the cart.</Question>\n当前屏幕截图:<image>'}, {'role': 'assistant', 'content': '{\'thought\': "I am given the task to add a Miami Beast pizza with large size, thin crust, and ensure it is gluten-free. However, in many pizza ordering contexts, thin crust and gluten-free crust are often mutually exclusive options—thin crust typically contains gluten, while gluten-free crust is a separate, specific type. The wording \'thin crust\' and \'gluten-free\' in the same request creates a direct conflict because they may not be simultaneously selectable. This requires confirmation to prioritize one constraint over the other, as proceeding without clarification risks violating the user\'s intent.", \'STATUS\': \'continue\', \'INTERACT\': \'Should I prioritize the thin crust or ensure the pizza is gluten-free?\'}'}], 'episode_id': 87, 'step_id': 0}

报错来自conversation_to_ids的

if len(image_start_tokens) != len(image_end_tokens):
        logger.error("image start token != image end tokens")
        raise Exception(f"image start token != image end tokens, ({len(image_start_tokens)},{len(image_end_tokens)})")

报错Exception: image start token != image end tokens, (4,3),使用的LLM_TYPE="qwen"

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions