Question about the agent roles and prompts used for MATH500 and AIME experiments #4

@iieao-i

Hi, thank you for open-sourcing this great work!

I am trying to reproduce the MATH500 and AIME experiments in KVComm. I read the paper and the repository, and I found that the appendix provides prompt designs for GSM8K, MMLU, and HumanEval. However, I could not find a separate prompt design for the MATH500 and AIME experiments reported in the appendix/table.

I would like to ask a few questions:

  1. For the MATH500 and AIME experiments, did you directly reuse the GSM8K-style MathSolver prompts?
  2. When using 2, 3, or 4 collaborating agents on MATH500/AIME, what were the exact agent roles?
    • For example, was it:
      • 2 agents: Math Solver + Mathematical Analyst
      • 3 agents: Math Solver + Mathematical Analyst + Programming Expert
      • 4 agents: Math Solver + Mathematical Analyst + Programming Expert + Inspector
      • plus a final FinalRefer agent for answer aggregation?
  3. Was the Programming Expert agent enabled for MATH500/AIME, and was its Python code actually executed during inference?
  4. Did you modify the final-answer format for AIME, e.g., requiring the final answer to be an integer from 0 to 999?
  5. Could you share the exact prompt/config/script used to run the MATH500 and AIME experiments, if available?

The reason I am asking is that I am trying to evaluate KV reuse on mathematical reasoning benchmarks such as MATH500 and AIME, and I want to make sure my agent configuration and answer extraction are consistent with the original experiments.
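To make question 4 concrete, here is a minimal sketch of the kind of AIME answer extraction I have in mind. The function name `extract_aime_answer` and the `\boxed{...}` convention are my own assumptions for illustration; they are not taken from the KVComm paper or repository, so please correct me if the original experiments used a different format.

```python
import re
from typing import Optional


def extract_aime_answer(text: str) -> Optional[int]:
    """Return the last \\boxed{...} integer in the range 0-999, or None.

    Assumption (hypothetical): the final agent wraps its answer in \\boxed{...}.
    """
    # \d{1,3} only matches one to three digits, so values above 999 are rejected.
    matches = re.findall(r"\\boxed\{\s*(\d{1,3})\s*\}", text)
    if not matches:
        return None
    # Use the last boxed value, since earlier ones may be intermediate results.
    return int(matches[-1])
```

With this sketch, a response ending in `\boxed{073}` would be scored as `73`, and a response with no boxed integer (or a value above 999) would be treated as having no valid answer.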

Thanks a lot!
