First of all, thank you for open-sourcing this work. I found the codebase very easy to follow, and I was able to run the full pipeline end-to-end without much difficulty.
I am trying to reproduce the reported BIRD-dev result for Qwen3-Coder-30B-A3B.
Using the released pipeline, I obtained 71.82 EX (1093 / 1533), while the README reports 73.5 EX for the public file results/bird-dev/qwen3-coder-30b-a3b.json.
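For reference, here is a minimal sketch of the arithmetic behind these figures, assuming EX is simply correct / total expressed as a percentage. The function name `execution_accuracy` is only illustrative and is not taken from the released code. Note that 1093 / 1533 by itself works out to about 71.30, so the count and the percentage quoted above may not share the same denominator, which is worth double-checking.

```python
def execution_accuracy(correct: int, total: int) -> float:
    """Return execution accuracy (EX) as a percentage of correct predictions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts from my run and the EX value quoted in the README.
my_ex = execution_accuracy(1093, 1533)
reported_ex = 73.5
gap_points = reported_ex - my_ex

print(f"EX from counts: {my_ex:.2f}")
print(f"Gap to README:  {gap_points:.2f} points")
```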
To make debugging easier, I am attaching:
- config-bird-qwen3-coder-dev.toml.txt
- qwen3-coder-dev.json
Would you mind helping me identify where my reproduction may differ from your setup?
In particular, I would like to know whether there are any important differences in:
- the intended runtime/configuration for the pipeline,
- or any unreleased preprocessing / schema / retrieval settings that are needed to match the reported result.
Any guidance would be greatly appreciated.
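If it helps, a per-example comparison along the following lines could narrow down where the two runs diverge. This is only a sketch: it assumes each result file can be reduced to a mapping from question ID to a correct/incorrect flag. I am not assuming anything about the actual JSON layout, so the loading step is left out, and `diff_outcomes` and the toy IDs below are hypothetical.

```python
def diff_outcomes(mine: dict[str, bool], reference: dict[str, bool]) -> dict[str, list[str]]:
    """Compare per-question correctness between two runs.

    Both arguments map a question ID to True (correct) or False (incorrect).
    """
    shared = sorted(mine.keys() & reference.keys())
    return {
        # Correct in the reference run but wrong in mine.
        "regressed": [q for q in shared if reference[q] and not mine[q]],
        # Wrong in the reference run but correct in mine.
        "improved": [q for q in shared if mine[q] and not reference[q]],
        # Questions present in only one of the two runs.
        "missing_in_mine": sorted(reference.keys() - mine.keys()),
        "missing_in_reference": sorted(mine.keys() - reference.keys()),
    }


# Toy data for illustration only; these are not real BIRD-dev outcomes.
mine = {"q1": True, "q2": False, "q3": True}
reference = {"q1": True, "q2": True, "q3": False, "q4": True}
report = diff_outcomes(mine, reference)
print(report)
```

A non-empty `missing_in_mine` or `missing_in_reference` list would point to a difference in the evaluated example set rather than in model behavior.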