Dear Dr. Tongyu Lu, Prof. Dorien Herremans, and the AMAAI Lab Team,
I am a senior undergraduate student majoring in Computer Science and Technology at Zhejiang University, trying to reproduce the out-of-domain evaluation on the MCIC dataset (Table 4) from your paper.
I strictly followed the paper:
- Rendered the 116 MIDI pairs to 24 kHz mono WAV using the Musyng Kite SoundFont (Section 3.3).
- Used the exact default configuration: 10s window, 10s hop.
- Used the official checkpoint siamese_net_20250328.ckpt from Hugging Face.
- Called aggregate_decision_matrix from test.py with the paper's recommended thresholds (proportion_thres=0.4, decision_thres=0.99).
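To make my windowing assumption explicit, here is a minimal sketch of how I understand the 10s window / 10s hop segmentation on 24 kHz mono audio. The helper name split_into_windows is hypothetical (it is not taken from the repository), and dropping the trailing partial window is my own assumption, so the official code may well differ.

```python
import numpy as np

def split_into_windows(audio, sample_rate=24000, window_s=10.0, hop_s=10.0):
    """Split a 1-D mono signal into fixed-length windows.

    Hypothetical helper for illustration only; assumes that a trailing
    segment shorter than one window is dropped rather than padded.
    """
    window = int(window_s * sample_rate)  # samples per window
    hop = int(hop_s * sample_rate)        # samples between window starts
    if len(audio) < window:
        # Too short for even one full window.
        return np.empty((0, window), dtype=audio.dtype)
    starts = range(0, len(audio) - window + 1, hop)
    return np.stack([audio[s:s + window] for s in starts])

# A 35-second clip yields three full windows; the last 5 seconds are dropped.
clip = np.zeros(35 * 24000, dtype=np.float32)
windows = split_into_windows(clip)
```

If the official pipeline pads or keeps the final partial window instead, the number of segments per track would differ from this sketch, which might affect the aggregated decision.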
However, the best Precision I obtained is only 0.2929 (F1-score significantly lower than the reported 0.73), and I cannot reproduce Table 4.
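In case the gap comes from how the metrics are computed, these are the standard definitions I am assuming. This is only a sketch: the function name precision_recall_f1 and the example counts are hypothetical and are not my actual results or the repository's evaluation code.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from raw counts.

    Hypothetical helper for illustration; returns 0.0 for any metric
    whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Illustrative counts only (not taken from my experiment).
p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=2)
```

If Table 4 uses a different averaging scheme or a different definition of a positive pair, that alone could explain part of the difference.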
Reproduction details
I have prepared a detailed Markdown document with the full pipeline, directory structure, and core code (especially the model loading and inference parts).
pipeline.md
Could you please take a look and let me know if I missed any important step (e.g., audio preprocessing details, MERT layer selection, checkpoint usage, or threshold settings)?
I would greatly appreciate any guidance to help me successfully reproduce the reported F1-score.
Thank you very much for your time and for this excellent work!