
[Question] Cannot reproduce MCIC results with official 10s/10s config (F1 much lower than Table 4) #2

@TheLunarD

Dear Dr. Tongyu Lu, Prof. Dorien Herremans, and the AMAAI Lab Team,
I am a senior undergraduate student majoring in Computer Science and Technology at Zhejiang University, trying to reproduce the out-of-domain evaluation on the MCIC dataset (Table 4) from your paper.

I strictly followed the paper:

  • Rendered the 116 MIDI pairs to 24 kHz mono WAV using the Musyng Kite SoundFont (Section 3.3).
  • Used the exact default configuration: 10s window, 10s hop.
  • Used the official checkpoint siamese_net_20250328.ckpt from Hugging Face.
  • Called aggregate_decision_matrix from test.py with the paper's recommended thresholds (proportion_thres=0.4, decision_thres=0.99).
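
To make the windowing setup in the list above concrete, here is a minimal sketch of how a 10 s window with a 10 s hop maps onto sample indices at 24 kHz. The helper `window_bounds` is hypothetical and not taken from the repository, and it assumes that a trailing remainder shorter than one full window is dropped, which may differ from the official implementation.

```python
SAMPLE_RATE = 24_000  # 24 kHz mono, as in the rendering step
WINDOW_SEC = 10.0     # window length in seconds
HOP_SEC = 10.0        # hop length in seconds (equal to the window, so no overlap)


def window_bounds(num_samples, sample_rate=SAMPLE_RATE,
                  window_sec=WINDOW_SEC, hop_sec=HOP_SEC):
    """Hypothetical helper: return (start, end) sample indices of each window.

    Assumption: a trailing remainder shorter than one window is dropped.
    """
    window = int(round(window_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    # range() is empty when the clip is shorter than one window.
    return [(start, start + window)
            for start in range(0, num_samples - window + 1, hop)]


# A 35 s clip yields three full windows; the last 5 s are dropped.
bounds = window_bounds(35 * SAMPLE_RATE)
print(bounds)  # [(0, 240000), (240000, 480000), (480000, 720000)]
```

If the official code pads the last partial window instead of dropping it, the number of segments per track (and therefore any proportion-based aggregation) would differ from this sketch.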

However, the best precision I obtained is only 0.2929, and the resulting F1-score is far below the reported 0.73, so I cannot reproduce Table 4.
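
Since F1 is the harmonic mean of precision and recall, a precision of 0.2929 already caps the achievable F1 regardless of recall. The following sketch (a generic F1 formula, not code from the repository) computes that upper bound by assuming perfect recall:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


best_precision = 0.2929
# Best case: assume recall = 1.0, which gives the largest F1 for this precision.
f1_upper_bound = f1_score(best_precision, 1.0)
print(round(f1_upper_bound, 4))  # 0.4531
```

So even with perfect recall the F1-score would stay near 0.45, which suggests the gap to 0.73 comes from false positives rather than from the choice of recall-side threshold alone.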

Reproduction details

I have prepared a detailed Markdown file with the full pipeline, directory structure, and core code (especially the model loading and inference parts).

pipeline.md

Could you please take a look and let me know if I missed any important step (e.g., audio preprocessing details, MERT layer selection, checkpoint usage, or threshold settings)?
I would greatly appreciate any guidance to help me reproduce the reported F1-score.
Thank you very much for your time and for this excellent work!
