[Question] Cannot reproduce MCIC results with official 10s/10s config (F1 much lower than Table 4)

Dear Dr. Tongyu Lu, Prof. Dorien Herremans, and the AMAAI Lab Team,
I am a senior undergraduate student majoring in Computer Science and Technology at Zhejiang University, trying to reproduce the out-of-domain evaluation on the MCIC dataset (Table 4) from your paper.

I strictly followed the paper:
- Rendered the 116 MIDI pairs to 24 kHz mono WAV using the `Musyng Kite` SoundFont (Section 3.3).
- Used the exact default configuration: **10s window, 10s hop**.
- Used the official checkpoint `siamese_net_20250328.ckpt` from Hugging Face.
- Called `aggregate_decision_matrix` from `test.py` with the paper's recommended thresholds (`proportion_thres=0.4`, `decision_thres=0.99`).
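
For clarity, here is a minimal sketch of what I mean by "10s window, 10s hop" at 24 kHz. The helper name `segment_audio` is hypothetical and not taken from the repository, and dropping the trailing partial window is my own assumption; the official code may pad or handle it differently, which could be one source of the discrepancy.

```python
import numpy as np

SAMPLE_RATE = 24_000  # Hz, matching the rendered WAV files


def segment_audio(wav, window_s=10.0, hop_s=10.0, sr=SAMPLE_RATE):
    """Split a mono waveform into fixed-length windows.

    Hypothetical helper for illustration only. Audio shorter than one
    full window, and any trailing partial window, is dropped (assumption).
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    win = int(round(window_s * sr))  # samples per window
    hop = int(round(hop_s * sr))     # samples between window starts
    if len(wav) < win:
        return np.empty((0, win), dtype=np.float32)
    starts = range(0, len(wav) - win + 1, hop)
    return np.stack([wav[s:s + win] for s in starts])


# A 25-second clip yields two non-overlapping 10-second windows;
# the last 5 seconds are discarded under the assumption above.
segments = segment_audio(np.zeros(25 * SAMPLE_RATE))
print(segments.shape)
```

With a hop equal to the window length the segments do not overlap, so every piece contributes only a small number of segment pairs to the decision matrix.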

However, the best precision I obtained is only **0.2929**, and the resulting F1-score is significantly lower than the reported 0.73, so I cannot reproduce Table 4.
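
To make the metric definitions explicit, this is a sketch of how precision, recall, and F1 follow from pair-level counts. It uses the standard definitions; the function name and the zero-division handling are my own choices and may differ from what `test.py` does.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision/recall/F1 from true-positive, false-positive,
    and false-negative counts. Returns 0.0 where a ratio is undefined
    (assumption, for illustration only)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Example with made-up counts, not my actual results.
p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=1)
print(p, r, f1)
```

Since F1 = 2PR / (P + R) is at most 2P / (1 + P), a precision of 0.2929 caps F1 at roughly 0.453 even with perfect recall, so the gap to 0.73 cannot be closed by recall alone.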

**Reproduction details**

I have prepared a detailed Markdown file with the full pipeline, directory structure, and core code (especially the model loading and inference parts).

[pipeline.md](https://github.com/user-attachments/files/26558020/pipeline.md)

Could you please take a look and let me know if I missed any important step (e.g., audio preprocessing details, MERT layer selection, checkpoint usage, or threshold settings)?
I would greatly appreciate any guidance to help me reproduce the reported F1-score.
Thank you very much for your time and for this excellent work!
