New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix raw chroma accuracy for unvoiced estimates #312
Conversation
Thanks for catching this. Happy to merge after @rabitt or @justinsalamon confirm this is correct behavior. |
Gave it a quick look (it's late in Europe), in principle fix looks correct, but will try to give a second look tomorrow to confirm. |
Didn't have time today, but note that the regression tests fail (due to the change in the chroma metric). I imagine that, assuming the fix is confirmed to be correct, the regression data will have to be updated as part of this PR before it can be merged. |
mir_eval/melody.py
Outdated
@@ -563,7 +563,8 @@ def raw_chroma_accuracy(ref_voicing, ref_cent, est_voicing, est_cent, | |||
# Raw chroma = same as raw pitch except that octave errors are ignored. | |||
cent_diff = np.abs(ref_cent - est_cent) | |||
octave = 1200*np.floor(cent_diff/1200.0 + 0.5) | |||
frame_correct = (np.abs(cent_diff - octave)[ref_voicing] < cent_tolerance) | |||
correct_voicing = ref_voicing * est_voicing | |||
frame_correct = (np.abs(cent_diff - octave)[correct_voicing] < cent_tolerance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(cc @juanjobosch who stared at this with me!)
This almost fixes the problem, but with one annoying subtlety - f0 values in the input can be given as "negative" frequency values - in this case, the difference should be computed over all the reference frames, using the absolute value of the frequency given. It's not very explicit in the definition, but it's meant to be a way to measure pitch/chroma accuracy independent of voicing mistakes:
Raw chroma accuracy: the fraction of voiced frames in ref_cent for which est_cent provides a correct frequency values (within cent_tolerance cents), ignoring octave errors
Consider the case:
ref_t = [1,2,3]
est_t = [1,2,3]
ref_f = [440, 440, 440]
est_f = [-220,-220,-220]
The Raw Chroma Accuracy should be 100%, even though the estimate voicing is [0,0,0]
.
So instead, I propose the following fix that solves the bug but doesn't break the negative frequency use case.
matching_voicing = ref_voicing * (est_cent > 0)
frame_correct = (np.abs(cent_diff - octave)[matching_voicing] < cent_tolerance)
In short, this only count frames as "correct" if the estimate is non-zero (where zero indicates an unvoiced frame where no "best guess" frequency is given) AND within the tolerance.
In fact, this bug exists for raw_pitch_accuracy too... but the "problem" case never comes up because we almost never report frequencies below 40 Hz, so these cases always fall outside of the tolerance threshold. For good measure though, we should make the same fix for raw_pitch_accuracy. (cc @craffel )
See my comment above, but in summary: |
Thanks @rabitt and @juanjobosch for reviewing this and good catch on the neg case (sorry, totally swamped over here with on-boarding). @kukas note @rabitt renamed |
Now it should be fixed, I used @rabitt `s patch and applied it also to raw pitch accuracy. |
@kukas thanks, you'll need to fix the PEP-8 errors before Travis lets you pass though. https://travis-ci.org/craffel/mir_eval/jobs/490483158#L9358 @rabitt mind giving this a look and LGTM? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing pep-8 is complaining about is line-lengths of lines 487 and 568. Otherwise LGTM!
Thanks! |
Fix #311