Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix raw chroma accuracy for unvoiced estimates #312

Merged
merged 3 commits into from Feb 9, 2019
Merged

Conversation

kukas
Copy link
Contributor

@kukas kukas commented Jan 30, 2019

Fix #311

@craffel
Copy link
Owner

craffel commented Jan 30, 2019

Thanks for catching this. Happy to merge after @rabitt or @justinsalamon confirm this is correct behavior.

@justinsalamon
Copy link
Collaborator

Gave it a quick look (it's late in Europe), in principle fix looks correct, but will try to give a second look tomorrow to confirm.

@justinsalamon
Copy link
Collaborator

Didn't have time today, but note that the regression tests fail (due to the change in the chroma metric). I imagine that, assuming the fix is confirmed to be correct, the regression data will have to be updated as part of this PR before it can be merged.

@@ -563,7 +563,8 @@ def raw_chroma_accuracy(ref_voicing, ref_cent, est_voicing, est_cent,
# Raw chroma = same as raw pitch except that octave errors are ignored.
cent_diff = np.abs(ref_cent - est_cent)
octave = 1200*np.floor(cent_diff/1200.0 + 0.5)
frame_correct = (np.abs(cent_diff - octave)[ref_voicing] < cent_tolerance)
correct_voicing = ref_voicing * est_voicing
frame_correct = (np.abs(cent_diff - octave)[correct_voicing] < cent_tolerance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(cc @juanjobosch who stared at this with me!)

This almost fixes the problem, but with one annoying subtlety - f0 values in the input can be given as "negative" frequency values - in this case, the difference should be computed over all the reference frames, using the absolute value of the frequency given. It's not very explicit in the definition, but it's meant to be a way to measure pitch/chroma accuracy independent of voicing mistakes:

Raw chroma accuracy: the fraction of voiced frames in ref_cent for which est_cent provides a correct frequency values (within cent_tolerance cents), ignoring octave errors

Consider the case:

ref_t = [1,2,3]
est_t = [1,2,3]
ref_f = [440, 440, 440]
est_f = [-220,-220,-220]

The Raw Chroma Accuracy should be 100%, even though the estimate voicing is [0,0,0].

So instead, I propose the following fix that solves the bug but doesn't break the negative frequency use case.

matching_voicing = ref_voicing * (est_cent > 0)
frame_correct = (np.abs(cent_diff - octave)[matching_voicing] < cent_tolerance)

In short, this only count frames as "correct" if the estimate is non-zero (where zero indicates an unvoiced frame where no "best guess" frequency is given) AND within the tolerance.

In fact, this bug exists for raw_pitch_accuracy too... but the "problem" case never comes up because we almost never report frequencies below 40 Hz, so these cases always fall outside of the tolerance threshold. For good measure though, we should make the same fix for raw_pitch_accuracy. (cc @craffel )

@rabitt
Copy link
Contributor

rabitt commented Feb 4, 2019

See my comment above, but in summary:
(1) the fix is almost correct, but doesn't work correctly when estimates are reported as negative frequencies.
(2) the same bug exists for raw_pitch_accuracy, but in practice, we probably haven't hit any cases where the value of the metric was affected.

@craffel
Copy link
Owner

craffel commented Feb 4, 2019

Good catch, @kukas do you mind implementing @rabitt's version and then we can merge?

@kukas
Copy link
Contributor Author

kukas commented Feb 4, 2019

Good catch, @kukas do you mind implementing @rabitt's version and then we can merge?

Sure, I will fix it on Thursday, if it can wait. Thanks @rabitt for the review!

@justinsalamon
Copy link
Collaborator

Thanks @rabitt and @juanjobosch for reviewing this and good catch on the neg case (sorry, totally swamped over here with on-boarding).

@kukas note @rabitt renamed correct_voicing to matching_voicing, which I think is a semantically more helpful name :)

@kukas
Copy link
Contributor Author

kukas commented Feb 8, 2019

Now it should be fixed, I used @rabitt `s patch and applied it also to raw pitch accuracy.

@craffel
Copy link
Owner

craffel commented Feb 8, 2019

@kukas thanks, you'll need to fix the PEP-8 errors before Travis lets you pass though. https://travis-ci.org/craffel/mir_eval/jobs/490483158#L9358

@rabitt mind giving this a look and LGTM?

Copy link
Contributor

@rabitt rabitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing pep-8 is complaining about is line-lengths of lines 487 and 568. Otherwise LGTM!

@craffel craffel merged commit f41c8da into craffel:master Feb 9, 2019
@craffel
Copy link
Owner

craffel commented Feb 9, 2019

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants