
Remove METRICS dictionary #54

Closed
craffel opened this issue Apr 25, 2014 · 6 comments

craffel commented Apr 25, 2014

tl;dr - the METRICS dictionary doesn't really serve the purpose I want it to, and the evaluators do, so it shouldn't exist.

In some cases, metrics within the same task might have a different number of arguments. This used to be the case in segment before it got split into boundary and structure. It's currently true in melody; voicing_measures just takes in voicing arrays while the rest take in voicing and frequency arrays (even though raw_pitch_accuracy and raw_chroma_accuracy don't actually use their est_voicing arg). If all metrics in each task submodule took the same arguments, you could run

for name, metric in mir_eval.task_submodule.METRICS.items():
    print(name, metric(ref, est))

or whatever the call signature is, which would be convenient. But we don't expect this to hold in general. Uniform iteration like that was kind of the main reason for the METRICS dictionary to exist, and in practice the evaluators submodule should essentially subsume this functionality.
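
For example, the melody metrics can't all be called with one generic argument list. A minimal sketch of this mismatch, with toy data in place of real annotations; the mir_eval.melody signatures shown here are assumed from the current API and may have differed at the time:

import numpy as np
import mir_eval

# Toy annotations, purely for illustration; real data would come from a loader.
times = np.arange(0, 1, 0.01)
ref_freq = 220.0 * np.ones_like(times)
ref_freq[:10] = 0  # mark a few frames as unvoiced
est_freq = 224.0 * np.ones_like(times)
est_freq[:20] = 0

# Resample both annotations onto a common time base
# (helper assumed from the current mir_eval.melody module).
ref_v, ref_c, est_v, est_c = mir_eval.melody.to_cent_voicing(
    times, ref_freq, times, est_freq)

# The metrics take different arguments, so a single generic loop over a
# METRICS dict cannot call them all uniformly:
vx_recall, vx_false_alarm = mir_eval.melody.voicing_measures(ref_v, est_v)
raw_pitch = mir_eval.melody.raw_pitch_accuracy(ref_v, ref_c, est_v, est_c)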

@craffel craffel added this to the 0.0.2 milestone Apr 25, 2014
@craffel craffel modified the milestones: 0.1, 0.0.2 Jul 3, 2014
@craffel craffel changed the title Handling metrics with different call structures Remove METRIC dictionary Jul 21, 2014
@craffel craffel changed the title Remove METRIC dictionary Remove METRICS dictionary Jul 21, 2014
craffel added a commit that referenced this issue Jul 22, 2014

craffel commented Aug 19, 2014

Hey all, could I get a little consensus here? Especially those of us who are currently using mir_eval: @ejhumphrey @mattmcvicar @bmcfee @justinsalamon @urinieto. The proposed replacement for the METRICS dictionary is the evaluators, specifically each evaluator's evaluate function. I'm finding it would be useful to have this function accessible outside of the evaluators, so that it could be used by the tests and by end-users. So, what do you think of putting an evaluate function in each task submodule instead, which takes in reference and estimated annotations (not filenames!), pre-processes them as necessary, and returns a dictionary mapping nice-sounding keys to the scores for each metric? Then, the evaluators would just load in the data and call this function.
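
For concreteness, a minimal sketch of what such a per-submodule evaluate function could look like for segment. The function body, the score keys, and the mir_eval.segment.detection call are assumptions for illustration, not the interface that was actually merged:

import collections
import mir_eval

def evaluate(ref_intervals, ref_labels, est_intervals, est_labels, trim=False):
    # Hypothetical per-submodule evaluate(): takes annotations (not
    # filenames), forwards tweakable parameters such as trim, and returns a
    # dictionary of nicely named scores.  Labels would feed the structure
    # metrics; they are unused in this abbreviated sketch.
    scores = collections.OrderedDict()
    precision, recall, f_measure = mir_eval.segment.detection(
        ref_intervals, est_intervals, window=0.5, trim=trim)
    scores['Precision@0.5'] = precision
    scores['Recall@0.5'] = recall
    scores['F-measure@0.5'] = f_measure
    return scores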

urinieto commented Aug 19, 2014

I would love to have something like that. I have used mir_eval in three different projects now, and I always end up thinking that some sort of behavior like the one you described would be pretty useful.

However, it might be a bit tricky to settle on the exact parameters for all the metrics (e.g. I like to have both the trimmed and non-trimmed versions of the boundaries eval for both the 0.5 and 3 second windows). Maybe this function should return a dictionary with all the metrics computed with the default params only (i.e. not accepting any parameters other than the reference and estimation).
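
For reference, the four boundary-detection variants mentioned above, sketched against the current mir_eval.segment.detection signature (assumed here; the 2014 module layout may have differed):

import numpy as np
import mir_eval

# Toy segment boundaries (intervals in seconds), for illustration only.
ref_intervals = np.array([[0.0, 10.0], [10.0, 20.0], [20.0, 30.0]])
est_intervals = np.array([[0.0, 11.0], [11.0, 30.0]])

results = {}
for window in (0.5, 3.0):
    for trim in (False, True):
        _, _, f = mir_eval.segment.detection(
            ref_intervals, est_intervals, window=window, trim=trim)
        key = 'F-measure@{:g}s{}'.format(window, ' (trimmed)' if trim else '')
        results[key] = f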

craffel commented Aug 19, 2014

So the evaluators currently just implement the standard set of metrics, taking in any parameters that might warrant changing. Take a look at the segment evaluator's evaluate function:
https://github.com/craffel/mir_eval/blob/43aa35903edea4249281e2fd49b57da9ec43ce0a/evaluators/segment_eval.py#L21
It takes in the trim parameter. The proposed submodule evaluate functions would work in the same way.
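
Continuing the hypothetical sketches above (all names there are illustrative), a caller could override such parameters directly:

# Parameters like trim pass straight through to the underlying metric;
# when omitted, the defaults apply.
ref_labels = ['A', 'B', 'A']  # toy labels matching the toy intervals above
est_labels = ['A', 'B']
scores_default = evaluate(ref_intervals, ref_labels, est_intervals, est_labels)
scores_trimmed = evaluate(ref_intervals, ref_labels, est_intervals, est_labels,
                          trim=True)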

justinsalamon commented

In my use of mir_eval I have also found myself needing the functionality of the evaluate function outside of the evaluator... so I too think this would be a good move.

craffel commented Aug 26, 2014

OK, created an issue for this: #70.

craffel added a commit that referenced this issue Aug 26, 2014
…ome minor docstring changes and kwarg signature

craffel commented Aug 26, 2014

Removing the METRICS dictionary will require refactoring the test scripts, as covered by #48 and #61.
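
A rough sketch of what a refactored regression test could look like once it calls a per-submodule evaluate function instead of iterating over METRICS. The file layout, the mir_eval.segment.evaluate signature, and mir_eval.io.load_labeled_intervals are assumptions for illustration:

import glob
import json

import numpy as np
import mir_eval

def test_segment_regression():
    # Hypothetical layout: each stored .json holds the expected scores for a
    # matching pair of reference/estimate .lab files.
    for scores_file in glob.glob('data/segment/output*.json'):
        with open(scores_file) as f:
            expected = json.load(f)
        ref_file = scores_file.replace('output', 'ref').replace('.json', '.lab')
        est_file = scores_file.replace('output', 'est').replace('.json', '.lab')
        ref_intervals, ref_labels = mir_eval.io.load_labeled_intervals(ref_file)
        est_intervals, est_labels = mir_eval.io.load_labeled_intervals(est_file)
        scores = mir_eval.segment.evaluate(ref_intervals, ref_labels,
                                           est_intervals, est_labels)
        # Compare every stored score against the freshly computed one.
        for name, value in expected.items():
            assert np.allclose(scores[name], value, atol=1e-12)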

@craffel craffel self-assigned this Sep 4, 2014
craffel added a commit that referenced this issue Sep 5, 2014
@craffel craffel mentioned this issue Sep 5, 2014
craffel added a commit that referenced this issue Oct 9, 2014
This issue was closed.