added marginal flag to segment.nce #227

bmcfee · 2016-11-01T22:49:21Z

Implements #226

bmcfee · 2017-02-09T16:30:17Z

@craffel do you have any opinions on how tests for this one should look? Some options:

1. Add a simple unit test with a small case that can be checked by hand for marginal=True, something like [A, B], [a, b, c, b].
2. Add a wrapper function vmeasure (or whatever we decide to call it from RFC: fixing the segment.nce metrics? #226) that overrides marginal and calls the nce metrics, then add full regression test outputs as before. I'm confident enough in the implementation that correctness doesn't worry me.

If we do go the regression test route, I think we'd need to add the wrapper function so that the results are kept distinct from the existing NCE metrics.

What do you think?
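As a rough illustration of the hand-checkable case in option 1, here is a minimal pure-Python sketch. It assumes four equal-length frames, so that [A, B] becomes A A B B and [a, b, c, b] stays one label per frame. The helper names (`entropy`, `conditional_entropy`, `nce_frames`) are hypothetical and not part of mir_eval, and which conditional entropy feeds the over and under scores is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(x, y):
    """H(x | y) in bits for two equal-length frame-label sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_y = Counter(y)
    return -sum((c / n) * math.log2(c / count_y[yv])
                for (_, yv), c in joint.items())


def nce_frames(y_ref, y_est, marginal=False):
    """Return (S_over, S_under) for frame labels (illustrative only)."""
    if marginal:
        # Normalize by the marginal entropies H(y_ref), H(y_est).
        z_ref, z_est = entropy(y_ref), entropy(y_est)
    else:
        # Normalize by the uniform (maximum) entropy over the label sets.
        z_ref = math.log2(len(set(y_ref)))
        z_est = math.log2(len(set(y_est)))
    # Assumed convention: over uses H(est | ref), under uses H(ref | est).
    s_over = 1.0 - conditional_entropy(y_est, y_ref) / z_est if z_est > 0 else 0.0
    s_under = 1.0 - conditional_entropy(y_ref, y_est) / z_ref if z_ref > 0 else 0.0
    return s_over, s_under


y_ref = ["A", "A", "B", "B"]
y_est = ["a", "b", "c", "b"]

# By hand: H(est | ref) = 1 bit, H(ref | est) = 0.5 bit,
# H(est) = 1.5 bits, H(ref) = 1 bit.
v_over, v_under = nce_frames(y_ref, y_est, marginal=True)   # 1/3, 1/2
n_over, n_under = nce_frames(y_ref, y_est, marginal=False)  # 1 - 1/log2(3), 1/2
```

On this case the two normalizations differ only in the over score, because the reference labels are already uniformly distributed.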

craffel · 2017-02-09T16:39:38Z

Regression would be good, both would be good too. Are you planning on making it so that the evaluate method calls the function both with marginal True and False? (Sorry, may have to jog my memory on this a bit)

bmcfee · 2017-02-09T16:51:36Z

> Are you planning on making it so that the evaluate method calls the function both with marginal True and False?

Your call. If we do, I'd advocate doing it via a wrapper/rename, instead of having two separate output dict entries. It will make it easier to keep things separate down the road.

bmcfee · 2017-02-09T16:57:19Z

Added bonus of the wrapper solution: we can encourage people to deprecate nce in favor of v-measure, and all they have to do is ignore columns (rather than change defaults).

craffel · 2017-02-09T18:12:45Z

If the goal is to eventually deprecate marginal=False, then I agree that it doesn't make sense to compute both in evaluate. But, then you need to decide whether evaluate will call it with marginal=True or marginal=False.

Regarding the test, whichever is the non-default behavior (i.e. whichever is NOT called in evaluate) is OK to just have a unit test. The other should be regression tested, of course.

Does that seem reasonable?

bmcfee · 2017-02-09T18:16:09Z

> If the goal is to eventually deprecate marginal=False, then I agree that it doesn't make sense to compute both in evaluate.

I guess it's a matter of whether we deprecate marginal=False and still call it NCE, or deprecate NCE entirely in favor of v-measure. I prefer the latter, since it would reduce possibility for mis-interpretation.

craffel · 2017-02-09T18:18:07Z

> I guess it's a matter of whether we deprecate marginal=False and still call it NCE, or deprecate NCE entirely in favor of v-measure. I prefer the latter, since it would reduce possibility for mis-interpretation.

When do you propose we do the deprecation? What was the community consensus?

bmcfee · 2017-02-09T18:21:11Z

> When do you propose we do the deprecation? What was the community consensus?

Predictably, there was no obvious consensus. I think the best thing to do is implement it now, then pitch it in the unconference at the next ISMIR so that people have to actually respond.

craffel · 2017-02-09T18:26:10Z

SG. I would advocate for a unit test for v-measure and regression test for the current behavior, and then switching to regression for v-measure and unit for current behavior once it's deprecated.

bmcfee · 2017-02-09T18:29:09Z

> I would advocate for a unit test for v-measure and regression test for the current behavior, and then switching to regression for v-measure and unit for current behavior once it's deprecated.

Cool. OTOH, we could also just regression-test both if it's going to become another entry in the evaluate output. I only propose this because it would be the minimal modification to the existing test structure and, like I said, the change is simple enough that I feel confident in its correctness without a specific unit test.

craffel · 2017-02-09T18:30:24Z

> OTOH, we could also just regression-test both if it's going to become another entry in the evaluate output

I thought we just decided it wasn't going to become another output of evaluate.

bmcfee · 2017-02-09T18:47:01Z

> I thought we just decided it wasn't going to become another output of evaluate.

Hah, I thought we just decided the opposite? Let me be totally clear about what I think makes the most sense:

1. Add a new function vmeasure that's equivalent to nce(..., marginal=True). This way, we don't cause confusion between the new and old style.
2. Add vmeasure to the evaluate outputs. Old-style NCE is still included.
3. Add regression tests for the extended dictionary.
4. Eventually, deprecate nce and keep vmeasure. (Or, if it is not deprecated in the library, encourage people not to use it.)

EDIT: forgot to address the confusing point. The above would be in contrast to doing something like:

{'NCE Over': ..., 'NCE Over (marginal)': ..., ...}

where the marginal=True case is added as an alternate version of NCE. I think that would only be confusing.
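The wrapper-plus-separate-keys plan could be sketched roughly as follows. This is only an illustration of the dispatch: `nce` here is a placeholder that records which normalization was requested instead of computing a score, the two-argument signatures are simplified, and every output key other than 'NCE Over' is a hypothetical name.

```python
def nce(reference, estimate, marginal=False):
    """Placeholder metric: returns an (over, under, f) triple.

    The triple only records which normalization was requested, so the
    dispatch below can be inspected without a real implementation.
    """
    tag = "marginal" if marginal else "uniform"
    return (tag, tag, tag)


def vmeasure(reference, estimate):
    """Thin wrapper: the same computation with marginal=True forced."""
    return nce(reference, estimate, marginal=True)


def evaluate(reference, estimate):
    """Collect both metric families under separate key prefixes."""
    scores = {}
    # Old-style NCE keeps its own entries (key names partly assumed).
    (scores["NCE Over"],
     scores["NCE Under"],
     scores["NCE F-measure"]) = nce(reference, estimate)
    # Hypothetical key names for the new v-measure entries.
    (scores["V Over"],
     scores["V Under"],
     scores["V F-measure"]) = vmeasure(reference, estimate)
    return scores


scores = evaluate(["A", "B"], ["a", "b", "c", "b"])
```

Because the two families never share a key, someone who wants to drop NCE later only has to ignore the 'NCE ...' columns.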

craffel · 2017-02-09T21:28:17Z

Ok, sounds fine. For evaluate, I assume you will just make a new key in the output dict which is v_measure or something.

craffel · 2017-02-27T17:54:24Z

What is left on this other than wait for the other tests to stop failing?

bmcfee · 2017-02-27T18:10:12Z

> What is left on this other than wait for the other tests to stop failing?

Well, I'd need to actually implement the separate wrapper function and top-level evaluate hooks. I've been holding off until separation rights itself.

bmcfee · 2017-02-28T21:13:31Z

Okay, this one's up to speed. I added vmeasure as a function, hooked it into evaluate, and updated the tests and fixtures.

I noticed that the reference section is missing in the segmentation docstring. Do you want me to add one while we're at it?

bmcfee · 2017-02-28T22:02:56Z

@craffel I had to move the scipy version requirement here because it turns out scipy.stats.entropy did not exist prior to 0.14. At this point, that release is almost 3 years old, so I doubt this is a huge deal.

craffel

Requesting some documentation fixes.

craffel · 2017-02-28T22:56:37Z

mir_eval/segment.py

+    marginal : bool
+        If `False`, normalize conditional entropy by uniform entropy.
+        If `True`, normalize conditional entropy by the marginal entropy.
+


Add `(Default value = False)` for consistency within this docstring.

craffel · 2017-02-28T22:57:52Z

mir_eval/segment.py

+    true_given_est = p_est.dot(scipy.stats.entropy(contingency, base=2))
+    pred_given_ref = p_ref.dot(scipy.stats.entropy(contingency.T, base=2))
+
+    if marginal:


Add a comment to the effect of `# Normalize conditional entropy by the marginal entropy`

craffel · 2017-03-01T00:29:04Z

mir_eval/segment.py

+    marginal : bool
+        If `False`, normalize conditional entropy by uniform entropy.
+        If `True`, normalize conditional entropy by the marginal entropy.
+
    Returns
    -------
    S_over


Should we change the math in this part of the docstring to note that it can differ according to the value of marginal? For example, the denominator should be H(y_ref), right? Or at least we should change it in the v_measure docstring, right?
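For reference, the two normalizations under discussion might be written as below. This is a sketch based on the `marginal` parameter description and the diff above, not the final docstring text. It assumes entropies in bits (the diff uses `base=2`) and that the over score is the one conditioned on the reference.

```latex
\begin{align*}
\text{marginal=False:}\quad
  S_{\text{over}}  &= 1 - \frac{H(y_{\text{est}} \mid y_{\text{ref}})}{\log_2 |y_{\text{est}}|}, &
  S_{\text{under}} &= 1 - \frac{H(y_{\text{ref}} \mid y_{\text{est}})}{\log_2 |y_{\text{ref}}|} \\
\text{marginal=True:}\quad
  S_{\text{over}}  &= 1 - \frac{H(y_{\text{est}} \mid y_{\text{ref}})}{H(y_{\text{est}})}, &
  S_{\text{under}} &= 1 - \frac{H(y_{\text{ref}} \mid y_{\text{est}})}{H(y_{\text{ref}})}
\end{align*}
```

Here $|y|$ denotes the number of distinct labels, so $\log_2 |y|$ is the entropy of a uniform distribution over them. The two forms coincide exactly when the labels are uniformly distributed.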

craffel · 2017-03-01T00:29:38Z

> I noticed that the reference section is missing in the segmentation docstring. Do you want me to add one while we're at it?

Sure!

craffel · 2017-03-01T00:29:47Z

> @craffel I had to move the scipy version requirement here because it turns out scipy.stats.entropy did not exist prior to 0.14. At this point, that release is almost 3 years old, so I doubt this is a huge deal.

No problem.

bmcfee · 2017-03-01T16:41:36Z

I think I've hit all the review points. @craffel last pass?

craffel · 2017-03-01T17:50:05Z

Last thing to check - did you rebuild the docs and make sure there are no new errors/glitches now that the refs are there?

bmcfee · 2017-03-01T17:51:49Z

I have built the docs locally and didn't notice anything bad, but I wasn't looking too carefully.

craffel · 2017-03-01T17:59:17Z

> I have built the docs locally and didn't notice anything bad, but I wasn't looking too carefully.

Look carefully at the changes caused by this PR and I will be happy :)

bmcfee · 2017-03-01T18:34:47Z

> Look carefully at the changes caused by this PR and I will be happy :)

Done. Fixed a couple of minor formatting issues, but I think it's solid.

craffel · 2017-03-03T18:48:05Z

Merged, thanks!

added marginal flag to segment.nce

07646c1

bmcfee force-pushed the segment-nce-marginal-entropy branch from d6ef7bd to 07646c1 on February 24, 2017 18:51

added v-measure to segmentation module

9a77c73

bmcfee added the enhancement label Feb 28, 2017

bmcfee requested a review from craffel February 28, 2017 21:13

bmcfee added this to the 0.5 milestone Feb 28, 2017

fixed scipy minimum version dependency for entropy calculation

fce5e3e

craffel requested changes Mar 1, 2017

bmcfee added 2 commits February 28, 2017 19:37

docstring corrections for vmeasure and nce

a9d652b

added references section to segmentation docstring

db4e1f0

craffel approved these changes Mar 1, 2017

fixed some docstring formatting

dae4dd5

craffel merged commit a9c4d60 into craffel:master Mar 3, 2017

craffel mentioned this pull request Mar 3, 2017

RFC: fixing the segment.nce metrics? #226

Closed
