[HIVEMALL-132] Generalize f1score UDAF to support any Beta value #107
Conversation
resources/ddl/define-all.hive (Outdated)

```sql
drop temporary function if exists f1score;
create temporary function f1score as 'hivemall.evaluation.FMeasureUDAF';
drop temporary function if exists fmeasure;
create temporary function fmeasure as 'hivemall.evaluation.FMeasureUDAF';
```
Could you keep an alias for `f1score` in the DDLs for backward compatibility?
```sql
-- alias for backward compatibility
drop temporary function if exists f1score;
create temporary function f1score as 'hivemall.evaluation.FMeasureUDAF';
drop temporary function if exists fmeasure;
...
```
@nzw Could you update the user guide to include the usage of …?
Also, could you revise the current Evaluation section of https://treasure-data.gyazo.com/5ec4b737dcedd55353f8126040ea5366 to refer to examples in ….
Also, some other DDLs need to be updated. Please grep ….
@nzw0301 Could you add tests for the binary case (and for the multi-label measure)?
- Update checking of binary input
- Add unit tests for the binary case and the multi-label case
- Add validation for binary input values
- Update DDL for the fmeasure function
@myui I updated this PR.
I will update the documentation related to this PR later.
@nzw0301 When the documentation is updated, could you remove …? I'll review and merge then.
@myui Thanks, sure.
```sql
  select array("dog") as actual, array("dog", "bird") as predicted
)
select
  f1score(actual, predicted)
```
Could you change the optional third argument to take const string options?

- `-beta 1.0` (default)
- `-average [micro (default), macro]`

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score

`f1score(actual, predicted)` equals `fmeasure(actual, predicted, '-beta 1.0 -average micro')`.

See UDFWithOptions and its usage.
Thank you for your review. OK, I will update the arguments.
```sql
  select array("dog") as actual, array("dog", "bird") as predicted
)
select
  fmeasure(actual, predicted, 2)
```
`fmeasure(actual, predicted, '-beta 2.0 -average macro')`
- Update document
- Create new class: `UDAFEvaluatorWithOptions.java`
@myui I updated this PR. Could you review the code?
Sure. @takuti Could you help reviewing this PR?
@myui Oh, sorry. I've noticed this just now. Sure, I'm going to review.
@nzw0301 Reviewed. Sorry for the late response.
Most importantly, it's hard to understand from your document and code what difference the `-average` option makes. Updating them with a more precise description and comments would be better :)
@@ -0,0 +1,97 @@

```java
package hivemall;
```
We need to insert the LICENSE header by ./bin/format_header.sh
@@ -0,0 +1,355 @@

```java
package hivemall.evaluation;
```
We need to insert the LICENSE header by ./bin/format_header.sh
```java
    }
}

protected static void setCounterValue(@Nullable Counters.Counter counter, long value) {
```
Since `org.apache.hadoop.mapred.Counters` is only used for referring to `org.apache.hadoop.mapred.Counters.Counter`, you can directly import `org.apache.hadoop.mapred.Counters.Counter` like UDTFWithOptions does.
```java
fieldOIs.add(PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
fieldNames.add("average");
fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
```
Please remove this duplicated blank line.
```java
    agg.get();
}
```
Please remove this duplicated blank line.
```diff
 } else {
-    return -1d;
+    return -1.d;
```
When the divisor is zero, returning `0.d` is the general convention in the context of f-measure. You can refer to scikit-learn's zero-division handling here.
I agree with you. This value was based on the previous f1score code.
I'll change this value to `0.d` when the divisor is zero.
And thank you for sharing a good link.
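The convention being agreed on above can be sketched as follows (a minimal illustration with hypothetical names, not the PR's actual code): instead of signalling an undefined f-measure with `-1.d`, a zero divisor yields `0.d`, matching scikit-learn's zero-division handling.

```java
// Minimal sketch of the zero-division convention discussed above
// (hypothetical helper, not Hivemall's actual implementation).
public class SafeDivide {
    static double divide(double numerator, double divisor) {
        // Return 0.d rather than -1.d when the divisor is zero,
        // so an undefined f-measure is reported as 0.
        return divisor == 0.d ? 0.d : numerator / divisor;
    }

    public static void main(String[] args) {
        System.out.println(divide(3.d, 0.d)); // prints 0.0
        System.out.println(divide(3.d, 2.d)); // prints 1.5
    }
}
```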
```java
}

double get() {
    double squareBeta = Math.pow(beta, 2.d);
```
It's fine, but keep in mind that, in the context of numerical computation, avoiding an external method dependency for such a very simple procedure is normally a good habit. That is, if what you want to do is very simple, you can implement the procedure yourself in a succinct way as `beta * beta`.
The reason is that an external method potentially executes unexpected code for various reasons such as optimization and error handling, and it might worsen performance. Of course, you can use an external method if you've understood its internal code and evaluated that the code is reliable and efficient enough.
```java
        double squareBeta) {
    long lp = totalPredicted - tp;

    if (lp < 0) {
```
Is this kind of situation (`totalPredicted < tp` or `totalActual < tp`) possible? I imagine that TP is always a subset of the predicted/actual labels.
No, `total* - tp` is always non-negative. I removed this if statement. Thanks.
$$
\mathrm{F}_{\beta} = (1+\beta^2) \frac
{\sum_i |l_i \cap p_i |}
{ \beta^2 (\sum_i |l_i \cap p_i | + \sum_i |p_i - l_i |) + \sum_i |l_i \cap p_i | + \sum_i |l_i - p_i |}
$$
`p_i - l_i` and `l_i - p_i` look opposite. Could you double-check the equation?

Assume that `p_i = [1, 2, 3]` and `l_i = [1, 4]`:

- `p_i - l_i = [1, 2, 3] - [1, 4] = [2, 3]` = predicted labels which are NOT in actual labels = FP
- `l_i - p_i = [1, 4] - [1, 2, 3] = [4]` = actual labels which are NOT predicted = FN

Since the f-measure should be computed as `(1 + beta^2) * TP / (beta^2 * (TP + FN) + TP + FP)`, I guess `p_i - l_i` and `l_i - p_i` are swapped in this equation.

Meanwhile, your code `FMeasureAggregationBuffer.denom()` seems to be implemented correctly.
Thanks! The equation is wrong (since my test code used the wrong order).
I also changed `FMeasureAggregationBuffer.denom()` to be easier to understand.
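The orientation discussed above can be checked with a small sketch (hypothetical helper names, not Hivemall's actual code): FP is `|predicted - actual|`, FN is `|actual - predicted|`, and the score is `(1 + beta^2) * TP / (beta^2 * (TP + FN) + TP + FP)`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the per-sample F-beta computation discussed above
// (hypothetical class, not the PR's FMeasureAggregationBuffer).
public class FBetaSketch {
    static double fbeta(Set<Integer> actual, Set<Integer> predicted, double beta) {
        Set<Integer> inter = new HashSet<>(predicted);
        inter.retainAll(actual);          // TP = |p ∩ l|
        long tp = inter.size();
        long fp = predicted.size() - tp;  // FP = |p - l|, predicted but not actual
        long fn = actual.size() - tp;     // FN = |l - p|, actual but not predicted
        double b2 = beta * beta;          // beta * beta instead of Math.pow, as noted above
        double denom = b2 * (tp + fn) + tp + fp;
        return denom == 0.d ? 0.d : (1.d + b2) * tp / denom;
    }

    public static void main(String[] args) {
        Set<Integer> l = new HashSet<>(Arrays.asList(1, 4));    // actual labels
        Set<Integer> p = new HashSet<>(Arrays.asList(1, 2, 3)); // predicted labels
        // tp=1, fp=2, fn=1 -> F1 = 2*1 / (1*(1+1) + 1 + 2) = 0.4
        System.out.println(fbeta(l, p, 1.0));
    }
}
```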
```java
evaluator.iterate(agg, new Object[] {actual, predicted});

Assert.assertEquals(0.5714285714285715d, agg.get(), 1e-5);
```
Could you write how you got the expected result? (With Spark, similarly to testMultiLabelF1MultiSamples?)
@takuti Thank you for your useful review.
- fix typos
- replace Java exceptions with Hive exceptions
- style fixes
resources/ddl/define-all.spark (Outdated)

@@ -530,6 +530,9 @@ sqlContext.sql("CREATE TEMPORARY FUNCTION lr_datagen AS 'hivemall.dataset.Logist

```scala
sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS f1score")
sqlContext.sql("CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.FMeasureUDAF'")
```
It is better to copy the old FMeasureUDAF.java as F1ScoreUDAF.java for backward compatibility.
Then, `CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.F1ScoreUDAF'`.
- fix typo in F1ScoreUDAF.java
- fix alias for f1score in define-all.spark
@nzw0301 grep f1score in ….
- Update documents
- Fix text for spark example
- Refactor FMeasureUDAF.java
@nzw0301 Commented, and I found a crucial problem in the F1ScoreUDAF implementation. Could you check the points?

In order to check whether `fmeasure` works correctly on larger-scale data, I've tried the following query on the MovieLens 1M data, and it has no problem:

```sql
with data as (
  select if(rating > 3, 1, 0) as truth, 0 as predicted from ratings
)
select fmeasure(truth, predicted)
from data
;
```

> 0.4248392086054015

At the same time, once you fix the F1ScoreUDAF bug, `f1score` returns the same result as `fmeasure`:

```sql
with data as (
  select if(rating > 3, 1, 0) as truth, 0 as predicted from ratings
)
select f1score(array(truth), array(predicted))
from data
;
```

> 0.42483920860540153
```java
if (typeInfo[0] != typeInfo[1]) {
    throw new UDFArgumentTypeException(1, "The first argument's `actual` type is "
        + typeInfo[0] + ", but the second argument `predicated`'s type is not match: "
```
Typo: `predicated` => `predicted`
```java
        || HiveUtils.isBooleanTypeInfo(typeInfo[1]);
if (!isArg2ListOrIntOrBoolean) {
    throw new UDFArgumentTypeException(1,
        "The second argument `array/int/boolean actual` is invalid form: " + typeInfo[1]);
```
Typo: `actual` => `predicted`
```java
this.totalAcutal += numActual;
this.totalPredicted += numPredicted;

average = cl.getOptionValue("average", "micro");
```
You can write here: `average = cl.getOptionValue("average", average);`
```java
evaluator.iterate(agg, new Object[] {actual, predicted});

// TODO: describe the way to get this expected value by spark
```
If you generated the expected value by using Spark, you can complete this TODO by just writing:

```java
// should equal spark's micro f1 measure result
// https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#multilabel-classification
```

as in testMultiLabelF1MultiSamples()
Oh, I forgot to update this line. Thanks.
```md
### Micro average

If `micro` is passed to `average`,
```
👍
```java
    this.totalPredicted += numPredicted;
}

void merge(PartialResult other) {
```
Oh, there is a bug here! 🐛

Correct:

```java
void merge(PartialResult other) {
    this.tp += other.tp;
    this.totalActual += other.totalActual;
    this.totalPredicted += other.totalPredicted;
}
```
Oops. This should be fixed.
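Why the merge fix matters can be shown with a quick sketch (a hypothetical class mirroring the fields named above, not the PR's actual PartialResult): merging two partial buffers must accumulate every field, so that the result is independent of how many mappers produced partials.

```java
// Sketch of a correct partial-aggregation merge (hypothetical class,
// not Hivemall's actual code). Every field must be accumulated.
public class PartialResultSketch {
    long tp;
    long totalActual;
    long totalPredicted;

    void merge(PartialResultSketch other) {
        this.tp += other.tp;
        this.totalActual += other.totalActual;
        this.totalPredicted += other.totalPredicted;
    }

    public static void main(String[] args) {
        // Two partial results, as if produced by two mappers.
        PartialResultSketch a = new PartialResultSketch();
        a.tp = 1; a.totalActual = 2; a.totalPredicted = 3;
        PartialResultSketch b = new PartialResultSketch();
        b.tp = 2; b.totalActual = 2; b.totalPredicted = 1;
        a.merge(b);
        // Merged counts equal the counts over all rows at once.
        System.out.println(a.tp + " " + a.totalActual + " " + a.totalPredicted); // prints 3 4 4
    }
}
```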
```sql
-- 0.5;
```
How about writing the difference between `f1score` and `fmeasure` here? It could be helpful for understanding the concept of `-average micro` and for replacing the old `f1score` with `fmeasure`. For instance:

> It should be noted that, since the old `f1score(truth, predicted)` function simply counts the number of "matched" elements between `truth` and `predicted`, the above query is equivalent to:

```sql
WITH data as (
  select 1 as truth, 0 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
  union all
  select 1 as truth, 1 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
)
select
  f1score(array(truth), array(predicted))
from data
;
```
Sounds good! Thanks.
- Update docs
- Fix typo in FmeasureUDAF
@takuti Thank you for comments!
I will check whether the return value is the same tomorrow.
Importantly, in case the number of mappers is 1, fixing the bug in … Alternatively, you can test the following query (6 mappers will be shown for each …):

```sql
WITH data as (
  select 1 as truth, 0 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
  union all
  select 1 as truth, 1 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
)
select
  f1score(array(truth), array(predicted))
from data
;
```

If the bug has been fixed correctly, the output should be the same as ….
I tested @takuti's query. The buggy code (previous code) returns ….
But I found another issue with f1score and fmeasure. Both functions cannot work on …:

```
hive> select f1score(array(1), array(1));
FAILED: IllegalArgumentException Size requested for unknown type: org.apache.hadoop.hive.ql.exec.UDAFEvaluator

hive> select fmeasure(array(1), array(1));
FAILED: IllegalArgumentException Size requested for unknown type: java.lang.String
```

However, they can work on ….
```java
    }
}

public static class FMeasureAggregationBuffer extends
```
`estimate` is required for estimating the resulting size for AbstractAggregationBuffer.
The issue above is avoided by creating a table: ….
It's a Hive v2.2.0 bug. Filed a ticket: https://issues.apache.org/jira/browse/HIVE-17406
@nzw0301 I guess you've finished everything you need. LGTM.
@myui Would you double-check this? I can merge whenever you are ready.
Let me see.
@@ -100,7 +100,7 @@ Note that `floor(prob / 0.2)` means that the rows are distributed to 5 bins for

```md
# Difference between AUC and Logarithmic Loss

Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs.
```
Missing link: stat_eval.html is deleted.
@nzw0301 I'll fix and merge it. No need to update this PR.
What changes were proposed in this pull request?

Make the f1score function a more general fmeasure function for any positive beta value.

What type of PR is it?

Improvement

What is the Jira issue?

HIVEMALL-132

How was this patch tested?

Added FMeasureUDAFTest.

Checklist

(Please remove this section if not needed; check x for YES, blank for NO)

- [x] Did you apply the source code formatter, i.e., `mvn formatter:format`, for your commit?