This repository has been archived by the owner on Sep 20, 2022. It is now read-only.

[HIVEMALL-132] Generalize f1score UDAF to support any Beta value #107

Closed
wants to merge 23 commits

Conversation

nzw0301 (Member) commented Aug 2, 2017

What changes were proposed in this pull request?

Generalize the f1 function into a more general fmeasure function that supports any positive beta value.

What type of PR is it?

Improvement

What is the Jira issue?

HIVEMALL-132

How was this patch tested?

Add FMeasureUDAFTest

Checklist

(Please remove this section if not needed; check x for YES, blank for NO)

  • Did you apply source code formatter, i.e., mvn formatter:format, for your commit?

drop temporary function if exists f1score;
create temporary function f1score as 'hivemall.evaluation.FMeasureUDAF';
drop temporary function if exists fmeasure;
create temporary function fmeasure as 'hivemall.evaluation.FMeasureUDAF';
Member

Could you retain an alias for f1score in the DDLs for backward compatibility?

-- alias for backward compatibility
drop temporary function if exists f1score;
create temporary function f1score as 'hivemall.evaluation.FMeasureUDAF';

drop temporary function if exists fmeasure;
...

myui (Member) commented Aug 2, 2017

@nzw0301 Could you update the user guide to include the usage of fmeasure and f1score in incubator-hivemall/docs/gitbook/eval/classification_measures.md?

Run npm install gitbook-cli; gitbook install; gitbook serve in docs/gitbook.

Also, could you revise the current Evaluation section of https://treasure-data.gyazo.com/5ec4b737dcedd55353f8126040ea5366 to

• Binary Classification metrics
  • Area Under the ROC Curve
• Regression metrics
• Ranking metrics

Refer to the examples in
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
https://turi.com/learn/userguide/evaluation/classification.html#f_scores

myui (Member) commented Aug 2, 2017

Also, some other DDLs need to be updated. Please grep for tree_export to find which DDLs to update.

@nzw0301 nzw0301 changed the title [HIVEMALL-132] Generalize f1score UDAF to support any Beta value [WIP][HIVEMALL-132] Generalize f1score UDAF to support any Beta value Aug 2, 2017
myui (Member) commented Aug 3, 2017

@nzw0301 Could you add tests for the binary case (and for the multi-label measure)?

- Update checking binary input
- Add UnitTests for binary case and multilabel case
- Add validation for binary inputs value
- Update DDL for fmeasure function
nzw0301 (Member, Author) commented Aug 4, 2017

@myui I updated this PR:

  • Added unit tests for binary and multi-label inputs
  • Updated DDL files: added a fmeasure alias

I will update the documentation related to this PR later.

myui (Member) commented Aug 4, 2017

@nzw0301 Once the documentation is updated, could you remove [WIP] from the PR title?

I'll review and merge then.

nzw0301 (Member, Author) commented Aug 4, 2017

@myui Thanks, sure.

select array("dog") as actual, array("dog", "bird") as predicted
)
select
f1score(actual, predicted)
Member

Could you change the optional third argument to take const string options?

-beta 1.0 (default)
-average [micro (default), macro]
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score

f1score(actual, predicted) is equivalent to fmeasure(actual, predicted, '-beta 1.0 -average micro').

See UDFWithOptions and its usage.
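As a rough illustration only (this is not the Hivemall implementation), the micro-averaged F-beta that the `-beta` and `-average micro` options describe can be sketched in Python over multi-label samples:

```python
def micro_fbeta(actuals, predictions, beta=1.0):
    """Illustrative micro-averaged F-beta over multi-label samples."""
    # globally pool true positives, false positives, and false negatives
    tp = sum(len(set(a) & set(p)) for a, p in zip(actuals, predictions))
    fp = sum(len(set(p) - set(a)) for a, p in zip(actuals, predictions))
    fn = sum(len(set(a) - set(p)) for a, p in zip(actuals, predictions))
    denom = beta * beta * (tp + fn) + tp + fp
    if denom == 0:
        return 0.0  # zero-division convention, also discussed later in this review
    return (1.0 + beta * beta) * tp / denom

micro_fbeta([["dog"]], [["dog", "bird"]])  # tp=1, fp=1, fn=0 -> 2/3
```

With beta=1.0 this reduces to the micro F1 that f1score(actual, predicted) computes.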

Member Author

Thank you for your review.
OK, I will update the arguments.

select array("dog") as actual, array("dog", "bird") as predicted
)
select
fmeasure(actual, predicted, 2)
Member

fmeasure(actual, predicted, '-beta 2.0 -average macro')

- Update document
- Create new class: `UDAFEvaluatorWithOptions.java`
@nzw0301 nzw0301 changed the title [WIP][HIVEMALL-132] Generalize f1score UDAF to support any Beta value [HIVEMALL-132] Generalize f1score UDAF to support any Beta value Aug 7, 2017
nzw0301 (Member, Author) commented Aug 7, 2017

@myui I updated this PR. Could you review the code?

myui (Member) commented Aug 7, 2017

Sure.

@takuti Could you help reviewing this PR?

takuti (Member) commented Aug 21, 2017

@myui Oh, sorry. I've noticed this just now. Sure, I'm going to review.

takuti (Member) left a comment

@nzw0301 Reviewed. Sorry for the late response.

Most importantly, it's hard to tell from your document and code what the -average option changes. Updating them with a more precise description and comments would be better :)

@@ -0,0 +1,97 @@
package hivemall;
Member

We need to insert the LICENSE header using ./bin/format_header.sh

@@ -0,0 +1,355 @@
package hivemall.evaluation;
Member

We need to insert the LICENSE header using ./bin/format_header.sh

}
}

protected static void setCounterValue(@Nullable Counters.Counter counter, long value) {
Member

Since org.apache.hadoop.mapred.Counters is only used for pointing to org.apache.hadoop.mapred.Counters.Counter, you can directly import org.apache.hadoop.mapred.Counters.Counter like UDTFWithOptions.

fieldOIs.add(PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
fieldNames.add("average");
fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

Member

Please remove this duplicated blank line.


agg.get();
}

Member

Please remove this duplicated blank line.

} else {
return -1d;
return -1.d;
Member

When the divisor is zero, returning 0.d is the usual convention in the context of f-measure. You can refer to scikit-learn's zero-division handling here.

Member Author

I agree with you. This value was based on the previous f1score code.
I'll change it to 0.d when the divisor is zero.
Thank you for sharing the helpful link.

}

double get() {
double squareBeta = Math.pow(beta, 2.d);
Member

It's fine, but keep in mind that, in the context of numerical computation, avoiding an external method call for such a very simple operation is usually a good habit. That is, if what you want to do is very simple, you can implement it succinctly yourself, as beta * beta.

The reason is that an external method can execute unexpected code for various reasons, such as optimization and error handling, which might hurt performance. Of course, you can use an external method if you've read its internal code and judged it to be reliable and efficient enough.

double squareBeta) {
long lp = totalPredicted - tp;

if (lp < 0) {
Member

Is this kind of situation (totalPredicted < tp and totalActual < tp) possible? I imagine that TP is always a subset of predicted/actual labels.

Member Author

No, total* - tp is always non-negative. I'll remove this if statement. Thanks.

$$
\mathrm{F}_{\beta} = (1+\beta^2) \frac
{\sum_i |l_i \cap p_i |}
{ \beta^2 (\sum_i |l_i \cap p_i | + \sum_i |p_i - l_i |) + \sum_i |l_i \cap p_i | + \sum_i |l_i - p_i |}
Member

p_i - l_i and l_i - p_i look opposite. Could you double-check the equation?

Assume that p_i = [1, 2, 3] and l_i = [1, 4]:

  • p_i - l_i = [1, 2, 3] - [1, 4] = [2, 3]
    • = predicted labels which are NOT in actual labels
    • = FP
  • l_i - p_i = [1, 4] - [1, 2, 3] = [4]
    • = actual labels which are NOT predicted
    • = FN

Since the f-measure should be computed by (1 + beta^2) * TP / (beta^2 * (TP + FN) + TP + FP), I guess p_i - l_i and l_i - p_i are opposite in this equation.

Meanwhile, your code FMeasureAggregationBuffer.denom() seems to be implemented correctly.
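As a quick numeric check of the reviewer's set example above (illustrative Python, not project code), the set differences identify FP and FN, and the pooled formula then gives the f-measure:

```python
p, l = {1, 2, 3}, {1, 4}  # predicted and actual labels from the example
tp = len(l & p)           # {1}    -> 1
fp = len(p - l)           # {2, 3} -> 2 (predicted but not actual)
fn = len(l - p)           # {4}    -> 1 (actual but not predicted)

beta = 1.0
f1 = (1 + beta**2) * tp / (beta**2 * (tp + fn) + tp + fp)
print(f1)  # 2 * 1 / (2 + 1 + 2) = 0.4
```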

Member Author

Thanks! The equation is wrong (because my test code had the wrong order).
I also changed FMeasureAggregationBuffer.denom() to be easier to understand.


evaluator.iterate(agg, new Object[] {actual, predicted});

Assert.assertEquals(0.5714285714285715d, agg.get(), 1e-5);
Member

Could you write how you got the expected result? (With Spark, similarly to testMultiLabelF1MultiSamples?)

nzw0301 (Member, Author) commented Aug 21, 2017

@takuti Thank you for your useful review.
I will fix this PR based on your comment.

- fix typos
- replace Java exception with Hive exception
- style fixes
@@ -530,6 +530,9 @@ sqlContext.sql("CREATE TEMPORARY FUNCTION lr_datagen AS 'hivemall.dataset.Logist
sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS f1score")
sqlContext.sql("CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.FMeasureUDAF'")
Member

It is better to copy the old FMeasureUDAF.java as F1ScoreUDAF.java for backward compatibility.

Then, CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.F1ScoreUDAF'.

- fix typo in F1ScoreUDAF.java
- fix alias for f1score in define-all.spark
myui (Member) commented Aug 25, 2017

@nzw0301 grep f1score in resources/ddl. It's not only for define-all.spark.

- Update documents
- Fix text for spark example
- Refactor FMeasureUDAF.java
nzw0301 (Member, Author) commented Aug 28, 2017

@takuti @myui Thank you for your kind comments.
I completed the updates based on the reviews.

takuti (Member) left a comment

@nzw0301 Commented, and I found a crucial problem in the F1ScoreUDAF implementation. Could you check the points?

In order to check if fmeasure works correctly on larger-scale data, I've tried the following query on the MovieLens 1M data, and it has no problem:

with data as (
  select if(rating > 3, 1, 0) as truth, 0 as predicted from ratings
)
select fmeasure(truth, predicted)
from data
;

0.4248392086054015

At the same time, once you fix the F1ScoreUDAF bug, f1score returns the same result as fmeasure:

with data as (
  select if(rating > 3, 1, 0) as truth, 0 as predicted from ratings
)
select f1score(array(truth), array(predicted))
from data
;

0.42483920860540153


if (typeInfo[0] != typeInfo[1]) {
throw new UDFArgumentTypeException(1, "The first argument's `actual` type is "
+ typeInfo[0] + ", but the second argument `predicated`'s type is not match: "
Member

Typo: predicated => predicted

|| HiveUtils.isBooleanTypeInfo(typeInfo[1]);
if (!isArg2ListOrIntOrBoolean) {
throw new UDFArgumentTypeException(1,
"The second argument `array/int/boolean actual` is invalid form: " + typeInfo[1]);
Member

Typo: actual => predicted

this.totalAcutal += numActual;
this.totalPredicted += numPredicted;

average = cl.getOptionValue("average", "micro");
Member

You can write here as: average = cl.getOptionValue("average", average);


evaluator.iterate(agg, new Object[] {actual, predicted});

// TODO: describe the way to get this expected value by spark
Member

If you generated the expected value by using Spark, you can complete this TODO by just writing:

// should equal to spark's micro f1 measure result
// https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#multilabel-classification

as testMultiLabelF1MultiSamples()

Member Author

Oh, I forgot to update this line. Thanks.


### Micro average

If `micro` is passed to `average`,
Member

👍

this.totalPredicted += numPredicted;
}

void merge(PartialResult other) {
Member

Oh, there is a bug here! 🐛

Correct:

void merge(PartialResult other) {
    this.tp += other.tp;
    this.totalActual += other.totalActual;
    this.totalPredicted += other.totalPredicted;
}

Member

oops. This should be fixed.


-- 0.5;
```

Member

How about writing the difference between f1score and fmeasure here? It could help readers understand the concept of -average micro and migrate from the old f1score to fmeasure. For instance:

It should be noted that, since the old f1score(truth, predicted) function simply counts the number of "matched" elements between truth and predicted, the above query is equivalent to:

WITH data as (
  select 1 as truth, 0 as predicted
union all
  select 0 as truth, 1 as predicted
union all
  select 0 as truth, 0 as predicted
union all
  select 1 as truth, 1 as predicted
union all
  select 0 as truth, 1 as predicted
union all
  select 0 as truth, 0 as predicted
)
select
  f1score(array(truth), array(predicted))
from data
;

Member Author

Sounds good! Thanks.

- Update docs
- Fix typo in FmeasureUDAF
nzw0301 (Member, Author) commented Aug 28, 2017

@takuti Thank you for the comments!
I ran the query above and got the same result on MovieLens 1M.

nzw0301 (Member, Author) commented Aug 28, 2017

I will check whether the return value is the same tomorrow.

takuti (Member) commented Aug 28, 2017

Importantly, if the number of mappers is 1, fixing the bug in merge() does not change the output value; you might see the same result 0.42483920860540153 even with the bug present.

Alternatively, please test the following query (6 mappers will be shown for each select statement), and check whether its output is the same as fmeasure(truth, predicted, '-average micro'):

WITH data as (
  select 1 as truth, 0 as predicted
union all
  select 0 as truth, 1 as predicted
union all
  select 0 as truth, 0 as predicted
union all
  select 1 as truth, 1 as predicted
union all
  select 0 as truth, 1 as predicted
union all
  select 0 as truth, 0 as predicted
)
select
  f1score(array(truth), array(predicted))
from data
;

If the bug has been fixed correctly, the output should equal fmeasure(truth, predicted, '-average micro') = 0.5, while the buggy code returns f1score(array(truth), array(predicted)) = 1.0
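To see why merge() must sum the partial counts, here is an illustrative Python walk-through of the six rows above (one partial result per mapper is an assumption for illustration; the field names mirror the PR's PartialResult but this is not the Hivemall code):

```python
rows = [(1, 0), (0, 1), (0, 0), (1, 1), (0, 1), (0, 0)]  # (truth, predicted)

# one partial result per mapper: (tp, totalActual, totalPredicted)
partials = [(len({t} & {p}), 1, 1) for t, p in rows]

# corrected merge(): sum every field across partial results
tp = sum(x[0] for x in partials)               # 3
total_actual = sum(x[1] for x in partials)     # 6
total_predicted = sum(x[2] for x in partials)  # 6

fn = total_actual - tp
fp = total_predicted - tp
f1 = 2.0 * tp / ((tp + fn) + tp + fp)
print(f1)  # 0.5, matching fmeasure(truth, predicted, '-average micro')
```

A merge() that drops or overwrites the other buffer's counts would effectively aggregate only one partition, which is why the buggy code can return a different value once more than one mapper runs.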

nzw0301 (Member, Author) commented Aug 29, 2017

I tested @takuti's query. The buggy code (previous code) returns 1.0.
On the other hand, the fixed code returns the correct value: 0.5.

nzw0301 (Member, Author) commented Aug 29, 2017

But I found another issue with f1score and fmeasure.

Neither function works on EMR v5.8.0:

hive> select f1score(array(1), array(1));
FAILED: IllegalArgumentException Size requested for unknown type: org.apache.hadoop.hive.ql.exec.UDAFEvaluator
hive> select fmeasure(array(1), array(1));
FAILED: IllegalArgumentException Size requested for unknown type: java.lang.String

However, they can work on EMR v5.0.0.
I don't know why the failures occur on newer EMR.

}
}

public static class FMeasureAggregationBuffer extends
Member

estimate is required for estimating the resulting size of an AbstractAggregationBuffer.

Refer
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java#L195

nzw0301 (Member, Author) commented Aug 29, 2017

The issue above can be avoided by creating a table first:

create table data as (
  select 1 as truth, 1 as predicted
);
-- ok
select
  fmeasure(array(truth), array(predicted))
from data
;

-- ok
select
  f1score(array(truth), array(predicted))
from data
;

myui (Member) commented Aug 29, 2017

It's a Hive v2.2.0 bug.

Filed a ticket: https://issues.apache.org/jira/browse/HIVE-17406

takuti (Member) left a comment

@nzw0301 I guess you've finished everything you need. LGTM

takuti (Member) commented Sep 1, 2017

@myui Would you double-check this? I can merge whenever you are ready.

myui (Member) commented Sep 2, 2017

Let me see.

@@ -100,7 +100,7 @@ Note that `floor(prob / 0.2)` means that the rows are distributed to 5 bins for

# Difference between AUC and Logarithmic Loss

Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs.
Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs.
Member

Missing link. stat_eval.html is deleted.

myui (Member) commented Sep 13, 2017

@nzw0301 I'll fix and merge it. No need to update this PR.

@asfgit asfgit closed this in 098a7f3 Sep 13, 2017
myui (Member) commented Sep 13, 2017

@nzw0301 LGTM 👍 Merged. Well done! (thank you for your review @takuti )

nzw0301 (Member, Author) commented Sep 13, 2017

Thank you for your review and support @takuti @myui

3 participants