[HIVEMALL-132] Generalize f1score UDAF to support any Beta value #107
Conversation
resources/ddl/define-all.hive (Outdated)

```sql
drop temporary function if exists f1score;
create temporary function f1score as 'hivemall.evaluation.FMeasureUDAF';
drop temporary function if exists fmeasure;
create temporary function fmeasure as 'hivemall.evaluation.FMeasureUDAF';
```
Could you keep an alias for `f1score` in the DDLs for backward compatibility?
```sql
-- alias for backward compatibility
drop temporary function if exists f1score;
create temporary function f1score as 'hivemall.evaluation.FMeasureUDAF';
drop temporary function if exists fmeasure;
...
```
@nzw Could you update the user guide to include the usage of …?
Also, could you revise the current Evaluation section of https://treasure-data.gyazo.com/5ec4b737dcedd55353f8126040ea5366 to refer to examples in ….
Also, some other DDLs need to be updated. Please grep ….
@nzw0301 Could you add tests for the binary case (and for the multi-label measure)?
- Update checking of binary input
- Add unit tests for the binary case and the multi-label case
- Add validation for binary input values
- Update DDL for the fmeasure function
@myui I updated this PR.
I will update the documentation related to this PR later.
@nzw0301 When the documentation is updated, could you remove …? I'll review and merge then.
@myui Thanks, sure.
```sql
  select array("dog") as actual, array("dog", "bird") as predicted
)
select
  f1score(actual, predicted)
```
Could you change the optional third argument to take const string options?

- `-beta 1.0` (default)
- `-average [micro (default), macro]`

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score

`f1score(actual, predicted)` equals `fmeasure(actual, predicted, '-beta 1.0 -average micro')`.

See UDFWithOptions and its usage.
Thank you for your review. OK, I will update the arguments.
```sql
  select array("dog") as actual, array("dog", "bird") as predicted
)
select
  fmeasure(actual, predicted, 2)
```
`fmeasure(actual, predicted, '-beta 2.0 -average macro')`
- Update document
- Create new class: `UDAFEvaluatorWithOptions.java`
@myui I updated this PR. Could you review the code?
Sure. @takuti Could you help reviewing this PR?
@myui Oh, sorry. I've noticed this just now. Sure, I'm going to review.
@nzw0301 Reviewed. Sorry for the late response.
Most importantly, it's hard to understand from your document and code what difference the `-average` option makes. Updating them with a more precise description and comments would be better :)
@@ -0,0 +1,97 @@

```java
package hivemall;
```
We need to insert the LICENSE header by ./bin/format_header.sh
@@ -0,0 +1,355 @@

```java
package hivemall.evaluation;
```
We need to insert the LICENSE header by ./bin/format_header.sh
```java
    }
}

protected static void setCounterValue(@Nullable Counters.Counter counter, long value) {
```
Since `org.apache.hadoop.mapred.Counters` is only used for referring to `org.apache.hadoop.mapred.Counters.Counter`, you can directly import `org.apache.hadoop.mapred.Counters.Counter` like UDTFWithOptions does.
```java
fieldOIs.add(PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
fieldNames.add("average");
fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
```
Please remove this duplicated blank line.
```java
    agg.get();
}
```
Please remove this duplicated blank line.
```diff
 } else {
-    return -1d;
+    return -1.d;
```
When the divisor is zero, returning `0.d` is the general convention in the context of f-measure. You can refer to scikit-learn's zero-division handling here.
I agree with you. This value was based on the previous f1score code.
I'll change this value to `0.d` when the divisor is zero.
And thank you for sharing a good link.
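The convention being agreed on above can be sketched as follows (a minimal illustration with hypothetical names, not the PR's actual code): instead of signalling an undefined f-measure with `-1.d`, a zero divisor yields `0.d`, matching scikit-learn's zero-division handling.

```java
// Minimal sketch of the zero-division convention discussed above
// (hypothetical helper, not Hivemall's actual implementation).
public class SafeDivide {
    static double divide(double numerator, double divisor) {
        // Return 0.d rather than -1.d when the divisor is zero,
        // so an undefined f-measure is reported as 0.
        return divisor == 0.d ? 0.d : numerator / divisor;
    }

    public static void main(String[] args) {
        System.out.println(divide(3.d, 0.d)); // prints 0.0
        System.out.println(divide(3.d, 2.d)); // prints 1.5
    }
}
```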
```java
}

double get() {
    double squareBeta = Math.pow(beta, 2.d);
```
It's fine, but keep in mind that, in the context of numerical computation, avoiding an external method dependency for such a very simple procedure is normally a good habit. That is, if what you want to do is very simple, you can implement the procedure yourself in a succinct way as `beta * beta`.
The reason is that an external method potentially executes unexpected code for various reasons such as optimization and error handling, and it might worsen performance. Of course, you can use an external method if you've understood its internal code and evaluated that the code is reliable and efficient enough.
```java
        double squareBeta) {
    long lp = totalPredicted - tp;

    if (lp < 0) {
```
Is this kind of situation (`totalPredicted < tp` or `totalActual < tp`) possible? I imagine that TP is always a subset of the predicted/actual labels.
No, `total* - tp` is always non-negative. I removed this if statement. Thanks.
$$
\mathrm{F}_{\beta} = (1+\beta^2) \frac
{\sum_i |l_i \cap p_i |}
{ \beta^2 (\sum_i |l_i \cap p_i | + \sum_i |p_i - l_i |) + \sum_i |l_i \cap p_i | + \sum_i |l_i - p_i |}
$$
`p_i - l_i` and `l_i - p_i` look opposite. Could you double-check the equation?

Assume that `p_i = [1, 2, 3]` and `l_i = [1, 4]`:

- `p_i - l_i = [1, 2, 3] - [1, 4] = [2, 3]` = predicted labels which are NOT in actual labels = FP
- `l_i - p_i = [1, 4] - [1, 2, 3] = [4]` = actual labels which are NOT predicted = FN

Since the f-measure should be computed as `(1 + beta^2) * TP / (beta^2 * (TP + FN) + TP + FP)`, I guess `p_i - l_i` and `l_i - p_i` are swapped in this equation.

Meanwhile, your code `FMeasureAggregationBuffer.denom()` seems to be implemented correctly.
Thanks! The equation is wrong (since my test code used the wrong order).
I also changed `FMeasureAggregationBuffer.denom()` to be easier to understand.
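The orientation discussed above can be checked with a small sketch (hypothetical helper names, not Hivemall's actual code): FP is `|predicted - actual|`, FN is `|actual - predicted|`, and the score is `(1 + beta^2) * TP / (beta^2 * (TP + FN) + TP + FP)`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the per-sample F-beta computation discussed above
// (hypothetical class, not the PR's FMeasureAggregationBuffer).
public class FBetaSketch {
    static double fbeta(Set<Integer> actual, Set<Integer> predicted, double beta) {
        Set<Integer> inter = new HashSet<>(predicted);
        inter.retainAll(actual);          // TP = |p ∩ l|
        long tp = inter.size();
        long fp = predicted.size() - tp;  // FP = |p - l|, predicted but not actual
        long fn = actual.size() - tp;     // FN = |l - p|, actual but not predicted
        double b2 = beta * beta;          // beta * beta instead of Math.pow, as noted above
        double denom = b2 * (tp + fn) + tp + fp;
        return denom == 0.d ? 0.d : (1.d + b2) * tp / denom;
    }

    public static void main(String[] args) {
        Set<Integer> l = new HashSet<>(Arrays.asList(1, 4));    // actual labels
        Set<Integer> p = new HashSet<>(Arrays.asList(1, 2, 3)); // predicted labels
        // tp=1, fp=2, fn=1 -> F1 = 2*1 / (1*(1+1) + 1 + 2) = 0.4
        System.out.println(fbeta(l, p, 1.0));
    }
}
```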
```java
evaluator.iterate(agg, new Object[] {actual, predicted});

Assert.assertEquals(0.5714285714285715d, agg.get(), 1e-5);
```
Could you write how you got the expected result? (With Spark, similarly to testMultiLabelF1MultiSamples?)
@takuti Thank you for your useful review.
- fix typos
- replace Java exceptions with Hive exceptions
- style fixes
resources/ddl/define-all.spark (Outdated)

@@ -530,6 +530,9 @@ sqlContext.sql("CREATE TEMPORARY FUNCTION lr_datagen AS 'hivemall.dataset.Logist

```scala
sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS f1score")
sqlContext.sql("CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.FMeasureUDAF'")
```
It is better to copy the old FMeasureUDAF.java as F1ScoreUDAF.java for backward compatibility.
Then, `CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.F1ScoreUDAF'`.
- fix typo in F1ScoreUDAF.java
- fix alias for f1score in define-all.spark
@nzw0301 grep f1score in ….
- Update documents
- Fix text for spark example
- Refactor FMeasureUDAF.java
@nzw0301 Commented, and I found a crucial problem in the F1ScoreUDAF implementation. Could you check the points?

In order to check whether `fmeasure` works correctly on larger-scale data, I've tried the following query on the MovieLens 1M data, and it has no problem:

```sql
with data as (
  select if(rating > 3, 1, 0) as truth, 0 as predicted from ratings
)
select fmeasure(truth, predicted)
from data
;
```

> 0.4248392086054015

At the same time, once you fix the F1ScoreUDAF bug, `f1score` returns the same result as `fmeasure`:

```sql
with data as (
  select if(rating > 3, 1, 0) as truth, 0 as predicted from ratings
)
select f1score(array(truth), array(predicted))
from data
;
```

> 0.42483920860540153
```java
if (typeInfo[0] != typeInfo[1]) {
    throw new UDFArgumentTypeException(1, "The first argument's `actual` type is "
        + typeInfo[0] + ", but the second argument `predicated`'s type is not match: "
```
Typo: `predicated` => `predicted`
```java
        || HiveUtils.isBooleanTypeInfo(typeInfo[1]);
if (!isArg2ListOrIntOrBoolean) {
    throw new UDFArgumentTypeException(1,
        "The second argument `array/int/boolean actual` is invalid form: " + typeInfo[1]);
```
Typo: `actual` => `predicted`
```java
this.totalAcutal += numActual;
this.totalPredicted += numPredicted;

average = cl.getOptionValue("average", "micro");
```
You can write here: `average = cl.getOptionValue("average", average);`
```java
evaluator.iterate(agg, new Object[] {actual, predicted});

// TODO: describe the way to get this expected value by spark
```
If you generated the expected value by using Spark, you can complete this TODO by just writing:

```java
// should equal spark's micro f1 measure result
// https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#multilabel-classification
```

as in testMultiLabelF1MultiSamples()
Oh, I forgot to update this line. Thanks.
```md
### Micro average

If `micro` is passed to `average`,
```
👍
```java
    this.totalPredicted += numPredicted;
}

void merge(PartialResult other) {
```
Oh, there is a bug here! 🐛

Correct:

```java
void merge(PartialResult other) {
    this.tp += other.tp;
    this.totalActual += other.totalActual;
    this.totalPredicted += other.totalPredicted;
}
```
Oops. This should be fixed.
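Why the merge fix matters can be shown with a quick sketch (a hypothetical class mirroring the fields named above, not the PR's actual PartialResult): merging two partial buffers must accumulate every field, so that the result is independent of how many mappers produced partials.

```java
// Sketch of a correct partial-aggregation merge (hypothetical class,
// not Hivemall's actual code). Every field must be accumulated.
public class PartialResultSketch {
    long tp;
    long totalActual;
    long totalPredicted;

    void merge(PartialResultSketch other) {
        this.tp += other.tp;
        this.totalActual += other.totalActual;
        this.totalPredicted += other.totalPredicted;
    }

    public static void main(String[] args) {
        // Two partial results, as if produced by two mappers.
        PartialResultSketch a = new PartialResultSketch();
        a.tp = 1; a.totalActual = 2; a.totalPredicted = 3;
        PartialResultSketch b = new PartialResultSketch();
        b.tp = 2; b.totalActual = 2; b.totalPredicted = 1;
        a.merge(b);
        // Merged counts equal the counts over all rows at once.
        System.out.println(a.tp + " " + a.totalActual + " " + a.totalPredicted); // prints 3 4 4
    }
}
```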
```sql
-- 0.5;
```
How about writing the difference between `f1score` and `fmeasure` here? It could be helpful for understanding the concept of `-average micro` and for replacing the old `f1score` with `fmeasure`. For instance:

> It should be noted that, since the old `f1score(truth, predicted)` function simply counts the number of "matched" elements between `truth` and `predicted`, the above query is equivalent to:

```sql
WITH data as (
  select 1 as truth, 0 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
  union all
  select 1 as truth, 1 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
)
select
  f1score(array(truth), array(predicted))
from data
;
```
Sounds good! Thanks.
- Update docs
- Fix typo in FmeasureUDAF
@takuti Thank you for comments!
I will check whether the return value is the same tomorrow.
Importantly, in case the number of mappers is 1, fixing the bug in … Alternatively, you can test the following query (6 mappers will be shown for each …):

```sql
WITH data as (
  select 1 as truth, 0 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
  union all
  select 1 as truth, 1 as predicted
  union all
  select 0 as truth, 1 as predicted
  union all
  select 0 as truth, 0 as predicted
)
select
  f1score(array(truth), array(predicted))
from data
;
```

If the bug has been fixed correctly, the output should be the same as ….
I tested @takuti's query. The buggy code (previous code) returns ….
But I found another issue with f1score and fmeasure. Both functions cannot work on …:

```
hive> select f1score(array(1), array(1));
FAILED: IllegalArgumentException Size requested for unknown type: org.apache.hadoop.hive.ql.exec.UDAFEvaluator

hive> select fmeasure(array(1), array(1));
FAILED: IllegalArgumentException Size requested for unknown type: java.lang.String
```

However, they can work on ….
```java
    }
}

public static class FMeasureAggregationBuffer extends
```
`estimate` is required for estimating the resulting size for AbstractAggregationBuffer.
The issue above is avoided by creating a table: ….
It's a Hive v2.2.0 bug. Filed a ticket: https://issues.apache.org/jira/browse/HIVE-17406
@nzw0301 I guess you've finished everything you need. LGTM.
@myui Would you double-check this? I can merge whenever you are ready.
Let me see.
@@ -100,7 +100,7 @@ Note that `floor(prob / 0.2)` means that the rows are distributed to 5 bins for

```md
# Difference between AUC and Logarithmic Loss

Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs.
```
Missing link: stat_eval.html is deleted.
@nzw0301 I'll fix and merge it. No need to update this PR.
What changes were proposed in this pull request?

Make the f1score function a more general fmeasure function for any positive beta value.

What type of PR is it?

Improvement

What is the Jira issue?

HIVEMALL-132

How was this patch tested?

Added FMeasureUDAFTest.

Checklist

(Please remove this section if not needed; check x for YES, blank for NO)

- [x] Did you apply the source code formatter, i.e., `mvn formatter:format`, for your commit?