metrics fmeasure and matthews_correlation don't work batchwise #4592
Comments
I think I stumbled upon the same problem while experimenting with an unbalanced dataset. Here is an example I wrote to isolate the problem:
Output:
In this simple example the classification is always correct, yet the reported precision and recall are almost 0. They go up when the batch size is increased, and roughly follow the probability that a random batch-sized subset of the data contains at least one sample labelled 1 (in this example, 1 - 0.99**2 ≈ 0.0199). That is why I think this is the same issue the OP described.
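The effect described above is easy to reproduce outside of Keras. Here is a minimal NumPy sketch (hypothetical data with a deliberately perfect classifier, not the original script) that compares recall averaged over batches of 2 against recall computed once over the whole dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unbalanced dataset: roughly 1% positives, and a "perfect" classifier
# whose predictions always match the labels.
y_true = (rng.random(10_000) < 0.01).astype(int)
y_pred = y_true.copy()

def recall(t, p):
    tp = np.sum((t == 1) & (p == 1))
    fn = np.sum((t == 1) & (p == 0))
    # Like the Keras metric, a batch with no positives contributes ~0.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Global recall: computed once over the entire dataset.
global_recall = recall(y_true, y_pred)

# Batchwise recall: computed per batch of 2, then averaged --
# effectively what Keras reports during training.
batch_size = 2
batch_recalls = [recall(y_true[i:i + batch_size], y_pred[i:i + batch_size])
                 for i in range(0, len(y_true), batch_size)]
batchwise_recall = float(np.mean(batch_recalls))

print(global_recall)     # 1.0 -- the classifier is always correct
print(batchwise_recall)  # roughly 0.02, despite perfect predictions
```

The batchwise average sits near the probability that a batch of 2 contains at least one positive (1 - 0.99**2 ≈ 0.0199), because batches without positives drag the mean toward 0.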
I'm also using a binary classifier and seeing some suspicious metrics reported. Namely, every epoch reports the exact same value for accuracy, precision, and recall over hundreds of epochs. This occurs in training, validation, and evaluation.
@laxatives I have this same problem with a binary classifier with a balanced dataset. I use a generator and have verified that it gives balanced data each call to next(). @DepthFirstSearch @sietschie Have you figured out the issue?
@laxatives @isaacgerg I don't think you have the same issue as @sietschie or me. As I stated in my initial post, I don't think it's possible to compute fmeasure, precision, etc. in a batchwise manner (see my example). That's the whole issue here.
@DepthFirstSearch During training, these metrics are meant to be computed over each batch, not over all batches combined. In any case, for a binary classifier, the metrics are still being reported incorrectly even on a per-batch basis. See #5400.
Hello,
In my opinion, the metrics `fmeasure`, `matthews_correlation`, `precision`, and `recall` all don't work batchwise. In general, this is the case for all metrics that incorporate true/false positives/negatives. Here is a small and easy counterexample:
Let's assume we have just 4 samples, two negatives and two positives, and that our batch size is 2. The first batch contains the two negatives, the second the two positives.
Now we want to calculate the recall (a.k.a. the true-positive rate) in a batchwise manner. For the first batch, the TP rate is 0, since there are no true positives. For the second batch, suppose one of the two positives is predicted correctly, giving a TP rate of 0.5. Finally, we take the mean over both batches and end up with recall = TP rate = mean(0, 0.5) = 0.25.
But as we can easily see, the correct recall over the entire dataset is 0.5: one of the two positives was found. The problem with the batchwise calculation is that the first batch, which contains no positives at all, is wrongly incorporated into the average.
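The arithmetic above can be checked directly. A small sketch in plain Python, with the predictions assumed as described (one of the two positives found):

```python
# Two batches of two samples each, matching the counterexample:
# batch 1 holds the two negatives, batch 2 the two positives.
y_true = [[0, 0], [1, 1]]
y_pred = [[0, 0], [1, 0]]  # assumed: one of the two positives is predicted

def tp_rate(t, p):
    tp = sum(1 for a, b in zip(t, p) if a == 1 and b == 1)
    pos = sum(t)
    return tp / pos if pos else 0.0  # a batch with no positives reports 0

# Batchwise: compute the rate per batch, then average.
batch_rates = [tp_rate(t, p) for t, p in zip(y_true, y_pred)]
batchwise = sum(batch_rates) / len(batch_rates)   # mean(0, 0.5) = 0.25

# Global: compute the rate once over all samples.
flat_true = [x for b in y_true for x in b]
flat_pred = [x for b in y_pred for x in b]
overall = tp_rate(flat_true, flat_pred)           # 1 TP out of 2 positives = 0.5

print(batchwise, overall)  # 0.25 0.5
```

Averaging per-batch rates and computing the rate over accumulated counts only agree when every batch contains the same number of positives, which is exactly what an unbalanced dataset violates.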