Accuracy, fmeasure, precision, and recall all the same for binary classification problem (cut and paste example provided) #5400

Closed
isaacgerg opened this issue Feb 14, 2017 · 60 comments

Comments

@isaacgerg

isaacgerg commented Feb 14, 2017

keras 1.2.2, tf-gpu 0.12.1

Example code to show issue:

'''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

#from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

batch_size = 128
nb_classes = 10
nb_epoch = 12

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# make 2 categories
y_train = y_train>=5
y_test = y_test>=5

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, 2)
Y_test = np_utils.to_categorical(y_test, 2)

model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy', 'f1score', 'precision', 'recall'])

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

yields output:

Using TensorFlow backend.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX TITAN Black
major: 3 minor: 5 memoryClockRate (GHz) 0.98
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.85GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:01:00.0)

  128/60000 [..............................] - ETA: 1686s - loss: 0.7091 - acc: 0.4688 - fmeasure: 0.4687 - precision: 0.4688 - recall: 0.4688
  384/60000 [..............................] - ETA: 567s - loss: 0.6981 - acc: 0.4922 - fmeasure: 0.4922 - precision: 0.4922 - recall: 0.4922 
  640/60000 [..............................] - ETA: 343s - loss: 0.6845 - acc: 0.5609 - fmeasure: 0.5609 - precision: 0.5609 - recall: 0.5609
 1024/60000 [..............................] - ETA: 217s - loss: 0.6654 - acc: 0.6143 - fmeasure: 0.6143 - precision: 0.6143 - recall: 0.6143
 1408/60000 [..............................] - ETA: 159s - loss: 0.6427 - acc: 0.6456 - fmeasure: 0.6456 - precision: 0.6456 - recall: 0.6456
 1792/60000 [..............................] - ETA: 126s - loss: 0.6226 - acc: 0.6629 - fmeasure: 0.6629 - precision: 0.6629 - recall: 0.6629
@laxatives

laxatives commented Feb 16, 2017

FYI, I had this issue as well, but I believe it was resolved by replacing metrics.py with the latest version in https://raw.githubusercontent.com/fchollet/keras/master/keras/metrics.py

1280/5640 [=====>........................] - ETA: 20s - loss: 1.5566 - fmeasure: 0.8134 - precision: 0.8421 - recall: 0.7867
1408/5640 [======>.......................] - ETA: 19s - loss: 1.5358 - fmeasure: 0.8160 - precision: 0.8433 - recall: 0.7905
1536/5640 [=======>......................] - ETA: 18s - loss: 1.5221 - fmeasure: 0.8187 - precision: 0.8449 - recall: 0.7943
1664/5640 [=======>......................] - ETA: 18s - loss: 1.5240 - fmeasure: 0.8152 - precision: 0.8406 - recall: 0.7915
1792/5640 [========>.....................] - ETA: 17s - loss: 1.5294 - fmeasure: 0.8141 - precision: 0.8391 - recall: 0.7907
1920/5640 [=========>....................] - ETA: 17s - loss: 1.5106 - fmeasure: 0.8175 - precision: 0.8422 - recall: 0.7943
2048/5640 [=========>....................] - ETA: 16s - loss: 1.5220 - fmeasure: 0.8146 - precision: 0.8391 - recall: 0.7915
2176/5640 [==========>...................] - ETA: 15s - loss: 1.5127 - fmeasure: 0.8189 - precision: 0.8429 - recall: 0.7964

@isaacgerg
Author

Did you run the code I provided?

metrics.py is a month old. I just pulled keras 1.2.2 from PyPI. I can't see how that can be the issue.

@recluze

recluze commented Feb 19, 2017

I had the issue of getting good accuracy but bad precision and recall on a balanced dataset. I ended up calculating fp, tp, fn, and tn manually and then computing precision/recall/f1 through a custom metrics method. Based on that, I got good f1 etc. I did look at the code of metrics.py and can't figure out why it would give incorrect results.

btw, I'm using keras 1.2.0 for now.

@nsarafianos

nsarafianos commented Feb 22, 2017

@recluze how can you do this and at the same time compute them over the whole set (not per batch) while still passing the custom metric to the model?

As also mentioned in #4592, I think the correct way to compute precision, recall, etc. is over the complete prediction and ground-truth vectors/tensors, not by averaging per batch.

@recluze

recluze commented Feb 23, 2017

@nsarafianos Only do this per batch, since the values are reported on a per-batch basis by Keras callbacks. Once you're trained, you can just use model.predict to go over the complete test set and compute your metrics in full.

@isaacgerg
Author

@recluze Were you able to replicate my bug?

@nsarafianos

nsarafianos commented Feb 24, 2017

@isaacgerg I had exactly the same problem (accuracy equal to precision on a balanced task) with another dataset, which made me look into this. For some reason the per-batch computation of precision is not working properly. Using sklearn.metrics on the predicted classes worked fine (here for a binary problem):

import numpy as np
from sklearn.metrics import average_precision_score, accuracy_score

yp = model.predict(X_test, batch_size=32, verbose=1)
ypreds = np.argmax(yp, axis=1)
print(average_precision_score(ytrue, ypreds))
print(accuracy_score(ytrue, ypreds))

@isaacgerg
Author

@nsarafianos Would you mind checking if you get the same results when running the code I submitted? Thanks for the advice on sklearn.metrics.

@nsarafianos

@isaacgerg Just ran the code you posted and yes all 3 measurements are the same for all 12 epochs.

For a 10-class problem, I would create the confusion matrix, get tp,fp, etc. and then compute whichever metrics you want.
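A rough sketch of that confusion-matrix approach (an untested illustration, not code from this thread; it assumes the model, X_test, and Y_test from the example above and works for any number of classes) might look like:

import numpy as np
from sklearn.metrics import confusion_matrix

y_prob = model.predict(X_test, batch_size=32)
y_pred = np.argmax(y_prob, axis=1)
y_true = np.argmax(Y_test, axis=1)           # undo the one-hot encoding

cm = confusion_matrix(y_true, y_pred)        # rows = true class, columns = predicted class
tp = np.diag(cm).astype('float64')
precision_per_class = tp / cm.sum(axis=0)    # column sums = predicted positives (may be 0 for unpredicted classes)
recall_per_class = tp / cm.sum(axis=1)       # row sums = actual positives
print(precision_per_class)
print(recall_per_class)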

@recluze

recluze commented Feb 24, 2017

@nsarafianos and @isaacgerg ... I've seen this issue repeatedly (though not strictly for all datasets; for some very large datasets it seems to be common).

sklearn's metrics aren't an option for me since raw numpy is quite slow for my dataset. I used Keras backend functions to compute tp/fp etc. and then precision etc. in on_epoch_end of a callback. That works well for seeing training progress, and you can then compute over the full training set at the end.
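A rough sketch of such a callback (not the exact code used here; for simplicity it computes the metrics with sklearn over the full validation set rather than with backend ops) might look like:

import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import precision_score, recall_score, f1_score

class FullSetMetrics(Callback):
    """Compute precision/recall/F1 over the whole validation set at each epoch end."""
    def __init__(self, X_val, y_val):
        super(FullSetMetrics, self).__init__()
        self.X_val = X_val
        self.y_val = y_val  # integer class labels (0/1), not one-hot

    def on_epoch_end(self, epoch, logs=None):
        y_prob = self.model.predict(self.X_val, verbose=0)
        y_pred = np.argmax(y_prob, axis=1)
        p = precision_score(self.y_val, y_pred)
        r = recall_score(self.y_val, y_pred)
        f = f1_score(self.y_val, y_pred)
        print(' - val_precision: %.4f - val_recall: %.4f - val_f1: %.4f' % (p, r, f))

# model.fit(X_train, Y_train, ..., callbacks=[FullSetMetrics(X_test, y_test.astype(int))])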

@guangbaowan

I tried this on 2.0.3 but it failed.

I added this snippet:

def f1_score(y_true, y_pred):

    # Count positive samples.
    c1 = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    c2 = K.sum(K.round(K.clip(y_pred, 0, 1)))
    c3 = K.sum(K.round(K.clip(y_true, 0, 1)))

    # If there are no true samples, fix the F1 score at 0.
    if c3 == 0:
        return 0

    # How many selected items are relevant?
    precision = c1 / c2

    # How many relevant items are selected?
    recall = c1 / c3

    # Calculate f1_score
    f1_score = 2 * (precision * recall) / (precision + recall)
    return f1_score

@isaacgerg
Author

isaacgerg commented Apr 27, 2017 via email

@fighting41love

fighting41love commented May 5, 2017

Same problem. I customized the metrics (precision, recall, and F1-measure). model.fit_generator and model.evaluate_generator also give the same values for precision, recall, and F1-measure.

keras==2.0.0 on Mac OS Sierra 10.12.4

Epoch 8/10
0s - loss: 0.0269 - binary_accuracy: 0.8320 - f1score: 0.8320 - precision: 0.8320 - recall: 0.8320
Epoch 9/10
0s - loss: 0.0488 - binary_accuracy: 0.6953 - f1score: 0.6953 - precision: 0.6953 - recall: 0.6953
Epoch 10/10
0s - loss: 0.0457 - binary_accuracy: 0.7148 - f1score: 0.7148 - precision: 0.7148 - recall: 0.7148
Start to evaluate.
binary_accuracy: 76.06%
f1score: 76.06%
precision: 76.06%
recall: 76.06%

@unnir

unnir commented Jul 12, 2017

For those who come here later: since Keras 2.0, the fmeasure, precision, and recall metrics have been removed.

If you want to use them, you can check the repo history or add this code:

from keras import backend as K

def mcor(y_true, y_pred):
    # matthews_correlation
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos

    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos

    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)

    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)

    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return numerator / (denominator + K.epsilon())

def precision(y_true, y_pred):
    """Precision metric.

    Only computes a batch-wise average of precision.

    Computes the precision, a metric for multi-label classification of
    how many selected items are relevant.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    """Recall metric.

    Only computes a batch-wise average of recall.

    Computes the recall, a metric for multi-label classification of
    how many relevant items are selected.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall


def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision
    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))

#you can use it like this
model.compile(loss='binary_crossentropy',
              optimizer= "adam",
              metrics=[mcor,recall, f1])

@baharian

@unnir Thanks for providing these implementations. But even using these, I get equal precision and recall at each epoch during training for both training dataset and validation dataset.

@unnir

unnir commented Aug 10, 2017

@baharian I guess it has nothing to do with metrics. Do you have the result for the loss too?

@baharian

Yes! The value of loss changes by epoch, but precision and recall stay constant. Here is an excerpt:

Train on 9372 samples, validate on 2343 samples
Epoch 1/10
9372/9372 [==============================] - 0s - loss: 0.4628 - precision: 0.9392 - recall: 0.9720 - val_loss: 0.2192 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 2/10
9372/9372 [==============================] - 0s - loss: 0.2203 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.1182 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 3/10
9372/9372 [==============================] - 0s - loss: 0.1525 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.1031 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 4/10
9372/9372 [==============================] - 0s - loss: 0.1269 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.0967 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 5/10
9372/9372 [==============================] - 0s - loss: 0.1202 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.0907 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 6/10
9372/9372 [==============================] - 0s - loss: 0.1172 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.0904 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 7/10
9372/9372 [==============================] - 0s - loss: 0.1134 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.0901 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 8/10
9372/9372 [==============================] - 0s - loss: 0.1111 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.0877 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 9/10
9372/9372 [==============================] - 0s - loss: 0.1097 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.0856 - val_precision: 0.9812 - val_recall: 0.9812
Epoch 10/10
9372/9372 [==============================] - 0s - loss: 0.1070 - precision: 0.9791 - recall: 0.9791 - val_loss: 0.0869 - val_precision: 0.9812 - val_recall: 0.9812
32/515 [>.............................] - ETA: 0s

@rimjhim365

@baharian @unnir I am stuck on this problem! Did you get any solution?

@unnir

unnir commented Aug 11, 2017

@baharian the metric functions work. Please look at your data; you probably have not normalised it. Or try tuning the hyperparameters of your model (optimizer, batch size, number of layers).

@rimjhim365 what kind of problem do you have?

@rimjhim365

[screenshot of training log]
@unnir The values of the other metrics like precision and recall come out the same as accuracy.

@baharian

@unnir I did not mean that they do not work; what I was trying to say is that the numbers I get don't make much sense to me. I have indeed normalized my data prior to feeding it into the neural network, and I am doing cross-validation to tune hyper-parameters.

@ametersky

I am also seeing the same scores coming through for custom metrics. The code below gave the following output for an epoch:

Epoch 1/20
72326/72326 [==============================] - 293s - loss: 0.4666 - acc: 0.8097 - precision: 0.8097 - recall: 0.8097 - f1_score: 0.8097 - val_loss: 0.4592 - val_acc: 0.8100 - val_precision: 0.8100 - val_recall: 0.8100 - val_f1_score: 0.8100
def f1_score(y_true, y_pred):

    # Count positive samples.
    c1 = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    c2 = K.sum(K.round(K.clip(y_pred, 0, 1)))
    c3 = K.sum(K.round(K.clip(y_true, 0, 1)))

    # If there are no true samples, fix the F1 score at 0.
    if c3 == 0:
        return 0

    # How many selected items are relevant?
    precision = c1 / c2

    # How many relevant items are selected?
    recall = c1 / c3

    # Calculate f1_score
    f1_score = 2 * (precision * recall) / (precision + recall)
    return f1_score


def precision(y_true, y_pred):

    # Count positive samples.
    c1 = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    c2 = K.sum(K.round(K.clip(y_pred, 0, 1)))
    c3 = K.sum(K.round(K.clip(y_true, 0, 1)))

    # If there are no true samples, fix the F1 score at 0.
    if c3 == 0:
        return 0

    # How many selected items are relevant?
    precision = c1 / c2

    return precision


def recall(y_true, y_pred):

    # Count positive samples.
    c1 = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    c3 = K.sum(K.round(K.clip(y_true, 0, 1)))

    # If there are no true samples, fix the F1 score at 0.
    if c3 == 0:
        return 0

    recall = c1 / c3

    return recall

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy', precision, recall, f1_score])

@micklexqg

@nsarafianos, I think the idea of computing a confusion matrix is good, but I don't know how to compute it.
Did you solve it?

@moming2k

I have the same problem; even when I pass custom functions like @ametersky did, they always return the same value.

@unnir do you have any example code that shows it working differently?

@unnir

unnir commented Sep 20, 2017

@moming2k I have. I guess you have issues in your model, or you have to update Keras.

One more time:
the custom metrics work perfectly for me. Try a toy model, e.g. XOR, to check the metrics.
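A possible toy check (a sketch only, assuming the precision/recall/f1 functions defined earlier in this thread are in scope) might look like:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# XOR toy problem: four samples, perfectly balanced labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype='float32')
y = np.array([0, 1, 1, 0], dtype='float32')

model = Sequential()
model.add(Dense(8, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy', precision, recall, f1])
model.fit(X, y, epochs=500, verbose=0)
print(model.evaluate(X, y, verbose=0))  # prints loss, accuracy, precision, recall, f1 for the toy problem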

@SSchott

SSchott commented Oct 6, 2017

I have Keras 2.0.8 and have the same behaviour :S

@manvindra

When I use this, I get F1 as NaN in every batch of every epoch.

@tobycheese
Contributor

tobycheese commented Jun 19, 2018

With unnir's metrics I also sometimes get f1 as NaN in my first epoch, while precision and recall seem OK. In subsequent epochs, all metrics are OK.

@unnir

unnir commented Jun 19, 2018

@tobycheese I updated the code, please check it

@tobycheese
Contributor

@unnir perfect, thanks!

@Avcu

Avcu commented Jul 5, 2018

I have the same issue: identical results for the custom metrics on a binary classification with unbalanced data, and I am very positive that there is nothing wrong with the model. Looks like the best way is to use the built-in Keras metrics rather than implementing them on the backend. Let me know if any of you understands what's wrong here.

@hbb21st

hbb21st commented Jul 26, 2018

Same issue with Keras updated to 2.2.0 and using unnir's code; still:

1/19 [>.............................] - ETA: 2:28 - loss: 1.0960e-07 - acc: 1.0000 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
2/19 [==>...........................] - ETA: 2:19 - loss: 8.0151 - acc: 0.5000 - precision: 0.5000 - recall: 0.5000 - f1: 0.5000
3/19 [===>..........................] - ETA: 2:09 - loss: 5.3434 - acc: 0.6667 - precision: 0.6667 - recall: 0.6667 - f1: 0.6667

@unnir

unnir commented Jul 26, 2018

@hbb21st and what is your problem? The metrics look good, no?

@hbb21st

hbb21st commented Jul 26, 2018 via email

@Avcu

Avcu commented Jul 26, 2018

Yeah, sounds strange. I just wanted you to be aware that the f1 score is the harmonic mean of recall and precision. In this case, since recall and precision are the same, it does make sense that you get the same number for f1. But still, probably something is wrong.

@Avcu

Avcu commented Jul 26, 2018

I found a package released recently (linked below). By the way, these metrics should be calculated on every epoch, not on every batch.
https://pypi.org/project/keras-metrics/
If the history of the metrics is not exactly what you want, you can omit them during training and calculate them after prediction using unnir's code (you just have to replace all the 'K' calls with numpy).
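A rough numpy adaptation of those formulas (a sketch only, not exact code from this thread) might look like:

import numpy as np

def precision_np(y_true, y_pred, eps=1e-7):
    tp = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))
    return tp / (predicted_positives + eps)

def recall_np(y_true, y_pred, eps=1e-7):
    tp = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    possible_positives = np.sum(np.round(np.clip(y_true, 0, 1)))
    return tp / (possible_positives + eps)

def f1_np(y_true, y_pred, eps=1e-7):
    p = precision_np(y_true, y_pred, eps)
    r = recall_np(y_true, y_pred, eps)
    return 2 * p * r / (p + r + eps)

# yp = model.predict(X_test).ravel()   # assumes a single sigmoid output
# print(precision_np(y_test, yp), recall_np(y_test, yp), f1_np(y_test, yp))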

@hbb21st

hbb21st commented Jul 27, 2018

Thanks @Avcu, I tried keras-metrics; it still shows acc the same as precision.
1/18 [>.............................] - ETA: 1:27:10 - loss: 0.3126 - acc: 0.8750 - precision: 0.8750 - recall: 0.8750 - true_positive: 28.0000 - true_negative: 28.0000 - false_positive: 4.0000 - false_negative: 4.0000
2/18 [==>...........................] - ETA: 41:18 - loss: 8.2154 - acc: 0.4375 - precision: 0.4375 - recall: 0.4375 - true_positive: 14.0000 - true_negative: 14.0000 - false_positive: 18.0000 - false_negative: 18.0000
3/18 [====>.........................] - ETA: 25:59 - loss: 5.4769 - acc: 0.6250 - precision: 0.6250 - recall: 0.6250 - true_positive: 20.0000 - true_negative: 20.0000 - false_positive: 12.0000 - false_negative: 12.0000
4/18 [=====>........................] - ETA: 18:19 - loss: 8.1372 - acc: 0.4688 - precision: 0.4688 - recall: 0.4688 - true_positive: 15.0000 - true_negative: 15.0000 - false_positive: 17.0000 - false_negative: 17.0000
5/18 [=======>......................] - ETA: 13:42 - loss: 7.5171 - acc: 0.5125 - precision: 0.5125 - recall: 0.5125 - true_positive: 16.4000 - true_negative: 16.4000 - false_positive: 15.6000 - false_negative: 15.6000
6/18 [=========>....................] - ETA: 10:36 - loss: 8.9506 - acc: 0.4271 - precision: 0.4271 - recall: 0.4271 - true_positive: 13.6667 - true_negative: 13.6667 - false_positive: 18.3333 - false_negative: 18.3333
7/18 [==========>...................] - ETA: 8:23 - loss: 9.9746 - acc: 0.3661 - precision: 0.3661 - recall: 0.3661 - true_positive: 11.7143 - true_negative: 11.7143 - false_positive: 20.2857 - false_negative: 20.2857
8/18 [============>.................] - ETA: 6:43 - loss: 8.7277 - acc: 0.4453 - precision: 0.4453 - recall: 0.4453 - true_positive: 14.2500 - true_negative: 14.2500 - false_positive: 17.7500 - false_negative: 17.7500
9/18 [==============>...............] - ETA: 5:24 - loss: 7.7580 - acc: 0.5069 - precision: 0.5069 - recall: 0.5069 - true_positive: 16.2222 - true_negative: 16.2222 - false_positive: 15.7778 - false_negative: 15.7778
10/18 [===============>..............] - ETA: 4:21 - loss: 8.5940 - acc: 0.4562 - precision: 0.4562 - recall: 0.4562 - true_positive: 14.6000 - true_negative: 14.6000 - false_positive: 17.4000 - false_negative: 17.4000
11/18 [=================>............] - ETA: 3:29 - loss: 9.2780 - acc: 0.4148 - precision: 0.4148 - recall: 0.4148 - true_positive: 13.2727 - true_negative: 13.2727 - false_positive: 18.7273 - false_negative: 18.7273
12/18 [===================>..........] - ETA: 2:45 - loss: 9.8480 - acc: 0.3802 - precision: 0.3802 - recall: 0.3802 - true_positive: 12.1667 - true_negative: 12.1667 - false_positive: 19.8333 - false_negative: 19.8333
13/18 [====================>.........] - ETA: 2:08 - loss: 10.3303 - acc: 0.3510 - precision: 0.3510 - recall: 0.3510 - true_positive: 11.2308 - true_negative: 11.2308 - false_positive: 20.7692 - false_negative: 20.7692
14/18 [======================>.......] - ETA: 1:35 - loss: 10.7437 - acc: 0.3259 - precision: 0.3259 - recall: 0.3259 - true_positive: 10.4286 - true_negative: 10.4286 - false_positive: 21.5714 - false_negative: 21.5714
15/18 [========================>.....] - ETA: 1:07 - loss: 10.0275 - acc: 0.3708 - precision: 0.3708 - recall: 0.3708 - true_positive: 11.8667 - true_negative: 11.8667 - false_positive: 20.1333 - false_negative: 20.1333
16/18 [=========================>....] - ETA: 42s - loss: 9.4008 - acc: 0.4102 - precision: 0.4102 - recall: 0.4102 - true_positive: 13.1250 - true_negative: 13.1250 - false_positive: 18.8750 - false_negative: 18.8750
17/18 [===========================>..] - ETA: 20s - loss: 9.7959 - acc: 0.3860 - precision: 0.3860 - recall: 0.3860 - true_positive: 12.3529 - true_negative: 12.3529 - false_positive: 19.6471 - false_negative: 19.6471
18/18 [==============================] - 345s 19s/step - loss: 10.1471 - acc: 0.3646 - precision: 0.3646 - recall: 0.3646 - true_positive: 11.6667 - true_negative: 11.6667 - false_positive: 20.3333 - false_negative: 20.3333
But you can see my problem here: true_positive equals true_negative all the time. Why? One thing I found: I set batch_size=32, but here true_positive + true_negative + false_positive + false_negative == 64, so how did my batch size get doubled? I suspect that is the reason TP == TN all the time.

@Avcu

Avcu commented Jul 28, 2018

I got it. I've tried a binary classification on Google's servers. It is all about how many units you have in the last layer. If you have only one, everything is okay, but if you have two it does not work. On the other hand, for binary classification, using two units with a softmax activation function (probably what you do as well) is often suggested for better convergence, as far as I know. You can check my code below; I will create a post under this keras-vis library's issues.
https://colab.research.google.com/drive/1lmQ-hWcN4tsGMicd4dKnSjeTD-BdgJuE
Best

@Avcu

Avcu commented Jul 30, 2018

I've created a pull request to solve the problem (netrack/keras-metrics#4); I hope it'll be accepted soon.
For those who want to use a custom method, I corrected unnir's code as follows:

from keras import backend as K

def check_units(y_true, y_pred):
    if y_pred.shape[1] != 1:
      y_pred = y_pred[:,1:2]
      y_true = y_true[:,1:2]
    return y_true, y_pred

def precision(y_true, y_pred):
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision
    y_true, y_pred = check_units(y_true, y_pred)
    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))

#you can use it as following
model.compile(loss='binary_crossentropy',
              optimizer= "adam",
              metrics=[precision,recall, f1])

@hbb21st

hbb21st commented Jul 30, 2018 via email

@Avcu

Avcu commented Jul 31, 2018

@hbb21st hi, I checked it, and everything seems fine to me. By the way, to be honest I don't know much about this metric or what range it should fall in. Is it what you're looking for? If so, you can share some insight. For the sake of completeness of this issue, I will remove it from my code above.
Can you debug your code by printing tp, tn, fp, and fn every epoch to find where the mistake might be?

@idrpambudi

@hbb21st hello, I had the same problem. In my case it was caused by using softmax in a binary classification problem with an output dimension of 2 ([0,1] or [1,0]). When I changed the output dimension to 1 ([0] or [1]) with a sigmoid activation function, it worked just fine.
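A sketch of that change (assuming the metric functions defined above; variable names are illustrative) might look like:

# instead of: model.add(Dense(2, activation='softmax')) with one-hot labels
model.add(Dense(1, activation='sigmoid'))          # single output unit
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy', precision, recall, f1])
# model.fit(X_train, y_train.astype('float32'), ...)  # y_train as 0/1 labels, not to_categorical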

@NTNguyen13

@Avcu hi, the code works perfectly.

However, when I tried to use the predict_generator method, it gives a different result:

prediction = model.predict_generator(int_val_generator)
val_preds = np.argmax(prediction, axis=-1)
val_trues = int_val_generator.classes
print(classification_report(val_trues, val_preds))
Out:
             precision    recall  f1-score   support
          0       0.75      0.50      0.60         6
          1       0.94      0.98      0.96        47
avg / total       0.92      0.92      0.92        53

while the same weights saved during training give these values: val_loss: 0.4253 - val_precision: 0.9748 - val_recall: 0.8926 - val_f1: 0.9319

Where did I go wrong in this case?

@NoraAMM

NoraAMM commented Oct 19, 2018

I am using categorical_crossentropy and softmax and have 2 labels. I also use to_categorical. I used @Avcu's edit of the code. However, I get equal precision and recall every time. Does this mean I have a problem?

Epoch 1/5
442/442 [==============================] - 6s 14ms/step - loss: 0.6080 - acc: 0.8990 - precision: 0.7059 - recall: 0.7059 - val_loss: 0.4961 - val_acc: 0.9380 - val_precision: 0.6400 - val_recall: 0.6400
Epoch 2/5
442/442 [==============================] - 1s 1ms/step - loss: 0.4000 - acc: 0.9419 - precision: 0.7240 - recall: 0.7240 - val_loss: 0.3174 - val_acc: 0.9380 - val_precision: 0.6400 - val_recall: 0.6400
Epoch 3/5
442/442 [==============================] - 1s 1ms/step - loss: 0.2660 - acc: 0.9419 - precision: 0.7240 - recall: 0.7240 - val_loss: 0.2254 - val_acc: 0.9380 - val_precision: 0.6400 - val_recall: 0.6400
Epoch 4/5
442/442 [==============================] - 1s 1ms/step - loss: 0.1995 - acc: 0.9419 - precision: 0.7240 - recall: 0.7240 - val_loss: 0.1817 - val_acc: 0.9380 - val_precision: 0.6400 - val_recall: 0.6400
Epoch 5/5
442/442 [==============================] - 1s 1ms/step - loss: 0.1677 - acc: 0.9421 - precision: 0.7285 - recall: 0.7285 - val_loss: 0.1594 - val_acc: 0.9400 - val_precision: 0.6600 - val_recall: 0.6600
acc: 94.00%

@NoraAMM

NoraAMM commented Oct 19, 2018

(quoting my previous comment and its training log above)

Same problem when I use binary_crossentropy.
Also, my problem is a grammar error detection task where I have sentences and each sentence has only one error (so the label 'correct' is far more frequent than 'incorrect'), and the model predicts the whole sentence as 'correct'. What can I do?!

@NoraAMM

NoraAMM commented Oct 20, 2018

(quoting my previous comments above)

I think I solved this.
I used the 'incorrect' label (the rare one) as padding (it used to be 'correct'). I am also using weighted_cross_entropy_with_logits from TensorFlow as the loss (https://stackoverflow.com/questions/42158866/neural-network-for-multi-label-classification-with-large-number-of-classes-outpu/47313183#47313183).
My results and predictions now make more sense and feel more normal. I hope they are accurate though.
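A rough sketch of wrapping that loss for Keras (based on the linked answer, not the exact code used here; the final Dense layer is assumed to output raw logits, i.e. no sigmoid, and POS_WEIGHT is a placeholder value to tune for the class imbalance) might look like:

import tensorflow as tf
from keras import backend as K

POS_WEIGHT = 10.0  # placeholder; values > 1 up-weight the rare positive ('incorrect') class

def weighted_bce_with_logits(y_true, y_pred):
    # y_pred are raw logits here, not sigmoid probabilities
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=y_true,
                                                    logits=y_pred,
                                                    pos_weight=POS_WEIGHT)
    return K.mean(loss)

# model.compile(loss=weighted_bce_with_logits, optimizer='adam')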

@HaiyanJiang

EQUALITY PROBLEM

I ran into exactly the same problem (accuracy, precision, recall, and f1score are all equal to each other on both the training set and the validation set for a balanced task) with another dataset, which made me look into this; let's call it the EQUALITY PROBLEM.

I use:
tensorflow version: 1.13.1
tensorflow keras version: 2.2.4-tf

I have combined all the replies, tried all the code above, and finally came up with two versions.
The first version defines precision, recall, and f1score as above.
The second version uses the precision, recall, and f1score defined in keras-metrics (which depends on keras).

CONCLUSION:

The following are the results of the first version. When I try "categorical classification using softmax with one-hot output", I HAVE the EQUALITY PROBLEM. However, when I try "binary classification using sigmoid with 0-1 vector output", I DO NOT have the EQUALITY PROBLEM.

Here is all my code:

"""
Created on Thu May  9 10:36:22 2019
# Example code to show issue:
Trains a simple convnet on the MNIST dataset.
"""

import numpy as np

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.utils import to_categorical
import tensorflow.keras.backend as K

import tensorflow as tf
print("tensorflow version:", tf.VERSION)
print("tensorflow keras version:", tf.keras.__version__)

def mcor(y_true, y_pred):
    # matthews_correlation
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos
    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos
    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)
    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)
    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / (denominator + K.epsilon())


def precision(y_true, y_pred):
    """ Precision metric.
    Only computes a batch-wise average of precision.
    Computes the precision, a metric for multi-label classification of
    how many selected items are relevant.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision


def recall(y_true, y_pred):
    """Recall metric.
    Only computes a batch-wise average of recall.
    Computes the recall, a metric for multi-label classification of
    how many relevant items are selected.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall


def f1score(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return 2*((precision * recall) / (precision+recall + K.epsilon()))

NB_BATCH = 128
NB_EPOCH = 11
NB_FILTER = 32  # number of convolutional filters to use
SZ_POOL = (2, 2)  # size of pooling area for max pooling
SZ_KERNEL = (3, 3)  # convolution kernel size

def get_mnist_bin_data():
    import tensorflow.keras.backend as K
    img_rows, img_cols = 28, 28  # input image dimensions
    # the data, shuffled and split between train and test sets
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    y_train = (y_train >= 5)  # make 2 categories
    y_test = (y_test >= 5)
    if K.image_data_format() == 'channels_first':  # Theano
        X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
        X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    elif K.image_data_format() == 'channels_last':  # TensorFlow
        X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
        X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)
    print(input_shape)
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    X_train /= 255
    X_test /= 255
    print('X_train shape:', X_train.shape)
    print(X_train.shape[0], 'train samples')
    print(X_test.shape[0], 'test samples')
    return X_train, X_test, y_train, y_test


def ann_cat_soft():
    np.random.seed(5400)  # for reproducibility
    X_train, X_test, y_train, y_test = get_mnist_bin_data()
    # convert class vectors to binary class matrices
    Y_train = to_categorical(y_train, 2)
    Y_test = to_categorical(y_test, 2)
    input_shape = X_train.shape[1:]
    model = Sequential()
    model.add(Conv2D(filters=NB_FILTER, kernel_size=SZ_KERNEL,
                     padding='valid', input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(Conv2D(NB_FILTER, SZ_KERNEL))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=SZ_POOL))
    model.add(Dropout(rate=1-0.25))
    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dropout(rate=0.5))
    model.add(Dense(2, activation='softmax'))
    model.compile(
        loss='categorical_crossentropy',
        optimizer='adadelta',
        metrics=[ mcor, 'accuracy', precision, recall, f1score])
    model.fit(
        X_train, Y_train, batch_size=NB_BATCH, epochs=NB_EPOCH,
        verbose=1, validation_data=(X_test, Y_test))
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])
    '''
    Accuracy, fmeasure, precision, and recall all the same for
    binary classification problem (cut and pasted example) on May 09 2019.
    '''


def ann_bin_sigm():
    np.random.seed(5400)  # for reproducibility
    X_train, X_test, y_train, y_test = get_mnist_bin_data()
    # convert class vectors to binary class matrices
    Y_train = y_train.astype('float32')
    Y_test = y_test.astype('float32')
    input_shape = X_train.shape[1:]
    model = Sequential()
    model.add(Conv2D(filters=NB_FILTER, kernel_size=SZ_KERNEL[0],
                     strides=SZ_KERNEL[1], padding='valid',
                     input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(Conv2D(NB_FILTER, SZ_KERNEL[0], SZ_KERNEL[1]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=SZ_POOL))
    model.add(Dropout(rate=1-0.25))
    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dropout(rate=0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(
        loss='binary_crossentropy',
        optimizer='adadelta',
        metrics=[mcor, 'accuracy', precision, recall, f1score])
    model.fit(
        X_train, Y_train, batch_size=NB_BATCH, epochs=NB_EPOCH,
        verbose=1, validation_data=(X_test, Y_test))
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

For the "categorical classfication using softmax with one-hot output", I get the following results, which shows I have the EQUALITY PROBLEM.

ann_cat_soft()

Epoch 1/11
60000/60000 [==============================] - 67s 1ms/sample - loss: 0.2254 - mcor: 0.8140 - acc: 0.9070 - precision: 0.9070 - recall: 0.9070 - f1score: 0.9070 - val_loss: 0.0715 - val_mcor: 0.9539 - val_acc: 0.9767 - val_precision: 0.9770 - val_recall: 0.9770 - val_f1score: 0.9770
Epoch 2/11
60000/60000 [==============================] - 67s 1ms/sample - loss: 0.0995 - mcor: 0.9292 - acc: 0.9646 - precision: 0.9646 - recall: 0.9646 - f1score: 0.9646 - val_loss: 0.0497 - val_mcor: 0.9666 - val_acc: 0.9831 - val_precision: 0.9833 - val_recall: 0.9833 - val_f1score: 0.9833
Epoch 3/11
60000/60000 [==============================] - 65s 1ms/sample - loss: 0.0778 - mcor: 0.9470 - acc: 0.9735 - precision: 0.9735 - recall: 0.9735 - f1score: 0.9735 - val_loss: 0.0416 - val_mcor: 0.9693 - val_acc: 0.9852 - val_precision: 0.9847 - val_recall: 0.9847 - val_f1score: 0.9847
Epoch 4/11
60000/60000 [==============================] - 64s 1ms/sample - loss: 0.0683 - mcor: 0.9546 - acc: 0.9773 - precision: 0.9773 - recall: 0.9773 - f1score: 0.9773 - val_loss: 0.0371 - val_mcor: 0.9753 - val_acc: 0.9875 - val_precision: 0.9876 - val_recall: 0.9876 - val_f1score: 0.9876
Epoch 5/11
60000/60000 [==============================] - 66s 1ms/sample - loss: 0.0615 - mcor: 0.9587 - acc: 0.9793 - precision: 0.9793 - recall: 0.9793 - f1score: 0.9793 - val_loss: 0.0359 - val_mcor: 0.9759 - val_acc: 0.9878 - val_precision: 0.9879 - val_recall: 0.9879 - val_f1score: 0.9879
Epoch 6/11
60000/60000 [==============================] - 66s 1ms/sample - loss: 0.0563 - mcor: 0.9633 - acc: 0.9816 - precision: 0.9816 - recall: 0.9816 - f1score: 0.9816 - val_loss: 0.0342 - val_mcor: 0.9767 - val_acc: 0.9882 - val_precision: 0.9883 - val_recall: 0.9883 - val_f1score: 0.9883
Epoch 7/11
60000/60000 [==============================] - 67s 1ms/sample - loss: 0.0538 - mcor: 0.9632 - acc: 0.9816 - precision: 0.9816 - recall: 0.9816 - f1score: 0.9816 - val_loss: 0.0300 - val_mcor: 0.9802 - val_acc: 0.9900 - val_precision: 0.9901 - val_recall: 0.9901 - val_f1score: 0.9901
Epoch 8/11
60000/60000 [==============================] - 67s 1ms/sample - loss: 0.0529 - mcor: 0.9643 - acc: 0.9822 - precision: 0.9821 - recall: 0.9821 - f1score: 0.9821 - val_loss: 0.0307 - val_mcor: 0.9782 - val_acc: 0.9890 - val_precision: 0.9891 - val_recall: 0.9891 - val_f1score: 0.9891
Epoch 9/11
60000/60000 [==============================] - 68s 1ms/sample - loss: 0.0513 - mcor: 0.9663 - acc: 0.9832 - precision: 0.9832 - recall: 0.9832 - f1score: 0.9832 - val_loss: 0.0294 - val_mcor: 0.9780 - val_acc: 0.9896 - val_precision: 0.9890 - val_recall: 0.9890 - val_f1score: 0.9890
Epoch 10/11
60000/60000 [==============================] - 67s 1ms/sample - loss: 0.0477 - mcor: 0.9692 - acc: 0.9846 - precision: 0.9846 - recall: 0.9846 - f1score: 0.9846 - val_loss: 0.0291 - val_mcor: 0.9773 - val_acc: 0.9892 - val_precision: 0.9886 - val_recall: 0.9886 - val_f1score: 0.9886
Epoch 11/11
60000/60000 [==============================] - 66s 1ms/sample - loss: 0.0466 - mcor: 0.9681 - acc: 0.9840 - precision: 0.9841 - recall: 0.9841 - f1score: 0.9841 - val_loss: 0.0283 - val_mcor: 0.9794 - val_acc: 0.9896 - val_precision: 0.9897 - val_recall: 0.9897 - val_f1score: 0.9897
Test score: 0.028260348330519627
Test accuracy: 0.9792332

For the "binary classfication using sigmoid with 0-1 vector output", I get the following results, which shows I DO NOT have the EQUALITY PROBLEM.

ann_bin_sigm()

Train on 60000 samples, validate on 10000 samples
Epoch 1/11
60000/60000 [==============================] - 4s 61us/sample - loss: 0.5379 - mcor: 0.4488 - acc: 0.7237 - precision: 0.7249 - recall: 0.7078 - f1score: 0.7133 - val_loss: 0.3585 - val_mcor: 0.7453 - val_acc: 0.8715 - val_precision: 0.8549 - val_recall: 0.8889 - val_f1score: 0.8705
Epoch 2/11
60000/60000 [==============================] - 3s 50us/sample - loss: 0.4248 - mcor: 0.6232 - acc: 0.8109 - precision: 0.8206 - recall: 0.7878 - f1score: 0.8018 - val_loss: 0.2906 - val_mcor: 0.7892 - val_acc: 0.8945 - val_precision: 0.9033 - val_recall: 0.8764 - val_f1score: 0.8888
Epoch 3/11
60000/60000 [==============================] - 3s 50us/sample - loss: 0.3910 - mcor: 0.6602 - acc: 0.8298 - precision: 0.8411 - recall: 0.8053 - f1score: 0.8214 - val_loss: 0.2740 - val_mcor: 0.8137 - val_acc: 0.9083 - val_precision: 0.9019 - val_recall: 0.9054 - val_f1score: 0.9030
Epoch 4/11
60000/60000 [==============================] - 3s 49us/sample - loss: 0.3738 - mcor: 0.6764 - acc: 0.8380 - precision: 0.8476 - recall: 0.8173 - f1score: 0.8307 - val_loss: 0.2689 - val_mcor: 0.8199 - val_acc: 0.9089 - val_precision: 0.9223 - val_recall: 0.8899 - val_f1score: 0.9051
Epoch 5/11
60000/60000 [==============================] - 3s 48us/sample - loss: 0.3596 - mcor: 0.6866 - acc: 0.8434 - precision: 0.8523 - recall: 0.8233 - f1score: 0.8364 - val_loss: 0.2672 - val_mcor: 0.8241 - val_acc: 0.9108 - val_precision: 0.9250 - val_recall: 0.8916 - val_f1score: 0.9070
Epoch 6/11
60000/60000 [==============================] - 3s 49us/sample - loss: 0.3529 - mcor: 0.6949 - acc: 0.8475 - precision: 0.8567 - recall: 0.8277 - f1score: 0.8408 - val_loss: 0.2529 - val_mcor: 0.8334 - val_acc: 0.9165 - val_precision: 0.9274 - val_recall: 0.8987 - val_f1score: 0.9122
Epoch 7/11
60000/60000 [==============================] - 3s 48us/sample - loss: 0.3416 - mcor: 0.7108 - acc: 0.8551 - precision: 0.8640 - recall: 0.8371 - f1score: 0.8489 - val_loss: 0.2429 - val_mcor: 0.8415 - val_acc: 0.9199 - val_precision: 0.9257 - val_recall: 0.9101 - val_f1score: 0.9173
Epoch 8/11
60000/60000 [==============================] - 3s 49us/sample - loss: 0.3359 - mcor: 0.7142 - acc: 0.8569 - precision: 0.8673 - recall: 0.8360 - f1score: 0.8501 - val_loss: 0.2422 - val_mcor: 0.8401 - val_acc: 0.9197 - val_precision: 0.9152 - val_recall: 0.9215 - val_f1score: 0.9177
Epoch 9/11
60000/60000 [==============================] - 3s 47us/sample - loss: 0.3297 - mcor: 0.7222 - acc: 0.8609 - precision: 0.8717 - recall: 0.8403 - f1score: 0.8545 - val_loss: 0.2461 - val_mcor: 0.8440 - val_acc: 0.9232 - val_precision: 0.9146 - val_recall: 0.9275 - val_f1score: 0.9205
Epoch 10/11
60000/60000 [==============================] - 3s 47us/sample - loss: 0.3263 - mcor: 0.7270 - acc: 0.8634 - precision: 0.8735 - recall: 0.8444 - f1score: 0.8576 - val_loss: 0.2354 - val_mcor: 0.8534 - val_acc: 0.9274 - val_precision: 0.9242 - val_recall: 0.9249 - val_f1score: 0.9239
Epoch 11/11
60000/60000 [==============================] - 3s 48us/sample - loss: 0.3215 - mcor: 0.7281 - acc: 0.8638 - precision: 0.8724 - recall: 0.8467 - f1score: 0.8582 - val_loss: 0.2372 - val_mcor: 0.8529 - val_acc: 0.9257 - val_precision: 0.9314 - val_recall: 0.9165 - val_f1score: 0.9234
Test score: 0.23720481104850769
Test accuracy: 0.8519195

I find it very interesting, but I don't know why. Can anyone explain why this happens? Thank you!

@jxw950605

Who solved the problem? I also ran into this problem; who can help me? @jingerx

@rola93

rola93 commented Jun 28, 2019

Hi @unnir nice implementation

However I have a question: why did you implement recall & precision twice? (I mean, one "common" implementation and the other, exactly the same, inside the f1 metric.) Does it have any advantage?

Thanks!

#5400 (comment)

@isaacgerg
Author

The relevant metrics are no longer supported in Keras 2.x. Closing for good housekeeping.

@Mariyamimtiaz

For a 10-class problem, I would create the confusion matrix, get tp,fp, etc. and then compute whichever metrics you want.

@nsarafianos If you have created the confusion matrix for 10 classes, can you please share your code?

@ShrikanthSingh

(quoting @ametersky's comment and code above)

Does this work for a multiclass classification problem?

@NKM999

NKM999 commented Jun 20, 2020

@unnir, @isaacgerg , Any solution found?

@CharleoY

CharleoY commented Aug 19, 2020

It is simply because this custom mcor metric is not compatible with binary categorical (one-hot) labels and softmax. If you directly use the original mcor and f1 metrics with softmax, the label distribution of the dataset (or of the batches) greatly impacts the final result.

You can try the modified code as follows:

import tensorflow as tf
from keras import backend as K

def mcor_softmax(y_true, y_pred):
    # matthews_correlation
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_pos = K.cast(K.argmax(y_pred_pos, axis=1), 'float32')
    y_pred_neg = 1.0 - y_pred_pos
    y_pos = K.round(K.clip(y_true, 0, 1))
    y_pos = K.cast(K.argmax(y_pos, axis=1), 'float32')
    y_neg = 1.0 - y_pos
    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)
    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)
    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / (denominator + K.epsilon())

def f1(y_true, y_pred,softmax=True):
    if softmax:
        y_pred = K.cast(K.argmax(y_pred, axis=1), 'float32')
        y_true = K.cast(K.argmax(y_true, axis=1), 'float32')
    tp = K.sum(K.cast(y_true*y_pred, 'float'), axis=0)
    # tn = K.sum(K.cast((1-y_true)*(1-y_pred), 'float'), axis=0)
    fp = K.sum(K.cast((1-y_true)*y_pred, 'float'), axis=0)
    fn = K.sum(K.cast(y_true*(1-y_pred), 'float'), axis=0)
    p = tp / (tp + fp + K.epsilon())
    r = tp / (tp + fn + K.epsilon())
    f1 = 2*p*r / (p+r+K.epsilon())
    f1 = tf.where(tf.is_nan(f1), tf.zeros_like(f1), f1)
    return K.mean(f1)

def macrof1(y_true, y_pred):
    return (f1(y_true,y_pred) + f1(1-y_true,1-y_pred))/2.
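
A possible way to use these (assumed usage, not part of the original comment):

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy', mcor_softmax, f1, macrof1])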
