Accuracy, fmeasure, precision, and recall all the same for binary classification problem (cut and paste example provided) #5400
Comments
FYI, I had this issue as well, but I believe it was resolved by replacing metrics.py with the latest version in https://raw.githubusercontent.com/fchollet/keras/master/keras/metrics.py
|
Did you run the code I provided? metrics.py is a month old. I just did the PyPI pull to get keras 1.2.2. I can't see how that can be the issue. |
I had the issue of getting good accuracy but bad precision and recall on a balanced dataset. I ended up calculating fp, tp, fn and tn manually and then precision/recall/f1 through a custom metrics method. Based on that, I got good f1 etc. I did see the code, btw. I'm using keras 1.2.0 for now. |
@recluze how can you do this and at the same time compute them over the whole set (and not per batch) and still pass the custom metric to the model? As also mentioned in #4592, I think the correct way to compute precision, recall, etc. is with the complete prediction and ground-truth vectors/tensors, not by averaging per batch. |
@nsarafianos Only do this per-batch as the values are reported on a per-batch basis by keras callbacks. Once you're trained, you can just use |
@recluze Were you able to replicate my bug? |
@isaacgerg I had exactly the same problem (accuracy equal to precision on a balanced task) with another dataset, which made me look into this. For some reason the per-batch computation of precision is not working properly. Using sklearn.metrics on the predicted classes worked fine (here's a binary problem):
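A minimal sketch of that sklearn approach, assuming a trained one-unit sigmoid model and held-out numpy arrays x_test / y_test (names are illustrative):

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# predict probabilities over the WHOLE test set, then threshold at 0.5
y_prob = model.predict(x_test).ravel()
y_hat = (y_prob > 0.5).astype(int)

# metrics computed on the complete vectors, not averaged per batch
print('precision:', precision_score(y_test, y_hat))
print('recall:   ', recall_score(y_test, y_hat))
print('f1:       ', f1_score(y_test, y_hat))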
|
@nsarafianos Would you mind checking if you get the same results when running the code I submitted? Thanks for the advice on sklearn.metrics. |
@isaacgerg Just ran the code you posted and yes all 3 measurements are the same for all 12 epochs. For a 10-class problem, I would create the confusion matrix, get tp,fp, etc. and then compute whichever metrics you want. |
@nsarafianos and @isaacgerg .... I've seen this issue repeatedly (though not always and not for all datasets; for some very large datasets it seems to be common). sklearn's metrics aren't an option for me since raw numpy is quite slow for my dataset. I used keras backend functions to create tp/fp etc. and then compute precision etc. in |
I tried this on 2.0.3 but it failed. I added the snippet (quoted in the reply below). |
Is it failing here?
if c3 == 0:
    return 0
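With Keras backend tensors, `c3 == 0` compares a symbolic tensor to a Python int, so that branch is decided once at graph-construction time and never fires during training. A graph-safe sketch would clamp the denominators with an epsilon instead (an illustrative variant, not the original code):

from keras import backend as K

def f1_score_safe(y_true, y_pred):
    c1 = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))  # true positives
    c2 = K.sum(K.round(K.clip(y_pred, 0, 1)))           # predicted positives
    c3 = K.sum(K.round(K.clip(y_true, 0, 1)))           # actual positives
    precision = c1 / (c2 + K.epsilon())
    recall = c1 / (c3 + K.epsilon())
    # the epsilon keeps the result finite when c3 == 0, replacing the
    # Python `if`, which cannot act on a symbolic tensor
    return 2 * (precision * recall) / (precision + recall + K.epsilon())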
On Thu, Apr 27, 2017, guangbaowan wrote:
I tried this on 2.0.3 but failed. Here is the snippet I added:
def f1_score(y_true, y_pred):
    # Count positive samples.
    c1 = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    c2 = K.sum(K.round(K.clip(y_pred, 0, 1)))
    c3 = K.sum(K.round(K.clip(y_true, 0, 1)))

    # If there are no true samples, fix the F1 score at 0.
    if c3 == 0:
        return 0

    # How many selected items are relevant?
    precision = c1 / c2

    # How many relevant items are selected?
    recall = c1 / c3

    # Calculate f1_score
    f1_score = 2 * (precision * recall) / (precision + recall)
    return f1_score
|
Same problem. I customized metrics -- precision, recall and F1-measure. model.fit_generator and model.evaluate_generator also give the same precision, recall and F1-measure. keras==2.0.0 on Mac OS Sierra 10.12.4. Epoch 8/10 |
For those who come here later: since Keras 2.0, the metrics fmeasure, precision, and recall have been removed. If you want to use them, you can check the history of the repo or add this code:
from keras import backend as K
def mcor(y_true, y_pred):
    # matthews_correlation
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos
    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos
    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)
    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)
    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / (denominator + K.epsilon())

def precision(y_true, y_pred):
    """Precision metric.

    Only computes a batch-wise average of precision.

    Computes the precision, a metric for multi-label classification of
    how many selected items are relevant.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    """Recall metric.

    Only computes a batch-wise average of recall.

    Computes the recall, a metric for multi-label classification of
    how many relevant items are selected.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))

# you can use it like this
model.compile(loss='binary_crossentropy',
              optimizer="adam",
              metrics=[mcor, recall, f1]) |
@unnir Thanks for providing these implementations. But even using these, I get equal precision and recall at each epoch during training, for both the training and validation datasets. |
@baharian I guess it has nothing to do with metrics. Do you have the result for the loss too? |
Yes! The value of loss changes by epoch, but precision and recall stay constant. Here is an excerpt:
|
@baharian the metric functions work; please look at your data (probably you have not normalised it), or try tuning the hyperparameters of your model (optimizer, batch size, number of layers). @rimjhim365 what kind of problem do you have? |
|
@unnir I did not mean that they do not work; what I was trying to say is that the numbers I get don't make much sense to me. I have indeed normalized my data prior to feeding it into the neural network, and I am doing cross-validation to tune hyper-parameters. |
I am also seeing the same scores coming through for custom metrics. The code below gave the following output for an epoch:
|
@nsarafianos, I think the idea of computing the confusion matrix is good. But I don't know how to compute it.
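A minimal sketch of that computation with sklearn, assuming a trained softmax model and one-hot test labels (names are illustrative):

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_prob = model.predict(x_test)         # shape (n_samples, n_classes)
y_hat = np.argmax(y_prob, axis=1)      # predicted class per sample
y_true = np.argmax(y_test, axis=1)     # decode one-hot ground truth

cm = confusion_matrix(y_true, y_hat)   # rows = true class, columns = predicted class
print(cm)
print(precision_score(y_true, y_hat, average='macro'))
print(recall_score(y_true, y_hat, average='macro'))
print(f1_score(y_true, y_hat, average='macro'))
|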
I have the same problem; even when I, like @ametersky, input the custom function, it always returns the same value. @unnir do you have any example code that shows it working differently? |
@moming2k I have, but I guess you have issues in your model, or you have to update keras. One more time: |
I have Keras 2.0.8 and have the same behaviour :S |
When I use this, I get F1 as nan in all epochs in each batch. |
With unnir's metrics I also get f1 as NaN in my first epoch sometimes, while precision and recall seem OK. In subsequent epochs, all metrics are OK. |
@tobycheese I updated the code, please check it |
@unnir perfect, thanks! |
I have the same issue of getting the same results from the custom metrics for binary classification on unbalanced data, and I am very positive that there is nothing wrong with the model. Looks like the best way is to use the keras metrics rather than implementing them on the backend. Let me know if any of you understands what's wrong here. |
Same issue after updating keras to 2.2.0; using unnir's code, I still get: 1/19 [>.............................] - ETA: 2:28 - loss: 1.0960e-07 - acc: 1.0000 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000 |
@hbb21st and what is your problem? The metrics look good, no? |
Hi, no. I think it was wrong. How can acc equal precision equal f1 equal sensitivity equal ... after defining them all as custom metrics?
|
Yeah, sounds strange. I just wanted you to be aware that the f1 score is the harmonic mean of recall and precision; since recall and precision are the same, it does make sense that you get the same number. But still, probably something is wrong.
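For reference, the identity at work, with $P$ for precision and $R$ for recall:

$$F_1 = \frac{2PR}{P+R}, \qquad P = R \implies F_1 = \frac{2P^2}{2P} = P$$

so equal precision and recall always force an F1 equal to both.
|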
I found a package released recently (attached below). By the way, these metrics should be calculated on every epoch, not on every batch.
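One way to do that is a callback that predicts on the full validation set at the end of each epoch; a sketch, assuming a one-unit sigmoid model and held-out arrays x_val / y_val (illustrative names):

import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import precision_score, recall_score, f1_score

class EpochMetrics(Callback):
    """Hypothetical callback: whole-set metrics once per epoch."""
    def __init__(self, x_val, y_val):
        super(EpochMetrics, self).__init__()
        self.x_val = x_val
        self.y_val = y_val

    def on_epoch_end(self, epoch, logs=None):
        # threshold predictions over the complete validation set
        y_hat = (self.model.predict(self.x_val).ravel() > 0.5).astype(int)
        print(' val_precision: %.4f  val_recall: %.4f  val_f1: %.4f' % (
            precision_score(self.y_val, y_hat),
            recall_score(self.y_val, y_hat),
            f1_score(self.y_val, y_hat)))

# model.fit(x_train, y_train, callbacks=[EpochMetrics(x_val, y_val)])
|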
Thanks Avcu, I tried keras-metrics; it still showed acc the same as precision. |
I got it. I've tried a binary classification on Google servers. It is all about how many units the last layer has. If you have only one, everything is okay, but if you have two of them it does not work. On the other hand, using two units with a softmax activation function for binary classification (probably that's what you do as well) is often suggested for better convergence, as far as I know. You can check my code below; I will create a post under this keras-vis library's issue. |
I've created a pull request to solve the problem (netrack/keras-metrics#4); I hope it'll be accepted soon.
from keras import backend as K
def check_units(y_true, y_pred):
    if y_pred.shape[1] != 1:
        y_pred = y_pred[:, 1:2]
        y_true = y_true[:, 1:2]
    return y_true, y_pred

def precision(y_true, y_pred):
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    y_true, y_pred = check_units(y_true, y_pred)
    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))

# you can use it as follows
model.compile(loss='binary_crossentropy',
              optimizer="adam",
              metrics=[precision, recall, f1]) |
Hi Avcu, can you check and confirm your code? I adopted it with
model.compile(loss='binary_crossentropy', optimizer='sgd',
              metrics=['accuracy', mcor, precision, recall, f1])
and it showed something incorrect for mcor:
1/7 [===>..........................] - ETA: 1:22 - loss: 3.9758 - acc: 0.2500 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 0.2500 - f1: 0.4000
2/7 [=======>......................] - ETA: 36s - loss: 1.9879 - acc: 0.6250 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 0.6250 - f1: 0.7000
3/7 [===========>..................] - ETA: 20s - loss: 1.3253 - acc: 0.7500 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 0.7500 - f1: 0.8000
4/7 [================>.............] - ETA: 12s - loss: 0.9940 - acc: 0.8125 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 0.8125 - f1: 0.8500
5/7 [====================>.........] - ETA: 7s - loss: 0.7952 - acc: 0.8500 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 0.8500 - f1: 0.8800
6/7 [========================>.....] - ETA: 3s - loss: 0.6626 - acc: 0.8750 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 0.8750 - f1: 0.9000
7/7 [==============================] - 20s 3s/step - loss: 0.5680 - acc: 0.8929 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 0.8929 - f1: 0.9143
Epoch 2/5
1/7 [===>..........................] - ETA: 5s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
2/7 [=======>......................] - ETA: 4s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
3/7 [===========>..................] - ETA: 3s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
4/7 [================>.............] - ETA: 2s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
5/7 [====================>.........] - ETA: 1s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
6/7 [========================>.....] - ETA: 0s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
7/7 [==============================] - 7s 993ms/step - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
Epoch 3/5
1/7 [===>..........................] - ETA: 5s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
2/7 [=======>......................] - ETA: 4s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
3/7 [===========>..................] - ETA: 3s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
4/7 [================>.............] - ETA: 2s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
5/7 [====================>.........] - ETA: 1s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
6/7 [========================>.....] - ETA: 0s - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
7/7 [==============================] - 7s 997ms/step - loss: 1.0960e-07 - acc: 1.0000 - mcor: 0.0000e+00 - precision: 1.0000 - recall: 1.0000 - f1: 1.0000
Epoch 4/5
On Sun, Jul 29, 2018, Avcu wrote:
I've created a pull request to solve the problem; I hope it'll be accepted soon. For those who want to use a custom method, I corrected unnir's code as follows:
def check_units(y_true, y_pred):
    if y_pred.shape[1] != 1:
        y_pred = y_pred[:, 1:2]
        y_true = y_true[:, 1:2]
    return y_true, y_pred

from keras import backend as K

def mcor(y_true, y_pred):
    # matthews_correlation
    y_true, y_pred = check_units(y_true, y_pred)
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos
    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos
    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)
    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)
    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / (denominator + K.epsilon())

def precision(y_true, y_pred):
    """Precision metric. Only computes a batch-wise average of precision. Computes the precision, a metric for multi-label classification of how many selected items are relevant."""
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    """Recall metric. Only computes a batch-wise average of recall. Computes the recall, a metric for multi-label classification of how many relevant items are selected."""
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric. Only computes a batch-wise average of recall. Computes the recall, a metric for multi-label classification of how many relevant items are selected."""
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric. Only computes a batch-wise average of precision. Computes the precision, a metric for multi-label classification of how many selected items are relevant."""
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    y_true, y_pred = check_units(y_true, y_pred)
    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))

# you can use it like this
model.compile(loss='binary_crossentropy',
              optimizer="adam",
              metrics=[mcor, recall, f1])
|
@hbb21st hi, I checked it, and everything seems fine to me. By the way, I don't know much about this metric or what it should be around, to be honest. Is it what you're looking for? If so, you can give some insight. For the sake of completeness of this issue, I will remove it from my code above. |
@hbb21st hello, I had the same problem. In my case it was caused by using softmax in a binary classification problem with an output dimension of 2 ([0,1] or [1,0]). When I changed the output dimension to 1 ([0] or [1]) with a sigmoid activation function, it worked just fine.
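A minimal sketch of that change, using the custom metrics defined earlier in the thread (layer sizes are hypothetical):

from keras.models import Sequential
from keras.layers import Dense

# a two-unit softmax head with one-hot labels and categorical_crossentropy
# triggers the equality problem with the batch-wise metrics above;
# a single sigmoid unit with 0/1 labels avoids it:
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))  # hypothetical hidden layer
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy', precision, recall, f1])
|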
@Avcu hi, the code works perfectly. However, when I tried to use the
while the same weights saved during training give these values: val_loss: 0.4253 - val_precision: 0.9748 - val_recall: 0.8926 - val_f1: 0.9319. Where did I go wrong in this case? |
I am using categorical_crossentropy and softmax and have 2 labels. I also use to_categorical. I used @Avcu 's edit of the code. However, I get equal precision and recall every time. Does this mean I have a problem? Epoch 1/5 |
Same problem when I use binary_crossentropy |
I think I solved this. |
EQUALITY PROBLEM: I ran into exactly the same problem (accuracy, precision, recall, and f1 score are all equal to each other, both on the training set and the validation set, for a balanced task) with another dataset, which made me look into this; we can call it the EQUALITY PROBLEM. I use:
I have combined all the replies and tried all the code above, and finally came up with two versions. CONCLUSION: the following are the results of the first version. When I try "categorical classification using softmax with one-hot output", I HAVE the EQUALITY PROBLEM. However, when I try "binary classification using sigmoid with 0-1 vector output", I DO NOT have the EQUALITY PROBLEM. Here is all my code.
For the "categorical classification using softmax with one-hot output", I get the following results, which show I have the EQUALITY PROBLEM.
For the "binary classification using sigmoid with 0-1 vector output", I get the following results, which show I DO NOT have the EQUALITY PROBLEM.
I find it very interesting, but I don't know why. Can anyone explain why this happens? Thank you! |
Who solved the problem? I also met this problem; who can help me? @jingerx |
Hi @unnir, nice implementation. However, I have a question: why did you implement recall & precision twice (one "common" implementation and the other, exactly the same, inside the f1 metric)? Does it have any advantage? Thanks! |
The relevant metrics are no longer supported in keras 2.x. Closing for good housekeeping.
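For later versions, the built-in stateful metrics in tf.keras accumulate true/false positives across batches, so they report epoch-level values rather than per-batch averages; a sketch assuming TensorFlow 2.x and a one-unit sigmoid model:

import tensorflow as tf

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[tf.keras.metrics.Precision(name='precision'),
                       tf.keras.metrics.Recall(name='recall')])
|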
@nsarafianos If you have created the confusion matrix for 10 classes, can you please share your code? |
Does it work for a multiclass classification problem? |
@unnir, @isaacgerg, any solution found? |
It is simply because this custom mcor metric is not compatible with binary categorical labels and softmax. If you directly implement the original mcor and f1 metrics with softmax, the label distribution of the dataset (or batches) greatly impacts the final result. You can try modified code along the following lines:
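A sketch of the kind of modification described, assuming one-hot labels and a two-unit softmax output (the slice keeps only the positive-class column; the helper name is illustrative):

from keras import backend as K

def f1_binary_softmax(y_true, y_pred):
    # keep only the positive-class column so the metric behaves like
    # the single-unit sigmoid case
    y_true = y_true[:, 1]
    y_pred = K.round(K.clip(y_pred[:, 1], 0, 1))
    tp = K.sum(y_true * y_pred)
    precision = tp / (K.sum(y_pred) + K.epsilon())
    recall = tp / (K.sum(y_true) + K.epsilon())
    return 2 * precision * recall / (precision + recall + K.epsilon())

For intuition, a numpy sketch (hypothetical numbers) of why the unmodified formulas collapse to accuracy with one-hot softmax outputs:

import numpy as np

y_true = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)  # one-hot labels
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])
y_pred = np.round(y_prob)        # rounding a 2-unit softmax yields a one-hot row

tp = np.sum(y_true * y_pred)     # counts every correctly classified sample: 3.0
print(tp / np.sum(y_pred))       # "precision" = 3/4, which is exactly the accuracy
print(tp / np.sum(y_true))       # "recall"    = 3/4, the same number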
|
keras 1.2.2, tf-gpu 0.12.1
Example code to show issue:
yields output: