Different Classification Reports for IOBES and BILOU #71
Please show me the evaluation snippet and the data.
I generated a small example from my dataset:
The results I get:
When I change the scheme to BILOU using the same example and labels above:
I get the same P, R, & F1. However, the report is different. I'm using micro average with both schemes:
This is the evaluate function that uses seqeval:
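A sketch of what such a small example could look like, with the same entities tagged once in IOBES and once in BILOU (these sequences are illustrative placeholders, not the actual data or function from this comment):

```python
# Illustrative IOBES-tagged sequences (hypothetical, not the issue author's data).
# B-/I-/E- mark the begin/inside/end tokens of a multi-token entity; S- marks a
# single-token entity.
y_true_iobes = [['B-LOC', 'E-LOC', 'O', 'S-LOC'], ['O', 'B-LOC', 'I-LOC', 'E-LOC']]
y_pred_iobes = [['B-LOC', 'E-LOC', 'O', 'O'],     ['O', 'B-LOC', 'I-LOC', 'E-LOC']]

# The same sentences re-tagged in BILOU: E- becomes L- (last) and S- becomes U- (unit).
y_true_bilou = [['B-LOC', 'L-LOC', 'O', 'U-LOC'], ['O', 'B-LOC', 'I-LOC', 'L-LOC']]
y_pred_bilou = [['B-LOC', 'L-LOC', 'O', 'O'],     ['O', 'B-LOC', 'I-LOC', 'L-LOC']]
```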
You just forgot to specify:

```python
from seqeval.metrics import classification_report, f1_score, precision_score, recall_score

def evaluate(y_true, y_pred, scheme, average):
    print(precision_score(y_true, y_pred, average=average, mode='strict', scheme=scheme), end='\t')
    print(recall_score(y_true, y_pred, average=average, mode='strict', scheme=scheme), end='\t')
    print(f1_score(y_true, y_pred, average=average, mode='strict', scheme=scheme))
    print(classification_report(y_true, y_pred, digits=3, mode='strict', scheme=scheme))
```
```
# IOBES
0.6666666666666666  0.8  0.7272727272727272
              precision    recall  f1-score   support

         LOC      0.667     0.800     0.727        10

   micro avg      0.667     0.800     0.727        10
   macro avg      0.667     0.800     0.727        10
weighted avg      0.667     0.800     0.727        10

# BILOU
0.6666666666666666  0.8  0.7272727272727272
              precision    recall  f1-score   support

         LOC      0.667     0.800     0.727        10

   micro avg      0.667     0.800     0.727        10
   macro avg      0.667     0.800     0.727        10
weighted avg      0.667     0.800     0.727        10
```
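Driving that corrected function for both schemes might look as follows; `IOBES` and `BILOU` here are the scheme classes exported by `seqeval.scheme`, and the `y_true_*`/`y_pred_*` names refer to the illustrative sequences sketched earlier in the thread, not to the original data:

```python
from seqeval.scheme import BILOU, IOBES

# Reuses evaluate() from the snippet above and the illustrative sequences sketched
# earlier. With mode='strict' and the matching scheme passed to every call
# (including classification_report), both taggings describe the same entities and
# should therefore produce identical scores and identical reports.
evaluate(y_true_iobes, y_pred_iobes, scheme=IOBES, average='micro')
evaluate(y_true_bilou, y_pred_bilou, scheme=BILOU, average='micro')
```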
I compared the results of the same model on the same test data with both the IOBES and BILOU schemes. I get exactly the same precision, recall, and F1 scores, as I expect:
However, I get different classification reports as shown below! Any explanation for this?
BILOU:
IOBES:
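As background on why matching strict-mode scores are expected in the first place: IOBES and BILOU encode exactly the same chunk boundaries under different prefix names (E corresponds to L, S to U). A small, purely illustrative conversion helper (not part of the original issue):

```python
# Hypothetical helper: rewrite IOBES tags in the equivalent BILOU convention.
# B- and I- are shared; E- (end) maps to L- (last) and S- (single) to U- (unit).
IOBES_TO_BILOU = {'B': 'B', 'I': 'I', 'E': 'L', 'S': 'U', 'O': 'O'}

def iobes_to_bilou(tags):
    converted = []
    for tag in tags:
        prefix, _, entity = tag.partition('-')
        bilou_prefix = IOBES_TO_BILOU[prefix]
        converted.append(f'{bilou_prefix}-{entity}' if entity else bilou_prefix)
    return converted

print(iobes_to_bilou(['B-LOC', 'E-LOC', 'O', 'S-LOC']))
# -> ['B-LOC', 'L-LOC', 'O', 'U-LOC']
```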
My Environment