Repository with the tables and code for the LREC 2022 paper "Tackling Irony Detection using Ensemble Classifiers"
Table 1: Results of fine-tuning ten BERTweet models on Task A with the original training data
Model | F1 score (Task A) |
---|---|
0 | 0.7851 |
1 | 0.7448 |
2 | 0.7644 |
3 | 0.7581 |
4 | 0.7865 |
5 | 0.777 |
6 | 0.7455 |
7 | 0.5846 |
8 | 0.7585 |
9 | 0.7666 |
Mean | 0.7471 |
Ensemble | 0.7816 |
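The Ensemble row combines the predictions of the ten fine-tuned models. The paper specifies the exact combination scheme; a common baseline, shown here as a minimal pure-Python sketch with hypothetical predictions, is a hard majority vote over the models' labels:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard majority vote.

    predictions_per_model: list of lists, one inner list of 0/1
    labels per fine-tuned model, all over the same examples.
    """
    n_examples = len(predictions_per_model[0])
    combined = []
    for i in range(n_examples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three toy "models" voting on four examples (hypothetical data):
preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(preds))  # [1, 0, 1, 1]
```

With an even number of voters such as ten, `Counter.most_common` breaks ties by first occurrence; a real pipeline would choose an explicit tie-breaking rule.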
Table 2: Results of fine-tuning ten BERTweet models on Task B with the original training data
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7731 | 0.5706 | 0.8067 | 0.7475 | 0.5486 | 0.1798 |
1 | 0.794 | 0.6131 | 0.8353 | 0.7717 | 0.5844 | 0.2609 |
2 | 0.7957 | 0.5921 | 0.8358 | 0.7692 | 0.6337 | 0.1299 |
3 | 0.7815 | 0.5691 | 0.8266 | 0.709 | 0.5588 | 0.1818 |
4 | 0.7577 | 0.4986 | 0.821 | 0.7413 | 0.3178 | 0.1143 |
5 | 0.7894 | 0.6369 | 0.8388 | 0.7768 | 0.6627 | 0.2692 |
6 | 0.7707 | 0.5492 | 0.8386 | 0.7954 | 0.5324 | 0.0303 |
7 | 0.7916 | 0.6081 | 0.8328 | 0.7531 | 0.6216 | 0.225 |
8 | 0.771 | 0.5649 | 0.8075 | 0.7184 | 0.6071 | 0.1266 |
9 | 0.7798 | 0.5703 | 0.8394 | 0.7692 | 0.5152 | 0.1573 |
Mean | 0.7804 | 0.5773 | 0.8282 | 0.7552 | 0.5582 | 0.1675 |
Ensemble | 0.7977 | 0.5902 | 0.8475 | 0.7817 | 0.5906 | 0.1408 |
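In the Task B tables, the per-model Task B score is the unweighted mean of the four per-class F1 values (e.g. for model 0: (0.8067 + 0.7475 + 0.5486 + 0.1798) / 4 ≈ 0.5706). Per-class and macro F1 can be computed from gold and predicted labels with a short pure-Python sketch:

```python
def per_class_f1(gold, pred, label):
    """F1 for one class, treating it as the positive label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred, labels=(0, 1, 2, 3)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(gold, pred, l) for l in labels) / len(labels)

# Toy labels (hypothetical): classes 0-3 as in Task B.
gold = [0, 0, 1, 1, 2, 3]
pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(gold, pred), 4))  # ≈ 0.575
```

In practice `sklearn.metrics.f1_score(gold, pred, average="macro")` gives the same number; the sketch just makes the definition explicit.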
Table 3: Results of fine-tuning BERTweet models on Task A with the original training data plus back-translated data (one model per pivot language)
Model | F1 score (Task A) |
---|---|
es | 0.7708 |
fi | 0.7636 |
ru | 0.7663 |
pl | 0.7846 |
de | 0.7657 |
cs | 0.7945 |
nl | 0.7732 |
fr | 0.7657 |
Mean | 0.773 |
Ensemble | 0.7868 |
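The language codes above are the pivot languages used for back-translation: each English tweet is machine-translated into the pivot language and back into English to produce a paraphrased copy that keeps the original label. A structural sketch, assuming a hypothetical `translate(text, src, tgt)` helper (in practice e.g. a Helsinki-NLP/opus-mt model via the transformers library):

```python
def back_translate(texts, pivot, translate):
    """Round-trip each text through the pivot language.

    `translate` is a hypothetical callable (text, src, tgt) -> text;
    plug in any MT system here.
    """
    return [translate(translate(t, "en", pivot), pivot, "en") for t in texts]

def augment(texts, labels, pivot, translate):
    """Original data plus one back-translated copy per example."""
    return texts + back_translate(texts, pivot, translate), labels + labels

# Identity stub standing in for a real MT system (illustration only):
fake_translate = lambda t, src, tgt: t
texts, labels = augment(["so fun...", "great day"], [1, 0], "es", fake_translate)
```

With a real MT system the round-tripped copies differ from the originals, giving the model paraphrase variation at no labeling cost.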
Table 4: Results of fine-tuning BERTweet models on Task B with the original training data plus back-translated data (one model per pivot language)
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
es | 0.7947 | 0.6501 | 0.8269 | 0.7857 | 0.6517 | 0.3361 |
fi | 0.7253 | 0.5219 | 0.7687 | 0.6522 | 0.48 | 0.1867 |
ru | 0.7668 | 0.5719 | 0.7959 | 0.7214 | 0.5298 | 0.2406 |
pl | 0.7499 | 0.5672 | 0.7687 | 0.6593 | 0.6087 | 0.2321 |
de | 0.7477 | 0.5561 | 0.7908 | 0.6957 | 0.5156 | 0.2222 |
cs | 0.7832 | 0.5741 | 0.8271 | 0.7526 | 0.4923 | 0.2243 |
nl | 0.7918 | 0.628 | 0.829 | 0.7611 | 0.6506 | 0.2712 |
fr | 0.7744 | 0.5845 | 0.8118 | 0.7236 | 0.5479 | 0.2545 |
Mean | 0.7667 | 0.5817 | 0.8024 | 0.7189 | 0.5596 | 0.246 |
Ensemble | 0.794 | 0.6106 | 0.8353 | 0.7641 | 0.5986 | 0.2444 |
Table 5: Results of fine-tuning ten BERTweet models on Task B with the original training data plus additional not-ironic cases generated by antonym replacement and negation
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7338 | 0.5307 | 0.802 | 0.7297 | 0.5306 | 0.0606 |
1 | 0.782 | 0.558 | 0.8259 | 0.7635 | 0.5854 | 0.0571 |
2 | 0.7543 | 0.5445 | 0.812 | 0.7421 | 0.5405 | 0.0833 |
3 | 0.7549 | 0.5476 | 0.7864 | 0.6918 | 0.5325 | 0.1798 |
4 | 0.7384 | 0.592 | 0.7648 | 0.6683 | 0.623 | 0.3119 |
5 | 0.7496 | 0.5774 | 0.7979 | 0.719 | 0.5875 | 0.2051 |
6 | 0.7265 | 0.5746 | 0.739 | 0.6712 | 0.6111 | 0.2771 |
7 | 0.7512 | 0.5594 | 0.8017 | 0.7117 | 0.5306 | 0.1935 |
8 | 0.7442 | 0.5611 | 0.7667 | 0.6845 | 0.625 | 0.1684 |
9 | 0.7493 | 0.5211 | 0.8046 | 0.7293 | 0.3636 | 0.1871 |
Mean | 0.7484 | 0.5567 | 0.7901 | 0.7111 | 0.553 | 0.1724 |
Ensemble | 0.7662 | 0.5775 | 0.816 | 0.7454 | 0.6203 | 0.1282 |
Table 6: Results of fine-tuning ten BERTweet models on Task B with the original training data plus additional not-ironic cases generated by antonym replacement
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7647 | 0.5934 | 0.8052 | 0.7382 | 0.5828 | 0.2474 |
1 | 0.772 | 0.5775 | 0.8194 | 0.7722 | 0.5475 | 0.1707 |
2 | 0.7646 | 0.5687 | 0.8123 | 0.7296 | 0.5352 | 0.1978 |
3 | 0.7463 | 0.5635 | 0.79 | 0.7128 | 0.5481 | 0.2029 |
4 | 0.757 | 0.5876 | 0.7901 | 0.7089 | 0.5806 | 0.2708 |
5 | 0.7275 | 0.5317 | 0.7577 | 0.651 | 0.5882 | 0.1299 |
6 | 0.7609 | 0.5711 | 0.8017 | 0.7283 | 0.6244 | 0.1299 |
7 | 0.7325 | 0.555 | 0.7629 | 0.6874 | 0.5581 | 0.2115 |
8 | 0.7446 | 0.5073 | 0.7896 | 0.6819 | 0.5286 | 0.029 |
9 | 0.7617 | 0.5881 | 0.7909 | 0.7167 | 0.6228 | 0.2222 |
Mean | 0.7532 | 0.5644 | 0.792 | 0.7127 | 0.5716 | 0.1812 |
Ensemble | 0.78 | 0.5996 | 0.8226 | 0.7481 | 0.625 | 0.2025 |
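Tables 5 and 6 augment the training data with synthetic not-ironic examples; the intuition is that replacing an evaluative word with its antonym removes the polarity clash that makes a tweet ironic, yielding a literal (not-ironic) counterpart. A toy sketch with a small hand-made antonym map (hypothetical; the paper's exact procedure may differ, and a real pipeline might look up antonyms in WordNet):

```python
# Tiny illustrative antonym map (hypothetical, for demonstration only).
ANTONYMS = {"love": "hate", "great": "terrible", "fun": "boring"}

def antonym_variant(tokens):
    """Replace the first replaceable token with its antonym, if any."""
    for i, tok in enumerate(tokens):
        if tok.lower() in ANTONYMS:
            return tokens[:i] + [ANTONYMS[tok.lower()]] + tokens[i + 1:]
    return None  # no replaceable word found

def add_not_ironic_cases(tweets, label_not_ironic=0):
    """Build (tokens, label) pairs for synthetic not-ironic cases."""
    out = []
    for tokens in tweets:
        variant = antonym_variant(tokens)
        if variant is not None:
            out.append((variant, label_not_ironic))
    return out
```

For example, "i love mondays" (ironic by clash) becomes "i hate mondays", which can plausibly be labeled not ironic.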
Table 7: Results of fine-tuning ten BERTweet models on Task B with 1,000 training elements per class
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.6985 | 0.4987 | 0.7108 | 0.635 | 0.5784 | 0.0706 |
1 | 0.7274 | 0.5931 | 0.7518 | 0.7085 | 0.6421 | 0.2698 |
2 | 0.7078 | 0.5689 | 0.7134 | 0.6771 | 0.6333 | 0.2517 |
3 | 0.7211 | 0.5488 | 0.7367 | 0.697 | 0.55 | 0.2115 |
4 | 0.7533 | 0.6011 | 0.786 | 0.7473 | 0.6061 | 0.2649 |
5 | 0.7332 | 0.5741 | 0.753 | 0.6878 | 0.6188 | 0.237 |
6 | 0.7396 | 0.5712 | 0.7719 | 0.7111 | 0.5667 | 0.2353 |
7 | 0.7323 | 0.5911 | 0.7576 | 0.6868 | 0.6395 | 0.2804 |
8 | 0.7459 | 0.5762 | 0.766 | 0.7194 | 0.5871 | 0.2326 |
9 | 0.719 | 0.5763 | 0.7291 | 0.7224 | 0.6087 | 0.2452 |
Mean | 0.7278 | 0.57 | 0.7476 | 0.6992 | 0.6031 | 0.2299 |
Ensemble | 0.7608 | 0.6003 | 0.7886 | 0.7379 | 0.6462 | 0.2286 |
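Fixing a per-class quota, as in the 1,000-per-class setting above, generally means undersampling the large classes and resampling the small ones (e.g. with replacement) to reach the quota. A minimal sketch of one such balancing scheme (an assumption about the procedure, not the paper's exact recipe):

```python
import random

def balance(examples, labels, per_class, seed=0):
    """Return (example, label) pairs with exactly `per_class` items per
    label: large classes are undersampled, small classes are resampled
    with replacement. One simple way to hit a fixed quota."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    out = []
    for y, xs in sorted(by_label.items()):
        if len(xs) >= per_class:
            picked = rng.sample(xs, per_class)          # undersample
        else:
            picked = [rng.choice(xs) for _ in range(per_class)]  # resample
        out.extend((x, y) for x in picked)
    return out
```

Fixing the random seed keeps the balanced dataset reproducible across the ten fine-tuning runs.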
Table 8: Results of fine-tuning ten BERTweet models on Task B with 3,000 elements each for the not-ironic and ironic-by-polarity-clash classes and 1,000 elements each for the situational-irony and other-irony classes
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7669 | 0.5546 | 0.8044 | 0.7287 | 0.5914 | 0.0941 |
1 | 0.7633 | 0.5703 | 0.7913 | 0.7611 | 0.5887 | 0.14 |
2 | 0.7162 | 0.5598 | 0.7293 | 0.6784 | 0.5921 | 0.2394 |
3 | 0.763 | 0.5568 | 0.7923 | 0.6757 | 0.5455 | 0.2136 |
4 | 0.7066 | 0.5513 | 0.7089 | 0.6469 | 0.6162 | 0.2333 |
5 | 0.7675 | 0.5964 | 0.7987 | 0.7419 | 0.5442 | 0.3008 |
6 | 0.7653 | 0.561 | 0.8078 | 0.7067 | 0.6154 | 0.1143 |
7 | 0.7564 | 0.5618 | 0.803 | 0.7454 | 0.5466 | 0.1522 |
8 | 0.7401 | 0.5398 | 0.7703 | 0.6711 | 0.5697 | 0.1481 |
9 | 0.7638 | 0.5711 | 0.8135 | 0.7474 | 0.6211 | 0.1026 |
Mean | 0.7509 | 0.5623 | 0.7819 | 0.7103 | 0.5831 | 0.1738 |
Ensemble | 0.7742 | 0.5886 | 0.8122 | 0.7456 | 0.6279 | 0.1687 |
Table 9: Results of training the proposed combination ensemble model ten times
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7902 | 0.593 | 0.8432 | 0.7855 | 0.6133 | 0.1299 |
1 | 0.7797 | 0.5921 | 0.8198 | 0.7531 | 0.6135 | 0.1818 |
2 | 0.7807 | 0.5956 | 0.8247 | 0.7649 | 0.6026 | 0.1905 |
3 | 0.7733 | 0.5996 | 0.8175 | 0.7468 | 0.6093 | 0.2247 |
4 | 0.7919 | 0.5866 | 0.8354 | 0.7624 | 0.5987 | 0.15 |
5 | 0.7833 | 0.609 | 0.8237 | 0.7684 | 0.6265 | 0.2174 |
6 | 0.7851 | 0.6006 | 0.8303 | 0.7619 | 0.6375 | 0.1728 |
7 | 0.7779 | 0.5849 | 0.8229 | 0.7558 | 0.596 | 0.1647 |
8 | 0.7771 | 0.6175 | 0.8209 | 0.7579 | 0.6115 | 0.2796 |
9 | 0.8001 | 0.6182 | 0.8414 | 0.7807 | 0.6626 | 0.1882 |
Mean | 0.7839 | 0.5997 | 0.828 | 0.7637 | 0.6171 | 0.19 |
Table 10: Results of ten-fold cross-validation
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7942 | 0.5951 | 0.851 | 0.7531 | 0.6667 | 0.1096 |
1 | 0.7787 | 0.6188 | 0.8202 | 0.7386 | 0.6369 | 0.2796 |
2 | 0.8325 | 0.6621 | 0.8733 | 0.8083 | 0.646 | 0.321 |
3 | 0.8047 | 0.6278 | 0.8551 | 0.7356 | 0.6241 | 0.2963 |
4 | 0.7711 | 0.5854 | 0.8397 | 0.6985 | 0.6061 | 0.1972 |
5 | 0.8139 | 0.6227 | 0.8653 | 0.7791 | 0.6434 | 0.2029 |
6 | 0.8013 | 0.5889 | 0.8484 | 0.7443 | 0.5921 | 0.1707 |
7 | 0.7948 | 0.6315 | 0.8528 | 0.753 | 0.6383 | 0.2821 |
8 | 0.8142 | 0.6375 | 0.8544 | 0.7507 | 0.6748 | 0.2703 |
9 | 0.8321 | 0.6514 | 0.871 | 0.7619 | 0.6871 | 0.2857 |
Mean | 0.8038 | 0.6221 | 0.8531 | 0.7523 | 0.6415 | 0.2415 |
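Ten-fold cross-validation splits the data into ten disjoint folds, trains on nine and evaluates on the held-out one, then averages the ten scores (the Mean row). A minimal fold-splitting sketch with contiguous folds (the paper may shuffle or stratify before splitting):

```python
def kfold_indices(n, k=10):
    """Split range(n) into k contiguous, near-equal folds and yield
    (train_indices, test_indices) pairs, one per fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Ten folds over a toy dataset of 20 items: each fold holds out 2.
folds = list(kfold_indices(20, k=10))
```

`sklearn.model_selection.KFold` provides the same splitting (with optional shuffling); the sketch only makes the index bookkeeping explicit.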