Repository with the tables and code for the LREC 2022 paper "Tackling Irony Detection using Ensemble Classifiers"
Table 1: Results of fine-tuning ten BERTweet models on Task A with the original training data
Model | F1 score (Task A) |
---|---|
0 | 0.7851 |
1 | 0.7448 |
2 | 0.7644 |
3 | 0.7581 |
4 | 0.7865 |
5 | 0.777 |
6 | 0.7455 |
7 | 0.5846 |
8 | 0.7585 |
9 | 0.7666 |
Mean | 0.7471 |
Ensemble | 0.7816 |
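The Ensemble row combines the predictions of the ten fine-tuned models. The paper specifies the exact combination scheme; a common baseline, shown here as a minimal pure-Python sketch with hypothetical predictions, is a hard majority vote over the models' labels:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard majority vote.

    predictions_per_model: list of lists, one inner list of 0/1
    labels per fine-tuned model, all over the same examples.
    """
    n_examples = len(predictions_per_model[0])
    combined = []
    for i in range(n_examples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three toy "models" voting on four examples (hypothetical data):
preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(preds))  # [1, 0, 1, 1]
```

With an even number of voters such as ten, `Counter.most_common` breaks ties by first occurrence; a real pipeline would choose an explicit tie-breaking rule.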
Table 2: Results of fine-tuning ten BERTweet models on Task B with the original training data
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7731 | 0.5706 | 0.8067 | 0.7475 | 0.5486 | 0.1798 |
1 | 0.794 | 0.6131 | 0.8353 | 0.7717 | 0.5844 | 0.2609 |
2 | 0.7957 | 0.5921 | 0.8358 | 0.7692 | 0.6337 | 0.1299 |
3 | 0.7815 | 0.5691 | 0.8266 | 0.709 | 0.5588 | 0.1818 |
4 | 0.7577 | 0.4986 | 0.821 | 0.7413 | 0.3178 | 0.1143 |
5 | 0.7894 | 0.6369 | 0.8388 | 0.7768 | 0.6627 | 0.2692 |
6 | 0.7707 | 0.5492 | 0.8386 | 0.7954 | 0.5324 | 0.0303 |
7 | 0.7916 | 0.6081 | 0.8328 | 0.7531 | 0.6216 | 0.225 |
8 | 0.771 | 0.5649 | 0.8075 | 0.7184 | 0.6071 | 0.1266 |
9 | 0.7798 | 0.5703 | 0.8394 | 0.7692 | 0.5152 | 0.1573 |
Mean | 0.7804 | 0.5773 | 0.8282 | 0.7552 | 0.5582 | 0.1675 |
Ensemble | 0.7977 | 0.5902 | 0.8475 | 0.7817 | 0.5906 | 0.1408 |
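In the Task B tables, the per-model Task B score is the unweighted mean of the four per-class F1 values (e.g. for model 0: (0.8067 + 0.7475 + 0.5486 + 0.1798) / 4 ≈ 0.5706). Per-class and macro F1 can be computed from gold and predicted labels with a short pure-Python sketch:

```python
def per_class_f1(gold, pred, label):
    """F1 for one class, treating it as the positive label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred, labels=(0, 1, 2, 3)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(gold, pred, l) for l in labels) / len(labels)

# Toy labels (hypothetical): classes 0-3 as in Task B.
gold = [0, 0, 1, 1, 2, 3]
pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(gold, pred), 4))  # ≈ 0.575
```

In practice `sklearn.metrics.f1_score(gold, pred, average="macro")` gives the same number; the sketch just makes the definition explicit.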
Table 3: Results of fine-tuning BERTweet models on Task A with the original training data plus back-translated data (one model per pivot language)
Model | F1 score (Task A) |
---|---|
es | 0.7708 |
fi | 0.7636 |
ru | 0.7663 |
pl | 0.7846 |
de | 0.7657 |
cs | 0.7945 |
nl | 0.7732 |
fr | 0.7657 |
Mean | 0.773 |
Ensemble | 0.7868 |
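The language codes above are the pivot languages used for back-translation: each English tweet is machine-translated into the pivot language and back into English to produce a paraphrased copy that keeps the original label. A structural sketch, assuming a hypothetical `translate(text, src, tgt)` helper (in practice e.g. a Helsinki-NLP/opus-mt model via the transformers library):

```python
def back_translate(texts, pivot, translate):
    """Round-trip each text through the pivot language.

    `translate` is a hypothetical callable (text, src, tgt) -> text;
    plug in any MT system here.
    """
    return [translate(translate(t, "en", pivot), pivot, "en") for t in texts]

def augment(texts, labels, pivot, translate):
    """Original data plus one back-translated copy per example."""
    return texts + back_translate(texts, pivot, translate), labels + labels

# Identity stub standing in for a real MT system (illustration only):
fake_translate = lambda t, src, tgt: t
texts, labels = augment(["so fun...", "great day"], [1, 0], "es", fake_translate)
```

With a real MT system the round-tripped copies differ from the originals, giving the model paraphrase variation at no labeling cost.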
Table 4: Results of fine-tuning BERTweet models on Task B with the original training data plus back-translated data (one model per pivot language)
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
es | 0.7947 | 0.6501 | 0.8269 | 0.7857 | 0.6517 | 0.3361 |
fi | 0.7253 | 0.5219 | 0.7687 | 0.6522 | 0.48 | 0.1867 |
ru | 0.7668 | 0.5719 | 0.7959 | 0.7214 | 0.5298 | 0.2406 |
pl | 0.7499 | 0.5672 | 0.7687 | 0.6593 | 0.6087 | 0.2321 |
de | 0.7477 | 0.5561 | 0.7908 | 0.6957 | 0.5156 | 0.2222 |
cs | 0.7832 | 0.5741 | 0.8271 | 0.7526 | 0.4923 | 0.2243 |
nl | 0.7918 | 0.628 | 0.829 | 0.7611 | 0.6506 | 0.2712 |
fr | 0.7744 | 0.5845 | 0.8118 | 0.7236 | 0.5479 | 0.2545 |
Mean | 0.7667 | 0.5817 | 0.8024 | 0.7189 | 0.5596 | 0.246 |
Ensemble | 0.794 | 0.6106 | 0.8353 | 0.7641 | 0.5986 | 0.2444 |
Table 5: Results of fine-tuning ten BERTweet models on Task B with the original training data plus additional not-ironic cases generated by antonym replacement and negation
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7338 | 0.5307 | 0.802 | 0.7297 | 0.5306 | 0.0606 |
1 | 0.782 | 0.558 | 0.8259 | 0.7635 | 0.5854 | 0.0571 |
2 | 0.7543 | 0.5445 | 0.812 | 0.7421 | 0.5405 | 0.0833 |
3 | 0.7549 | 0.5476 | 0.7864 | 0.6918 | 0.5325 | 0.1798 |
4 | 0.7384 | 0.592 | 0.7648 | 0.6683 | 0.623 | 0.3119 |
5 | 0.7496 | 0.5774 | 0.7979 | 0.719 | 0.5875 | 0.2051 |
6 | 0.7265 | 0.5746 | 0.739 | 0.6712 | 0.6111 | 0.2771 |
7 | 0.7512 | 0.5594 | 0.8017 | 0.7117 | 0.5306 | 0.1935 |
8 | 0.7442 | 0.5611 | 0.7667 | 0.6845 | 0.625 | 0.1684 |
9 | 0.7493 | 0.5211 | 0.8046 | 0.7293 | 0.3636 | 0.1871 |
Mean | 0.7484 | 0.5567 | 0.7901 | 0.7111 | 0.553 | 0.1724 |
Ensemble | 0.7662 | 0.5775 | 0.816 | 0.7454 | 0.6203 | 0.1282 |
Table 6: Results of fine-tuning ten BERTweet models on Task B with the original training data plus additional not-ironic cases generated by antonym replacement
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7647 | 0.5934 | 0.8052 | 0.7382 | 0.5828 | 0.2474 |
1 | 0.772 | 0.5775 | 0.8194 | 0.7722 | 0.5475 | 0.1707 |
2 | 0.7646 | 0.5687 | 0.8123 | 0.7296 | 0.5352 | 0.1978 |
3 | 0.7463 | 0.5635 | 0.79 | 0.7128 | 0.5481 | 0.2029 |
4 | 0.757 | 0.5876 | 0.7901 | 0.7089 | 0.5806 | 0.2708 |
5 | 0.7275 | 0.5317 | 0.7577 | 0.651 | 0.5882 | 0.1299 |
6 | 0.7609 | 0.5711 | 0.8017 | 0.7283 | 0.6244 | 0.1299 |
7 | 0.7325 | 0.555 | 0.7629 | 0.6874 | 0.5581 | 0.2115 |
8 | 0.7446 | 0.5073 | 0.7896 | 0.6819 | 0.5286 | 0.029 |
9 | 0.7617 | 0.5881 | 0.7909 | 0.7167 | 0.6228 | 0.2222 |
Mean | 0.7532 | 0.5644 | 0.792 | 0.7127 | 0.5716 | 0.1812 |
Ensemble | 0.78 | 0.5996 | 0.8226 | 0.7481 | 0.625 | 0.2025 |
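Tables 5 and 6 augment the training data with synthetic not-ironic examples; the intuition is that replacing an evaluative word with its antonym removes the polarity clash that makes a tweet ironic, yielding a literal (not-ironic) counterpart. A toy sketch with a small hand-made antonym map (hypothetical; the paper's exact procedure may differ, and a real pipeline might look up antonyms in WordNet):

```python
# Tiny illustrative antonym map (hypothetical, for demonstration only).
ANTONYMS = {"love": "hate", "great": "terrible", "fun": "boring"}

def antonym_variant(tokens):
    """Replace the first replaceable token with its antonym, if any."""
    for i, tok in enumerate(tokens):
        if tok.lower() in ANTONYMS:
            return tokens[:i] + [ANTONYMS[tok.lower()]] + tokens[i + 1:]
    return None  # no replaceable word found

def add_not_ironic_cases(tweets, label_not_ironic=0):
    """Build (tokens, label) pairs for synthetic not-ironic cases."""
    out = []
    for tokens in tweets:
        variant = antonym_variant(tokens)
        if variant is not None:
            out.append((variant, label_not_ironic))
    return out
```

For example, "i love mondays" (ironic by clash) becomes "i hate mondays", which can plausibly be labeled not ironic.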
Table 7: Results of fine-tuning ten BERTweet models on Task B with 1,000 training elements per class
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.6985 | 0.4987 | 0.7108 | 0.635 | 0.5784 | 0.0706 |
1 | 0.7274 | 0.5931 | 0.7518 | 0.7085 | 0.6421 | 0.2698 |
2 | 0.7078 | 0.5689 | 0.7134 | 0.6771 | 0.6333 | 0.2517 |
3 | 0.7211 | 0.5488 | 0.7367 | 0.697 | 0.55 | 0.2115 |
4 | 0.7533 | 0.6011 | 0.786 | 0.7473 | 0.6061 | 0.2649 |
5 | 0.7332 | 0.5741 | 0.753 | 0.6878 | 0.6188 | 0.237 |
6 | 0.7396 | 0.5712 | 0.7719 | 0.7111 | 0.5667 | 0.2353 |
7 | 0.7323 | 0.5911 | 0.7576 | 0.6868 | 0.6395 | 0.2804 |
8 | 0.7459 | 0.5762 | 0.766 | 0.7194 | 0.5871 | 0.2326 |
9 | 0.719 | 0.5763 | 0.7291 | 0.7224 | 0.6087 | 0.2452 |
Mean | 0.7278 | 0.57 | 0.7476 | 0.6992 | 0.6031 | 0.2299 |
Ensemble | 0.7608 | 0.6003 | 0.7886 | 0.7379 | 0.6462 | 0.2286 |
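Fixing a per-class quota, as in the 1,000-per-class setting above, generally means undersampling the large classes and resampling the small ones (e.g. with replacement) to reach the quota. A minimal sketch of one such balancing scheme (an assumption about the procedure, not the paper's exact recipe):

```python
import random

def balance(examples, labels, per_class, seed=0):
    """Return (example, label) pairs with exactly `per_class` items per
    label: large classes are undersampled, small classes are resampled
    with replacement. One simple way to hit a fixed quota."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    out = []
    for y, xs in sorted(by_label.items()):
        if len(xs) >= per_class:
            picked = rng.sample(xs, per_class)          # undersample
        else:
            picked = [rng.choice(xs) for _ in range(per_class)]  # resample
        out.extend((x, y) for x in picked)
    return out
```

Fixing the random seed keeps the balanced dataset reproducible across the ten fine-tuning runs.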
Table 8: Results of fine-tuning ten BERTweet models on Task B with 3,000 elements each for the not-ironic and ironic-by-polarity-clash classes and 1,000 elements each for the situational-irony and other-irony classes
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7669 | 0.5546 | 0.8044 | 0.7287 | 0.5914 | 0.0941 |
1 | 0.7633 | 0.5703 | 0.7913 | 0.7611 | 0.5887 | 0.14 |
2 | 0.7162 | 0.5598 | 0.7293 | 0.6784 | 0.5921 | 0.2394 |
3 | 0.763 | 0.5568 | 0.7923 | 0.6757 | 0.5455 | 0.2136 |
4 | 0.7066 | 0.5513 | 0.7089 | 0.6469 | 0.6162 | 0.2333 |
5 | 0.7675 | 0.5964 | 0.7987 | 0.7419 | 0.5442 | 0.3008 |
6 | 0.7653 | 0.561 | 0.8078 | 0.7067 | 0.6154 | 0.1143 |
7 | 0.7564 | 0.5618 | 0.803 | 0.7454 | 0.5466 | 0.1522 |
8 | 0.7401 | 0.5398 | 0.7703 | 0.6711 | 0.5697 | 0.1481 |
9 | 0.7638 | 0.5711 | 0.8135 | 0.7474 | 0.6211 | 0.1026 |
Mean | 0.7509 | 0.5623 | 0.7819 | 0.7103 | 0.5831 | 0.1738 |
Ensemble | 0.7742 | 0.5886 | 0.8122 | 0.7456 | 0.6279 | 0.1687 |
Table 9: Results of training the proposed combination ensemble model ten times
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7902 | 0.593 | 0.8432 | 0.7855 | 0.6133 | 0.1299 |
1 | 0.7797 | 0.5921 | 0.8198 | 0.7531 | 0.6135 | 0.1818 |
2 | 0.7807 | 0.5956 | 0.8247 | 0.7649 | 0.6026 | 0.1905 |
3 | 0.7733 | 0.5996 | 0.8175 | 0.7468 | 0.6093 | 0.2247 |
4 | 0.7919 | 0.5866 | 0.8354 | 0.7624 | 0.5987 | 0.15 |
5 | 0.7833 | 0.609 | 0.8237 | 0.7684 | 0.6265 | 0.2174 |
6 | 0.7851 | 0.6006 | 0.8303 | 0.7619 | 0.6375 | 0.1728 |
7 | 0.7779 | 0.5849 | 0.8229 | 0.7558 | 0.596 | 0.1647 |
8 | 0.7771 | 0.6175 | 0.8209 | 0.7579 | 0.6115 | 0.2796 |
9 | 0.8001 | 0.6182 | 0.8414 | 0.7807 | 0.6626 | 0.1882 |
Mean | 0.7839 | 0.5997 | 0.828 | 0.7637 | 0.6171 | 0.19 |
Table 10: Results of ten-fold cross-validation
Model | F1 score | Macro F1 (Task B) | F1: not ironic (0) | F1: polarity clash (1) | F1: situational (2) | F1: other (3) |
---|---|---|---|---|---|---|
0 | 0.7942 | 0.5951 | 0.851 | 0.7531 | 0.6667 | 0.1096 |
1 | 0.7787 | 0.6188 | 0.8202 | 0.7386 | 0.6369 | 0.2796 |
2 | 0.8325 | 0.6621 | 0.8733 | 0.8083 | 0.646 | 0.321 |
3 | 0.8047 | 0.6278 | 0.8551 | 0.7356 | 0.6241 | 0.2963 |
4 | 0.7711 | 0.5854 | 0.8397 | 0.6985 | 0.6061 | 0.1972 |
5 | 0.8139 | 0.6227 | 0.8653 | 0.7791 | 0.6434 | 0.2029 |
6 | 0.8013 | 0.5889 | 0.8484 | 0.7443 | 0.5921 | 0.1707 |
7 | 0.7948 | 0.6315 | 0.8528 | 0.753 | 0.6383 | 0.2821 |
8 | 0.8142 | 0.6375 | 0.8544 | 0.7507 | 0.6748 | 0.2703 |
9 | 0.8321 | 0.6514 | 0.871 | 0.7619 | 0.6871 | 0.2857 |
Mean | 0.8038 | 0.6221 | 0.8531 | 0.7523 | 0.6415 | 0.2415 |
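Ten-fold cross-validation splits the data into ten disjoint folds, trains on nine and evaluates on the held-out one, then averages the ten scores (the Mean row). A minimal fold-splitting sketch with contiguous folds (the paper may shuffle or stratify before splitting):

```python
def kfold_indices(n, k=10):
    """Split range(n) into k contiguous, near-equal folds and yield
    (train_indices, test_indices) pairs, one per fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Ten folds over a toy dataset of 20 items: each fold holds out 2.
folds = list(kfold_indices(20, k=10))
```

`sklearn.model_selection.KFold` provides the same splitting (with optional shuffling); the sketch only makes the index bookkeeping explicit.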