<h3>Text classification</h3>
<p>Konrad Przewłoka</p>

<h4>Imports</h4>

In [243]:
#!pip install datasets
#!pip install transformers
#!pip install fasttext_win
#!pip install lime
from datasets import load_dataset
import numpy as np
import pandas as pd
import transformers
import fasttext
import lime.lime_text 
import sklearn
import sklearn.naive_bayes
import sklearn.metrics
import sklearn.pipeline
import sklearn.feature_extraction.text
from prettytable import PrettyTable

<h4>Load dataset</h4>
<p>The dataset consists of two tasks:</p>
<ul>
    <li>Task one: binary classification</li>
    <li>Task two: multiclass classification</li>
</ul>

In [106]:
dataset_task1 = load_dataset("poleval2019_cyberbullying", 'task01')
dataset_task2 = load_dataset("poleval2019_cyberbullying", 'task02')



<h4>Utility data structures</h4>

In [107]:
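class Results:
    """Evaluation metrics for one classifier on one task."""

    # NOTE: precision and recall are not symmetric in their arguments. Every call in
    # this notebook passes (task, predictions, labels), so y_true receives the
    # predictions and the reported precision and recall are swapped; accuracy and
    # F1 are unaffected.
    def __init__(self, task, y_true, y_pred):
        self.task = task
        self.accuracy = sklearn.metrics.accuracy_score(y_true, y_pred)
        if task == 'task1':
            # Binary task: metrics refer to the positive class (label 1).
            self.f1 = sklearn.metrics.f1_score(y_true, y_pred)
            self.recall = sklearn.metrics.recall_score(y_true, y_pred)
            self.precision = sklearn.metrics.precision_score(y_true, y_pred)
        elif task == 'task2':
            # Multiclass task: macro-averaged recall and precision, micro and macro F1.
            self.recall = sklearn.metrics.recall_score(y_true, y_pred, average='macro')
            self.precision = sklearn.metrics.precision_score(y_true, y_pred, average='macro')
            self.f1_micro = sklearn.metrics.f1_score(y_true, y_pred, average="micro")
            self.f1_macro = sklearn.metrics.f1_score(y_true, y_pred, average="macro")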
        
classifiers_results={
    "bayesian":{},
    "fasttext":{},
    "transformer":{}
}
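
<p>A minimal sketch, in plain Python rather than scikit-learn, of why argument order matters for the metrics stored in <code>Results</code>: precision and recall trade places when ground truth and predictions are swapped, while accuracy and F1 stay the same. The helper name <code>binary_metrics</code> and the toy labels are illustrative assumptions, not part of the notebook.</p>

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, precision, recall, f1

# Hypothetical toy labels: four positives, six negatives.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

print(binary_metrics(truth, preds))   # ground truth first
print(binary_metrics(preds, truth))   # arguments swapped
```

<p>With these toy labels the first call gives precision 0.5 and recall 0.25, and the swapped call gives precision 0.25 and recall 0.5; accuracy and F1 are identical in both.</p>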

<h4>Bayesian classifier</h4>

In [108]:
gnb_task1 = sklearn.naive_bayes.GaussianNB()
gnb_task2 = sklearn.naive_bayes.GaussianNB()

vectorizer_taks1 = sklearn.feature_extraction.text.TfidfVectorizer()
vectorizer_taks2 = sklearn.feature_extraction.text.TfidfVectorizer()
vectorizer_taks1.fit(dataset_task1['train']['text'])
vectorizer_taks2.fit(dataset_task2['train']['text'])

task1_train_tfidf = vectorizer_taks1.transform(dataset_task1['train']['text']).toarray()
task1_test_tfidf = vectorizer_taks1.transform(dataset_task1['test']['text']).toarray()
task2_train_tfidf = vectorizer_taks2.transform(dataset_task2['train']['text']).toarray()
task2_test_tfidf = vectorizer_taks2.transform(dataset_task2['test']['text']).toarray()

gnb_task1.fit(task1_train_tfidf,dataset_task1['train']['label'])
gnb_task2.fit(task2_train_tfidf,dataset_task2['train']['label'])

<h4>Bayesian classifier predictions</h4>

In [350]:
task1_pred = gnb_task1.predict(task1_test_tfidf)
task2_pred = gnb_task2.predict(task2_test_tfidf)
classifiers_results['bayesian']['task1']=Results('task1',task1_pred,dataset_task1['test']['label'])
classifiers_results['bayesian']['task2']=Results('task2',task2_pred,dataset_task2['test']['label'])

<h4>Fasttext classifier</h4>

In [110]:
def save_as_fasttext_input_train(dataset, file):
    with open(file, "w",encoding="utf-8") as f:
        for label, text in zip(dataset['label'], dataset['text']):
            f.write(f"__label__{label} {text}\n")

save_as_fasttext_input_train(dataset_task1["train"],'task1_train.txt')
save_as_fasttext_input_train(dataset_task1["test"],'task1_test.txt')
save_as_fasttext_input_train(dataset_task2["train"],'task2_train.txt')
save_as_fasttext_input_train(dataset_task2["test"],'task2_test.txt')

model_task1 = fasttext.supervised("task1_train.txt", 'model1')
model_task2 = fasttext.supervised("task2_train.txt", 'model2')

def flatten(l):
    return [int(item) for sublist in l for item in sublist]
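
<p>The fastText training files written above hold one example per line, with the label prefixed by <code>__label__</code>. A text that contains a line break would be split into two malformed examples, so a slightly more defensive writer could normalise whitespace first. This is only a sketch; the helper names <code>to_fasttext_line</code> and <code>parse_fasttext_line</code> are hypothetical.</p>

```python
def to_fasttext_line(label, text, prefix="__label__"):
    """Format one example as a single line of fastText supervised input."""
    # Collapse newlines, tabs and repeated spaces so the example stays on one line.
    single_line = " ".join(str(text).split())
    return f"{prefix}{label} {single_line}"

def parse_fasttext_line(line, prefix="__label__"):
    """Inverse of to_fasttext_line: return (label, text)."""
    tag, _, text = line.partition(" ")
    return tag[len(prefix):], text

line = to_fasttext_line(1, "first line\nsecond line")
print(line)
print(parse_fasttext_line(line))
```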



<h4>Fasttext classifier predictions</h4>

In [349]:
task1_pred_fast = np.array(flatten(model_task1.predict(dataset_task1["test"]['text'])))
task2_pred_fast = np.array(flatten(model_task2.predict(dataset_task2["test"]['text'])))
classifiers_results['fasttext']['task1']=Results('task1',task1_pred_fast,dataset_task1['test']['label'])
classifiers_results['fasttext']['task2']=Results('task2',task2_pred_fast,dataset_task2['test']['label'])

<h4>Transformer classifier </h4>

In [205]:
tokenizer = transformers.AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
task1_tokenized = dataset_task1.map(lambda partition: tokenizer(partition["text"], truncation=True), batched=True)
task1_model = transformers.AutoModelForSequenceClassification.from_pretrained("allegro/herbert-base-cased", num_labels=2)
    
task1_args = transformers.TrainingArguments(
        output_dir='./results_task1',
        learning_rate=0.00002,
        num_train_epochs=3,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16
    )
task1_trainer = transformers.Trainer(
        model=task1_model,
        args=task1_args,
        train_dataset=task1_tokenized["train"],
        eval_dataset=task1_tokenized["test"],
        tokenizer=tokenizer,
        data_collator=transformers.DataCollatorWithPadding(tokenizer=tokenizer)
    )

task1_trainer.train()
task1_model.save_pretrained('task1_model')

loading configuration file config.json from cache at C:\Users\KPR/.cache\huggingface\hub\models--allegro--herbert-base-cased\snapshots\50e33e0567be0c0b313832314c586e3df0dc2297\config.json
Model config BertConfig {
  "_name_or_path": "allegro/herbert-base-cased",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 514,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "tokenizer_class": "HerbertTokenizerFast",
  "transformers_version": "4.24.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 50000
}

loading file vocab.json from cache at C:\Users\KPR/.cache\huggingface\hub\models--allegro--herbert-base-cased\snapshots\50e33e0567be0c0b31

Epoch,Training Loss,Validation Loss
1,0.2376,0.349713
2,0.1749,0.391852
3,0.1352,0.409864


Saving model checkpoint to ./results_task1\checkpoint-500
Configuration saved in ./results_task1\checkpoint-500\config.json
Model weights saved in ./results_task1\checkpoint-500\pytorch_model.bin
tokenizer config file saved in ./results_task1\checkpoint-500\tokenizer_config.json
Special tokens file saved in ./results_task1\checkpoint-500\special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 16
Saving model checkpoint to ./results_task1\checkpoint-1000
Configuration saved in ./results_task1\checkpoint-1000\config.json
Model weights saved in ./results_task1\checkpoint-1000\pytorch_model.bin
tokenizer config file saved in ./results_task1\checkpoint-1000\tokenizer_config.json
Special tokens file sav

In [202]:
tokenizer_task2 = transformers.AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
task2_tokenized = dataset_task2.map(lambda partition: tokenizer_task2(partition["text"], truncation=True), batched=True)
task2_model = transformers.AutoModelForSequenceClassification.from_pretrained("allegro/herbert-base-cased", num_labels=3)
    
task2_args = transformers.TrainingArguments(
        output_dir='./results_task2',
        learning_rate=0.00002,
        num_train_epochs=3,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16
    )
task2_trainer = transformers.Trainer(
        model=task2_model,
        args=task2_args,
        train_dataset=task2_tokenized["train"],
        eval_dataset=task2_tokenized["test"],
        tokenizer=tokenizer_task2,
        data_collator=transformers.DataCollatorWithPadding(tokenizer=tokenizer_task2)
    )

task2_trainer.train()
task2_model.save_pretrained('task2_model')


Epoch,Training Loss,Validation Loss
1,0.3064,0.407738
2,0.2204,0.473807
3,0.1773,0.42735


Saving model checkpoint to ./results_task2\checkpoint-500
Configuration saved in ./results_task2\checkpoint-500\config.json
Model weights saved in ./results_task2\checkpoint-500\pytorch_model.bin
tokenizer config file saved in ./results_task2\checkpoint-500\tokenizer_config.json
Special tokens file saved in ./results_task2\checkpoint-500\special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 16
Saving model checkpoint to ./results_task2\checkpoint-1000
Configuration saved in ./results_task2\checkpoint-1000\config.json
Model weights saved in ./results_task2\checkpoint-1000\pytorch_model.bin
tokenizer config file saved in ./results_task2\checkpoint-1000\tokenizer_config.json
Special tokens file sav

<h4>Transformer classifier predictions</h4>

In [206]:

trainer = transformers.Trainer(model=task1_model,
                      eval_dataset=task1_tokenized["test"],
                      tokenizer=tokenizer)

tmp1=  trainer.predict(task1_tokenized["test"])
classifiers_results['transformer']['task1']=Results('task1',np.argmax(tmp1[0], axis=1),dataset_task1['test']['label'])


No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 1000
  Batch size = 8


In [203]:
trainer2 = transformers.Trainer(model=task2_model,
                      eval_dataset=task2_tokenized["test"],
                      tokenizer=tokenizer_task2)

tmp2 =  trainer2.predict(task2_tokenized["test"])
classifiers_results['transformer']['task2']=Results('task2',np.argmax(tmp2[0], axis=1),dataset_task2['test']['label'])



<h4>Classifiers comparison</h4>

In [207]:
print("TASK1")
x = PrettyTable()
x.field_names = ["model", "accuracy", "f1", "recall","precision"]
x.add_row(["bayesian", classifiers_results['bayesian']['task1'].accuracy, classifiers_results['bayesian']['task1'].f1, 
           classifiers_results['bayesian']['task1'].recall,classifiers_results['bayesian']['task1'].precision])
x.add_row(["fasttext", classifiers_results['fasttext']['task1'].accuracy, classifiers_results['fasttext']['task1'].f1, 
           classifiers_results['fasttext']['task1'].recall,classifiers_results['fasttext']['task1'].precision])
x.add_row(["transformer", classifiers_results['transformer']['task1'].accuracy, classifiers_results['transformer']['task1'].f1, 
           classifiers_results['transformer']['task1'].recall,classifiers_results['transformer']['task1'].precision])
print(x)

TASK1
+-------------+----------+---------------------+---------------------+---------------------+
|    model    | accuracy |          f1         |        recall       |      precision      |
+-------------+----------+---------------------+---------------------+---------------------+
|   bayesian  |  0.782   |  0.2684563758389261 | 0.24390243902439024 | 0.29850746268656714 |
|   fasttext  |  0.873   | 0.13605442176870747 |  0.7692307692307693 | 0.07462686567164178 |
| transformer |   0.9    |         0.5         |  0.7575757575757576 |  0.373134328358209  |
+-------------+----------+---------------------+---------------------+---------------------+


In [204]:
print("TASK2")
x = PrettyTable()
x.field_names = ["model", "accuracy", "f1_micro","f1_macro", "recall","precision"]
x.add_row(["bayesian", classifiers_results['bayesian']['task2'].accuracy, classifiers_results['bayesian']['task2'].f1_micro, 
           classifiers_results['bayesian']['task2'].f1_macro,
           classifiers_results['bayesian']['task2'].recall,classifiers_results['bayesian']['task2'].precision])
x.add_row(["fasttext", classifiers_results['fasttext']['task2'].accuracy, classifiers_results['fasttext']['task2'].f1_micro, 
           classifiers_results['fasttext']['task2'].f1_macro,
           classifiers_results['fasttext']['task2'].recall,classifiers_results['fasttext']['task2'].precision])
x.add_row(["transformer", classifiers_results['transformer']['task2'].accuracy, classifiers_results['transformer']['task2'].f1_micro,
           classifiers_results['transformer']['task2'].f1_macro,
           classifiers_results['transformer']['task2'].recall,classifiers_results['transformer']['task2'].precision])
print(x)

TASK2
+-------------+----------+----------+---------------------+---------------------+--------------------+
|    model    | accuracy | f1_micro |       f1_macro      |        recall       |     precision      |
+-------------+----------+----------+---------------------+---------------------+--------------------+
|   bayesian  |  0.787   |  0.787   |  0.3968305029876156 | 0.40132515731936985 | 0.4081828647301029 |
|   fasttext  |  0.866   |  0.866   | 0.31570883450582704 | 0.45615796519410984 | 0.3360065258385067 |
| transformer |  0.892   |  0.892   |  0.4954170104708358 |  0.5979220028092208 | 0.4658903461378195 |
+-------------+----------+----------+---------------------+---------------------+--------------------+
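
<p>In the TASK2 table the f1_micro column equals accuracy for every model. That is expected for single-label multiclass data, where every misclassification counts as one false positive and one false negative. The sketch below, in plain Python with a hypothetical helper <code>f1_micro_macro</code> and made-up labels, shows this and also why macro F1 can be far lower when rare classes are missed.</p>

```python
from collections import Counter

def f1_micro_macro(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # the predicted class gains a false positive
            fn[t] += 1   # the true class gains a false negative
    # Micro: pool the counts over all classes, then compute a single F1.
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)
    # Macro: compute F1 per class, then take the unweighted mean.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro

# Toy data: class 0 dominates, classes 1 and 2 are rare.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10            # a classifier that always predicts class 0
micro, macro = f1_micro_macro(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro, macro, accuracy)
```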


<h4>LIME utility functions</h4>

In [328]:
class DenseTransformer(sklearn.base.TransformerMixin):

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X, y=None, **fit_params):
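        # toarray() returns a plain ndarray; todense() returns np.matrix,
        # which recent scikit-learn versions reject.
        return X.toarray()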

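def fasttext_prediction_in_sklearn_format(classifier, texts):
    """Return an (n_samples, 2) array of [P(label 0), P(label 1)] rows, as LIME expects."""
    res = []
    # Each item holds (label, probability) pairs; reorder them so that
    # column 0 always belongs to label '0'.
    for item in classifier.predict_proba(texts, 10):
        if item[0][0] == '0':
            res.append([item[0][1], item[1][1]])
        else:
            res.append([item[1][1], item[0][1]])

    return np.array(res)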

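def predict_transformer_lime(texts):
    # Tokenize each perturbed text produced by LIME and run it through the fine-tuned model.
    tokenized = [tokenizer(x, truncation=True) for x in texts]
    trainer = transformers.Trainer(model=task1_model, tokenizer=tokenizer)
    # The first element of the returned PredictionOutput holds raw logits, not
    # probabilities, so the transformer's LIME weights are on the logit scale.
    return trainer.predict(tokenized)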

def lime_transformer_task1(t):
    exp = lime.lime_text.LimeTextExplainer(class_names=["neutral", "negative"]).explain_instance(t, lambda x: predict_transformer_lime( x)[0])
    return exp.as_list()
def lime_bayesian_task1(t):
    c = sklearn.pipeline.make_pipeline(vectorizer_taks1,DenseTransformer(), gnb_task1)
    exp = lime.lime_text.LimeTextExplainer(class_names=["neutral", "negative"]).explain_instance( t,c.predict_proba)
    return exp.as_list()
def lime_fasttext_task1(t):
    exp = lime.lime_text.LimeTextExplainer(class_names=["neutral", "negative"]).explain_instance(t,lambda x: fasttext_prediction_in_sklearn_format(model_task1, x))
    return exp.as_list()
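
<p>LIME's <code>explain_instance</code> expects the classifier function to return class probabilities, whereas the transformer wrapper above passes on whatever <code>trainer.predict</code> returns, which for a sequence classification model is raw logits. A softmax would put the transformer on the same scale as the other two classifiers. The following is a sketch in plain Python; the function names are hypothetical, and the recorded explanations in this notebook were produced without this step.</p>

```python
import math

def softmax(logits):
    """Convert one row of raw logits into probabilities that sum to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_to_probabilities(batch_logits):
    """Apply softmax row by row to a batch of logits."""
    return [softmax(row) for row in batch_logits]

# Made-up logits for three texts and two classes.
example_logits = [[2.0, -1.0], [0.0, 0.0], [-3.0, 3.0]]
probs = logits_to_probabilities(example_logits)
for row in probs:
    print([round(p, 4) for p in row])
```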

<h4>Selection of examples (predictions made by the transformer classifier were used, as it returned the best results)</h4>

In [340]:
selectable = list(zip(dataset_task1['test']['label'],np.argmax(tmp1[0], axis=1)))
tp_index = selectable.index((1, 1))
tn_index = selectable.index((0, 0))
fp_index = selectable.index((0, 1))
fn_index = selectable.index((1, 0))
print("Example true positive:")
print(dataset_task1['test']['text'][tp_index])
print("Example true negative:")
print(dataset_task1['test']['text'][tn_index])
print("Example false positive:")
print(dataset_task1['test']['text'][fp_index])
print("Example false negative:")
print(dataset_task1['test']['text'][fn_index])

Example true positive:
@anonymized_account Dokładnie, pisdzielstwo nie ma prawa rozpierdalać systemu,  sądownictwa nie mając większości
Example true negative:
@anonymized_account Spoko, jak im Duda z Morawieckim zamówią po pięć piw to wszystko będzie ok.
Example false positive:
Jestem do tylu ale czy Ari zerwała z tym ćpunem?
Example false negative:
@anonymized_account Tej szmaty się nie komentuje


<h4>Decisions made by the other classifiers on the examples selected from the transformer classifier's output</h4>

In [354]:
def decision_type(label, pred):
    """Name the confusion-matrix cell for a binary (label, prediction) pair."""
    names = {
        (1, 1): "true positive",
        (0, 0): "true negative",
        (0, 1): "false positive",
        (1, 0): "false negative",
    }
    return names.get((int(label), int(pred)))

def decision_type_bayes(idx):
    return decision_type(dataset_task1['test']['label'][idx], task1_pred[idx])

def decision_type_fasttext(idx):
    return decision_type(dataset_task1['test']['label'][idx], task1_pred_fast[idx])
x = PrettyTable()
x.field_names = ["transformer_decision", "fasttext_decision", "bayesian_decision",]
x.add_row(["true positive",decision_type_fasttext(tp_index),decision_type_bayes(tp_index)])
x.add_row(["true negative",decision_type_fasttext(tn_index),decision_type_bayes(tn_index)])
x.add_row(["false positive",decision_type_fasttext(fp_index),decision_type_bayes(fp_index)])
x.add_row(["false negative",decision_type_fasttext(fn_index),decision_type_bayes(fn_index)])
print(x)

+----------------------+-------------------+-------------------+
| transformer_decision | fasttext_decision | bayesian_decision |
+----------------------+-------------------+-------------------+
|    true positive     |   false negative  |   false negative  |
|    true negative     |   true negative   |   true negative   |
|    false positive    |   true negative   |   true negative   |
|    false negative    |   false negative  |   true positive   |
+----------------------+-------------------+-------------------+


<h4>True positive LIME explanation</h4>

In [346]:
print("Transformer:")
print(*lime_transformer_task1(dataset_task1['test']['text'][tp_index]),sep='\n')
print("Bayesian:")
print(*lime_bayesian_task1(dataset_task1['test']['text'][tp_index]),sep='\n')
print("Fasttext:")
print(*lime_fasttext_task1(dataset_task1['test']['text'][tp_index]),sep='\n')

Transformer:


No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
***** Running Prediction *****
  Num examples = 5000
  Batch size = 8


('pisdzielstwo', 4.3331948419941355)
('anonymized_account', 0.6932275746752234)
('rozpierdalać', 0.48553664274834596)
('nie', -0.20003645114863353)
('systemu', 0.1986852965970299)
('sądownictwa', -0.17240545786764708)
('prawa', 0.09027234826159426)
('mając', 0.07140052973614262)
('ma', 0.05223604044352335)
('Dokładnie', -0.04935914593684743)
Bayesian:
('sądownictwa', -0.15041608783071128)
('mając', -0.1500899642678433)
('większości', -0.13916068018854746)
('ma', 0.005683714311630539)
('nie', -0.0052194814268681725)
('systemu', -0.004662567713990882)
('prawa', -0.003279169464704961)
('anonymized_account', -0.002035592567654725)
('rozpierdalać', -0.0020203451835214847)
('Dokładnie', -0.0016848549490595067)
Fasttext:
('pisdzielstwo', 0.2170514473308283)
('systemu', -0.11675253024243636)
('Dokładnie', -0.11542863296551906)
('anonymized_account', 0.026566897543009418)
('nie', 0.019374845854321476)
('ma', -0.014228221866048442)
('sądownictwa', 0.005715522912811025)
('mając', 0.00571055345124

<h4>True negative LIME explanation</h4>

In [345]:
print("Transformer:")
print(*lime_transformer_task1(dataset_task1['test']['text'][tn_index]),sep='\n')
print("Bayesian:")
print(*lime_bayesian_task1(dataset_task1['test']['text'][tn_index]),sep='\n')
print("Fasttext:")
print(*lime_fasttext_task1(dataset_task1['test']['text'][tn_index]),sep='\n')

Transformer:


('Morawieckim', 0.5405107677998163)
('Duda', 0.38501462509601225)
('piw', -0.2972579227364432)
('ok', -0.2730876012418016)
('zamówią', -0.20714911448960216)
('pięć', -0.1981653954158361)
('im', 0.16757968199815287)
('anonymized_account', 0.16566373383126076)
('z', -0.13250086035586287)
('będzie', -0.11309757535712882)
Bayesian:
('Morawieckim', -0.1432560446196628)
('pięć', -0.14309884082636146)
('ok', -0.142289700654506)
('im', -0.010784465022443769)
('po', -0.008999643343233502)
('wszystko', -0.006775246876795237)
('jak', -0.00581653558447634)
('będzie', 0.005315040640373634)
('zamówią', 0.00388990517087464)
('z', -0.003698499636610202)
Fasttext:
('będzie', -0.09512405550842264)
('Duda', 0.07196867003257745)
('im', 0.06709455858338466)
('Spoko', -0.06262153118614013)
('wszystko', 0.0530254150377875)
('ok', -0.04240363810308521)
('jak', 0.03587453000371867)
('to', -0.02339452879268083)
('z', -0.020910986228178288)
('anonymized_account', 0.01851609309371786)


<h4>False positive LIME explanation</h4>

In [347]:
print("Transformer:")
print(*lime_transformer_task1(dataset_task1['test']['text'][fp_index]),sep='\n')
print("Bayesian:")
print(*lime_bayesian_task1(dataset_task1['test']['text'][fp_index]),sep='\n')
print("Fasttext:")
print(*lime_fasttext_task1(dataset_task1['test']['text'][fp_index]),sep='\n')

Transformer:


('ćpunem', 4.43738940871803)
('tym', 0.7542544786602385)
('ale', -0.5722882453118739)
('Jestem', -0.26945758641511)
('czy', -0.25072302217275694)
('Ari', -0.15999653388717883)
('z', -0.15649637698546917)
('tylu', 0.09113237907503344)
('do', 0.02804369594730482)
('zerwała', 0.020633889317540947)
Bayesian:
('tylu', -0.5778116801406644)
('Ari', -0.5702098614166565)
('ćpunem', 0.41119632356014424)
('z', 0.024917049467477772)
('do', 0.019663439420494224)
('ale', 0.01818547160393231)
('Jestem', 0.009785677632022306)
('czy', 0.009025678067249446)
('tym', 0.006738856773425721)
('zerwała', 0.0023226118779101058)
Fasttext:
('tym', -9.467998023930894e-06)
('ale', -8.919216754983273e-06)
('do', -8.856150820898994e-06)
('czy', 4.707257836315553e-06)
('tylu', 4.673097575054306e-06)
('z', 4.3899412746581926e-06)
('Jestem', 4.107625056151985e-06)
('Ari', 4.075008418869659e-06)
('ćpunem', 3.7512181833719866e-06)
('zerwała', 2.1351520300194493e-06)


<h4>False negative LIME explanation</h4>

In [348]:
print("Transformer:")
print(*lime_transformer_task1(dataset_task1['test']['text'][fn_index]),sep='\n')
print("Bayesian:")
print(*lime_bayesian_task1(dataset_task1['test']['text'][fn_index]),sep='\n')
print("Fasttext:")
print(*lime_fasttext_task1(dataset_task1['test']['text'][fn_index]),sep='\n')

Transformer:


('szmaty', 2.770559799210798)
('anonymized_account', 1.7494970844970126)
('nie', -1.006269238087004)
('się', -0.40547075658296167)
('komentuje', -0.27382569753906916)
('Tej', 0.25265883798215916)
Bayesian:
('szmaty', 0.6633921303073481)
('komentuje', -0.3355436376297456)
('się', -0.03764488179266531)
('Tej', -0.033256429522814365)
('nie', -0.024308991697711224)
('anonymized_account', -0.0241214160738303)
Fasttext:
('szmaty', 0.00773730065945517)
('nie', 0.0051392767807105656)
('Tej', 0.005092592935976661)
('się', -0.002791644650219082)
('anonymized_account', 0.0021916099007646224)
('komentuje', -0.00027656144062449445)


<h4>Answers</h4>
<h3>I</h3>
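<p>For both tasks the transformer-based classifier (a fine-tuned allegro/herbert-base-cased model) achieved the best results on almost every chosen metric. The only exception is the recall column for task 1, where fastText scored marginally higher (0.769 vs. 0.758).</p>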
<h3>II</h3>
<p>For task 1 the transformer model delivers results comparable to the PolEval submissions in terms of accuracy, F1 score and recall. However, its precision is considerably lower than that of most PolEval solutions.
For task 2 the results are comparable to the top PolEval solutions: the micro-averaged F1 score is better than that of all other solutions, and the macro-averaged F1 score is only slightly worse.
</p>
<h3>III</h3>
<p>At the time of writing this report (27.11.2022), the KLEJ leaderboard page was inaccessible.</p>
<h3>IV</h3>
<p>The transformer-based model made the best decisions, but it required significant time for training and classification, whereas the Bayesian and fastText classifiers were much faster (fastText seemed to be the fastest to train). The transformer-based model also needs much more disk space than the other two. The saving grace of the Bayesian classifier is that it is implemented in scikit-learn and follows an API convention that allows seamless integration with popular tools such as the LIME explainer.</p>
<h3>V</h3>
<p>In my opinion, a simple comparison of raw performance values on a single task is insufficient for a proper assessment of algorithms or models. First, which algorithm works best can depend on the dataset, because of biases and other factors in the data. Second, in certain real-world applications, factors other than prediction quality can decide whether an algorithm is feasible to use. For example, in network request classification we care not only about the accuracy of the model (although accuracy is still important) but also about the computational cost of each classification, since up to millions of requests per second must be classified without adding significant latency to request handling. As a result, “simpler” statistical models such as Hidden Markov Models remain popular in the literature, and real-world solutions are often rule-based or signature-based systems that use little to no machine learning at all.</p>
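
<p>The point about computational cost can be made concrete by measuring throughput next to accuracy. The sketch below uses only the standard library; <code>measure_throughput</code> and the two stand-in "classifiers" are hypothetical and unrelated to the models trained in this notebook.</p>

```python
import time

def measure_throughput(predict, inputs, repeats=3):
    """Return the best observed throughput (items per second) of predict(inputs)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(inputs)
        best = min(best, time.perf_counter() - start)
    return len(inputs) / best if best > 0 else float("inf")

# Two stand-in "classifiers": a cheap keyword rule and a deliberately slower one.
def keyword_rule(texts):
    return [int("spam" in t) for t in texts]

def slow_model(texts):
    return [sum(ord(ch) for ch in t * 20) % 2 for t in texts]

requests = ["hello world", "buy spam now"] * 1000
for name, fn in [("keyword_rule", keyword_rule), ("slow_model", slow_model)]:
    print(f"{name}: {measure_throughput(fn, requests):,.0f} items/s")
```

<p>Taking the best of several repeats reduces noise from other processes; absolute numbers depend entirely on the machine.</p>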
<h3>VI</h3>
<p>LIME showed that the models rely on meaningful words when classifying, with certain words receiving much higher weights than others (e.g. “ćpunem”, “pisdzielstwo” and “szmaty” in the transformer explanations).</p>
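
<p>To give an intuition for what such word weights mean, the sketch below ranks words by how much a score drops when each word is removed. This is leave-one-out occlusion, a simpler relative of LIME and not LIME itself: LIME samples many random word deletions (5000 per explanation in the runs above) and fits a weighted linear model to them. The scoring function <code>toy_score</code> and its word list are made up for illustration.</p>

```python
def toy_score(text):
    """Stand-in classifier: fraction of words found in a tiny flagged-word list."""
    flagged = {"idiot", "stupid"}
    words = text.lower().split()
    return sum(w in flagged for w in words) / len(words) if words else 0.0

def leave_one_out_importance(text, score_fn):
    """Score drop caused by deleting each word in turn (larger = more important)."""
    words = text.split()
    base = score_fn(text)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score_fn(reduced)))
    return sorted(importances, key=lambda pair: pair[1], reverse=True)

ranking = leave_one_out_importance("you are an idiot", toy_score)
for word, delta in ranking:
    print(f"{word}: {delta:+.3f}")
```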