# Model Evaluation & Selection
**`Aspect-based Sentiment Analysis`**

**`Goal:`** Compare the performance of the three modeling approaches (POS tagging with word similarity, binary relevance, and multilabel classification) on the validation and test sets, and settle on the final method to use for our prediction tasks.

## 1. Import packages

In [1]:
import sys
import pandas as pd
import clean_tweets
from absa_metrics import binary_precision_recall_fscore, aspect_sentiment_accuracy

## 2. Load the evaluation data

**Validation dataset**

In [2]:
eval_val_df = pd.read_csv("../data/model-evaluation/validation_dataset.csv")
eval_val_df.head()

| | Text | price | speed | reliability | coverage | customer service | Aspects | Sentiment |
|---|---|---|---|---|---|---|---|---|
| 0 | officialkome_ spectranet_ng this people don fr... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 1 | ayomikun_o_ yoruba_dev spectranet_ng i'm.yet t... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 2 | spectranet !!!!! 🤬🤬🤬🤬🤬🤬 | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 3 | after buying data see airtel telling me "now t... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 4 | spectranet ooooo | 0 | 0 | 0 | 0 | 0 | [None] | [None] |


**Test dataset**

In [3]:
eval_test_df = pd.read_csv("../data/model-evaluation/test_dataset.csv")
eval_test_df.head()

| | Text | price | speed | reliability | coverage | customer service | Aspects | Sentiment |
|---|---|---|---|---|---|---|---|---|
| 0 | deejay_klem smilecomsng well this means i will... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 1 | myaccessbank hello please i can't seem to find... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 2 | spectranet_ng hello please can i add multiple ... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 3 | spectranet is shit | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 4 | tizeti is a special brand of useless, when it ... | 0 | 1 | 0 | 0 | 0 | ['speed'] | ['Negative'] |
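`pd.read_csv` cannot tell that the `Aspects` and `Sentiment` columns hold Python lists, so it loads cells such as `['speed']` or `[None]` as plain strings. Whether `absa_metrics` already handles this is not shown here. If it does not, a minimal sketch for turning the strings back into lists is below; `parse_list_cell` is a hypothetical helper, not part of the project's modules.

```python
import ast


def parse_list_cell(cell):
    """Turn a CSV cell such as "['speed']" or "[None]" back into a Python list.

    Cells that are already lists are returned unchanged.
    """
    if isinstance(cell, list):
        return cell
    # literal_eval only evaluates Python literals, so it is safe on untrusted text
    value = ast.literal_eval(cell)
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {cell!r}")
    return value


# Example cells copied from the tables above
print(parse_list_cell("['speed']"))
print(parse_list_cell("[None]"))
print(parse_list_cell("['Negative']"))
```

In the notebook this could be applied per column, e.g. `eval_val_df["Aspects"] = eval_val_df["Aspects"].apply(parse_list_cell)`, or at load time through the `converters` argument of `pd.read_csv`.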


## 3. Load the models

### a. Load the POS tagger with word similarity model

In [4]:
sys.path.append("../models/full_absa_models")
import pos_word_similarity_model


### b. Load the binary relevance model

In [14]:
sys.path.append("../models/full_absa_models")
import binary_relevance_model



### c. Load the multilabel classifier

---

## 4. Modeling: Validation set

### a. Model 1: POS Tagger with word similarity

#### (i) Lightly clean the tweets

In [5]:
# Clean the validation set tweets
model_1_cleaned_val = clean_tweets.run_cleaner(eval_val_df,'Text',no_punc=True,
                                               no_emoji=True, no_isp_name=True)

model_1_cleaned_val.head()

| | Text | price | speed | reliability | coverage | customer service | Aspects | Sentiment |
|---|---|---|---|---|---|---|---|---|
| 0 | officialkome spectranetng this people don frus... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 1 | ayomikuno yorubadev spectranetng imyet to turn... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 2 | spectranet | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 3 | after buying data see airtel telling me now th... | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
| 4 | spectranet ooooo | 0 | 0 | 0 | 0 | 0 | [None] | [None] |
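The cleaned text above shows punctuation (including `_`, `'` and `.`) and emoji being stripped. The internals of `clean_tweets.run_cleaner` are not shown, so the following is only a hypothetical approximation of its `no_punc` and `no_emoji` options; `light_clean` is an illustrative name, and the `no_isp_name` option is not reproduced because its exact behavior is not visible here.

```python
import string
import unicodedata

# Translation table that deletes every ASCII punctuation character (including _ ' . ! ")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def light_clean(text, no_punc=True, no_emoji=True):
    """Hypothetical approximation of the punctuation/emoji options of run_cleaner."""
    if no_punc:
        text = text.translate(_PUNCT_TABLE)
    if no_emoji:
        # Drop pictographic symbols (Unicode categories So/Sk) plus the variation
        # selector and zero-width joiner that glue multi-part emoji together
        text = "".join(
            ch for ch in text
            if unicodedata.category(ch) not in ("So", "Sk") and ch not in "\ufe0f\u200d"
        )
    # Collapse the runs of whitespace left behind by the deletions
    return " ".join(text.split())


print(light_clean("spectranet !!!!! 🤬🤬🤬🤬🤬🤬"))               # spectranet
print(light_clean("ayomikun_o_ yoruba_dev spectranet_ng i'm.yet"))  # ayomikuno yorubadev spectranetng imyet
```

Deleting punctuation without inserting a space is what turns `i'm.yet` into `imyet`, which matches row 1 of the cleaned table.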


#### (ii) Run ABSA model on the tweets
*Takes a couple of seconds to run*

In [6]:
pos_word_sim_absa = pos_word_similarity_model.run(model_1_cleaned_val, 'Text')

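The module's code is not reproduced in this notebook. As a rough illustration of the word-similarity step, the sketch below assumes the POS tagger has already produced candidate nouns and maps each noun to the aspect whose seed word has the highest cosine similarity, keeping it only above a threshold. The 3-d vectors, the seed words, the 0.6 threshold and the one-decimal rounding are all hypothetical choices for illustration; real word vectors would come from a language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_aspects(nouns, aspect_seeds, vectors, threshold=0.6):
    """Map candidate nouns to the most similar aspect seed word.

    Returns [None] when nothing clears the (hypothetical) threshold, mirroring
    the notebook's convention for tweets without an aspect.
    """
    found = []
    for noun in nouns:
        if noun not in vectors:
            continue  # out-of-vocabulary tokens cannot be compared
        scores = {
            aspect: round(cosine(vectors[noun], vectors[seed]), 1)
            for aspect, seed in aspect_seeds.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] >= threshold and best not in found:
            found.append(best)
    return found or [None]


# Toy 3-d "embeddings" (hypothetical values)
toy_vectors = {
    "price": (1.0, 0.0, 0.0),
    "speed": (0.0, 1.0, 0.0),
    "cost": (0.9, 0.1, 0.0),
    "network": (0.1, 0.8, 0.3),
    "weather": (0.0, 0.0, 1.0),
}
seeds = {"price": "price", "speed": "speed"}
print(detect_aspects(["cost", "weather"], seeds, toy_vectors))  # ['price']
print(detect_aspects(["weather"], seeds, toy_vectors))          # [None]
print(detect_aspects(["network", "cost"], seeds, toy_vectors))  # ['speed', 'price']
```

A hard similarity threshold like this tends to favor precision over recall, since nouns that are only loosely related to an aspect's seed word are dropped.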


#### (iii) Attach the true annotations to the model's predictions

In [7]:
pos_word_sim_absa[['Aspects', 'Sentiment']] = model_1_cleaned_val[['Aspects', 'Sentiment']]
pos_word_sim_absa.head()

| | Text | Detected aspects | Corresponding sentiment | Aspects | Sentiment |
|---|---|---|---|---|---|
| 0 | officialkome spectranetng this people don frus... | [None] | [None] | [None] | [None] |
| 1 | ayomikuno yorubadev spectranetng imyet to turn... | [None] | [None] | [None] | [None] |
| 2 | spectranet | [None] | [None] | [None] | [None] |
| 3 | after buying data see airtel telling me now th... | [None] | [None] | [None] | [None] |
| 4 | spectranet ooooo | [None] | [None] | [None] | [None] |
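Assigning columns from one DataFrame to another, as in the cell above, aligns rows on the index rather than by position, so a prediction frame whose index no longer matches the annotations would silently get `NaN` labels. A small defensive variant is sketched below; `attach_true_labels` is a hypothetical helper, not part of the project's code.

```python
import pandas as pd


def attach_true_labels(preds, truth, cols=("Aspects", "Sentiment")):
    """Copy gold-label columns onto a predictions frame after checking alignment."""
    cols = list(cols)
    # Column assignment aligns on the index, so mismatched indexes would
    # produce NaN rows instead of raising an error
    if not preds.index.equals(truth.index):
        raise ValueError("prediction and annotation frames have different indexes")
    out = preds.copy()
    out[cols] = truth[cols]
    return out


preds = pd.DataFrame({"Text": ["a", "b"], "Detected aspects": [[None], ["speed"]]})
truth = pd.DataFrame({"Aspects": [[None], ["speed"]], "Sentiment": [[None], ["Negative"]]})
merged = attach_true_labels(preds, truth)
print(merged.columns.tolist())  # ['Text', 'Detected aspects', 'Aspects', 'Sentiment']
```

Returning a copy keeps the model's raw predictions untouched, which makes it easier to re-run the evaluation cells.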


#### (iv) Aspect extraction evaluation

In [8]:
# Calculate precision, recall and F-0.5
md1_class_metrics, md1_precision, md1_recall, md1_fscore = binary_precision_recall_fscore(
    pos_word_sim_absa['Aspects'],
    pos_word_sim_absa['Detected aspects'],
    beta=0.5)

print(f"Precision: {md1_precision:.3f}  Recall: {md1_recall:.3f} F-0.5: {md1_fscore:.3f}")

Precision: 0.500  Recall: 0.033 F-0.5: 0.130
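With β = 0.5 the F-score weights precision more heavily than recall. Plugging the printed (rounded) precision and recall into the standard F-β formula reproduces the reported score:

$$
F_{\beta} = \frac{(1+\beta^{2})\,P\,R}{\beta^{2}\,P + R}
\quad\Rightarrow\quad
F_{0.5} = \frac{1.25\,P\,R}{0.25\,P + R}
= \frac{1.25 \times 0.500 \times 0.033}{0.25 \times 0.500 + 0.033}
\approx \frac{0.0206}{0.158} \approx 0.13
$$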


In [9]:
(pd.DataFrame(md1_class_metrics).T).iloc[:,-3:]

| | Precision | Recall | F-0.5 |
|---|---|---|---|
| price | 0.0 | 0.0 | 0.0 |
| speed | 1.0 | 0.076923 | 0.294118 |
| reliability | 0.0 | 0.0 | 0.0 |
| coverage | 0.0 | 0.0 | 0.0 |
| customer service | 0.333333 | 0.0625 | 0.178571 |
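`absa_metrics.binary_precision_recall_fscore` is not shown in this notebook. The sketch below is one way such per-aspect and overall scores could be computed; it is not the project's implementation. It assumes the overall figures are micro-averaged (true positives, predictions and gold labels pooled across aspects), which is consistent with the outputs above: the headline F-0.5 equals the F-0.5 of the headline precision and recall. `aspect_prf` is a hypothetical name.

```python
def aspect_prf(true_lists, pred_lists, aspects, beta=0.5):
    """Per-aspect and micro-averaged precision, recall and F-beta.

    Each element of true_lists / pred_lists is a list of aspect names,
    with [None] meaning "no aspect".
    """
    b2 = beta ** 2

    def fbeta(p, r):
        return (1 + b2) * p * r / (b2 * p + r) if (p + r) else 0.0

    per_aspect = {}
    tp_total = pred_total = true_total = 0
    for aspect in aspects:
        tp = sum(aspect in t and aspect in p for t, p in zip(true_lists, pred_lists))
        n_pred = sum(aspect in p for p in pred_lists)  # times the aspect was predicted
        n_true = sum(aspect in t for t in true_lists)  # times the aspect was annotated
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true if n_true else 0.0
        per_aspect[aspect] = {"Precision": prec, "Recall": rec, f"F-{beta}": fbeta(prec, rec)}
        tp_total += tp
        pred_total += n_pred
        true_total += n_true

    # Micro-averaging pools the counts from every aspect before dividing
    micro_p = tp_total / pred_total if pred_total else 0.0
    micro_r = tp_total / true_total if true_total else 0.0
    return per_aspect, micro_p, micro_r, fbeta(micro_p, micro_r)


true = [["speed"], [None], ["price", "speed"], ["customer service"]]
pred = [["speed"], ["price"], ["speed"], [None]]
per, p, r, f = aspect_prf(true, pred, ["price", "speed", "customer service"])
print(f"Precision: {p:.3f}  Recall: {r:.3f} F-0.5: {f:.3f}")
```

Micro-averaging lets frequent aspects dominate the headline score, which is why a model can look reasonable overall while scoring zero on rarer aspects such as price.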


#### (v) Aspect sentiment prediction evaluation

In [11]:
md1_accuracies, md1_micro_accuracy, md1_macro_accuracy = aspect_sentiment_accuracy(
    pos_word_sim_absa['Aspects'],
    pos_word_sim_absa['Detected aspects'],
    pos_word_sim_absa['Sentiment'],
    pos_word_sim_absa['Corresponding sentiment'])

print(f"Micro accuracy:{md1_micro_accuracy:.3f}   Macro accuracy:{md1_macro_accuracy:.3f}")

Micro accuracy:0.500   Macro accuracy:0.500


In [12]:
(pd.DataFrame([md1_accuracies]).T).rename(columns={0:'Accuracy'})

| | Accuracy |
|---|---|
| price | No prediction for this aspect |
| speed | 0.0 |
| reliability | No prediction for this aspect |
| coverage | No prediction for this aspect |
| customer service | 1.0 |
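The exact definition inside `absa_metrics.aspect_sentiment_accuracy` is not shown. The outputs are consistent with sentiment being scored only on aspects the model detected correctly, with micro accuracy pooling those cases and macro accuracy averaging over the aspects that received any scored prediction (here (0.0 + 1.0) / 2 = 0.5). A sketch under that assumption follows; `sentiment_accuracy_sketch` is a hypothetical name.

```python
NO_PRED = "No prediction for this aspect"


def sentiment_accuracy_sketch(true_aspects, pred_aspects, true_sents, pred_sents, aspects):
    """Sentiment accuracy computed only on aspects that were correctly detected."""
    correct = {a: 0 for a in aspects}
    total = {a: 0 for a in aspects}
    for t_asp, p_asp, t_sent, p_sent in zip(true_aspects, pred_aspects, true_sents, pred_sents):
        gold = dict(zip(t_asp, t_sent))  # aspect -> true sentiment
        pred = dict(zip(p_asp, p_sent))  # aspect -> predicted sentiment
        for a in aspects:
            if a in gold and a in pred:  # aspect detected correctly
                total[a] += 1
                correct[a] += int(gold[a] == pred[a])
    per_aspect = {a: correct[a] / total[a] if total[a] else NO_PRED for a in aspects}
    scored = [a for a in aspects if total[a]]
    micro = sum(correct[a] for a in scored) / sum(total[a] for a in scored) if scored else 0.0
    macro = sum(per_aspect[a] for a in scored) / len(scored) if scored else 0.0
    return per_aspect, micro, macro


true_a = [["speed"], ["price", "speed"], [None], ["coverage"]]
pred_a = [["speed"], ["speed"], ["price"], ["coverage"]]
true_s = [["Negative"], ["Negative", "Positive"], [None], ["Positive"]]
pred_s = [["Negative"], ["Negative"], ["Neutral"], ["Positive"]]
per, micro, macro = sentiment_accuracy_sketch(true_a, pred_a, true_s, pred_s,
                                              ["price", "speed", "coverage"])
print(per)
print(f"Micro accuracy:{micro:.3f}   Macro accuracy:{macro:.3f}")
```

Because only correctly detected aspects are scored, a model with very low recall (like Model 1) is judged on just a handful of tweets, so its sentiment accuracy should be read with that in mind.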


---

### b. Model 2: Binary relevance

#### (i) Run ABSA model on the tweets
*Takes a couple of seconds to run*

In [15]:
binary_relevance_absa = binary_relevance_model.run(eval_val_df, 'Text')
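The internals of `binary_relevance_model` are not shown here. As a generic illustration of the binary relevance technique itself (one independent yes/no classifier per aspect), the sketch below uses scikit-learn's `OneVsRestClassifier`, which fits one binary estimator per column when given a 0/1 label matrix. The TF-IDF features, logistic regression, `C=100` and the toy tweets are stand-ins chosen for illustration, not the project's model or data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

ASPECTS = ["price", "speed", "customer service"]

# Toy training data (hypothetical): one row per tweet, one 0/1 column per aspect
texts = [
    "data is too expensive",
    "the price of data is high",
    "network is very slow",
    "speed keeps dropping",
    "customer care never replies",
    "support staff were helpful",
]
labels = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])

# With a 2-D indicator matrix, OneVsRestClassifier trains one independent
# binary classifier per aspect column, i.e. binary relevance
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression(C=100)))
model.fit(texts, labels)

pred = model.predict(["why is the data price so high"])
detected = [a for a, flag in zip(ASPECTS, pred[0]) if flag] or [None]
print(len(model[-1].estimators_), detected)
```

The trade-off of binary relevance is that each aspect is decided in isolation, so correlations between aspects (for example, price and customer service complaints appearing together) are not modeled.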

#### (ii) Attach the true annotations to the model's predictions

In [19]:
binary_relevance_absa[['Aspects', 'Sentiment']] = model_1_cleaned_val[['Aspects', 'Sentiment']]
binary_relevance_absa.head()

| | Text | Detected aspects | Predicted sentiment | Aspects | Sentiment |
|---|---|---|---|---|---|
| 0 | officialkome spectranetng this people don frus... | [None] | [None] | [None] | [None] |
| 1 | ayomikuno yorubadev spectranetng imyet to turn... | [None] | [None] | [None] | [None] |
| 2 | spectranet | [None] | [None] | [None] | [None] |
| 3 | after buying data see airtel telling me now th... | [None] | [None] | [None] | [None] |
| 4 | spectranet ooooo | [None] | [None] | [None] | [None] |


#### (iii) Aspect extraction evaluation

In [20]:
# Calculate precision, recall and F-0.5
md2_class_metrics, md2_precision, md2_recall, md2_fscore = binary_precision_recall_fscore(
    binary_relevance_absa['Aspects'],
    binary_relevance_absa['Detected aspects'],
    beta=0.5)

print(f"Precision: {md2_precision:.3f}  Recall: {md2_recall:.3f} F-0.5: {md2_fscore:.3f}")

Precision: 0.585  Recall: 0.393 F-0.5: 0.533


In [21]:
(pd.DataFrame(md2_class_metrics).T).iloc[:,-3:]

| | Precision | Recall | F-0.5 |
|---|---|---|---|
| price | 0.230769 | 0.25 | 0.234375 |
| speed | 0.75 | 0.230769 | 0.517241 |
| reliability | 0.8 | 0.363636 | 0.645161 |
| coverage | 0.6 | 0.333333 | 0.517241 |
| customer service | 0.785714 | 0.6875 | 0.763889 |


#### (iv) Aspect sentiment prediction evaluation

In [23]:
md2_accuracies, md2_micro_accuracy, md2_macro_accuracy = aspect_sentiment_accuracy(
    binary_relevance_absa['Aspects'],
    binary_relevance_absa['Detected aspects'],
    binary_relevance_absa['Sentiment'],
    binary_relevance_absa['Predicted sentiment'])

print(f"Micro accuracy:{md2_micro_accuracy:.3f}   Macro accuracy:{md2_macro_accuracy:.3f}")

Micro accuracy:0.833   Macro accuracy:0.733


In [24]:
(pd.DataFrame([md2_accuracies]).T).rename(columns={0:'Accuracy'})

| | Accuracy |
|---|---|
| price | 0.333333 |
| speed | 0.333333 |
| reliability | 1.0 |
| coverage | 1.0 |
| customer service | 1.0 |
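To keep the comparison in one place, the validation-set figures printed above can be collected side by side. The ranking rule here (aspect-extraction F-0.5 first, then micro sentiment accuracy as a tie-breaker) is a hypothetical choice for illustration; the notebook does not state its selection criterion, and the multilabel classifier and the test set are not yet included.

```python
# Validation-set figures reported by the cells above
results = {
    "Model 1: POS tagger + word similarity": {
        "Precision": 0.500, "Recall": 0.033, "F-0.5": 0.130,
        "Micro accuracy": 0.500, "Macro accuracy": 0.500,
    },
    "Model 2: Binary relevance": {
        "Precision": 0.585, "Recall": 0.393, "F-0.5": 0.533,
        "Micro accuracy": 0.833, "Macro accuracy": 0.733,
    },
}

# Hypothetical rule: rank by extraction F-0.5, break ties on micro sentiment accuracy
ranking = sorted(
    results,
    key=lambda m: (results[m]["F-0.5"], results[m]["Micro accuracy"]),
    reverse=True,
)
for name in ranking:
    print(name, results[name])
```

In the notebook, `pd.DataFrame(results).T` turns the same dictionary into a table with one row per model.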
