## Testing the accuracy of my self-developed functions for ABSA metrics

**`Goal:`** I only found out late in the project that sklearn and other packages could compute the F-score and accuracy metrics on my dataset. By then I had already written my own functions, adapted to the structure of my dataset, to compute these metrics. In this notebook, I test how accurately those functions compute the relevant metrics by comparing them against sklearn.

For this notebook, I only use the POS tagger for obtaining predictions to prevent unnecessary overhead. If the metrics align for this single model's predictions, they are likely to align for the other models (e.g. binary relevance, MLC classifier, etc.).

### 1. Load packages/libraries

In [4]:
import sys
sys.path.append("/Users/koredeakande/Documents/Capstone/ISP Project/Coding/nigerian_isp_sentiment_analysis/py_scripts")
import pandas as pd
import clean_tweets

#Load scipy and sklearn necessary for the multi-label classification computation
from scipy.sparse import lil_matrix
from sklearn.metrics import fbeta_score, accuracy_score

#Note: The module below was personally designed to compute the metrics given the dataset structure
from absa_metrics import weighted_binary_precision_recall_fscore, aspect_sentiment_accuracy

### 2. Load the data

In [5]:
eval_val_df = pd.read_csv("../data/model-evaluation/validation_dataset.csv")
eval_val_df.head()

Unnamed: 0,Text,price,speed,reliability,coverage,customer service,Aspects,Sentiment
0,officialkome_ spectranet_ng this people don fr...,0,0,0,0,0,[None],[None]
1,ayomikun_o_ yoruba_dev spectranet_ng i'm.yet t...,0,0,0,0,0,[None],[None]
2,spectranet !!!!! 🤬🤬🤬🤬🤬🤬,0,0,0,0,0,[None],[None]
3,"after buying data see airtel telling me ""now t...",0,0,0,0,0,[None],[None]
4,spectranet ooooo,0,0,0,0,0,[None],[None]


In [3]:
true_preds = ['price','speed'], ['reliability'], ['customer service','coverage']

model_preds = ['price'], ['coverage'], ['customer service','coverage']

In [15]:
y_true = [[1,2],[1,0],[]]
y_preds = [[1,3],[2,1],[1]]

In [16]:
def label_to_sm(labels, n_classes):
    sm = lil_matrix((len(labels), n_classes))
    for i, label in enumerate(labels):
        sm[i, label] = 1
    return sm

In [17]:
y_true_sm = label_to_sm(labels=y_true, n_classes=4)
y_true_sm.toarray()

array([[0., 1., 1., 0.],
       [1., 1., 0., 0.],
       [0., 0., 0., 0.]])

In [10]:
y_pred_sm = label_to_sm(labels=y_preds, n_classes=4)
y_pred_sm.toarray()

array([[0., 1., 0., 1.],
       [0., 1., 1., 0.],
       [0., 1., 0., 0.]])

In [19]:
fbeta_score(y_true_sm,y_pred_sm, average='macro', beta=0.5)

0.17857142857142855
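
As a sanity check on the value above, the macro F-0.5 can be recomputed by hand from the two indicator matrices. The sketch below is plain Python and assumes the standard count-based definition F_β = (1+β²)·TP / ((1+β²)·TP + β²·FN + FP), with a class scored 0 when it has no true positives. The helper name `fbeta_per_class` is mine, not part of sklearn or `absa_metrics`.

```python
# Indicator matrices copied from the toy example above
# (rows = samples, columns = classes 0..3)
y_true_rows = [[0, 1, 1, 0],
               [1, 1, 0, 0],
               [0, 0, 0, 0]]
y_pred_rows = [[0, 1, 0, 1],
               [0, 1, 1, 0],
               [0, 1, 0, 0]]

def fbeta_per_class(true_rows, pred_rows, beta):
    """Return one F-beta score per class (column). Hypothetical helper for illustration."""
    scores = []
    for c in range(len(true_rows[0])):
        pairs = [(t[c], p[c]) for t, p in zip(true_rows, pred_rows)]
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fp = sum(t == 0 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        denom = (1 + beta**2) * tp + beta**2 * fn + fp
        # A class with an empty denominator (never true, never predicted) scores 0
        scores.append((1 + beta**2) * tp / denom if denom else 0.0)
    return scores

per_class = fbeta_per_class(y_true_rows, y_pred_rows, beta=0.5)
macro_f05 = sum(per_class) / len(per_class)  # macro = unweighted mean over classes
print(per_class)
print(round(macro_f05, 4))  # 0.1786
```

Only class 1 has any true positives (TP=2, FP=1, FN=0), so its score of 2.5/3.5 is averaged with three zeros.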

In [33]:
true_aspects = eval_val_df.Aspects.apply(eval).to_list()

---

### 3. Load binary relevance model

In [6]:
sys.path.append("../models/full_absa_models")
import binary_relevance_model

2022-04-01 19:45:37.887995: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some layers from the model checkpoint at absa/classifier-rest-0.2 were not used when initializing BertABSClassifier: ['dropout_379']
- This IS expected if you are initializing BertABSClassifier from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertAB

#### (i) Perform slight cleaning of the tweets

In [7]:
#Clean the validation set tweets
model_1_cleaned_val = clean_tweets.run_cleaner(eval_val_df,'Text',no_punc=True,
                                               no_emoji=True, no_isp_name=True)

model_1_cleaned_val.head()

Unnamed: 0,Text,price,speed,reliability,coverage,customer service,Aspects,Sentiment
0,officialkome spectranetng this people don frus...,0,0,0,0,0,[None],[None]
1,ayomikuno yorubadev spectranetng imyet to turn...,0,0,0,0,0,[None],[None]
2,spectranet,0,0,0,0,0,[None],[None]
3,after buying data see airtel telling me now th...,0,0,0,0,0,[None],[None]
4,spectranet ooooo,0,0,0,0,0,[None],[None]


#### (ii) Run ABSA model on the tweets
*Takes a couple of seconds to run*

In [8]:
binary_relevance_absa = binary_relevance_model.run(eval_val_df, 'Text')

#### (iii) Merge the true annotations into the model's predictions

In [9]:
binary_relevance_absa[['Aspects', 'Sentiment']] = model_1_cleaned_val[['Aspects', 'Sentiment']]
binary_relevance_absa.head()

Unnamed: 0,Text,Detected aspects,Predicted sentiment,Aspects,Sentiment
0,officialkome spectranetng this people don frus...,[None],[None],[None],[None]
1,ayomikuno yorubadev spectranetng imyet to turn...,[None],[None],[None],[None]
2,spectranet,[None],[None],[None],[None]
3,after buying data see airtel telling me now th...,[None],[None],[None],[None]
4,spectranet ooooo,[None],[None],[None],[None]


#### (iv) Aspect extraction evaluation

In [11]:
#Calculate precision, recall and f-0.5
md1_class_metrics, md1_precision, md1_recall, md1_fscore = weighted_binary_precision_recall_fscore(
    binary_relevance_absa['Aspects'],
    binary_relevance_absa['Detected aspects'], 
    beta = 0.5)

print(f"Precision: {md1_precision:.3f}  Recall: {md1_recall:.3f} F-0.5: {md1_fscore:.3f}")

Precision: 0.828  Recall: 0.770 F-0.5: 0.812


In [12]:
md1_df = pd.DataFrame(md1_class_metrics)
(md1_df.T).iloc[:,-4:]

Unnamed: 0,Support,Precision,Recall,F-0.5
price,12.0,0.818182,0.75,0.803571
speed,13.0,0.846154,0.846154,0.846154
reliability,11.0,0.769231,0.909091,0.793651
coverage,9.0,0.833333,0.555556,0.757576
customer service,16.0,0.857143,0.75,0.833333


#### (v) Aspect sentiment prediction evaluation

In [13]:
md1_accuracies,md1_micro_accuracy,md1_macro_accuracy, md1_extraction_support = aspect_sentiment_accuracy(binary_relevance_absa['Aspects'],
                                                                          binary_relevance_absa['Detected aspects'],
                                                                          binary_relevance_absa['Sentiment'],
                                                                          binary_relevance_absa['Predicted sentiment'])

print(f" Correct extractions:{md1_extraction_support} \n Micro accuracy:{md1_micro_accuracy:.3f}   Macro accuracy:{md1_macro_accuracy:.3f}")

 Correct extractions:{'price': 9, 'speed': 11, 'reliability': 10, 'coverage': 5, 'customer service': 12} 
 Micro accuracy:0.766   Macro accuracy:0.772


In [14]:
(pd.DataFrame([md1_accuracies]).T).rename(columns={0:'Accuracy'})

Unnamed: 0,Accuracy
price,0.222222
speed,0.636364
reliability,1.0
coverage,1.0
customer service,1.0
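
The micro and macro figures printed earlier can be reproduced from this table and the extraction counts. The sketch below assumes that macro accuracy is the unweighted mean of the per-aspect accuracies and that micro accuracy pools all correctly extracted aspects before dividing. The per-aspect counts of correct sentiment predictions are inferred here as accuracy × support; they are not read from the `absa_metrics` module.

```python
# Correctly extracted aspects per class (from the printed output above)
extraction_support = {'price': 9, 'speed': 11, 'reliability': 10,
                      'coverage': 5, 'customer service': 12}

# Per-aspect sentiment accuracy (rounded values from the table above)
accuracies = {'price': 0.222222, 'speed': 0.636364, 'reliability': 1.0,
              'coverage': 1.0, 'customer service': 1.0}

# Inferred number of correct sentiment predictions (assumption: accuracy * support)
correct = {a: round(accuracies[a] * extraction_support[a]) for a in accuracies}

# Macro: mean of per-aspect accuracies; micro: pooled correct / pooled extractions
macro_accuracy = sum(accuracies.values()) / len(accuracies)
micro_accuracy = sum(correct.values()) / sum(extraction_support.values())

print(correct)
print(f"Micro accuracy:{micro_accuracy:.3f}   Macro accuracy:{macro_accuracy:.3f}")
```

Under these assumptions the pooled figure is 36/47, which explains why the micro accuracy sits slightly below the macro accuracy: the low-accuracy `price` and `speed` aspects carry more weight when pooled by count.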


---

### 4. Sklearn evaluation

#### a. Define function to encode the aspects contained in a list of lists

In [15]:
def encode_detected_aspects(multi_label_aspects):
    
    """
    Function to encode a list of lists representing the detected aspects as integers
    """
    
    final_list = []

    aspect_map = {'price':0,'speed':1,'reliability':2,'coverage':3, 'customer service':4}

    for aspect_list in multi_label_aspects:

        #If the entry is [None] (no aspect), encode it as an empty list
        if aspect_list == [None]:
            final_list.append([])

        #If just a single aspect
        elif len(aspect_list) == 1:

            #Encode and add to list as a list
            final_list.append([aspect_map[aspect_list[0]]])

        #If more than one aspect
        else:

            #List to store the encoding of all the aspects
            encoded_list = []

            #Iterate through each aspect
            for aspect in aspect_list:

                #Encode and add to encoding list
                encoded_list.append(aspect_map[aspect])

            final_list.append(encoded_list)
    
    return final_list
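
The three branches above all perform the same dictionary lookup, so the encoding can be written more compactly. This is a sketch of an equivalent function for the inputs used here (the names `ASPECT_MAP` and `encode_aspects_compact` are hypothetical), assuming every non-`None` entry is one of the five known aspects:

```python
ASPECT_MAP = {'price': 0, 'speed': 1, 'reliability': 2,
              'coverage': 3, 'customer service': 4}

def encode_aspects_compact(multi_label_aspects):
    """Encode each list of aspect names as a list of integers; [None] becomes []."""
    return [[ASPECT_MAP[a] for a in aspects if a is not None]
            for aspects in multi_label_aspects]

encoded = encode_aspects_compact([['price', 'speed'], [None], ['customer service']])
print(encoded)  # [[0, 1], [], [4]]
```

An unknown aspect name still raises a `KeyError`, as in the original function, which is useful for catching typos in the annotations.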

#### b. Define function to convert list of lists to sparse matrix for evaluation using sklearn

In [16]:
def label_to_sm(labels, n_classes):
    sm = lil_matrix((len(labels), n_classes))
    for i, label in enumerate(labels):
        sm[i, label] = 1
    return sm

#### c. Encode true aspects and predicted aspects

**True aspects**

In [17]:
y_true = eval_val_df.Aspects.apply(eval).to_list()
encoded_y_true = encode_detected_aspects(y_true)
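
Since `.apply(eval)` implies that the `Aspects` column stores Python list literals as strings, `ast.literal_eval` from the standard library would be a safer substitute for `eval`: it parses literals only and raises an error on anything else. A small sketch with made-up strings in the same shape as the column:

```python
import ast

# Example strings shaped like the CSV's Aspects column (illustrative values)
raw_aspects = ["[None]", "['price', 'speed']", "['customer service']"]

parsed = [ast.literal_eval(s) for s in raw_aspects]
print(parsed)  # [[None], ['price', 'speed'], ['customer service']]

# Arbitrary expressions are rejected rather than executed
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("rejected:", rejected)
```

In the notebook this would amount to `eval_val_df.Aspects.apply(ast.literal_eval).to_list()`.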

**Predicted aspects**

In [19]:
y_pred = binary_relevance_absa['Detected aspects'].to_list()
encoded_y_pred = encode_detected_aspects(y_pred)

#### d. Convert list of lists into sparse matrix

In [20]:
#True aspects
y_true_sm = label_to_sm(labels=encoded_y_true, n_classes=5)

#Predicted aspects
y_pred_sm = label_to_sm(labels=encoded_y_pred, n_classes=5)

#### e. COMPUTE F-0.5 SCORE WITH SKLEARN & COMPARE WITH SELF-DEVELOPED FUNCTIONS

**Weighted F-0.5 score**

In [24]:
round(md1_fscore,4) == round(fbeta_score(y_true_sm,y_pred_sm, average='weighted', beta=0.5),4)

True
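
The weighted average is the support-weighted mean of the class-level scores, so the agreement can also be checked by hand from the per-class table printed in section 3. A sketch using the rounded values from that table, assuming the `Support` column counts the true instances of each aspect:

```python
# (support, F-0.5) per aspect, copied from the class-level table in section 3
class_scores = {
    'price': (12, 0.803571),
    'speed': (13, 0.846154),
    'reliability': (11, 0.793651),
    'coverage': (9, 0.757576),
    'customer service': (16, 0.833333),
}

total_support = sum(support for support, _ in class_scores.values())
weighted_f05 = sum(support * f for support, f in class_scores.values()) / total_support
print(f"{weighted_f05:.3f}")  # 0.812
```

The result agrees with the F-0.5 of 0.812 reported by `weighted_binary_precision_recall_fscore` in section 3.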

**Class-level F-0.5 scores**

In [25]:
my_func_class_metrics = [md1_class_metrics[key]['F-0.5'] for key in md1_class_metrics.keys()]
sklearn_class_metrics = fbeta_score(y_true_sm,y_pred_sm, average=None, beta=0.5)

my_func_class_metrics == sklearn_class_metrics

array([ True,  True,  True,  True,  True])
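
The element-wise `==` above compares floats exactly. It succeeds here, but it can fail when two implementations differ only in the last bits. A tolerance-based comparison is more robust; a sketch with `math.isclose` on illustrative values:

```python
import math

# Illustrative values: mathematically equal, but not bit-identical as floats
mine = [0.1 + 0.2, 0.75]
reference = [0.3, 0.75]

exact = [a == b for a, b in zip(mine, reference)]
close = [math.isclose(a, b, rel_tol=1e-9) for a, b in zip(mine, reference)]

print(exact)  # [False, True]
print(close)  # [True, True]
```

With NumPy arrays, `np.allclose(my_func_class_metrics, sklearn_class_metrics)` expresses the same check in one call.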

#### f. COMPUTE ACCURACIES WITH SKLEARN & COMPARE WITH SELF-DEVELOPED FUNCTIONS
`METRIC RESULTS WILL DIFFER BELOW`

For multi-label input, sklearn's `accuracy_score` computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in `y_true`. This is slightly different from what I designed. In my case, the labels predicted for a sample need not exactly match the corresponding set of labels in `y_true`. As long as an aspect is correctly detected (i.e. it is in both the predicted labels and the true labels), it contributes to the accuracy score.
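
To make the difference concrete, the sketch below computes both notions on a toy example in plain Python. The second measure (the number of true aspects that were also predicted) is only a simplified stand-in for the idea described above, not the actual `aspect_sentiment_accuracy` implementation.

```python
# Toy labels (illustrative): one set of aspects per tweet
toy_true = [{'price', 'speed'}, {'reliability'}, set()]
toy_pred = [{'price'}, {'reliability', 'coverage'}, set()]

# Subset accuracy: a sample counts only if its predicted set equals the true set
subset_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)

# Aspect-level view: each aspect present in both sets counts as correctly detected
detected = sum(len(t & p) for t, p in zip(toy_true, toy_pred))
total_true = sum(len(t) for t in toy_true)

print(f"Subset accuracy: {subset_accuracy:.3f}")               # 0.333
print(f"Correctly detected aspects: {detected}/{total_true}")  # 2/3
```

Only the third tweet matches exactly, so subset accuracy is 1/3, even though two of the three annotated aspects were detected. This is why the figures below are not expected to agree.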

In [28]:
sklearn_accuracy = accuracy_score(y_true_sm,y_pred_sm)
sklearn_accuracy

0.7857142857142857

In [34]:
md1_macro_accuracy

0.7717171717171717

In [35]:
md1_micro_accuracy

0.7659574468085106