This notebook compares several pre-trained text-classification (sentiment-analysis) models across a range of evaluation metrics. *To rank the models, we use TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution).*

### Importing Libraries

In [2]:
from transformers import pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, hamming_loss, log_loss
import pandas as pd
import numpy as np

### Loading Hugging Face models for text classification


In [3]:
model_names = [
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "siebert/sentiment-roberta-large-english"
]

models = []

for model_name in model_names:
    model = pipeline(model=model_name)
    models.append(model)

Device set to use cpu


Device set to use cpu


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


Device set to use cpu
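The four checkpoints do not share a label vocabulary: some return binary `POSITIVE`/`NEGATIVE` labels, while others are three-class models that can also return `neutral`. The evaluation loop below lowercases the predicted label and treats anything other than `positive` as class 0, so a `neutral` prediction counts as negative. A minimal sketch of that mapping as a standalone helper (the name `to_binary` is hypothetical, not part of `transformers`), applied to outputs in the shape a text-classification pipeline returns:

```python
def to_binary(prediction: dict) -> int:
    """Map one text-classification pipeline output to 1 (positive) or 0.

    `prediction` is assumed to look like {"label": "...", "score": ...}.
    Any label other than "positive" (case-insensitive), including
    "neutral", is mapped to 0, mirroring the notebook's evaluation loop.
    """
    return 1 if prediction["label"].lower() == "positive" else 0


# Illustrative outputs (made-up scores, not real model predictions)
outputs = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "negative", "score": 0.80},
    {"label": "neutral", "score": 0.55},
]
labels = [to_binary(p) for p in outputs]
print(labels)  # [1, 0, 0]
```

Treating `neutral` as negative is a design choice of the notebook; a three-class model that predicts `neutral` often will show low recall under this mapping.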


### Loading evaluation datasets for four text domains

In [4]:
education = pd.read_csv("/kaggle/input/sentiment-analysis-evaluation-dataset/Education.csv")
sports = pd.read_csv("/kaggle/input/sentiment-analysis-evaluation-dataset/Sports.csv")
politics = pd.read_csv("/kaggle/input/sentiment-analysis-evaluation-dataset/Politics.csv")
finance = pd.read_csv("/kaggle/input/sentiment-analysis-evaluation-dataset/Finance.csv")

In [5]:
education.head()

|   | Text | Label |
|---|------|-------|
| 0 | The impact of educational reforms remains unce... | positive |
| 1 | Critics argue that recent improvements in the ... | negative |
| 2 | Innovative teaching methods have led to unexpe... | positive |
| 3 | Despite budget constraints, the school has man... | positive |
| 4 | The true effectiveness of online learning plat... | negative |


In [6]:
sports.head()

|   | Text | Label |
|---|------|-------|
| 0 | The team's recent victories have raised suspic... | positive |
| 1 | Despite their recent loss, the team's morale r... | positive |
| 2 | Rumors of match-fixing have cast a shadow over... | negative |
| 3 | The unexpected resignation of the coach has le... | negative |
| 4 | Speculations about doping allegations have led... | negative |


In [7]:
finance.head()

|   | Text | Label |
|---|------|-------|
| 0 | The financial markets are influenced by a myri... | positive |
| 1 | Financial literacy is essential for making inf... | positive |
| 2 | The stock market can be volatile, with prices ... | positive |
| 3 | Financial regulations aim to protect investors... | positive |
| 4 | Access to credit and capital is essential for ... | positive |


In [8]:
politics.head()

|   | Text | Label |
|---|------|-------|
| 0 | The government's recent policies have received... | positive |
| 1 | Political analysts are divided on the long-ter... | negative |
| 2 | Efforts to promote unity among political facti... | positive |
| 3 | Despite allegations of corruption, the governm... | negative |
| 4 | The recent diplomatic initiatives have been me... | positive |


In [9]:
dset = [education,sports,politics,finance]

### Creating Metric Dataframes

In [10]:
# Column layout shared by all four per-domain result tables (one row per model)
metric_columns = ['Model', 'Accuracy', 'Precision', 'Recall', 'F1_Score', 'Hamming_Loss', 'Log_Loss']

edu_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])
sports_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])
pol_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])
fin_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])

### Calculating various performance metrics of all models

In [11]:
# Result tables in the same order as `dset`
results = [edu_result, sports_result, pol_result, fin_result]

for genre, result in zip(dset, results):

    # True labels: 1 for "positive", 0 otherwise
    model_actual = (genre['Label'] == "positive").astype(int)

    for i, model in enumerate(models):

        # Predicted labels: the pipeline accepts a list of texts; any label
        # other than "positive" (e.g. "negative" or "neutral") counts as 0.
        # (The loop variable is no longer named `str`, which shadowed the builtin.)
        predictions = model(genre['Text'].tolist())
        model_pred = [1 if p['label'].lower() == "positive" else 0 for p in predictions]

        # Metrics
        accuracy = accuracy_score(model_actual, model_pred)
        precision = precision_score(model_actual, model_pred)
        recall = recall_score(model_actual, model_pred)
        f1 = f1_score(model_actual, model_pred)
        hamming = hamming_loss(model_actual, model_pred)
        # model_pred holds hard 0/1 labels, not probabilities
        ll = log_loss(model_actual, model_pred)

        # Each result DataFrame is filled in place, one row per model
        result.loc[i] = [f"Model {i+1}", accuracy, precision, recall, f1, hamming, ll]
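Because `model_pred` holds hard 0/1 labels rather than probabilities, `log_loss` adds no information beyond `hamming_loss`. scikit-learn clips predicted probabilities to [ε, 1 − ε], so each correct prediction costs about 0 and each wrong one costs −ln ε. The Log_Loss values in the tables below are consistent with ε being the float64 machine epsilon (−ln ε ≈ 36.04); older scikit-learn versions used a fixed `eps=1e-15` instead, which is an assumption worth checking against your installed version. Likewise, for binary labels Accuracy equals 1 − Hamming_Loss. A pure-Python sketch of clipped binary log loss (the helpers `binary_log_loss` and `hamming` are hypothetical, not scikit-learn functions), using a toy label vector with the same hamming loss as Education / Model 1:

```python
import math
import sys


def binary_log_loss(y_true, y_prob, eps=sys.float_info.epsilon):
    """Mean binary cross-entropy with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def hamming(y_true, y_pred):
    """Fraction of positions where the two label lists differ."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (illustrative only): 22 of 52 hard predictions are wrong
y_true = [1] * 52
y_pred = [0] * 22 + [1] * 30

h = hamming(y_true, y_pred)
ll = binary_log_loss(y_true, y_pred)
print(round(h, 6), round(ll, 6))  # 0.423077 15.249238
print(round(ll / h, 4))           # 36.0437, i.e. -ln(eps)
```

The ratio matches the Education / Model 1 row (15.249238 / 0.423077), which is why the Hamming_Loss and Log_Loss columns become identical after TOPSIS normalization.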

In [12]:
edu_result

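|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss |
|---|-------|----------|-----------|--------|----------|--------------|----------|
| 0 | Model 1 | 0.576923 | 0.611111 | 0.423077 | 0.5 | 0.423077 | 15.249238 |
| 1 | Model 2 | 0.634615 | 0.6 | 0.807692 | 0.688525 | 0.365385 | 13.169796 |
| 2 | Model 3 | 0.634615 | 0.888889 | 0.307692 | 0.457143 | 0.365385 | 13.169796 |
| 3 | Model 4 | 0.673077 | 0.714286 | 0.576923 | 0.638298 | 0.326923 | 11.783502 |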


In [13]:
fin_result

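|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss |
|---|-------|----------|-----------|--------|----------|--------------|----------|
| 0 | Model 1 | 0.8125 | 0.903226 | 0.823529 | 0.861538 | 0.1875 | 6.758185 |
| 1 | Model 2 | 0.770833 | 0.870968 | 0.794118 | 0.830769 | 0.229167 | 8.260004 |
| 2 | Model 3 | 0.416667 | 1.0 | 0.176471 | 0.3 | 0.583333 | 21.025464 |
| 3 | Model 4 | 0.895833 | 0.914286 | 0.941176 | 0.927536 | 0.104167 | 3.754547 |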


In [14]:
pol_result

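|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss |
|---|-------|----------|-----------|--------|----------|--------------|----------|
| 0 | Model 1 | 0.849057 | 0.947368 | 0.72 | 0.818182 | 0.150943 | 5.440551 |
| 1 | Model 2 | 0.754717 | 0.772727 | 0.68 | 0.723404 | 0.245283 | 8.840896 |
| 2 | Model 3 | 0.54717 | 1.0 | 0.04 | 0.076923 | 0.45283 | 16.321654 |
| 3 | Model 4 | 0.924528 | 0.92 | 0.92 | 0.92 | 0.075472 | 2.720276 |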


In [15]:
sports_result

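|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss |
|---|-------|----------|-----------|--------|----------|--------------|----------|
| 0 | Model 1 | 0.839286 | 0.827586 | 0.857143 | 0.842105 | 0.160714 | 5.79273 |
| 1 | Model 2 | 0.803571 | 0.742857 | 0.928571 | 0.825397 | 0.196429 | 7.080003 |
| 2 | Model 3 | 0.892857 | 1.0 | 0.785714 | 0.88 | 0.107143 | 3.86182 |
| 3 | Model 4 | 0.910714 | 0.896552 | 0.928571 | 0.912281 | 0.089286 | 3.218183 |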


## TOPSIS

Function for vector-normalizing each criterion (metric column)

In [16]:
def normalize(df):
    divisor = df.apply(lambda x: x**2).apply(sum).apply(lambda x: x**0.5)
    df = df.div(divisor)
    return df
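`normalize` implements the vector-normalization step of TOPSIS. Writing $x_{ij}$ for the value of criterion $j$ (a metric column) for model $i$ (a row), with $m$ models in total, each entry is divided by the Euclidean norm of its column:

$$
r_{ij} = \frac{x_{ij}}{\sqrt{\sum_{k=1}^{m} x_{kj}^{2}}}
$$

For example, a column holding the values 3 and 4 has norm 5, so it normalizes to 0.6 and 0.8.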

Function for applying the criterion weights to the normalized values

In [17]:
def weight_normalized(df, weights):
    df = df.mul(weights)
    return df

Function for finding the ideal best and ideal worst value of each criterion

In [18]:
def best_worst(df, impacts):
    best=[]
    worst=[]
    for i in range(len(impacts)):
        if impacts[i]=='+':
            best.append(max(df.iloc[:,i]))
            worst.append(min(df.iloc[:,i]))
        else:
            best.append(min(df.iloc[:,i]))
            worst.append(max(df.iloc[:,i]))
    return (best,worst)
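`best_worst` operates on the weighted normalized values $v_{ij} = w_j\, r_{ij}$, where $r_{ij}$ is the output of `normalize` and $w_j$ is the weight of criterion $j$ (this multiplication is what `weight_normalized` does). For a benefit criterion (impact `"+"`) the ideal best is the column maximum and the ideal worst the column minimum; for a cost criterion (impact `"-"`) the two are swapped:

$$
v_j^{+} = \begin{cases} \max_i v_{ij} & \text{impact } + \\ \min_i v_{ij} & \text{impact } - \end{cases}
\qquad
v_j^{-} = \begin{cases} \min_i v_{ij} & \text{impact } + \\ \max_i v_{ij} & \text{impact } - \end{cases}
$$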

Function for calculating the TOPSIS score of every model

In [19]:
def calc_performance(df, best, worst):
    s_best=[]
    s_worst=[]
    for i in range(len(df)):
        s_best.append((sum((df.loc[i] - best)**2))**0.5)
        s_worst.append((sum((df.loc[i] - worst)**2))**0.5)
    s_total = [i+j for i,j in zip(s_worst,s_best)]
    performance = [i/j for i,j in zip(s_worst,s_total)]
    df.loc[:,'Topsis Score'] = performance
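`calc_performance` computes each model's Euclidean distance to the ideal best ($S_i^{+}$, `s_best`) and to the ideal worst ($S_i^{-}$, `s_worst`) over the $n$ criteria, then the relative closeness $P_i$ that is stored as `Topsis Score`. Here $v_{ij}$ are the weighted normalized values and $v_j^{+}$, $v_j^{-}$ are the ideal best and worst values returned by `best_worst`:

$$
S_i^{+} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{+}\right)^{2}}, \qquad
S_i^{-} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{-}\right)^{2}}, \qquad
P_i = \frac{S_i^{-}}{S_i^{+} + S_i^{-}}
$$

$P_i$ lies between 0 and 1: a model sitting exactly on the ideal best scores 1, and one sitting on the ideal worst scores 0.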

Function for ranking the models within a domain (rank 1 = highest TOPSIS score)

In [20]:
def rank(df):
    sorted_array = df.loc[:,'Topsis Score'].argsort()
    ranks = np.empty_like(sorted_array)
    ranks[sorted_array] = np.arange(len(sorted_array))
    n=len(sorted_array)
    ranks = [n-i for i in ranks]
    df.loc[:,'Rank'] = ranks
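`rank` gives rank 1 to the highest TOPSIS score by reversing an ascending `argsort`. The same ordering can be sketched in pure Python with a hypothetical helper, `rank_descending`, which breaks ties by original position. In pandas, `Series.rank(ascending=False, method="min")` is an alternative that gives tied scores the same rank, whereas the `argsort` approach always assigns distinct ranks.

```python
def rank_descending(scores):
    """Return 1-based ranks where the largest score gets rank 1.

    Ties are broken by original position (the earlier item ranks higher),
    because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Education TOPSIS scores reported in this notebook (Models 1-4)
print(rank_descending([0.272631, 0.689287, 0.313649, 0.547393]))  # [4, 1, 3, 2]
```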

Function that runs all the TOPSIS steps on a results table

In [21]:
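def topsis(results, weights, impacts):
    # Drop the 'Model' name column; keep only the numeric criteria
    df = results.iloc[:, 1:]

    df = normalize(df)
    df = weight_normalized(df, weights)

    (best, worst) = best_worst(df, impacts)
    calc_performance(df, best, worst)  # adds the 'Topsis Score' column
    rank(df)                           # adds the 'Rank' column
    return df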
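The same pipeline (normalize, weight, ideal best/worst, closeness) can be written without pandas. A minimal pure-Python sketch, assuming a plain list-of-rows decision matrix with no all-zero criterion column; `topsis_scores` is a hypothetical name, and unlike `calc_performance` it returns the scores rather than adding a column in place:

```python
import math


def topsis_scores(matrix, weights, impacts):
    """Return TOPSIS closeness scores, one per row of `matrix`.

    matrix  : list of rows (alternatives), each a list of criterion values
    weights : one weight per criterion
    impacts : one "+" (benefit) or "-" (cost) per criterion
    """
    n_cols = len(weights)
    # 1. Vector-normalize each column, then apply the weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]

    # 2. Ideal best and ideal worst value of each criterion
    cols = list(zip(*v))
    best = [max(c) if imp == "+" else min(c) for c, imp in zip(cols, impacts)]
    worst = [min(c) if imp == "+" else max(c) for c, imp in zip(cols, impacts)]

    # 3. Euclidean distance to both ideals, then relative closeness
    scores = []
    for row in v:
        s_best = math.dist(row, best)
        s_worst = math.dist(row, worst)
        scores.append(s_worst / (s_best + s_worst))
    return scores


# Toy decision matrix (illustrative): criterion 1 is a benefit, criterion 2 a cost
matrix = [[3, 0], [4, 3], [0, 4]]
scores = topsis_scores(matrix, weights=[1, 1], impacts=["+", "-"])
print([round(s, 4) for s in scores])  # [0.8333, 0.5788, 0.0]
```

The third row equals the ideal worst on both criteria, so its score is exactly 0.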

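Specify the weights and impacts, then compute the TOPSIS score and rank of every model in each domain. In `impacts`, `"+"` marks a criterion where higher is better and `"-"` one where lower is better. Note that `Log_Loss` is marked `"+"` even though a lower log loss is better; the scores and ranks below were computed with this setting. In the resulting tables, the metric columns hold vector-normalized, weighted values rather than the raw metrics.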

In [22]:
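weights = [1, 1, 1, 1, 1, 1]  # equal weight for every criterion
# Criteria order: Accuracy, Precision, Recall, F1_Score, Hamming_Loss, Log_Loss
impacts = ["+", "+", "+", "+", "-", "+"]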
result1 = topsis(edu_result, weights, impacts)
result2 = topsis(sports_result, weights, impacts)
result3 = topsis(pol_result, weights, impacts)
result4 = topsis(fin_result, weights, impacts)
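Since `impacts` is matched to the metric columns purely by position, one option is to derive it from a per-metric direction table. A sketch using a hypothetical lookup, `LOWER_IS_BETTER`, that treats both loss columns as costs (unlike the `impacts` list above, which marks `Log_Loss` as `"+"`):

```python
# Criteria in the order they appear in the result tables (after dropping 'Model')
criteria = ["Accuracy", "Precision", "Recall", "F1_Score", "Hamming_Loss", "Log_Loss"]

# Hypothetical lookup: metrics for which a lower value is better
LOWER_IS_BETTER = {"Hamming_Loss", "Log_Loss"}

impacts_by_name = ["-" if name in LOWER_IS_BETTER else "+" for name in criteria]
print(impacts_by_name)  # ['+', '+', '+', '+', '-', '-']
```

With the notebook's DataFrames, `criteria` could be taken from `edu_result.columns[1:]` instead of being typed out. Using these impacts would change the TOPSIS scores, so the tables below would need to be recomputed.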

In [23]:
result1.insert(0,"Model",edu_result['Model'])
result2.insert(0,"Model",sports_result['Model'])
result3.insert(0,"Model",pol_result['Model'])
result4.insert(0,"Model",fin_result['Model'])

## Education

In [24]:
result1

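|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss | Topsis Score | Rank |
|---|-------|----------|-----------|--------|----------|--------------|----------|--------------|------|
| 0 | Model 1 | 0.457336 | 0.428517 | 0.377075 | 0.431859 | 0.568987 | 0.568987 | 0.272631 | 4 |
| 1 | Model 2 | 0.50307 | 0.420725 | 0.719871 | 0.59469 | 0.491398 | 0.491398 | 0.689287 | 1 |
| 2 | Model 3 | 0.50307 | 0.623297 | 0.274236 | 0.394842 | 0.491398 | 0.491398 | 0.313649 | 3 |
| 3 | Model 4 | 0.533559 | 0.500864 | 0.514193 | 0.551309 | 0.439672 | 0.439672 | 0.547393 | 2 |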


Model 2 (**lxyuan/distilbert-base-multilingual-cased-sentiments-student**) has the highest TOPSIS score for education-related text.

## Sports


In [25]:
result2

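|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss | Topsis Score | Rank |
|---|-------|----------|-----------|--------|----------|--------------|----------|--------------|------|
| 0 | Model 1 | 0.486453 | 0.474611 | 0.488678 | 0.486427 | 0.554964 | 0.554964 | 0.485037 | 3 |
| 1 | Model 2 | 0.465753 | 0.42602 | 0.529401 | 0.476775 | 0.678289 | 0.678289 | 0.482566 | 4 |
| 2 | Model 3 | 0.517503 | 0.573488 | 0.447955 | 0.508316 | 0.369976 | 0.369976 | 0.519938 | 1 |
| 3 | Model 4 | 0.527853 | 0.514162 | 0.529401 | 0.526962 | 0.308313 | 0.308313 | 0.514487 | 2 |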


Model 3 (**cardiffnlp/twitter-roberta-base-sentiment-latest**) has the highest TOPSIS score for sports-related text, narrowly ahead of Model 4 (0.519938 vs. 0.514487).

## Politics
In [26]:
result3

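|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss | Topsis Score | Rank |
|---|-------|----------|-----------|--------|----------|--------------|----------|--------------|------|
| 0 | Model 1 | 0.543036 | 0.518298 | 0.532414 | 0.572134 | 0.278524 | 0.278524 | 0.612328 | 2 |
| 1 | Model 2 | 0.482699 | 0.422753 | 0.502835 | 0.505858 | 0.452602 | 0.452602 | 0.594085 | 3 |
| 2 | Model 3 | 0.349957 | 0.547093 | 0.029579 | 0.05379 | 0.835573 | 0.835573 | 0.381581 | 4 |
| 3 | Model 4 | 0.591306 | 0.503325 | 0.680307 | 0.643333 | 0.139262 | 0.139262 | 0.622231 | 1 |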


Model 4 (**siebert/sentiment-roberta-large-english**) has the highest TOPSIS score for politics-related text.

## Finance
In [27]:
result4

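|   | Model | Accuracy | Precision | Recall | F1_Score | Hamming_Loss | Log_Loss | Topsis Score | Rank |
|---|-------|----------|-----------|--------|----------|--------------|----------|--------------|------|
| 0 | Model 1 | 0.544033 | 0.489101 | 0.551999 | 0.55813 | 0.283052 | 0.283052 | 0.584253 | 2 |
| 1 | Model 2 | 0.516134 | 0.471633 | 0.532285 | 0.538197 | 0.345953 | 0.345953 | 0.581025 | 3 |
| 2 | Model 3 | 0.278991 | 0.541505 | 0.118285 | 0.194349 | 0.880608 | 0.880608 | 0.414452 | 4 |
| 3 | Model 4 | 0.599831 | 0.49509 | 0.630856 | 0.600885 | 0.157251 | 0.157251 | 0.58624 | 1 |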


Model 4 (**siebert/sentiment-roberta-large-english**) has the highest TOPSIS score for finance-related text, narrowly ahead of Model 1 (0.58624 vs. 0.584253).

*Summary of the top-ranked model (TOPSIS rank 1) in each domain:*

| Domain | Best Model | Model Name |
|-----------------|-----------------|-----------------|
| Education    | Model 2    | lxyuan/distilbert-base-multilingual-cased-sentiments-student    |
| Sports    | Model 3    | cardiffnlp/twitter-roberta-base-sentiment-latest   |
| Politics    | Model 4    | siebert/sentiment-roberta-large-english    |
| Finance    | Model 4    | siebert/sentiment-roberta-large-english    |
