In this notebook I train two classifiers: one to identify hope and one to identify nostalgia.

# Package Import

In [None]:
# General Packages
import pandas as pd
import numpy as np
import re # for text-cleaning
from google.colab import data_table
data_table.enable_dataframe_formatter() # to have tables which enable reading the text in full

# Hugging Face Hub (uncomment to log in and upload a model)
#!git config --global credential.helper store
#!sudo apt-get install git-lfs
#!huggingface-cli login

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score, precision_recall_fscore_support, accuracy_score, classification_report

# set random seed for reproducibility
SEED_GLOBAL = 1984
np.random.seed(SEED_GLOBAL)

# Transformer Packages (Laurer, 2023)
!pip install datasets
!pip install transformers==4.40.0 # pinned: in Colab the Trainer raised an error unless I installed this (at the time most recent) release
!pip install accelerate -U
import datasets
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline, TrainingArguments, Trainer, logging
import torch
device = "cuda:0" if torch.cuda.is_available() else "cpu"  # use GPU (cuda) if available, otherwise use CPU




# The Data

### Dataset Import
The Polnos datasets of nostalgia expressions are open source and can be downloaded from the [Harvard Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/L198GI). The PolyHope dataset may be used or redistributed only for non-commercial or academic research purposes. It can be downloaded from the [HOPE at IberLEF 2024](https://codalab.lisn.upsaclay.fr/competitions/17714#participate-get_starting_kit) competition.

To simplify handling, I uploaded them to my GitHub repository and load them from there.

In [None]:
#https://www.geeksforgeeks.org/how-to-upload-folders-to-google-colab/
!git clone https://github.com/BeJa1996/political_hope_nostalgia/
!unzip political_hope_nostalgia/Training_Datasets.zip

Cloning into 'political_hope_nostalgia'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 23 (delta 5), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (23/23), 4.49 MiB | 7.35 MiB/s, done.
Resolving deltas: 100% (5/5), done.
Archive:  political_hope_nostalgia/Training_Datasets.zip
   creating: Training_Datasets/
  inflating: Training_Datasets/data_polnos_handcoding.csv  
  inflating: Training_Datasets/Task 2_Test_with_labels_English_PolyHope.csv  
  inflating: Training_Datasets/data_polnos_handcoding_validation.csv  



## Nostalgia Dataset

Müller and Proksch (2023a) created two datasets that I can use to train and validate my classifiers: '**data_polnos_handcoding**' and '**data_polnos_handcoding_validation**'. Both come from Müller and Proksch (2023b). Polnos Handcoding contains 1,200 sentences that four coders labelled according to whether they contain nostalgia. Handcoding Validation contains 3,515 sentences that one of the authors' annotation methods classified as nostalgic and that two coders then validated manually.

Because the first dataset contains only 219 sentences coded as nostalgic by at least two human coders, I add the sentences from the second dataset that both coders labelled as nostalgic (246 sentences). This also reduces the class imbalance.

### Dataset Import

In [None]:
nost_handcoding = pd.read_csv('/content/Training_Datasets/data_polnos_handcoding.csv')
nost_validation = pd.read_csv('/content/Training_Datasets/data_polnos_handcoding_validation.csv')

### Dataset Overview

I want to first inspect the structure and content of the datasets.

#### Handcoding

In [None]:
nost_handcoding.shape

(1200, 29)

In [None]:
nost_handcoding.dtypes

doc_id                          object
countryname                     object
party                            int64
manifesto_id                    object
text                            object
cmp_code                        object
nostalgic_at_least_1             int64
nostalgic_at_least_2             int64
nostalgic_at_least_3             int64
nostalgic_at_least_4             int64
translation_at_least_1           int64
translation_at_least_2           int64
translation_at_least_3           int64
translation_at_least_4           int64
nostalgic_coder1                 int64
nostalgic_coder2                 int64
nostalgic_coder3                 int64
nostalgic_coder4                 int64
translation_coder1               int64
translation_coder2               int64
translation_coder3               int64
translation_coder4               int64
translation_agreement_coders     int64
nostalgia_sum                    int64
nostalgia_sum_emb                int64
nostalgia_emb            

The column `nostalgia_agreement_coders` records how many of the four coders judged a text to be nostalgic. Let us see how the texts differ across agreement levels.

In [None]:
nost_handcoding.loc[nost_handcoding['nostalgia_agreement_coders'] == 0,
 ['text', 'nostalgia_agreement_coders']].head()

|   | text | nostalgia_agreement_coders |
|---|------|----------------------------|
| 0 | Economic injustice and cultural, historical an... | 0 |
| 1 | a new Packaging Act will be adopted, which, am... | 0 |
| 2 | Nationals of countries with which France has n... | 0 |
| 4 | The ULA condemns the complete failure of the g... | 0 |
| 5 | legal certainty - necessary for citizens and i... | 0 |


In [None]:
nost_handcoding.loc[nost_handcoding['nostalgia_agreement_coders'] == 1,
 ['text', 'nostalgia_agreement_coders']].head()

|    | text | nostalgia_agreement_coders |
|----|------|----------------------------|
| 8  | The NATO returned to the main and most importa... | 1 |
| 11 | Erase from our streets and squares any honorab... | 1 |
| 29 | Which shows the effectiveness of what we do fr... | 1 |
| 31 | Germany is a successful integration of the cou... | 1 |
| 35 | The past four years of work was also made a se... | 1 |


In [None]:
nost_handcoding.loc[nost_handcoding['nostalgia_agreement_coders'] == 2,
 ['text', 'nostalgia_agreement_coders']].head()

|    | text | nostalgia_agreement_coders |
|----|------|----------------------------|
| 3  | 7. The re-emerging LSDP their active involveme... | 2 |
| 7  | Restore bilateral migration strategy cooperati... | 2 |
| 9  | 11) Restoring the minimum wage at €8.65 an hour. | 2 |
| 34 | The result has been a mere administrative dece... | 2 |
| 42 | Popular Alliance proposes: - Ending the cyclic... | 2 |


In [None]:
nost_handcoding.loc[nost_handcoding['nostalgia_agreement_coders'] > 2,
 ['text', 'nostalgia_agreement_coders']].head()

|    | text | nostalgia_agreement_coders |
|----|------|----------------------------|
| 13 | • Modern, on the personal development focused ... | 3 |
| 15 | We disagree with the globalization that aims t... | 4 |
| 32 | 2. Socialist ideas emerged Lithuania Lithuania... | 4 |
| 41 | In the history of civilization at a time when ... | 4 |
| 86 | The course of national history is indispensabl... | 4 |


In [None]:
print(nost_handcoding.groupby(['nostalgic_at_least_2'])['text'].count())

nostalgic_at_least_2
0    981
1    219
Name: text, dtype: int64


#### Validation

In [None]:
nost_validation.shape

(3515, 17)

In [None]:
nost_validation.dtypes

manifesto_id                     object
countryname                      object
text_pre                         object
text                             object
text_post                        object
party                             int64
party_family_recoded             object
cmp_code                         object
nostalgia_sentence_dummy_emb      int64
nostalgia_sentence_bert           int64
nostalgia_sentence_svm            int64
nostalgic_coding_coder1           int64
nostalgic_coding_coder2           int64
nostalgia_coded_both              int64
nostalgia_coded_at_least_one      int64
score_gpt                       float64
justification_gpt                object
dtype: object

In [None]:
nost_validation.loc[nost_validation['nostalgia_coded_both'] == 1,['text']].head()

|    | text |
|----|------|
| 7  | Here for the first time since independence in ... |
| 14 | Strengthen the United Kingdom and protect and ... |
| 16 | Danish book rental and ground traffic control ... |
| 23 | The culture is not only considered one of the ... |
| 32 | Compulsory school age Restore: restoring lasti... |


In [None]:
nost_validation.groupby('nostalgia_coded_both')['nostalgia_coded_both'].count()

nostalgia_coded_both
0    3269
1     246
Name: nostalgia_coded_both, dtype: int64

### Dataset Preparation

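I create the columns `label` and `label_text`, which the NLI pipeline from Laurer et al. (2023) expects. A sentence counts as nostalgic (label 1) if at least two coders in the handcoding dataset, or both coders in the validation dataset, marked it as nostalgic. Then I concatenate the nostalgic sentences from the validation dataset with the full handcoding dataset and lowercase the texts.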

In [None]:
nost_handcoding['label'] = np.where(nost_handcoding['nostalgia_agreement_coders'] >= 2, 1,0) # https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
nost_handcoding['label_text'] = np.where(nost_handcoding['nostalgia_agreement_coders'] >= 2,
                                         'Nostalgia','Not Nostalgia') # https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
nost_validation['label'] = np.where(nost_validation['nostalgia_coded_both'] == 1, 1,0) # https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
nost_validation['label_text'] = np.where(nost_validation['nostalgia_coded_both'] == 1,
                                         'Nostalgia','Not Nostalgia') # https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/


In [None]:
nostalgia = nost_validation.loc[nost_validation['label'] == 1,['text', 'label', 'label_text']]
nostalgia = pd.concat([nostalgia, nost_handcoding[['text', 'label', 'label_text']]])
nostalgia.shape

(1446, 3)
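
The labelling and concatenation steps can be checked on a toy example. This is a minimal sketch with hypothetical stand-in data frames (`handcoding`, `validation`) that contain only the columns used above. Unlike the original cell, it passes `ignore_index=True` to `pd.concat`; that is an optional choice of mine that avoids duplicate index values after concatenation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-ins for the two Polnos files (only the columns used here).
handcoding = pd.DataFrame({
    "text": ["A", "B", "C", "D"],
    "nostalgia_agreement_coders": [0, 1, 2, 4],
})
validation = pd.DataFrame({
    "text": ["E", "F", "G"],
    "nostalgia_coded_both": [1, 0, 1],
})

# Handcoding: nostalgic if at least two of the four coders agreed.
handcoding["label"] = np.where(handcoding["nostalgia_agreement_coders"] >= 2, 1, 0)
# Validation: nostalgic only if both coders agreed.
validation["label"] = np.where(validation["nostalgia_coded_both"] == 1, 1, 0)

for df in (handcoding, validation):
    df["label_text"] = np.where(df["label"] == 1, "Nostalgia", "Not Nostalgia")

cols = ["text", "label", "label_text"]
# Keep only the positive validation sentences, then append every handcoding sentence.
combined = pd.concat(
    [validation.loc[validation["label"] == 1, cols], handcoding[cols]],
    ignore_index=True,
)
combined["text"] = combined["text"].str.lower()  # vectorized lowercasing
print(combined["label_text"].value_counts())
```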

The texts contain few special characters, so lowercasing them is probably sufficient.

In [None]:
nostalgia['text'] = nostalgia.apply(lambda row: row['text'].lower(),axis=1)

In [None]:
nostalgia.groupby('label_text')['text'].count()

label_text
Nostalgia        465
Not Nostalgia    981
Name: text, dtype: int64

#### Train, Validation, Test Split

In [None]:
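# 80 % train, 10 % test, 10 % validation (nost_validation now holds the validation split, not the raw Polnos validation file)
nost_train, nost_test = train_test_split(nostalgia, test_size = .2, stratify = nostalgia['label'], random_state = SEED_GLOBAL)
nost_test, nost_validation = train_test_split(nost_test, test_size = .5, stratify = nost_test['label'], random_state = SEED_GLOBAL)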

In [None]:
nost_train.groupby('label_text')['label_text'].count()

label_text
Nostalgia        372
Not Nostalgia    784
Name: label_text, dtype: int64

In [None]:
nost_test.groupby('label_text')['label_text'].count()

label_text
Nostalgia        46
Not Nostalgia    99
Name: label_text, dtype: int64

In [None]:
nost_validation.groupby('label_text')['label_text'].count()

label_text
Nostalgia        47
Not Nostalgia    98
Name: label_text, dtype: int64
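
The two-stage split yields about 80 % training, 10 % test and 10 % validation data. The sketch below reproduces it on a hypothetical toy frame with the same class sizes as the combined nostalgia data (465 nostalgic, 981 not nostalgic). Passing `random_state` is my addition for reproducibility; because `stratify` preserves the class ratio, the sizes match the counts printed above (1156 / 145 / 145).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 1984  # same value as SEED_GLOBAL in this notebook

# Hypothetical toy frame with the same class sizes as the combined nostalgia data.
toy = pd.DataFrame({"label": [1] * 465 + [0] * 981})
toy["text"] = [f"sentence {i}" for i in range(len(toy))]

# First split off 20 %, then halve that part into test and validation sets.
train, rest = train_test_split(toy, test_size=0.2, stratify=toy["label"], random_state=SEED)
test, val = train_test_split(rest, test_size=0.5, stratify=rest["label"], random_state=SEED)

print(len(train), len(test), len(val))
print(train["label"].mean(), test["label"].mean(), val["label"].mean())
```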

## Hope
The dataset comes from *PolyHope: Two-level hope speech detection from tweets*. It can be downloaded through the [HOPE at IberLEF 2024](https://codalab.lisn.upsaclay.fr/competitions/17714#learn_the_details-terms_and_conditions) competition.

**Generalized Hope**
“According to Ezzy (2000), Smith and Sparkes (2005), Particularized hope is similar to the typical definition of hope used in the psychological literature as the expectation and desire for specific events and outcomes (e.g., I hope the surgery will be successful). In contrast, Generalized hope is characterized by openness to events and outcomes (e.g., I hope I will get well).” (Balouchzahi et al., 2023, p. 2)

**Realistic Hope**
“Realistic hope can be described as the hope for a specific outcome, which involves the process of mental imagery and the calculation of the probability of occurrence to prevent the person from losing touch with reality (Webb, 2007)” (Balouchzahi et al., 2023, p. 2)

**Unrealistic Hope**
“In contrast, unrealistic hopes are based on incomplete or incorrect information and hope for something unlikely to happen (Verhaeghe et al., 2007) (e.g., my grades are bad, and everyone says I have failed, but I am waiting for a miracle to happen)” (Balouchzahi et al., 2023, p. 2)

### Dataset Import

In [None]:
hope = pd.read_csv('/content/Training_Datasets/Task 2_Test_with_labels_English_PolyHope.csv')

### Dataset Overview

Again, I first want to get an overview of the data.

In [None]:
hope.shape

(6192, 4)

In [None]:
hope.dtypes

text          object
binary        object
multiclass    object
id             int64
dtype: object

In [None]:
hope.groupby('binary')['binary'].count()

binary
Hope        3104
Not Hope    3088
Name: binary, dtype: int64

In [None]:
hope.groupby('multiclass')['multiclass'].count()

multiclass
Generalized Hope    1726
Not Hope            3088
Realistic Hope       730
Unrealistic Hope     648
Name: multiclass, dtype: int64

In [None]:
hope[hope['multiclass']== 'Generalized Hope'].head()

|    | text | binary | multiclass | id |
|----|------|--------|------------|----|
| 1  | #USER# Oh shit really? I would hope they'd she... | Hope | Generalized Hope | 4061 |
| 2  | #USER# Good morning, Bud! 🥰 Another good decis... | Hope | Generalized Hope | 1621 |
| 5  | #USER# 49ers in the NFL are a private company.... | Hope | Generalized Hope | 3994 |
| 6  | $SPY $SPX update:\nLooking excellent. Pretty n... | Hope | Generalized Hope | 1780 |
| 11 | I got baloons😌🌚\n\nA whole year has passed and... | Hope | Generalized Hope | 6167 |


In [None]:
hope[hope['multiclass']== 'Realistic Hope'].head()

|    | text | binary | multiclass | id |
|----|------|--------|------------|----|
| 0  | #USER# #USER# I'm really liking this project, ... | Hope | Realistic Hope | 5820 |
| 33 | Inshallah if I do become a doc and if I leave ... | Hope | Realistic Hope | 4720 |
| 45 | #USER# They're doing it now hoping everyone wi... | Hope | Realistic Hope | 58 |
| 59 | #USER# #USER# He can pray all he wants, just n... | Hope | Realistic Hope | 2599 |
| 67 | #USER# #USER# hope you guys got d msg from d y... | Hope | Realistic Hope | 6513 |


In [None]:
hope[hope['multiclass']== 'Unrealistic Hope'].head()

|    | text | binary | multiclass | id |
|----|------|--------|------------|----|
| 3  | i aspire to have the level of delusion to beli... | Hope | Unrealistic Hope | 1754 |
| 16 | Really wish they ain’t have to cut some of the... | Hope | Unrealistic Hope | 6744 |
| 17 | I wish my bf were less attractive to me like h... | Hope | Unrealistic Hope | 2102 |
| 28 | Wallahi Indian Muslims are the bravest Muslim... | Hope | Unrealistic Hope | 3078 |
| 30 | #USER# The course I wish more people created. ... | Hope | Unrealistic Hope | 7739 |


### Dataset Preparation

The tweets contain many hashtags, user mentions, URLs and emojis. I remove them because they will not appear in the YouTube content I ultimately want to classify, yet the model could learn to rely on them. I want the model to focus only on features that will also be present in the final dataset.

In [None]:
# cleaning steps adapted from https://www.kaggle.com/code/tariqsays/tweets-cleaning-with-python
hope['old_text'] = hope['text']  # keep the raw tweet for comparison
hope['text'] = hope['text'].str.lower()
hope['text'] = hope['text'].str.replace(r"@[A-Za-z0-9_]+", "", regex=True)  # user mentions
hope['text'] = hope['text'].str.replace(r"#[A-Za-z0-9_]+", "", regex=True)  # hashtags
# the anonymization token #USER# becomes "#user#"; the pattern above removes "#user"
# but leaves the closing "#", so I remove every remaining "#" in a second step
hope['text'] = hope['text'].str.replace("#", "", regex=False)
hope['text'] = hope['text'].str.replace(r"http\S+", "", regex=True)  # URLs with scheme
hope['text'] = hope['text'].str.replace(r"www\.\S+", "", regex=True)  # URLs without scheme (dot escaped)
# replace every character except lowercase letters, digits and the punctuation . , ; :
# with a space; this also removes emojis, apostrophes, "?" and "!"
hope['text'] = hope['text'].str.replace(r"[^a-z0-9.,;:]", " ", regex=True)

hope.head()

|   | text | binary | multiclass | id | old_text |
|---|------|--------|------------|----|----------|
| 0 | i m really liking this project, let s work t... | Hope | Realistic Hope | 5820 | #USER# #USER# I'm really liking this project, ... |
| 1 | oh shit really i would hope they d shed some... | Hope | Generalized Hope | 4061 | #USER# Oh shit really? I would hope they'd she... |
| 2 | good morning, bud another good decision fr... | Hope | Generalized Hope | 1621 | #USER# Good morning, Bud! 🥰 Another good decis... |
| 3 | i aspire to have the level of delusion to beli... | Hope | Unrealistic Hope | 1754 | i aspire to have the level of delusion to beli... |
| 4 | projects are continuously attacked by hacker... | Not Hope | Not Hope | 401 | #USER# #USER# Projects are continuously attack... |
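
The cleaning cell can also be wrapped in a small reusable function, which would make it easy to apply exactly the same steps to the YouTube comments later. This is a sketch that assumes the same regular expressions and the same order are wanted; `clean_tweet` is a hypothetical helper name, not part of the original notebook.

```python
import re

# Patterns mirror the cleaning cell above and are applied in the same order.
_PATTERNS = [
    (re.compile(r"@[a-z0-9_]+"), ""),     # user mentions
    (re.compile(r"#[a-z0-9_]+"), ""),     # hashtags, incl. "#user" from "#USER#"
    (re.compile(r"#"), ""),               # leftover "#" from the anonymization token
    (re.compile(r"http\S+"), ""),         # URLs with scheme
    (re.compile(r"www\.\S+"), ""),        # URLs without scheme
    (re.compile(r"[^a-z0-9.,;:]"), " "),  # everything else (emojis, apostrophes, ?, !) -> space
]


def clean_tweet(text: str) -> str:
    """Lowercase a tweet, then apply each cleaning pattern in order (hypothetical helper)."""
    text = text.lower()
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(repr(clean_tweet("#USER# I'm hoping! 🥰 www.example.com")))
```

Because the patterns run after lowercasing, lowercase-only character classes are sufficient here.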


In [None]:
hope['label'] = np.where(hope['binary'] == 'Hope', 1,0) # https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
hope['label_text'] = np.where(hope['binary'] == 'Hope', 'Hope','Not Hope') # https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
hope_prepared = hope[['text', 'label', 'label_text']]

#### Train, Validation, Test Split

In [None]:
hope_train, hope_test = train_test_split(hope_prepared, test_size = .2, stratify = hope_prepared['label'], random_state = SEED_GLOBAL)
hope_test, hope_validation = train_test_split(hope_test, test_size = .5, stratify = hope_test['label'], random_state = SEED_GLOBAL)

In [None]:
hope_train.groupby('label_text')['label_text'].count()

label_text
Hope        2483
Not Hope    2470
Name: label_text, dtype: int64

In [None]:
hope_test.groupby('label_text')['label_text'].count()

label_text
Hope        310
Not Hope    309
Name: label_text, dtype: int64

In [None]:
hope_validation.groupby('label_text')['label_text'].count()

label_text
Hope        311
Not Hope    309
Name: label_text, dtype: int64

# The Functions


### Reformatting Functions
The functions are from [Laurer (2023)](https://colab.research.google.com/github/MoritzLaurer/summer-school-transformers-2023/blob/main/4_tune_bert_nli.ipynb#scrollTo=OkOD5cejttIV).

In [None]:
## function for reformatting the train set
def format_nli_trainset(df_train=None, hypo_label_dic=None, random_seed=42):
  print(f"Length of df_train before formatting step: {len(df_train)}.")
  length_original_data_train = len(df_train)

  df_train_lst = []
  for label_text, hypothesis in hypo_label_dic.items():
    ## entailment
    df_train_step = df_train[df_train.label_text == label_text].copy(deep=True)
    df_train_step["hypothesis"] = [hypothesis] * len(df_train_step)
    df_train_step["label"] = [0] * len(df_train_step)
    ## not_entailment
    df_train_step_not_entail = df_train[df_train.label_text != label_text].copy(deep=True)
    df_train_step_not_entail = df_train_step_not_entail.sample(n=min(len(df_train_step), len(df_train_step_not_entail)), random_state=random_seed)
    df_train_step_not_entail["hypothesis"] = [hypothesis] * len(df_train_step_not_entail)
    df_train_step_not_entail["label"] = [1] * len(df_train_step_not_entail)
    # append
    df_train_lst.append(pd.concat([df_train_step, df_train_step_not_entail]))
  df_train = pd.concat(df_train_lst)

  # shuffle
  df_train = df_train.sample(frac=1, random_state=random_seed)
  df_train["label"] = df_train.label.apply(int)
  df_train["label_nli_explicit"] = ["True" if label == 0 else "Not-True" for label in df_train["label"]]  # adding this just to simplify readability

  print(f"After adding not_entailment training examples, the training data was augmented to {len(df_train)} texts.")
  print(f"Max augmentation could be: len(df_train) * 2 = {length_original_data_train*2}. It can also be lower, if there are more entail examples than not-entail for a majority class.")

  return df_train.copy(deep=True)
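
The augmentation logged by `format_nli_trainset` can be worked out by hand: each class keeps its n_c texts as entailment pairs and receives min(n_c, N - n_c) sampled not-entailment pairs from the other classes, where N is the number of training texts. The hypothetical helper `nli_train_size` below only computes that count. For the nostalgia training split (372 nostalgic, 784 not nostalgic) it gives 744 + 1156 = 1900 rows, which is why the result stays below the maximum of 2 × 1156 = 2312.

```python
def nli_train_size(class_counts: dict) -> int:
    """Number of rows format_nli_trainset produces for the given per-class counts.

    For each class with n texts: n entailment pairs plus
    min(n, total - n) not-entailment pairs sampled from the other classes.
    """
    total = sum(class_counts.values())
    return sum(n + min(n, total - n) for n in class_counts.values())


nostalgia_train = {"Nostalgia": 372, "Not Nostalgia": 784}
hope_train_counts = {"Hope": 2483, "Not Hope": 2470}
print(nli_train_size(nostalgia_train))  # 1900
print(nli_train_size(hope_train_counts))
```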

In [None]:
## function for reformatting the test set
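def format_nli_testset(df_test=None, hypo_label_dic=None):
  df_test = df_test.copy(deep=True)  # work on a copy so the caller's dataframe (e.g. nost_validation) is not modified in place
  ## explode test dataset for N hypotheses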
  hypothesis_lst = [value for key, value in hypo_label_dic.items()]
  print("Number of hypotheses/classes: ", len(hypothesis_lst))

  # label lists with 0 at alphabetical position of their true hypo, 1 for not-true hypos
  label_text_label_dic_explode = {}
  for key, value in hypo_label_dic.items():
    label_lst = [0 if value == hypo else 1 for hypo in hypothesis_lst]
    label_text_label_dic_explode[key] = label_lst

  df_test["label"] = df_test.label_text.map(label_text_label_dic_explode)
  df_test["hypothesis"] = [hypothesis_lst] * len(df_test)
  print(f"Original test set size: {len(df_test)}")

  # explode dataset to have K-1 additional rows with not_entail label and K-1 other hypotheses
  # ! after exploding, cannot sample anymore, because distorts the order to true label values, which needs to be preserved for evaluation code
  df_test = df_test.explode(["hypothesis", "label"])  # multi-column explode requires pd.__version__ >= '1.3.0'
  print(f"Test set size for NLI classification: {len(df_test)}\n")

  df_test["label_nli_explicit"] = ["True" if label == 0 else "Not-True" for label in df_test["label"]]  # adding this just to simplify readability

  return df_test.copy(deep=True)
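
`format_nli_testset` turns each test text into one row per hypothesis. The toy sketch below, with shortened and hypothetical hypothesis wording, shows the exploded structure: label 0 marks the row with the true hypothesis and 1 the others. The row order has to stay intact, because the metric function later reads the rows in chunks of K hypotheses.

```python
import pandas as pd

hypo_label_dic = {
    "Nostalgia": "The text expresses nostalgia.",  # shortened, hypothetical wording
    "Not Nostalgia": "The text does not express nostalgia.",
}
hypotheses = list(hypo_label_dic.values())

toy_test = pd.DataFrame({"text": ["t1", "t2"], "label_text": ["Not Nostalgia", "Nostalgia"]})

# One label list per text: 0 at the position of the true hypothesis, 1 elsewhere.
label_lists = {key: [0 if value == h else 1 for h in hypotheses] for key, value in hypo_label_dic.items()}
toy_test["label"] = toy_test["label_text"].map(label_lists)
toy_test["hypothesis"] = [hypotheses] * len(toy_test)

# Multi-column explode (pandas >= 1.3): K rows per text, lists unpacked in parallel.
exploded = toy_test.explode(["hypothesis", "label"])
print(exploded[["text", "hypothesis", "label"]])
```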

### Metrics
Also from Laurer (2023).

In [None]:
def compute_metrics_nli_binary(eval_pred, label_text_alphabetical=None):
    predictions, labels = eval_pred

    ### reformat model output to enable calculation of standard metrics
    # split in chunks with predictions for each hypothesis for one unique premise
    def chunks(lst, n):  # Yield successive n-sized chunks from lst. https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
        for i in range(0, len(lst), n):
            yield lst[i:i + n]

    # for each chunk/premise, select the most likely hypothesis
    softmax = torch.nn.Softmax(dim=1)
    prediction_chunks_lst = list(chunks(predictions, len(set(label_text_alphabetical)) ))
    hypo_position_highest_prob = []
    for i, chunk in enumerate(prediction_chunks_lst):
        hypo_position_highest_prob.append(np.argmax(np.array(chunk)[:, 0]))  # only accesses the first column of the array, i.e. the entailment/true prediction logit of all hypos and takes the highest one

    label_chunks_lst = list(chunks(labels, len(set(label_text_alphabetical)) ))
    label_position_gold = []
    for chunk in label_chunks_lst:
        label_position_gold.append(np.argmin(chunk))  # argmin to detect the position of the 0 among the 1s

    #print("Highest probability prediction per premise: ", hypo_position_highest_prob)
    #print("Correct label per premise: ", label_position_gold)

    ### calculate standard metrics
    precision_macro, recall_macro, f1_macro, _ = precision_recall_fscore_support(label_position_gold, hypo_position_highest_prob, average='macro')  # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
    precision_micro, recall_micro, f1_micro, _ = precision_recall_fscore_support(label_position_gold, hypo_position_highest_prob, average='micro')  # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
    acc_balanced = balanced_accuracy_score(label_position_gold, hypo_position_highest_prob)
    acc_not_balanced = accuracy_score(label_position_gold, hypo_position_highest_prob)
    metrics = {
        'accuracy': acc_not_balanced,
        'f1_macro': f1_macro,
        'accuracy_balanced': acc_balanced,
        'f1_micro': f1_micro,
        'precision_macro': precision_macro,
        'recall_macro': recall_macro,
        'precision_micro': precision_micro,
        'recall_micro': recall_micro,
        #'label_gold_raw': label_position_gold,
        #'label_predicted_raw': hypo_position_highest_prob
    }
    #print("Aggregate metrics: ", {key: metrics[key] for key in metrics if key not in ["label_gold_raw", "label_predicted_raw"]} )  # print metrics but without label lists
    print("Detailed metrics: ", classification_report(label_position_gold, hypo_position_highest_prob, labels=np.sort(pd.factorize(label_text_alphabetical, sort=True)[0]), target_names=label_text_alphabetical, sample_weight=None, digits=2, output_dict=True,
                                zero_division='warn'), "\n")
    return metrics
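
The chunking in `compute_metrics_nli_binary` can also be expressed with a reshape, assuming the rows arrive premise by premise with K hypotheses each, as produced by `format_nli_testset`. The logits below are made up for illustration; as in the function above, column 0 is assumed to hold the entailment logit.

```python
import numpy as np

# Hypothetical logits for 3 premises x 2 hypotheses, ordered premise by premise.
# Column 0: entailment logit, column 1: not-entailment logit.
logits = np.array([
    [2.0, -1.0], [-0.5, 1.0],   # premise 1: hypothesis 0 most entailed
    [-1.0, 2.0], [1.5, -0.5],   # premise 2: hypothesis 1 most entailed
    [0.2, 0.1], [0.9, -0.3],    # premise 3: hypothesis 1 most entailed
])
labels = np.array([0, 1, 1, 0, 0, 1])  # 0 marks the true hypothesis of each premise
n_hypotheses = 2

# (premises, hypotheses, logits) -> take the entailment column -> best hypothesis per premise
pred_position = logits.reshape(-1, n_hypotheses, 2)[:, :, 0].argmax(axis=1)
# position of the single 0 among the 1s
gold_position = labels.reshape(-1, n_hypotheses).argmin(axis=1)
print(pred_position, gold_position)
```

Here two of the three premises are classified correctly; the resulting position arrays are what the function passes to the scikit-learn metrics.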



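### Function for Zero-Shot Inference
I created the following function to consolidate several of the steps Laurer (2023) goes through and make them easier to apply. Despite its name, `one_shot_inference` performs zero-shot classification: the NLI model receives only the hypotheses and no labelled examples of the task. I mark my own additions with `#bj`.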

In [None]:
def one_shot_inference(df_test, hypo_label_dic, model_name):

  hypothesis_lst = list(hypo_label_dic.values())

  print('Initializing Tokenizer') #bj
  tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)

  print('\nInitializing Pipeline')
  pipe_classifier = pipeline(
    "zero-shot-classification",
    model= model_name,
    tokenizer=tokenizer,
    framework="pt",
    device=device,
  )

  # Create dummy dataset  #bj
  df_inference = df_test.copy(deep=True)
  text_lst = df_inference["text"].tolist()

  # use the pipeline with your chosen model for inference (prediction)
  print('\nPredicting ...') #bj
  pipe_output = pipe_classifier(
      text_lst,  # input any list of texts here
      candidate_labels=hypothesis_lst,
      hypothesis_template="{}",
      multi_label=False,  # here you can decide if, for your task, only one hypothesis can be true, or multiple can be true
      batch_size=32  # reduce this number to 8 or 16 if you get an out-of-memory error
  )
  print(pipe_output)

  # extract the predictions from pipe_output
  hypothesis_pred_true_probability = []
  hypothesis_pred_true = []
  for dic in pipe_output:
    hypothesis_pred_true_probability.append(dic["scores"][0])
    hypothesis_pred_true.append(dic["labels"][0])

  # map the long hypotheses to their corresponding short label names
  hypothesis_label_dic_inference_inverted = {value: key for key, value in hypo_label_dic.items()}
  label_pred = [hypothesis_label_dic_inference_inverted[hypo] for hypo in hypothesis_pred_true]

  # add inference data to your original dataframe
  df_inference["label_text_pred"] = label_pred
  df_inference["label_text_pred_proba"] = hypothesis_pred_true_probability

  # printing the classification report #bj
  print("\n")
  print(classification_report(df_inference['label_text'],  #bj
                              df_inference['label_text_pred'])) #bj

  return df_inference
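
The extraction at the end of `one_shot_inference` relies on the zero-shot pipeline returning the candidate labels sorted by descending score, so index 0 is the top prediction. A minimal sketch with a hypothetical, hand-written pipeline output (shortened hypotheses, invented scores) shows the mapping back to the short label names:

```python
# Hypothetical pipeline output in the same shape as the printed output of the
# zero-shot pipeline: labels sorted by descending score.
pipe_output = [
    {"sequence": "a", "labels": ["The text does not express nostalgia.", "The text expresses nostalgia."], "scores": [0.73, 0.27]},
    {"sequence": "b", "labels": ["The text expresses nostalgia.", "The text does not express nostalgia."], "scores": [0.61, 0.39]},
]
hypo_label_dic = {
    "Nostalgia": "The text expresses nostalgia.",  # shortened wording for the example
    "Not Nostalgia": "The text does not express nostalgia.",
}

# Invert the dictionary once, then read the top-ranked hypothesis of each output.
hypothesis_to_label = {hypo: label for label, hypo in hypo_label_dic.items()}
label_pred = [hypothesis_to_label[out["labels"][0]] for out in pipe_output]
label_proba = [out["scores"][0] for out in pipe_output]
print(label_pred, label_proba)
```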

### Function for Fine-Tuning
The following function consolidates the fine-tuning steps from Laurer (2023).

In [None]:
def model_finetuning(name, df_train, df_test, hypo_label_dic, model_name, seed):
  # Prepare Data
  print('\nData Preparation and Tokenizer Download\n')
  df_train_formatted = format_nli_trainset(df_train=df_train,
                                           hypo_label_dic=hypo_label_dic,
                                           random_seed=seed)
  df_test_formatted = format_nli_testset(df_test=df_test,
                                         hypo_label_dic=hypo_label_dic)
  dataset = datasets.DatasetDict({
      "train": datasets.Dataset.from_pandas(df_train_formatted),
      "test": datasets.Dataset.from_pandas(df_test_formatted)
  })

  # Tokenize
  tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)

  def tokenize_nli_format(examples):
    return tokenizer(examples["text"], examples["hypothesis"], truncation=True, max_length=512)  # max_length can be reduced to e.g. 256 to increase speed, but long texts will be cut off

  dataset = dataset.map(tokenize_nli_format, batched=True)
  dataset = dataset.remove_columns([
    'label_text'])

  label_text_alphabetical = np.sort(df_train.label_text.unique())

  # Set Training Arguments and Hyperparameter
  print('\nSetting Training Arguments and Hyperparameter\n')
  fp16_bool = True if torch.cuda.is_available() else False

  train_args = TrainingArguments(
    output_dir=f'./results/{name}', #bj
    logging_dir=f'./logs/{name}', #bj
    learning_rate=2e-5,
    per_device_train_batch_size=16,  # if you get an out-of-memory error, reduce this value to 8 or 4 and restart the runtime. Higher values increase training speed, but also increase memory requirements. Ideal values here are always a multiple of 8.
    per_device_eval_batch_size=80,  # if you get an out-of-memory error, reduce this value, e.g. to 40 and restart the runtime
    #gradient_accumulation_steps=4, # Can be used in case of memory problems to reduce effective batch size. accumulates gradients over X steps, only then backward/update. decreases memory usage, but also slightly speed. (!adapt/halve batch size accordingly)
    num_train_epochs=3,  # this can be increased, but higher values increase training time. Good values for NLI are between 3 and 20.
    warmup_ratio=0.25,  # a good normal default value is 0.06 for normal BERT-base models, but since we want to reuse prior NLI knowledge and avoid catastrophic forgetting, we set the value higher
    weight_decay=0.1,
    seed=seed, #bj
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    fp16=fp16_bool,  # Can speed up training and reduce memory consumption, but only makes sense at batch-size > 8. loads two copies of model weights, which creates overhead. https://huggingface.co/transformers/performance.html?#fp16
    fp16_full_eval=fp16_bool,
    evaluation_strategy="epoch", # options: "no"/"steps"/"epoch"
    #eval_steps=10_000,  # evaluate every n steps if evaluation_strategy=='steps'. defaults to logging_steps
    save_strategy = "epoch",  # options: "no"/"steps"/"epoch"
    #save_steps=10_000,              # Number of updates steps before two checkpoint saves.
    #save_total_limit=10,             # If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in output_dir
    #logging_strategy="steps",
    report_to="all",  # "all"  # logging
    #push_to_hub=False,
    #push_to_hub_model_id=f"{model_name}-finetuned-{task}",
  )

  # Train
  print("\nDownloading the Model\n")
  model = AutoModelForSequenceClassification.from_pretrained(model_name)

  print("\nTraining\n")
  trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=lambda eval_pred: compute_metrics_nli_binary(eval_pred, label_text_alphabetical=label_text_alphabetical)
  )

  trainer.train()
  print("\nEvaluating\n")
  results = trainer.evaluate()
  print(results)
  trainer.save_model(output_dir=f'{name}_NLI_classifier')

  return(trainer, tokenizer)
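
With `warmup_ratio=0.25`, a quarter of all optimization steps are spent ramping the learning rate up. A rough calculation for the nostalgia run, assuming a single GPU, no gradient accumulation, the 1900 NLI-formatted training rows that `format_nli_trainset` reports for this split, and that the Trainer rounds the warmup up with `math.ceil` (as, to my knowledge, `TrainingArguments.get_warmup_steps` does):

```python
import math

n_train_pairs = 1900   # NLI-formatted nostalgia training rows (assumption: value logged by format_nli_trainset)
batch_size = 16        # per_device_train_batch_size, assuming one device
epochs = 3             # num_train_epochs
warmup_ratio = 0.25

steps_per_epoch = math.ceil(n_train_pairs / batch_size)  # last partial batch is kept, so round up
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)     # length of the linear warmup
print(steps_per_epoch, total_steps, warmup_steps)
```

So roughly the first 90 of about 357 steps are warmup, which is the intended protection against overwriting the prior NLI knowledge early in training.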


# The Model

## Nostalgia Analysis

### Zero-Shot

In [None]:
nost_hypotheses = {
    "Nostalgia": "The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.",
    "Not Nostalgia": "The text does not express nostalgia."
    }
model = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-docnli-ling-2c"

dataset = one_shot_inference(nost_train, nost_hypotheses, model)

Initializing Tokenizer

Initializing Pipeline

Predicting ...
[{'sequence': 'aigars kalvitis-led government has worked for one of the most prolific periods of work in latvian history.', 'labels': ['The text does not express nostalgia.', 'The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.'], 'scores': [0.7350220680236816, 0.26497793197631836]}, {'sequence': 'understanding the language of co-operation over the limit and the second language group needs is a basic prerequisite for the swedish language remains alive in administration, education, care and justice.', 'labels': ['The text does not express nostalgia.', 'The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.'], 'scores': [0.6923011541366577, 0.3076988160610199]}, {'sequence': '7.-opening economic opportunities.', 'labels': ['The text does not express nostalgia.', 'The text expresses nostalgia, it speaks positive about events and objects in the past or the past i
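For every text, the pipeline output lists the hypotheses sorted by descending score, so the predicted class is the hypothesis at position 0. Below is a minimal sketch for mapping these results back to the class names in `nost_hypotheses`. The helper name `results_to_labels` is hypothetical. The two sample results are copied from the output above, with the sequences shortened.

```python
nost_hypotheses = {
    "Nostalgia": "The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.",
    "Not Nostalgia": "The text does not express nostalgia.",
}


def results_to_labels(results, hypotheses):
    """Hypothetical helper: turn zero-shot pipeline output into class names."""
    # Invert {class name: hypothesis} so the top-scoring hypothesis can be looked up
    hypothesis_to_class = {hyp: name for name, hyp in hypotheses.items()}
    # "labels" is sorted by descending score, so index 0 is the prediction
    return [hypothesis_to_class[r["labels"][0]] for r in results]


results = [
    {"sequence": "aigars kalvitis-led government has worked for ...",
     "labels": [nost_hypotheses["Not Nostalgia"], nost_hypotheses["Nostalgia"]],
     "scores": [0.7350220680236816, 0.26497793197631836]},
    {"sequence": "understanding the language of co-operation ...",
     "labels": [nost_hypotheses["Not Nostalgia"], nost_hypotheses["Nostalgia"]],
     "scores": [0.6923011541366577, 0.3076988160610199]},
]
print(results_to_labels(results, nost_hypotheses))  # ['Not Nostalgia', 'Not Nostalgia']
```

The predicted class names can then be compared with the gold labels, for example with `classification_report`, as in the hope analysis below.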

### Fine-Tuning

In [None]:
nost_hypotheses = {
    "Nostalgia": "The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.",
    "Not Nostalgia": "The text does not express nostalgia."
    }
model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-docnli-ling-2c"


trainer, tokenizer = model_finetuning('nostalgia',
                 nost_train,
                 nost_validation,
                 nost_hypotheses,
                 model_name,
                 SEED_GLOBAL)


Data Preparation and Tokenizer Download

Length of df_train before formatting step: 1156.
After adding not_entailment training examples, the training data was augmented to 1900 texts.
Max augmentation could be: len(df_train) * 2 = 2312. It can also be lower, if there are more entail examples than not-entail for a majority class.
Number of hypotheses/classes:  2
Original test set size: 145
Test set size for NLI classification: 290

Setting Training Arguments and Hyperparameter


Downloading the Model






Training



| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro | Accuracy Balanced | F1 Micro | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | No log | 0.258854 | 0.931034 | 0.922938 | 0.932371 | 0.931034 | 0.915415 | 0.932371 | 0.931034 | 0.931034 |
| 2 | No log | 0.293588 | 0.944828 | 0.937715 | 0.942575 | 0.944828 | 0.933355 | 0.942575 | 0.944828 | 0.944828 |
| 3 | No log | 0.287522 | 0.944828 | 0.937715 | 0.942575 | 0.944828 | 0.933355 | 0.942575 | 0.944828 | 0.944828 |


Detailed metrics:  {'Nostalgia': {'precision': 0.8627450980392157, 'recall': 0.9361702127659575, 'f1-score': 0.8979591836734694, 'support': 47}, 'Not Nostalgia': {'precision': 0.9680851063829787, 'recall': 0.9285714285714286, 'f1-score': 0.9479166666666666, 'support': 98}, 'accuracy': 0.9310344827586207, 'macro avg': {'precision': 0.9154151022110972, 'recall': 0.932370820668693, 'f1-score': 0.922937925170068, 'support': 145}, 'weighted avg': {'precision': 0.9339404140232763, 'recall': 0.9310344827586207, 'f1-score': 0.9317235514895613, 'support': 145}} 

Detailed metrics:  {'Nostalgia': {'precision': 0.8979591836734694, 'recall': 0.9361702127659575, 'f1-score': 0.9166666666666666, 'support': 47}, 'Not Nostalgia': {'precision': 0.96875, 'recall': 0.9489795918367347, 'f1-score': 0.9587628865979382, 'support': 98}, 'accuracy': 0.9448275862068966, 'macro avg': {'precision': 0.9333545918367347, 'recall': 0.9425749023013461, 'f1-score': 0.9377147766323024, 'support': 145}, 'weighted avg': {'

Detailed metrics:  {'Nostalgia': {'precision': 0.8979591836734694, 'recall': 0.9361702127659575, 'f1-score': 0.9166666666666666, 'support': 47}, 'Not Nostalgia': {'precision': 0.96875, 'recall': 0.9489795918367347, 'f1-score': 0.9587628865979382, 'support': 98}, 'accuracy': 0.9448275862068966, 'macro avg': {'precision': 0.9333545918367347, 'recall': 0.9425749023013461, 'f1-score': 0.9377147766323024, 'support': 145}, 'weighted avg': {'precision': 0.9458040112596764, 'recall': 0.9448275862068966, 'f1-score': 0.9451179049650432, 'support': 145}} 

{'eval_loss': 0.2935761511325836, 'eval_accuracy': 0.9448275862068966, 'eval_f1_macro': 0.9377147766323024, 'eval_accuracy_balanced': 0.9425749023013461, 'eval_f1_micro': 0.9448275862068966, 'eval_precision_macro': 0.9333545918367347, 'eval_recall_macro': 0.9425749023013461, 'eval_precision_micro': 0.9448275862068966, 'eval_recall_micro': 0.9448275862068966, 'eval_runtime': 0.7015, 'eval_samples_per_second': 413.425, 'eval_steps_per_second': 5.
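The log reports that the 145 validation texts became 290 NLI pairs. For evaluation, every text is paired with each of the two hypotheses. The correct hypothesis is labelled entailment and the other not_entailment, and a text's prediction is the hypothesis with the highest entailment score. Below is a minimal sketch of that formatting and aggregation in the spirit of Laurer (2023). The helper names, dictionary keys, example texts and the 0 = entailment / 1 = not_entailment convention are assumptions, not the exact code used in this notebook.

```python
def format_nli_eval(examples, hypotheses):
    """Pair every (text, gold class) with every hypothesis.

    Label 0 marks the entailed (gold) hypothesis, 1 marks not_entailment.
    """
    pairs = []
    for text, gold in examples:
        for name, hypothesis in hypotheses.items():
            pairs.append({"premise": text, "hypothesis": hypothesis,
                          "label": 0 if name == gold else 1})
    return pairs


def aggregate_predictions(entail_scores, pair_labels, class_names):
    """For each text, predict the class whose hypothesis has the highest entailment score."""
    n = len(class_names)
    preds, golds = [], []
    for start in range(0, len(entail_scores), n):
        chunk = entail_scores[start:start + n]
        preds.append(class_names[max(range(n), key=lambda i: chunk[i])])
        # the single 0 in each chunk marks the gold hypothesis
        golds.append(class_names[pair_labels[start:start + n].index(0)])
    return preds, golds


hypotheses = {
    "Nostalgia": "The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.",
    "Not Nostalgia": "The text does not express nostalgia.",
}
# Two hypothetical texts with gold labels
examples = [("we miss the good old days", "Nostalgia"),
            ("we will build new railways", "Not Nostalgia")]

pairs = format_nli_eval(examples, hypotheses)
print(len(pairs), [p["label"] for p in pairs])  # 4 [0, 1, 1, 0]

# Hypothetical entailment probabilities, one per (text, hypothesis) pair
preds, golds = aggregate_predictions([0.9, 0.2, 0.3, 0.8],
                                     [p["label"] for p in pairs],
                                     list(hypotheses))
print(preds == golds)  # True
```

Because every text appears once per hypothesis, the evaluation set always has `len(texts) * len(hypotheses)` rows, which gives 145 × 2 = 290 here and 620 × 2 = 1240 for hope.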

Next, I upload the model to Hugging Face (again following Laurer, 2023).

In [None]:
#model_path = "/content/nostalgia_NLI_classifier"
#model = AutoModelForSequenceClassification.from_pretrained(model_path)
#tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, model_max_length=512)  # load the tokenizer that the Trainer saved together with the fine-tuned model
#repo_id = 'beja1996/NLI_nostalgia_classification'
#model.push_to_hub(repo_id=repo_id, use_temp_dir=True, private=True, use_auth_token="")
#tokenizer.push_to_hub(repo_id=repo_id, use_temp_dir=True, private=True, use_auth_token="")




CommitInfo(commit_url='https://huggingface.co/beja1996/NLI_nostalgia_classification/commit/2ccf913121a4795ae1b73fac0f97d556c40c4ece', commit_message='Upload tokenizer', commit_description='', oid='2ccf913121a4795ae1b73fac0f97d556c40c4ece', pr_url=None, pr_revision=None, pr_num=None)

### Evaluating the fine-tuned model
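On the validation set, our fine-tuned model reaches a macro F1 of 0.94, which exceeds the best model reported by Müller and Proksch [\(DistilBERT, F1 = 0.81\)](https://static.cambridge.org/content/id/urn:cambridge.org:id:article:S0007123423000571/resource/name/S0007123423000571sup001.pdf). Note that this score comes from the validation split used during fine-tuning, not from the test-set run below.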
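Macro F1 is the unweighted mean of the per-class F1 scores. The sketch below recomputes it from confusion counts that I reconstructed from the precision, recall and support printed for the best epoch. For Nostalgia, 44 of 47 texts were found with 49 predicted; for Not Nostalgia, 93 of 98 were found with 96 predicted. These counts are derived from the logged metrics, not logged directly.

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# (tp, fp, fn) per class, reconstructed from the detailed metrics of the best epoch
counts = {"Nostalgia": (44, 5, 3), "Not Nostalgia": (93, 3, 5)}

per_class = {name: f1_from_counts(*c) for name, c in counts.items()}
macro_f1 = sum(per_class.values()) / len(per_class)
accuracy = sum(tp for tp, _, _ in counts.values()) / 145  # 145 validation texts
print(round(macro_f1, 4), round(accuracy, 4))  # 0.9377 0.9448
```

Macro F1 weights the smaller Nostalgia class (47 texts) as heavily as the larger one, so it ends up slightly below accuracy here.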

In [None]:
model = "beja1996/NLI_nostalgia_classification" # Getting the Fine-Tuned Model from Huggingface
nost_hypotheses = {
    "Nostalgia": "The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.",
    "Not Nostalgia": "The text does not express nostalgia."
    }

dataset = one_shot_inference(nost_test, nost_hypotheses, model)

Initializing Tokenizer

Initializing Pipeline

Predicting ...
[{'sequence': 'as is known, however, in order to justify measures which, circumventing the rules would allow direct financing in favor of private schools, we have tried, over the years, to interpret the provisions of our charter in a less stringent .', 'labels': ['The text does not express nostalgia.', 'The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.'], 'scores': [0.9976233839988708, 0.0023766392841935158]}, {'sequence': 'and subsidies to purchase new vehicles that match their comfort of the 21st century.', 'labels': ['The text does not express nostalgia.', 'The text expresses nostalgia, it speaks positive about events and objects in the past or the past in general.'], 'scores': [0.9962823987007141, 0.003717554034665227]}, {'sequence': '/ cultural heritage and cultural events to be included in the overall tourism offer and sustainable economic development.', 'labels': ['The text expresses nostalgia, it speaks

## Hope Analysis

### One-Shot

In [None]:
hope_hypotheses = {
    "Hope": "The text expresses hope, a future-oriented expectation, desire or wish towards a general or specific event.",
    "Not Hope": "The text does not express hope, wish, desire, or future-oriented expectation."
}

model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-docnli-ling-2c"

dataset = one_shot_inference(hope_train, hope_hypotheses, model_name)

Initializing Tokenizer





Initializing Pipeline

Predicting ...


              precision    recall  f1-score   support

        Hope       0.57      0.98      0.72      2483
    Not Hope       0.92      0.27      0.42      2470

    accuracy                           0.62      4953
   macro avg       0.74      0.62      0.57      4953
weighted avg       0.74      0.62      0.57      4953
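The zero-shot report shows a strong bias toward Hope. The model recovers 98% of the Hope texts but only 27% of the Not Hope texts. On this roughly balanced set, accuracy and macro recall are both about 0.62. On an imbalanced set, however, a one-sided classifier can still post a good accuracy. Balanced accuracy, the mean of per-class recall, guards against that. It equals `Recall Macro` in the training logs. Below is a small stdlib sketch with hypothetical toy predictions.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in classes) / len(classes)


# Hypothetical toy data: imbalanced gold labels, a classifier that always answers "Hope"
y_true = ["Hope"] * 8 + ["Not Hope"] * 2
y_pred = ["Hope"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, balanced_accuracy(y_true, y_pred))  # 0.8 0.5
```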



### Fine-Tuning

In [None]:
hope_trainer, hope_tokenizer = model_finetuning('hope',
                 hope_train,
                 hope_validation,
                 hope_hypotheses,
                 model_name,
                 SEED_GLOBAL)


Data Preparation and Tokenizer Download

Length of df_train before formatting step: 4953.
After adding not_entailment training examples, the training data was augmented to 9893 texts.
Max augmentation could be: len(df_train) * 2 = 9906. It can also be lower, if there are more entail examples than not-entail for a majority class.
Number of hypotheses/classes:  2
Original test set size: 620
Test set size for NLI classification: 1240

Setting Training Arguments and Hyperparameter


Downloading the Model






Training



| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro | Accuracy Balanced | F1 Micro | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.5121 | 0.478281 | 0.841935 | 0.841339 | 0.842142 | 0.841935 | 0.847641 | 0.842142 | 0.841935 | 0.841935 |
| 2 | 0.2822 | 0.454948 | 0.837097 | 0.837096 | 0.8371 | 0.837097 | 0.837097 | 0.8371 | 0.837097 | 0.837097 |
| 3 | 0.172 | 0.693536 | 0.851613 | 0.851588 | 0.85158 | 0.851613 | 0.851756 | 0.85158 | 0.851613 | 0.851613 |


Detailed metrics:  {'Hope': {'precision': 0.8929889298892989, 'recall': 0.7781350482315113, 'f1-score': 0.831615120274914, 'support': 311}, 'Not Hope': {'precision': 0.8022922636103151, 'recall': 0.9061488673139159, 'f1-score': 0.8510638297872339, 'support': 309}, 'accuracy': 0.8419354838709677, 'macro avg': {'precision': 0.8476405967498071, 'recall': 0.8421419577727136, 'f1-score': 0.841339475031074, 'support': 620}, 'weighted avg': {'precision': 0.8477868816954184, 'recall': 0.8419354838709677, 'f1-score': 0.8413081061447637, 'support': 620}} 

Detailed metrics:  {'Hope': {'precision': 0.8387096774193549, 'recall': 0.8360128617363344, 'f1-score': 0.8373590982286635, 'support': 311}, 'Not Hope': {'precision': 0.8354838709677419, 'recall': 0.8381877022653722, 'f1-score': 0.8368336025848142, 'support': 309}, 'accuracy': 0.8370967741935483, 'macro avg': {'precision': 0.8370967741935484, 'recall': 0.8371002820008533, 'f1-score': 0.8370963504067388, 'support': 620}, 'weighted avg': {'preci

Detailed metrics:  {'Hope': {'precision': 0.8454258675078864, 'recall': 0.8617363344051447, 'f1-score': 0.8535031847133758, 'support': 311}, 'Not Hope': {'precision': 0.858085808580858, 'recall': 0.8414239482200647, 'f1-score': 0.849673202614379, 'support': 309}, 'accuracy': 0.8516129032258064, 'macro avg': {'precision': 0.8517558380443722, 'recall': 0.8515801413126047, 'f1-score': 0.8515881936638774, 'support': 620}, 'weighted avg': {'precision': 0.8517354187845771, 'recall': 0.8516129032258064, 'f1-score': 0.8515943710543596, 'support': 620}} 

{'eval_loss': 0.6935445666313171, 'eval_accuracy': 0.8516129032258064, 'eval_f1_macro': 0.8515881936638774, 'eval_accuracy_balanced': 0.8515801413126047, 'eval_f1_micro': 0.8516129032258064, 'eval_precision_macro': 0.8517558380443722, 'eval_recall_macro': 0.8515801413126047, 'eval_precision_micro': 0.8516129032258064, 'eval_recall_micro': 0.8516129032258064, 'eval_runtime': 2.805, 'eval_samples_per_second': 442.064, 'eval_steps_per_second': 5.

Next, I upload the model to Hugging Face (again following Laurer, 2023).

In [None]:
#model_path = "/content/hope_NLI_classifier"
#model = AutoModelForSequenceClassification.from_pretrained(model_path)
#tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, model_max_length=512)  # load the tokenizer that the Trainer saved together with the fine-tuned model
#repo_id = 'beja1996/NLI_hope_classification'
#model.push_to_hub(repo_id=repo_id, use_temp_dir=True, private=True, use_auth_token="")
#tokenizer.push_to_hub(repo_id=repo_id, use_temp_dir=True, private=True, use_auth_token="")


CommitInfo(commit_url='https://huggingface.co/beja1996/NLI_hope_classification/commit/720e25495e6906be311d8dc032650e0befc1fda3', commit_message='Upload tokenizer', commit_description='', oid='720e25495e6906be311d8dc032650e0befc1fda3', pr_url=None, pr_revision=None, pr_num=None)

### Evaluating the fine-tuned model

In [None]:
model = "beja1996/NLI_hope_classification"

dataset = one_shot_inference(hope_test, hope_hypotheses, model)

Initializing Tokenizer

Initializing Pipeline

Predicting ...
[{'sequence': '  jimmie johnson has 22 nascar cup wins on concrete tracks  if you count martinsville with concrete corners ', 'labels': ['The text does not express hope, wish, desire, or future-oriented expectation.', 'The text expresses hope, a future-oriented expectation, desire or wish towards a general or specific event.'], 'scores': [0.9994212985038757, 0.0005787022528238595]}, {'sequence': ' what else would you expect from nike  nike has supported everything concerning the woke generation.', 'labels': ['The text does not express hope, wish, desire, or future-oriented expectation.', 'The text expresses hope, a future-oriented expectation, desire or wish towards a general or specific event.'], 'scores': [0.9991124272346497, 0.0008875105413608253]}, {'sequence': 'i wish i had a good reason to want to replace my phone my galaxy s10e is a great phone, but i keep seeing all these foldable phones and great cameras and it makes me yearn for something more', 'labels': ['Th

# References
- Balouchzahi, F., Sidorov, G., & Gelbukh, A. (2023). PolyHope: Two-level hope speech detection from tweets. Expert Systems with Applications, 225, 120078. https://doi.org/10.1016/j.eswa.2023.120078
- Laurer, M. (2023, August 22). Fine-tuning BERT-NLI. Data Science Summer School 2023, Berlin. https://github.com/MoritzLaurer/summer-school-transformers-2023/blob/main/4_tune_bert_nli.ipynb
- Müller, S., & Proksch, S.-O. (2023). Nostalgia in European party politics: A text-based measurement approach. British Journal of Political Science, 1–13. https://doi.org/10.1017/S0007123423000571
- Müller, S., & Proksch, S.-O. (2023). PolNos: Political nostalgia in party manifestos \[Data set]. https://doi.org/10.7910/DVN/L198GI