## Results with Improved Prompting + Best Model
| method | model_name      | acc (no info)    | f1 (no info)    | acc (definition) | f1 (definition) |
|-|-------------------|--------|-------|-|-|
|Classification|bert-base-cased (last)|0.522 | 0.5218 | 0.5188 | 0.5188 |
|Classification|bert-base-cased (best) | 0.5418 | 0.5355 | 0.545 | 0.5328 |
|Classification|roberta-base (last) | 0.5471 | 0.5498 | 0.5596 | 0.5619 |
|Classification|roberta-base (best)  | 0.5837 | 0.5785 | 0.5879 | 0.5873 |
||||||
|Masking|bert-base-cased (last)  | 0.4498 | 0.4522 | 0.4393 | 0.4348 |
|Masking|bert-base-cased (best)  | 0.4801 | 0.4867 | 0.4728 | 0.4895 |
|Masking|roberta-base (last) | 0.0481 | 0.0918 | 0.0575 | 0.1088 |
|Masking|roberta-base (best)  | 0.522 | 0.5246 | 0.5073 | 0.522 |

<br>

---

<br>

## Results of Sentence Pair Classification (Best Model Only)
| Emotion Range | Model Name | Prompting Method | acc | f1 |
|-|-|-|-|-|
|full|bert-base-cased|no info|0.5387|0.5309|
|full|bert-base-cased|idiom hint|0.5513|0.5459|
|full|bert-base-cased|idiom hint + definition|0.5262|0.5184|
|full|roberta-base|no info|0.59|0.5852|
|**full**|**roberta-base**|**idiom hint**|**0.59**|**0.5855**|
|full|roberta-base|idiom hint + definition|0.5701|0.5693|
|||||
|reduced|bert-base-cased|no info|0.8002|0.7976|
|reduced|bert-base-cased|idiom hint|0.795|0.7947|
|reduced|bert-base-cased|idiom hint + definition|0.7992|0.797|
|reduced|roberta-base|no info|0.8243|0.8219|
|reduced|roberta-base|idiom hint|0.8211|0.8218|
|reduced|roberta-base|idiom hint + definition|0.8149|0.8154|

<br>

### For the B-Sentence:
**idiom hint**: "This sentence includes the idiomatic expression [IDIOM]"


**idiom hint + definition**: "This sentence includes the idiomatic expression [IDIOM]. The definition of this idiom is [DEFINITION]"
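
The two templates above can be sketched as a small standalone helper. This is a minimal sketch, not the notebook's own code: `build_text_b` is a hypothetical name, and unlike the data-preparation cell further down it strips a trailing period from the definition and treats `None`, NaN, and empty strings as "no definition".

```python
import math


def build_text_b(idiom, definition=None):
    """Build the B-sentence prompt for one example (hypothetical helper)."""
    text_b = f'This sentence includes the idiomatic expression "{idiom}".'

    # Assumption: None, NaN (what pandas yields for empty cells), and blank
    # strings all mean that no definition is available.
    missing = (
        definition is None
        or (isinstance(definition, float) and math.isnan(definition))
        or not str(definition).strip()
    )
    if not missing:
        # Drop a trailing period so the template does not end in "..".
        cleaned = str(definition).strip().rstrip(".")
        text_b += f' The definition of this idiom is "{cleaned}."'
    return text_b


# "idiom hint" only
print(build_text_b("Dutch courage"))

# "idiom hint + definition"
print(build_text_b(
    "find it in one's heart",
    "To feel compassionate, especially in order to forgive someone "
    "or to be willing to help them in some way.",
))
```

Keeping the template in one function makes it easier to switch between the "no info", "idiom hint", and "idiom hint + definition" settings without editing the data-loading loop.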

In [None]:
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import torch
import copy

In [2]:
id2emotion = {
    0: 'Anger',
    1: 'Resentment',
    2: 'Frustration',
    3: 'Hate',
    4: 'Disgust',
    5: 'Boredom',
    6: 'Reluctance',
    7: 'Sadness',
    8: 'Pity',
    9: 'Loneliness',
    10: 'Humiliation',
    11: 'Longing',
    12: 'Envy',
    13: 'Guilt',
    14: 'Regret',
    15: 'Shame',
    16: 'Fear',
    17: 'Anxiety',
    18: 'Doubt',
    19: 'Desperation',
    20: 'Confusion',
    21: 'Shock',
    22: 'Pleasure',
    23: 'Serenity',
    24: 'Relief',
    25: 'Happiness',
    26: 'Lust',
    27: 'Affection',
    28: 'Gratitude',
    29: 'Admiration',
    30: 'Pride',
    31: 'Determination',
    32: 'Fascination',
    33: 'Surprise',
    34: 'Excitement',
    35: 'Hope'
}
emotion2id = {val: key for key, val in id2emotion.items()}

In [3]:
idiom_lexicon = pd.read_csv('../../../dataset/idiom_lexicon.csv')
idiom_lexicon.head()

Unnamed: 0,Idiom,WiktionaryURL,Pos,Neg,Neu,Inapprop.,Total,%Pos,%Neg,%Neu,sentiment,FilterOut(X),definition,idiom_id
0,American Dream,https://en.wiktionary.org/wiki/American_Dream,8,0,2,0,10,0.8,0.0,0.2,positive,,A widespread determination by Americans to pro...,0.0
1,Catch-22,https://en.wiktionary.org/wiki/Catch-22,0,7,3,0,10,0.0,0.7,0.3,negative,,A difficult situation from which there is no e...,1.0
2,Christmas present,https://en.wiktionary.org/wiki/Christmas_present,6,0,4,0,10,0.6,0.0,0.4,positive,,,2.0
3,Downing Street,https://en.wiktionary.org/wiki/Downing_Street,0,0,10,0,10,0.0,0.0,1.0,other,,,3.0
4,Dutch courage,https://en.wiktionary.org/wiki/Dutch_courage,2,2,6,0,10,0.2,0.2,0.6,other,,The courage or bravado induced by alcohol. An ...,4.0


In [4]:
wrong_emotion_ids = []

In [5]:
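def make_df_from_csv(filepath):
    """Build sentence-pair rows (text_a, text_b) with emotion labels from a dataset CSV.

    Rows whose emotion is not in `emotion2id` are skipped.
    """
    dataset = pd.read_csv(filepath, index_col=0)
    idiom, text_a, text_b, emotion, emotion_id = [], [], [], [], []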
    for i, row in dataset.iterrows():

        if row['emotion'] not in emotion2id.keys():
            if 'eval' in filepath:
                wrong_emotion_ids.append(i)
            continue
        idiom.append(row['idiom'])

        if row['sentence'][0] == ' ':
            full_sent = row['sentence'][1:]
        else:
            full_sent = row['sentence']

        try:
            idiom_info = idiom_lexicon[idiom_lexicon['Idiom'] == dataset.loc[i]['idiom']]['Idiom'].values[0]
        except IndexError:
            idiom_info = "."
        full_sent_b = f'This sentence includes the idiomatic expression \"{idiom_info}\".'
        # full_sent_b = 'This sentence may or may not contain an idiomatic expression.'

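        # Files with 'eval' in their path are matched by idiom string, all others by idiom_id.
        # Note: neither 'idem_train.csv' nor 'idem_test.csv' contains 'eval',
        # so both files loaded in this notebook take the else branch.
        if 'eval' in filepath:
            definition = idiom_lexicon[idiom_lexicon['Idiom'] == dataset.loc[i]['idiom']]['definition'].values
        else:
            definition = idiom_lexicon[idiom_lexicon['idiom_id'] == dataset.loc[i]['idiom_id']]['definition'].values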

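        # Skip missing definitions: empty lexicon cells are read as NaN and would otherwise be rendered as "nan".
        if len(definition) > 0 and pd.notna(definition[0]):
            full_sent_b += f' The definition of this idiom is "{definition[0]}."'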

        text_a.append(full_sent)
        text_b.append(full_sent_b)
        emotion.append(row['emotion'])
        emotion_id.append(emotion2id[row['emotion']])



    df = pd.DataFrame({
        'idiom': idiom,
        'text_a': text_a,
        'text_b': text_b,
        'emotion': emotion,
        'emotion_id': emotion_id,
    })

    return df

In [6]:
train_df = make_df_from_csv('../../../dataset/idem_train.csv')
eval_df = make_df_from_csv('../../../dataset/idem_test.csv')

In [13]:
# Remove training rows whose sentence also appears in the eval set (train/test leakage).
eval_sentences = set(eval_df['text_a'])
indices = [i for i, row in train_df.iterrows() if row['text_a'] in eval_sentences]

train_df = train_df.drop(index=indices).reset_index(drop=True)

print(len(indices))
eval_df.head()

0


Unnamed: 0,idiom,text_a,text_b,emotion,emotion_id
0,jot down,The detective jotted down clues with fascinati...,This sentence includes the idiomatic expressio...,Fascination,32
1,find it in one's heart,Can you find it in your heart to give this poo...,This sentence includes the idiomatic expressio...,Pity,8
2,business as usual,"Even after winning the lottery, Jane returned ...",This sentence includes the idiomatic expressio...,Anxiety,17
3,close to home,When the speaker discussed the struggles of si...,This sentence includes the idiomatic expressio...,Gratitude,28
4,fourth wall,Whenever the main character in the novel broke...,This sentence includes the idiomatic expressio...,Excitement,34


In [14]:
st_train_df = pd.DataFrame({
    'text_a': train_df['text_a'],
    'text_b': train_df['text_b'],
    'labels': train_df['emotion_id'],
})
st_eval_df = pd.DataFrame({
    'text_a': eval_df['text_a'],
    'text_b': eval_df['text_b'],
    'labels': eval_df['emotion_id'],
})

In [15]:
print(f'Length Training Data: {len(st_train_df)}')
print(f'Length Eval Data: {len(st_eval_df)}')

Length Training Data: 8729
Length Eval Data: 956


In [19]:
print(st_eval_df['text_b'][1])

This sentence includes the idiomatic expression "find it in one's heart". The definition of this idiom is "To feel compassionate, especially in order to forgive someone or to be willing to help them in some way.."


In [None]:
model_args = ClassificationArgs(
    num_train_epochs=1,  # one epoch per train_model() call; the training loop calls it repeatedly
    # evaluate_during_training=True,
    overwrite_output_dir=True,
    save_eval_checkpoints=True,
    train_batch_size=16,
    eval_batch_size=16,
)
model = ClassificationModel(
    'roberta',
    'roberta-base',
    args=model_args,
    num_labels=len(emotion2id.keys()),
    use_cuda=torch.cuda.is_available()
)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
from sklearn.metrics import f1_score, accuracy_score

best_model = None
best_acc = 0

# The sentence pairs and gold labels are the same for every epoch, so build them once.
eval_pairs = st_eval_df[['text_a', 'text_b']].values.tolist()
eval_labels = st_eval_df['labels'].to_list()

for i in range(10):
    # num_train_epochs is 1, so each call trains for one more epoch.
    model.train_model(st_train_df)

    predictions, raw_outputs = model.predict(eval_pairs)
    acc = accuracy_score(eval_labels, predictions)
    f1 = f1_score(eval_labels, predictions, average='weighted')

    # Report and snapshot only the epochs that improve on the best accuracy so far.
    if acc > best_acc:
        print(f'Epoch: {i+1}')
        print(f'Acc: {round(acc, 4)}')
        print(f'F1: {round(f1, 4)}')
        best_model = copy.deepcopy(model)
        best_acc = acc


Epoch: 1
Acc: 0.5063
F1: 0.4751



Epoch: 2
Acc: 0.5669
F1: 0.5587



Epoch: 3
Acc: 0.5879
F1: 0.582
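
The loop above prints only when accuracy strictly improves, which is why an epoch without a new best produces no `Epoch:` line. The selection logic can be isolated as follows. This is a minimal sketch with a hypothetical `ToyModel` stand-in (not simpletransformers), and the scores passed to it are made-up illustration values, not results from this notebook.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a trainable model (illustration only)."""

    def __init__(self, scores):
        self._scores = list(scores)  # accuracy "reached" after each epoch
        self.epochs_trained = 0

    def train_one_epoch(self):
        self.epochs_trained += 1

    def evaluate(self):
        return self._scores[self.epochs_trained - 1]


def train_keep_best(model, num_epochs):
    """Train epoch by epoch and keep a snapshot of the best-scoring state."""
    best_model, best_acc, history = None, 0.0, []
    for _ in range(num_epochs):
        model.train_one_epoch()
        acc = model.evaluate()
        history.append(acc)
        if acc > best_acc:  # strict improvement only
            # Snapshot now: later epochs keep mutating `model` in place.
            best_model = copy.deepcopy(model)
            best_acc = acc
    return best_model, best_acc, history


model = ToyModel([0.51, 0.57, 0.59, 0.58, 0.55])  # illustrative values
best, best_acc, history = train_keep_best(model, 5)
print(best.epochs_trained, best_acc)
```

The deep copy matters because training continues on the same object; without it, `best_model` would simply alias the last epoch's weights. For large models the copy costs memory, so saving a checkpoint to disk is a common alternative.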



In [None]:
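# st_eval_df has no 'text' column; sentence-pair models expect [text_a, text_b] pairs.
predictions, raw_outputs = best_model.predict(st_eval_df[['text_a', 'text_b']].values.tolist())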


In [None]:
from sklearn.metrics import f1_score, accuracy_score

acc = accuracy_score(st_eval_df['labels'].to_list(), predictions)
f1 = f1_score(st_eval_df['labels'].to_list(), predictions, average='weighted')

print(f'Accuracy Score: {round(acc, 4)}')
print(f'F1 Score: {round(f1, 4)}')

Accuracy Score: 0.5785
F1 Score: 0.5767
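
The F1 values in this notebook are computed with `average='weighted'`, so they are support-weighted means of per-class F1 scores rather than a single global F1. The sketch below spells out that computation in plain Python. `weighted_f1` is a hypothetical helper written for illustration, and the labels are toy values rather than dataset rows. To my understanding it should agree with scikit-learn's weighted average on the same inputs.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because n >= 1.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


# Toy labels (not taken from the dataset).
y_true = [0, 0, 1, 2]
y_pred = [0, 1, 1, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 4), round(weighted_f1(y_true, y_pred), 4))
```

Classes that appear only in the predictions have zero support and so contribute nothing to the weighted mean, which is why iterating over the gold labels is sufficient here.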
