
# Introduction to HLT 2023 Project (Template)

- Student(s) Name(s): Iiro Partanen
- Date: -
- Chosen Corpus: emotion
- Contributions (if group project):

### Corpus information

- Description of the chosen corpus: 
Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detailed information please refer to the paper.
- Paper(s) and other published materials related to the corpus: 
  - CARER: Contextualized Affect Representations for Emotion Recognition (Saravia et al., EMNLP 2018)
  - https://paperswithcode.com/dataset/emotion
- State-of-the-art performance (best published results) on this corpus: 95% f1

---

## 1. Setup

In [8]:
!pip3 install -q transformers datasets evaluate
!pip install trankit
!pip install optuna
import datasets
import sklearn.feature_extraction
import torch
import transformers
import numpy as np
import evaluate
import optuna

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting optuna
  Downloading optuna-3.1.1-py3-none-any.whl (365 kB)
Collecting alembic>=1.5.0
  Downloading alembic-1.10.3-py3-none-any.whl (212 kB)
Collecting cmaes>=0.9.1
  Downloading cmaes-0.9.1-py3-none-any.whl (21 kB)
Collecting colorlog
  Downloading colorlog-6.7.0-py2.py3-none-any.whl (11 kB)
Collecting Mako
  Downloading Mako-1.2.4-py3-none-any.whl (78 kB)
Installing collected packages: Mako, colorlog, cmaes, alembic, optuna

---

## 2. Data download and preprocessing

### 2.1. Download the corpus

In [9]:
dset = datasets.load_dataset("emotion")
# check it works
print(dset)




DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})


### 2.2. Preprocessing

In [10]:
# Vectorize one item into a list of feature ids.
def vectorize_item(item):
    vectorized = vectorizer.transform([item["text"]])  # `vectorizer` is initialized in the next cell
    # The matrix has a single row, so only the column indices of the nonzero entries are needed.
    non_zero_features = vectorized.nonzero()[1]
    non_zero_features += 1  # index 0 is reserved for padding, so shift every feature id by 1

    return {"input_ids": non_zero_features}

In [11]:
# NOTE: shuffle() returns a new DatasetDict rather than shuffling in place, so this call
# leaves dset in its original order; the Trainer shuffles the training batches itself.
dset.shuffle()

# vectorization
vectorizer = sklearn.feature_extraction.text.CountVectorizer(
    binary=True,
    max_features=20000,  # 20k features to start with
    token_pattern=r"(?u)\b\w+\b",  # token pattern that also keeps one-character words
)

texts = [item["text"] for item in dset["train"]]  # all texts from the training split
vectorizer.fit(texts)  # fit the vocabulary on the training data only

# vectorize the whole dataset
dset_tokenized = dset.map(vectorize_item, num_proc=4)
# check it works
print(dset_tokenized["train"][0])



{'text': 'i didnt feel humiliated', 'label': 0, 'input_ids': [3620, 4931, 6438, 6495]}
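
The `input_ids` shown above are sorted, de-duplicated vocabulary indices shifted by one. The following is a minimal standard-library sketch of that idea, not the scikit-learn implementation: the helper names `build_vocabulary` and `text_to_input_ids` and the toy corpus are hypothetical, and the `max_features` cut-off is not modelled.

```python
import re

# Same token pattern as the vectorizer above, so one-character words are kept.
TOKEN_PATTERN = re.compile(r"(?u)\b\w+\b")


def build_vocabulary(texts):
    """Map each distinct lowercased token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in TOKEN_PATTERN.findall(text.lower())})
    return {tok: idx for idx, tok in enumerate(tokens)}


def text_to_input_ids(text, vocabulary):
    """Return sorted, de-duplicated feature ids shifted by 1 (0 is kept for padding)."""
    columns = {
        vocabulary[tok]
        for tok in TOKEN_PATTERN.findall(text.lower())
        if tok in vocabulary  # tokens unseen during fitting are dropped
    }
    return [col + 1 for col in sorted(columns)]


toy_texts = ["i feel happy", "i feel so so sad"]  # hypothetical corpus
vocab = build_vocabulary(toy_texts)
print(vocab)
print(text_to_input_ids("so so happy and sad", vocab))  # [2, 4, 5]
```

Because the features are binary, the repeated word contributes a single id, and word order is lost: the model only sees which vocabulary entries occur.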


In [12]:
# padding and batching
def collator(list_of_items):
    all_labels = [item["label"] for item in list_of_items]  # labels of every item in the batch
    batch = {"labels": torch.tensor(all_labels)}  # label tensor for the batch
    tensors = []
    max_len = max(len(item["input_ids"]) for item in list_of_items)  # longest example in the batch; pad to this length
    for item in list_of_items:
        ids = torch.tensor(item["input_ids"])  # input ids as a tensor
        padded = torch.nn.functional.pad(ids, (0, max_len - ids.shape[0]))  # right-pad with zeros up to max_len
        tensors.append(padded)
    batch["input_ids"] = torch.vstack(tensors)  # all rows now have equal length, so stack them into a matrix
    return batch

# check it works
batch=collator([dset_tokenized["train"][2],dset_tokenized["train"][7]])
print("Shape of labels:",batch["labels"].shape)
print("Shape of input_ids:",batch["input_ids"].shape)
print(batch["labels"])
print(batch["input_ids"])
     

Shape of labels: torch.Size([2])
Shape of input_ids: torch.Size([2, 13])
tensor([3, 4])
tensor([[    1,  4931,  5739,  5800,  6495,  6563,  8456, 10157, 13643, 15061,
             0,     0,     0],
        [    1,    34,   749,  2680,  4931,  6495,  7076,  7712,  8076,  9245,
          9325, 13339, 15116]])
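
The padding step can be illustrated without tensors. This is a small sketch under the same convention as the collator (right-padding with id 0 up to the longest sequence in the batch); `pad_batch` is a hypothetical helper name and the id lists are made up.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence with pad_id up to the longest sequence in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]


batch = pad_batch([[1, 4931, 6495], [34]])  # hypothetical feature ids
print(batch)  # [[1, 4931, 6495], [34, 0, 0]]
```

Padding per batch rather than to a global maximum keeps the matrices small, which is why the example batch above is only 13 columns wide.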


---

## 3. Machine learning model

### 3.1. Model training

In [13]:
# Train the machine learning model on the training set and evaluate it on the validation set.

# PreTrainedModel needs a config class; an empty subclass is enough here.
class MLPConfig(transformers.PretrainedConfig):
    pass

# model
class MLP(transformers.PreTrainedModel):
    config_class = MLPConfig

    def __init__(self, config):
        super().__init__(config)
        self.vocab_size = config.vocab_size  # number of features in the vocabulary
        # Embedding matrix of shape (vocab_size + 1) x hidden_size; the extra row is the padding index 0.
        self.embedding = torch.nn.Embedding(num_embeddings=self.vocab_size + 1, embedding_dim=config.hidden_size, padding_idx=0)
        # Small uniform initial values. This also overwrites the padding row, which then stays at its initial value.
        torch.nn.init.uniform_(self.embedding.weight.data, -0.001, 0.001)
        self.output = torch.nn.Linear(in_features=config.hidden_size, out_features=config.nlabels)  # maps hidden_size to nlabels

    def forward(self, input_ids, labels=None):
        embedded = self.embedding(input_ids)  # look up one embedding per feature id
        embedded_summed = torch.sum(embedded, dim=1)  # sum across the word dimension
        projected = torch.tanh(embedded_summed)  # non-linearity
        logits = self.output(projected)  # apply the output layer

        if labels is not None:
            # with labels, return the cross-entropy loss together with the logits
            loss = torch.nn.CrossEntropyLoss()
            return (loss(logits, labels), logits)
        else:
            # without labels, return only the logits
            return (logits,)

# config
mlp_config = MLPConfig(vocab_size=len(vectorizer.vocabulary_), hidden_size=25, nlabels=6)
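
To make the forward pass concrete, here is a plain-Python sketch of the same computation for a single example: embedding lookup, sum, tanh, linear layer, then cross-entropy. The tiny weights are hypothetical, and the sketch assumes an all-zero padding row by simply skipping id 0.

```python
import math


def forward(input_ids, embedding, weights, bias):
    """Sum the embeddings of the ids, apply tanh, then one linear layer."""
    hidden_size = len(embedding[0])
    summed = [0.0] * hidden_size
    for idx in input_ids:
        if idx == 0:  # assumption: padding contributes nothing
            continue
        for j in range(hidden_size):
            summed[j] += embedding[idx][j]
    hidden = [math.tanh(v) for v in summed]
    # One row of weights per label, as in a linear layer with out_features = number of labels.
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits, label):
    """Negative log-probability of the gold label under a softmax over the logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


# Hypothetical toy parameters: 2 real features plus padding, hidden size 2, 2 labels.
toy_embedding = [[0.0, 0.0], [0.5, -0.5], [0.5, 0.5]]
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_bias = [0.0, 0.0]

logits = forward([1, 2, 0], toy_embedding, toy_weights, toy_bias)
print(logits)
print(cross_entropy(logits, 0))
```

Because the embeddings are summed, the model is a bag-of-words classifier: reordering the ids of an example does not change its logits.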

In [14]:
# training

# Set training arguments
trainer_args = transformers.TrainingArguments(
    "mlp_checkpoints", #save checkpoints here
    evaluation_strategy = "steps",
    logging_strategy = "steps",
    eval_steps = 500,
    logging_steps = 500,
    learning_rate = 1e-4, #learning rate of the gradient descent
    max_steps = 20000,
    load_best_model_at_end = True,
    per_device_train_batch_size = 128
)


# evaluation metric
accuracy = evaluate.load("accuracy")
def compute_accuracy(outputs_and_labels):
    outputs, labels = outputs_and_labels
    predictions = np.argmax(outputs, axis=-1)  # pick the index of the "winning" label
    return accuracy.compute(predictions=predictions, references=labels)

# build the model
mlp = MLP(mlp_config)
early_stopping = transformers.EarlyStoppingCallback(5)  # stop if the eval loss has not improved for 5 evaluations

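# trainer setup
trainer = transformers.Trainer(
    model = mlp,
    args = trainer_args,
    train_dataset = dset_tokenized["train"],
    # NOTE: this run monitors the test split during training; passing
    # dset_tokenized["validation"] here would keep the test set held out.
    eval_dataset = dset_tokenized["test"],
    compute_metrics = compute_accuracy,
    data_collator = collator,
    callbacks = [early_stopping]
)

# FINALLY!
trainer.train()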




Step,Training Loss,Validation Loss,Accuracy
500,1.6402,1.527431,0.457
1000,1.4466,1.394457,0.557
1500,1.2628,1.237796,0.6035
2000,1.0601,1.079695,0.679
2500,0.8701,0.936762,0.754
3000,0.7071,0.814198,0.805
3500,0.5743,0.713081,0.8365
4000,0.4693,0.631903,0.8575
4500,0.3875,0.567124,0.8635
5000,0.3238,0.516251,0.869


### 3.2 Hyperparameter optimization

In [28]:
# Used optuna for optimization

def objective(trial):
    # Define the search space for hyperparameters
    learning_rate = trial.suggest_float("learning_rate", 1e-10, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 64, 128, 256])
    # max_steps below takes precedence over num_train_epochs, so this value does not limit training.
    epochs = trial.suggest_int("num_train_epochs", low=2, high=10)

    # params
    trainer_args = transformers.TrainingArguments(
        "mlp_checkpoints",
        evaluation_strategy = "steps",
        logging_strategy = "steps",
        eval_steps = 500,
        logging_steps = 500,
        learning_rate = learning_rate,
        max_steps = 30000,
        load_best_model_at_end = True,
        per_device_train_batch_size = batch_size,
        per_device_eval_batch_size = batch_size,
        num_train_epochs = epochs
    )

    # the model
    mlp = MLP(mlp_config)

    # train a model
    trainer = transformers.Trainer(
        model = mlp,
        args = trainer_args,
        train_dataset = dset_tokenized["train"],
        eval_dataset = dset_tokenized["validation"],
        compute_metrics = compute_accuracy,
        data_collator = collator,
        callbacks = [early_stopping]
    )

    trainer.train()
    eval_results = trainer.evaluate()
    return eval_results["eval_accuracy"] # return the best result.

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)


[I 2023-04-17 14:01:23,657] A new study created in memory with name: no-name-1908305c-ec0c-4ddf-b696-bb956a232a8e


Step,Training Loss,Validation Loss,Accuracy
500,1.7844,1.780946,0.089
1000,1.7839,1.780847,0.089
1500,1.7846,1.78075,0.089
2000,1.7833,1.780658,0.089
2500,1.783,1.780564,0.089
3000,1.7845,1.780474,0.089
3500,1.7834,1.780386,0.089
4000,1.7838,1.780298,0.089
4500,1.7836,1.780212,0.089
5000,1.7832,1.780129,0.089


[I 2023-04-17 14:05:35,965] Trial 0 finished with value: 0.089 and parameters: {'learning_rate': 4.9906320835125987e-08, 'batch_size': 16, 'num_train_epochs': 8}. Best is trial 0 with value: 0.089.


Step,Training Loss,Validation Loss,Accuracy
500,1.7717,1.774695,0.275
1000,1.771,1.773991,0.275
1500,1.7702,1.773301,0.275
2000,1.7695,1.772623,0.275
2500,1.7688,1.771959,0.275
3000,1.7681,1.771309,0.275
3500,1.7674,1.77067,0.275
4000,1.7667,1.770044,0.275
4500,1.766,1.769429,0.275
5000,1.7654,1.768826,0.275


[I 2023-04-17 14:09:28,331] Trial 1 finished with value: 0.275 and parameters: {'learning_rate': 2.934274792592448e-07, 'batch_size': 64, 'num_train_epochs': 10}. Best is trial 1 with value: 0.275.


Step,Training Loss,Validation Loss,Accuracy
500,1.4806,1.263229,0.5695
1000,1.0134,0.796438,0.7945
1500,0.6114,0.511125,0.8595
2000,0.4574,0.380732,0.8845
2500,0.1813,0.344927,0.8865
3000,0.1593,0.337007,0.886
3500,0.1636,0.326885,0.8885
4000,0.1982,0.327794,0.8895
4500,0.077,0.330163,0.89
5000,0.0915,0.332646,0.8885


[I 2023-04-17 14:10:21,189] Trial 2 finished with value: 0.8885 and parameters: {'learning_rate': 0.001057766242934867, 'batch_size': 8, 'num_train_epochs': 2}. Best is trial 2 with value: 0.8885.


Step,Training Loss,Validation Loss,Accuracy
500,1.6528,1.557569,0.421
1000,1.5155,1.478774,0.5045
1500,1.4078,1.387329,0.562
2000,1.3443,1.282345,0.5915
2500,1.1559,1.165696,0.626
3000,1.0153,1.0502,0.664
3500,0.9078,0.935906,0.73
4000,0.8164,0.829964,0.7755
4500,0.6273,0.733473,0.8105
5000,0.5482,0.646778,0.836


[I 2023-04-17 14:13:02,280] Trial 3 finished with value: 0.8965 and parameters: {'learning_rate': 0.00022617875422998606, 'batch_size': 8, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.1725,0.696194,0.782
1000,0.5164,0.444882,0.855
1500,0.4088,0.401314,0.8735
2000,0.4135,0.372928,0.875
2500,0.1107,0.409445,0.8755
3000,0.1261,0.448448,0.8705
3500,0.1476,0.443551,0.871
4000,0.1969,0.442791,0.8665
4500,0.0556,0.434884,0.876


[I 2023-04-17 14:13:43,004] Trial 4 finished with value: 0.875 and parameters: {'learning_rate': 0.003631974679516256, 'batch_size': 8, 'num_train_epochs': 10}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.557,1.413534,0.54
1000,1.1315,1.042242,0.704
1500,0.7014,0.737277,0.838
2000,0.4263,0.553999,0.8715
2500,0.277,0.450857,0.885
3000,0.194,0.391892,0.891
3500,0.1441,0.358064,0.894
4000,0.1117,0.338601,0.894
4500,0.0893,0.327522,0.8945
5000,0.0731,0.321682,0.8945


[I 2023-04-17 14:15:14,234] Trial 5 finished with value: 0.893 and parameters: {'learning_rate': 0.00020740051647775046, 'batch_size': 128, 'num_train_epochs': 3}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.8371,1.831921,0.0405
1000,1.8371,1.831881,0.0405
1500,1.837,1.831842,0.0405
2000,1.837,1.831804,0.0405
2500,1.837,1.831767,0.0405
3000,1.8369,1.831731,0.0405
3500,1.8369,1.831697,0.0405
4000,1.8369,1.831664,0.0405
4500,1.8368,1.831632,0.0405
5000,1.8368,1.8316,0.0405


[I 2023-04-17 14:20:53,462] Trial 6 finished with value: 0.0405 and parameters: {'learning_rate': 8.284534785031437e-09, 'batch_size': 128, 'num_train_epochs': 9}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.7765,1.775763,0.275
1000,1.7722,1.771736,0.275
1500,1.768,1.767808,0.275
2000,1.764,1.76396,0.275
2500,1.76,1.76019,0.275
3000,1.756,1.756495,0.275
3500,1.7521,1.752861,0.275
4000,1.7484,1.749288,0.275
4500,1.7446,1.745784,0.275
5000,1.741,1.742354,0.275


[I 2023-04-17 14:24:55,013] Trial 7 finished with value: 0.3435 and parameters: {'learning_rate': 1.4314541109173222e-06, 'batch_size': 64, 'num_train_epochs': 5}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.7946,1.789262,0.352
1000,1.7918,1.789074,0.352
1500,1.7915,1.788888,0.352
2000,1.7941,1.788713,0.352
2500,1.7927,1.788534,0.352
3000,1.7921,1.788364,0.352
3500,1.7908,1.788194,0.352
4000,1.7934,1.788028,0.352
4500,1.7908,1.787864,0.352
5000,1.7928,1.787705,0.352


[I 2023-04-17 14:29:02,936] Trial 8 finished with value: 0.352 and parameters: {'learning_rate': 9.04778249508028e-08, 'batch_size': 16, 'num_train_epochs': 10}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.767,1.765709,0.275
1000,1.7567,1.755938,0.275
1500,1.7462,1.745985,0.275
2000,1.7355,1.735923,0.275
2500,1.7248,1.725899,0.275
3000,1.7142,1.716043,0.275
3500,1.7038,1.706395,0.275
4000,1.6937,1.697021,0.275
4500,1.6839,1.687958,0.275
5000,1.6744,1.679255,0.275


[I 2023-04-17 14:33:07,764] Trial 9 finished with value: 0.44 and parameters: {'learning_rate': 3.988447184908041e-06, 'batch_size': 64, 'num_train_epochs': 5}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.7918,1.788097,0.352
1000,1.7918,1.788096,0.352
1500,1.7917,1.788095,0.352
2000,1.7918,1.788095,0.352
2500,1.7917,1.788094,0.352
3000,1.7918,1.788093,0.352
3500,1.7916,1.788092,0.352
4000,1.7918,1.788091,0.352
4500,1.7919,1.78809,0.352
5000,1.7917,1.78809,0.352


[I 2023-04-17 14:41:37,670] Trial 10 finished with value: 0.352 and parameters: {'learning_rate': 2.513438826748237e-10, 'batch_size': 256, 'num_train_epochs': 7}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.628,1.516447,0.393
1000,1.3811,1.331546,0.5935
1500,1.1109,1.104042,0.7195
2000,0.8309,0.890409,0.7925
2500,0.6032,0.721724,0.8395
3000,0.4406,0.59816,0.8695
3500,0.3294,0.511297,0.8785
4000,0.2535,0.45034,0.8815
4500,0.2006,0.407789,0.886
5000,0.1626,0.377418,0.8905


[I 2023-04-17 14:43:36,490] Trial 11 finished with value: 0.8935 and parameters: {'learning_rate': 0.00013442363189269255, 'batch_size': 128, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,0.2351,0.69327,0.854
1000,0.0472,0.89312,0.826
1500,0.0611,1.033267,0.831
2000,0.0204,1.169804,0.838
2500,0.0107,1.318338,0.834
3000,0.0118,1.424035,0.828


[I 2023-04-17 14:44:18,639] Trial 12 finished with value: 0.854 and parameters: {'learning_rate': 0.08014739378695601, 'batch_size': 128, 'num_train_epochs': 3}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.7702,1.747414,0.276
1000,1.7142,1.688267,0.347
1500,1.6522,1.636931,0.4045
2000,1.6238,1.606063,0.4145
2500,1.5878,1.583518,0.424
3000,1.5546,1.566627,0.428
3500,1.5539,1.554403,0.45
4000,1.5524,1.543711,0.4545
4500,1.5206,1.533345,0.4705
5000,1.5033,1.523284,0.475


[I 2023-04-17 14:49:00,950] Trial 13 finished with value: 0.6225 and parameters: {'learning_rate': 3.247198353678358e-05, 'batch_size': 8, 'num_train_epochs': 2}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.6712,1.579809,0.3935
1000,1.5073,1.488832,0.514
1500,1.3869,1.390052,0.5575
2000,1.2479,1.279172,0.5935
2500,1.103,1.167695,0.6475
3000,0.962,1.06129,0.6965
3500,0.8348,0.963136,0.7445
4000,0.72,0.873983,0.7875
4500,0.6207,0.794798,0.8225
5000,0.5356,0.725159,0.8375


[I 2023-04-17 14:54:22,515] Trial 14 finished with value: 0.888 and parameters: {'learning_rate': 5.9438004985057305e-05, 'batch_size': 256, 'num_train_epochs': 4}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.8106,1.784389,0.0645
1000,1.7518,1.729073,0.299
1500,1.6955,1.680213,0.4235
2000,1.648,1.641141,0.468
2500,1.61,1.610448,0.4775
3000,1.5792,1.585591,0.4805
3500,1.5527,1.564152,0.4935
4000,1.5288,1.544739,0.5025
4500,1.5064,1.526637,0.517
5000,1.4849,1.509268,0.5255


[I 2023-04-17 15:00:04,330] Trial 15 finished with value: 0.6895 and parameters: {'learning_rate': 1.4325402809134029e-05, 'batch_size': 128, 'num_train_epochs': 4}. Best is trial 3 with value: 0.8965.


Step,Training Loss,Validation Loss,Accuracy
500,1.6215,1.530654,0.466
1000,1.4636,1.401087,0.5495
1500,1.305,1.255276,0.5805
2000,1.1942,1.106042,0.629
2500,0.932,0.950882,0.7075
3000,0.766,0.813789,0.7795
3500,0.6487,0.695751,0.819
4000,0.5672,0.602281,0.84
4500,0.3914,0.527942,0.8635
5000,0.3464,0.4705,0.8745


[I 2023-04-17 15:02:04,183] Trial 16 finished with value: 0.897 and parameters: {'learning_rate': 0.00030222282060019396, 'batch_size': 8, 'num_train_epochs': 2}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.0314,0.590868,0.8025
1000,0.5217,0.501362,0.8335
1500,0.4459,0.42225,0.86
2000,0.4391,0.385817,0.864
2500,0.1121,0.41424,0.871
3000,0.1323,0.458516,0.86
3500,0.1452,0.454536,0.862
4000,0.1825,0.433976,0.863
4500,0.0572,0.454064,0.8675


[I 2023-04-17 15:02:44,099] Trial 17 finished with value: 0.864 and parameters: {'learning_rate': 0.009219158703585834, 'batch_size': 8, 'num_train_epochs': 6}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.567,1.4342,0.517
1000,1.2815,1.131895,0.6265
1500,0.9627,0.842768,0.7565
2000,0.7443,0.619271,0.848
2500,0.3829,0.467317,0.8765
3000,0.2783,0.39367,0.88
3500,0.2365,0.349195,0.8875
4000,0.2448,0.329108,0.8855
4500,0.1252,0.317438,0.8935
5000,0.1277,0.312018,0.891


[I 2023-04-17 15:04:05,369] Trial 18 finished with value: 0.889 and parameters: {'learning_rate': 0.0006405462321738785, 'batch_size': 8, 'num_train_epochs': 3}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.8308,1.832173,0.047
1000,1.8209,1.819201,0.074
1500,1.8082,1.805602,0.1105
2000,1.7928,1.792761,0.1745
2500,1.7797,1.779891,0.2135
3000,1.7651,1.766871,0.249
3500,1.7544,1.754849,0.281
4000,1.7469,1.743368,0.305
4500,1.7314,1.731958,0.3175
5000,1.7183,1.720882,0.3285


[I 2023-04-17 15:08:48,453] Trial 19 finished with value: 0.4045 and parameters: {'learning_rate': 6.173736507853478e-06, 'batch_size': 8, 'num_train_epochs': 4}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.0346,0.587889,0.8075
1000,0.5228,0.498535,0.8355
1500,0.4493,0.416598,0.8645
2000,0.4333,0.387432,0.8665
2500,0.1119,0.425607,0.869
3000,0.1284,0.462981,0.866
3500,0.1493,0.434842,0.8695
4000,0.1843,0.430073,0.866
4500,0.0537,0.453127,0.872


[I 2023-04-17 15:09:29,496] Trial 20 finished with value: 0.8665 and parameters: {'learning_rate': 0.008917991956124449, 'batch_size': 8, 'num_train_epochs': 6}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.6985,1.59581,0.3995
1000,1.5218,1.499563,0.4795
1500,1.4033,1.399878,0.5415
2000,1.2642,1.284112,0.5885
2500,1.1127,1.164215,0.6415
3000,0.9648,1.049388,0.699
3500,0.8296,0.944389,0.7555
4000,0.7106,0.850504,0.801
4500,0.6083,0.768485,0.8265
5000,0.5218,0.697333,0.85


[I 2023-04-17 15:13:01,523] Trial 21 finished with value: 0.8915 and parameters: {'learning_rate': 7.276281204930603e-05, 'batch_size': 128, 'num_train_epochs': 2}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.5843,1.482877,0.5
1000,1.3807,1.280882,0.59
1500,1.1427,1.054418,0.686
2000,0.9618,0.844812,0.787
2500,0.6073,0.654383,0.8455
3000,0.4469,0.527091,0.869
3500,0.355,0.440411,0.8825
4000,0.3245,0.389483,0.8835
4500,0.1876,0.358491,0.8915
5000,0.177,0.338405,0.8925


[I 2023-04-17 15:14:31,158] Trial 22 finished with value: 0.8955 and parameters: {'learning_rate': 0.00046234681859413685, 'batch_size': 8, 'num_train_epochs': 2}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.5294,1.386838,0.53
1000,1.2164,1.058049,0.687
1500,0.876,0.756416,0.783
2000,0.6635,0.546834,0.8585
2500,0.3219,0.417664,0.881
3000,0.2417,0.362778,0.8855
3500,0.2136,0.329241,0.8905
4000,0.2271,0.315994,0.8885
4500,0.1147,0.309054,0.897
5000,0.1185,0.307153,0.8925


[I 2023-04-17 15:15:49,412] Trial 23 finished with value: 0.893 and parameters: {'learning_rate': 0.0006522654638573655, 'batch_size': 8, 'num_train_epochs': 3}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.8161,1.800987,0.137
1000,1.7729,1.753534,0.2835
1500,1.7225,1.70684,0.3375
2000,1.6861,1.670819,0.3635
2500,1.6488,1.641337,0.379
3000,1.6136,1.617902,0.3885
3500,1.6046,1.601738,0.3945
4000,1.599,1.589596,0.4035
4500,1.5735,1.579045,0.409
5000,1.5562,1.57021,0.4055


[I 2023-04-17 15:20:39,398] Trial 24 finished with value: 0.553 and parameters: {'learning_rate': 2.100821325528695e-05, 'batch_size': 8, 'num_train_epochs': 2}. Best is trial 16 with value: 0.897.


Step,Training Loss,Validation Loss,Accuracy
500,1.6277,1.54177,0.457
1000,1.488,1.439613,0.535
1500,1.3567,1.3227,0.563
2000,1.2688,1.19702,0.599
2500,1.0454,1.060819,0.6425
3000,0.8884,0.932017,0.7265
3500,0.7716,0.810781,0.7795
4000,0.6799,0.706122,0.8225
4500,0.4905,0.617233,0.853
5000,0.4279,0.543167,0.8645


[I 2023-04-17 15:22:59,374] Trial 25 finished with value: 0.898 and parameters: {'learning_rate': 0.00027335616184497616, 'batch_size': 8, 'num_train_epochs': 3}. Best is trial 25 with value: 0.898.


Step,Training Loss,Validation Loss,Accuracy
500,0.3947,0.320282,0.891
1000,0.0295,0.373681,0.8905
1500,0.0154,0.427017,0.8845
2000,0.0112,0.483711,0.8755
2500,0.0088,0.529741,0.8795
3000,0.0075,0.579022,0.875


[I 2023-04-17 15:23:53,751] Trial 26 finished with value: 0.891 and parameters: {'learning_rate': 0.0023267575021394746, 'batch_size': 256, 'num_train_epochs': 3}. Best is trial 25 with value: 0.898.


Step,Training Loss,Validation Loss,Accuracy
500,1.6471,1.531157,0.4395
1000,1.4636,1.406473,0.5585
1500,1.2745,1.250837,0.5985
2000,1.1202,1.089853,0.659
2500,0.8976,0.939447,0.733
3000,0.7627,0.80499,0.7935
3500,0.5875,0.687043,0.8345
4000,0.4924,0.59517,0.8515
4500,0.3768,0.522343,0.863
5000,0.331,0.467501,0.8795


[I 2023-04-17 15:25:51,325] Trial 27 finished with value: 0.8985 and parameters: {'learning_rate': 0.000216232716038129, 'batch_size': 16, 'num_train_epochs': 4}. Best is trial 27 with value: 0.8985.


Step,Training Loss,Validation Loss,Accuracy
500,1.6926,1.614051,0.4135
1000,1.571,1.54914,0.46
1500,1.503,1.504785,0.4915
2000,1.4692,1.456803,0.5415
2500,1.387,1.404245,0.555
3000,1.3401,1.348633,0.571
3500,1.2547,1.289192,0.585
4000,1.1903,1.231338,0.6045
4500,1.1044,1.171133,0.6175
5000,1.0498,1.113245,0.6505


[I 2023-04-17 15:29:57,872] Trial 28 finished with value: 0.888 and parameters: {'learning_rate': 8.80238136501297e-05, 'batch_size': 16, 'num_train_epochs': 5}. Best is trial 27 with value: 0.8985.


Step,Training Loss,Validation Loss,Accuracy
500,1.7598,1.751294,0.275
1000,1.7341,1.724635,0.275
1500,1.7047,1.697113,0.275
2000,1.6793,1.672441,0.276
2500,1.6498,1.649031,0.2925
3000,1.6336,1.629883,0.327
3500,1.6112,1.613315,0.348
4000,1.5956,1.59966,0.375
4500,1.5785,1.588036,0.392
5000,1.5734,1.578531,0.404


[I 2023-04-17 15:34:00,334] Trial 29 finished with value: 0.5375 and parameters: {'learning_rate': 1.4428728455953327e-05, 'batch_size': 16, 'num_train_epochs': 4}. Best is trial 27 with value: 0.8985.
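
The search space draws the learning rate on a log scale between 1e-10 and 1e-1. The sketch below illustrates what uniform sampling in log space means and how much of that range lies below a given threshold. It is not Optuna's sampler (the default sampler adapts to earlier trials), and the helper names are hypothetical.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def fraction_below(threshold, low, high):
    """Share of a log-uniform range that falls below the threshold."""
    return (math.log10(threshold) - math.log10(low)) / (math.log10(high) - math.log10(low))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-10, 1e-1, rng) for _ in range(5)]
print([f"{s:.2e}" for s in samples])
print(round(fraction_below(1e-5, 1e-10, 1e-1), 3))  # 0.556
```

More than half of this range lies below 1e-5, and the trials logged above with learning rates in that region barely reduce the loss within the logged steps. Raising the lower bound would spend fewer trials there.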


### 3.3. Evaluation on test set

In [145]:
# Evaluate the final model.
# NOTE: this call scores the validation split; pass dset_tokenized["test"] to report test-set performance.
eval_results = trainer.evaluate(dset_tokenized["validation"])

print(eval_results)

{'eval_loss': 0.33677756786346436, 'eval_accuracy': 0.8895, 'eval_runtime': 1.293, 'eval_samples_per_second': 1546.794, 'eval_steps_per_second': 193.349, 'epoch': 112.0}


Having fun :)

In [58]:
# creates a dataset from a list of texts, with a dummy label (0) for each text
def create_example(texts):
    labels = [0] * len(texts)  # placeholder labels; only the predictions are used
    data = {
        "text": texts,
        "label": labels,
    }
    return datasets.Dataset.from_dict(data)

In [151]:
# Testing random sentences that could be tweeted to see what they give as the result :)
texts = ['wow last night was so cool would recommend to everyone',
         'jesus the concert last night was so bad',
         'cant stand to stay at the school with these annoying people',
         'just got an F on the test',
         'just got an A on the test',
         'wow i never tought this day would come',
         'I just feel so useless every day',
         'wow finally feel like people really like me',
         'Just came from a date and it was so fun. He seems so nice and I can\'t wait to call him again',
         'i have been having a really good time with my friends and i feel like they really do care about me'  # NOTE: no comma after this string, so Python joins it with the next one (14 texts in total)
         'wait what just happened. omd I can\'t believe my eyes',
         'i was really suprised with the results',
         'tomorrow i have to go meet with my ex boyfriend. I just can\'t stand him anymore hes such an asshole',
         'he just suddenly fell infront of me I couldn\'t believe it',
         'I love him so much that I can\'t wait to marry him'];
labels = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
example = create_example(texts)  # dataset holding the example texts
example_tokenized = example.map(vectorize_item, num_proc=4)  # tokenize
predictions = trainer.predict(example_tokenized).predictions  # predict once for all texts
for text, prediction in zip(texts, predictions):
    print(text)
    largest = max(prediction)  # largest raw logit (not a probability)
    labelOfLargest = labels[list(prediction).index(largest)]  # name of the winning label
    print(f"{labelOfLargest} with confidence of: {largest}\n")


wow last night was so cool would recommend to everyone


joy with confidence of: 4.247760772705078

jesus the concert last night was so bad


sadness with confidence of: 2.9032840728759766

cant stand to stay at the school with these annoying people


anger with confidence of: 0.6968567371368408

just got an F on the test


fear with confidence of: 1.23690664768219

just got an A on the test


joy with confidence of: 0.9200850129127502

wow i never tought this day would come


sadness with confidence of: 1.5450146198272705

I just feel so useless every day


sadness with confidence of: 3.573556900024414

wow finally feel like people really like me


joy with confidence of: 1.3926676511764526

Just came from a date and it was so fun. He seems so nice and I can't wait to call him again


joy with confidence of: 1.4374252557754517

i have been having a really good time with my friends and i feel like they really do care about mewait what just happened. omd I can't believe my eyes


joy with confidence of: 2.719771385192871

i was really suprised with the results


joy with confidence of: 1.0176184177398682

tomorrow i have to go meet with my ex boyfriend. I just can't stand him anymore hes such an asshole


fear with confidence of: 1.5093954801559448

he just suddenly fell infront of me I couldn't believe it


sadness with confidence of: 0.9660069346427917

I love him so much that I can't wait to marry him


joy with confidence of: 1.9221845865249634
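
The "confidence" values printed above are the largest raw logits, so they are not bounded and do not sum to one. A softmax turns a logit vector into probabilities. The sketch below uses a made-up logit vector, since the full six-value outputs are not shown in the notebook.

```python
import math

LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


hypothetical_logits = [0.1, 4.2, 1.0, -0.5, -1.2, -0.8]  # illustrative values only
probs = softmax(hypothetical_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"{LABELS[best]} with probability {probs[best]:.3f}")
```

The predicted label is the same as with the raw logits, because softmax preserves their order; only the reported score changes.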



---

## 4. Results and summary

### 4.1 Corpus insights

(Briefly discuss what you learned about the corpus and its annotation)

The corpus consists of English Twitter messages annotated with six basic emotions: anger, fear, joy, love, sadness, and surprise.

According to the paper, the tweets were selected using a set of hashtags and then annotated. The hashtags used are listed in the paper.


### 4.2 Results

(Briefly summarize your results)

### 4.3 Relation to state of the art

(Compare your results to the state-of-the-art performance)
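
The state-of-the-art figure in the corpus information is an F1 score, while the runs in this notebook report accuracy, so a like-for-like comparison needs F1 computed from the model's predictions. Below is a minimal sketch of macro-averaged F1 in plain Python. Whether the published figure is macro- or micro-averaged is not stated here, so the averaging choice is an assumption, and the label lists are toy data.

```python
def macro_f1(references, predictions, num_labels):
    """Unweighted mean of per-class F1 scores; a class with no support or predictions scores 0."""
    scores = []
    for label in range(num_labels):
        pairs = list(zip(references, predictions))
        tp = sum(1 for r, p in pairs if r == label and p == label)
        fp = sum(1 for r, p in pairs if r != label and p == label)
        fn = sum(1 for r, p in pairs if r == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_labels


toy_references = [0, 0, 1, 1, 2, 2]   # hypothetical gold labels
toy_predictions = [0, 1, 1, 1, 2, 0]  # hypothetical model outputs
print(round(macro_f1(toy_references, toy_predictions, num_labels=3), 4))  # 0.6556
```

Macro averaging weights every emotion equally, so rare classes affect the score more than they affect accuracy.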

---

## 5. Bonus Task (optional)

### 5.1. Annotating out-of-domain documents

(Briefly describe the chosen out-of-domain documents)

(Briefly describe the process of annotation)

### 5.2 Conversion into dataset

In [19]:
# Your code to convert the annotations into a dataset here

### 5.3. Model evaluation on out-of-domain test set

In [20]:
# Your code to evaluate the model on the out-of-domain test set here

### 5.4 Bonus task results

(Present the results of the evaluation on the out-of-domain test set)

### 5.5. Annotated data

In [21]:
# Include your annotated out-of-domain data here