# Twitter Emotion Classification

In this project we will identify the primary emotion expressed in a tweet. We will be using SMILE dataset for this task. The project will be tested using 2 methodologies. The first method used for this task is Rule-based method, the second method is ML based Random Forest and the thrid method is using DistilBERT Transformer.

#### Random Forest:

Random Forest is an ensemble ML model that uses multiple decision trees algorithm at the same time during training . For classification task, the ouput class chosen by majority of decision trees will be selected as the final output class for the random forest. We choose this model for testing because Random Forest gives comparatively accurate and faster results than other algorithms as it grows multiple decision trees for giving output. It also works well while using larger datasets and it is capable of maintaining accuracy even if a large proportion of dataset is missing.  

#### DistilBERT Transformer:

BERT (Bi-Directional Encoder Representation Transformer) is a transformer model in NLP which calculates embeddings from the dataset that can be used for achieving various other NLP based tasks. **DistilBERT** is a derivative of BERT that reduces the model size upto 40% eventually giving faster results. Transformers are used in ML to reduce the training time as they enable simultaneous sequence processing through parallelization.



# Installing required libraries


In the below cell, first we will be downloading dataset from an opensource website using wget. Along with that we will install emoji library to extract the meaning of emojis used in the tweets. Along with that we will install nltk library that provides a wide range of suite of programs for symbolic ans statistical NLP for English language. The punkt module in nltk is an unsupervised training model.

In [1]:
!wget -O data.csv "https://figshare.com/ndownloader/files/4988956"
!pip install emoji

import nltk
nltk.download('punkt')

--2023-09-22 22:57:43--  https://figshare.com/ndownloader/files/4988956
Resolving figshare.com (figshare.com)... 52.215.42.80, 54.217.150.101, 2a05:d018:1f4:d000:646e:611b:a755:ea07, ...
Connecting to figshare.com (figshare.com)|52.215.42.80|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4988956/smileannotationsfinal.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIYCQYOYV5JSSROOA/20230922/eu-west-1/s3/aws4_request&X-Amz-Date=20230922T225744Z&X-Amz-Expires=10&X-Amz-SignedHeaders=host&X-Amz-Signature=3c00c03eeea546c4038f146a60f6eb1ec109fe5cd5eacca36a2e19cd08b1298f [following]
--2023-09-22 22:57:44--  https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4988956/smileannotationsfinal.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIYCQYOYV5JSSROOA/20230922/eu-west-1/s3/aws4_request&X-Amz-Date=20230922T225744Z&X-Amz-Expires=10&X-Amz-SignedHeaders=host&X-Amz-Signature=3c00c03eeea546c4038

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## Task 1. Data Cleaning, Preprocessing, and splitting
The `data` environment contains the SMILE dataset loaded into a pandas dataframe object. Our dataset has three columns: id, tweet, and label. The `tweet` column contains the raw scraped tweet and the `label` column contains the annotated emotion category. Each tweet is labelled with one of the following emotion labels:
- 'nocode', 'not-relevant'
- 'happy', 'happy|surprise', 'happy|sad'
- 'angry', 'disgust|angry', 'disgust'
- 'sad', 'sad|disgust', 'sad|disgust|angry'
- 'surprise'

### Task 1a. Label Consolidation
As we can see above the annotated categories are complex. Several tweets express complex emotions like (e.g. 'happy|sad') or multiple emotions (e.g. 'sad|disgust|angry'). The first things we need to do is clean up our dataset by removing complex examples and consolidating others so that we have a clean set of emotions to predict.

For Task 1a., we will write a code which do the following:
1. Drops all rows which have the label "happy|sad", "happy|surprise", 'sad|disgust|angry', and 'sad|angry'.
2. Re-label 'nocode' and 'not-relevant' as 'no-emotion'.
3. Re-label 'disgust|angry' and 'disgust' as 'angry'.
4. Re-label 'sad|disgust' as 'sad'.

In [2]:
import pandas as pd

data = pd.read_csv("https://figshare.com/ndownloader/files/4988956", names=['id', 'tweet', 'label'])

data = data.loc[~data.label.isin(["happy|sad", "happy|surprise", "sad|disgust|angry", "sad|angry"])]

data.loc[(data.label == 'not-relevant'), 'label']='no-emotion'
data.loc[(data.label == 'nocode'), 'label']='no-emotion'
data.loc[(data.label == 'disgust|angry'), 'label']='angry'
data.loc[(data.label == 'disgust'), 'label']='angry'
data.loc[(data.label == 'sad|disgust'), 'label']='sad'
data.loc[(data.label == 'sad'), 'label']='sad'

type(data)

pandas.core.frame.DataFrame

### Task 1b. Tweet Cleaning and Processing
Raw tweets are noisy. Consider the example below:
```
'@tateliverpool #BobandRoberta: I am angry more artists that have a profile are not speaking up #foundationcourses. 😠'
```
The mention @tateliverpool and hashtag #BobandRoberta are extra noise that don't directly help with understanding the emotion of the text. The accompanying emoji can be useful but needs to be decoded to it text form :angry: first.

For this task we will perform the following preprocessing steps:
1. Lower case all text
2. De-emoji the text
3. Remove all hashtags, mentions, and urls
4. Remove all non-alphabet characters except the followng punctuations: period, exclamation mark, and question mark

In [3]:
import emoji
import re

def preprocess_tweet(tweet: str) -> str:
    # Lower case the tweet
    tweet = tweet.lower()

    #remove emoji
    tweet = emoji.demojize(tweet)

    # Remove all hashtags, mentions, and URLs
    tweet = re.sub(r'\#\w+|\@\w+|https?://\S+', '', tweet)

    # Remove all non-alphabet characters except period, exclamation mark, and question mark
    tweet = re.sub(r'[^a-z\s.!?\s+]+', '', tweet)

    # Remove extra blank spaces from the test
    tweet = re.sub('\s+', ' ', tweet).strip()

    return tweet


test_tweet = "'@tateliverpool #BobandRoberta: I am angry more artists that have a profile are not speaking up! #foundationcourses 😠'"
print(preprocess_tweet(test_tweet))

i am angry more artists that have a profile are not speaking up! angryface


In [4]:
# Create new column with cleaned tweets. We will use this for the subsequent tasks
data["cleaned_tweet"] = data["tweet"].apply(preprocess_tweet)

### Task 1c. Generating Evaluation Splits
Finally, we need to split our data into a train, validation, and test set. We will split the data using a 60-20-20 split, where 60% of our data is used for training, 20% for validation, and 20% for testing. As the dataset is heaviliy imbalanced, we will use stratify parameter to ensure that the label distributions across the three splits are roughly equal.

In [5]:
from sklearn.model_selection import train_test_split

# Your code here

train, test = train_test_split(data, test_size=0.4, stratify=data['label'], random_state=2023)
test, val = train_test_split(test, test_size=0.2, stratify=test['label'], random_state=2023)

In [6]:
type(train)

pandas.core.frame.DataFrame

## Task 2: Naive Baseline Using a Rule-based Classifier

Now that we have a dataset, let's work on developing some solutions for emotion classification. We'll start with implementing a simple rule-based classifier which will also serve as our naive baseline. Emotive language (e.g. awesome, feel great, super happy) can be a strong signal as to the overall emotion being by the tweet. For each emotion in our label space (happy, surprised, sad, angry) we will generate a set of words and phrases that are often associated with that emotion. At classification time, the classifier will calculate a score based on the overlap between the words in the tweet and the emotive words and phrases for each of the emotions. The emotion label with the highest overlap will be selected as the prediction and if there is no match the "no-emotion" label will be predicted. We can break the implementation of this rules-based classifier into three steps:
1. Emotive language extraction from train examples
2. Developing a scoring algorithm
3. Building the end-to-end classification flow

### Task 2a. Emotive Language Extraction
For this task we will generate a set of unigrams and bigrams that will be used to predict each of the labels. Using the training data we will need to extract all the unique unigrams and bigrams associated with each label (excluding no-emotion). Then we should ensure that the extracted terms for each emotion label do not appear in the other lists. In the real world, we would then manually curate the generated lists to ensure that associated words were useful and emotive.

In [7]:
from typing import List
from nltk.util import ngrams

# Function to extract unigrams and bigrams from examples
def extract_words(examples: List[str]):
    """
    Given a list of tweets, return back the unigrams and bigrams found
    across all the tweets.
    """
    word_set = set()
    for example in examples:
      # print(example)
      unigrams = list(ngrams(example.split(), n=1))
      bigrams = list(ngrams(example.split(), n=2))
      word_set.update(unigrams)
      word_set.update(bigrams)
    return word_set

# Extract unique unigrams and bigrams for each label
happy_words = extract_words(train[train["label"] == "happy"]["cleaned_tweet"].values.tolist())
surprise_words = extract_words(train[train["label"] == "surprise"]["cleaned_tweet"].values.tolist())
sad_words = extract_words(train[train["label"] == "sad"]["cleaned_tweet"].values.tolist())
angry_words = extract_words(train[train["label"] == "angry"]["cleaned_tweet"].values.tolist())

# print(happy_words)


# Remove any words that appear in more than one list

happy_words = happy_words.difference(surprise_words, sad_words, angry_words)
surprise_words = surprise_words.difference(happy_words, sad_words, angry_words)
sad_words = sad_words.difference(surprise_words, happy_words, angry_words)
angry_words = angry_words.difference(surprise_words, sad_words, happy_words)

### Task 2b. Scoring using set overlaps

Next we will implement to scoring algorithm. Our score will simply be the count of overlapping terms between tweet text and emotive terms.

In [8]:
def score_tweet(tweet: str, emotive_words: List[str]) -> int:
    tweet_words = set(tweet.split())
    emotive_words_set = set(emotive_words)
    score = len(tweet_words & emotive_words_set)
    return score

### 2c. Rule-based classification
Let put together our rules-based classfication system. Given a tweet, `simple_clf` will generate the overlap score
for each of emotion labels and return the emotion label with the highest score. If there is no match amongst the emotions, the classifier will return 'no-emotion'.

In [9]:
def simple_clf(tweet: str) -> str:
    """
    Given a tweet, calculate all the emotion overlap scores.
    Return the emotion label which has the largest score. If
    overlap score is 0, return no-emotion.
    """

    # Your code here
    happy_score = score_tweet(tweet, happy_words)
    surprise_score = score_tweet(tweet, surprise_words)
    sad_score = score_tweet(tweet, sad_words)
    angry_score = score_tweet(tweet, angry_words)

    scores = [('happy', happy_score), ('surprise', surprise_score), ('sad', sad_score), ('angry', angry_score)]
    scores.sort(key=lambda x: x[1], reverse=True)

    if scores[0][1] == 0:
        return 'no-emotion'
    else:
        return scores[0][0]

After finishing the above section, let's evaluate how our model did.

In [10]:
from sklearn.metrics import classification_report

preds = test["cleaned_tweet"].apply(simple_clf)
print(classification_report(test["label"], preds))

              precision    recall  f1-score   support

       angry       0.00      0.00      0.00        23
       happy       0.00      0.00      0.00       364
  no-emotion       0.58      1.00      0.74       571
         sad       0.00      0.00      0.00        11
    surprise       0.00      0.00      0.00        11

    accuracy                           0.58       980
   macro avg       0.12      0.20      0.15       980
weighted avg       0.34      0.58      0.43       980



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


## Task 3. Machine learning w/ grammar augmented features

Now that we have a naive baseline, let's build a more sophisticated solution using machine learning. Up to this point, we have only considered the words in the tweet as our primary features. The rules-based approach is a very simple bag-of-words classifier. Can we improve performance if we provide some additional linguistic knowledge?

For Task 3 we will do the following:
- Generate part-of-speech features our tweets
- Train two different machine learning classifiers, one with linguistic features and one without
- Evaluate the trained models on the test set

### Task 3a. Grammar Augmented Feature Generation
For this task, we will be generating part-of-speech tags for each token in our tweet. Additionally we'll lemmatize the text as well. We will directly include the POS information by appending the tag to the lemma of word itself. For example:
```
Raw Tweet: I am very angry with the increased prices.
POS Augmented Tweet: I-PRP be-VBP very-RB angry-JJ with-IN the-DT increase-VBN price-NNS .-.
```

In [11]:
import spacy
from tqdm.notebook import tqdm
nlp = spacy.load("en_core_web_sm")

def generate_pos_features(tweet: str) -> str:
    """
    Given a tweet, return the lemmatized tweet augmented
    with POS tags.
    E.g.:
    Input: "cats are super cool."
    output: "cat-NNS be-VBP super-RB cool-JJ .-."
    """

    doc = nlp(tweet)
    pos_features = []
    for token in doc:
        lemma = token.lemma_
        pos = token.tag_
        pos_features.append(f"{lemma}-{pos}")

    return " ".join(pos_features)

sample_tweet = "I am very angry with the increased prices."
generate_pos_features(sample_tweet)

'I-PRP be-VBP very-RB angry-JJ with-IN the-DT increase-VBN price-NNS .-.'

In [12]:
train["tweet_with_pos"] = train["cleaned_tweet"].apply(generate_pos_features)
test["tweet_with_pos"] = test["cleaned_tweet"].apply(generate_pos_features)

### Task 3b. Model Training
Next we will train two seperate RandomForest Classifier models. For this task you will generate two sets of input features using the `TfidfVectorizer`. We generate Tfidf statistic on the`cleaned_tweet` and the `tweet_with_pos` columns.

Once we've generated features, we will train two different Random Forest classifiers with the generated features and generate the predictions on the test set for each classifier.

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Create TfidfVectorizer instances for cleaned_tweet and tweet_with_pos
cleaned_tweet_vectorizer = TfidfVectorizer()
pos_tweet_vectorizer = TfidfVectorizer()

# Generate features for cleaned_tweet
cleaned_tweet_train_features = cleaned_tweet_vectorizer.fit_transform(train["cleaned_tweet"])
cleaned_tweet_test_features = cleaned_tweet_vectorizer.transform(test["cleaned_tweet"])

# Generate features for tweet_with_pos
pos_tweet_train_features = pos_tweet_vectorizer.fit_transform(train["tweet_with_pos"])
pos_tweet_test_features = pos_tweet_vectorizer.transform(test["tweet_with_pos"])

# Train Random Forest Classifier on cleaned_tweet features
cleaned_tweet_clf = RandomForestClassifier(random_state=42)
cleaned_tweet_clf.fit(cleaned_tweet_train_features, train["label"])

# Train Random Forest Classifier on tweet_with_pos features
pos_tweet_clf = RandomForestClassifier(random_state=42)
pos_tweet_clf.fit(pos_tweet_train_features, train["label"])

# Generate predictions on the test set for each classifier
cleaned_tweet_predictions = cleaned_tweet_clf.predict(cleaned_tweet_test_features)
pos_tweet_predictions = pos_tweet_clf.predict(pos_tweet_test_features)


### Task 3c.
Generating classification reports for both models.

In [14]:
from sklearn.metrics import classification_report

# Classification Report for Tfidf features
print("Classification report for TFIDF features")
# Your code here

print(classification_report(test["label"], cleaned_tweet_predictions))

# Classfication Report for POS features
print("Classification report for TFIDF w/ POS features")
# Your code here

print(classification_report(test["label"], pos_tweet_predictions))

Classification report for TFIDF features
              precision    recall  f1-score   support

       angry       0.33      0.09      0.14        23
       happy       0.81      0.72      0.76       364
  no-emotion       0.79      0.89      0.84       571
         sad       0.00      0.00      0.00        11
    surprise       0.00      0.00      0.00        11

    accuracy                           0.79       980
   macro avg       0.39      0.34      0.35       980
weighted avg       0.77      0.79      0.77       980

Classification report for TFIDF w/ POS features
              precision    recall  f1-score   support

       angry       0.33      0.09      0.14        23
       happy       0.77      0.71      0.74       364
  no-emotion       0.78      0.87      0.82       571
         sad       0.00      0.00      0.00        11
    surprise       0.00      0.00      0.00        11

    accuracy                           0.77       980
   macro avg       0.38      0.33      0.3

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


### Evaluating Results

Looking at the accuracy score we can conclude that both the model performance was very similar with accuracy score for TFIDF features (0.79) being slightly better than accuracy score of TFIDF features with POS Tagging (0.76). Both models have higher precision, recall, and f1 score for no-emotion category. After that, the second best scores generated were for Happy emotion. Angry emotion didn't have good results with sad emotions at 0 results being the worst emotion to detect. Here, general TFIDF model performed slightly better than model with POS tagging but with good quality data TFIDF POS tagging model can outperform general TFIDF model.

## Task 4. Transfer Learning with DistilBERT

For this task we will finetune a pretrained language model (DistilBERT) using the huggingface `transformers` library. For this task we will need to:
- Encode the tweets using the BERT tokenizer
- Create pytorch datasets for for the train, val and test datasets
- Finetune the distilbert model for 5 epochs
- Extract predictions from the model's output logits and convert them into the emotion labels.
- Generate a classification report on the predictions.

In [15]:
!pip install transformers[torch] >> NULL
!pip install accelerate -U >> NULL

In [16]:
from sklearn.preprocessing import LabelEncoder
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import Trainer
from transformers import TrainingArguments
import numpy as np

# Encode the tweets using the BERT tokenizer

# 1. Load Label Encoder
le = LabelEncoder()

# 2. Fit the label encoder to the label in our dataset
le.fit(train["label"])

# 3. Create a new column with encoded labels
train["encoded_label"] = le.transform(train["label"])
val["encoded_label"] = le.transform(val["label"])
test["encoded_label"] = le.transform(test["label"])

# Validate the mapping:
train.groupby(["label", "encoded_label"]).aggregate("count")


# Create pytorch datasets for for the train, val and test datasets

# le.inverse_transform([1,4,6])

train_labels = torch.tensor(train["encoded_label"].tolist())
val_labels = torch.tensor(val["encoded_label"].tolist())
test_labels = torch.tensor(test["encoded_label"].tolist())

# text encoding
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Encoding a sentence
sent = "The quick brown fox jumped over the lazy dog."
print(f"Tokenizer output a dictionary: {tokenizer(sent)}")

# We can also decode ids to vocabulary
print(tokenizer.decode([101, 1996, 4248, 2829, 4419, 5598, 2058, 1996, 13971, 3899, 1012, 102]))


train_encodings = tokenizer(
    train["tweet"].tolist(),
    padding=True,           # pad all inputs to max length
    max_length=24,         # Bert max is 512, we choose 24 for computational efficiency
    return_tensors="pt",    # Return format pytorch tensor
    truncation=True
)

train_encodings.keys()

print(train_encodings)

val_encodings = tokenizer(
    val["tweet"].tolist(),
    padding=True,           # pad all inputs to max length
    max_length=24,         # Bert max is 512, we choose 24 for computational efficiency
    return_tensors="pt",    # Return format pytorch tensor
    truncation=True
)

test_encodings = tokenizer(
    test["tweet"].tolist(),
    padding=True,           # pad all inputs to max length
    max_length=24,         # Bert max is 512, we choose 24 for computational efficiency
    return_tensors="pt",    # Return format pytorch tensor
    truncation=True
)

# Define Custom Class for DistilBert Inputs
class RelationDataset(Dataset):

    def __init__(self, encodings: dict):
        self.encodings = encodings

    def __len__(self) -> int:
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx: int) -> dict:
        e = {k: v[idx] for k,v in self.encodings.items()}
        return e


# Update encodings with labels
train_encodings["labels"] = train_labels
val_encodings["labels"] = val_labels
test_encodings["labels"] = test_labels

# Generate Datasets
train_ds = RelationDataset(train_encodings)
val_ds = RelationDataset(val_encodings)
test_ds = RelationDataset(test_encodings)


print(train_ds[:2])


model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=7)



# Finetune the distilbert model for 5 epochs

# Freeze embeddings
for name, param in model.distilbert.embeddings.named_parameters():
    param.requires_grad = False
    print(name, param.requires_grad)

# Freeze layers 1-4
freeze_layers = [1,2,3,4]
for name, param in model.distilbert.transformer.layer.named_parameters():
    if int(name[0]) in freeze_layers:
        param.requires_grad = False
        print(name, param.requires_grad)



training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    lr_scheduler_type='cosine',
    per_device_train_batch_size = 32,
    per_device_eval_batch_size = 32,
)

trainer = Trainer(
    model,
    training_args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
)

trainer.train()

# Extract predictions from the model's output logits and convert them into the emotion labels.


preds = trainer.predict(test_ds)
print(preds)


preds = le.inverse_transform(np.argmax(preds.predictions, axis=1))
print(classification_report(test["label"].tolist(), preds))

Tokenizer output a dictionary: {'input_ids': [101, 1996, 4248, 2829, 4419, 5598, 2058, 1996, 13971, 3899, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
[CLS] the quick brown fox jumped over the lazy dog. [SEP]
{'input_ids': tensor([[  101,  2197,  2733,  ...,  4371,  1035,   102],
        [  101,  2156,  2035,  ...,  8299,  1024,   102],
        [  101,  1012,  2129,  ...,  1012,  2522,   102],
        ...,
        [  101,  2202,  1001,  ...,  1010, 11204,   102],
        [  101,  2054,  1037,  ..., 22949,  4939,   102],
        [  101, 14534,  2509,  ...,  1037,  2866,   102]]), 'attention_mask': tensor([[1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1],
        ...,
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1]])}
{'input_ids': tensor([[  101,  2197,  2733,  1024,  1001,  5033,  2696,  2100,  1030, 13970,
          7946, 11106,  4313,  1030,  9902,  3669,  6299, 168

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


word_embeddings.weight False
position_embeddings.weight False
LayerNorm.weight False
LayerNorm.bias False
1.attention.q_lin.weight False
1.attention.q_lin.bias False
1.attention.k_lin.weight False
1.attention.k_lin.bias False
1.attention.v_lin.weight False
1.attention.v_lin.bias False
1.attention.out_lin.weight False
1.attention.out_lin.bias False
1.sa_layer_norm.weight False
1.sa_layer_norm.bias False
1.ffn.lin1.weight False
1.ffn.lin1.bias False
1.ffn.lin2.weight False
1.ffn.lin2.bias False
1.output_layer_norm.weight False
1.output_layer_norm.bias False
2.attention.q_lin.weight False
2.attention.q_lin.bias False
2.attention.k_lin.weight False
2.attention.k_lin.bias False
2.attention.v_lin.weight False
2.attention.v_lin.bias False
2.attention.out_lin.weight False
2.attention.out_lin.bias False
2.sa_layer_norm.weight False
2.sa_layer_norm.bias False
2.ffn.lin1.weight False
2.ffn.lin1.bias False
2.ffn.lin2.weight False
2.ffn.lin2.bias False
2.output_layer_norm.weight False
2.output_laye

Epoch,Training Loss,Validation Loss
1,No log,0.736995
2,No log,0.551421
3,No log,0.524863
4,No log,0.504686
5,No log,0.50383


PredictionOutput(predictions=array([[-0.47031346,  3.5176709 ,  1.0031201 , ..., -0.29815102,
        -2.2166636 , -1.7575206 ],
       [-0.91213435,  4.6522493 ,  0.12920898, ..., -0.91044736,
        -2.0345085 , -1.6052629 ],
       [-0.86305004,  4.449284  ,  0.22161375, ..., -0.6610231 ,
        -2.0758698 , -1.6845105 ],
       ...,
       [-0.40037066,  2.4958737 ,  3.6499915 , ..., -0.92709213,
        -2.9040177 , -2.4778655 ],
       [-0.4378421 ,  2.8242404 ,  3.911037  , ..., -1.0357256 ,
        -2.8988397 , -2.6958158 ],
       [-0.8423688 ,  4.2201877 ,  0.25866452, ..., -0.5178934 ,
        -2.0381544 , -1.570565  ]], dtype=float32), label_ids=array([1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1,
       1, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2,
       1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2,
       2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2,
       2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


## Task 5. Model Recommendation

I would recommend to use DistilBERT model among other as this model has given the highest accuracy score of all ranging to 0.82. The accuracy score obtained after Rules-based model were 0.79 where as accuracy score obtained after machine learning w/POS features were 0.76.

Rule based models are a good choice for small, easy data. They execute really fast. But these models are specific to certain tasks and domain and might not perform better for other purposes. While they did give good accuracy but again it always depends upon datasets used.

Machine Learning models are not restricted to domains and they perform better when incorporated with labelled data by identifying relationships and patterns. If good data is provided and good pre-processing is carried out they can do very well. Also support POS tagging leading to better performace. Although, because they perform better on labelled data these models are not much flexible and pre-processing needs to be of top quality to obtain good results. Overfitting issues are very common in Machine Learning models.

I believe DistilBERT to be the best choice to use especially when doing tasks such as opinion mining or emotion detection. DistilBERT model provides State Of The Art performance and can work well on unsupervised data as well. Although it takes a lot of time eventually making it expensive than other models. Also, to get optimal performance we need to provide huge number of data. It is also more complex than other 2 models. Requires more computational resources than other 2 models but still a good choice. Although for this dataset it only performed well in case of 'no-emotion' and 'happy' data unlike the other two models. But good pre-processing and good data quality can improve the results.