<a href="https://colab.research.google.com/github/Vibhuarvind/Intelligent_Model_For_emotion_recogniton_in_text/blob/main/BERT_for_ISEAR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# install ktrain on Google Colab
!pip3 install ktrain


In [None]:
import pandas as pd
import numpy as np

import ktrain
from ktrain import text

## 1. Import Data

In [None]:
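# Load the training and validation splits from separate files
# (the training file name data_train.csv is assumed; reading data_test.csv twice would train and validate on the same data)
data_train = pd.read_csv('/content/data_train.csv', encoding='utf-8')
data_test = pd.read_csv('/content/data_test.csv', encoding='utf-8')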

X_train = data_train.Text.tolist()
X_test = data_test.Text.tolist()

y_train = data_train.Emotion.tolist()
y_test = data_test.Emotion.tolist()

data = pd.concat([data_train, data_test], ignore_index=True)

class_names = ['joy', 'sadness', 'fear', 'anger', 'neutral']

print('size of training set: %s' % (len(data_train['Text'])))
print('size of validation set: %s' % (len(data_test['Text'])))
print(data.Emotion.value_counts())


data.head(10)

size of training set: 3393
size of validation set: 3393
Emotion
joy        1414
anger      1386
fear       1358
sadness    1352
neutral    1276
Name: count, dtype: int64


| Unnamed: 0 | Emotion | Text |
|---|---|---|
| 0 | sadness | I experienced this emotion when my grandfather... |
| 1 | neutral | when I first moved in , I walked everywhere .... |
| 2 | anger | ` Oh ! " she bleated , her voice high and rath... |
| 3 | fear | However , does the right hon. Gentleman recogn... |
| 4 | sadness | My boyfriend didn't turn up after promising th... |
| 5 | neutral | It's freezing . |
| 6 | sadness | That ’ s not all ! I also had to finish writi... |
| 7 | anger | I don't have a warrant . |
| 8 | neutral | I guess so . |
| 9 | sadness | I was just robbed ! |
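The training and validation sets above both report 3393 examples, so it is worth confirming that the two splits really are distinct before training. A minimal sketch with pandas; the helper `split_overlap` and the toy frames are hypothetical, and the `Text` column name follows the CSV files used above.

```python
import pandas as pd


def split_overlap(train_df: pd.DataFrame, test_df: pd.DataFrame, col: str = "Text") -> float:
    """Return the fraction of test rows whose `col` value also appears in the training set."""
    train_texts = set(train_df[col])
    in_train = test_df[col].isin(train_texts)
    return float(in_train.mean())


# Hypothetical toy splits; in the notebook these would be data_train and data_test.
train = pd.DataFrame({"Emotion": ["joy", "fear"], "Text": ["I passed!", "A noise downstairs."]})
test = pd.DataFrame({"Emotion": ["joy", "anger"], "Text": ["I passed!", "He lied to me."]})

print(split_overlap(train, test))  # 0.5: one of the two test texts is also in train
print(split_overlap(test, test))   # 1.0: identical splits overlap completely
```

A value near 1.0 for `split_overlap(data_train, data_test)` would mean the validation scores say little about unseen text.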


In [None]:
encoding = {
    'joy': 0,
    'sadness': 1,
    'fear': 2,
    'anger': 3,
    'neutral': 4
}

# Convert each label to its integer index (same order as class_names)
y_train = [encoding[x] for x in y_train]
y_test = [encoding[x] for x in y_test]
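Hand-typing the mapping risks it drifting out of sync with `class_names`, whose order must line up with these integers for reports and predictions to be labeled correctly. A minimal sketch that derives both directions of the mapping from the list instead; `index_to_label` and `y_example` are names introduced here for illustration.

```python
class_names = ['joy', 'sadness', 'fear', 'anger', 'neutral']

# Derive label -> index from the list order, so the two can never disagree.
encoding = {name: idx for idx, name in enumerate(class_names)}
# Inverse mapping, handy for turning predicted indices back into labels.
index_to_label = dict(enumerate(class_names))

y_example = ['fear', 'joy', 'neutral']
y_encoded = [encoding[label] for label in y_example]
print(y_encoded)                               # [2, 0, 4]
print([index_to_label[i] for i in y_encoded])  # ['fear', 'joy', 'neutral']
```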

## 2. Data preprocessing

* BERT expects its input to be tokenized with its own WordPiece vocabulary. Setting `preprocess_mode='bert'` makes ktrain do this, and the pretrained BERT model and vocabulary are downloaded automatically.

* BERT accepts sequences of at most 512 tokens; a shorter `maxlen` of 350 is used here to reduce memory use and speed up training.
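One way to pick a value below 512 is to look at how long the texts actually are. A rough sketch, assuming whitespace word counts as a proxy for token counts; BERT's WordPiece tokenizer typically produces more tokens than there are words, so treat the result as a lower bound. The helper `suggest_maxlen` and its defaults are hypothetical; the sample texts are complete rows from the data preview above.

```python
import numpy as np


def suggest_maxlen(texts, percentile=95, special_tokens=2, cap=512):
    """Suggest a sequence length covering `percentile`% of texts by word count.

    Whitespace word counts only approximate WordPiece token counts, so the
    result is a rough lower bound. `special_tokens` reserves room for [CLS] and [SEP].
    """
    lengths = np.array([len(t.split()) for t in texts])
    estimate = int(np.ceil(np.percentile(lengths, percentile))) + special_tokens
    return min(estimate, cap)  # never exceed BERT's 512-token limit


texts = ["It's freezing .", "I guess so .", "I was just robbed !", "I don't have a warrant ."]
print(suggest_maxlen(texts))  # 8
```

In the notebook this would be called as `suggest_maxlen(X_train)`.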

In [None]:
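(x_train, y_train), (x_test, y_test), preproc = text.texts_from_array(
    x_train=X_train, y_train=y_train,
    x_test=X_test, y_test=y_test,
    class_names=class_names,
    preprocess_mode='bert',  # BERT tokenization; downloads the pretrained model and vocabulary
    maxlen=350,              # truncate or pad every text to 350 tokens
    max_features=35000)      # vocabulary cap; only used by preprocess_mode='standard'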

downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
[██████████████████████████████████████████████████]
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en


Is Multi-Label? False
preprocessing test...
language: en


task: text classification


## 3. Training and validation


Load a pretrained BERT model for text classification.

In [None]:
model = text.text_classifier('bert', train_data=(x_train, y_train), preproc=preproc)

Is Multi-Label? False
maxlen is 350




done.


Wrap the model and data in a `Learner` object.

In [None]:
learner = ktrain.get_learner(model, train_data=(x_train, y_train),
                             val_data=(x_test, y_test),
                             batch_size=6)

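Train the model with the one-cycle policy at a maximum learning rate of 2e-5 for 3 epochs. For more on choosing the learning rate, see ktrain's [tutorial on tuning learning rates](https://github.com/amaiya/ktrain/blob/master/tutorial-02-tuning-learning-rates.ipynb).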
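The one-cycle policy used here raises the learning rate toward its maximum (2e-5 in this run) and then lowers it again within a single cycle spanning all epochs. The sketch below is a generic triangular version for illustration only; the starting value `max_lr / div_factor` and the purely linear shape are assumptions, not a description of ktrain's internal schedule.

```python
import math


def one_cycle_lr(step: int, total_steps: int, max_lr: float, div_factor: float = 10.0) -> float:
    """Triangular one-cycle schedule (illustrative, not ktrain's exact implementation).

    The rate climbs linearly from max_lr / div_factor to max_lr over the first half
    of training, then descends linearly back to max_lr / div_factor.
    """
    min_lr = max_lr / div_factor
    half = total_steps / 2
    # Position on the triangle: 0 at both ends, 1 at the midpoint.
    frac = step / half if step <= half else (total_steps - step) / half
    return min_lr + frac * (max_lr - min_lr)


# 3 epochs x ceil(3393 / 6) batches per epoch, matching the run logged in this notebook.
total_steps = 3 * math.ceil(3393 / 6)
for step in (0, total_steps // 4, total_steps // 2, total_steps):
    print(step, f"{one_cycle_lr(step, total_steps, 2e-5):.2e}")
```

The low starting rate keeps early updates from disturbing the pretrained weights too much, and the final decay lets training settle.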

In [None]:
learner.fit_onecycle(2e-5, 3)



begin training using onecycle policy with max lr of 2e-05...
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.src.callbacks.History at 0x7b51d04f6800>

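Validation. Note that the run recorded below used identical training and validation sets (both report 3393 examples, and each class count in the combined data is exactly twice its support here), so these scores measure fit to the training data rather than generalization to unseen text.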

In [None]:
learner.validate(val_data=(x_test, y_test), class_names=class_names)

              precision    recall  f1-score   support

         joy       0.98      0.98      0.98       707
     sadness       0.99      0.98      0.98       676
        fear       0.99      0.99      0.99       679
       anger       0.98      0.98      0.98       693
     neutral       0.96      0.98      0.97       638

    accuracy                           0.98      3393
   macro avg       0.98      0.98      0.98      3393
weighted avg       0.98      0.98      0.98      3393



array([[693,   0,   0,   1,  13],
       [  0, 662,   0,   6,   8],
       [  3,   2, 671,   2,   1],
       [  6,   4,   2, 676,   5],
       [  5,   1,   2,   2, 628]])
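The classification report and the confusion matrix describe the same predictions. Rows of the matrix are true classes and columns are predicted classes (each row sums to that class's support in the report), so recall is the diagonal divided by row sums and precision is the diagonal divided by column sums. A short NumPy sketch that recomputes the report's per-class scores and accuracy from the matrix printed above:

```python
import numpy as np

class_names = ['joy', 'sadness', 'fear', 'anger', 'neutral']
# Confusion matrix from the validation run above (rows: true class, columns: predicted class).
cm = np.array([[693,   0,   0,   1,  13],
               [  0, 662,   0,   6,   8],
               [  3,   2, 671,   2,   1],
               [  6,   4,   2, 676,   5],
               [  5,   1,   2,   2, 628]])

support = cm.sum(axis=1)                   # true examples per class
recall = np.diag(cm) / support             # correct / all true examples of the class
precision = np.diag(cm) / cm.sum(axis=0)   # correct / all predictions of the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.trace(cm) / cm.sum()

for name, p, r, f, s in zip(class_names, precision, recall, f1, support):
    print(f"{name:>8}  {p:.2f}  {r:.2f}  {f:.2f}  {s}")
print(f"accuracy  {accuracy:.2f}")
```

For example, `neutral` has the lowest precision (628 / 655 ≈ 0.96) because 13 `joy` and 8 `sadness` texts were predicted as `neutral`.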

#### Testing with other inputs

In [None]:
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.get_classes()

['joy', 'sadness', 'fear', 'anger', 'neutral']

In [None]:
import time

message = 'I just broke up with my boyfriend'

start_time = time.time()
prediction = predictor.predict(message)

# Print the predicted label and the elapsed time in seconds
print('predicted: {} ({:.2f})'.format(prediction, (time.time() - start_time)))

predicted: sadness (0.17)
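A single `time.time()` measurement is noisy, and the first call to a freshly loaded model can include one-off setup cost. A minimal sketch of a more careful measurement using `time.perf_counter`, a warm-up call, and the median of several runs; `time_call` is a hypothetical helper, and in this notebook `fn` would be `predictor.predict`.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, warmup=1):
    """Return the median wall-clock time in seconds of fn(*args) over `repeats` runs."""
    for _ in range(warmup):
        fn(*args)  # discard warm-up runs (e.g. lazy initialisation)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; in the notebook: time_call(predictor.predict, message)
elapsed = time_call(sorted, list(range(100_000)), repeats=3)
print(f"median: {elapsed:.4f} s")
```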


## 4. Saving the BERT model


In [None]:
# let's save the predictor for later use
predictor.save("models/bert_model")



Done! To reload the predictor, use `ktrain.load_predictor`.

In [None]:
# Load the saved predictor (model plus preprocessing) from disk
predictor = ktrain.load_predictor('/content/models/bert_model')

# Index -> label mapping, taken from the predictor so it always matches the training order
emotion_labels = predictor.get_classes()

# Example sentences for testing
test_data = [
    "The laughter echoed through the halls, masking the hidden sorrow within.",
    "Amidst the tears, a faint smile whispered tales of resilience.",
    "The simmering rage beneath the calm surface threatened to erupt at any moment.",
    "The clock ticked steadily, oblivious to the chaos unfolding around it.",
]

# One row of class probabilities per sentence
predictions = predictor.predict(test_data, return_proba=True)

# Print each sentence with its predicted index, label, and confidence score.
# The loop variable is not named `text`, which would shadow the ktrain `text` module imported above.
for sentence, pred in zip(test_data, predictions):
    emotion_idx = pred.argmax()  # index of the highest probability
    confidence = pred.max()      # the highest probability itself
    emotion_label = emotion_labels[emotion_idx]
    print(f'Text: {sentence}')
    print(f'Predicted emotion index: {emotion_idx}')
    print(f'Predicted emotion label: {emotion_label}')
    print(f'Confidence score: {confidence}')
    print()




Text: The laughter echoed through the halls, masking the hidden sorrow within.
Predicted emotion index: 1
Predicted emotion label: sadness
Confidence score: 0.9937253594398499

Text: Amidst the tears, a faint smile whispered tales of resilience.
Predicted emotion index: 0
Predicted emotion label: joy
Confidence score: 0.6290310025215149

Text: The simmering rage beneath the calm surface threatened to erupt at any moment.
Predicted emotion index: 3
Predicted emotion label: anger
Confidence score: 0.9122545123100281

Text: The clock ticked steadily, oblivious to the chaos unfolding around it.
Predicted emotion index: 3
Predicted emotion label: anger
Confidence score: 0.4058104157447815
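The last sentence, arguably neutral, is labeled `anger` with a confidence of only about 0.41, which suggests checking the top probability before trusting a label. A minimal sketch that falls back to a placeholder when the highest class probability is below a threshold; the 0.5 threshold, the `'uncertain'` label, and the probability rows are hypothetical values chosen for illustration, not outputs of this model.

```python
import numpy as np

class_names = ['joy', 'sadness', 'fear', 'anger', 'neutral']


def label_with_threshold(probs, class_names, threshold=0.5, fallback='uncertain'):
    """Return the top class name per row, or `fallback` when its probability < threshold."""
    probs = np.asarray(probs)
    top_idx = probs.argmax(axis=1)
    top_p = probs.max(axis=1)
    return [class_names[i] if p >= threshold else fallback
            for i, p in zip(top_idx, top_p)]


# Hypothetical probability rows in the shape returned by predict(..., return_proba=True).
probs = [[0.02, 0.95, 0.01, 0.01, 0.01],
         [0.10, 0.05, 0.05, 0.41, 0.39]]
print(label_with_threshold(probs, class_names))  # ['sadness', 'uncertain']
```

The right threshold depends on the application and would ideally be chosen on a held-out set distinct from the training data.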

