In [None]:
!pip install ktrain

**Importing required libraries:**

In [None]:
import pandas as pd
import numpy as np
import ktrain
import tensorflow as tf
from ktrain import text

Checking the TensorFlow version, so that the same version can be used when the model is reloaded later:

In [None]:
tf.version.VERSION

# Preparing our train and test dataset:

Load train.txt into the df_train dataframe and test.txt into the df_test dataframe.

In [None]:
df_train = pd.read_csv('../input/emotions-dataset-for-nlp/train.txt', header=None, sep=';', names=['Input', 'Sentiment'], encoding='utf-8')
df_test = pd.read_csv('../input/emotions-dataset-for-nlp/test.txt', header=None, sep=';', names=['Input', 'Sentiment'], encoding='utf-8')
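
The `sep=';'` and `names=['Input','Sentiment']` arguments above imply that each line holds a sentence and its label separated by a semicolon. A minimal sketch of that layout, using only the standard library and two made-up lines (they are not taken from the dataset):

```python
import csv
import io

# Two made-up lines in the assumed "text;label" layout (not real dataset rows).
sample = "i feel great today;joy\ni am scared of the dark;fear\n"

# Split each line on the semicolon, as sep=';' does in pd.read_csv.
rows = list(csv.reader(io.StringIO(sample), delimiter=';'))

inputs = [text for text, label in rows]       # corresponds to the 'Input' column
sentiments = [label for text, label in rows]  # corresponds to the 'Sentiment' column

print(inputs)
print(sentiments)
```

A sentence that itself contained a semicolon would break this simple split, so it is worth checking that both columns look sensible after loading.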

Checking first few rows of our train dataset.

In [None]:
df_train.head()

Checking category-wise distribution of our train data.

In [None]:
df_train.Sentiment.value_counts()
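
Raw counts are easier to compare as shares of the whole. A small sketch of that idea with `collections.Counter`; the label list here is hypothetical and only stands in for the `Sentiment` column:

```python
from collections import Counter

# Hypothetical labels standing in for df_train.Sentiment (not real data).
labels = ['joy', 'sadness', 'joy', 'anger', 'joy', 'fear']

counts = Counter(labels)
total = sum(counts.values())

# Share of each class, most frequent first.
shares = {label: count / total for label, count in counts.most_common()}

for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

A strongly uneven distribution would suggest looking at per-class metrics later rather than overall accuracy alone.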

In [None]:
X_train = df_train.Input.tolist()
X_test = df_test.Input.tolist()
y_train = df_train.Sentiment.tolist()
y_test = df_test.Sentiment.tolist()

Checking size of our train and test datasets:

In [None]:
print(len(X_train),len(X_test),len(y_train),len(y_test))

Our dataset has below categories/factors:

In [None]:
factors = ['anger', 'fear', 'joy', 'love', 'sadness','surprise']

Encoding our sentiment categories into numeric values:

In [None]:
encoding = { 'anger': 0,
    'fear': 1,
    'joy': 2,
    'love': 3,
    'sadness': 4,
    'surprise': 5
}

In [None]:
y_train = [encoding[key] for key in y_train]
y_test = [encoding[key] for key in y_test]
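
The hand-written dictionary above matches the order of `factors`, so it could also be derived from that list. A minimal sketch, where the names `derived_encoding`, `decoding` and `sample_labels` are introduced only for illustration:

```python
factors = ['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']

# Build the label -> index mapping from the list order.
derived_encoding = {name: index for index, name in enumerate(factors)}

# Inverse mapping, useful for turning predicted indices back into names.
decoding = {index: name for name, index in derived_encoding.items()}

sample_labels = ['joy', 'anger', 'surprise']  # hypothetical labels
encoded = [derived_encoding[label] for label in sample_labels]
decoded = [decoding[index] for index in encoded]

print(encoded)
print(decoded)
```

Deriving the mapping this way keeps the numeric labels and the `class_names` passed to the model in the same order by construction.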

# Building Model using Transformer

We are using the bert-base-uncased model, but you can choose any other model. I am setting the maximum sequence length (maxlen) to 512 tokens, which is the maximum BERT supports.

In [None]:
model_arch = 'bert-base-uncased'
MAXLEN = 512
trans = text.Transformer(model_arch, maxlen=MAXLEN, class_names=factors)


Let's preprocess our train and test datasets.

In [None]:
train_data = trans.preprocess_train(X_train,y_train)
test_data = trans.preprocess_test(X_test,y_test)

In [None]:
model = trans.get_classifier()

In [None]:
learner = ktrain.get_learner(model, train_data=train_data, val_data=test_data, batch_size=10)

Finding the best learning rate:

In [None]:
learner.lr_find(show_plot=True, max_epochs=10)

In [None]:
learner.fit_onecycle(3e-5, 5)
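
The call above trains for 5 epochs with a maximum learning rate of 3e-5 under a one-cycle policy. To give a feel for what such a schedule looks like, here is a simplified triangular sketch. The function `triangular_lr`, the lower bound of `max_lr / 10` and the step count are assumptions for illustration; the exact schedule ktrain applies may differ.

```python
def triangular_lr(step, total_steps, max_lr, min_lr):
    """Ramp linearly from min_lr up to max_lr at the midpoint, then back down."""
    half = total_steps / 2
    # 0.0 at both ends of training, 1.0 at the midpoint.
    fraction = 1 - abs(step - half) / half
    return min_lr + (max_lr - min_lr) * fraction

max_lr = 3e-5
min_lr = max_lr / 10   # assumed lower bound, chosen for illustration
total_steps = 10       # hypothetical; real training has many more steps

schedule = [triangular_lr(step, total_steps, max_lr, min_lr)
            for step in range(total_steps + 1)]

for step, lr in enumerate(schedule):
    print(step, f"{lr:.2e}")
```

The idea is that a gentle start avoids disturbing the pretrained weights too early, while the decay at the end helps the model settle.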

**Confusion Matrix:**

In [None]:
learner.validate(val_data=test_data, class_names=factors)
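
To help read a confusion matrix, here is a small pure-Python sketch that builds one from true and predicted class indices. In this sketch rows are true classes and columns are predicted classes; the label lists are hypothetical and are not results from the model above.

```python
factors = ['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']

def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_index, pred_index in zip(y_true, y_pred):
        matrix[true_index][pred_index] += 1
    return matrix

# Hypothetical labels: one 'fear' sample (1) is mistaken for 'sadness' (4).
true_labels = [0, 1, 1, 4, 4, 2]
pred_labels = [0, 4, 1, 4, 4, 2]

cm = confusion_matrix(true_labels, pred_labels, len(factors))

# Correct predictions sit on the diagonal.
correct = sum(cm[i][i] for i in range(len(factors)))
accuracy = correct / len(true_labels)

for name, row in zip(factors, cm):
    print(f"{name:>8}", row)
print("accuracy:", accuracy)
```

Off-diagonal cells show which pairs of emotions get confused with each other.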

**Top 5 data points with the highest loss:**

In [None]:
learner.view_top_losses(n=5, preproc=trans)

In [None]:
X_test[1928]

The model predicts sadness for the text above, but its label is fear.
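
The idea behind ranking samples by loss can be sketched in a few lines. This assumes a cross-entropy loss, where the loss of a sample is the negative log of the probability the model gave to the true class. The texts and probabilities below are hypothetical.

```python
import math

# (text, probability the model assigned to the TRUE class); hypothetical values.
examples = [
    ("sample a", 0.90),
    ("sample b", 0.05),
    ("sample c", 0.40),
]

# Cross-entropy for a single sample: the lower the probability, the higher the loss.
losses = [(text, -math.log(prob)) for text, prob in examples]

# Highest loss first.
ranked = sorted(losses, key=lambda item: item[1], reverse=True)
top = [text for text, loss in ranked[:2]]

for text, loss in ranked:
    print(f"{text}: {loss:.3f}")
```

Samples at the top of such a ranking are either hard cases or, as may be the case for the text above, candidates for a mislabeled entry.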

# Predict Data:

In [None]:
predictor = ktrain.get_predictor(learner.model, preproc=trans)

In [None]:
inp = 'I am very disappointed with this kind of front camera. Need refund.'

In [None]:
predictor.predict(inp)
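
A classifier like this typically produces one probability per class and reports the class with the highest one. A minimal sketch of that last step, assuming the predictor works this way; the probability values are made up and are not the model's output for the sentence above:

```python
factors = ['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']

# Hypothetical probabilities, one per class, in the same order as `factors`.
probabilities = [0.05, 0.10, 0.02, 0.01, 0.80, 0.02]

# Index of the largest probability (argmax).
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
predicted = factors[best_index]

print(predicted, probabilities[best_index])
```

Looking at the full probability vector, not only the winning class, shows how confident the model is in a given prediction.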