# Emotion Analysis from Arabic Tweets Using AraBERT

**Workflow:**
1. Import data
2. Load the AraBERT model
3. Preprocessing
4. Training
5. Evaluation on the test set
6. Saving the model


In [None]:
!pip install ktrain

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # enumerate GPUs in PCI bus order
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ktrain
from ktrain import text
from sklearn.metrics import ConfusionMatrixDisplay
import seaborn as sns

## Import Data

In [None]:
df_train = pd.read_csv('/content/drive/MyDrive/tweets-dataset/ar-train.csv')
df_test = pd.read_csv('/content/drive/MyDrive/tweets-dataset/ar-test.csv')
df_val = pd.read_csv('/content/drive/MyDrive/tweets-dataset/ar-dev.csv')

In [None]:
df_train.head()

Unnamed: 0.1,Unnamed: 0,Text,Emotion
0,1841,مش لازم ترتبط عشان #تفرح 😍 مش شرط عينك تلمع عش...,joy
1,2717,إيه لا لا لا 😉 😉 الفرح حيبوظ على وحدة ماتت ولا...,joy
2,3260,@just_mram1992 أوووف 😱\n\nأعوذ بالله ايش هذا!!!,fear
3,4082,إن كان الامس قد ازعجك فما ذنب اليوم يراك عابسا,sadness
4,3980,تبيني احبك؟ تحمل حب التملك الي فيني تحمل الغير...,joy


In [None]:
# set hyperparameters
maxlen = 64
batch_size = 16
lr = 2e-5
epochs = 3

## Load Model

In [None]:
MODEL_NAME = 'aubmindlab/bert-base-arabertv01'
t = text.Transformer(MODEL_NAME, maxlen=maxlen)

## Preprocessing

In [None]:
trn = t.preprocess_train(df_train.Text.values, df_train.Emotion.values)
val = t.preprocess_test(df_val.Text.values, df_val.Emotion.values)
tst = t.preprocess_test(df_test.Text.values, df_test.Emotion.values)

preprocessing train...
language: ar
train sequence lengths:
	mean : 17
	95percentile : 27
	99percentile : 29


Is Multi-Label? False
preprocessing test...
language: ar
test sequence lengths:
	mean : 16
	95percentile : 26
	99percentile : 28


preprocessing test...
language: ar
test sequence lengths:
	mean : 17
	95percentile : 27
	99percentile : 29
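The length statistics above suggest that `maxlen = 64` leaves ample headroom, since the 99th percentile is below 30. As a rough cross-check, the sketch below computes similar statistics by hand. It assumes that length is counted in whitespace-separated words, which may not match how ktrain counts and is not the same as the number of WordPiece tokens AraBERT sees. The helper names and the toy inputs are hypothetical.

```python
import math
from statistics import mean


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def length_stats(texts):
    """Summarise word counts per text (whitespace split, not WordPiece tokens)."""
    lengths = [len(text.split()) for text in texts]
    return {
        "mean": mean(lengths),
        "p95": nearest_rank_percentile(lengths, 95),
        "p99": nearest_rank_percentile(lengths, 99),
        "max": max(lengths),
    }


# Hypothetical toy inputs; in the notebook this would be df_train.Text.values.
sample_texts = ["a b c", "a b c d e", "a", "a b c d e f g"]
stats = length_stats(sample_texts)
print(stats)
```

If the reported percentiles sat close to `maxlen`, longer tweets would be truncated and a larger value would be worth trying.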


## Train the model

#### Wrap the model in a learner object

In [None]:
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=batch_size)

#### Train

In [None]:
history = learner.fit_onecycle(lr, epochs)



begin training using onecycle policy with max lr of 2e-05...
Epoch 1/3
Epoch 2/3
Epoch 3/3
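
The log line above reports a one-cycle policy with a maximum learning rate of 2e-05. The general idea is that the learning rate ramps up to the maximum and then decays again over the course of training. The sketch below illustrates a simple triangular version of that idea; it is not ktrain's implementation. The function name, the floor of `max_lr / 10`, and the step count are assumptions chosen for illustration.

```python
def triangular_onecycle_lr(step, total_steps, max_lr, min_lr):
    """Linear warm-up to max_lr at the midpoint, then linear decay to min_lr."""
    if total_steps < 2:
        return max_lr
    half = (total_steps - 1) / 2
    # distance is 1.0 at the first and last step and 0.0 at the midpoint
    distance = abs(step - half) / half
    return max_lr - (max_lr - min_lr) * distance


max_lr = 2e-5          # value of `lr` used in this notebook
min_lr = max_lr / 10   # assumed floor, not taken from ktrain
total_steps = 11       # hypothetical; real runs have batches_per_epoch * epochs steps

schedule = [
    triangular_onecycle_lr(s, total_steps, max_lr, min_lr)
    for s in range(total_steps)
]
for step, value in enumerate(schedule):
    print(f"step {step:2d}: lr = {value:.2e}")
```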


## Evaluate

In [None]:
learner.validate(val_data=tst)

              precision    recall  f1-score   support

           0       0.71      0.65      0.68       280
           1       0.73      0.83      0.78       160
           2       0.90      0.93      0.91       280
           3       0.67      0.64      0.66       160

    accuracy                           0.77       880
   macro avg       0.75      0.76      0.76       880
weighted avg       0.77      0.77      0.77       880



array([[183,  39,  15,  43],
       [ 23, 133,   1,   3],
       [ 13,   3, 259,   5],
       [ 39,   6,  12, 103]])
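
The report and the confusion matrix above carry the same information. The row sums of the matrix (280, 160, 280, 160) match the `support` column, so rows are true classes and columns are predicted classes. The sketch below recomputes precision, recall, F1 and accuracy from the matrix so the numbers can be checked by hand. The function name is hypothetical, and the class indices are kept as integers because the notebook does not show which emotion each index maps to.

```python
# Confusion matrix copied from the output above (rows: true, columns: predicted).
confusion = [
    [183, 39, 15, 43],
    [23, 133, 1, 3],
    [13, 3, 259, 5],
    [39, 6, 12, 103],
]


def per_class_metrics(matrix):
    """Compute precision, recall, F1 and support for each class index."""
    n = len(matrix)
    results = []
    for k in range(n):
        true_pos = matrix[k][k]
        support = sum(matrix[k])                          # all samples of class k
        predicted = sum(matrix[i][k] for i in range(n))   # all predictions of class k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": support}
        )
    return results


metrics = per_class_metrics(confusion)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total

for k, m in enumerate(metrics):
    print(
        f"{k}: precision={m['precision']:.2f} recall={m['recall']:.2f} "
        f"f1={m['f1']:.2f} support={m['support']}"
    )
print(f"accuracy={accuracy:.2f} over {total} samples")
```

Classes 0 and 3 are confused with each other most often (43 and 39 off-diagonal samples), which accounts for their lower scores.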

Let's make a prediction

In [None]:
p = ktrain.get_predictor(learner.model, t)

In [None]:
# Input (Arabic): "To God we belong and to Him we shall return; my grandfather has passed away."
p.predict("إنا لله وإنا إليه راجعون، انتقل جدي إلى جوار ربه")

'sadness'

## Saving the model
To reload the saved predictor later, call `ktrain.load_predictor` with the same path.


In [None]:
p.save("/content/drive/MyDrive/models/ar-bert-model")
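
A minimal sketch of reloading the saved predictor in a fresh session follows. The helper name `reload_and_predict` is hypothetical, and the path is the one used in the save cell above. `ktrain` is imported inside the function only so that the helper can be defined before the package is installed.

```python
def reload_and_predict(model_dir, texts):
    """Reload a saved ktrain predictor and classify a list of texts."""
    import ktrain  # imported lazily; requires `pip install ktrain`

    predictor = ktrain.load_predictor(model_dir)
    return predictor.predict(texts)


# Example call, left commented out because it needs the saved model on disk:
# reload_and_predict(
#     "/content/drive/MyDrive/models/ar-bert-model",
#     ["إنا لله وإنا إليه راجعون، انتقل جدي إلى جوار ربه"],
# )
```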