## Bhavesh Bhatt
[My YouTube channel](https://www.youtube.com/BhaveshBhatt8791)

# Install ktrain

In [1]:
# ktrain is a Python library that makes deep learning and AI 
# more accessible and easier to apply
!pip install -q ktrain

  Building wheel for ktrain (setup.py) ... done
  Building wheel for seqeval (setup.py) ... done
  Building wheel for keras-bert (setup.py) ... done
  Building wheel for keras-transformer (setup.py) ... done
  Building wheel for keras-embed-sim (setup.py) ... done

# Module imports

In [2]:
import ktrain 
from ktrain import text
import tensorflow as tf
import time
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# GPU Check

In [3]:
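physical_devices = tf.config.list_physical_devices('GPU')
# Enable memory growth only if a GPU is present; indexing an empty list raises IndexError
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)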

In [4]:
physical_devices

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [5]:
!nvidia-smi

Tue Feb 15 03:18:10 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Loading Data

In [16]:
df = pd.read_csv("drive/MyDrive/Review_Data.csv")

In [17]:
df.head()

|   | Review | Sentiment |
|---|--------|-----------|
| 0 | I tried this product out of curiosity since it... | 0 |
| 1 | I love this oatmeal. I cook up a big batch and... | 1 |
| 2 | My dog was initially skeptical about these thi... | 1 |
| 3 | I had been enduring an agonizing week-long epi... | 1 |
| 4 | good service. love curry. sprinkle it and bas... | 1 |


In [18]:
df["Sentiment"].value_counts()

1    443777
0     82037
Name: Sentiment, dtype: int64

# Sample Data

In [19]:
df_sample = df.sample(frac=0.02, replace=False, random_state=1)

In [20]:
df_sample.shape

(10516, 2)

In [21]:
df_sample["Sentiment"].value_counts()

1    8864
0    1652
Name: Sentiment, dtype: int64
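
As a quick sanity check on the numbers above, the sketch below recomputes the expected sample size and the share of positive labels from the printed counts. It uses only the counts shown in the outputs; the variable and function names are illustrative, and the rounding rule `round(frac * n_rows)` is an assumption about how pandas sizes a fractional sample.

```python
# Counts copied from the value_counts() outputs above
full_counts = {1: 443777, 0: 82037}
sample_counts = {1: 8864, 0: 1652}

n_rows = sum(full_counts.values())

# Assumption: a fractional sample has round(frac * n_rows) rows
expected_sample_size = round(0.02 * n_rows)


def positive_share(counts):
    """Fraction of rows labelled 1."""
    return counts[1] / sum(counts.values())


print(expected_sample_size)
print(round(positive_share(full_counts), 3))
print(round(positive_share(sample_counts), 3))
```

Both shares come out at roughly 0.84, so the 2% sample keeps about the same class imbalance as the full dataset, and a classifier that always predicts 1 would already be right about 84% of the time.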

In [22]:
X = df_sample['Review'].tolist()

In [23]:
X[:5]

['My reaction to my first sip was, wow, that is the smoothest coffee I\'ve ever had.  That was a little over a year ago, and I haven\'t been able to drink a "regular" cup of coffee since then. My boyfriend and I have been called coffee snobs because we won\'t drink any other coffee!<br /><br />Because it\'s instant, you can adjust the strength to your taste. We have found that one packet makes two cups of coffee for us.  One packet to one cup of water was way too strong, so we doubled the water and now it\'s perfect for us.  It doesn\'t have any added sweeteners or creamers like some of the others, so you can add whatever you want or nothing at all.  I usually like a little agave and almond milk in mine, but this is actually the only coffee I\'ve ever had that I can drink black and not gag.<br /><br />One nice thing is that it\'s in packets, so it travels well.  I keep one in my purse, take some to work, and we pack some when we travel.  I can always have "my" coffee wherever I go.  Ad

In [24]:
y = df_sample['Sentiment'].tolist()

In [25]:
y[:5]

[1, 0, 1, 0, 1]

# Split data into Train, Test and Validation

In [26]:
x_train, x_val_and_test, y_train, y_val_and_test = train_test_split(X, y, 
                                                                    test_size=0.3)

In [27]:
x_val, x_test, y_val, y_test = train_test_split(x_val_and_test, 
                                                y_val_and_test, 
                                                test_size=0.5)
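
Because the classes are imbalanced, one option is to stratify both splits so that each subset keeps roughly the same label ratio, and to fix `random_state` so the split is reproducible. The following is a minimal sketch on toy data, not the split used in this notebook; the toy lists, the variable names, and the seed value 42 are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the reviews: 20 negative (0) and 80 positive (1) examples
texts = [f"review {i}" for i in range(100)]
labels = [0] * 20 + [1] * 80

# 70% train, 30% held out, preserving the label ratio
x_tr, x_hold, y_tr, y_hold = train_test_split(
    texts, labels, test_size=0.3, stratify=labels, random_state=42)

# Split the held-out 30% evenly into validation and test sets
x_va, x_te, y_va, y_te = train_test_split(
    x_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=42)

# Subset sizes, then the number of positive labels in each subset
print(len(x_tr), len(x_va), len(x_te))
print(sum(y_tr), sum(y_va), sum(y_te))
```

With stratification, every subset in this toy example keeps the 80% positive share of the original list.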

In [28]:
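# Note: x_val and y_val from the previous cell are not used in the cells shown here;
# the x_test/y_test split is preprocessed and later used for evaluation.
(x_train, y_train), (x_test, y_test), preproc = text.texts_from_array(
    x_train=x_train,
    y_train=y_train,
    x_test=x_test,
    y_test=y_test,
    class_names=['0', '1'],
    preprocess_mode='bert',
    ngram_range=1,
    maxlen=320)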

downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
[██████████████████████████████████████████████████]
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en


Is Multi-Label? False
preprocessing test...
language: en


task: text classification


# Training the model

In [29]:
model = text.text_classifier('bert', 
                             train_data=(x_train, y_train), 
                             preproc=preproc)

Is Multi-Label? False
maxlen is 320
done.


In [None]:
learner = ktrain.get_learner(model, 
                             train_data=(x_train, y_train), 
                             batch_size=8)

In [30]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Input-Token (InputLayer)       [(None, 320)]        0           []                               
                                                                                                  
 Input-Segment (InputLayer)     [(None, 320)]        0           []                               
                                                                                                  
 Embedding-Token (TokenEmbeddin  [(None, 320, 768),  23440896    ['Input-Token[0][0]']            
 g)                              (30522, 768)]                                                    
                                                                                                  
 Embedding-Segment (Embedding)  (None, 320, 768)     1536        ['Input-Segment[0][0]']    

In [31]:
# Train for 1 epoch with the onecycle policy at a maximum learning rate of 1e-5
hist = learner.fit_onecycle(1e-5, 1)



begin training using onecycle policy with max lr of 1e-05...


In [32]:
learner.validate(val_data=(x_test, y_test))

              precision    recall  f1-score   support

           0       0.81      0.82      0.81       219
           1       0.97      0.97      0.97      1359

    accuracy                           0.95      1578
   macro avg       0.89      0.89      0.89      1578
weighted avg       0.95      0.95      0.95      1578



array([[ 179,   40],
       [  43, 1316]])
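
The confusion matrix above is enough to recompute the headline numbers in the report. The sketch below does that in plain Python, assuming rows are true labels and columns are predicted labels (the layout that reproduces the report's support values of 219 and 1359). The helper name `per_class_metrics` is hypothetical.

```python
# Confusion matrix copied from the output above.
# Assumption: rows = true class, columns = predicted class.
cm = [[179, 40],
      [43, 1316]]


def per_class_metrics(cm, k):
    """Return (precision, recall, f1) for class index k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum
    actual_k = sum(cm[k])                    # row sum (support)
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total

# Accuracy of always predicting the majority class (1) on this test set
majority_baseline = sum(cm[1]) / total

for k in range(len(cm)):
    p, r, f = per_class_metrics(cm, k)
    print(k, round(p, 2), round(r, 2), round(f, 2))
print(round(accuracy, 2), round(majority_baseline, 2))
```

The 0.95 accuracy is best read against the majority-class baseline of about 0.86 on this test set, which is why the scores for class 0 (around 0.81) say more about model quality than the overall accuracy does.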

In [35]:
predictor = ktrain.get_predictor(learner.model, preproc)

In [None]:
data = ['the product was amazing',
        'the product was terrible']

In [36]:
predictor.predict(data)



['1', '0']
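
The predictor returns the class names passed to `texts_from_array`, that is, the strings `'0'` and `'1'`. A small mapping can make the output easier to read. The sketch below assumes that 1 means positive and 0 means negative, which matches the sample rows and predictions shown here but is not stated explicitly in the data; the names `LABEL_NAMES` and `readable` are hypothetical.

```python
# Assumed meaning of the class names (not stated explicitly in the dataset)
LABEL_NAMES = {'0': 'negative', '1': 'positive'}


def readable(reviews, predictions):
    """Pair each input review with a human-readable sentiment label."""
    return [(review, LABEL_NAMES[pred])
            for review, pred in zip(reviews, predictions)]


data = ['the product was amazing',
        'the product was terrible']
predictions = ['1', '0']  # copied from the predictor output above

for review, label in readable(data, predictions):
    print(f"{label}: {review}")
```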