Using BERT and Tensorflow 2.0,write code to classify emails as spam or not spam. BERT will be used to generate sentence encoding for all emails and after that use a simple neural network with one drop out layer and one output layer. 


In [1]:
!pip install tensorflow_text


Collecting tensorflow_text
  Downloading tensorflow_text-2.7.3-cp37-cp37m-manylinux2010_x86_64.whl (4.9 MB)
[K     |████████████████████████████████| 4.9 MB 5.2 MB/s 
Installing collected packages: tensorflow-text
Successfully installed tensorflow-text-2.7.3


In [2]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text
import pandas as pd


In [3]:
df = pd.read_csv("spam.csv")
df.head(5)

Unnamed: 0,Category,Message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."


In [4]:
# basic analysis
df.groupby('Category').describe()

Unnamed: 0_level_0,Message,Message,Message,Message
Unnamed: 0_level_1,count,unique,top,freq
Category,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
ham,4825,4516,"Sorry, I'll call later",30
spam,747,641,Please call our customer service representativ...,4


In [5]:
# create spam column
df['spam'] = df['Category'].apply(lambda x: 1 if x == 'spam' else 0)
df.head()

Unnamed: 0,Category,Message,spam
0,ham,"Go until jurong point, crazy.. Available only ...",0
1,ham,Ok lar... Joking wif u oni...,0
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,1
3,ham,U dun say so early hor... U c already then say...,0
4,ham,"Nah I don't think he goes to usf, he lives aro...",0


In [6]:
# train test split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df['Message'], df['spam'], test_size= 0.2, stratify= df['spam'])

In [7]:
y_train.value_counts()

0    3859
1     598
Name: spam, dtype: int64

In [8]:
y_test.value_counts()

0    966
1    149
Name: spam, dtype: int64

In [9]:
# embedding using BERT

bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

In [10]:
# define simple function that takes a simple sentence a gives an embedded vector
def get_sentence_embedding(sentences):
  preprocessed_text = bert_preprocess(sentences)

  return bert_encoder(preprocessed_text)['pooled_output']


In [11]:
get_sentence_embedding([
    "500$ discount. hurry up", 
    "Bhavin, are you up for a volleybal game tomorrow?"
                        
])

<tf.Tensor: shape=(2, 768), dtype=float32, numpy=
array([[-0.8435166 , -0.5132724 , -0.88845706, ..., -0.7474883 ,
        -0.7531471 ,  0.91964483],
       [-0.8720836 , -0.50544   , -0.9444667 , ..., -0.8584748 ,
        -0.71745366,  0.88082993]], dtype=float32)>

In [12]:
e = get_sentence_embedding([
    "banana", 
    "grapes",
    "mango",
    "jeff bezos",
    "elon musk",
    "bill gates"
                        
])

In [13]:
# use cosine similarity to compare two vectors
from sklearn.metrics.pairwise import cosine_similarity

cosine_similarity([e[0]], [e[1]])

array([[0.99110895]], dtype=float32)

In [14]:
cosine_similarity([e[0]], [e[3]])

array([[0.8470383]], dtype=float32)

In [15]:
cosine_similarity([e[3]], [e[4]])

array([[0.9872036]], dtype=float32)

types of models
1. sequenttial
2. functional

 https://becominghuman.ai/sequential-vs-functional-model-in-keras-20684f766057

In [19]:
# build a functional moddel
text_input = tf.keras.layers.Input(shape= (), dtype= tf.string, name= 'text')

preprocessed_text = bert_preprocess(text_input)
outputs = bert_encoder(preprocessed_text)

# neural netwrok layers
l = tf.keras.layers.Dropout(0.1, name= "dropout")(outputs['pooled_output'])

# Dense layer
l = tf.keras.layers.Dense(1, activation= "sigmoid", name= "output")(l)  # in functional model, pass previous layer

# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs= [text_input], outputs= [l])
model.summary()


Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer (KerasLayer)       {'input_type_ids':   0           ['text[0][0]']                   
                                (None, 128),                                                      
                                 'input_mask': (Non                                               
                                e, 128),                                                          
                                 'input_word_ids':                                                
                                (None, 128)}                                                

In [20]:
model.compile(optimizer= 'adam',
              loss= 'binary_crossentropy',
              metrics= ['accuracy'])

Train the model

In [21]:
model.fit(X_train, y_train, epochs= 5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fc683806510>

In [22]:
model.evaluate(X_test, y_test)



[0.14726804196834564, 0.9506726264953613]

Inference


In [23]:
emails = [
    'Reply to win Â£100 weekly! Where will the 2006 FIFA World Cup be held? Send STOP to 87239 to end service',
    'You are awarded a SiPix Digital Camera! call 09061221061 from landline. Delivery within 28days. T Cs Box177. M221BP. 2yr warranty. 150ppm. 16 . p pÂ£3.99',
    'it to 80488. Your 500 free text messages are valid until 31 December 2005.',
    'Hey Sam, Are you coming for a cricket game tomorrow',
    "Why don't you wait 'til at least wednesday to see if you get your ."
]

model.predict(emails)

array([[0.61495876],
       [0.6812726 ],
       [0.5209247 ],
       [0.05639218],
       [0.01955418]], dtype=float32)

values > .5 spam

exercise: dataset text classification with bert tesorflow

# tf serving

In [24]:
model.save("saved_models/1/")



INFO:tensorflow:Assets written to: saved_models/1/assets


INFO:tensorflow:Assets written to: saved_models/1/assets


In [25]:
model.save("saved_models/2/")



INFO:tensorflow:Assets written to: saved_models/2/assets


INFO:tensorflow:Assets written to: saved_models/2/assets


In [26]:
model.save("saved_models/3/")



INFO:tensorflow:Assets written to: saved_models/3/assets


INFO:tensorflow:Assets written to: saved_models/3/assets


#installing docker

In [43]:
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic


In [32]:
! sudo apt-get remove docker docker-engine docker.io containerd runc


Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package 'docker-engine' is not installed, so not removed
Package 'docker' is not installed, so not removed
Package 'containerd' is not installed, so not removed
Package 'docker.io' is not installed, so not removed
Package 'runc' is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 58 not upgraded.


In [33]:
 !sudo apt-get update
 !sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

0% [Working]            Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B]
0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [Conn0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [Conn0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com] [Connecting to                                                                               Hit:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com (91.189.88.152)                                                                               Err:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 544B7F63BF9E4D5F
0% [Connecting to archive.ubuntu.com (91.189.88.152)] [Connecting to security.u0% [2 InRelease gpgv 3,626 B] [Connecting to archive.ubu

In [34]:
!curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

In [36]:
! echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

In [37]:
!sudo apt-get update
!sudo apt-get install docker-ce docker-ce-cli containerd.io

0% [Working]            Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B]
0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com (91.180% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com (91.189.88.142)                                                                               Hit:2 http://security.ubuntu.com/ubuntu bionic-security InRelease
0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com (91.189.88.142)                                                                               Get:3 https://download.docker.com/linux/ubuntu bionic InRelease [64.4 kB]
Hit:4 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Ign:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Err:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 54

In [38]:
!apt-cache madison docker-ce

 docker-ce | 5:20.10.12~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.11~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.10~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.9~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.8~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.7~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.6~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.5~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
 docker-ce | 5:20.10.4~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/st

In [39]:
!sudo apt-get install docker-ce=<18.03.1~ce~3-0~ubuntu> docker-ce-cli=<18.03.1~ce~3-0~ubuntu> containerd.io

/bin/bash: 18.03.1~ce~3-0~ubuntu: No such file or directory


method 2

In [44]:
!sudo apt update

[33m0% [Working][0m            Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B]
[33m0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com (91.18[0m[33m0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com] [Connecting to[0m                                                                               Hit:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Hit:3 https://download.docker.com/linux/ubuntu bionic InRelease
Ign:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Err:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 544B7F63BF9E4D5F
Hit:5 http://security.ubuntu.com/ubuntu bionic-security InRelease
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:7 https://developer.d

In [45]:
!sudo apt install apt-transport-https ca-certificates curl software-properties-common


Reading package lists... Done
Building dependency tree       
Reading state information... Done
ca-certificates is already the newest version (20210119~18.04.2).
curl is already the newest version (7.58.0-2ubuntu3.16).
software-properties-common is already the newest version (0.96.24.32.18).
The following NEW packages will be installed:
  apt-transport-https
0 upgraded, 1 newly installed, 0 to remove and 58 not upgraded.
Need to get 4,348 B of archives.
After this operation, 154 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 apt-transport-https all 1.6.14 [4,348 B]
Fetched 4,348 B in 0s (16.3 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend

In [46]:
!curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -


OK


In [47]:
!sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"


0% [Working]            Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B]
0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [Conn0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com] [Connecting to                                                                               Hit:2 https://download.docker.com/linux/ubuntu bionic InRelease
0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com] [Waiting for h                                                                               Hit:3 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease
Ign:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Err:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 544B7F63BF9E

In [48]:
!sudo apt update


[33m0% [Working][0m            Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B]
[33m0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [Conn[0m[33m0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com] [Connecting to[0m                                                                               Hit:2 http://security.ubuntu.com/ubuntu bionic-security InRelease
[33m0% [1 InRelease gpgv 3,012 B] [Connecting to archive.ubuntu.com] [Connected to [0m                                                                               Hit:3 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Hit:4 https://download.docker.com/linux/ubuntu bionic InRelease
Ign:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Err:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease
  The following signatures couldn't be verified because the public key is n

In [49]:
!apt-cache policy docker-ce


docker-ce:
  Installed: 5:20.10.12~3-0~ubuntu-bionic
  Candidate: 5:20.10.12~3-0~ubuntu-bionic
  Version table:
 *** 5:20.10.12~3-0~ubuntu-bionic 500
        500 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
        100 /var/lib/dpkg/status
     5:20.10.11~3-0~ubuntu-bionic 500
        500 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
     5:20.10.10~3-0~ubuntu-bionic 500
        500 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
     5:20.10.9~3-0~ubuntu-bionic 500
        500 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
     5:20.10.8~3-0~ubuntu-bionic 500
        500 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
     5:20.10.7~3-0~ubuntu-bionic 500
        500 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
     5:20.10.6~3-0~ubuntu-bionic 500
        500 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
     5:20.10.5~3-0~

In [50]:
!docker


Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
      --config string      Location of client config files (default "/root/.docker")
  -c, --context string     Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker context use")
  -D, --debug              Enable debug mode
  -H, --host list          Daemon socket(s) to connect to
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
      --tls                Use TLS; implied by --tlsverify
      --tlscacert string   Trust certs signed only by this CA (default "/root/.docker/ca.pem")
      --tlscert string     Path to TLS certificate file (default "/root/.docker/cert.pem")
      --tlskey string      Path to TLS key file (default "/root/.docker/key.pem")
      --tlsverify          Use TLS and verify the remote
  -v, --version            Print version information and quit

Management C

In [51]:
!docker pull tensorflow/serving

Using default tag: latest
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?


In [53]:
!sudo mkdir -p /etc/systemd/system/docker.service.d

In [55]:
!sudo apt-get install nano

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  spell
The following NEW packages will be installed:
  nano
0 upgraded, 1 newly installed, 0 to remove and 58 not upgraded.
Need to get 231 kB of archives.
After this operation, 778 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 nano amd64 2.9.3-2 [231 kB]
Fetched 231 kB in 1s (293 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package nano.
(Reading database ... 155839 files and director

In [1]:
!sudo nano /etc/systemd/system/docker.service.d/options.conf

)07[?47h[1;24r[m[4l[m[?1h=[?1h=[?1h=[m[m[H[2J[22;35H[7m[ New File ][m[H[7m  GNU nano 2.9.3  /etc/systemd/system/docker.service.d/options.conf             [1;79H[m[22B[7m^G[m Get Help  [7m^O[m Write Out [7m^W[m Where Is  [7m^K[m Cut Text  [7m^J[m Justify   [7m^C[m Cur Pos[1B[7m^X[m Exit[6C[7m^R[m Read File [7m^\[m Replace   [7m^U[m Uncut Text[7m^T[m To Spell  [7m^_[m Go To Line[2A[19A[m[m

KeyboardInterrupt: ignored

In [52]:
!sudo su
!systemctl start docker
!systemctl enable docker
!systemctl restart docker

bash: cannot set terminal process group (74): Inappropriate ioctl for device
bash: no job control in this shell
[01;34m/content[00m# 
[01;34m/content[00m# 
[01;34m/content[00m# 

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-52-997f0260b82b>", line 1, in <module>
    get_ipython().system('sudo su')
  File "/usr/local/lib/python3.7/dist-packages/google/colab/_shell.py", line 102, in system
    output = _system_commands._system_compat(self, *args, **kwargs)  # pylint:disable=protected-access
  File "/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py", line 447, in _system_compat
    shell.var_expand(cmd, depth=2), clear_streamed_output=False)
  File "/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py", line 199, in _run_command
    return _monitor_process(parent_pty, epoll, p, cmd, update_stdin_widget)
  File "/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py", line 229, in _monitor_process
    result = _poll_process(parent_pty, epo

KeyboardInterrupt: ignored

https://github.com/codebasics/deep-learning-keras-tf-tutorial/tree/master/48_tf_serving

https://www.youtube.com/watch?v=P-5sMcpTE0g&list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV&index=123