# Deploying NLP Using TensorFlow.js

Tutorial: [Dicoding](https://www.dicoding.com/academies)

Dataset: [Sentiment Labelled Sentences Dataset](https://www.kaggle.com/marklvl/sentiment-labelled-sentences-data-set) 

## Import modules and dataset

In [1]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
import tensorflow as tf
import pandas as pd

In [20]:
df = pd.read_csv('yelp_labelled.txt', names=['sentence', 'label'], sep='\t')

In [21]:
df

| | sentence | label |
|---|---|---|
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 |
| 4 | The selection on the menu was great and so wer... | 1 |
| ... | ... | ... |
| 995 | I think food should have flavor and texture an... | 0 |
| 996 | Appetite instantly gone. | 0 |
| 997 | Overall I was not impressed and would not go b... | 0 |
| 998 | The whole experience was underwhelming, and I ... | 0 |


## Preprocessing

In [23]:
# Convert all words to lowercase
df['sentence'] = df['sentence'].str.lower()

In [25]:
# Removing stopwords

# To download the stopwords corpus (only needed once):
# import nltk
# nltk.download('stopwords')

# Import the stopwords corpus
from nltk.corpus import stopwords

In [26]:
# Remove stopwords (commonly used words)
stop = set(stopwords.words('english'))
df['sentence'] = df['sentence'].apply(
    lambda x: ' '.join(word for word in x.split() if word not in stop)
)

In [27]:
df.head()

| | sentence | label |
|---|---|---|
| 0 | wow... loved place. | 1 |
| 1 | crust good. | 0 |
| 2 | tasty texture nasty. | 0 |
| 3 | stopped late may bank holiday rick steve recom... | 1 |
| 4 | selection menu great prices. | 1 |
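
Note that in the output above, "crust is not good." became "crust good.": NLTK's English stopword list includes negations such as "not" and "no", so removing every stopword can hide the negation that carries the sentiment. A minimal sketch of one possible workaround, keeping negation words, is shown here. The small `stop_words` set and the `negations` set are illustrative assumptions (not the full NLTK list), and `remove_stopwords` is a hypothetical helper.

```python
# Illustrative subset of English stopwords (assumption: the real NLTK list is much longer).
stop_words = {"is", "the", "was", "and", "this", "just", "not", "no"}

# Negation words we choose to keep because they carry sentiment (assumption).
negations = {"not", "no"}

def remove_stopwords(sentence, stop=stop_words, keep=negations):
    """Drop stopwords from a lowercased sentence, but keep negation words."""
    effective_stop = stop - keep
    return " ".join(word for word in sentence.split() if word not in effective_stop)

print(remove_stopwords("crust is not good."))
print(remove_stopwords("not tasty and the texture was just nasty."))
```

Whether keeping negations actually helps this model is something to verify by retraining; the sketch only shows the mechanics.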


## Tokenization

In [35]:
# Global variables
vocab_size = 2000
oov_tok = "<OOV>"
filters = '!"#$%^&()*+.,-/:;=?@[\\]<>}{|_~`'  # backslash escaped to avoid an invalid-escape warning

In [36]:
# Tokenization
from tensorflow.keras.preprocessing.text import Tokenizer

# Pad Sequences
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok, filters=filters)
tokenizer.fit_on_texts(df['sentence'].values)

word2index = tokenizer.word_index
print(len(word2index))

1998
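
For intuition, `word_index` maps every word seen during fitting to an integer, with more frequent words getting smaller indices and index 1 reserved for the OOV token. The sketch below imitates that idea in plain Python. It is a simplification rather than the Keras implementation (it does no character filtering), and `build_word_index` is a hypothetical helper.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency; index 1 is reserved for the OOV token."""
    counts = Counter(word for text in texts for word in text.lower().split())
    index = {oov_token: 1}
    # most_common() sorts by descending count; ties keep first-seen order.
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        index[word] = rank
    return index

sample = ["good food", "good place", "bad food good service"]
print(build_word_index(sample))
# {'<OOV>': 1, 'good': 2, 'food': 3, 'place': 4, 'bad': 5, 'service': 6}
```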


In [38]:
import json

with open('word2index.json', 'w') as fp:
    json.dump(word2index, fp)

In [39]:
# Length (in whitespace-separated words) of the longest sentence
max_length = max(len(sentence.split()) for sentence in df['sentence'])
max_length

18

In [40]:
padding_type = 'post'  # pad with zeros after the tokens

all_seq = tokenizer.texts_to_sequences(df['sentence'].values)
all_padded = pad_sequences(all_seq, maxlen=max_length, padding=padding_type)
all_padded.shape

(1000, 18)
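
With `padding='post'`, zeros are appended after the tokens until each row reaches `maxlen`. A rough pure-Python equivalent, for illustration only: the helper `pad_post` is hypothetical and, as a simplifying assumption, cuts over-long sequences at the end, whereas `pad_sequences` truncates from the front by default.

```python
def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value` up to `maxlen` (extras are cut at the end)."""
    padded = []
    for seq in sequences:
        trimmed = list(seq[:maxlen])
        padded.append(trimmed + [value] * (maxlen - len(trimmed)))
    return padded

print(pad_post([[5, 12, 7], [3]], maxlen=5))
# [[5, 12, 7, 0, 0], [3, 0, 0, 0, 0]]
```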

## Split data

In [41]:
from sklearn.model_selection import train_test_split

In [44]:
X = all_padded
y = df['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(800, 18) (800,)
(200, 18) (200,)

## Build and train the model


In [74]:
model = tf.keras.Sequential([
    # Map each word index to a 16-dimensional vector
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=16, input_length=max_length),
    tf.keras.layers.LSTM(64),
    # tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(24, activation='relu'),
    # tf.keras.layers.Dropout(0.2),
    # Single sigmoid unit: probability of the positive class
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [75]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 18, 16)            32000     
_________________________________________________________________
lstm_2 (LSTM)                (None, 64)                20736     
_________________________________________________________________
dense_4 (Dense)              (None, 24)                1560      
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 25        
Total params: 54,321
Trainable params: 54,321
Non-trainable params: 0
_________________________________________________________________
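
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for these layer types (an embedding table, an LSTM with four gates, and dense layers with biases); the variable names are chosen here for illustration.

```python
vocab_size, embed_dim, lstm_units, dense_units = 2000, 16, 64, 24

# Embedding: one vector of size embed_dim per vocabulary index.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)

# Dense: weights plus one bias per output unit.
dense_hidden_params = lstm_units * dense_units + dense_units
dense_output_params = dense_units * 1 + 1

total = embedding_params + lstm_params + dense_hidden_params + dense_output_params
print(embedding_params, lstm_params, dense_hidden_params, dense_output_params, total)
# 32000 20736 1560 25 54321
```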


In [76]:
epoch = 30
history = model.fit(X_train, y_train, epochs=epoch, validation_data=(X_test, y_test))

Epoch 1/30
...
Epoch 30/30


## Testing

Convert text into sequences

In [77]:
def to_sequences(sentence):
    """Convert a sentence to a list of word indices, skipping unknown words."""
    return [word2index[word] for word in sentence.lower().split() if word in word2index]

text = to_sequences("nice idea and well priced")
text
model.predict([text])





array([[0.64647126]], dtype=float32)

The call above triggers warnings because the input contains only 5 token indices, while the model expects sequences of length 18.

In [78]:
text = [20, 1736, 254, 58, 413, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
len(text)

18

In [79]:
model.predict([text])

array([[0.99994195]], dtype=float32)

The model outputs a score very close to 1, which corresponds to the positive label.
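
The manual padding step can be folded into the encoding helper so that every input already has the length the model expects. The sketch below is one way to do it, under a few assumptions: `toy_index` is a tiny made-up vocabulary standing in for `word2index`, `encode_and_pad` and `to_label` are hypothetical helpers, and 0.5 is an assumed decision threshold for the sigmoid output.

```python
def encode_and_pad(sentence, word_index, maxlen, pad_value=0):
    """Map known words to indices, skip unknown ones, then post-pad to maxlen."""
    ids = [word_index[w] for w in sentence.lower().split() if w in word_index]
    ids = ids[:maxlen]
    return ids + [pad_value] * (maxlen - len(ids))

def to_label(score, threshold=0.5):
    """Interpret a sigmoid score as a sentiment label (threshold is an assumption)."""
    return "positive" if score >= threshold else "negative"

toy_index = {"nice": 2, "idea": 3, "well": 4, "priced": 5}  # made-up vocabulary
encoded = encode_and_pad("Nice idea and well priced", toy_index, maxlen=8)
print(encoded)               # [2, 3, 4, 5, 0, 0, 0, 0]
print(to_label(0.99994195))  # positive
```

With the real vocabulary, the call would use `word2index` and `max_length`, and the padded list would be wrapped in a batch as in the `model.predict([text])` call above.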

## Saving the model

In [61]:
# !pip install tensorflowjs  # Installs the tensorflowjs package, which provides the converter

In [80]:
saved_path = 'mymodel/'
tf.saved_model.save(model, saved_path)




FOR DEVS: If you are overwriting _tracking_metadata in your class, this property has been used to save metadata in the SavedModel. The metadta field will be deprecated soon, so please move the metadata to a different file.



INFO:tensorflow:Assets written to: mymodel/assets



In [81]:
# Convert the SavedModel into a format that TensorFlow.js can load
!tensorflowjs_converter \
 --input_format=tf_saved_model \
 mymodel/ \
 tfjsmodel

Writing weight file tfjsmodel\model.json...



The converted model is now ready to use.

## Grab metadata

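Note: kept for reference only. These cells repeat the tokenizer fitting and the `word2index.json` export from the Tokenization section, here with the default `filters`.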

In [70]:
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words = vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(df['sentence'].values)

word2index = tokenizer.word_index

In [71]:
import json
with open("word2index.json", "w") as fp:
    json.dump(word2index, fp)
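
On the deployment side, the same mapping has to be loaded and applied to user input before calling the converted model. As an illustration of that round trip, the sketch below writes a tiny made-up mapping to a temporary JSON file and reads it back. The temporary path and `toy_index` are assumptions for the example, and a TensorFlow.js client would need to implement the equivalent lookup in JavaScript. Unknown words fall back to the OOV index here, which is how `texts_to_sequences` treats them when `oov_token` is set; the `to_sequences` helper in the Testing section skips them instead.

```python
import json
import os
import tempfile

toy_index = {"<OOV>": 1, "good": 2, "food": 3}  # made-up stand-in for word2index

# Write the mapping to a temporary file, mirroring the json.dump call above.
path = os.path.join(tempfile.mkdtemp(), "word2index.json")
with open(path, "w") as fp:
    json.dump(toy_index, fp)

# Read it back, as a client would before encoding user input.
with open(path) as fp:
    loaded = json.load(fp)

# Unknown words fall back to the OOV index in this sketch.
encoded = [loaded.get(word, loaded["<OOV>"]) for word in "good tasty food".split()]
print(encoded)  # [2, 1, 3]
```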