# Word Embedding Model

This notebook performs sentiment analysis with a word-embedding model, following the TensorFlow "Word Embeddings" guide:
>https://www.tensorflow.org/text/guide/word_embeddings

## Creating Classes

In [1]:
class Tweet():
    def __init__(self, text, label):
        self.text = text
        self.label = label

class Utils():
    def __init__(self, tweets):
        self.tweets = tweets
        
    def get_text(self):
        return [x.text for x in self.tweets]
    
    def get_label(self):
        return [x.label for x in self.tweets]

## Imports

In [17]:
import tensorflow as tf
import json

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D
from tensorflow.keras.layers import TextVectorization

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

## Process Data

### Read data from the JSON file

In [3]:
file_name = '../data/Data_processed.json'

tweets = []
with open(file_name) as f:
    for line in f:
        tweet = json.loads(line)
        tweets.append(Tweet(tweet['Text'], tweet['Target']))
    
# Taking a look at an example of our data
print(tweets[0].text)
print(tweets[0].label)

   awww thats a bummer  you shoulda got david carr of third day to do it d
0
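
The loader above reads the file one line at a time, so it expects one JSON object per line, each with a `Text` and a `Target` key. The sketch below illustrates that layout with a temporary file. The sample rows and the `load_records` helper are made up for illustration and are not part of the original data or notebook.

```python
import json
import os
import tempfile


def load_records(path):
    """Read one JSON object per line and return (text, label) pairs."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            row = json.loads(line)
            records.append((row["Text"], row["Target"]))
    return records


# Hypothetical rows, written in the one-object-per-line layout.
sample_rows = [
    {"Text": "what a great day", "Target": 1},
    {"Text": "missed my bus again", "Target": 0},
]

with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    for row in sample_rows:
        tmp.write(json.dumps(row) + "\n")
    tmp_path = tmp.name

records = load_records(tmp_path)
os.remove(tmp_path)  # clean up the temporary file
print(records)
```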


## Creating our TensorFlow model

### Setting Hyper-parameters

In [4]:
BATCH_SIZE = 1024
SEED = 123
DENSE_NODES = 16
OPTIMIZER = 'adam'
METRICS = ['accuracy']
EPOCHS = 5
VOCAB_SIZE = 10000
SEQUENCE_LEN = 50
EMBEDDING_DIM = 16

### Creating text/label datasets

In [18]:
utils = Utils(tweets)  # build the helper once and reuse it
dataset_text = utils.get_text()
dataset_labels = utils.get_label()

ds_labels = tf.convert_to_tensor(dataset_labels)
ds_text = tf.convert_to_tensor(dataset_text)

#### Dataset for building the TextVectorization vocabulary (encoder)

In [6]:
p_text = tf.data.Dataset.from_tensors(ds_text)

## Text Vectorization


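Use the `TextVectorization` layer to normalize, split, and map strings to integers. The layer uses the built-in `lower_and_strip_punctuation` standardization. Because the samples are not all the same length, set `output_sequence_length` so that every sample is padded or truncated to the same length.

Call the `adapt` method to build the vocabulary from the tweet text. The adapted layer is then placed inside the model, so the same mapping is applied to any text passed in later, including at prediction time.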

In [19]:
vectorize_layer = TextVectorization(standardize='lower_and_strip_punctuation',
                                   max_tokens=VOCAB_SIZE,
                                   split='whitespace',
                                   output_mode='int',
                                   output_sequence_length=SEQUENCE_LEN)

vectorize_layer.adapt(p_text)

2022-02-17 19:23:24.365125: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 316989920 exceeds 10% of free system memory.
2022-02-17 19:23:24.365207: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 475484880 exceeds 10% of free system memory.
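
To make the vectorization step concrete, here is a pure-Python imitation of what the layer does in `int` mode: lowercase, strip punctuation, split on whitespace, map tokens to integers, then pad or truncate. It is only a sketch and not the TensorFlow implementation. The helper names, the tiny corpus, and the alphabetical tie-breaking between equally frequent tokens are assumptions made for illustration. Index 0 is reserved for padding and index 1 for out-of-vocabulary tokens, as in the real layer.

```python
import string
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # reserved indices: padding and unknown token


def standardize(text):
    """Lowercase the text and remove ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to ids starting at 2."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    # Most frequent first; ties broken alphabetically (an assumption).
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    kept = ordered[: max_tokens - 2]  # two slots go to PAD and OOV
    return {tok: i + 2 for i, tok in enumerate(kept)}


def vectorize(text, vocab, sequence_len):
    """Turn one string into a fixed-length list of token ids."""
    ids = [vocab.get(tok, OOV_ID) for tok in standardize(text).split()]
    ids = ids[:sequence_len]  # truncate long inputs
    return ids + [PAD_ID] * (sequence_len - len(ids))  # pad short inputs


corpus = ["Good day, good vibes!", "Bad day..."]
vocab = build_vocab(corpus, max_tokens=10)
encoded = vectorize("Good night, day!", vocab, sequence_len=5)
print(vocab)
print(encoded)
```

In this toy run, "night" never appears in the corpus, so it maps to the out-of-vocabulary id, and the two trailing positions are padding.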


# Model

## Create Model

In [20]:
model = Sequential([
    vectorize_layer,
    Embedding(VOCAB_SIZE, EMBEDDING_DIM, name='embedding'),
    GlobalAveragePooling1D(),
    Dense(DENSE_NODES, activation='relu'),
    Dense(1, activation='sigmoid') # Outputs a probability in [0, 1]; 0.5 or higher is read as positive
])
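
As a sanity check on the model's size, the trainable parameter count can be worked out by hand from the hyper-parameters set earlier. The arithmetic below is a sketch that assumes the vectorization and pooling layers contribute no trainable weights. The variable names for the per-layer counts are introduced here for illustration.

```python
VOCAB_SIZE = 10000
EMBEDDING_DIM = 16
DENSE_NODES = 16

# Embedding: one EMBEDDING_DIM-sized vector per vocabulary entry.
embedding_params = VOCAB_SIZE * EMBEDDING_DIM

# Hidden Dense layer: weight matrix plus one bias per node.
hidden_params = EMBEDDING_DIM * DENSE_NODES + DENSE_NODES

# Output Dense layer: one weight per hidden node plus a single bias.
output_params = DENSE_NODES * 1 + 1

total_params = embedding_params + hidden_params + output_params
print(embedding_params, hidden_params, output_params, total_params)
```

Almost all of the parameters sit in the embedding table, which is why `VOCAB_SIZE` and `EMBEDDING_DIM` dominate the model's memory footprint.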

## Compile and train model

In [9]:
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='logs') # Saving statistics for tensorboard

model.compile(optimizer=OPTIMIZER,
             loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
             metrics=METRICS)

model.fit(x=ds_text,
         y=ds_labels,
         batch_size=BATCH_SIZE,
         epochs=EPOCHS, 
         validation_split=0.1,
         callbacks=[tensorboard_callback])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fbcd02d4d90>
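
With `validation_split=0.1`, Keras holds out the last 10% of the samples for validation, taken before any shuffling. The sketch below estimates the resulting sizes and the number of training steps per epoch. The `split_sizes` helper and the sample count of 100,000 are hypothetical, and the rounding rule is an assumption about how the split point is chosen.

```python
import math


def split_sizes(num_samples, validation_split, batch_size):
    """Return (train size, validation size, training steps per epoch)."""
    # Assumed rounding: the training part is floored, the rest is validation.
    train = int(math.floor(num_samples * (1.0 - validation_split)))
    val = num_samples - train
    steps = math.ceil(train / batch_size)  # last batch may be smaller
    return train, val, steps


# Hypothetical dataset size, with the notebook's batch size and split.
train, val, steps = split_sizes(
    num_samples=100_000, validation_split=0.1, batch_size=1024
)
print(train, val, steps)
```

Because the held-out part is taken from the end of the data, a file that happens to be ordered by label could yield a validation set dominated by one class. Shuffling the tweets before calling `fit` would avoid that.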

### Predict

In [14]:
def get_sentiment(x):
    # x is the model's sigmoid output for one tweet: the probability of positive sentiment
    if x >= 0.5:
        return "Positive"
    else:
        return "Negative"

In [16]:
print("Sentiment for this tweet is:", get_sentiment(model.predict(["Got praised by Hamza, Yay"])))

Sentiment for this tweet is: Positive
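
When several tweets are scored at once, the model returns one row per tweet, each holding a single probability. The sketch below labels such a batch with a configurable threshold. The `label_predictions` helper and the probabilities in `fake_output` are invented for illustration and are not real model output. It assumes, as in the function above, that values of 0.5 or more mean positive.

```python
def label_predictions(probabilities, threshold=0.5):
    """Map each row's single probability to a sentiment label."""
    labels = []
    for row in probabilities:
        p = float(row[0])  # each row holds exactly one sigmoid output
        labels.append("Positive" if p >= threshold else "Negative")
    return labels


# Invented probabilities standing in for a batch of model outputs.
fake_output = [[0.91], [0.12], [0.50]]
labels = label_predictions(fake_output)
print(labels)
```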


## Visualize model on TensorBoard

In [11]:
%load_ext tensorboard
%tensorboard --logdir logs

ERROR: Could not find `tensorboard`. Please ensure that your PATH
contains an executable `tensorboard` program, or explicitly specify
the path to a TensorBoard binary by setting the `TENSORBOARD_BINARY`
environment variable.