## NLP Workshop-3 : Tensorboard Evaluation


Authored by [@abhilash1910](https://www.kaggle.com/abhilash1910)

### Movie Reviews !!

The third part of the notebook series covers how TensorFlow graphs are created and how training can be monitored and optimized with Tensorboard while training any deep network, including transformer variants. This notebook is a supplement to [Notebook-1](https://www.kaggle.com/colearninglounge/nlp-end-to-end-cll-nlp-workshop) and [Notebook-2](https://www.kaggle.com/colearninglounge/nlp-end-to-end-cll-nlp-workshop-2); here we look at how to hook Tensorboard into our models.

[Tensorboard](https://www.tensorflow.org/tensorboard) provides the visualization and tooling needed for machine learning experimentation:


- Tracking and visualizing metrics such as loss and accuracy
- Visualizing the model graph (ops and layers)
- Viewing histograms of weights, biases, or other tensors as they change over time
- Projecting embeddings to a lower dimensional space
- Displaying images, text, and audio data
- Profiling TensorFlow programs


<img src="https://www.tensorflow.org/tensorboard/images/tensorboard.gif">

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Loading the Dataset and Minor modifications

Since our main aim is to visualize training in Tensorboard, we can refer to the previous notebooks for an efficient cleaning and preprocessing/model building process. Here we use the raw data as is without any modifications.
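
The loading cell turns the `sentiment` strings into integers twice: once by comparing against `'positive'` and once with scikit-learn's `LabelEncoder`. As a rough sketch of why both give the same result, the snippet below mimics the encoder's behaviour in plain Python: distinct labels are sorted and numbered from 0, so `negative` becomes 0 and `positive` becomes 1. The helper `encode_labels` and the sample list are hypothetical and only serve as illustration.

```python
def encode_labels(values):
    """Map each distinct label to an integer, following the sorted order of the labels."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


# Made-up sample standing in for train_df['sentiment']
sentiments = ["positive", "negative", "negative", "positive"]
encoded, classes = encode_labels(sentiments)

print(classes)  # ['negative', 'positive']
print(encoded)  # [1, 0, 0, 1]
```

Because the sorted order puts `negative` first, the encoded values match a direct "1 if positive else 0" mapping for this dataset.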

In [None]:
%%time

#Load the set
from sklearn.preprocessing import LabelEncoder

train_df=pd.read_csv('../input/imdb-dataset-of-50k-movie-reviews/IMDB Dataset.csv')
train_df.head()
#Convert the labels into integers (numerics) for reference.

# Vectorized comparison: 1 for 'positive', 0 otherwise
train_df['Binary']=(train_df['sentiment']=='positive').astype(int)
train_df.head()
#Label Encode the labels
label_y= LabelEncoder()
labels=label_y.fit_transform(train_df['sentiment'])
labels

## Loading the Tensorboard 

In Kaggle, Tensorboard can be loaded with the magic commands below.

A useful resource:

- [Kaggle Docs](https://www.kaggle.com/aagundez/using-tensorboard-in-kaggle-kernels)

# Load Tensorboard with Kaggle 


In [None]:
%reload_ext tensorboard
%tensorboard --logdir logs

## Tokenizing the Data and Building the Model

This is the simple deep learning model we created in our previous [Notebook](https://www.kaggle.com/colearninglounge/nlp-end-to-end-cll-nlp-workshop-2#Creating-the-Model-architecture), and we will use the same model to show how to activate Tensorboard. To this model, we add just two lines:

```python
tensorboard_callback = tf.keras.callbacks.TensorBoard("logs")
```

and, inside `model.fit()`:

```python
callbacks=[tensorboard_callback]
```
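
A common variation, sketched here under assumptions, is to give each training run its own timestamped subdirectory under `logs`, so that Tensorboard lists the runs separately instead of mixing their curves. The helpers `make_run_dir` and `make_tensorboard_callback` are hypothetical names, not part of the original notebook; `histogram_freq=1` is an optional setting that also records weight histograms once per epoch.

```python
import os
from datetime import datetime


def make_run_dir(base_dir="logs", now=None):
    """Return a per-run log directory such as logs/20240102-030405."""
    now = now or datetime.now()
    return os.path.join(base_dir, now.strftime("%Y%m%d-%H%M%S"))


def make_tensorboard_callback(log_dir):
    """Build a Keras TensorBoard callback that writes to `log_dir`."""
    # Imported lazily so make_run_dir can be used without TensorFlow installed.
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)


run_dir = make_run_dir()
print(run_dir)

# Usage with the model from this notebook (left commented, since it needs the trained inputs):
# tensorboard_callback = make_tensorboard_callback(run_dir)
# model.fit(train_x, train_y, callbacks=[tensorboard_callback])
```

Pointing `%tensorboard --logdir logs` at the parent directory still works, because Tensorboard scans subdirectories for runs.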


Note: TensorBoard requires a running kernel, so its output will only be available in an editor session.
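
Before the model is built, the reviews are tokenized and padded. The sketch below imitates those two steps in plain Python, assuming the Keras defaults: words are indexed by frequency starting at 1, only indices below `num_words` are kept, and sequences are padded and truncated at the front. The helper names are hypothetical, and the real `Tokenizer` also strips punctuation, which this sketch skips.

```python
from collections import Counter


def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def to_sequence(text, word_index, num_words):
    """Replace words by their index, dropping unknown words and indices >= num_words."""
    return [
        word_index[w]
        for w in text.lower().split()
        if w in word_index and word_index[w] < num_words
    ]


def pad_pre(sequence, maxlen, value=0):
    """Left-pad or left-truncate one sequence to exactly `maxlen` items (maxlen > 0)."""
    trimmed = sequence[-maxlen:]  # keep the last `maxlen` tokens
    return [value] * (maxlen - len(trimmed)) + trimmed


texts = ["good movie", "bad movie", "good good plot"]  # made-up corpus
word_index = build_word_index(texts)

seq = to_sequence("good bad plot", word_index, num_words=4)
print(seq)                                    # [1, 3]
print(pad_pre(seq, maxlen=4))                 # [0, 0, 1, 3]
print(pad_pre([1, 2, 3, 4, 5, 6], maxlen=4))  # [3, 4, 5, 6]
```

With `maxlen=1000` and `max_features=5000`, every review therefore becomes a row of 1000 integers in the range 0 to 4999.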

In [None]:
import tensorflow as tf
from sklearn.model_selection import train_test_split
# Import everything through tensorflow.keras; mixing it with the standalone keras package can cause version conflicts
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import plot_model
maxlen=1000
max_features=5000 
embed_size=300

#clean some null words or use the previously cleaned & lemmatized corpus

train_y=labels
train_x,test_x,train_y,test_y=train_test_split(train_df['review'],train_y,test_size=0.2,random_state=42)

val_x=test_x
#Tokenizing steps: fit the tokenizer on the training text only, then reuse it for the validation text
tokenizer=Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(train_x))
train_x=tokenizer.texts_to_sequences(train_x)
val_x=tokenizer.texts_to_sequences(val_x)

#Pad the sequences so that every tokenized review has the same length
train_x=pad_sequences(train_x,maxlen=maxlen)
val_x=pad_sequences(val_x,maxlen=maxlen)
val_y=test_y
print("Padded and tokenized training sequences:",train_x.shape)
print("Training target shape:",train_y.shape)
print("Padded and tokenized validation sequences:",val_x.shape)
print("Validation target shape:",val_y.shape)
model=Sequential()
model.add(Embedding(max_features,embed_size,input_length=maxlen))
model.add(LSTM(60))
model.add(Dense(16,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()
tensorboard_callback = tf.keras.callbacks.TensorBoard("logs")

plot_model(
    model,
    to_file="simple_model.png",
    show_shapes=True,
    show_layer_names=True,
    rankdir="TB",
    expand_nested=False,
    dpi=96,
)
model.fit(train_x,train_y,batch_size=512,epochs=1,verbose=2,callbacks=[tensorboard_callback])

## Sample Training on Port 6006

<img src="https://github.com/abhilash1910/MiniAttention/blob/master/Tensorboard-training.PNG?raw=true">


## Issue with Tensorboard in Kaggle

There are some issues with running Tensorboard in Kaggle. The page may fail with 'This site can’t be reached. kkb-production.jupyter-proxy.kaggle.net took too long to respond.' This may be an error or timeout on Kaggle's side, as Tensorboard can take a long time to load.

Alternatives

- Running Tensorboard through [Colab/Jupyter notebook](https://www.tensorflow.org/tensorboard/tensorboard_in_notebooks)
- Running Tensorboard through [cmd](https://stackoverflow.com/questions/44175037/cant-open-tensorboard-0-0-0-06006-or-localhost6006)
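
Whichever alternative is used, an empty dashboard is often just a sign that the log directory holds no event files yet. The following is a minimal sketch, assuming the usual naming in which event files contain `tfevents` in their file name; `find_event_files` is a hypothetical helper, not a Tensorboard API.

```python
import os


def find_event_files(logdir):
    """Return paths of Tensorboard event files found anywhere under `logdir`."""
    matches = []
    for dirname, _, filenames in os.walk(logdir):
        for filename in filenames:
            if "tfevents" in filename:
                matches.append(os.path.join(dirname, filename))
    return sorted(matches)


event_files = find_event_files("logs")
if event_files:
    for path in event_files:
        print(path)
else:
    print("No event files found under 'logs'; run training with the callback first.")
```

If the list is empty, check that `model.fit()` actually received the callback and that `--logdir` points at the same directory.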

Inside Tensorboard, the computed TensorFlow graphs appear as follows:


<img src ="https://cs230.stanford.edu/doks-theme/assets/images/section/5/tfgraph.png">

## Tensorboard Graph


The Tensorboard graph shows all the important layers in the model as well as the parameters of each layer. It also shows how the backend `tf.Graph` is computed. We can select individual cells of the diagram to inspect that particular block of the graph.
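
To make the per-layer parameters concrete, here is a small worked calculation for the model in this notebook, using the standard parameter-count formulas for Embedding, LSTM and Dense layers. The helper functions are hypothetical and written only for this illustration; the layer sizes come from the model cell (`max_features=5000`, `embed_size=300`, `LSTM(60)`, `Dense(16)`, `Dense(1)`).

```python
def embedding_params(vocab_size, embed_dim):
    """One embedding vector per vocabulary entry."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


max_features, embed_size = 5000, 300

counts = {
    "embedding": embedding_params(max_features, embed_size),  # 1,500,000
    "lstm": lstm_params(embed_size, 60),                      # 86,640
    "dense": dense_params(60, 16),                            # 976
    "dense_1": dense_params(16, 1),                           # 17
}
total = sum(counts.values())

for name, value in counts.items():
    print(f"{name:10s} {value:>10,}")
print(f"{'total':10s} {total:>10,}")  # 1,587,633
```

These figures should agree with the output of `model.summary()` and with the shapes shown when a layer is selected in the graph view.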

<img src="https://i.imgur.com/CkL2XbH.png">

## Upload through Tensorboard Dev

Training logs can be uploaded to Tensorboard Dev from the command line or an Anaconda prompt.
An authentication key will have to be provided for this.

In [None]:
# Shell command: the leading "!" runs it from a notebook cell (drop it when typing in a terminal)
!tensorboard dev upload --logdir logs

## Conclusion

We have come to the end of the notebook. It will be updated with supplementary material on optimizing deep learning hyperparameters through Tensorboard. Until then, for more resources on NLP, follow [@Colearninglounge](https://www.kaggle.com/colearninglounge) and [@abhilash1910](https://www.kaggle.com/abhilash1910).