# Learn the basics of NLP with TensorFlow

I'm going to follow this GitHub tutorial:

https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/08_introduction_to_nlp_in_tensorflow.ipynb

Get the dataset from Kaggle.

In [1]:
import pandas as pd
import numpy as np

In [2]:
train_data = pd.read_csv('./dataaset/train.csv')

In [3]:
train_data.head()

| Unnamed: 0 | id | keyword | location | text | target |
|---|---|---|---|---|---|
| 0 | 1 | | | Our Deeds are the Reason of this #earthquake M... | 1 |
| 1 | 4 | | | Forest fire near La Ronge Sask. Canada | 1 |
| 2 | 5 | | | All residents asked to 'shelter in place' are ... | 1 |
| 3 | 6 | | | 13,000 people receive #wildfires evacuation or... | 1 |
| 4 | 7 | | | Just got sent this photo from Ruby #Alaska as ... | 1 |


Split the data into training and validation sets (90% / 10%).

In [4]:
from sklearn.model_selection import train_test_split

train_sentences, val_sentences, train_lables, val_lables = train_test_split(
    train_data["text"].to_numpy(),
    train_data["target"].to_numpy(),
    test_size=0.1
    )
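`train_test_split` shuffles the rows and holds out a `test_size` fraction of them; for a float `test_size` the held-out count is rounded up. The pure-Python sketch below only illustrates that behaviour: the helper `shuffle_split` and its `seed` argument are hypothetical and this is not scikit-learn's implementation. In the real call, passing `random_state=42` makes the split reproducible between runs.

```python
import math
import random

def shuffle_split(items, labels, test_size=0.1, seed=42):
    """Hypothetical helper: shuffle paired items/labels, hold out ceil(test_size * n)."""
    assert len(items) == len(labels)
    n = len(items)
    n_test = math.ceil(test_size * n)  # rounded up for a float test_size
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # one shuffle of indices keeps pairs aligned
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([items[i] for i in train_idx], [items[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

sentences = [f"tweet {i}" for i in range(25)]
targets = [i % 2 for i in range(25)]
tr_s, va_s, tr_y, va_y = shuffle_split(sentences, targets)
print(len(tr_s), len(va_s))  # 22 3
```

Shuffling the indices once and applying them to both lists is what keeps every sentence paired with its own label.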

In [5]:
train_sentences

array(['#TweetLikeItsSeptember11th2001 Those two buildings are on fire',
       'ouvindo Peace Love &amp; Armageddon',
       '#Thorium Radioactive Weapons. Scandals murders and environmental devastation: - VIDEO http://t.co/mly7sDN6eV',
       ...,
       "The greatest female beat boxer ever now but it's w/e... Save babies outta burning buildings on my free time but ya know.. whatevs..",
       'Arsonist Sets NYC Vegetarian Restaurant on Fire: Police #NewYork - http://t.co/Nr7usT3uh8',
       "Timestack' Photos Collapse Entire Sunsets Into Single Mesmerizing Images. http://t.co/Cas8xC2DFE"],
      dtype=object)

# Converting text into numbers

Create a `TextVectorization` layer that converts words into integer token IDs.

In [6]:
from tensorflow.keras.layers import TextVectorization

In [7]:
text2vec = TextVectorization(
    max_tokens=10000, standardize='lower_and_strip_punctuation',
    split='whitespace', ngrams=None, output_mode='int',
    output_sequence_length=None, pad_to_max_tokens=False, vocabulary=None,
    idf_weights=None, sparse=False, ragged=False
)


In [8]:
text2vec.adapt(train_sentences)
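Roughly, `adapt` counts the standardized tokens in the training sentences and keeps the most frequent ones, up to `max_tokens` entries in total, with index 0 reserved for padding and index 1 for `[UNK]`. The sketch below mimics that in pure Python. The helper names are hypothetical, punctuation stripping is approximated with `string.punctuation`, and ties between equally frequent words are broken alphabetically here, which may differ from TensorFlow.

```python
import string
from collections import Counter

PAD, OOV = "", "[UNK]"  # ids 0 and 1, mirroring TextVectorization's vocabulary

def standardize(text):
    # lowercase and drop ASCII punctuation (approximates 'lower_and_strip_punctuation')
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def build_vocab(sentences, max_tokens):
    counts = Counter(tok for s in sentences for tok in standardize(s).split())
    # most frequent first; alphabetical tie-break is this sketch's own choice
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return [PAD, OOV] + ranked[: max_tokens - 2]  # max_tokens includes PAD and OOV

def vectorize(sentence, vocab):
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, 1) for tok in standardize(sentence).split()]  # unknown -> 1

corpus = ["Forest fire near La Ronge!", "Fire in the forest", "The flood is here"]
vocab = build_vocab(corpus, max_tokens=10)
print(vectorize("There is a flood in my street!", vocab))  # [1, 8, 1, 5, 7, 1, 1]
```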

See how the words of a sample sentence are converted into numbers:

In [9]:
sample_sentence = "There is a flood in my street!"
text2vec([sample_sentence])

<tf.Tensor: shape=(1, 7), dtype=int64, numpy=array([[ 71,   9,   3, 205,   4,  13, 703]])>

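Get the first five words of the vocabulary. Index 0 (`''`) is the padding token and index 1 (`[UNK]`) is the out-of-vocabulary token; the remaining words are ordered from most to least frequent.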

In [10]:
text2vec.get_vocabulary()[:5]

['', '[UNK]', 'the', 'a', 'in']
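Because index 0 is reserved for padding, sentences of different lengths can be batched by filling the short ones with 0. With `output_sequence_length=None` (as configured above), a batch is padded to the length of its longest sentence; setting `output_sequence_length` to a fixed number instead pads or truncates every sentence to that length. A small pure-Python sketch of both cases follows; the helpers are hypothetical and assume padding and truncation happen at the end of the sequence.

```python
def pad_or_truncate(token_ids, length, pad_id=0):
    """Return exactly `length` ids: keep the first ones, then fill with pad_id."""
    kept = token_ids[:length]
    return kept + [pad_id] * (length - len(kept))

def pad_batch(batch, pad_id=0):
    """No fixed length: pad every sequence to the longest one in the batch."""
    longest = max(len(seq) for seq in batch)
    return [pad_or_truncate(seq, longest, pad_id) for seq in batch]

ids = [71, 9, 3, 205, 4, 13, 703]  # ids of the sample sentence shown earlier
print(pad_or_truncate(ids, 5))     # [71, 9, 3, 205, 4]
print(pad_or_truncate(ids, 10))    # [71, 9, 3, 205, 4, 13, 703, 0, 0, 0]
print(pad_batch([[5, 6], ids]))    # [[5, 6, 0, 0, 0, 0, 0], [71, 9, 3, 205, 4, 13, 703]]
```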

Get the vocabulary entries at indices 100 to 104.

In [11]:
text2vec.get_vocabulary()[100:105]

['first', 'day', 'youtube', 'rt', 'off']

# Creating Embedding layer

We are going to use TensorFlow's `Embedding` layer.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding

In [12]:
from tensorflow.keras import layers

embedding = layers.Embedding(input_dim=10000,  # vocabulary size, same as text2vec's max_tokens
                             output_dim=128    # length of each embedding vector
                             # input_length is left out: text2vec uses
                             # output_sequence_length=None, so sentence lengths vary
                             )

embedding

<keras.layers.embeddings.Embedding at 0x7f9e4c569070>
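An `Embedding` layer is essentially a trainable lookup table with `input_dim` rows and `output_dim` columns: each integer token ID selects one row. That is why `input_dim` should be at least as large as the vectorizer's vocabulary, and why a sentence of 13 tokens becomes a `(1, 13, 128)` tensor. The pure-Python sketch below shows only the lookup; the helpers are hypothetical, the uniform initialisation in [-0.05, 0.05] is assumed to mirror Keras' default `'uniform'` initializer, and training is left out.

```python
import random

def make_table(input_dim, output_dim, seed=0):
    """One row of output_dim random floats per token id (uniform in [-0.05, 0.05])."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
            for _ in range(input_dim)]

def embed(token_ids, table):
    """Embedding lookup: replace each id with its row; ids must be < len(table)."""
    return [table[i] for i in token_ids]

table = make_table(input_dim=10, output_dim=4)
vectors = embed([2, 5, 2], table)
print(len(vectors), len(vectors[0]))  # 3 4
print(vectors[0] == vectors[2])       # True: the same id always gives the same vector
```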

Get a random sentence from the training set and embed it:

In [13]:
import random
random_sentence = random.choice(train_sentences)

print(f"Original text:\n{random_sentence}\n\nEmbedded version:")

# Embed the random sentence (turn it into dense vectors of fixed size)
sample_embed = embedding(text2vec([random_sentence]))
sample_embed


Original text:
 Ted Cruz Bashes Obama Comparison GOP To Iranians Shouting 'Death To America' http://t.co/cuFGVupKzi        

Embedded version:


<tf.Tensor: shape=(1, 13, 128), dtype=float32, numpy=
array([[[-0.04589216, -0.00582256,  0.0495182 , ...,  0.01232766,
          0.04608992, -0.00596225],
        [-0.03095578, -0.02672704,  0.0455375 , ...,  0.0242402 ,
         -0.02012538,  0.01928905],
        [-0.02260274,  0.01802236,  0.04129959, ...,  0.03398833,
         -0.02228462,  0.0096962 ],
        ...,
        [ 0.04314781,  0.00130932, -0.04351991, ..., -0.04134014,
          0.00373631,  0.01372821],
        [-0.02758521,  0.03660771,  0.00398973, ..., -0.01729576,
          0.03329347,  0.04871484],
        [ 0.01186009,  0.00745732, -0.04007559, ..., -0.02745699,
          0.01358886, -0.02192879]]], dtype=float32)>

In [14]:
sample_embed[0][0], sample_embed[0][0].shape, random_sentence

(<tf.Tensor: shape=(128,), dtype=float32, numpy=
 array([-0.04589216, -0.00582256,  0.0495182 , -0.03466858,  0.0086895 ,
         0.01952423, -0.03805994,  0.03828658,  0.00859358, -0.01071567,
         0.01370361, -0.00163607, -0.03469817, -0.01308414, -0.03475895,
         0.03837911, -0.01894958,  0.02977941,  0.00869714,  0.03028344,
        -0.04239063,  0.00973883,  0.03364274, -0.00975827,  0.01012588,
        -0.00565052,  0.00670647, -0.01210244,  0.03268604,  0.01059494,
        -0.01873146, -0.03173634,  0.01254826,  0.00018758,  0.03714723,
         0.02555737,  0.0196383 ,  0.03538947,  0.03046412, -0.0056526 ,
        -0.02763193,  0.02000533, -0.0138407 , -0.01639475,  0.03793392,
        -0.0441292 ,  0.00782797,  0.00688474,  0.04415778,  0.04546814,
         0.02921785,  0.03169814, -0.04615304, -0.00032622, -0.00660159,
        -0.0237491 , -0.04002531, -0.01122   , -0.01492711,  0.04986307,
         0.02069842,  0.03790443,  0.00544722,  0.03572123,  0.01364151,
  

# Modelling a text dataset by running a series of experiments

Here are the models we will experiment with:

* Model 0: Naive Bayes with a TF-IDF encoder (baseline)
* Model 1: Feed-forward neural network (dense model)
* Model 2: LSTM (RNN)
* Model 3: GRU (RNN)
* Model 4: Bidirectional LSTM (RNN)
* Model 5: 1D convolutional neural network
* Model 6: TensorFlow Hub pretrained feature extractor
* Model 7: TensorFlow Hub pretrained feature extractor (10% of the data)
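For the Model 0 baseline, TF-IDF turns each sentence into a vector of word weights: a word weighs more the more often it appears in that sentence, and less the more sentences contain it. The pure-Python sketch below uses the smoothed formula that scikit-learn applies by default (idf = ln((1 + n) / (1 + df)) + 1, rows L2-normalised). It tokenizes on whitespace only, which differs from scikit-learn's default token pattern, `tfidf` is a hypothetical helper, and the Naive Bayes classifier that would consume these vectors is not shown.

```python
import math
from collections import Counter

def tfidf(docs):
    """Hypothetical helper: smoothed TF-IDF matrix, one L2-normalised row per doc."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # docs containing tok
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts in this doc
        row = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

vocab, rows = tfidf(["fire fire flood", "flood here"])
print(vocab)  # ['fire', 'flood', 'here']
```

A word that appears in every sentence (here `flood`) gets the minimum idf of 1, so it contributes less than rarer, more distinctive words.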

How are we going to approach all of these?

We will follow the standard steps of modelling with TensorFlow:

* Create a model
* Compile a model
* Fit a model
* Evaluate the model

## Model 1: Simple dense model

Create a simple dense model to make predictions.
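A common layout for a simple dense text model is: token IDs, then an embedding, then an average over the tokens, then one sigmoid unit giving the probability that a tweet is about a real disaster (`target` = 1). Whether the tutorial uses exactly this layout is an assumption here. The pure-Python sketch below shows only the forward pass with made-up weights (all names hypothetical); in Keras the averaging step corresponds to `layers.GlobalAveragePooling1D()` and the output unit to `layers.Dense(1, activation="sigmoid")`.

```python
import math

def average_pool(vectors):
    """Mean over the token axis: (seq_len, dim) -> (dim,)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def dense_sigmoid(x, weights, bias):
    """Single output unit: sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

token_vectors = [[1.0, 2.0], [3.0, 4.0]]  # two tokens with made-up 2-dim embeddings
pooled = average_pool(token_vectors)
prob = dense_sigmoid(pooled, weights=[0.5, -0.5], bias=0.5)
print(pooled, prob)  # [2.0, 3.0] 0.5
```

A prediction is usually turned into a label by thresholding the probability at 0.5; training adjusts the embedding table, `weights` and `bias` so that disaster tweets land above it.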