<a href="https://colab.research.google.com/github/TirendazAcademy/Deep-Learning-with-TensorFlow/blob/main/TextVectorization-TensorFlow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Use the TextVectorization Layer in TensorFlow

This video walks you through how to use the TextVectorization layer in TensorFlow.

In [1]:
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

In [2]:
# Instantiating the layer with its default settings
text_vectorization = TextVectorization()

In [3]:
data = [
    "Bugün hava çok güzel",        # "The weather is very nice today"
    "Ali, Efe ve Ece çay içecek",  # "Ali, Efe, and Ece will drink tea"
    "Selam söyle"                  # "Say hello"
]

In [4]:
# Creating the vocabulary with the adapt method.
text_vectorization.adapt(data)

In [5]:
# Let's take a look at the vocabulary.
text_vectorization.get_vocabulary()

['',
 '[UNK]',
 'çok',
 'çay',
 've',
 'söyle',
 'selam',
 'içecek',
 'hava',
 'güzel',
 'efe',
 'ece',
 'bugün',
 'ali']
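
The vocabulary above can be reproduced conceptually in plain Python. This is a minimal sketch, not the layer's actual implementation: it assumes the default behavior of lowercasing, stripping punctuation, and splitting on whitespace, and it assumes that tokens with equal frequency are ordered in reverse lexicographic order, which matches the output shown here but is an observation rather than a documented guarantee. The helper name `build_vocabulary` is hypothetical.

```python
import re
import string
from collections import Counter

data = [
    "Bugün hava çok güzel",
    "Ali, Efe ve Ece çay içecek",
    "Selam söyle",
]

def build_vocabulary(sentences):
    """Hypothetical helper: mimic adapt() + get_vocabulary() in plain Python."""
    punctuation = f"[{re.escape(string.punctuation)}]"
    counts = Counter()
    for sentence in sentences:
        # Lowercase, strip punctuation, then split on whitespace.
        cleaned = re.sub(punctuation, "", sentence.lower())
        counts.update(cleaned.split())
    # Most frequent first; ties in reverse lexicographic order (assumption).
    tokens = sorted(counts, key=lambda t: (counts[t], t), reverse=True)
    # Index 0 is reserved for padding, index 1 for out-of-vocabulary tokens.
    return ["", "[UNK]"] + tokens

vocabulary = build_vocabulary(data)
print(vocabulary)
```

Every token occurs once in this tiny corpus, so the ordering is decided entirely by the tie-breaking assumption.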

In [6]:
# Data preprocessing with the layer
vectorized_text = text_vectorization(data)
vectorized_text

<tf.Tensor: shape=(3, 6), dtype=int64, numpy=
array([[12,  8,  2,  9,  0,  0],
       [13, 10,  4, 11,  3,  7],
       [ 6,  5,  0,  0,  0,  0]])>
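
Each row of the tensor above holds the vocabulary indices of one sentence, padded with 0 up to the length of the longest sentence in the batch. The following plain-Python sketch illustrates that lookup-and-pad step under the assumption that the text has already been standardized; the `vectorize` helper is hypothetical and only mirrors the result, not how the layer computes it.

```python
# Vocabulary copied from the get_vocabulary() output above.
vocabulary = ["", "[UNK]", "çok", "çay", "ve", "söyle", "selam", "içecek",
              "hava", "güzel", "efe", "ece", "bugün", "ali"]
token_to_index = {token: index for index, token in enumerate(vocabulary)}

# The sentences after lowercasing and punctuation removal.
sentences = [
    "bugün hava çok güzel",
    "ali efe ve ece çay içecek",
    "selam söyle",
]

def vectorize(batch):
    """Hypothetical helper: look up each token, then pad rows with 0."""
    # Tokens missing from the vocabulary map to index 1 ('[UNK]').
    rows = [[token_to_index.get(tok, 1) for tok in s.split()] for s in batch]
    longest = max(len(row) for row in rows)
    return [row + [0] * (longest - len(row)) for row in rows]

vectorized = vectorize(sentences)
for row in vectorized:
    print(row)
```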

# Using custom functions with TextVectorization

In [7]:
import re
import string

In [8]:
def standardization_fn(string_tensor):
    # Convert the text to lowercase
    lowercase = tf.strings.lower(string_tensor)
    # Remove every punctuation character
    return tf.strings.regex_replace(
        lowercase, f"[{re.escape(string.punctuation)}]", ""
    )

In [9]:
def split_fn(string_tensor):
    # Split the text on whitespace
    return tf.strings.split(string_tensor)
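
To see what these two functions do to a sentence, the same steps can be traced in plain Python. This is only an illustrative sketch: it uses `re.sub` and `str.split` in place of the TensorFlow string ops, and Python's `str.lower` is Unicode-aware, so it may treat non-ASCII uppercase letters differently from `tf.strings.lower`, depending on that function's `encoding` argument.

```python
import re
import string

# Same character class as in standardization_fn above.
pattern = f"[{re.escape(string.punctuation)}]"

sample = "Ali, Efe ve Ece çay içecek"

# Step 1: lowercase and strip punctuation (the standardization step).
standardized = re.sub(pattern, "", sample.lower())

# Step 2: split on whitespace (the split step).
tokens = standardized.split()

print(standardized)
print(tokens)
```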

In [10]:
text_vectorization = TextVectorization(
    standardize=standardization_fn,
    split=split_fn
)

In [11]:
text_vectorization.adapt(data)

In [12]:
# Testing our layer with a single sentence
text = "bugün ece çok güzel"  # "Ece is very beautiful today"
text_vectorization(text)

<tf.Tensor: shape=(4,), dtype=int64, numpy=array([12, 11,  2,  9])>

# Using TextVectorization in a model

In [13]:
# Creating a Dataset object
text_dataset = tf.data.Dataset.from_tensor_slices([
    "kedi", "aslan", "yunus"  # "cat", "lion", "dolphin"
])

In [14]:
# Creating the TextVectorization layer
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=5000,
    output_sequence_length=4
)

In [15]:
# Creating the vocabulary
vectorize_layer.adapt(text_dataset.batch(64))

In [16]:
vectorize_layer.get_vocabulary()

['', '[UNK]', 'yunus', 'kedi', 'aslan']

In [17]:
# Building the model
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorize_layer
])

In [18]:
# Creating sample data for testing
# "kartal" (eagle) and "fok" (seal) are not in the vocabulary
input_data = [["kedi kartal aslan"], ["fok yunus"]]

In [19]:
model.predict(input_data)



array([[3, 1, 4, 0],
       [1, 2, 0, 0]])
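
The prediction above shows two effects at once: words that were not seen during `adapt` map to index 1 (`[UNK]`), and every row is padded with 0 to exactly `output_sequence_length` entries. The plain-Python sketch below mirrors that behavior; the `vectorize_fixed` helper is hypothetical, and the truncation branch is included because a fixed output length also implies cutting longer inputs.

```python
# Vocabulary copied from the get_vocabulary() output above.
vocabulary = ["", "[UNK]", "yunus", "kedi", "aslan"]
token_to_index = {token: index for index, token in enumerate(vocabulary)}

output_sequence_length = 4

def vectorize_fixed(text, length):
    """Hypothetical helper: map tokens to indices, then pad or truncate."""
    # Unknown tokens map to index 1 ('[UNK]').
    ids = [token_to_index.get(tok, 1) for tok in text.lower().split()]
    # Truncate inputs longer than the target length.
    ids = ids[:length]
    # Pad shorter inputs with 0 up to the target length.
    return ids + [0] * (length - len(ids))

samples = ["kedi kartal aslan", "fok yunus"]
result = [vectorize_fixed(s, output_sequence_length) for s in samples]
print(result)
```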

Let's connect [YouTube](http://youtube.com/tirendazacademy) | [Medium](http://tirendazacademy.medium.com) | [Twitter](http://twitter.com/tirendazacademy) | [Instagram](https://www.instagram.com/tirendazacademy)