# <b><div style='padding:15px;background-color:#850E35;color:white;border-radius:2px;font-size:110%;text-align: center'>Transformers for TensorFlow</div></b>
![](https://1.bp.blogspot.com/-qQryqABhdhA/XcC3lJupTKI/AAAAAAAAAzA/MOYu3P_DFRsmNkpjD9j813_SOugPgoBLACLcBGAsYHQ/s1600/h1.png)

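This notebook walks you through fine-tuning a pretrained Transformer model (DistilBERT) for sentiment analysis with TensorFlow and the Hugging Face `transformers` and `datasets` libraries.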

# <b><div style='padding:15px;background-color:#850E35;color:white;border-radius:2px;font-size:110%;text-align: center'>1. Loading Dataset</div></b>

First, let's install the `datasets` library. 

In [None]:
# Installing the "datasets" library
!pip install -q datasets

Next, let's use this library to load the `rotten_tomatoes` dataset. It contains short sentences from Rotten Tomatoes movie reviews, each labeled as negative (0) or positive (1).

In [None]:
# Importing the necessary function to load a dataset
from datasets import load_dataset

# Loading the "rotten_tomatoes" dataset
dataset = load_dataset("rotten_tomatoes")

Let's explore this dataset: 

In [None]:
# Displaying the loaded dataset
dataset

Let's take a look at the first example from the "test" split of the dataset.

In [None]:
# Accessing the first example from the test split of the dataset
dataset["test"][0]

# <b><div style='padding:15px;background-color:#850E35;color:white;border-radius:2px;font-size:110%;text-align: center'>2. Data Preprocessing</div></b>

Let's initialize the tokenizer for the `distilbert-base-uncased` model.

In [None]:
# Importing the tokenizer for a pre-trained model
from transformers import AutoTokenizer

# Initializing the tokenizer for the "distilbert-base-uncased" model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

For example, let's tokenize the text of the first training example in the dataset using this tokenizer.

In [None]:
# Tokenizing the text of the first training example
tokenizer(dataset["train"][0]["text"])
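The tokenizer returns a dictionary with `input_ids` and an `attention_mask` (DistilBERT does not use `token_type_ids`). The stdlib-only toy sketch below is not the real tokenizer; it only mimics the structure of its output for a single sentence. It assumes the `[CLS]` id 101 and `[SEP]` id 102 of the `bert-base-uncased` vocabulary, which `distilbert-base-uncased` reuses, and the word-piece ids passed in are hypothetical placeholders.

```python
# Toy sketch (not the real tokenizer): how a BERT-style tokenizer such as
# DistilBERT's frames a single sequence with special tokens.
CLS_ID = 101  # [CLS] in the bert-base-uncased vocabulary
SEP_ID = 102  # [SEP] in the bert-base-uncased vocabulary


def frame_single_sequence(wordpiece_ids):
    """Wrap word-piece ids with [CLS]/[SEP] and build a matching attention mask."""
    input_ids = [CLS_ID] + list(wordpiece_ids) + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token the model attends to
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = frame_single_sequence([2023, 3185, 2001, 2307])  # hypothetical word-piece ids
print(encoded)
# {'input_ids': [101, 2023, 3185, 2001, 2307, 102], 'attention_mask': [1, 1, 1, 1, 1, 1]}
```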

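Let's define a preprocessing function named `preprocess_function`. It takes a batch of examples (a dictionary that maps each column name to a list of values) and tokenizes the `"text"` field with `truncation=True`, so any review longer than the model's maximum input length is cut down to fit.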

In [None]:
# Preprocessing function for tokenization
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)
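To make the "dictionary of examples" and the truncation concrete, here is a toy sketch with the same shape as `preprocess_function`. The whitespace tokenizer, the tiny `MAX_LENGTH` of 6, and the two sentences are hypothetical stand-ins chosen so the truncation is visible. The real tokenizer splits text into word pieces, counts `[CLS]` and `[SEP]` toward the limit, and uses DistilBERT's limit of 512 tokens.

```python
# Toy sketch of what preprocess_function receives with map(..., batched=True):
# a dict mapping each column name to a list of values.
MAX_LENGTH = 6  # hypothetical tiny limit; DistilBERT's real limit is 512 tokens


def toy_tokenize(texts, truncation=True, max_length=MAX_LENGTH):
    """Split each text on whitespace and optionally truncate to max_length tokens."""
    batch_tokens = []
    for text in texts:
        tokens = text.split()
        if truncation:
            tokens = tokens[:max_length]
        batch_tokens.append(tokens)
    return {"tokens": batch_tokens}


def toy_preprocess_function(examples):
    # Same shape as preprocess_function: read the "text" column, return new columns.
    return toy_tokenize(examples["text"], truncation=True)


batch = {  # made-up sentences, not taken from the dataset
    "text": ["a gripping , beautifully acted drama that never lets up", "dull ."],
    "label": [1, 0],
}
result = toy_preprocess_function(batch)
print(result["tokens"][0])  # ['a', 'gripping', ',', 'beautifully', 'acted', 'drama']
```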

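Then, let's use the `map` method to apply this function to every split of the dataset. With `batched=True`, the function receives many examples at once, which speeds up tokenization, and the columns it returns (`input_ids` and `attention_mask`) are added to the dataset.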

In [None]:
# Applying the preprocessing function to the entire dataset in batches
dataset = dataset.map(preprocess_function, batched=True)
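The column bookkeeping of a batched `map` can be sketched in plain Python. This is a simplified model, not the `datasets` implementation: the real method also caches results to disk and processes 1,000 rows per batch by default. The `count_words` function and the three-row table are hypothetical stand-ins for `preprocess_function` and the dataset.

```python
# Toy sketch of Dataset.map(fn, batched=True): slice the columns into chunks,
# call fn on each chunk (a dict of lists), and add the returned columns.
def toy_batched_map(columns, fn, batch_size=1000):
    num_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, num_rows, batch_size):
        chunk = {name: values[start:start + batch_size] for name, values in columns.items()}
        output = fn(chunk)
        for name, values in output.items():
            new_columns.setdefault(name, []).extend(values)
    # Returned columns are added alongside (or overwrite) the existing ones.
    return {**columns, **new_columns}


def count_words(examples):
    # Hypothetical function standing in for preprocess_function.
    return {"num_words": [len(text.split()) for text in examples["text"]]}


table = {"text": ["good film", "bad", "not bad at all"], "label": [1, 0, 1]}
mapped = toy_batched_map(table, count_words, batch_size=2)
print(mapped["num_words"])  # [2, 1, 4]
```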

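Next, let's create a data collator. Instead of padding every example in the dataset to one fixed length, `DataCollatorWithPadding` pads each batch on the fly to the length of its longest sequence and, with `return_tensors="tf"`, returns TensorFlow tensors.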

In [None]:
# Importing the data collator with padding
from transformers import DataCollatorWithPadding

# Initializing the data collator with padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
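The idea behind dynamic padding can be shown with a stdlib-only sketch. It is a simplification of `DataCollatorWithPadding`, not its implementation: it pads on the right, assumes `PAD_ID = 0` (the `[PAD]` id in the `bert-base-uncased` vocabulary), returns Python lists instead of `tf.Tensor`s, and renames `label` to `labels` as the real collator does. The input ids are hypothetical.

```python
# Stdlib sketch of dynamic padding: pad every sequence in ONE batch to the
# length of that batch's longest sequence, and mark padding with 0 in the mask.
PAD_ID = 0  # [PAD] in the bert-base-uncased vocabulary


def pad_batch(features):
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [PAD_ID] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        batch["labels"].append(f["label"])  # "label" becomes "labels"
    return batch


features = [
    {"input_ids": [101, 2204, 3185, 102], "label": 1},  # hypothetical ids
    {"input_ids": [101, 10634, 102], "label": 0},
]
padded = pad_batch(features)
print(padded["input_ids"])       # [[101, 2204, 3185, 102], [101, 10634, 102, 0]]
print(padded["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Because each batch is padded only to its own longest sequence, batches of short reviews stay short, which saves computation compared with padding the whole dataset to one length.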

# <b><div style='padding:15px;background-color:#850E35;color:white;border-radius:2px;font-size:110%;text-align: center'>3. Model Loading</div></b>

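Let's load `distilbert-base-uncased` with a sequence-classification head for our sentiment analysis. The head has two outputs, one per sentiment label, and it is randomly initialized, which is why the model must be fine-tuned before its predictions are meaningful.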

In [None]:
# Importing the TensorFlow version of the model for sequence classification
from transformers import TFAutoModelForSequenceClassification

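# Initializing a model for sequence classification using "distilbert-base-uncased"
# num_labels=2 (negative, positive) is also the default; it is stated here for clarity
my_model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)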

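Next, let's convert the training and validation splits into `tf.data.Dataset` objects with `prepare_tf_dataset`. This method drops columns that the model cannot use as inputs (such as the raw `text`), batches the examples (16 per batch here) with our data collator, and shuffles only the training set.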

In [None]:
# Preparing the training dataset as a TensorFlow dataset
tf_train_set = my_model.prepare_tf_dataset(
    dataset["train"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
)

# Preparing the validation dataset as a TensorFlow dataset
tf_validation_set = my_model.prepare_tf_dataset(
    dataset["validation"],
    shuffle=False,
    batch_size=16,
    collate_fn=data_collator,
)
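A quick sanity check on the number of batches per epoch can be done with plain arithmetic. This sketch rests on two assumptions: the `rotten_tomatoes` split sizes below (8,530 training rows and 1,066 validation rows), and that `prepare_tf_dataset`'s `drop_remainder` parameter, when left unset, follows the value of `shuffle`, so only the shuffled training set drops its last incomplete batch.

```python
# Back-of-the-envelope count of batches yielded by prepare_tf_dataset,
# under the split-size and drop_remainder assumptions stated above.
import math

BATCH_SIZE = 16
TRAIN_ROWS = 8530       # assumed size of the train split
VALIDATION_ROWS = 1066  # assumed size of the validation split


def num_batches(num_rows, batch_size, drop_remainder):
    if drop_remainder:
        return num_rows // batch_size        # incomplete last batch is discarded
    return math.ceil(num_rows / batch_size)  # incomplete last batch is kept


train_steps = num_batches(TRAIN_ROWS, BATCH_SIZE, drop_remainder=True)
validation_steps = num_batches(VALIDATION_ROWS, BATCH_SIZE, drop_remainder=False)
print(train_steps, validation_steps)  # 533 67
```

With 2 epochs, that would come to 1,066 optimizer steps in total.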

# <b><div style='padding:15px;background-color:#850E35;color:white;border-radius:2px;font-size:110%;text-align: center'>4. Model Training</div></b>

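Let's compile the model using the Adam optimizer with a learning rate of 3e-5. Note that we don't pass a `loss` argument: when `compile()` receives no loss, Transformers TensorFlow models fall back to an appropriate built-in loss for their task.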

In [None]:
# Importing the Adam optimizer from TensorFlow
from tensorflow.keras.optimizers import Adam

# Compiling the model with the Adam optimizer and a specified learning rate
my_model.compile(optimizer=Adam(3e-5))  # No loss argument: the model uses its built-in loss
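For a classification head with more than one label, the built-in loss is sparse categorical cross-entropy computed from the raw logits. The stdlib sketch below illustrates that formula; it is not the library's implementation, the batch averaging is a choice made for this sketch, and the logits are hypothetical.

```python
# Stdlib sketch of sparse categorical cross-entropy from logits:
# loss = logsumexp(logits) - logits[label] = -log softmax(logits)[label]
import math


def sparse_ce_from_logits(logits, label):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_loss(batch_logits, labels):
    # Average the per-example losses over the batch (a choice made for this sketch).
    losses = [sparse_ce_from_logits(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Hypothetical logits for two reviews (columns: negative, positive), both labeled positive.
loss = mean_loss([[0.0, 0.0], [-1.0, 2.0]], [1, 1])
```

An undecided example (equal logits) costs log 2, about 0.693, while a confidently correct one costs close to 0, so training pushes the logit of the true label up relative to the others.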

Let's train the model using the prepared training and validation datasets for 2 epochs.

In [None]:
# Training the model
my_model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2)

# <b><div style='padding:15px;background-color:#850E35;color:white;border-radius:2px;font-size:110%;text-align: center'>5. Prediction</div></b>

Let's define a sample text for inference.

In [None]:
# Defining a text for inference
text = "I love NLP. It's fun to analyze the NLP tasks with Hugging Face."

Let's tokenize the text so that we can pass it to the model. Setting `return_tensors="tf"` returns TensorFlow tensors with a batch dimension of 1.

In [None]:
# Tokenizing the text for inference
tokenized_text = tokenizer(text, return_tensors="tf")
tokenized_text 

Next, let's compute the model's logits (raw output scores) for the tokenized text.

In [None]:
# Obtaining model logits for the tokenized text
logits = my_model(**tokenized_text).logits
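Logits are unnormalized scores. Applying softmax turns them into class probabilities that sum to 1; in TensorFlow this is `tf.nn.softmax(logits, axis=-1)`. The stdlib sketch below shows the same computation on a hypothetical pair of logits for one review.

```python
# Stdlib sketch of softmax: exponentiate each logit and normalize so the
# results sum to 1. Subtracting the max first avoids overflow.
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([-1.0, 2.0])  # hypothetical logits: [negative, positive]
print([round(p, 4) for p in probs])  # [0.0474, 0.9526]
```

Softmax preserves the ordering of the logits, so taking the argmax of the logits or of the probabilities picks the same class.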

Lastly, let's get the index of the class with the highest logit score. In this dataset, 0 means negative and 1 means positive.

In [None]:
# Importing TensorFlow (avoids shadowing Python's built-in math module)
import tensorflow as tf

# Finding the index of the class with the highest logit score
int(tf.math.argmax(logits, axis=-1)[0])
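The predicted index becomes more readable when mapped to a label name. A freshly loaded model's config only carries generic names (`LABEL_0`, `LABEL_1`) unless `id2label` is passed to `from_pretrained`, so this sketch keeps its own hypothetical `ID2LABEL` mapping, assuming the dataset's convention of 0 = negative and 1 = positive, and applies it to hypothetical logits.

```python
# Sketch: turn a row of logits into a human-readable sentiment label.
# ID2LABEL is a hypothetical mapping assuming 0 = negative, 1 = positive.
ID2LABEL = {0: "negative", 1: "positive"}


def argmax(values):
    # Index of the largest value (first one wins on ties).
    return max(range(len(values)), key=values.__getitem__)


def predict_label(logits):
    return ID2LABEL[argmax(logits)]


print(predict_label([-1.0, 2.0]))  # positive
```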

Thanks for reading. If you enjoyed this notebook, don't forget to upvote ☺️

Let's connect [YouTube](http://youtube.com/tirendazacademy) | [Medium](http://tirendazacademy.medium.com) | [Twitter](http://twitter.com/tirendazacademy) | [Linkedin](https://www.linkedin.com/in/tirendaz-academy) 😎