<a href="https://colab.research.google.com/github/Mukilan-Krishnakumar/T5_Transformer/blob/main/T5_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T5_Transformer


# 0. Importing Libraries 

In [1]:
!pip install transformers



In [2]:
!pip install sentencepiece



In [3]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [4]:
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small', return_dict=True)

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
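
The warning above concerns input length: `t5-small` should not be relied on to truncate inputs to 512 tokens automatically. One way to make the limit explicit is sketched below. It assumes a Hugging Face-style tokenizer that is callable with `max_length` and `truncation`; the helper name `encode_with_limit` and the `WhitespaceTokenizer` stand-in are hypothetical and exist only so the block runs without downloading a model.

```python
def encode_with_limit(tokenizer, text, max_length=512):
    """Encode `text`, truncating explicitly to at most `max_length` tokens.

    `tokenizer` is assumed to behave like the T5Tokenizer loaded above:
    callable with `max_length` and `truncation`, returning a mapping
    that has an "input_ids" entry.
    """
    encoded = tokenizer(text, max_length=max_length, truncation=True)
    return encoded["input_ids"]


class WhitespaceTokenizer:
    """Hypothetical stand-in for demonstration only: one id per word."""

    def __call__(self, text, max_length=None, truncation=False):
        ids = list(range(len(text.split())))
        if truncation and max_length is not None:
            ids = ids[:max_length]
        return {"input_ids": ids}


demo_ids = encode_with_limit(WhitespaceTokenizer(), "a b c d e f", max_length=4)
print(len(demo_ids))  # 4: the six-word input was cut to the requested limit
```

With the real tokenizer, the same helper would be called as `encode_with_limit(tokenizer, "summarize: " + text)`.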


# 1. Text Summarization

In [5]:
# Adjacent string literals are concatenated as-is, so each fragment ends with a space.
one_piece_sequence = ("The series focuses on Monkey D. Luffy, a young man made of rubber, who, inspired by his childhood idol, "
             "the powerful pirate Red-Haired Shanks, sets off on a journey from the East Blue Sea to find the mythical treasure, "
             "the One Piece, and proclaim himself the King of the Pirates. In an effort to organize his own crew, the Straw Hat Pirates, "
             "Luffy rescues and befriends a pirate hunter and swordsman named Roronoa Zoro, and they head off in search of the "
             "titular treasure. They are joined in their journey by Nami, a money-obsessed thief and navigator; Usopp, a sniper "
             "and compulsive liar; and Sanji, a perverted but chivalrous cook. They acquire a ship, the Going Merry, and engage in confrontations "
             "with notorious pirates of the East Blue. As Luffy and his crew set out on their adventures, others join the crew later in the series, "
             "including Tony Tony Chopper, an anthropomorphized reindeer doctor; Nico Robin, an archaeologist and former Baroque Works assassin; "
             "Franky, a cyborg shipwright; Brook, a skeleton musician and swordsman; and Jimbei, a fish-man helmsman and former member of the Seven "
             "Warlords of the Sea. Once the Going Merry is damaged beyond repair, Franky builds the Straw Hat Pirates a new ship, the Thousand Sunny. "
             "Together, they encounter other pirates, bounty hunters, criminal organizations, revolutionaries, secret agents, and soldiers of the "
             "corrupt World Government, and various other friends and foes, as they sail the seas in pursuit of their dreams.")

In [6]:
inputs = tokenizer.encode("summarize: " + one_piece_sequence,
                          return_tensors='pt',
                          max_length=512,
                          truncation=True)

In [7]:
summarization_ids = model.generate(inputs, max_length=80, min_length=40, length_penalty=5., num_beams=2)

In [8]:
summarization = tokenizer.decode(summarization_ids[0])

In [9]:
summarization

'<pad> the Straw Hat Pirates befriends a pirate hunter and swordsman named Roronoa Zoro. they are joined in their journey by Nami, a money-obsessed thief and navigator. others join the crew later in the series, including Tony Tony Chopper, an anthropomorphized reinde'
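
The summary above starts with `<pad>` because cell 8 decodes without `skip_special_tokens=True` (the later sections do pass it), and it stops mid-word because generation ended at `max_length=80`. The following is a minimal sketch that wraps the encode, generate, and decode steps in one function and strips special tokens. The name `summarize` is hypothetical; `model` and `tokenizer` are assumed to be the objects created in cell 4.

```python
def summarize(model, tokenizer, text,
              max_length=80, min_length=40, length_penalty=5.0, num_beams=2):
    """Run the "summarize: " task and return the text without special tokens.

    Defaults mirror the generation settings used in cell 7.
    """
    inputs = tokenizer.encode("summarize: " + text,
                              return_tensors="pt",
                              max_length=512,
                              truncation=True)
    output_ids = model.generate(inputs,
                                max_length=max_length,
                                min_length=min_length,
                                length_penalty=length_penalty,
                                num_beams=num_beams)
    # skip_special_tokens drops markers such as <pad> and </s> from the result.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(model, tokenizer, one_piece_sequence, max_length=120)` would allow a longer summary, which may avoid the cut-off final word.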

# 2. Language Translation

In [10]:
language_sequence = ("You should definitely watch 'One Piece', it is so good, you will love the comic book")

In [11]:
input_ids = tokenizer("translate English to French: "+language_sequence, return_tensors="pt").input_ids 

In [12]:
language_ids = model.generate(input_ids)



In [13]:
language_translation = tokenizer.decode(language_ids[0],skip_special_tokens=True)

In [14]:
language_translation

"Vous devriez regarder 'One Piece', c'est si bon"

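The translation task is selected purely by the text prefix. The original T5 checkpoints were trained to translate from English into German, French, and Romanian, so other targets are unlikely to work well. A small sketch of a prompt builder that makes this explicit follows; `build_translation_prompt` and `SUPPORTED_TARGETS` are hypothetical names introduced for illustration.

```python
# Target languages covered by the original T5 translation tasks.
SUPPORTED_TARGETS = ("German", "French", "Romanian")


def build_translation_prompt(text, target="French"):
    """Return the task-prefixed input string for English-to-`target` translation."""
    if target not in SUPPORTED_TARGETS:
        raise ValueError(
            f"unsupported target {target!r}; expected one of {SUPPORTED_TARGETS}"
        )
    return f"translate English to {target}: {text}"


print(build_translation_prompt("You should definitely watch 'One Piece'."))
# translate English to French: You should definitely watch 'One Piece'.
```

The recorded translation above covers only the first part of the English sentence. This may be a result of the default generation length; passing a larger `max_new_tokens` to `model.generate` is one thing to try.
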
# 3. Text Classification: Textual Entailment

In [15]:
entailment_premise = ("I love One Piece.")
entailment_hypothesis = ("My feelings towards One Piece is filled with love")

In [16]:
input_ids = tokenizer("mnli premise: "+entailment_premise+" hypothesis: "+entailment_hypothesis, return_tensors="pt").input_ids 

In [17]:
entailment_ids = model.generate(input_ids)

In [18]:
entailment = tokenizer.decode(entailment_ids[0],skip_special_tokens=True)

In [19]:
entailment

'entailment'
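
For MNLI the model answers with one of three label strings: `entailment`, `neutral`, or `contradiction`. Because the label arrives as generated text, it can be worth validating before using it downstream. The sketch below builds the prompt in the same format as cell 16 and checks the decoded label; both function names are hypothetical.

```python
MNLI_LABELS = {"entailment", "neutral", "contradiction"}


def build_mnli_prompt(premise, hypothesis):
    """Format a premise/hypothesis pair the way cell 16 does."""
    return f"mnli premise: {premise} hypothesis: {hypothesis}"


def parse_mnli_label(decoded):
    """Normalize the decoded output and reject anything that is not a known label."""
    label = decoded.strip().lower()
    if label not in MNLI_LABELS:
        raise ValueError(f"unexpected MNLI output: {decoded!r}")
    return label


print(build_mnli_prompt("I love One Piece.", "One Piece is great."))
print(parse_mnli_label("entailment"))
```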

# 4. Linguistic Acceptability 

In [20]:
sentence = ("Luffy is a great pirate.")

In [21]:
input_ids = tokenizer("cola: "+ sentence, return_tensors="pt").input_ids 

In [22]:
sentence_ids = model.generate(input_ids)

In [23]:
acceptability = tokenizer.decode(sentence_ids[0],skip_special_tokens=True)

In [24]:
acceptability

'acceptable'

# 5. Sentence Similarity

In [25]:
stsb_sentence_1 = ("Luffy was fighting in the war.")
stsb_sentence_2 = ("Luffy's fighting style is comical.")

In [26]:
input_ids = tokenizer("stsb sentence 1: "+stsb_sentence_1+" sentence 2: "+stsb_sentence_2, return_tensors="pt").input_ids 

In [27]:
stsb_ids = model.generate(input_ids)

In [28]:
stsb = tokenizer.decode(stsb_ids[0],skip_special_tokens=True)

In [29]:
stsb

'4.0'
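
The STS-B result is a similarity score rendered as text, so it has to be converted before it can be compared or thresholded. A minimal sketch, assuming the usual 0 to 5 STS-B scale; `parse_stsb_score` is a hypothetical helper name.

```python
def parse_stsb_score(decoded, low=0.0, high=5.0):
    """Convert the decoded STS-B output to a float and check its range.

    The 0-5 bounds are an assumption based on the STS-B annotation scale.
    float() raises ValueError if the model produced non-numeric text.
    """
    score = float(decoded.strip())
    if not low <= score <= high:
        raise ValueError(f"score {score} outside [{low}, {high}]")
    return score


print(parse_stsb_score("4.0"))  # 4.0
```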