<a href="https://colab.research.google.com/github/NLPiation/tutorial_notebooks/blob/main/paraphrasing/hf_T5_paraphrasing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sample Code: How Diverse Beam Search Can Improve Paraphrasing Quality

This code is supplementary material for the story published on the NLPiation Medium blog. Follow [the link](https://pub.towardsai.net/how-to-do-effective-paraphrasing-using-huggingface-and-diverse-beam-search-t5-pegasus-229ca998d229) for a detailed explanation of diverse beam search and the code below.

# Download and Load the Libraries

Start by installing the Transformers library (by Hugging Face), together with Datasets and SentencePiece, and then import the modules.

In [1]:
!pip install -q transformers
!pip install -q datasets
!pip install -q sentencepiece

# Load the Architecture and Weights

In [2]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [3]:
model = T5ForConditionalGeneration.from_pretrained('prithivida/parrot_paraphraser_on_T5')
tokenizer = T5Tokenizer.from_pretrained('prithivida/parrot_paraphraser_on_T5')

# Tokenizing the Input Sequence

In [4]:
batch = tokenizer("Natural Language Processing can improve the quality life.", return_tensors='pt')

# Paraphrasing using Beam Search

In [15]:
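generated_ids = model.generate( batch['input_ids'],
                                num_beams=5,             # keep the 5 best partial hypotheses at each step
                                temperature=1.5,         # only used when sampling (do_sample=True); no effect on plain beam search
                                no_repeat_ngram_size=2,  # never generate the same bigram twice
                                early_stopping=True,     # stop once num_beams finished candidates exist
                                length_penalty=2.0)      # values > 0 favor longer sequences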
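
To make the `no_repeat_ngram_size=2` argument more concrete, here is a minimal pure-Python sketch of the idea: a token is banned at the next step if appending it would recreate an n-gram that already occurs in the sequence. The function name `banned_next_tokens` and the word-level tokens are illustrative assumptions; the library works on token ids and its implementation differs.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would repeat an existing n-gram if generated next.

    Hypothetical helper for illustration only, not a Transformers API.
    """
    if ngram_size < 1 or len(tokens) + 1 < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens are the prefix of the n-gram being built.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    # Scan every n-gram already present in the sequence.
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# "the quality" already occurred, so "quality" may not follow "the" again.
print(banned_next_tokens(["the", "quality", "of", "the"], 2))  # {'quality'}
```

With a bigram limit, a word may still appear more than once; only an exact repeated pair of consecutive tokens is ruled out.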

In [16]:
generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

In [17]:
print( generated_sentence )

['Natural language processing can improve the quality of life.']
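
The effect of `length_penalty=2.0` can be illustrated with a small sketch. As far as the Transformers documentation describes it, a finished beam is ranked by its summed log-probability divided by its length raised to `length_penalty`; because log-probabilities are negative, a larger exponent favors longer sequences. The helper name `beam_score` and the numbers below are made up for illustration, and the exact length used by the library may vary between versions.

```python
def beam_score(sum_logprobs, length, length_penalty=1.0):
    """Length-normalized score of a finished hypothesis (illustrative sketch)."""
    return sum_logprobs / (length ** length_penalty)


# Two hypothetical candidates: a short one and a longer, less probable one.
short_raw = beam_score(-4.0, length=8, length_penalty=0.0)   # no normalization
long_raw = beam_score(-6.0, length=12, length_penalty=0.0)
short_pen = beam_score(-4.0, length=8, length_penalty=2.0)   # -4 / 8**2
long_pen = beam_score(-6.0, length=12, length_penalty=2.0)   # -6 / 12**2

print(short_raw > long_raw)  # True: without normalization the short candidate wins
print(long_pen > short_pen)  # True: with length_penalty=2.0 the long candidate wins
```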

# Paraphrasing using Diverse Beam Search

In [9]:
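generated_ids = model.generate( batch['input_ids'],
                                num_beams=5,             # must be divisible by num_beam_groups
                                num_return_sequences=5,  # return all five beams, not only the best one
                                temperature=1.5,         # only used when sampling (do_sample=True); no effect here
                                num_beam_groups=5,       # five groups of one beam each
                                diversity_penalty=2.0,   # penalty for reusing a token an earlier group picked at the same step
                                no_repeat_ngram_size=2,  # never generate the same bigram twice
                                early_stopping=True,     # stop once enough finished candidates exist
                                length_penalty=2.0)      # values > 0 favor longer sequences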
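
In the call above, the 5 beams are split into 5 groups of one beam each. At every decoding step the groups are processed in order, and a token already chosen by an earlier group at that step has its score reduced by `diversity_penalty` for each time it was chosen. The sketch below shows this Hamming-style penalty on a toy vocabulary; the function name `apply_hamming_penalty` and the scores are hypothetical, and the real implementation operates on tensors of token ids.

```python
from collections import Counter


def apply_hamming_penalty(scores, tokens_from_previous_groups, diversity_penalty):
    """Subtract diversity_penalty once for every earlier-group beam that
    already picked the token at this step (illustrative sketch)."""
    counts = Counter(tokens_from_previous_groups)
    return {tok: s - diversity_penalty * counts[tok] for tok, s in scores.items()}


# Made-up log-probabilities for the next token of the second group.
scores = {"improve": -0.2, "enhance": -1.5, "boost": -2.0}

# The first group already picked "improve" at this step.
penalized = apply_hamming_penalty(scores, ["improve"], diversity_penalty=2.0)
best = max(penalized, key=penalized.get)
print(best)  # enhance
```

Without the penalty the second group would also pick "improve"; with it, the group is pushed toward a different continuation, which is where the variety in the returned sequences comes from.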

In [10]:
generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

In [11]:
print( generated_sentence )

['Natural language processing can improve the quality of life.',
 'Natural Language Processing is a tool that improves quality of life.',
 'Natural Language Processing can improve quality of life.',
 'Nature can improve the quality of life.',
 'Natural language processing improves life.']
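
One rough way to put a number on how different these five outputs are is the average pairwise Jaccard similarity of their word sets, where lower values mean more varied wording. The metric and the helper name `jaccard` are illustrative choices on my part and were not part of the original notebook.

```python
from itertools import combinations

# The five paraphrases printed above.
paraphrases = [
    'Natural language processing can improve the quality of life.',
    'Natural Language Processing is a tool that improves quality of life.',
    'Natural Language Processing can improve quality of life.',
    'Nature can improve the quality of life.',
    'Natural language processing improves life.',
]


def jaccard(a, b):
    """Word-level Jaccard similarity, ignoring case and the final period."""
    words_a = set(a.lower().rstrip(".").split())
    words_b = set(b.lower().rstrip(".").split())
    return len(words_a & words_b) / len(words_a | words_b)


pairs = list(combinations(paraphrases, 2))
mean_similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
unique_outputs = {p.lower() for p in paraphrases}

print(f"{len(unique_outputs)} distinct outputs, {len(pairs)} pairs")
print(f"mean pairwise Jaccard similarity: {mean_similarity:.2f}")
```

A word-overlap score says nothing about fluency or whether the meaning is preserved (the fourth output, for instance, changes the subject to "Nature"), so it is best read alongside a manual check.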