# **How to paraphrase text using transformers in Python**

Chanin Nantasenamat

[Data Professor](http://youtube.com/dataprofessor)

**Notes and References:**
- PEGASUS is an acronym for Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models
- The [PEGASUS](https://huggingface.co/tuner007/pegasus_paraphrase) paraphrasing model used here is hosted on the Hugging Face Hub and loaded with the **transformers** library
- [PEGASUS model from Google Research](https://github.com/google-research/pegasus)
- Read the original paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)
- Check out the book [Transformers for Natural Language Processing](https://amzn.to/39IC6E6)


# **Load the model and tokenizer**

In [1]:
# https://huggingface.co/tuner007/pegasus_paraphrase

import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = 'tuner007/pegasus_paraphrase'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

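def get_response(input_text, num_return_sequences):
  # Tokenize the input; anything beyond 60 tokens is truncated
  # (calling the tokenizer directly replaces the deprecated prepare_seq2seq_batch)
  batch = tokenizer([input_text], truncation=True, padding='longest', max_length=60, return_tensors="pt").to(torch_device)
  # Beam search tracks 10 hypotheses and returns the top num_return_sequences;
  # temperature only takes effect when sampling (do_sample=True), which is not enabled here
  translated = model.generate(**batch, max_length=60, num_beams=10, num_return_sequences=num_return_sequences, temperature=1.5)
  tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
  return tgt_text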

---

## **Processing a single sentence**

In [2]:
text = "In this video, I will be showing you how to build a stock price web application in Python using the Streamlit and yfinance library."

In [4]:
get_response(text, 3)

['In this video, I will show you how to use the Streamlit and yfinance libraries to build a stock price web application.',
 'In this video, I will show you how to build a stock price web application in Python using the Streamlit and yfinance libraries.',
 'In this video, I will show you how to build a stock price web application using the Streamlit and yfinance libraries.']
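The three candidates above differ in how far they move from the input: the second one changes only a few words. The following is a minimal sketch, not part of the original notebook, of how the candidates could be ordered by how much they differ from the source sentence. The helper name `rank_by_novelty` is hypothetical, and character-level similarity from Python's standard `difflib` module is only a rough proxy for paraphrase quality.

```python
from difflib import SequenceMatcher

def rank_by_novelty(original, candidates):
    """Return candidates sorted from least to most similar to the original."""
    def similarity(candidate):
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones
        return SequenceMatcher(None, original.lower(), candidate.lower()).ratio()
    return sorted(candidates, key=similarity)

original = ("In this video, I will be showing you how to build a stock price "
            "web application in Python using the Streamlit and yfinance library.")

# The three outputs of get_response(text, 3) shown above
candidates = [
    "In this video, I will show you how to use the Streamlit and yfinance libraries to build a stock price web application.",
    "In this video, I will show you how to build a stock price web application in Python using the Streamlit and yfinance libraries.",
    "In this video, I will show you how to build a stock price web application using the Streamlit and yfinance libraries.",
]

ranked = rank_by_novelty(original, candidates)
for candidate in ranked:
    print(candidate)
```

The first entry of `ranked` is the candidate that departs most from the input, which may be the more useful paraphrase when the goal is to reword rather than lightly edit.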

In [None]:
get_response(text, 1)



['In this video, I will show you how to use the Streamlit and yfinance libraries to build a stock price web application.']

## **Processing a paragraph of text**

In [15]:
# Paragraph of text
context = "In this video, I will be showing you how to build a stock price web application in Python using the Streamlit and yfinance library. The app will be able to retrieve company information as well as the stock price data for S and P 500 companies. All of this in less than 50 lines of code."
print(context)

In this video, I will be showing you how to build a stock price web application in Python using the Streamlit and yfinance library. The app will be able to retrieve company information as well as the stock price data for S and P 500 companies. All of this in less than 50 lines of code.


In [13]:
def token_count(text):
    # Counts whitespace-separated words, not the subword tokens the model actually sees
    tokens = text.split()
    return len(tokens)

In [14]:
print(token_count(context))
get_response(context,2)

56


['The app will be able to retrieve company information as well as the stock price data for S and P 500 companies in less than 50 lines of code.',
 'The app will be able to retrieve company information as well as the stock price data for S and P 500 companies, all in less than 50 lines of code.']
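Both candidates above cover only the second and third sentences of the paragraph; the opening sentence is missing. One way around this, and the approach the sentence-by-sentence loop later in this notebook relies on, is to split the paragraph into sentences first. The following is a minimal sketch under the assumption that a regular-expression split is good enough; the helper name `split_into_sentences` is hypothetical, and the heuristic will mis-split abbreviations such as "e.g." and miss sentence breaks that have no space after the period.

```python
import re

def split_into_sentences(paragraph):
    """Split on whitespace that follows '.', '!' or '?' (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]

context = (
    "In this video, I will be showing you how to build a stock price web "
    "application in Python using the Streamlit and yfinance library. "
    "The app will be able to retrieve company information as well as the "
    "stock price data for S and P 500 companies. "
    "All of this in less than 50 lines of code."
)

sentences = split_into_sentences(context)
for sentence in sentences:
    # Word count per sentence, the same measure token_count() uses
    print(len(sentence.split()), sentence)
```

Each sentence can then be passed to `get_response` on its own, so no single input mixes several sentences.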

In [38]:
from transformers import pipeline

# Summarize with BART (facebook/bart-large-cnn); max_length and min_length are measured in model tokens
summarizer = pipeline("text2text-generation", model="facebook/bart-large-cnn")
print(context)
print('-'*20)
# Beam search cannot return more sequences than it has beams, so num_beams must be at least num_return_sequences
summarizer(context, max_length=100, min_length=50,length_penalty=100,num_beams=3,num_return_sequences=3)

In this video, I will be showing you how to build a stock price web application in Python using the Streamlit and yfinance library. The app will be able to retrieve company information as well as the stock price data for S and P 500 companies. All of this in less than 50 lines of code.
--------------------


In [21]:
text_2='The Administration of Union Territory Daman and Diu has revoked its order that made it compulsory for women to tie rakhis to their male colleagues on the occasion of Rakshabandhan on August 7. The administration was forced to withdraw the decision within 24 hours of issuing the circular after it received flak from employees and was slammed on social media.'

In [32]:
text_2='the interest in anchoring phenomena and phenomena in confined nematic liquid crystals has largely been driven by their potential use in liquid crystal display devices .the twisted nematic liquid crystal cell serves as an example .it consists of a nematic liquid crystal confined between two parallel walls , both providing homogeneous planar anchoring but with mutually perpendicular easy directions . in this casethe orientation of the nematic director is tuned by the application of an external electric or magnetic field .a precise control of the surface alignment extending over large areas is decisive for the functioning of such devices .most studies have focused on nematic liquid crystals in contact with laterally uniform substrates . on the other hand substrate inhomogeneitiesarise rather naturally as a result of surface treatments such as rubbing .thus the nematic texture near the surface is in fact non - uniform .this non - uniformity , however , is smeared out beyond a decay length proportional to the periodicity of the surface pattern .very often the thickness of the non - uniform surface layer is considerably smaller than both the wavelength of visible light and the thickness of the nematic cell , i.e. , the distance between the two confining parallel walls. hence optical properties of the nematic liquid crystal confined between such substrates correspond to those resulting from effective , uniform substrates .more recent developments have demonstrated that surfaces patterned with a large periodicity of some micrometers are of considerable interest from a technological point of view ( see , e.g. , ref .@xcite and references therein ) .'

In [33]:
token_count(text_2)

263

In [46]:
print(text_2)
print('-'*20)
text_test=summarizer(text_2, max_length=400, min_length=300,length_penalty=100,num_beams=2)
print(text_test)

the interest in anchoring phenomena and phenomena in confined nematic liquid crystals has largely been driven by their potential use in liquid crystal display devices .the twisted nematic liquid crystal cell serves as an example .it consists of a nematic liquid crystal confined between two parallel walls , both providing homogeneous planar anchoring but with mutually perpendicular easy directions . in this casethe orientation of the nematic director is tuned by the application of an external electric or magnetic field .a precise control of the surface alignment extending over large areas is decisive for the functioning of such devices .most studies have focused on nematic liquid crystals in contact with laterally uniform substrates . on the other hand substrate inhomogeneitiesarise rather naturally as a result of surface treatments such as rubbing .thus the nematic texture near the surface is in fact non - uniform .this non - uniformity , however , is smeared out beyond a decay length 

In [47]:
print(token_count(text_2))
print('-'*20)
for x in text_test:
    print(token_count(x['generated_text']))

263
--------------------
256


In [None]:
pipeline("text2text-generation", model="facebook/bart-large-cnn")

In [None]:
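import re

# Split the paragraph into sentences (assumption: a simple regex split on whitespace
# after '.', '!' or '?' stands in here for a dedicated sentence splitter)
sentence_list = re.split(r'(?<=[.!?])\s+', context.strip())

# Paraphrase each sentence in turn and collect the results
paraphrase = []

for i in sentence_list:
  a = get_response(i,1)
  paraphrase.append(a)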



In [None]:
# This is the paraphrased text
paraphrase

[['In this video, I will show you how to use the Streamlit and yfinance libraries to build a stock price web application.'],
 ['The stock price data for S and P 500 companies will be retrieved by the app.'],
 ['This is in less than 50 lines of code.']]

In [None]:
paraphrase2 = [' '.join(x) for x in paraphrase]
paraphrase2

['In this video, I will show you how to use the Streamlit and yfinance libraries to build a stock price web application.',
 'The stock price data for S and P 500 companies will be retrieved by the app.',
 'This is in less than 50 lines of code.']

In [None]:
# Combines the above list into a paragraph
paraphrased_text = ' '.join(paraphrase2)
paraphrased_text

'In this video, I will show you how to use the Streamlit and yfinance libraries to build a stock price web application. The stock price data for S and P 500 companies will be retrieved by the app. This is in less than 50 lines of code.'

In [None]:
# Comparison of the original (context variable) and the paraphrased version (paraphrased_text variable)

print(context)
print(paraphrased_text)

In this video, I will be showing you how to build a stock price web application in Python using the Streamlit and yfinance library. The app will be able to retrieve company information as well as the stock price data for S and P 500 companies. All of this in less than 50 lines of code.
In this video, I will show you how to use the Streamlit and yfinance libraries to build a stock price web application. The stock price data for S and P 500 companies will be retrieved by the app. This is in less than 50 lines of code.
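Reading the two paragraphs side by side shows the rewording, but it is easy to miss content that was dropped, such as "company information" in the second sentence. The following is a minimal sketch, not part of the original notebook, that lists which words disappear and which are introduced. The helper names `words` and `vocabulary_diff` are hypothetical, and because the comparison uses word sets it ignores word order and repetition.

```python
import re

def words(text):
    """Lower-cased set of the words and numbers in a text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def vocabulary_diff(original, paraphrase):
    """Return (words only in the original, words only in the paraphrase)."""
    before, after = words(original), words(paraphrase)
    return sorted(before - after), sorted(after - before)

# Third sentence of the original paragraph and its paraphrase from above
removed, added = vocabulary_diff(
    "All of this in less than 50 lines of code.",
    "This is in less than 50 lines of code.",
)
print("dropped:", removed)
print("introduced:", added)
```

Calling `vocabulary_diff(context, paraphrased_text)` applies the same check to the full paragraphs.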
