In [4]:
!pip install sentence-splitter --quiet
!pip install transformers --quiet
!pip install SentencePiece --quiet

[K     |████████████████████████████████| 44 kB 3.4 MB/s 
[K     |████████████████████████████████| 4.9 MB 16.1 MB/s 
[K     |████████████████████████████████| 6.6 MB 49.5 MB/s 
[K     |████████████████████████████████| 120 kB 72.6 MB/s 
[K     |████████████████████████████████| 1.3 MB 29.4 MB/s 
[?25h

In [84]:
#importing the PEGASUS Transformer model
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
 
model_name = 'tuner007/pegasus_paraphrase'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
 
#setting up the model
def get_response(input_text,num_return_sequences):
  batch = tokenizer.prepare_seq2seq_batch([input_text],truncation=True,padding='longest',max_length=60, return_tensors="pt").to(torch_device)
  translated = model.generate(**batch,max_length=60,num_beams=20, num_return_sequences=num_return_sequences, temperature=1.5)
  tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
  return tgt_text

In [6]:
#test input sentence
text = "I will be showing you how to build a web application in Python using the SweetViz and its dependent library."

#printing response
get_response(text, 5)

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and targets.

Here is a short example:

model_inputs = tokenizer(src_texts, text_target=tgt_texts, ...)

If you either need to use different keyword arguments for the source and target texts, you should do two calls like
this:

model_inputs = tokenizer(src_texts, ...)
labels = tokenizer(text_target=tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



['I will show you how to use the SweetViz and its dependent library to build a web application.',
 'I will show you how to use the SweetViz library to build a web application.',
 'I will show you how to build a web application using the SweetViz and its dependent library.',
 'I will show you how to use the SweetViz and its dependent library to build a web application in Python.',
 'I will show you how to build a web application in Python using the SweetViz library.']

In [7]:
context = 'The Blue Whales just played their first baseball game of the new season; I believe there is much to be excited about. Although they lost, it was against an excellent team that had won the championship last year. The Blue Whales fell behind early but showed excellent teamwork and came back to tie the game. The team had 15 hits and scored 8 runs. That’s excellent! Unfortunately, they had 5 fielding errors, which kept the other team in the lead the entire game. The game ended with the umpire making a bad call, and if the call had gone the other way, the Blue Whales might have actually won the game. It wasn’t a victory, but I say the Blue Whales look like they have a shot at the championship, especially if they continue to improve.'

In [8]:
print(context)

The Blue Whales just played their first baseball game of the new season; I believe there is much to be excited about. Although they lost, it was against an excellent team that had won the championship last year. The Blue Whales fell behind early but showed excellent teamwork and came back to tie the game. The team had 15 hits and scored 8 runs. That’s excellent! Unfortunately, they had 5 fielding errors, which kept the other team in the lead the entire game. The game ended with the umpire making a bad call, and if the call had gone the other way, the Blue Whales might have actually won the game. It wasn’t a victory, but I say the Blue Whales look like they have a shot at the championship, especially if they continue to improve.


In [9]:
#Takes the input paragraph and splits it into a list of sentences
from sentence_splitter import SentenceSplitter, split_text_into_sentences
 
splitter = SentenceSplitter(language='en')
 
sentence_list = splitter.split(context)
sentence_list

['The Blue Whales just played their first baseball game of the new season; I believe there is much to be excited about.',
 'Although they lost, it was against an excellent team that had won the championship last year.',
 'The Blue Whales fell behind early but showed excellent teamwork and came back to tie the game.',
 'The team had 15 hits and scored 8 runs.',
 'That’s excellent!',
 'Unfortunately, they had 5 fielding errors, which kept the other team in the lead the entire game.',
 'The game ended with the umpire making a bad call, and if the call had gone the other way, the Blue Whales might have actually won the game.',
 'It wasn’t a victory, but I say the Blue Whales look like they have a shot at the championship, especially if they continue to improve.']

In [10]:
paraphrase = []
 
for i in sentence_list:
  a = get_response(i,1)
  paraphrase.append(a)

# Generating the paraphrased text
paraphrase

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and targets.

Here is a short example:

model_inputs = tokenizer(src_texts, text_target=tgt_texts, ...)

If you either need to use different keyword arguments for the source and target texts, you should do two calls like
this:

model_inputs = tokenizer(src_texts, ...)
labels = tokenizer(text_target=tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



[['The first baseball game of the new season was played by the Blue Whales.'],
 ['They lost, but it was against a team that won the title last year.'],
 ['The Blue Whales came back to tie the game after falling behind.'],
 ['The team scored 8 runs.'],
 ['That is excellent!'],
 ['They had 5 fielding errors that kept the other team in the lead.'],
 ['The umpire made a bad call at the end of the game and the Blue Whales could have won the game.'],
 ['I think the Blue Whales have a chance at the championship if they continue to improve.']]

In [11]:
#creating the second split
paraphrase2 = [' '.join(x) for x in paraphrase]
paraphrase2

['The first baseball game of the new season was played by the Blue Whales.',
 'They lost, but it was against a team that won the title last year.',
 'The Blue Whales came back to tie the game after falling behind.',
 'The team scored 8 runs.',
 'That is excellent!',
 'They had 5 fielding errors that kept the other team in the lead.',
 'The umpire made a bad call at the end of the game and the Blue Whales could have won the game.',
 'I think the Blue Whales have a chance at the championship if they continue to improve.']

In [12]:
# Combine the above splitted lists into a paragraph
paraphrase3 = [' '.join(x for x in paraphrase2) ]
paraphrased_text = str(paraphrase3).strip('[]').strip("'")
paraphrased_text

'The first baseball game of the new season was played by the Blue Whales. They lost, but it was against a team that won the title last year. The Blue Whales came back to tie the game after falling behind. The team scored 8 runs. That is excellent! They had 5 fielding errors that kept the other team in the lead. The umpire made a bad call at the end of the game and the Blue Whales could have won the game. I think the Blue Whales have a chance at the championship if they continue to improve.'

In [13]:
# Comparison of the original (context variable) and the paraphrased version (paraphrase3 variable)
 
print(context)
print(paraphrased_text)

The Blue Whales just played their first baseball game of the new season; I believe there is much to be excited about. Although they lost, it was against an excellent team that had won the championship last year. The Blue Whales fell behind early but showed excellent teamwork and came back to tie the game. The team had 15 hits and scored 8 runs. That’s excellent! Unfortunately, they had 5 fielding errors, which kept the other team in the lead the entire game. The game ended with the umpire making a bad call, and if the call had gone the other way, the Blue Whales might have actually won the game. It wasn’t a victory, but I say the Blue Whales look like they have a shot at the championship, especially if they continue to improve.
The first baseball game of the new season was played by the Blue Whales. They lost, but it was against a team that won the title last year. The Blue Whales came back to tie the game after falling behind. The team scored 8 runs. That is excellent! They had 5 fiel

In [14]:
import pandas as pd

In [16]:
all_texts = pd.read_csv('/content/df_all_texts.csv')
all_texts.head()


Unnamed: 0,utterance,response
0,"Well yeah, the relationship... O.K. I have to...","I kinda missed that, you don’t operate like t..."
1,Ah-ha.,I just kind of....
2,Um-um.,"Missed the point Yeah, OK."
3,"You know, I guess I feel I need to clarify wh...",Ah-ha.
4,"A good girl,"" or something I feel like, Oh go...",(short laugh).


In [17]:
all_texts_test = all_texts[:5]
all_texts_test.values.tolist()

[[' Well yeah, the relationship... O.K. I have to clarify that we have only known each other for two weeks, O.K. And its probably premature, but we spent a whole lot of time with each other in the past week and a half and I was basically on cloud nine about the whole thing. And we were talking about going camping together which just sounded like a great idea. And he seemed to really be enjoying me and I was enjoying him and I was kind of wondering gee when is the honeymoon going to end. But... feeling like I was ready to enjoy it as long as it lasted And then he reported to me that he had talked to his mom and dad and they had inquired as to whether he had spent the night at my place or not and he told them that he had slept on the sofa which indeed he did. But then he was feeling like he couldn’t do that any more. And I felt hurt a little bit at that. And I’m not sure . .. I’m not sure, . .. I’m a little angry, where my anger is directed at. I’m a little confused . . as to whether it’

In [18]:
all_texts_test.shape

(5, 2)

In [20]:
#Testing the paraphaser on our own data


from sentence_splitter import SentenceSplitter, split_text_into_sentences
 
splitter = SentenceSplitter(language='en')
 
sentence_list = splitter.split(str(all_texts_test.values))
sentence_list

["[[' Well yeah, the relationship...",
 'O.K. I have to clarify that we have only known each other for two weeks, O.K. And its probably premature, but we spent a whole lot of time with each other in the past week and a half and I was basically on cloud nine about the whole thing.',
 'And we were talking about going camping together which just sounded like a great idea.',
 'And he seemed to really be enjoying me and I was enjoying him and I was kind of wondering gee when is the honeymoon going to end.',
 'But... feeling like I was ready to enjoy it as long as it lasted And then he reported to me that he had talked to his mom and dad and they had inquired as to whether he had spent the night at my place or not and he told them that he had slept on the sofa which indeed he did.',
 'But then he was feeling like he couldn’t do that any more.',
 'And I felt hurt a little bit at that.',
 'And I’m not sure . ..',
 'I’m not sure, . ..',
 'I’m a little angry, where my anger is directed at.',
 'I

In [77]:
sample1 = ['I do not blame you for any of the problems we had when we were trying to navigate our way through my diagnoses. You guys learned all you could in a time before the Internet had the answers and before self-helps books were readily available. You were not bad parents just because you could not fix what was going on in me. You got me help, again, and again, and again and it’s OK it took more than one try to find the right person to help me because along the way I had two people who didn’t give up.']

In [85]:
#paraphrase test of mock clinical questions

paraphrase = []
 
for i in sample1:
  a = get_response(i,20)
  paraphrase.append(a)

# Generating the paraphrased text
paraphrase

[["I don't think you should be blamed for the problems we had when we were trying to navigate our way through my diagnoses.",
  "I don't blame you for what happened when we were trying to navigate our way through my diagnoses.",
  "I don't think you should be blamed for the things we had when we were trying to navigate our way through my diagnoses.",
  "I don't think you should be blamed for any of the problems we had when we were trying to navigate our way through my diagnoses.",
  "I don't think you should be blamed for the issues we had when we were trying to navigate our way through my diagnoses.",
  "I don't think you should be blamed for the problems we had when we were trying to understand my diagnoses.",
  "I don't think you should be blamed for the difficulties we had when we were trying to navigate our way through my diagnoses.",
  "I don't think you should be blamed for the problems we had when we were trying to get through my diagnoses.",
  "I don't think you should be blam