gpt2-paraphraser-comparisons

Finetune GPT-2 Models for paraphrasing and compare them to PEGASUS and BART

Create Datasets

Use the notebook create_dataset.ipynb to create the dataset in the file combined.txt. Each line has the form <s>S</s>>>>><p>P</p>, where S and P are paraphrases of each other. Sentence pairs are gathered from three datasets available on huggingface.co:
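The line format above can be produced and read back with a few lines of Python. This is a minimal sketch, not code taken from create_dataset.ipynb. It assumes the literal layout is "<s>" + S + "</s>" + ">>>>" + "<p>" + P + "</p>", with one pair per line. The helper names format_pair and parse_pair are hypothetical.

```python
import re

# Assumed layout of one combined.txt line (hypothetical helpers, not the repo's code):
# "<s>" + S + "</s>" + ">>>>" + "<p>" + P + "</p>"
LINE_PATTERN = re.compile(r"^<s>(.*?)</s>>>>><p>(.*)</p>$")


def format_pair(source: str, paraphrase: str) -> str:
    """Serialize one sentence pair into the assumed combined.txt line format."""
    return f"<s>{source}</s>>>>><p>{paraphrase}</p>"


def parse_pair(line: str) -> tuple[str, str]:
    """Recover (S, P) from one line; raises ValueError if the line is malformed."""
    match = LINE_PATTERN.match(line.strip())  # strip() drops the trailing newline
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return match.group(1), match.group(2)
```

For example, format_pair("How old are you?", "What is your age?") yields "<s>How old are you?</s>>>>><p>What is your age?</p>", and parse_pair on that string returns the original two sentences.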

TaPaCo (en) https://huggingface.co/datasets/tapaco
Google PAWS https://huggingface.co/datasets/paws
Quora https://huggingface.co/datasets/quora
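The three sources store paraphrases differently, so each needs its own extraction step. The sketch below is an illustrative assumption, not the notebook's actual logic. It assumes the following Hugging Face schemas: PAWS rows with sentence1, sentence2 and label, where 1 marks a paraphrase. Quora rows with questions["text"] (a two-element list) and is_duplicate. TaPaCo rows with paraphrase_set_id, paraphrase and language. Pairing every two members of a TaPaCo set is also an assumed choice.

```python
from collections import defaultdict
from itertools import combinations


def paws_pairs(rows: list[dict]) -> list[tuple[str, str]]:
    # Keep only PAWS rows labelled as paraphrases (assumed: label == 1).
    return [(r["sentence1"], r["sentence2"]) for r in rows if r["label"] == 1]


def quora_pairs(rows: list[dict]) -> list[tuple[str, str]]:
    # Keep only question pairs marked as duplicates.
    return [tuple(r["questions"]["text"]) for r in rows if r["is_duplicate"]]


def tapaco_pairs(rows: list[dict], language: str = "en") -> list[tuple[str, str]]:
    # TaPaCo groups paraphrases into sets; pair every two sentences within a set.
    groups = defaultdict(list)
    for r in rows:
        if r["language"] == language:
            groups[r["paraphrase_set_id"]].append(r["paraphrase"])
    pairs = []
    for sentences in groups.values():
        pairs.extend(combinations(sentences, 2))
    return pairs
```

With the Hugging Face datasets library, the rows could come from calls such as load_dataset("paws", "labeled_final", split="train"). The exact configurations and splits used by create_dataset.ipynb are not stated in this README.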

Finetune GPT-2 Models

Three GPT-2 models of different sizes were fine-tuned for sentence-level paraphrasing using the Trainer() API. The models are available on Hugging Face:

SRM47/gpt2-paraphraser
SRM47/gpt2-medium-paraphraser
SRM47/gpt2-large-paraphraser
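Because the models were trained on lines in the combined.txt format, one plausible way to query them is shown below. The source sentence is wrapped the same way, the prompt stops at the opening <p> tag, and the continuation is cut at the first </p>. This is a sketch under that assumption; the README does not specify the inference prompt. build_prompt and extract_paraphrase are hypothetical helpers.

```python
# Assumed inference prompt: the training line format, truncated after "<p>".
PROMPT_TEMPLATE = "<s>{sentence}</s>>>>><p>"


def build_prompt(sentence: str) -> str:
    """Wrap a source sentence in the assumed prompt format."""
    return PROMPT_TEMPLATE.format(sentence=sentence)


def extract_paraphrase(generated: str, prompt: str) -> str:
    """Drop the echoed prompt (if present) and keep text up to the first </p>."""
    continuation = generated.removeprefix(prompt)  # works with or without the prompt echoed
    return continuation.split("</p>", 1)[0].strip()
```

With the transformers library, the generated text could come from pipeline("text-generation", model="SRM47/gpt2-paraphraser"). Calling it on build_prompt(sentence) with a small max_new_tokens returns a list whose first item's "generated_text" is passed to extract_paraphrase. Because that pipeline echoes the prompt by default, extract_paraphrase strips it when present.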

Evaluate Models

To evaluate the fine-tuned GPT-2 models and the other models, such as the PEGASUS and BART baselines, use the eval_models.ipynb notebook.
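One simple way to quantify how much a paraphrase alters the lexical form of its source is shown below. It computes the Jaccard distance between the two sentences' sets of lowercased word tokens. This is an illustrative metric chosen for the sketch; it is not necessarily what eval_models.ipynb computes. The function names are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercased word tokens; punctuation is ignored.
    return set(re.findall(r"\w+", text.lower()))


def lexical_change(source: str, paraphrase: str) -> float:
    """Jaccard distance between token sets: 0.0 = same words, 1.0 = no shared words."""
    a, b = tokens(source), tokens(paraphrase)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

For instance, "the cat sat" versus "the cat ran" shares two of four distinct words, giving 0.5. A rewording such as "How old are you?" to "What is your age?" shares none and scores 1.0.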

Results Analysis

See the paper final.pdf for the results of this investigation.

Recently, large language models, particularly those in the Generative Pre-trained Transformer (GPT) series, have proven to be powerful text generation models. Models such as GPT-2 (Radford et al., 2018) show that large language models have strong zero-shot capabilities across a variety of downstream natural language processing tasks. Other models built for sequence-to-sequence modeling, such as PEGASUS and BART, have strong text summarization capabilities that can be adapted to paraphrasing. In this paper, I present an effective method for adapting GPT-2 to paraphrasing and compare its paraphrases to those of fine-tuned BART- and PEGASUS-based models from Hugging Face. Results show that GPT-2-based models produce less diverse paraphrases than PEGASUS and BART, and that GPT-2-based paraphrases do not alter lexical form as much as PEGASUS paraphrases do.
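The abstract's claim that GPT-2 outputs are "less diverse" can be made concrete with a diversity metric. One common choice is distinct-n: the number of unique n-grams divided by the total number of n-grams across a set of generated paraphrases. It is used here only as an illustrative assumption; the metrics actually reported are described in final.pdf. The function names are hypothetical.

```python
import re


def ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    # All contiguous n-grams of a token list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def distinct_n(texts: list[str], n: int = 2) -> float:
    """Unique n-grams / total n-grams over all texts (higher = more diverse)."""
    all_ngrams = []
    for text in texts:
        words = re.findall(r"\w+", text.lower())
        all_ngrams.extend(ngrams(words, n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)
```

Two identical outputs such as ["a b c", "a b c"] score 0.5 for bigrams, while two outputs with no bigrams in common score 1.0. A model that keeps repeating the same phrasing therefore scores lower.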

