Semantic Paraphrase Detection

By Pranav Goyal and Saransh Rajput

Task

Given two sentences, the task is to classify whether they are paraphrases of each other, which makes this a binary classification problem. The task is closely related to semantic textual similarity (STS), which instead measures the degree of similarity between two sentences.
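To make the contrast with semantic textual similarity concrete, here is a minimal sketch that turns a graded similarity score into a binary paraphrase label. It is not one of the models below: the bag-of-words cosine and the 0.5 threshold are illustrative assumptions, and `cosine_bow` / `is_paraphrase` are hypothetical helper names.

```python
import math
from collections import Counter


def cosine_bow(s1: str, s2: str) -> float:
    """Graded (STS-style) score: cosine of lower-cased, whitespace-tokenized word counts."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def is_paraphrase(s1: str, s2: str, threshold: float = 0.5) -> int:
    """Binary (paraphrase-detection-style) label: 1 if the score clears a hypothetical threshold."""
    return int(cosine_bow(s1, s2) >= threshold)
```

For example, "How do I learn Python?" and "How can I learn Python?" share four of five tokens, giving a score of 0.8 and the label 1. The learned models below replace this hand-crafted score with trained encoders and matching layers.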

Datasets

Quora Question Pairs (QQP) - 404300 sentence pairs
Microsoft Research Paraphrase Corpus (MSRP) - 5800 sentence pairs
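The README does not describe the file layout in the `datasets` folder. The sketch below assumes the commonly distributed formats: the Kaggle QQP CSV (`id,qid1,qid2,question1,question2,is_duplicate`) and the tab-separated MSRP release (`Quality`, `#1 ID`, `#2 ID`, `#1 String`, `#2 String`). Both parsers (`load_qqp`, `load_msrp`) are hypothetical names that return `(sentence1, sentence2, label)` triples.

```python
import csv
import io
from typing import List, Tuple

Pair = Tuple[str, str, int]  # (sentence1, sentence2, label), label 1 = paraphrase


def load_qqp(text: str) -> List[Pair]:
    """Parse QQP in the assumed Kaggle CSV layout."""
    reader = csv.DictReader(io.StringIO(text))
    return [(r["question1"], r["question2"], int(r["is_duplicate"])) for r in reader]


def load_msrp(text: str) -> List[Pair]:
    """Parse MSRP in the assumed tab-separated layout (label in the 'Quality' column)."""
    # QUOTE_NONE: MSRP sentences contain raw double quotes that are not CSV quoting.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    return [(row[3], row[4], int(row[0])) for row in reader]
```

Disabling quote handling for MSRP matters: with the default dialect, a sentence that begins with `"` would be read as a quoted field and could swallow the following columns.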

Models Implemented

Saved Model Checkpoints

1. Bilateral Multi-Perspective Matching (BiMPM)

Details:

Uses pretrained GloVe embeddings (6B)
Bidirectional LSTM to encode the sentences
Three distinct multi-perspective matching mechanisms to measure the relations across sentences
A final LSTM to generate a fixed-size vector and a feed-forward network to predict the output
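The building block shared by BiMPM's matching mechanisms is the multi-perspective cosine function from Wang et al. (2017): each of `l` trainable weight vectors reweights the two hidden states before a cosine is taken, so one comparison yields `l` similarity values. The NumPy sketch below shows that function plus the paper's full-matching strategy for the forward direction (each time step of P against the last state of Q). The function names are hypothetical, and this is not the repository's own code.

```python
import numpy as np


def multi_perspective_cosine(v1: np.ndarray, v2: np.ndarray, W: np.ndarray,
                             eps: float = 1e-8) -> np.ndarray:
    """m_k = cos(W_k * v1, W_k * v2) for each perspective k.

    v1, v2: hidden states of shape (d,); W: weights of shape (l, d), one row per perspective.
    Returns an array of shape (l,).
    """
    a = W * v1  # (l, d): v1 reweighted by every perspective
    b = W * v2  # (l, d): v2 reweighted by every perspective
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def full_matching(P: np.ndarray, q_last: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Compare every time step of P (shape (T, d)) with Q's last forward state; returns (T, l)."""
    return np.stack([multi_perspective_cosine(p, q_last, W) for p in P])
```

In the full model these matching vectors, computed in both directions and for both sentences, are what the final LSTM aggregates into a fixed-size vector.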

2. Multiway Attention

Details:

GRU for contextual representations.
Four attention mechanisms to compute matching representations of each sentence against the other sentence.
The matching vectors are further aggregated using a GRU, and attention conditioned on the inputs is applied to make the final prediction.
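The README does not name the four attention mechanisms. Assuming they follow the Multiway Attention Network of Tan et al. (2018), which scores word pairs with concat, bilinear, dot (elementwise product), and minus (difference) functions, a NumPy sketch of the matching step could look like this. The parameter keys (`Wc1`, `vc`, and so on) and the function name are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def multiway_attention(Hq: np.ndarray, Hp: np.ndarray, prm: dict) -> dict:
    """Attend from every word of P to all words of Q with four scoring functions.

    Hq: (n, d) contextual (e.g. GRU) states of Q; Hp: (m, d) states of P.
    Returns {name: (m, d)}: for each word of P, an attention-weighted sum of Q's states.
    """
    q = Hq[None, :, :]  # (1, n, d)
    p = Hp[:, None, :]  # (m, 1, d)
    scores = {
        # concat: v_c^T tanh(W_c1 h_q + W_c2 h_p)
        "concat": np.tanh(q @ prm["Wc1"].T + p @ prm["Wc2"].T) @ prm["vc"],
        # bilinear: h_q^T W_b h_p
        "bilinear": Hp @ prm["Wb"].T @ Hq.T,
        # dot: v_d^T tanh(W_d (h_q * h_p)), with * the elementwise product
        "dot": np.tanh((q * p) @ prm["Wd"].T) @ prm["vd"],
        # minus: v_m^T tanh(W_m (h_q - h_p))
        "minus": np.tanh((q - p) @ prm["Wm"].T) @ prm["vm"],
    }
    # Each score matrix is (m, n); normalize over Q's words, then pool Q's states.
    return {name: softmax(s, axis=1) @ Hq for name, s in scores.items()}
```

Here `Wc1`, `Wc2`, `Wd`, `Wm` have shape `(k, d)`, `Wb` has shape `(d, d)`, and the `v` vectors have shape `(k,)`. Each of the four `(m, d)` outputs is a separate matching representation that the aggregation GRU can consume.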

3. Fine-tuning BERT-Base

Details:

Fine-tuned BERT-Base model (12 stacked Transformer layers with a hidden dimension of 768) with a dense binary classification layer on top.
Original pretrained model: bert_en_uncased_L-12_H-768_A-12_3
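For sentence-pair classification, BERT receives both sentences in a single sequence, `[CLS] A [SEP] B [SEP]`, together with an attention mask and segment (type) ids that mark which tokens belong to the second sentence. The sketch below packs already-tokenized sentences into those three fixed-length lists, trimming the longer sentence first when the pair is too long. It uses a toy vocabulary instead of the real WordPiece tokenizer, and `pack_pair` / `truncate_pair` are hypothetical names.

```python
from typing import Dict, List, Tuple


def truncate_pair(a: List[str], b: List[str], max_tokens: int) -> None:
    """Drop tokens from the end of the longer sequence (b on ties) until both fit, in place."""
    while len(a) + len(b) > max_tokens:
        (a if len(a) > len(b) else b).pop()


def pack_pair(tokens_a: List[str], tokens_b: List[str], vocab: Dict[str, int],
              max_len: int = 16) -> Tuple[List[int], List[int], List[int]]:
    """Return (word ids, attention mask, type ids), each padded to max_len."""
    a, b = list(tokens_a), list(tokens_b)
    truncate_pair(a, b, max_len - 3)  # reserve room for [CLS] and two [SEP]
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)  # segment 0 = sentence A, 1 = sentence B
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * pad, mask + [0] * pad, type_ids + [0] * pad
```

The dense classification layer then typically reads the pooled `[CLS]` representation, so every pair is classified from one 768-dimensional vector.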

Results

No.	Model	QQP Validation Accuracy	MSRP Validation Accuracy
1	BiMPM	85.28%	72.52%
2	Multiway Attention	83.47%	72.56%
3	BERT Fine-tuning	90.10%	84.12%
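The README does not state how validation accuracy was computed. As a minimal sketch, assuming each model outputs one paraphrase probability per pair that is thresholded at 0.5 (an assumption, not taken from the report), the percentage in the table would be obtained as follows; `accuracy` is a hypothetical helper.

```python
from typing import Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Percentage of pairs whose thresholded prediction equals the gold label (1 = paraphrase)."""
    if len(probs) != len(labels) or len(labels) == 0:
        raise ValueError("probs and labels must be non-empty and of equal length")
    correct = sum(int(p >= threshold) == y for p, y in zip(probs, labels))
    return 100.0 * correct / len(labels)
```

For instance, predictions `[0.9, 0.2, 0.6, 0.4]` against labels `[1, 0, 0, 0]` get three of four pairs right, i.e. 75.0.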

Report

Project_Report.pdf

Additional Links

Project Presentation

Repository Link

References

"Fine-Tuning a BERT Model." TensorFlow Core, www.tensorflow.org/official_models/fine_tuning_bert.
Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805 (2018).
Wang, Zhiguo, Wael Hamza, and Radu Florian. "Bilateral Multi-Perspective Matching for Natural Language Sentences." arXiv preprint arXiv:1702.03814 (2017).
