In this project, I explore implementing a Siamese network with cosine similarity to detect similar question pairs in the Kaggle Quora dataset. The raw text is preprocessed with NLTK, and the word-embedding weights come from GloVe-wiki-Gigaword-300. The full process is documented in the 'Quora Question Pair Siamese Network.ipynb' notebook.
The model achieves 82.78% accuracy on the test set and 89.73% accuracy on the training set at the 4th epoch. Training for more epochs does not improve prediction accuracy any further.
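
To illustrate the idea behind a Siamese setup with cosine similarity, here is a minimal sketch in plain Python. It is not the notebook's model. The tiny `TOY_EMBEDDINGS` table stands in for the 300-dimensional GloVe vectors, whitespace splitting stands in for the NLTK preprocessing, mean pooling stands in for the trained encoder, and the `threshold` value is an arbitrary assumption. What it does show is the key property: both questions pass through the same encoder, and the two resulting vectors are compared by cosine similarity.

```python
import math

# Hypothetical 3-dimensional vectors standing in for the GloVe embeddings.
TOY_EMBEDDINGS = {
    "how": [0.1, 0.3, 0.5],
    "do": [0.2, 0.1, 0.4],
    "i": [0.0, 0.2, 0.1],
    "learn": [0.7, 0.1, 0.2],
    "study": [0.6, 0.2, 0.2],
    "python": [0.9, 0.8, 0.1],
    "cook": [0.1, 0.9, 0.7],
    "rice": [0.0, 0.8, 0.9],
}


def tokenize(question):
    """Simplified stand-in for the preprocessing step: lowercase and split."""
    return question.lower().split()


def encode(tokens, embeddings):
    """Shared encoder (assumed here to be mean pooling over known tokens)."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    # Average each dimension across all token vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def question_similarity(q1, q2, embeddings):
    """Encode both questions with the SAME encoder, then compare them."""
    v1 = encode(tokenize(q1), embeddings)
    v2 = encode(tokenize(q2), embeddings)
    return cosine_similarity(v1, v2)


def is_duplicate(q1, q2, embeddings, threshold=0.8):
    """Label a pair as similar when the score reaches the assumed threshold."""
    return question_similarity(q1, q2, embeddings) >= threshold


if __name__ == "__main__":
    pairs = [
        ("How do I learn Python", "How do I study Python"),
        ("How do I learn Python", "How do I cook rice"),
    ]
    for q1, q2 in pairs:
        score = question_similarity(q1, q2, TOY_EMBEDDINGS)
        print(f"{q1!r} vs {q2!r}: {score:.3f}")
```

In a trained Siamese network the mean-pooling encoder would be replaced by learned layers whose weights are shared between the two branches, and the decision threshold would be tuned on held-out data rather than fixed by hand.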