This repository contains various ways to calculate sentence vector similarity using NLP models

Sentence Similarity Calculator

This repo contains various ways to calculate the similarity between source and target sentences. You can choose which pre-trained model to use, such as ELMo, BERT, or Universal Sentence Encoder (USE).

You can also choose the method used to compute the similarity:

1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF
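The first five methods in the list above all operate on a pair of fixed-size sentence vectors. The following is a minimal NumPy sketch of their textbook definitions, not the code in sensim.py. The function names are illustrative, and the angular distance here uses one common convention (the angle divided by π), which may differ from what the repository implements.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.sum(np.abs(u - v)))


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return float(np.linalg.norm(u - v))


def angular_distance(u, v):
    """Angle between u and v divided by pi, in [0, 1] (assumed convention)."""
    cos = np.clip(cosine_similarity(u, v), -1.0, 1.0)  # guard against rounding
    return float(np.arccos(cos) / np.pi)


def inner_product(u, v):
    """Unnormalized dot product; sensitive to vector length."""
    return float(np.dot(u, v))


# Toy 2-d "sentence vectors" standing in for real model embeddings.
src = np.array([1.0, 0.0])
tgt = np.array([1.0, 1.0])

for fn in (cosine_similarity, manhattan_distance, euclidean_distance,
           angular_distance, inner_product):
    print(f"{fn.__name__}: {fn(src, tgt):.4f}")
```

Note that cosine similarity and inner product grow with similarity, while the three distances shrink, so the scores are not directly comparable across methods.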

You can experiment with (number of models) × (number of methods) combinations!
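Methods 7 and 8 in the list are assumed here to follow the BERTScore approach (bert-score appears in the requirements): instead of one vector per sentence, each token embedding on one side is greedily matched to its most similar token on the other side, optionally weighted by IDF. The sketch below illustrates that idea with NumPy. The function name `pairwise_cosine_f1`, its parameters, and the toy embeddings are hypothetical, and the repository's actual computation may differ.

```python
import numpy as np


def pairwise_cosine_f1(src_tokens, tgt_tokens, src_idf=None, tgt_idf=None):
    """Greedy-matching F1 over token embeddings (BERTScore-style sketch).

    src_tokens: (m, d) array of source token embeddings.
    tgt_tokens: (n, d) array of target token embeddings.
    src_idf, tgt_idf: optional per-token weights of length m and n;
    uniform weights are used when they are omitted.
    """
    # L2-normalize rows so that a dot product equals cosine similarity.
    src = src_tokens / np.linalg.norm(src_tokens, axis=1, keepdims=True)
    tgt = tgt_tokens / np.linalg.norm(tgt_tokens, axis=1, keepdims=True)
    sim = src @ tgt.T  # (m, n) matrix of token-to-token cosine similarities

    src_w = np.ones(len(src)) if src_idf is None else np.asarray(src_idf, dtype=float)
    tgt_w = np.ones(len(tgt)) if tgt_idf is None else np.asarray(tgt_idf, dtype=float)

    # Each token takes the score of its best match on the other side.
    src_side = np.sum(src_w * sim.max(axis=1)) / np.sum(src_w)
    tgt_side = np.sum(tgt_w * sim.max(axis=0)) / np.sum(tgt_w)

    # Harmonic mean of the two directions (assumes their sum is non-zero).
    return float(2 * src_side * tgt_side / (src_side + tgt_side))


# Toy token embeddings: two source tokens, one target token.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0]])

print(pairwise_cosine_f1(a, b))                      # uniform weights
print(pairwise_cosine_f1(a, b, src_idf=[3.0, 1.0]))  # first token weighted up
```

With IDF weights, a rare token that finds a good match contributes more to the score than a frequent one, which is why the second call scores higher than the first in this toy case.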


Installation

  • After cloning this repository, install all the dependencies listed in requirements.txt with pip install -r requirements.txt.
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
pip install -r requirements.txt

Usage

  • To test your own sentences, fill corpus.txt with them as shown below.
I ate an apple.
I went to the Apple.
I ate an orange.
...
  • Then choose the model and the method used to calculate the similarity between source and target sentences.
python sensim.py
    --model    MODEL_NAME
    --method   METHOD_NAME
    --verbose  LOG_OPTION (bool)

Examples

  • The following shows example results from sentence-similarity.
  • As you know, there is no silver bullet that computes a perfect similarity between sentences, so you should run experiments with your own dataset.
    • Caution: the TS-SS score may not suit short-sentence similarity tasks, since the method was originally devised to measure similarity between documents.
  • Result: (image not shown; the result figure is stored in the repository's fig directory)
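To make the TS-SS caution above more concrete, here is a sketch of the score as it is commonly stated: the product of a triangle area (TS) and a sector area (SS), where the angle between the vectors is increased by 10 degrees. This is an illustration under that assumed definition, not the code in this repository, and the function name `ts_ss` is hypothetical.

```python
import math

import numpy as np


def ts_ss(u, v):
    """TS-SS dissimilarity between vectors u and v (lower means more similar)."""
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    cos = np.clip(np.dot(u, v) / (norm_u * norm_v), -1.0, 1.0)

    # Angle between the vectors plus 10 degrees, kept in radians throughout.
    theta = math.acos(cos) + math.radians(10)

    # TS: area of the triangle spanned by u and v at angle theta.
    triangle = norm_u * norm_v * math.sin(theta) / 2

    # SS: area of a circular sector whose radius is the Euclidean distance
    # plus the magnitude difference, and whose angle is theta.
    euclidean = np.linalg.norm(u - v)
    magnitude_diff = abs(norm_u - norm_v)
    sector = 0.5 * (euclidean + magnitude_diff) ** 2 * theta

    return float(triangle * sector)


u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

print(ts_ss(u, u))  # identical vectors: the sector radius is 0, so the score is 0
print(ts_ss(u, v))  # orthogonal unit vectors
```

Unlike cosine similarity, this score is unbounded and depends on vector magnitudes as well as on the angle, so its values should only be compared within one model and one dataset.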


Requirements

  • Python 3.6 or higher is required.
  • Install PyTorch by following the official installation guide.
  • To use the spaCy model that tokenizes input sentences, download the English model by running python -m spacy download en_core_web_sm.
allennlp==0.9.0
bert-score==0.2.1
numpy==1.17.3
scikit-learn==0.21.3
scipy==1.3.1
seaborn==0.9.0
sentence-transformers==0.2.3
spacy==2.1.9
tensorflow==1.15.0
tensorflow-hub==0.7.0
torch==1.3.0

TODO

  • Upgrade TF to TF2.0 to use USE 3
  • Add pairwise cosine similarity method in use_elmo.
  • Add InferSent, Sent2Vec, plain GloVe as models.
