dallal9/ArNLPdallal

semsim

Semantic similarity for the English language

Calculates a semantic similarity score for two English text strings.

Features

  • LSTM model for calculating semantic similarity, built with the Keras library and the Quora dataset
  • Sentiment analysis
  • Negation identification
  • Keyword extraction, including multi-word noun chunks, with stop-word removal
  • Question type classification for questions that start with a question word such as what, when, or how much
  • Named entity detection using NLTK's tree2conlltags and spaCy
  • Number extraction
  • WordNet similarity, with a dedicated "mini similarity check" function for the special case where the two sentences differ by only one word
  • Text normalization and part-of-speech tagging
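
The one-word-difference special case mentioned in the feature list can be illustrated with a small sketch. This is not the library's implementation: the function name `single_word_difference` and the whitespace tokenization are assumptions made for illustration. The sketch only detects the case and returns the differing word pair, which is the point where a WordNet comparison would then be applied.

```python
from typing import Optional, Tuple


def single_word_difference(s1: str, s2: str) -> Optional[Tuple[str, str]]:
    """Return the differing word pair if the sentences differ in exactly one position.

    Hypothetical helper: lowercases both sentences, splits on whitespace,
    and compares them position by position. Returns None in all other cases.
    """
    w1 = s1.lower().split()
    w2 = s2.lower().split()
    if len(w1) != len(w2):
        return None
    diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
    return diffs[0] if len(diffs) == 1 else None


pair = single_word_difference("how much does the shirt cost",
                              "how much does the shirt weigh")
print(pair)  # ('cost', 'weigh')
```

A position-by-position comparison keeps the sketch short; it would miss pairs where a word is inserted or reordered, which a real check may need to handle.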

Input

Two English text strings

Output

Dictionary containing:

  • Flag indicating whether the two sentences are similar (sim)
  • Total similarity score (sim_per)
  • Keywords similarity score (keywords_sim)
  • Semantic similarity score (keras)
  • Keywords (keywords)
  • Maximum number of keywords (max_keywords)
  • Named entities (entities)
  • Sentiment scores (sentiment)
  • Numbers (numbers)
  • Question types (class) and their flag (f_class)
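
A short sketch of how the returned dictionary might be consumed. The helper `summarize` is hypothetical and not part of semsim; it uses only key names listed above, and the sample values are copied from the first example output in this README.

```python
def summarize(result: dict) -> str:
    """Format the main fields of a result dictionary as a one-line summary."""
    verdict = "similar" if result["sim"] else "not similar"
    return (
        f"{verdict}: total={result['sim_per']:.1f}, "
        f"keywords={result['keywords_sim']:.1f}, "
        f"keras={result['keras']:.1f}"
    )


# Values copied from the first example output in this README.
sample = {
    "sim": 1,
    "sim_per": 86.779053572804344,
    "keywords_sim": 75.0,
    "keras": 98.558107145608687,
}
print(summarize(sample))  # similar: total=86.8, keywords=75.0, keras=98.6
```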

Example:

    from semsim import Semsim

    # Create the model once, then compare pairs of sentences.
    model = Semsim()

    q1 = "what is the cost of the shirt"
    q2 = "how much does the shirt cost"

    model.similar(q1, q2)

Output

  • similar
    {'numbers': [[], []], 'keywords': [['cost', 'shirt'], ['cost', 'shirt']], 'max_keywords': 4,
     'f_class': True, 'entities': [[], []], 'sentiment': [0.0, 0.0, 0.0], 'sim': 1, 'class': [5, 5],
     'keras': 98.558107145608687, 'sim_per': 86.779053572804344, 'keywords_sim': 75.0}

    q1 = "what is the cost of the shirt"
    q2 = "how much does the shirt weigh"

    model.similar(q1, q2)

Output

  • not similar
    {'keywords': [['shirt', 'cost'], ['shirt', 'weigh']], 'sim_per': 45.730595498683286,
     'max_keywords': 5, 'keywords_sim': 40.0, 'f_class': True, 'class': [5, 5],
     'sentiment': [0.0, 0.0, 0.0], 'numbers': [[], []], 'entities': [[], []],
     'keras': 51.461190997366565, 'sim': 0}
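
In both example outputs, sim_per equals the arithmetic mean of keras and keywords_sim. The check below verifies this for the two printed results only. Whether semsim always combines the scores this way is an assumption, not something this README states, and `mean_score` is a hypothetical helper.

```python
import math

# (keras, keywords_sim, sim_per) copied from the two example outputs.
examples = [
    (98.558107145608687, 75.0, 86.779053572804344),
    (51.461190997366565, 40.0, 45.730595498683286),
]


def mean_score(keras: float, keywords_sim: float) -> float:
    """Arithmetic mean of the two component scores (assumed combination rule)."""
    return (keras + keywords_sim) / 2


matches = [
    math.isclose(mean_score(keras, keywords_sim), sim_per, rel_tol=1e-9)
    for keras, keywords_sim, sim_per in examples
]
print(matches)  # [True, True]
```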
