
A Kaggle challenge in which we measured the similarity between two sentences using cosine similarity and a machine learning algorithm (Naïve Bayes), then compared the results on the basis of accuracy, precision, recall, and f-measure.


NavneetPrakashSingh/natural-python


Using Data Science with Natural Language Processing

Abstract

The main objective is to determine the similarity between two sentences from different aspects. Using the corpus provided by Quora, we determine the word similarity between two sentences from four different aspects. We found that accuracy alone is not enough to judge how well an approach works, so additional measures are required. Experiments show that using Naïve Bayes to determine the similarity between two sentences comes closer to human comprehension of sentence meaning, and gives higher accuracy and better overall results than cosine similarity.
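To make the cosine similarity baseline concrete, here is a minimal sketch of how two sentences could be compared. It assumes a simple bag-of-words representation with lowercase word counts; the function names (`tokenize`, `cosine_similarity`, `is_duplicate`) and the `0.8` threshold are hypothetical and are not taken from the project code, which may preprocess and weight words differently.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the word-count vectors of two sentences."""
    counts_a = Counter(tokenize(sentence_a))
    counts_b = Counter(tokenize(sentence_b))

    # Dot product over the words the two sentences share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


def is_duplicate(sentence_a, sentence_b, threshold=0.8):
    """Flag a pair as duplicate when similarity reaches the (assumed) threshold."""
    return cosine_similarity(sentence_a, sentence_b) >= threshold
```

A pair of questions that use the same words scores close to 1, and a pair with no words in common scores 0. This is also the weakness of the approach: two questions can mean the same thing while sharing few words.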

Conclusion

In this work, we use two methods for detecting duplicate questions and compare them in depth using accuracy, precision, recall, and f-measure. We found the accuracy of the Naïve Bayes classifier to be slightly higher than that of the cosine similarity approach. Looking at the confusion matrices for both approaches during our experiments led us to conclude that accuracy alone is not the best measure for this task. We then experimented with several other measures and settled on f-measure, precision, and recall, which together with accuracy give a good picture of performance in our experiment. Comparing the two approaches, we found that Naïve Bayes has significantly better recall than cosine similarity and, consequently, a higher f-measure. This led us to conclude that Naïve Bayes is much better suited to this classification task than cosine similarity.
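The measures named above can all be derived from the confusion matrix. The sketch below shows one way to compute them for binary duplicate/not-duplicate labels; the function name `classification_metrics` and the labels in the usage note are illustrative assumptions, not the project's code or results.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and f-measure for 0/1 labels.

    1 means "duplicate" (the positive class), 0 means "not duplicate".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # f-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

As a made-up illustration, take ten question pairs of which four are true duplicates, and a classifier that flags only one of them: `classification_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0])`. Accuracy is 0.7 and precision is 1.0, but recall is only 0.25 and the f-measure is 0.4. A classifier can look acceptable on accuracy while missing most duplicates, which is why recall and f-measure matter for this task.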

Links

The complete project report can be accessed here: https://github.com/NavneetPrakashSingh/natural-python/blob/master/report.pdf

The complete project code can be accessed here: https://github.com/NavneetPrakashSingh/natural-python/tree/master/code

The presentation slides can be accessed here: https://github.com/NavneetPrakashSingh/natural-python/tree/master/presentation
