Skip to content

A HackerEarth’s ML Challenge based on Detecting sentiments of image quote using unsupervised Machine Learning techniques and knowledge of OCR and NLP to build a model that analyzes sentiments of a quote and classifies them into positive, negative, or random.

Notifications You must be signed in to change notification settings

aaqibmehrban/Detecting-sentiments-of-image-quote-HackerEarth-s-ML-Challenge

 
 

Repository files navigation

Detecting-sentiments-of-image-quote (HackerEarth's ML Challenge Love_is-Love)

A HackerEarth’s ML Challenge based on Detecting sentiments of image quote using unsupervised Machine Learning techniques and knowledge of OCR and NLP to build a model that analyzes sentiments of a quote and classifies them into positive, negative, or random.

My Approach

  • Used glob to extract all the paths for the images
  • In one approach i used cv2 and pytesseract library (images needs to be preprocessed using cv2 library) to read the text from the .jpg files.
  • And in other i used easyocr library (gave better results) to read the text from the .jpg files.
  • Removed \n and contraction then applied some spelling correction and created Sentiment scores using SentimentIntensityAnalyzer & TextBlob Sentiment polarity score
  • Removed Non alphabets, Stopwords and applied lemmatization
  • Created TFIDF vectors for the text
  • Created vector represenation for each comment
  • Stacked all these features i.e. Sentiment scores, TFIDF Vectors, Comment Vectors And used these as features to cluster the comments in 3 groups using Kmeans.
  • Further by inspection detected the comment types in the cluster and mapped Random, Negative, Positive accordingly.

About

A HackerEarth’s ML Challenge based on Detecting sentiments of image quote using unsupervised Machine Learning techniques and knowledge of OCR and NLP to build a model that analyzes sentiments of a quote and classifies them into positive, negative, or random.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 99.3%
  • Python 0.7%