This project performs sentiment analysis of text, classifying it as positive or negative (tested on Twitter and IMDB data) using supervised learning techniques. Logistic Regression and a Naive Bayes classifier are each used separately for classification.
In order to generate feature vectors, two approaches have been used:
- A more traditional NLP technique where features are "important" words (based on certain criteria) and feature vectors are corresponding binary vectors.
- Doc2Vec technique where document vectors are learned via artificial neural networks. (ref)
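
As a rough illustration of the first approach, the sketch below builds binary feature vectors and feeds them to a small Naive Bayes classifier. It is not the project's actual code. The selection rule (keep words that occur in at least half of the training documents), the toy documents, the Bernoulli variant of Naive Bayes, and all function names (`select_features`, `to_binary_vector`, `train_bernoulli_nb`, `predict`) are assumptions made for the example.

```python
import math
from collections import Counter


def select_features(docs, min_doc_fraction=0.5):
    """Pick "important" words: those present in at least
    min_doc_fraction of the documents (a hypothetical criterion)."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    cutoff = min_doc_fraction * len(docs)
    return sorted(w for w, c in doc_freq.items() if c >= cutoff)


def to_binary_vector(tokens, features):
    """1 if the feature word occurs in the document, else 0."""
    present = set(tokens)
    return [1 if w in present else 0 for w in features]


def train_bernoulli_nb(vectors, labels):
    """Estimate log priors and per-feature probabilities per class,
    with add-one (Laplace) smoothing."""
    model = {}
    n_features = len(vectors[0])
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior = math.log(len(rows) / len(vectors))
        probs = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
            for j in range(n_features)
        ]
        model[c] = (log_prior, probs)
    return model


def predict(model, vector):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if x else 1 - p) for x, p in zip(vector, probs)
        )
    return max(model, key=score)


# Toy, made-up training data (already tokenized).
train_docs = [
    ["great", "movie", "loved", "it"],
    ["loved", "the", "great", "acting"],
    ["terrible", "movie", "hated", "it"],
    ["hated", "the", "terrible", "plot"],
]
train_labels = ["pos", "pos", "neg", "neg"]

features = select_features(train_docs)
train_vectors = [to_binary_vector(d, features) for d in train_docs]
model = train_bernoulli_nb(train_vectors, train_labels)

prediction = predict(model, to_binary_vector(["loved", "it", "great"], features))
print(features)
print(prediction)
```

The same binary vectors could be passed to a Logistic Regression model instead. Only the feature construction step would change for the Doc2Vec approach.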
Note: The project was part of coursework in 'Algorithms for Data Guided Business Intelligence' at NCSU, Raleigh. The content must not be used for illicit purposes.