
[Natural Language Processing] Using NLTK-3 and scikit-learn to train several machine learning classifiers, then combining them with a voting system to produce more reliable sentiment analysis of Twitter feeds.


aalind0/NLP-Sentiment-Analysis-Twitter


Movie Reviews - Sentiment Analysis

Classification of tweets as positive or negative in Python 3.5, using NLTK-3 and scikit-learn.

An analysis of the Twitter dataset included in the NLTK corpus.


What is in this repo

  • An nltk.NaiveBayesClassifier trained on 1000 tweets. Implemented in Train_Classifiers.py.
  • Using scikit-learn:
    • Naive Bayes
      • MultinomialNB
      • BernoulliNB
    • Linear models
      • LogisticRegression
      • SGDClassifier
    • SVM
      • SVC
      • LinearSVC
      • NuSVC

    Implemented in Scikit_Learn_Classifiers.py.

  • A voting system that combines the predictions of all the classifiers above by majority vote, instead of relying on any single one. Implemented in sentiment_mod.py.
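A voting system of this kind can be sketched as a small wrapper that asks every trained classifier for a label and returns the most common one, together with the share of classifiers that agreed. This is a minimal sketch, not the code in sentiment_mod.py: it assumes each wrapped classifier exposes an NLTK-style `classify(featureset)` method (as `nltk.NaiveBayesClassifier` and `nltk.classify.scikitlearn.SklearnClassifier` do), and the names `VoteClassifier` and `FixedLabelClassifier` are hypothetical.

```python
from collections import Counter


class VoteClassifier:
    """Hypothetical majority-vote ensemble over already-trained classifiers.

    Each wrapped classifier only needs an NLTK-style classify(featureset) method.
    """

    def __init__(self, *classifiers):
        if not classifiers:
            raise ValueError("VoteClassifier needs at least one classifier")
        self._classifiers = classifiers

    def _tally(self, featureset):
        # Collect one vote per classifier; return (winning label, its vote count).
        votes = Counter(c.classify(featureset) for c in self._classifiers)
        return votes.most_common(1)[0]

    def classify(self, featureset):
        # The label predicted by the most classifiers wins.
        return self._tally(featureset)[0]

    def confidence(self, featureset):
        # Share of classifiers that agree with the winning label (0.0 to 1.0).
        return self._tally(featureset)[1] / len(self._classifiers)


class FixedLabelClassifier:
    """Toy stand-in for a trained classifier: always predicts the same label."""

    def __init__(self, label):
        self.label = label

    def classify(self, featureset):
        return self.label


if __name__ == "__main__":
    ensemble = VoteClassifier(
        FixedLabelClassifier("pos"),
        FixedLabelClassifier("neg"),
        FixedLabelClassifier("pos"),
    )
    print(ensemble.classify({}))    # prints pos
    print(ensemble.confidence({}))  # prints 0.6666666666666666
```

With only two labels, wrapping an odd number of classifiers avoids ties; with an even number, which tied label wins depends on the order in which `Counter` saw the votes.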

Accuracy achieved

Classifier                      Accuracy achieved
nltk.NaiveBayesClassifier       73.0%
Scikit-learn implementations:
  BernoulliNB                   72.0%
  MultinomialNB                 75.0%
  LogisticRegression            71.0%
  SGDClassifier                 69.0%
  SVC                           48.0%
  LinearSVC                     74.0%
  NuSVC                         75.0%
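The percentages above are presumably classification accuracy on a held-out test set: the fraction of test examples whose predicted label matches the true label. A minimal sketch of that computation, assuming NLTK-style classifiers with a `classify(featureset)` method and a test set of `(featureset, label)` pairs; for a non-empty test set it computes the same quantity as `nltk.classify.accuracy(classifier, gold)`. `ContainsGoodClassifier` is a hypothetical toy used only to exercise the function.

```python
def accuracy(classifier, gold):
    """Fraction of (featureset, label) pairs that `classifier` labels correctly."""
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(1 for featureset, label in gold
                  if classifier.classify(featureset) == label)
    return correct / len(gold)


class ContainsGoodClassifier:
    """Toy classifier: 'pos' if the review mentions 'good', else 'neg'."""

    def classify(self, featureset):
        return "pos" if featureset.get("contains(good)") else "neg"


if __name__ == "__main__":
    # Tiny hand-made test set: three of the four predictions are correct.
    test_set = [
        ({"contains(good)": True}, "pos"),
        ({"contains(good)": False}, "neg"),
        ({"contains(good)": True}, "neg"),
        ({"contains(good)": False}, "neg"),
    ]
    # Format as a percentage, like the table above.
    print("{:.1f}%".format(accuracy(ContainsGoodClassifier(), test_set) * 100))  # prints 75.0%
```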

Requirements

The simplest (and suggested) way to install the required packages and their dependencies is with either Anaconda or Miniconda.

After installing one of them, run:

$ conda update conda
$ conda install scikit-learn nltk

Downloading the dataset

The dataset used in this project is part of the NLTK data collection and can be downloaded through NLTK itself.

Start your Python interpreter and run:

>>> import nltk
>>> nltk.download('stopwords')
>>> nltk.download('movie_reviews') 

NOTE: System-specific installation instructions are available on the official NLTK website.
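Once downloaded, each review has to be turned into a feature set before a classifier can train on it. The sketch below shows one common bag-of-words approach, assuming "does the review contain word w" features built from the most frequent non-stopword words; it is not necessarily what this repo's scripts do, and the helper names `top_words` and `document_features` are hypothetical. With the real corpus, `documents` would come from `movie_reviews.words(fileid)` for each file in `movie_reviews.fileids(category)`, and `stopwords` from `set(stopwords.words('english'))`; pairs of `(document_features(...), category)` are the input format `nltk.NaiveBayesClassifier.train` expects.

```python
from collections import Counter


def top_words(documents, n, stopwords=frozenset()):
    """Return the n most frequent lowercase alphabetic words, skipping stopwords."""
    counts = Counter(
        word.lower()
        for doc in documents
        for word in doc
        if word.isalpha() and word.lower() not in stopwords
    )
    return [word for word, _ in counts.most_common(n)]


def document_features(document, vocabulary):
    """Map each vocabulary word to whether it occurs in the document."""
    present = {word.lower() for word in document}
    return {"contains({})".format(word): word in present for word in vocabulary}


if __name__ == "__main__":
    # Toy tokenized "reviews" standing in for movie_reviews.words(fileid).
    docs = [["A", "great", "movie"], ["a", "dull", "movie"], ["great", "great", "fun"]]
    vocab = top_words(docs, 2, stopwords={"a"})
    print(vocab)  # prints ['great', 'movie']
    print(document_features(docs[1], vocab))
    # prints {'contains(great)': False, 'contains(movie)': True}
```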

Check that everything works so far by starting the interpreter again and running these imports:

>>> import nltk
>>> from nltk.corpus import stopwords, movie_reviews
>>> import sklearn
>>> 

If these imports succeed, you are good to go!
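The same check can be scripted so that it also reports whether the corpora were actually downloaded. A minimal sketch, assuming the two corpora from the download step above; `check_environment` is a hypothetical helper. It uses `importlib.util.find_spec` to look for the packages without importing them, and `nltk.data.find`, which raises `LookupError` when a resource is missing.

```python
import importlib.util


def check_environment(corpora=("stopwords", "movie_reviews")):
    """Report which required packages and NLTK corpora are available."""
    status = {
        "nltk": importlib.util.find_spec("nltk") is not None,
        "sklearn": importlib.util.find_spec("sklearn") is not None,
    }
    for name in corpora:
        key = "corpora/" + name
        if not status["nltk"]:
            # Without NLTK there is no way to look the corpus up.
            status[key] = False
            continue
        import nltk  # imported lazily so the check still runs without NLTK
        try:
            nltk.data.find(key)
            status[key] = True
        except LookupError:
            status[key] = False
    return status


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print("{:<25} {}".format(item, "OK" if ok else "MISSING"))
```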


Running it

  1. Clone the repo:
     $ git clone https://github.com/aalind0/Movie_Reviews-Sentiment_Analysis
     $ cd Movie_Reviews-Sentiment_Analysis

  2. Run the scripts in this order:
     1. NLTK_Naive_Bayes.py
     2. Scikit_Learn_Classifiers.py
     3. Voting_Algos.py

  3. Hack away!


So

"So what, Well this is pretty basic!"

Yes, it is, but hey, we all start somewhere, right?

Coming up: I am working on a Twitter sentiment analysis project that first trains on a given dataset, then takes in live Twitter feeds, analyses them and plots the results for data visualization.

You can follow me on Twitter at @singh_aalind to keep tabs on it.


End

Hacked together by Aalind Singh.
