Bayesian and Markovian sentiment classification for movie reviews

SamDaQueen/Bayesian-Sentiment-Analysis

Bayesian-Sentiment-Analysis

Created as part of the course CS5100 (Foundations of Artificial Intelligence) at Northeastern University.

Apart from the 'naive_bayes_classify()' and 'markov_model_classify()' functions, the base code in 'bt.py' was provided by the professor of the course.


Sentiment analysis of movie reviews

The model is trained on data from www.kaggle.com/c/sentiment-analysis-on-movie-reviews. It predicts the most likely sentiment of a movie review sentence using a unigram model (Naive Bayes) and a bigram model (Markov model).
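To make the unigram idea concrete, here is a minimal sketch of a Naive Bayes classifier. It is not the code from this repository: the names `TRAIN`, `train_unigram` and `naive_bayes_sketch` are hypothetical, the training pairs are toy data in the style of the sample input, and the sketch assumes plain relative-frequency estimates with no smoothing, so a word never seen in a class rules that class out.

```python
import math
from collections import Counter, defaultdict

# Toy (phrase, sentiment) pairs, assumed for illustration only.
TRAIN = [
    ("happy happy joy joy", 4),
    ("happy meh joy meh", 3),
    ("meh meh meh meh", 2),
    ("blah meh blah meh", 1),
    ("blah ugh blah ugh", 0),
]


def train_unigram(examples):
    """Count training phrases per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for phrase, label in examples:
        class_counts[label] += 1
        word_counts[label].update(phrase.split())
    return class_counts, word_counts


def naive_bayes_sketch(sentence, class_counts, word_counts):
    """Return (best_class, log_probability) under an unsmoothed unigram model."""
    total = sum(class_counts.values())
    best = (None, -math.inf)
    # Ties keep the first class visited, i.e. the highest label.
    for label in sorted(class_counts, reverse=True):
        counts = word_counts[label]
        n_words = sum(counts.values())
        log_p = math.log(class_counts[label] / total)  # log prior
        for word in sentence.split():
            if counts[word] == 0:
                log_p = -math.inf  # assumption: no smoothing for unseen words
                break
            log_p += math.log(counts[word] / n_words)
        if log_p > best[1]:
            best = (label, log_p)
    return best


model = train_unigram(TRAIN)
print(naive_bayes_sketch("happy happy happy", *model))
```

On this toy data the prior for every class is 1/5 and 'happy' makes up half of the words in class 4, so "happy happy happy" scores log(0.2 * 0.5 * 0.5 * 0.5) = log(0.025), roughly -3.689, for class 4.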


Input

Input file pattern

1 1 happy happy joy joy 4
2 2 happy meh joy meh 3
3 3 meh meh meh meh 2
4 4 blah meh blah meh 1
5 5 blah ugh blah ugh 0
---
happy happy happy
happy meh

Explanation

Each line before '---' is a training sentence with the fields 'PhraseId', 'SentenceId', 'Phrase' and 'Sentiment', in that order. Each line after '---' is a sentence to be classified.
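The layout described above could be read with a small parser along these lines. This is a sketch under assumptions, not the parsing code shipped with the course: `parse_input` and `SAMPLE` are hypothetical names, fields are assumed to be separated by whitespace (spaces or tabs), and a header row, if a data file has one, is not handled.

```python
SAMPLE = """\
1 1 happy happy joy joy 4
2 2 happy meh joy meh 3
3 3 meh meh meh meh 2
4 4 blah meh blah meh 1
5 5 blah ugh blah ugh 0
---
happy happy happy
happy meh
"""


def parse_input(text):
    """Split input text into training examples and sentences to classify."""
    training, to_classify = [], []
    seen_separator = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line == "---":
            seen_separator = True
        elif seen_separator:
            to_classify.append(line)
        else:
            fields = line.split()
            # First two fields are the ids, the last is the sentiment,
            # and everything in between is the phrase.
            phrase_id, sentence_id = int(fields[0]), int(fields[1])
            sentiment = int(fields[-1])
            phrase = " ".join(fields[2:-1])
            training.append((phrase_id, sentence_id, phrase, sentiment))
    return training, to_classify


training, to_classify = parse_input(SAMPLE)
print(len(training), to_classify)
```

Taking the phrase as "everything between the ids and the final field" avoids having to know how many words a phrase contains.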


Output

Output file pattern

4
-3.6888794541139363
4
-3.6888794541139363
3
-3.6888794541139363
3
-2.995732273553991

Explanation

For each sentence to be classified, the output contains the predicted sentiment class and its log probability from the Bayesian model, followed by the same two values from the Markovian model. Hence, there are 4 lines for each sentence.
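The Markovian lines of the sample output can be reproduced with a simple bigram sketch. The code below is an illustration rather than the repository's 'markov_model_classify()': the names `TRAIN`, `train_bigram` and `markov_sketch` are hypothetical, and it assumes unsmoothed estimates in which the first word uses its unigram probability and each later word uses count(previous, current) / count(previous) within the class.

```python
import math
from collections import Counter, defaultdict

# (phrase, sentiment) pairs taken from the sample input above.
TRAIN = [
    ("happy happy joy joy", 4),
    ("happy meh joy meh", 3),
    ("meh meh meh meh", 2),
    ("blah meh blah meh", 1),
    ("blah ugh blah ugh", 0),
]


def train_bigram(examples):
    """Collect per-class phrase counts, word counts and adjacent-pair counts."""
    class_counts = Counter()
    unigrams = defaultdict(Counter)
    bigrams = defaultdict(Counter)
    for phrase, label in examples:
        words = phrase.split()
        class_counts[label] += 1
        unigrams[label].update(words)
        bigrams[label].update(zip(words, words[1:]))
    return class_counts, unigrams, bigrams


def markov_sketch(sentence, class_counts, unigrams, bigrams):
    """Return (best_class, log_probability); the sentence must be non-empty."""
    words = sentence.split()
    total = sum(class_counts.values())
    best = (None, -math.inf)
    for label in sorted(class_counts, reverse=True):
        uni, bi = unigrams[label], bigrams[label]
        # Class prior, then the unigram probability of the first word.
        probs = [class_counts[label] / total, uni[words[0]] / sum(uni.values())]
        for prev, cur in zip(words, words[1:]):
            probs.append(bi[(prev, cur)] / uni[prev] if uni[prev] else 0.0)
        # Assumption: any zero factor rules the class out (no smoothing).
        log_p = sum(math.log(p) for p in probs) if all(probs) else -math.inf
        if log_p > best[1]:
            best = (label, log_p)
    return best


model = train_bigram(TRAIN)
for sentence in ("happy happy happy", "happy meh"):
    print(sentence, markov_sketch(sentence, *model))
```

Under these assumptions "happy meh" gets class 3 with log(0.2 * 0.25 * 1) = log(0.05), roughly -2.996, and "happy happy happy" gets class 4 with log(0.025), roughly -3.689, in line with the sample output. The actual implementation may differ, for example in how it treats unseen words.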


Files included

  1. by.py - Python code for the Bayesian and Markovian models
  2. Text files 'bayesTest.txt', 'smallTest.txt', 'train.tsv' - Training data together with the sentences to be classified
  3. Output files 'bayesTest.out', 'smallTest.out', 'train.out' - Output of the sentiment classification