Added bib entries

1 parent fc14223 commit d56c0b285030a6d9656a1ec0648976ac30c8391f @pranjalv123 pranjalv123 committed Feb 5, 2012
Showing with 15 additions and 91 deletions.
  1. +4 −2 egpaper_final.tex
  2. +11 −89 fpbib.bib
6 egpaper_final.tex
@@ -43,14 +43,16 @@
%%%%%%%%% ABSTRACT
\begin{abstract}
-We implement a series of classifiers (Naive Bayes, Maximum Entropy, and SVM) to distinguish positive and negative sentiment in critic and user reviews. We apply various processing methods, including negation tagging, part-of-speech tagging, and position tagging to achieve maximum accuracy. We test our classifiers on an external dataset to see how well they generalize. Finally, we use a majority-voting technique to combine classifiers and achieve accuracy of close to 90\% in 3-fold cross-validation\cite{Authors11}.
+We implement a series of classifiers (Naive Bayes, Maximum Entropy, and SVM) to distinguish positive and negative sentiment in critic and user reviews. We apply various processing methods, including negation tagging, part-of-speech tagging, and position tagging to achieve maximum accuracy. We test our classifiers on an external dataset to see how well they generalize. Finally, we use a majority-voting technique to combine classifiers and achieve accuracy of close to 90\% in 3-fold cross-validation.
\end{abstract}
%%%%%%%%% BODY TEXT
\section{Introduction}
Sentiment analysis, broadly speaking, is the set of techniques that allows detection of emotional content in text. This has a variety of applications: it is commonly used by trading algorithms to process news articles, as well as by corporations to better respond to consumer service needs. Similar techniques can also be applied to other text analysis problems, like spam filtering.
+The source code described in this paper is available at https://github.com/cathywu/Sentiment-Analysis.
+
\section{Previous Work}
We set out to replicate Pang’s work from 2002 on using classical knowledge-free supervised machine learning techniques to perform sentiment classification. They used machine learning methods commonly applied to topic classification (Naive Bayes, maximum entropy classification, and support vector machines) to explore the difference between topic classification and sentiment classification in documents. Pang cited a number of related works, but they mostly pertain to classifying documents on criteria weakly tied to sentiment or to knowledge-based sentiment classification methods. We used a similar dataset, as released by the authors, and did our best to use the same libraries and pre-processing techniques.
@@ -138,7 +140,7 @@ \subsection{Feature Counting Method}
\subsection{Conditional Independence Assumption}
-The Bayes classifier depends on a conditional independence assumption, meaning that the model it predicts assumes that the probability of a given word is independent of the other words. Clearly, this assumption does not hold. Nevertheless, the Bayes classifier functions well, in part because the positive and negative correlations between features tend to cancel each other out [http://www.cs.unb.ca/profs/hzhang/publications/FLAIRS04ZhangH.pdf].
+The Bayes classifier depends on a conditional independence assumption, meaning that the model it predicts assumes that the probability of a given word is independent of the other words. Clearly, this assumption does not hold. Nevertheless, the Bayes classifier functions well, in part because the positive and negative correlations between features tend to cancel each other out [Zhang].
We found a large difference between Naive Bayes and Maximum Entropy in how positive testing accuracy compares with negative testing accuracy. Maximum Entropy, which makes no unfounded assumptions about the data, gave very similar results for positive tests and negative tests, with a 0.2\% difference on average. On the other hand, positive and negative results from Naive Bayes, which assumes conditional independence, vary by 27.5\% on average, with the worst cases occurring on test configurations using frequency, which average a 40\% difference. These disparities suggest that the movie dataset does not satisfy the conditional independence assumption.
100 fpbib.bib
@@ -1,94 +1,16 @@
-@inproceedings{Gordon,
-author = "G. Gordon, T.Darrell, M. Harville, and J. Woodfill",
-title = {Background estimation and removal based on range and color},
-booktitle = {Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
-address = {Fort Collins, Colorado},
-pages = {459--454},
-year = 1999
-}
-
-@inproceedings{Jones,
-author = "D. Jones and J. Malik",
-title = {Determining three-dimensional shape from orientation and spatial frequency disparities},
-booktitle = {Proceeding of ECCV},
-address = {Genoa},
-year = 1992
-}
-
-@article{Martin,
-author = "D. Martin, C. Fowlkes, J. Malik",
-title = {Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues},
-journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
-year = 2004,
-volume = 26,
-number = 5,
-pages = {530--549}
-}
-
-@inproceedings{McIvor,
-author = "A. McIvor",
-title = {Background subtraction techniques},
-booktitle = {Proceedings of Image \& Vision Computing New Zealand 2000 IVCNZ’00},
-address = {Auckland, New Zealand},
-year = 2000
-}
-
-@inproceedings{Scott,
-author = "G. Scott and H Longuet-Higgins",
-title = {Feature grouping by relocalisation of eigenvectors of the proximity matrix},
-booktitle = {Proceeding of British Machine Vision Conference},
-pages = {103--108},
-year = 1990
-}
-
-@article{Seitz,
-author = "P. Seitz",
-title = {Using local orientation information as image primitive
-for robust object recognition},
-journal = {SPIE Visual Communications and Image Processing IV},
-pages = {1630--1639},
-volume = {1199},
-number = 1,
-year = 1989
-}
-
-@electronic{Vance,
- author = "A. Vance",
- title = "Microsoft's Ambivalence About Kinect Hackers",
- note = {http://www.businessweek.com/magazine/ content/11\_04/b4212028870272.htm},
- month = jan,
- year = "2011"
-}
-
-@article{Wren,
-author = "C. Wren and Y. Ivanov",
-title = {Volumetric Operations with Surface Margins},
-journal = {IEEE Computer Vision and Pattern Recognition Technical Sketches},
-year = 2002
-}
-
-@article{Zabih,
- author = "R. Zabih and J. Woodfill",
- title = {Non-parametric local transforms for computing visual correspondence.},
- journal = {Lecture Notes in Computer Science 800},
- year = 1994,
- pages = {151-158}
+@InProceedings{Pang+Lee+Vaithyanathan:02a,
+ author = {Bo Pang and Lillian Lee and Shivakumar Vaithyanathan},
+ title = {Thumbs up? {Sentiment} Classification using Machine Learning Techniques},
+ booktitle = "Proceedings of the 2002 Conference on Empirical Methods in Natural
+Language Processing (EMNLP)",
+ pages = {79--86},
+ year = 2002
}
@inproceedings{Zhang,
- author = "L. Zhang, B. Curless, and S. M. Seitz",
- title = "Rapid Shape Acquisition Using Color
-Structured Light and Multi-pass Dynamic Programming",
- intype = "presented at the",
- booktitle = "Proceedings of the 1st
-International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT)",
- address = "Padova, Italy",
- year = "2002",
- pages = "24-36",
-}
-
-@misc{depthmap,
- title = {Screenshot.png},
- note = {http://www.vislab.usyd.edu.au/blogs/media/ blogs/baz/Screenshot.png},
+author = "Harry Zhang",
+title = {The Optimality of Naive Bayes},
+booktitle = {American Association for Artificial Intelligence},
+year = 2004
}
