Merge branch 'master' of github.com:cathywu/Sentiment-Analysis

Conflicts:
	egpaper_final.tex
2 parents 95d6d2d + 3a9d8de commit a2daf50a9269258d2755e3c244287269c7b3d548 @pranjalv123 pranjalv123 committed Feb 6, 2012
Showing with 287 additions and 13 deletions.
  1. +10 −4 egpaper_final.aux
  2. +2 −2 egpaper_final.blg
  3. +15 −7 egpaper_final.log
  4. BIN egpaper_final.pdf
  5. +128 −0 egpaper_final.tex
  6. +132 −0 egpaper_final.tex~
14 egpaper_final.aux
@@ -31,10 +31,16 @@
\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Example of a short caption, which should be centered.}}{4}}
\newlabel{fig:short}{{2}{4}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.3}\hskip -1em.\nobreakspace {}Footnotes}{4}}
-\@writefile{toc}{\contentsline {subsection}{\numberline {3.4}\hskip -1em.\nobreakspace {}References}{4}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Results. Ours is better.}}{4}}
-\@writefile{toc}{\contentsline {subsection}{\numberline {3.5}\hskip -1em.\nobreakspace {}Illustrations, graphs, and photographs}{4}}
-\@writefile{toc}{\contentsline {subsection}{\numberline {3.6}\hskip -1em.\nobreakspace {}Color}{4}}
-\@writefile{toc}{\contentsline {section}{\numberline {4}\hskip -1em.\nobreakspace {}Final copy}{4}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.4}\hskip -1em.\nobreakspace {}Appendix A}{4}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.5}\hskip -1em.\nobreakspace {}Appendix B}{4}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.6}\hskip -1em.\nobreakspace {}References}{4}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.7}\hskip -1em.\nobreakspace {}Illustrations, graphs, and photographs}{4}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.8}\hskip -1em.\nobreakspace {}Color}{4}}
\bibstyle{ieee}
\bibdata{egbib}
+\@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces 3-fold cross validation results on movie dataset. Values represent positive, negative, or overall accuracy.}}{5}}
+\@writefile{toc}{\contentsline {section}{\numberline {4}\hskip -1em.\nobreakspace {}Final copy}{5}}
+\@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Test results on Yelp dataset with Naive Bayes classifier. Values represent percent of reviews classified as positive for a given star rating.}}{6}}
+\@writefile{lof}{\contentsline {figure}{\numberline {5}{\ignorespaces Test results on Yelp dataset with Maximum Entropy classifier. Values represent percent of reviews classified as positive for a given star rating.}}{6}}
+\@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces Test results on Yelp dataset with SVM classifier. Values represent percent of reviews classified as positive for a given star rating.}}{6}}
4 egpaper_final.blg
@@ -1,12 +1,12 @@
This is BibTeX, Version 0.99c (TeX Live 2009/Debian)
The top-level auxiliary file: egpaper_final.aux
I couldn't open style file ieee.bst
----line 39 of file egpaper_final.aux
+---line 40 of file egpaper_final.aux
: \bibstyle{ieee
: }
I'm skipping whatever remains of this command
I couldn't open database file egbib.bib
----line 40 of file egpaper_final.aux
+---line 41 of file egpaper_final.aux
: \bibdata{egbib
: }
I'm skipping whatever remains of this command
22 egpaper_final.log
@@ -1,4 +1,4 @@
-This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.11.2) 5 FEB 2012 19:32
+This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.11.2) 5 FEB 2012 21:15
entering extended mode
%&-line parsing enabled.
**egpaper_final.tex
@@ -264,20 +264,28 @@ LaTeX Warning: Citation `Alpher03' on page 3 undefined on input line 277.
LaTeX Warning: Citation `Authors06' on page 3 undefined on input line 277.
[3]
+Overfull \hbox (4.21208pt too wide) in paragraph at lines 369--414
+[][]
+ []
-LaTeX Warning: Citation `Authors06' on page 4 undefined on input line 372.
-[4] (./egpaper_final.bbl) [5
+LaTeX Warning: Citation `Authors06' on page 4 undefined on input line 498.
-] (./egpaper_final.aux)
+
+Underfull \vbox (badness 5652) has occurred while \output is active []
+
+ [4]
+(./egpaper_final.bbl) [5
+
+] [6] (./egpaper_final.aux)
LaTeX Warning: There were undefined references.
)
Here is how much of TeX's memory you used:
2173 strings out of 495061
26125 string characters out of 1182621
- 83749 words of memory out of 3000000
+ 139749 words of memory out of 3000000
5307 multiletter control sequences out of 15000+50000
33404 words of font info for 80 fonts, out of 3000000 for 9000
28 hyphenation exceptions out of 8191
@@ -293,9 +301,9 @@ nts/type1/public/amsfonts/cm/cmti10.pfb></usr/share/texmf-texlive/fonts/type1/u
rw/courier/ucrr8a.pfb></usr/share/texmf-texlive/fonts/type1/urw/times/utmb8a.pf
b></usr/share/texmf-texlive/fonts/type1/urw/times/utmr8a.pfb></usr/share/texmf-
texlive/fonts/type1/urw/times/utmri8a.pfb>
-Output written on egpaper_final.pdf (5 pages, 151207 bytes).
+Output written on egpaper_final.pdf (6 pages, 162579 bytes).
PDF statistics:
- 67 PDF objects out of 1000 (max. 8388607)
+ 70 PDF objects out of 1000 (max. 8388607)
0 named destinations out of 1000 (max. 500000)
1 words of extra memory for PDF output out of 10000 (max. 10000000)
BIN egpaper_final.pdf
Binary file not shown.
128 egpaper_final.tex
@@ -177,6 +177,7 @@ \subsection{Part of Speech Tagging}
\subsection{Adjectives}
Intuitively, adjectives like ``beautiful'', ``wonderful'', and ``great'' carry valuable sentiment information, so we trained our classifiers on reviews filtered down to only their adjectives. On average, adjective tests performed about 6\% worse than their unfiltered negation-tagged counterparts, with no notable difference among the three classifiers. These results suggest that the limited information conveyed by adjectives is not representative of the full review.
+
\subsection{Verbs}
In the motivating example for POS tagging, it was the verb use of ``love'' (``I love this movie'') that conveyed sentiment, rather than the adjective use of the word. Interestingly, Pang did not include results for training only on verbs. Even more interestingly, despite the motivating example, verbs underperformed all other tests, while still being consistently better than random. Accuracy ranged from 60\% to 67\%, sometimes even falling below the 64\% accuracy of the human-based classifier from Pang 2002. We suspect this is partly due to the sparsity of features when using only verbs, as there were on average 37.2 verbs and 55.7 adjectives per review.
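
The adjective-only and verb-only filtering described in these two subsections can be sketched in a few lines of Python. This is an illustrative sketch, not the code from this repository: the hand-tagged review, the Penn Treebank style tags, and the helper name `keep_pos` are all assumptions made for the example.

```python
# Toy, hand-tagged review. The tags follow the Penn Treebank convention,
# which is an assumption; the paper does not say which tagset was used.
TAGGED_REVIEW = [
    ("I", "PRP"), ("love", "VBP"), ("this", "DT"), ("wonderful", "JJ"),
    ("movie", "NN"), ("and", "CC"), ("enjoyed", "VBD"), ("the", "DT"),
    ("great", "JJ"), ("cast", "NN"),
]


def keep_pos(tagged_tokens, tag_prefix):
    """Return the words whose POS tag starts with tag_prefix (hypothetical helper)."""
    return [word for word, tag in tagged_tokens if tag.startswith(tag_prefix)]


# Adjective tags start with "JJ" and verb tags with "VB" in this tagset.
adjectives = keep_pos(TAGGED_REVIEW, "JJ")
verbs = keep_pos(TAGGED_REVIEW, "VB")
print(adjectives, verbs)
```

Filtering this way shrinks each review to a handful of tokens, which is consistent with the feature sparsity suspected above for the verb-only tests.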
@@ -192,6 +193,133 @@ \subsection{Neighboring Domain Data}
We expected to see worse results, given the difference in vocabulary, subject matter, tone, etc., but all configurations performed better than random. We also saw strong positive trends across all test configurations, classifying reviews with more stars more positively.
+
+%-------------------------------------------------------------------------
+\begin{figure*}
+\begin{tabular}{{|l}*{11}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{3}{|c|}{Naive Bayes} & \multicolumn{3}{|c|}{MaxEnt} & \multicolumn{3}{|c|}{SVM}\\
+\hline
+Domain & Features & \# of features & Frequency & + & - & $\pm$& + & - & $\pm$& + & - & $\pm$\\
+\hline
+No-negation & Unigrams & 16165 & Frequency & 0.94 & 0.62 & 0.78 & - & - & - & 0.82 & 0.82 & 0.82 \\
+No-negation & Unigrams & 16165 & Presence & 0.87 & 0.72 & 0.82 & 0.85 & 0.87 & 0.86 & 0.85 & 0.84 & 0.84 \\
+No-negation & Bigrams & 16165 & Frequency & 0.92 & 0.64 & 0.78 & - & - & - & 0.77 & 0.81 & 0.79 \\
+No-negation & Bigrams & 16165 & Presence & 0.89 & 0.73 & 0.81 & 0.79 & 0.82 & 0.81 & 0.8 & 0.81 & 0.80 \\
+adjectives & Unigrams & 16165 & Frequency & 0.95 & 0.52 & 0.73 & - & - & - & 0.75 & 0.77 & 0.76 \\
+default & Bigrams & 2633 & Frequency & 0.91 & 0.46 & 0.69 & - & - & - & 0.74 & 0.75 & 0.75 \\
+default & Bigrams & 16165 & Frequency & 0.92 & 0.64 & 0.78 & - & - & - & 0.78 & 0.79 & 0.78 \\
+default & Unigrams & 2633 & Frequency & 0.96 & 0.5 & 0.74 & - & - & - & 0.81 & 0.79 & 0.80 \\
+default & Unigrams & 16165 & Frequency & 0.93 & 0.59 & 0.76 & - & - & - & 0.82 & 0.81 & 0.82 \\
+default & Unigrams & maximum & Frequency & 0.95 & 0.49 & 0.72 & - & - & - & 0.82 & 0.81 & 0.82 \\
+partofspeech & Bigrams & 16165 & Frequency & 0.96 & 0.47 & 0.71 & - & - & - & 0.82 & 0.82 & 0.82 \\
+partofspeech & Unigrams & 16165 & Frequency & 0.96 & 0.54 & 0.75 & - & - & - & 0.82 & 0.81 & 0.81 \\
+position & Bigrams & 16165 & Frequency & 0.96 & 0.49 & 0.73 & - & - & - & 0.77 & 0.78 & 0.78 \\
+position & Unigrams & 16165 & Frequency & 0.93 & 0.58 & 0.76 & - & - & - & 0.81 & 0.82 & 0.82 \\
+verbs & Unigrams & maximum & Frequency & 0.8 & 0.55 & 0.67 & - & - & - & 0.61 & 0.65 & 0.63 \\
+adjectives & Unigrams & 16165 & Presence & 0.93 & 0.59 & 0.76 & 0.79 & 0.77 & 0.78 & 0.75 & 0.73 & 0.74 \\
+default & Bigrams & 2633 & Presence & 0.86 & 0.64 & 0.75 & 0.75 & 0.75 & 0.75 & 0.73 & 0.75 & 0.74 \\
+default & Bigrams & 16165 & Presence & 0.89 & 0.74 & 0.81 & 0.81 & 0.82 & 0.81 & 0.78 & 0.79 & 0.78 \\
+default & Unigrams & 2633 & Presence & 0.84 & 0.8 & 0.82 & 0.84 & 0.82 & 0.83 & 0.78 & 0.82 & 0.8 \\
+default & Unigrams & 16165 & Presence & 0.87 & 0.77 & 0.82 & 0.84 & 0.85 & 0.85 & 0.83 & 0.82 & 0.83 \\
+default & Unigrams & maximum & Presence & 0.91 & 0.7 & 0.81 & 0.84 & 0.86 & 0.85 & 0.83 & 0.85 & 0.84 \\
+partofspeech & Bigrams & 16165 & Presence & 0.89 & 0.73 & 0.81 & 0.84 & 0.84 & 0.84 & 0.79 & 0.82 & 0.8 \\
+partofspeech & Unigrams & 16165 & Presence & 0.86 & 0.76 & 0.81 & 0.85 & 0.85 & 0.85 & 0.84 & 0.83 & 0.84 \\
+position & Bigrams & 16165 & Presence & 0.87 & 0.66 & 0.76 & 0.82 & 0.83 & 0.82 & 0.73 & 0.76 & 0.74 \\
+position & Unigrams & 16165 & Presence & 0.86 & 0.78 & 0.82 & 0.84 & 0.85 & 0.85 & 0.80 & 0.80 & 0.80 \\
+verbs & Unigrams & maximum & Presence & 0.80 & 0.54 & 0.67 & 0.65 & 0.65 & 0.65 & 0.64 & 0.63 & 0.635 \\
+adjectives & Unigrams & 16165 & TF-IDF & 0.82 & 0.60 & 0.71 & - & - & - & 0.79 & 0.76 & 0.77 \\
+default & Bigrams & 2633 & TF-IDF & 0.92 & 0.46 & 0.69 & - & - & - & 0.76 & 0.71 & 0.74 \\
+default & Bigrams & 16165 & TF-IDF & 0.90 & 0.68 & 0.79 & - & - & - & 0.83 & 0.74 & 0.79 \\
+default & Unigrams & 2633 & TF-IDF & 0.85 & 0.52 & 0.74 & - & - & - & 0.81 & 0.79 & 0.80 \\
+default & Unigrams & 16165 & TF-IDF & 0.88 & 0.68 & 0.78 & - & - & - & 0.83 & 0.77 & 0.80 \\
+default & Unigrams & maximum & TF-IDF & 0.86 & 0.65 & 0.76 & - & - & - & 0.83 & 0.78 & 0.81 \\
+partofspeech & Bigrams & 16165 & TF-IDF & 0.89 & 0.67 & 0.78 & - & - & - & 0.79 & 0.74 & 0.76 \\
+partofspeech & Unigrams & 16165 & TF-IDF & 0.89 & 0.63 & 0.76 & - & - & - & 0.81 & 0.78 & 0.79 \\
+position & Bigrams & 16165 & TF-IDF & 0.89 & 0.59 & 0.74 & - & - & - & 0.79 & 0.69 & 0.74 \\
+position & Unigrams & 16165 & TF-IDF & 0.91 & 0.61 & 0.76 & - & - & - & 0.81 & 0.71 & 0.76 \\
+verbs & Unigrams & maximum & TF-IDF & 0.64 & 0.57 & 0.60 & - & - & - & 0.62 & 0.66 & 0.64 \\
+\hline
+\end{tabular}
+\caption{3-fold cross validation results on movie dataset. Values represent positive, negative, or overall accuracy.}
+\end{figure*}
+
+%-------------------------------------------------------------------------
+
+\begin{figure*}
+\begin{tabular}{{|l}*{8}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{6}{|c|}{Naive Bayes} \\
+\hline
+Domain & Features & \# of features & Frequency & ***** & **** & *** & ** & * & score \\
+\hline
+default & Unigrams & 16165 & Frequency & 0.72 & 0.68 & 0.53 & 0.34 & 0.24 & 0.74 \\
+default & Unigrams & 16165 & Presence & 0.49 & 0.41 & 0.24 & 0.14 & 0.08 & 0.71 \\
+default & Bigrams & 16165 & Presence & 0.50 & 0.42 & 0.26 & 0.13 & 0.10 & 0.70 \\
+position & Unigrams & 16165 & Presence & 0.35 & 0.29 & 0.14 & 0.08 & 0.04 & 0.65 \\
+partofspeech & Unigrams & 16165 & Presence & 0.45 & 0.37 & 0.20 & 0.11 & 0.06 & 0.69 \\
+adjectives & Unigrams & 16165 & Presence & 0.76 & 0.73 & 0.61 & 0.45 & 0.36 & 0.70 \\
+verbs & Unigrams & 16165 & Presence & 0.44 & 0.43 & 0.41 & 0.37 & 0.32 & 0.56 \\
+default & Unigrams & maximum & Presence & 0.59 & 0.55 & 0.36 & 0.23 & 0.15 & 0.72 \\
+position & Unigrams & maximum & Presence & 0.54 & 0.50 & 0.33 & 0.22 & 0.14 & 0.70 \\
+partofspeech & Unigrams & maximum & Presence & 0.56 & 0.52 & 0.35 & 0.22 & 0.14 & 0.71 \\
+adjectives & Unigrams & maximum & Presence & 0.76 & 0.73 & 0.61 & 0.45 & 0.36 & 0.70 \\
+verbs & Unigrams & maximum & Presence & 0.44 & 0.43 & 0.41 & 0.37 & 0.32 & 0.56 \\
+\hline
+\end{tabular}
+\caption{Test results on Yelp dataset with Naive Bayes classifier. Values represent percent of reviews classified as positive for a given star rating.}
+\end{figure*}
+
+\begin{figure*}
+\begin{tabular}{{|l}*{8}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{6}{|c|}{MaxEnt}\\
+\hline
+Domain & Features & \# of features & Frequency & ***** & **** & *** & ** & * & score \\
+\hline
+default & Unigrams & 16165 & Frequency & - & - & - & - & - & - \\
+default & Unigrams & 16165 & Presence & 0.61 & 0.57 & 0.39 & 0.23 & 0.11 & 0.75 \\
+default & Bigrams & 16165 & Presence & 0.63 & 0.59 & 0.45 & 0.28 & 0.26 & 0.68 \\
+position & Unigrams & 16165 & Presence & 0.46 & 0.43 & 0.28 & 0.17 & 0.11 & 0.67 \\
+partofspeech & Unigrams & 16165 & Presence & 0.55 & 0.50 & 0.32 & 0.20 & 0.10 & 0.72 \\
+adjectives & Unigrams & 16165 & Presence & 0.75 & 0.72 & 0.62 & 0.45 & 0.37 & 0.69 \\
+verbs & Unigrams & 16165 & Presence & 0.43 & 0.41 & 0.38 & 0.34 & 0.30 & 0.56 \\
+default & Unigrams & maximum & Presence & 0.59 & 0.54 & 0.36 & 0.20 & 0.11 & 0.74 \\
+position & Unigrams & maximum & Presence & 0.44 & 0.40 & 0.26 & 0.15 & 0.09 & 0.68 \\
+partofspeech & Unigrams & maximum & Presence & 0.52 & 0.47 & 0.30 & 0.18 & 0.09 & 0.72 \\
+adjectives & Unigrams & maximum & Presence & 0.75 & 0.72 & 0.62 & 0.45 & 0.37 & 0.69 \\
+verbs & Unigrams & maximum & Presence & 0.43 & 0.41 & 0.38 & 0.34 & 0.30 & 0.56 \\
+\hline
+\end{tabular}
+\caption{Test results on Yelp dataset with Maximum Entropy classifier. Values represent percent of reviews classified as positive for a given star rating.}
+\end{figure*}
+
+\begin{figure*}
+\begin{tabular}{{|l}*{8}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{6}{|c|}{SVM}\\
+\hline
+Domain & Features & \# of features & Frequency & ***** & **** & *** & ** & * & score \\
+\hline
+default & Unigrams & 16165 & Frequency & 0.78 & 0.76 & 0.62 & 0.42 & 0.30 & 0.74 \\
+default & Unigrams & 16165 & Presence & 0.58 & 0.54 & 0.38 & 0.25 & 0.14 & 0.72 \\
+default & Bigrams & 16165 & Presence & 0.62 & 0.58 & 0.48 & 0.30 & 0.29 & 0.67 \\
+position & Unigrams & 16165 & Presence & 0.42 & 0.39 & 0.27 & 0.39 & 0.42 & 0.50 \\
+partofspeech & Unigrams & 16165 & Presence & 0.52 & 0.48 & 0.31 & 0.21 & 0.01 & 0.75 \\
+adjectives & Unigrams & 16165 & Presence & 0.71 & 0.71 & 0.61 & 0.46 & 0.37 & 0.67 \\
+verbs & Unigrams & 16165 & Presence & 0.45 & 0.45 & 0.42 & 0.38 & 0.32 & 0.57 \\
+default & Unigrams & maximum & Presence & - & - & - & - & - & - \\
+position & Unigrams & maximum & Presence & - & - & - & - & - & - \\
+partofspeech & Unigrams & maximum & Presence & - & - & - & - & - & - \\
+adjectives & Unigrams & maximum & Presence & 0.71 & 0.71 & 0.61 & 0.46 & 0.37 & 0.67 \\
+verbs & Unigrams & maximum & Presence & 0.45 & 0.45 & 0.42 & 0.38 & 0.32 & 0.57 \\
+\hline
+\end{tabular}
+\caption{Test results on Yelp dataset with SVM classifier. Values represent percent of reviews classified as positive for a given star rating.}
+\end{figure*}
+
+%-------------------------------------------------------------------------
+
{\small
\bibliographystyle{ieee}
\bibliography{egbib}
132 egpaper_final.tex~
@@ -55,6 +55,12 @@ We implement a series of classifiers (Naive Bayes, Maximum Entropy, and SVM) to
Sentiment analysis, broadly speaking, is the set of techniques that allows detection of emotional content in text. This has a variety of applications: it is commonly used by trading algorithms to process news articles, as well as by corporations to better respond to consumer service needs. Similar techniques can also be applied to other text analysis problems, like spam filtering.
+\section{Previous Work}
+
+We set out to replicate Pang's 2002 work on using classical knowledge-free supervised machine learning techniques to perform sentiment classification. They used three machine learning methods commonly used for topic classification (Naive Bayes, maximum entropy classification, and support vector machines) to explore the difference between topic and sentiment classification in documents. Pang cited a number of related works, but they mostly pertain to classifying documents on criteria weakly tied to sentiment or to knowledge-based sentiment classification methods. We used a similar dataset, as released by the authors, and did our best to use the same libraries and pre-processing techniques.
+
+In addition to replicating Pang's work as closely as we could, we extended it by exploring an additional dataset, additional preprocessing techniques, and combinations of classifiers. We tested how well classifiers trained on Pang's dataset extended to reviews in another domain. Although Pang limited many tests to only the 16165 most common n-grams, faster processors have lifted this computational constraint, so we additionally tested on all n-grams. For maximum entropy classification, we use a newer parameter estimation algorithm called Limited-Memory Variable Metric (L-BFGS), whereas Pang used the Improved Iterative Scaling method. We also implemented and tested the effect of term frequency-inverse document frequency (TF-IDF) on classification results.
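
To make the three feature weightings compared in the results tables (Presence, Frequency, TF-IDF) concrete, here is a minimal Python sketch on a toy corpus. It is not the code from this repository. The TF-IDF variant shown (raw term count times log(N / document frequency)) is an assumption, since the paper does not state which variant was used, and the function names and `DOCS` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus of tokenized reviews (illustrative only).
DOCS = [
    "great great movie".split(),
    "boring movie".split(),
    "great cast".split(),
]


def presence(doc):
    """Binary features: 1 if the term occurs in the document at all."""
    return {term: 1 for term in set(doc)}


def frequency(doc):
    """Count features: number of occurrences of each term."""
    return dict(Counter(doc))


def tf_idf(doc, docs):
    """Assumed variant: raw count * log(N / number of documents containing the term)."""
    n_docs = len(docs)
    doc_freq = Counter(term for d in docs for term in set(d))
    return {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in Counter(doc).items()
    }


first = DOCS[0]
print(presence(first), frequency(first), tf_idf(first, DOCS))
```

In this sketch a repeated word such as ``great'' counts once under presence and twice under frequency, and under TF-IDF it is down-weighted because it appears in two of the three documents.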
+
%-------------------------------------------------------------------------
\subsection{Language}
@@ -356,6 +362,132 @@ the text (within parentheses, if you prefer, as in this sentence). If you
wish to use a footnote, place it at the bottom of the column on the page on
which it is referenced. Use Times 8-point type, single-spaced.
+%-------------------------------------------------------------------------
+\subsection{Appendix A}
+
+\begin{figure*}
+\begin{tabular}{{|l}*{11}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{3}{|c|}{Naive Bayes} & \multicolumn{3}{|c|}{MaxEnt} & \multicolumn{3}{|c|}{SVM}\\
+\hline
+Domain & Features & \# of features & Frequency & + & - & $\pm$& + & - & $\pm$& + & - & $\pm$\\
+\hline
+No-negation & Unigrams & 16165 & Frequency & 0.94 & 0.62 & 0.78 & - & - & - & 0.82 & 0.82 & 0.82 \\
+No-negation & Unigrams & 16165 & Presence & 0.87 & 0.72 & 0.82 & 0.85 & 0.87 & 0.86 & 0.85 & 0.84 & 0.84 \\
+No-negation & Bigrams & 16165 & Frequency & 0.92 & 0.64 & 0.78 & - & - & - & 0.77 & 0.81 & 0.79 \\
+No-negation & Bigrams & 16165 & Presence & 0.89 & 0.73 & 0.81 & 0.79 & 0.82 & 0.81 & 0.8 & 0.81 & 0.8 \\
+adjectives & Unigrams & 16165 & Frequency & 0.95 & 0.52 & 0.73 & - & - & - & 0.75 & 0.77 & 0.76 \\
+default & Bigrams & 2633 & Frequency & 0.91 & 0.46 & 0.69 & - & - & - & 0.74 & 0.75 & 0.75 \\
+default & Bigrams & 16165 & Frequency & 0.92 & 0.64 & 0.78 & - & - & - & 0.78 & 0.79 & 0.78 \\
+default & Unigrams & 2633 & Frequency & 0.96 & 0.5 & 0.74 & - & - & - & 0.81 & 0.79 & 0.8 \\
+default & Unigrams & 16165 & Frequency & 0.93 & 0.59 & 0.76 & - & - & - & 0.82 & 0.81 & 0.82 \\
+default & Unigrams & maximum & Frequency & 0.95 & 0.49 & 0.72 & - & - & - & 0.82 & 0.81 & 0.82 \\
+partofspeech & Bigrams & 16165 & Frequency & 0.96 & 0.47 & 0.71 & - & - & - & 0.82 & 0.82 & 0.82 \\
+partofspeech & Unigrams & 16165 & Frequency & 0.96 & 0.54 & 0.75 & - & - & - & 0.82 & 0.81 & 0.81 \\
+position & Bigrams & 16165 & Frequency & 0.96 & 0.49 & 0.73 & - & - & - & 0.77 & 0.78 & 0.78 \\
+position & Unigrams & 16165 & Frequency & 0.93 & 0.58 & 0.76 & - & - & - & 0.81 & 0.82 & 0.82 \\
+verbs & Unigrams & maximum & Frequency & 0.8 & 0.55 & 0.67 & - & - & - & 0.61 & 0.65 & 0.63 \\
+adjectives & Unigrams & 16165 & Presence & 0.93 & 0.59 & 0.76 & 0.79 & 0.77 & 0.78 & 0.75 & 0.73 & 0.74 \\
+default & Bigrams & 2633 & Presence & 0.86 & 0.64 & 0.75 & 0.75 & 0.75 & 0.75 & 0.73 & 0.75 & 0.74 \\
+default & Bigrams & 16165 & Presence & 0.89 & 0.74 & 0.81 & 0.81 & 0.82 & 0.81 & 0.78 & 0.79 & 0.78 \\
+default & Unigrams & 2633 & Presence & 0.84 & 0.8 & 0.82 & 0.84 & 0.82 & 0.83 & 0.78 & 0.82 & 0.8 \\
+default & Unigrams & 16165 & Presence & 0.87 & 0.77 & 0.82 & 0.84 & 0.85 & 0.85 & 0.83 & 0.82 & 0.83 \\
+default & Unigrams & maximum & Presence & 0.91 & 0.7 & 0.81 & 0.84 & 0.86 & 0.85 & 0.83 & 0.85 & 0.84 \\
+partofspeech & Bigrams & 16165 & Presence & 0.89 & 0.73 & 0.81 & 0.84 & 0.84 & 0.84 & 0.79 & 0.82 & 0.8 \\
+partofspeech & Unigrams & 16165 & Presence & 0.86 & 0.76 & 0.81 & 0.85 & 0.85 & 0.85 & 0.84 & 0.83 & 0.84 \\
+position & Bigrams & 16165 & Presence & 0.87 & 0.66 & 0.76 & 0.82 & 0.83 & 0.82 & 0.73 & 0.76 & 0.74 \\
+position & Unigrams & 16165 & Presence & 0.86 & 0.78 & 0.82 & 0.84 & 0.85 & 0.85 & 0.8 & 0.8 & 0.8 \\
+verbs & Unigrams & maximum & Presence & 0.8 & 0.54 & 0.67 & 0.65 & 0.65 & 0.65 & 0.64 & 0.63 & 0.635 \\
+adjectives & Unigrams & 16165 & TF-IDF & 0.82 & 0.6 & 0.71 & - & - & - & 0.79 & 0.76 & 0.77 \\
+default & Bigrams & 2633 & TF-IDF & 0.92 & 0.46 & 0.69 & - & - & - & 0.76 & 0.71 & 0.74 \\
+default & Bigrams & 16165 & TF-IDF & 0.9 & 0.68 & 0.79 & - & - & - & 0.83 & 0.74 & 0.79 \\
+default & Unigrams & 2633 & TF-IDF & 0.85 & 0.52 & 0.74 & - & - & - & 0.81 & 0.79 & 0.8 \\
+default & Unigrams & 16165 & TF-IDF & 0.88 & 0.68 & 0.78 & - & - & - & 0.83 & 0.77 & 0.8 \\
+default & Unigrams & maximum & TF-IDF & 0.86 & 0.65 & 0.76 & - & - & - & 0.83 & 0.78 & 0.81 \\
+partofspeech & Bigrams & 16165 & TF-IDF & 0.89 & 0.67 & 0.78 & - & - & - & 0.79 & 0.74 & 0.76 \\
+partofspeech & Unigrams & 16165 & TF-IDF & 0.89 & 0.63 & 0.76 & - & - & - & 0.81 & 0.78 & 0.79 \\
+position & Bigrams & 16165 & TF-IDF & 0.89 & 0.59 & 0.74 & - & - & - & 0.79 & 0.69 & 0.74 \\
+position & Unigrams & 16165 & TF-IDF & 0.91 & 0.61 & 0.76 & - & - & - & 0.81 & 0.71 & 0.76 \\
+verbs & Unigrams & maximum & TF-IDF & 0.64 & 0.57 & 0.6 & - & - & - & 0.62 & 0.66 & 0.64 \\
+\hline
+\end{tabular}
+\caption{3-fold cross validation results on movie dataset. Values represent positive, negative, or overall accuracy.}
+\end{figure*}
+
+%-------------------------------------------------------------------------
+\subsection{Appendix B}
+
+\begin{figure*}
+\begin{tabular}{{|l}*{8}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{6}{|c|}{Naive Bayes} \\
+\hline
+Domain & Features & \# of features & Frequency & ***** & **** & *** & ** & * & score \\
+\hline
+default & Unigrams & 16165 & Frequency & 0.72 & 0.68 & 0.53 & 0.34 & 0.24 & 0.74 \\
+default & Unigrams & 16165 & Presence & 0.49 & 0.41 & 0.24 & 0.14 & 0.08 & 0.71 \\
+default & Bigrams & 16165 & Presence & 0.50 & 0.42 & 0.26 & 0.13 & 0.10 & 0.70 \\
+position & Unigrams & 16165 & Presence & 0.35 & 0.29 & 0.14 & 0.08 & 0.04 & 0.65 \\
+partofspeech & Unigrams & 16165 & Presence & 0.45 & 0.37 & 0.20 & 0.11 & 0.06 & 0.69 \\
+adjectives & Unigrams & 16165 & Presence & 0.76 & 0.73 & 0.61 & 0.45 & 0.36 & 0.70 \\
+verbs & Unigrams & 16165 & Presence & 0.44 & 0.43 & 0.41 & 0.37 & 0.32 & 0.56 \\
+default & Unigrams & maximum & Presence & 0.59 & 0.55 & 0.36 & 0.23 & 0.15 & 0.72 \\
+position & Unigrams & maximum & Presence & 0.54 & 0.50 & 0.33 & 0.22 & 0.14 & 0.70 \\
+partofspeech & Unigrams & maximum & Presence & 0.56 & 0.52 & 0.35 & 0.22 & 0.14 & 0.71 \\
+adjectives & Unigrams & maximum & Presence & 0.76 & 0.73 & 0.61 & 0.45 & 0.36 & 0.70 \\
+verbs & Unigrams & maximum & Presence & 0.44 & 0.43 & 0.41 & 0.37 & 0.32 & 0.56 \\
+\hline
+\end{tabular}
+\caption{Test results on Yelp dataset with Naive Bayes classifier. Values represent percent of reviews classified as positive for a given star rating.}
+\end{figure*}
+
+\begin{figure*}
+\begin{tabular}{{|l}*{8}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{6}{|c|}{MaxEnt}\\
+\hline
+Domain & Features & \# of features & Frequency & ***** & **** & *** & ** & * & score \\
+\hline
+default & Unigrams & 16165 & Frequency & - & - & - & - & - & - \\
+default & Unigrams & 16165 & Presence & 0.61 & 0.57 & 0.39 & 0.23 & 0.11 & 0.75 \\
+default & Bigrams & 16165 & Presence & 0.63 & 0.59 & 0.45 & 0.28 & 0.26 & 0.68 \\
+position & Unigrams & 16165 & Presence & 0.46 & 0.43 & 0.28 & 0.17 & 0.11 & 0.67 \\
+partofspeech & Unigrams & 16165 & Presence & 0.55 & 0.50 & 0.32 & 0.20 & 0.10 & 0.72 \\
+adjectives & Unigrams & 16165 & Presence & 0.75 & 0.72 & 0.62 & 0.45 & 0.37 & 0.69 \\
+verbs & Unigrams & 16165 & Presence & 0.43 & 0.41 & 0.38 & 0.34 & 0.30 & 0.56 \\
+default & Unigrams & maximum & Presence & 0.59 & 0.54 & 0.36 & 0.20 & 0.11 & 0.74 \\
+position & Unigrams & maximum & Presence & 0.44 & 0.40 & 0.26 & 0.15 & 0.09 & 0.68 \\
+partofspeech & Unigrams & maximum & Presence & 0.52 & 0.47 & 0.30 & 0.18 & 0.09 & 0.72 \\
+adjectives & Unigrams & maximum & Presence & 0.75 & 0.72 & 0.62 & 0.45 & 0.37 & 0.69 \\
+verbs & Unigrams & maximum & Presence & 0.43 & 0.41 & 0.38 & 0.34 & 0.30 & 0.56 \\
+\hline
+\end{tabular}
+\caption{Test results on Yelp dataset with Maximum Entropy classifier. Values represent percent of reviews classified as positive for a given star rating.}
+\end{figure*}
+
+\begin{figure*}
+\begin{tabular}{{|l}*{8}{|c}|r|}
+\hline
+\multicolumn{4}{|c|}{Test configurations} & \multicolumn{6}{|c|}{SVM}\\
+\hline
+Domain & Features & \# of features & Frequency & ***** & **** & *** & ** & * & score \\
+\hline
+default & Unigrams & 16165 & Frequency & 0.78 & 0.76 & 0.62 & 0.42 & 0.30 & 0.74 \\
+default & Unigrams & 16165 & Presence & 0.58 & 0.54 & 0.38 & 0.25 & 0.14 & 0.72 \\
+default & Bigrams & 16165 & Presence & 0.62 & 0.58 & 0.48 & 0.30 & 0.29 & 0.67 \\
+position & Unigrams & 16165 & Presence & 0.42 & 0.39 & 0.27 & 0.39 & 0.42 & 0.50 \\
+partofspeech & Unigrams & 16165 & Presence & 0.52 & 0.48 & 0.31 & 0.21 & 0.01 & 0.75 \\
+adjectives & Unigrams & 16165 & Presence & 0.71 & 0.71 & 0.61 & 0.46 & 0.37 & 0.67 \\
+verbs & Unigrams & 16165 & Presence & 0.45 & 0.45 & 0.42 & 0.38 & 0.32 & 0.57 \\
+default & Unigrams & maximum & Presence & - & - & - & - & - & - \\
+position & Unigrams & maximum & Presence & - & - & - & - & - & - \\
+partofspeech & Unigrams & maximum & Presence & - & - & - & - & - & - \\
+adjectives & Unigrams & maximum & Presence & 0.71 & 0.71 & 0.61 & 0.46 & 0.37 & 0.67 \\
+verbs & Unigrams & maximum & Presence & 0.45 & 0.45 & 0.42 & 0.38 & 0.32 & 0.57 \\
+\hline
+\end{tabular}
+\caption{Test results on Yelp dataset with SVM classifier. Values represent percent of reviews classified as positive for a given star rating.}
+\end{figure*}
%-------------------------------------------------------------------------
\subsection{References}
