Latexed paper

commit 95d6d2ddcb80990e99a192550af06910d1844b90 1 parent 6367e9c
@pranjalv123 pranjalv123 authored
Showing with 124 additions and 356 deletions.
  1. +124 −356 egpaper_final.tex
480 egpaper_final.tex
@@ -61,367 +61,135 @@ \section{Previous Work}
In addition to replicating Pang’s work as closely as we could, we extended the work by exploring an additional dataset, additional preprocessing techniques, and combining classifiers. We tested how well classifiers trained on Pang’s dataset extended to reviews in another domain. Although Pang limited many of his tests to use only the 16165 most common ngrams, advanced processors have lifted this computational constraint, and so we additionally tested on all ngrams. We use a newer parameter estimation algorithm called Limited-Memory Variable Metric (L-BFGS) for maximum entropy classification. Pang used the Improved Iterative Scaling method. We also implemented and tested the effect of term frequency-inverse document frequency (TF-IDF) on classification results.
-All manuscripts must be in English.
-\subsection{Dual submission}
-By submitting a manuscript to CVPR, the authors assert that it has not been
-previously published in substantially similar form. Furthermore, no paper
-which contains significant overlap with the contributions of this paper
-either has been or will be submitted during the CVPR 2011 review period to
-{\bf either a journal} or any conference (including CVPR 2011) or any
-workshop (including CVPR2011 workshops)
- {\bf Note that
- this is consistent with CVPR2010 but a strengthening from some previous CVPR
- policy}. Papers violating this condition will be rejected and a list of violating authors may be included in the proceedings.
-If there are papers that may appear to the reviewers
-to violate this condition, then it is your responsibility to: (1)~cite
-these papers (preserving anonymity as described in Section 1.6 below),
-(2)~argue in the body of your paper why your CVPR paper is non-trivially
-different from these concurrent submissions, and (3)~include anonymized
-versions of those papers in the supplemental material.
-\subsection{Paper length}
-CVPR papers may be between 6 pages and 8 pages, with a \$100 per page added
-fee. Overlength papers will simply not be reviewed. This includes papers
-where the margins and formatting are deemed to have been significantly
-altered from those laid down by this style guide. Note that this
-\LaTeX\ guide already sets figure captions and references in a smaller font.
-The reason such papers will not be reviewed is that there is no provision for
-supervised revisions of manuscripts. The reviewing process cannot determine
-the suitability of the paper for presentation in eight pages if it is
-reviewed in eleven. If you submit 8 for review expect to pay the added page
-charges for them.
-\subsection{The ruler}
-The \LaTeX\ style defines a printed ruler which should be present in the
-version submitted for review. The ruler is provided in order that
-reviewers may comment on particular lines in the paper without
-circumlocution. If you are preparing a document using a non-\LaTeX\
-document preparation system, please arrange for an equivalent ruler to
-appear on the final output pages. The presence or absence of the ruler
-should not change the appearance of any other content on the page. The
-camera ready copy should not contain a ruler. (\LaTeX\ users may uncomment
-the \verb'\cvprfinalcopy' command in the document preamble.) Reviewers:
-note that the ruler measurements do not align well with lines in the paper
---- this turns out to be very difficult to do well when the paper contains
-many figures and equations, and, when done, looks ugly. Just use fractional
-references (e.g.\ this line is $095.5$), although in most cases one would
-expect that the approximate location will be adequate.
-Please number all of your sections and displayed equations. It is
-important for readers to be able to refer to any particular equation. Just
-because you didn't refer to it in the text doesn't mean some future reader
-might not need to refer to it. It is cumbersome to have to use
-circumlocutions like ``the equation second from the top of page 3 column
-1''. (Note that the ruler will not be present in the final copy, so is not
-an alternative to equation numbers). All authors will benefit from reading
-Mermin's description of how to write mathematics.%: \url{}.
-\subsection{Blind review}
-Many authors misunderstand the concept of anonymizing for blind
-review. Blind review does not mean that one must remove
-citations to one's own work---in fact it is often impossible to
-review a paper unless the previous citations are known and
-Blind review means that you do not use the words ``my'' or ``our''
-when citing previous work. That is all. (But see below for
-Saying ``this builds on the work of Lucy Smith [1]'' does not say
-that you are Lucy Smith, it says that you are building on her
-work. If you are Smith and Jones, do not say ``as we show in
-[7]'', say ``as Smith and Jones show in [7]'' and at the end of the
-paper, include reference 7 as you would any other cited work.
-An example of a bad paper just asking to be rejected:
- An analysis of the frobnicatable foo filter.
- In this paper we present a performance analysis of our
- previous paper [1], and show it to be inferior to all
- previously known methods. Why the previous paper was
- accepted without this analysis is beyond me.
- [1] Removed for blind review
-An example of an acceptable paper:
- An analysis of the frobnicatable foo filter.
- In this paper we present a performance analysis of the
- paper of Smith \etal [1], and show it to be inferior to
- all previously known methods. Why the previous paper
- was accepted without this analysis is beyond me.
- [1] Smith, L and Jones, C. ``The frobnicatable foo
- filter, a fundamental contribution to human knowledge''.
- Nature 381(12), 1-213.
-If you are making a submission to another conference at the same time,
-which covers similar or overlapping material, you may need to refer to that
-submission in order to explain the differences, just as you would if you
-had previously published related work. In such cases, include the
-anonymized parallel submission~\cite{Authors11} as additional material and
-cite it as
-[1] Authors. ``The frobnicatable foo filter'', F\&G 2011 Submission ID 324,
-Supplied as additional material {\tt fg324.pdf}.
-Finally, you may feel you need to tell the reader that more details can be
-found elsewhere, and refer them to a technical report. For conference
-submissions, the paper must stand on its own, and not {\em require} the
-reviewer to go to a techreport for further details. Thus, you may say in
-the body of the paper ``further details may be found
-in~\cite{Authors11b}''. Then submit the techreport as additional material.
-Again, you may not assume the reviewers will read this material.
-Sometimes your paper is about a problem which you tested using a tool which
-is widely known to be restricted to a single institution. For example,
-let's say it's 1969, you have solved a key problem on the Apollo lander,
-and you believe that the CVPR11 audience would like to hear about your
-solution. The work is a development of your celebrated 1968 paper entitled
-``Zero-g frobnication: How being the only people in the world with access to
-the Apollo lander source code makes us a wow at parties'', by Zeus \etal.
-You can handle this paper like any other. Don't write ``We show how to
-improve our previous work [Anonymous, 1968]. This time we tested the
-algorithm on a lunar lander [name of lander removed for blind review]''.
-That would be silly, and would immediately identify the authors. Instead
-write the following:
- We describe a system for zero-g frobnication. This
- system is new because it handles the following cases:
- A, B. Previous systems [Zeus et al. 1968] didn't
- handle case B properly. Ours handles it by including
- a foo term in the bar integral.
- ...
- The proposed system was integrated with the Apollo
- lunar lander, and went all the way to the moon, don't
- you know. It displayed the following behaviours
- which show how well we solved cases A and B: ...
-As you can see, the above text follows standard scientific convention,
-reads better than the first version, and does not explicitly name you as
-the authors. A reviewer might think it likely that the new paper was
-written by Zeus \etal, but cannot make any decision based on that guess.
-He or she would have to be sure that no other authors could have been
-contracted to solve problem B.
-FAQ: Are acknowledgements OK? No. Leave them for the final copy.
-\fbox{\rule{0pt}{2in} \rule{0.9\linewidth}{0pt}}
- %\includegraphics[width=0.8\linewidth]{egfigure.eps}
- \caption{Example of caption. It is set in Roman so that mathematics
- (always set in Roman: $B \sin A = A \sin B$) may be included without an
- ugly clash.}
-Compare the following:\\
- \verb'$conf_a$' & $conf_a$ \\
- \verb'$\mathit{conf}_a$' & $\mathit{conf}_a$
-See The \TeX book, p165.
-The space after \eg, meaning ``for example'', should not be a
-sentence-ending space. So \eg is correct, {\em e.g.} is not. The provided
-\verb'\eg' macro takes care of this.
-When citing a multi-author paper, you may save space by using ``et alia'',
-shortened to ``\etal'' (not ``{\em et.\ al.}'' as ``{\em et}'' is a complete word.)
-However, use it only when there are three or more authors. Thus, the
-following is correct: ``
- Frobnication has been trendy lately.
- It was introduced by Alpher~\cite{Alpher02}, and subsequently developed by
- Alpher and Fotheringham-Smythe~\cite{Alpher03}, and Alpher \etal~\cite{Alpher04}.''
-This is incorrect: ``... subsequently developed by Alpher \etal~\cite{Alpher03} ...''
-because reference~\cite{Alpher03} has just two authors. If you use the
-\verb'\etal' macro provided, then you need not worry about double periods
-when used at the end of a sentence as in Alpher \etal.
-For this citation style, keep multiple citations in numerical (not
-chronological) order, so prefer \cite{Alpher03,Alpher02,Authors06} to
-\fbox{\rule{0pt}{2in} \rule{.9\linewidth}{0pt}}
- \caption{Example of a short caption, which should be centered.}
-\section{Formatting your paper}
-All text must be in a two-column format. The total allowable width of the
-text area is $6\frac78$ inches (17.5 cm) wide by $8\frac78$ inches (22.54
-cm) high. Columns are to be $3\frac14$ inches (8.25 cm) wide, with a
-$\frac{5}{16}$ inch (0.8 cm) space between them. The main title (on the
-first page) should begin 1.0 inch (2.54 cm) from the top edge of the
-page. The second and following pages should begin 1.0 inch (2.54 cm) from
-the top edge. On all pages, the bottom margin should be 1-1/8 inches (2.86
-cm) from the bottom edge of the page for $8.5 \times 11$-inch paper; for A4
-paper, approximately 1-5/8 inches (4.13 cm) from the bottom edge of the
-\subsection{Margins and page numbering}
-All printed material, including text, illustrations, and charts, must be
-kept within a print area 6-7/8 inches (17.5 cm) wide by 8-7/8 inches
-(22.54 cm) high.
-\subsection{Type-style and fonts}
-Wherever Times is specified, Times Roman may also be used. If neither is
-available on your word processor, please use the font closest in
-appearance to Times to which you have access.
-MAIN TITLE. Center the title 1-3/8 inches (3.49 cm) from the top edge of
-the first page. The title should be in Times 14-point, boldface type.
-Capitalize the first letter of nouns, pronouns, verbs, adjectives, and
-adverbs; do not capitalize articles, coordinate conjunctions, or
-prepositions (unless the title begins with such a word). Leave two blank
-lines after the title.
-AUTHOR NAME(s) and AFFILIATION(s) are to be centered beneath the title
-and printed in Times 12-point, non-boldface type. This information is to
-be followed by two blank lines.
-The ABSTRACT and MAIN TEXT are to be in a two-column format.
-MAIN TEXT. Type main text in 10-point Times, single-spaced. Do NOT use
-double-spacing. All paragraphs should be indented 1 pica (approx. 1/6
-inch or 0.422 cm). Make sure your text is fully justified---that is,
-flush left and flush right. Please do not place any additional blank
-lines between paragraphs.
-Figure and table captions should be 9-point Roman type as in
-Figures~\ref{fig:onecol} and~\ref{fig:short}. Short captions should be centred.
-\noindent Callouts should be 9-point Helvetica, non-boldface type.
-Initially capitalize only the first word of section titles and first-,
-second-, and third-order headings.
-FIRST-ORDER HEADINGS. (For example, {\large \bf 1. Introduction})
-should be Times 12-point boldface, initially capitalized, flush left,
-with one blank line before.
-SECOND-ORDER HEADINGS. (For example, { \bf 1.1. Database elements})
-should be Times 11-point boldface, initially capitalized, flush left,
-with one blank line before, and one after. If you require a third-order
-heading (we discourage it), use 10-point Times, boldface, initially
-capitalized, flush left, preceded by one blank line, followed by a period
-and your text on the same line.
-Please use footnotes\footnote {This is what a footnote looks like. It
-often distracts the reader from the main flow of the argument.} sparingly.
-Indeed, try to avoid footnotes altogether and include necessary peripheral
-observations in
-the text (within parentheses, if you prefer, as in this sentence). If you
-wish to use a footnote, place it at the bottom of the column on the page on
-which it is referenced. Use Times 8-point type, single-spaced.
-List and number all bibliographical references in 9-point Times,
-single-spaced, at the end of your paper. When referenced in the text,
-enclose the citation number in square brackets, for
-example~\cite{Authors06}. Where appropriate, include the name(s) of
-editors of referenced books.
-Method & Frobnability \\
-Theirs & Frumpy \\
-Yours & Frobbly \\
-Ours & Makes one's heart Frob\\
-\caption{Results. Ours is better.}
-\subsection{Illustrations, graphs, and photographs}
-All graphics should be centered. Please ensure that any point you wish to
-make is resolvable in a printed copy of the paper. Resize fonts in figures
-to match the font in the body text, and choose line widths which render
-effectively in print. Many readers (and reviewers), even of an electronic
-copy, will choose to print your paper in order to read it. You cannot
-insist that they do otherwise, and therefore must not assume that they can
-zoom in to see tiny details on a graphic.
-When placing figures in \LaTeX, it's almost always best to use
-\verb+\includegraphics+, and to specify the figure width as a multiple of
-the line width as in the example below
- \usepackage[dvips]{graphicx} ...
- \includegraphics[width=0.8\linewidth]
- {myfile.eps}
+\section{The User Review Domain}
+For our experiments, we worked with movie reviews. Our data source was Pang’s released dataset from their 2004 publication. The dataset contains 1000 positive reviews and 1000 negative reviews, each labeled with its true sentiment. The original data source was the Internet Movie Database (IMDb).
+Pang applied the bag-of-words method to positive and negative sentiment classification, but the same method can be extended to various other domains, including topic classification. We additionally chose to work with a set of 5000 Yelp reviews, 1000 for each of the five “star” ratings. Yelp is a popular online urban city guide that houses reviews of restaurants, shopping areas, and businesses. Although a movie review and a Yelp review will differ in specialized vocabulary, audience, tone, etc., the ways that people convey sentiment (e.g. I loved it!) may not differ entirely. We wished to explore how classifiers trained in one domain might generalize to neighboring domains.
+The domain of reviews is experimentally convenient because reviews are widely available online and because reviewers often summarize their overall sentiment with a machine-extractable rating indicator; hence, there was no need for hand-labeling of data.
+\section{Machine Learning Methods}
+\subsection{The Naive Bayes Classifier}
+The Naive Bayes classifier is an extremely simple classifier that relies on Bayesian probability and the assumption that feature probabilities are independent of one another.
+Bayes' rule gives:
+$$P(C | F_1, F_2, \ldots, F_n) = \frac{P(C)P(F_1, F_2, \ldots, F_n | C)}{P(F_1, F_2, \ldots, F_n)}$$
+Expanding the numerator with the chain rule gives:
+$$P(C)P(F_1, F_2, \ldots, F_n | C)$$
+$$= P(C)P(F_1 | C)P(F_2, F_3, \ldots, F_n | C, F_1)$$
+$$= P(C)P(F_1 | C)P(F_2 | C, F_1)P(F_3, F_4, \ldots, F_n | C, F_1, F_2)$$
+Then, assuming the features are independent of one another, so that
+$$P(F_i | F_j \ldots F_k) = P(F_i)$$
+$$P(F_i | C, F_j \ldots F_k) = P(F_i | C)$$
+and noting that the denominator does not depend on $C$, we obtain
+$$P(C | F_1 \ldots F_n) \propto P(C) \prod_{i=1}^n P(F_i | C)$$
+$P(F_i | C)$ is estimated through plus-one smoothing on a labeled training set, that is:
+$$P(F_i | C) = \frac{1 + count(C, F_i)}{\sum_j (1 + count(C, F_j))}$$
+where $count(C, F_j)$ is the number of times that $F_j$ appears over all training documents in class $C$.
+The class a feature vector belongs to is given by
+$$C^* = \operatorname*{arg\,max}_C P(C | F_1 \ldots F_n)$$
+Taking the logarithm of the right-hand side, which does not change the maximizing class, gives
+$$C^* = \operatorname*{arg\,max}_C \Big(\lg P(C) + \sum_i F_i \big[\lg (1 + count(C, F_i))$$
+$$ - \lg \sum_j (1 + count(C, F_j))\big]\Big)$$
+where $F_i$ inside the sum denotes the value of feature $i$ in the document being classified (its count or its presence indicator).
+While the Naive Bayes classifier seems very simple, it is observed to have high predictive power; in our tests, it performed competitively with the more sophisticated classifiers we used. The Bayes classifier can also be implemented very efficiently. Its independence assumption means that it does not fall prey to the curse of dimensionality, and its running time is linear in the size of the input.
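The Naive Bayes decision rule with plus-one smoothing can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the experiments; the function names and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """docs: list of token lists; labels: parallel list of class names."""
    vocab = {tok for doc in docs for tok in doc}
    counts = {c: Counter() for c in set(labels)}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    # log P(C), estimated from class frequencies in the training set
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in counts}
    log_like = {}
    for c, ctr in counts.items():
        # denominator of the smoothed estimate: sum_j (1 + count(C, F_j))
        denom = sum(ctr.values()) + len(vocab)
        log_like[c] = {tok: math.log((1 + ctr[tok]) / denom) for tok in vocab}
    return log_prior, log_like

def classify(doc, log_prior, log_like):
    """Return argmax_C of log P(C) + sum_i log P(F_i | C); unseen tokens are skipped."""
    def score(c):
        return log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
    return max(log_prior, key=score)

# Toy training data (hypothetical, for illustration only)
docs = [["great", "fun", "great"], ["loved", "it"], ["boring", "bad"], ["bad", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
log_prior, log_like = train_naive_bayes(docs, labels)
print(classify(["great", "plot"], log_prior, log_like))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and both training and classification take a single pass over the tokens.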
+\subsection{The Maximum Entropy Classifier}
+Maximum Entropy is a general-purpose machine learning technique that provides the least biased estimate possible based on the given information. In other words, “it is maximally noncommittal with regards to missing information” [src]. Importantly, it makes no conditional independence assumption between features, as the Naive Bayes classifier does.
+Maximum entropy’s estimate of $P(c|d)$ takes the following exponential form:
+$$P(c|d) = \frac{1}{Z(d)} \exp\left(\sum_i \lambda_{i,c} F_{i,c}(d,c)\right)$$
+The $\lambda_{i,c}$’s are feature-weight parameters, where a large $\lambda_{i,c}$ means that feature $F_{i,c}$ is considered a strong indicator for class $c$, and $Z(d)$ is a normalization term that makes the probabilities sum to one over the classes. We use 30 iterations of the Limited-Memory Variable Metric (L-BFGS) parameter estimation. Pang used the Improved Iterative Scaling (IIS) method, but L-BFGS was found to outperform both IIS and generalized iterative scaling (GIS), yet another parameter estimation method.
+We used Zhang Le’s (2004) Maximum Entropy Modeling Toolkit for Python and C++ [link] [src], with no special configuration.
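To make the exponential form concrete, the following Python sketch evaluates $P(c|d)$ for a given set of weights. It does not use the toolkit named above and does not perform parameter estimation; the weight dictionary and the function name are hypothetical and serve only to illustrate the formula.

```python
import math

def maxent_probs(features, weights, classes):
    """features: {feature: value} for one document.
    weights: {(feature, class): lambda}; missing pairs count as 0."""
    scores = {
        c: sum(weights.get((f, c), 0.0) * v for f, v in features.items())
        for c in classes
    }
    # Subtract the maximum score before exponentiating for numerical stability
    m = max(scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp_scores.values())  # plays the role of Z(d)
    return {c: e / z for c, e in exp_scores.items()}

# Hypothetical weights: "great" pulls toward "pos" and away from "neg"
weights = {("great", "pos"): 1.0, ("great", "neg"): -1.0}
probs = maxent_probs({"great": 1}, weights, ["pos", "neg"])
print(probs)
```

In a trained model the weights would come from an optimizer such as L-BFGS rather than being set by hand.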
+\subsection{The Support Vector Machine Classifier}
+Support Vector Machines (SVMs) operate by separating points in a $d$-dimensional space using a $(d-1)$-dimensional hyperplane, unlike Max-Ent and Naive Bayes classifiers, which use probabilistic measures to classify points. Given a set of training data, the SVM classifier finds a hyperplane with the largest possible margin; that is, it tries to find the hyperplane such that each training point is correctly classified and the hyperplane is as far as possible from the points closest to it. In practice, it is usually not possible to find a hyperplane that separates the classes perfectly, so points are permitted to be inside the margin or on the wrong side of the hyperplane. Any point on or inside the margin is referred to as a support vector, and the hyperplane, given by
+$$f(\vec{B}, B_0) = \{\vec{x} \mid \vec{x}^T \cdot \vec{B} + B_0 = 0\}$$
+is selected through a constrained quadratic optimization to minimize
+$$\frac{1}{2} \|\vec{B}\|^2 + C\sum_i \zeta_i$$
+subject to
+$$\forall i, \zeta_i \ge 0$$
+$$\forall i, y_i (\vec{x}_i^T \cdot \vec{B} + B_0) \ge 1 - \zeta_i$$
+For this paper, we use the PyML implementation of SVMs, which uses the liblinear optimizer to actually find the separating hyperplane. Of the three classifiers, this was the slowest to train, as it suffers from the curse of dimensionality.
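The soft-margin objective can be evaluated directly for a candidate hyperplane, which helps show what the slack variables measure. The sketch below is plain Python and does not call PyML or liblinear; the function name and the toy points are assumptions for illustration, and it only scores a given hyperplane rather than searching for the optimal one.

```python
def svm_objective(X, y, B, B0, C=1.0):
    """Return (objective, slacks) for hyperplane x.B + B0 = 0.
    X: list of feature tuples; y: labels in {+1, -1}."""
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(a * b for a, b in zip(x_i, B)) + B0)
        # Smallest zeta_i >= 0 satisfying y_i (x_i.B + B0) >= 1 - zeta_i
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(b * b for b in B) + C * sum(slacks)
    return objective, slacks

# Two well-separated points and one point on the wrong side of the hyperplane
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [1, -1, -1]
objective, slacks = svm_objective(X, y, B=(1.0, 0.0), B0=0.0)
print(objective, slacks)
```

Points with zero slack lie outside the margin; the third point has positive slack because it falls on the wrong side of the hyperplane.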
+\section{Experimental Setup}
+We used documents from the movie review dataset and ran 3-fold cross validation in a number of test configurations. We ignored case and treated punctuation marks as separate lexical items.
+Our testbed supported testing various parameters: frequency vs. presence of features vs. term frequency-inverse document frequency, unigrams vs. bigrams vs. both, number of features, and type of feature tagging. The types of feature tagging were negation, part of speech (POS), and position. We additionally supported training and testing on only adjectives and verbs, as well as training on the full movie dataset and testing on the Yelp dataset.
+\subsection{Feature Counting Method}
+There are several ways to construct a probability model for a set of document n-grams. The most obvious is to use feature frequency. The value of a feature in a given document is simply the number of times it appears in that document.
+Across all other parameters, training on presence rather than frequency improved Naive Bayes accuracy by 5.5\% on average, from 73.1\% with frequency to 78.5\% with presence; the improvement ranged from 0\% to 10\%, with no particular outliers among the test configurations. There was no significant difference for SVMs, and applying TF-IDF did not provide any improvement over using frequency for either classifier. Neither comparison applies to Maximum Entropy.
+Interestingly, for Naive Bayes, the positive and negative tests performed very differently between presence and frequency tests. Excluding verb tests, which did not exhibit this disparity, positive tests averaged 6.5\% worse (up to 12\% worse in the worst case) while negative tests averaged 18.9\% better (up to 30\% better), for an average aggregate difference of 25.4\% between positive and negative results. By comparison, SVMs exhibited an average aggregate difference of 0.7\%. These results provide evidence that training on presence rather than frequency yields models with less bias.
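The three feature-counting methods compared in this subsection can be sketched as follows. The TF-IDF weighting shown (raw term frequency times the log of inverse document frequency) is one common variant and is an assumption here, since the exact formula used in the experiments is not stated; the function names are also illustrative.

```python
import math
from collections import Counter

def frequency(doc):
    """Feature value = number of times the token appears in the document."""
    return dict(Counter(doc))

def presence(doc):
    """Feature value = 1 if the token appears at all."""
    return {tok: 1 for tok in set(doc)}

def tf_idf(doc, corpus):
    """Term frequency scaled by log(N / document frequency).
    Assumes doc is one of the documents in corpus, so every df is >= 1."""
    n = len(corpus)
    df = Counter(tok for d in corpus for tok in set(d))
    return {tok: tf * math.log(n / df[tok]) for tok, tf in Counter(doc).items()}

corpus = [["good", "good", "movie"], ["bad", "movie"]]
print(frequency(corpus[0]))
print(presence(corpus[0]))
print(tf_idf(corpus[0], corpus))
```

Under this weighting a token that occurs in every document, such as "movie" in the toy corpus, receives a weight of zero.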
+\subsection{Conditional Independence Assumption}
+The Bayes classifier depends on a conditional independence assumption, meaning that its model assumes that the probability of a given word is independent of the other words, given the class. Clearly, this assumption does not hold. Nevertheless, the Bayes classifier functions well, in part because the positive and negative correlations between features tend to cancel each other out [].
+We found a huge difference between the results of Naive Bayes and Maximum Entropy when comparing positive testing accuracy with negative testing accuracy. Maximum Entropy, which makes no unfounded assumptions about the data, gave very similar results for positive tests and negative tests, with a 0.2\% difference on average. On the other hand, positive and negative results from Naive Bayes, which assumes conditional independence, vary by 27.5\% on average, with the worst cases occurring in test configurations using frequency, which average a 40\% difference. These disparities suggest that the movie dataset does not satisfy the conditional independence assumption.
+\subsection{Number of Features}
+One key decision in a bag-of-words feature set is which words to include. Using more words provides more information, but harms the performance of the classifiers, and words that appear only infrequently in the training data may not present accurate information due to the law of small numbers. We examine results with the entire training data, as well as with only the top 16165 and 2633 unigrams and bigrams.
+Using the most frequent unigrams is an extremely simple method of feature selection and, in this case, not a particularly robust one, since feature selection should look for words that identify a given class. Choosing frequent words does not discriminate between the two classes and will select common words like ``the'' and ``it'', which are likely weak sentiment indicators. On the other hand, uncommon words that appear in only a handful of reviews or fewer will not contribute much to sentiment indication. Pang’s motivation for limiting the number of features was to improve testing performance, but our classifiers and processors were fast enough that the cost of using all features was not particularly noticeable.
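Frequency-based feature selection of the kind discussed here amounts to keeping the $k$ most common n-grams in the training data. A minimal Python sketch follows; the function name and toy documents are assumptions, and ties at the cutoff are resolved by first occurrence, which may differ from the original experiments.

```python
from collections import Counter

def top_k_features(docs, k):
    """Return the set of the k most frequent tokens across all documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    return {tok for tok, _ in counts.most_common(k)}

def restrict(doc, allowed):
    """Drop tokens that are not in the selected feature set."""
    return [tok for tok in doc if tok in allowed]

docs = [["the", "movie", "the"], ["the", "plot", "movie"]]
selected = top_k_features(docs, 2)
print(selected, restrict(docs[1], selected))
```

With $k$ set to 16165 or 2633 this would reproduce the shape of the restriction, though not necessarily the exact feature lists.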
+On average, limiting the number of features from 16165 to 2633, as in the original Pang paper, caused accuracy to drop by 5.2\%, 4.0\%, and 2.8\% for Naive Bayes, Maximum Entropy, and SVM, respectively. These results indicate that valuable sentiment information was lost in the restriction of features.
+However, when restricting from all features down to 16165, the results were a wash: Naive Bayes did slightly worse, Maximum Entropy remained unchanged, and SVMs did slightly better. These results suggest that uncommon features do not carry much sentiment information. They also validate Pang’s use of limited features, as the restriction did not significantly impact the results but satisfied their performance constraints.
+\subsection{Negation Tagging}
+In an effort to preserve the potential value of negation information while using very simple features, we tagged words between those expressing negation and the next punctuation mark with the suffix ``\_NOT''. This distinguishes sentences like ``That movie was very good'' and ``That movie was not very good.'' Diverging from Pang, we also added negation tags to bigrams.
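The negation-tagging scheme can be sketched as a single pass over a tokenized review. The list of negation words and the punctuation pattern below are assumptions made for this example; the exact lists used in the experiments are not given.

```python
import re

NEGATIONS = {"not", "no", "never"}      # hypothetical negation word list
PUNCT = re.compile(r"^[.,;:!?]$")       # tokens that end a negation scope

def tag_negation(tokens):
    """Append _NOT to every token between a negation word and the next punctuation mark."""
    tagged, negating = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negating = False
            tagged.append(tok)
        elif negating:
            tagged.append(tok + "_NOT")
        else:
            tagged.append(tok)
            if tok.lower() in NEGATIONS or tok.lower().endswith("n't"):
                negating = True
    return tagged

print(tag_negation("that movie was not very good .".split()))
```

Because punctuation marks are treated as separate lexical items, the scope of a negation ends as soon as one is encountered.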
+Negation tagging did not appear to have a significant effect on the data. For all the classifiers, the results from negation tagged data were almost the same as the results from the raw data. Nevertheless, we used negation tagging for the remainder of the tests, as it did not seem to hurt performance or accuracy.
+The ineffectiveness of negation tagging probably comes from a few sources. First, it increases the number of uncommon features, which, as discussed previously, harms effectiveness and cancels out the increase in semantic awareness. Second, the presence of a ``not'' does not always indicate negation. Rather, it is often used idiomatically, as in the example fragment ``with his distinctive, more often than not ingenious dialogue''. Finally, the method of tagging all words up to the next punctuation mark is suspect. Only a few words after the ``not'' are actually negated, and these often occur after a comma or other punctuation mark.
+\subsection{Position Tagging}
+Reviews tend to have a beginning, a middle, and an end, so to see whether one section carries more sentiment than another, we split each review into a first quarter, a middle half, and a last quarter and tagged the words in each section.
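The quarter/half/quarter split can be sketched as follows. The suffix names and the rounding at section boundaries are assumptions for illustration; the tags actually used are not specified.

```python
def tag_position(tokens):
    """Tag tokens by section: first quarter, middle half, last quarter."""
    n = len(tokens)
    first_end = n // 4          # end of the first quarter (exclusive)
    last_start = n - n // 4     # start of the last quarter
    tagged = []
    for i, tok in enumerate(tokens):
        if i < first_end:
            suffix = "_FIRST"   # hypothetical tag names
        elif i >= last_start:
            suffix = "_LAST"
        else:
            suffix = "_MID"
        tagged.append(tok + suffix)
    return tagged

print(tag_position(list("abcdefgh")))
```

Each distinct word can now appear as up to three distinct features, which is the increase in dimensionality that makes this tagging costly when position carries little information.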
+Position tagging was not helpful. For bigrams, it harmed performance by around 5\% in most cases, and for unigrams, it was not helpful. If reviews end up not actually following the model specified or if the model has no bearing on where the relevant data is, position tagging will be harmful because it increases the dimensionality of the input without increasing the information content.
+\subsection{Part of Speech Tagging}
+We appended POS tags to every word using Oliver Mason’s Qtag program [src]. This serves as a rough way to disambiguate words that may hold different meanings in different contexts. For example, it would distinguish the different uses of ``love'' in ``I love this movie'' versus ``This is a love story.'' However, it turns out that word disambiguation is a much more complicated problem, as POS says nothing to distinguish between the meanings of ``cold'' in ``I was a bit cold during the movie'' and ``The cold murderer chilled my heart.''
+Part of speech tagging was not very helpful for unigram results; in fact, the NB classifier did slightly worse with parts of speech tagged when using unigrams. However, when using bigrams, the MaxEnt and SVM classifiers did significantly better, achieving 3-4\% better accuracy with part of speech tagging when measuring frequency and presence information.
+Intuitively, adjectives like ``beautiful'', ``wonderful'', and ``great'' hold valuable sentiment information, so we trained our classifiers after filtering each review down to only its adjectives. On average, adjective tests performed about 6\% worse than their unfiltered negation-tagged counterparts, with no notable difference between the three classifiers. These results suggest that the limited information conveyed in adjectives is not representative of the full review itself.
+In the motivating example for the use of POS tagging, it was the verb use of ``love'' (``I love this movie'') that conveyed sentiment information, rather than the adjectival use of the word. Interestingly, Pang did not include results for training only on verbs. Even more interestingly, despite the motivating example, verbs under-performed all other tests, while still being consistently better than random. The tests ranged from 60\% to 67\% accuracy, sometimes even doing worse than the 64\% accurate human-based classifier from Pang 2002. We suspect this is in part due to the sparsity of features when only using verbs, as there were on average 37.2 verbs and 55.7 adjectives per review.
+\subsection{Majority Voting}
+Given a large ensemble of classifiers, an easy way to combine them is with a simple majority voting scheme. This tends to eliminate weaknesses that exist in only one classifier, but can also eliminate strengths that exist in only one classifier.
+Majority voting in some cases provided a small but significant improvement over the classifiers alone; combining Bayes, MaxEnt, and SVM classifiers over the same data provided a three to four percent boost over the best of the individual classifiers alone.
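A simple majority vote over per-document predictions can be sketched as below. The prediction lists are made up for the example; with three classifiers and two classes a tie cannot occur, so no tie-breaking rule is needed in that setting.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per classifier, all of the same length.
    Returns the most common label for each document."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical outputs of three classifiers on three documents
nb = ["pos", "neg", "pos"]
maxent = ["pos", "pos", "neg"]
svm = ["neg", "neg", "pos"]
combined = majority_vote([nb, maxent, svm])
print(combined)
```

Each classifier is wrong on a different document in this toy case, so the vote can recover a label that any single classifier misses.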
-Color is valuable, and will be visible to readers of the electronic copy.
-However ensure that, when printed on a monochrome printer, no important
-information is lost by the conversion to grayscale.
+\subsection{Neighboring Domain Data}
+Mostly out of curiosity, we wanted to see how our test configurations would perform when training on the movie dataset and testing on the Yelp dataset, an external out-of-domain dataset. We preprocessed the Yelp dataset such that it matched the format of the movie dataset and selected 1000 of each of the 1-5 star rating reviews. For evaluation purposes, we scored the accuracy on only 1-star and 5-star reviews, giving our testbed only high-confidence negative and positive reviews, respectively. The score was simply the average of the two accuracies.
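The scoring rule described above, the average of the accuracy on 1-star reviews and the accuracy on 5-star reviews, can be written as a short function. The function name, label strings, and sample data are assumptions for illustration.

```python
def yelp_score(stars, predictions):
    """Average of negative-class accuracy on 1-star reviews and
    positive-class accuracy on 5-star reviews; other ratings are ignored."""
    neg_hits = [p == "neg" for s, p in zip(stars, predictions) if s == 1]
    pos_hits = [p == "pos" for s, p in zip(stars, predictions) if s == 5]
    neg_acc = sum(neg_hits) / len(neg_hits)
    pos_acc = sum(pos_hits) / len(pos_hits)
    return 0.5 * (neg_acc + pos_acc)

stars = [1, 1, 3, 5, 5, 5, 5]
predictions = ["neg", "pos", "pos", "pos", "pos", "pos", "neg"]
score = yelp_score(stars, predictions)
print(score)
```

Averaging the two per-class accuracies keeps the score from being dominated by whichever class the classifier favors.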
-\section{Final copy}
+Across the board, the classifiers had a harder time with the Yelp dataset than with the movie dataset, with accuracies between 56.0\% and 75.2\%. The respective lowest and highest performing configurations scored 67.0\% and 84.0\% on the movie dataset.
-You must include your signed IEEE copyright release form when you submit
-your finished paper. We MUST have this form before your paper can be
-published in the proceedings.
+We expected to see worse results, given the difference in vocabulary, subject matter, tone, etc., but all configurations performed better than random. We also saw strong positive trends across all test configurations, classifying reviews with more stars more positively.