<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<!--Converted with LaTeX2HTML 2008 (1.71)
original version by: Nikos Drakos, CBLU, University of Leeds
* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan
* with significant contributions from:
Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
<HTML>
<HEAD>
<TITLE>Feature Counting Method</TITLE>
<META NAME="description" CONTENT="Feature Counting Method">
<META NAME="keywords" CONTENT="egpaper_final">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">

<META NAME="Generator" CONTENT="LaTeX2HTML v2008">
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">

<LINK REL="STYLESHEET" HREF="egpaper_final.css">

<LINK REL="next" HREF="node11.html">
<LINK REL="previous" HREF="node9.html">
<LINK REL="up" HREF="node9.html">
<LINK REL="next" HREF="node11.html">
</HEAD>

<BODY >

<DIV CLASS="navigation"><!--Navigation Panel-->
<A NAME="tex2html138"
HREF="node11.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
SRC="/usr/share/latex2html/icons/next.png"></A>
<A NAME="tex2html136"
HREF="node9.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
SRC="/usr/share/latex2html/icons/up.png"></A>
<A NAME="tex2html130"
HREF="node9.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
SRC="/usr/share/latex2html/icons/prev.png"></A>
<BR>
<B> Next:</B> <A NAME="tex2html139"
HREF="node11.html">Conditional Independence Assumption</A>
<B> Up:</B> <A NAME="tex2html137"
HREF="node9.html">Results</A>
<B> Previous:</B> <A NAME="tex2html131"
HREF="node9.html">Results</A>
<BR>
<BR></DIV>
<!--End of Navigation Panel-->

<H2><A NAME="SECTION00061000000000000000">
Feature Counting Method</A>
</H2>
There are several ways to assign feature values when constructing a probability model over a document's n-grams. The most obvious is feature frequency: the value of a feature in a given document is simply the number of times it appears in that document. Feature presence, on the other hand, assigns a value of 1 if the feature appears in the document and 0 otherwise.
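
<P>
The two counting schemes described above can be illustrated with a minimal sketch. The helper names <TT>ngrams</TT>, <TT>frequency_features</TT> and <TT>presence_features</TT>, and the toy document, are assumptions made for illustration; they are not taken from the experiments reported here.

```python
from collections import Counter


def ngrams(tokens, n=1):
    """Return every n-gram (as a tuple) in a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_features(tokens, n=1):
    """Frequency: a feature's value is how many times it occurs in the document."""
    return dict(Counter(ngrams(tokens, n)))


def presence_features(tokens, n=1):
    """Presence: a feature's value is 1 if it occurs at all (absent features are implicitly 0)."""
    return {gram: 1 for gram in set(ngrams(tokens, n))}


# Hypothetical toy document, whitespace-tokenized.
doc = "good good movie not good".split()
freq = frequency_features(doc)
pres = presence_features(doc)

print(freq[("good",)], pres[("good",)])  # prints: 3 1
print(pres.get(("bad",), 0))             # prints: 0
```

Both schemes produce the same set of features for a document; they differ only in the value attached to each one.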

<P>
Averaged across all other parameters, training Naive Bayes on presence rather than frequency improved accuracy by 5.5% on average, from 73.1% with frequency to 78.5% with presence. The improvement ranged from 0% to 10% across test configurations, with no particular outliers. For SVMs there was no significant difference between presence and frequency, and applying TF-IDF did not improve on frequency for either classifier. Neither comparison applies to Maximum Entropy.

<P>
Interestingly, for Naive Bayes, accuracy on positive and negative tests shifted very differently when moving from frequency to presence. Excluding verb tests, which did not exhibit this disparity, positive tests averaged 6.5% worse on presence (up to 12% worse) while negative tests averaged 18.9% better (up to 30% better), an average aggregate difference of 25.4% between positive and negative results. By comparison, SVMs exhibited an average aggregate difference of 0.7%. These results provide evidence that training on presence rather than frequency yields models with less bias.
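
<P>
One way to read the aggregate difference is as the gap between the shift in positive-class accuracy and the shift in negative-class accuracy, which matches the figures above (a 6.5% drop and an 18.9% gain are 25.4% apart). The sketch below computes it under that assumption. The function names, the <TT>pos</TT>/<TT>neg</TT> labels and the eight-document toy data are hypothetical and are not results from these experiments.

```python
def per_class_accuracy(gold, predicted, label):
    """Accuracy restricted to documents whose gold label equals `label`."""
    pairs = [(g, p) for g, p in zip(gold, predicted) if g == label]
    if not pairs:
        raise ValueError("no documents with label %r" % (label,))
    return sum(1 for g, p in pairs if g == p) / len(pairs)


def class_shifts(gold, pred_frequency, pred_presence):
    """Return (positive shift, negative shift, aggregate difference).

    Each shift is presence accuracy minus frequency accuracy for that class;
    the aggregate difference is the absolute gap between the two shifts
    (an assumed reading of the term, see the paragraph above).
    """
    pos_shift = (per_class_accuracy(gold, pred_presence, "pos")
                 - per_class_accuracy(gold, pred_frequency, "pos"))
    neg_shift = (per_class_accuracy(gold, pred_presence, "neg")
                 - per_class_accuracy(gold, pred_frequency, "neg"))
    return pos_shift, neg_shift, abs(pos_shift - neg_shift)


# Made-up predictions: the frequency model leans heavily towards "pos".
gold = ["pos"] * 4 + ["neg"] * 4
pred_frequency = ["pos", "pos", "pos", "pos", "pos", "pos", "pos", "neg"]
pred_presence = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]

pos_shift, neg_shift, gap = class_shifts(gold, pred_frequency, pred_presence)
print(pos_shift, neg_shift, gap)  # prints: -0.25 0.5 0.75
```

In this toy case the frequency model scores well on positive documents only because it rarely predicts the negative class; the presence model gives up a little positive accuracy for a larger gain on negatives, the same pattern described above.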

<P>

<ADDRESS>
Pranjal Vachaspati
2012-02-05
</ADDRESS>
</BODY>
</HTML>