<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Author" content="Doug Cutting">
<meta name="Author" content="Ted Husted">
<meta name="GENERATOR" content="Mozilla/4.72 [en] (Win98; U) [Netscape]">
<title>Jakarta Lucene API Documentation</title>
</head>
<body>
<h1>Jakarta Lucene API Documentation</h1>
The <a href="http://jakarta.apache.org/lucene">Jakarta Lucene</a> API is divided into several
packages:
<ul>
<li>
<b><a href="org/apache/lucene/util/package-summary.html">org.apache.lucene.util</a></b>
contains a few handy data structures, e.g., <a href="org/apache/lucene/util/BitVector.html">BitVector</a>
and <a href="org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>.</li>
<li>
<b><a href="org/apache/lucene/store/package-summary.html">org.apache.lucene.store</a></b>
defines an abstract class for storing persistent data, the <a href="org/apache/lucene/store/Directory.html">Directory</a>,
a collection of named files written by an <a href="org/apache/lucene/store/OutputStream.html">OutputStream</a>
and read by an <a href="org/apache/lucene/store/InputStream.html">InputStream</a>.
Two implementations are provided: <a href="org/apache/lucene/store/FSDirectory.html">FSDirectory</a>,
which uses a file system directory to store files, and <a href="org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a>,
which implements files as memory-resident data structures.</li>
<li>
<b><a href="org/apache/lucene/document/package-summary.html">org.apache.lucene.document</a></b>
provides a simple <a href="org/apache/lucene/document/Document.html">Document</a>
class. A document is simply a set of named <a href="org/apache/lucene/document/Field.html">Field</a>s,
whose values may be strings or instances of <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>.</li>
<li>
<b><a href="org/apache/lucene/analysis/package-summary.html">org.apache.lucene.analysis</a></b>
defines an abstract <a href="org/apache/lucene/analysis/Analyzer.html">Analyzer</a>
API for converting text from a <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>
into a <a href="org/apache/lucene/analysis/TokenStream.html">TokenStream</a>,
an enumeration of <a href="org/apache/lucene/analysis/Token.html">Token</a>s.
A TokenStream is built by applying <a href="org/apache/lucene/analysis/TokenFilter.html">TokenFilter</a>s
to the output of a <a href="org/apache/lucene/analysis/Tokenizer.html">Tokenizer</a>.
A few simple implementations are provided, including <a href="org/apache/lucene/analysis/StopAnalyzer.html">StopAnalyzer</a>
and the grammar-based <a href="org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a>.</li>
<li>
<b><a href="org/apache/lucene/index/package-summary.html">org.apache.lucene.index</a></b>
provides two primary classes: <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>,
which creates and adds documents to indices; and <a href="org/apache/lucene/index/IndexReader.html">IndexReader</a>,
which accesses the data in the index.</li>
<li>
<b><a href="org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a></b>
provides data structures to represent queries (<a href="org/apache/lucene/search/TermQuery.html">TermQuery</a>
for individual words, <a href="org/apache/lucene/search/PhraseQuery.html">PhraseQuery</a>
for phrases, and <a href="org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>
for boolean combinations of queries) and the abstract <a href="org/apache/lucene/search/Searcher.html">Searcher</a>,
which turns queries into <a href="org/apache/lucene/search/Hits.html">Hits</a>.
<a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
implements search over a single IndexReader.</li>
<li>
<b><a href="org/apache/lucene/queryParser/package-summary.html">org.apache.lucene.queryParser</a></b>
uses <a href="http://www.suntest.com/JavaCC/">JavaCC</a> to implement a
<a href="org/apache/lucene/queryParser/QueryParser.html">QueryParser</a>.</li>
</ul>
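<p>The analysis package's idea of building a TokenStream by wrapping TokenFilters around a Tokenizer can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene's API: the names <tt>MiniTokenStream</tt>, <tt>MiniLetterTokenizer</tt>, <tt>MiniLowerCaseFilter</tt>, <tt>MiniStopFilter</tt> and <tt>MiniStopAnalyzer</tt> are hypothetical, tokens are plain strings rather than Token objects (which also carry offsets), and the stop-word list is made up for the example.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a token stream: hands out tokens one at a time.
interface MiniTokenStream {
    /** Returns the next token, or null when the stream is exhausted. */
    String next();
}

// Tokenizer: splits raw text into maximal runs of letters.
class MiniLetterTokenizer implements MiniTokenStream {
    private final String text;
    private int pos = 0;

    MiniLetterTokenizer(String text) { this.text = text; }

    public String next() {
        while (pos < text.length() && !Character.isLetter(text.charAt(pos))) pos++; // skip separators
        if (pos >= text.length()) return null;
        int start = pos;
        while (pos < text.length() && Character.isLetter(text.charAt(pos))) pos++;
        return text.substring(start, pos);
    }
}

// Filter: lower-cases each token produced by the wrapped stream.
class MiniLowerCaseFilter implements MiniTokenStream {
    private final MiniTokenStream input;

    MiniLowerCaseFilter(MiniTokenStream input) { this.input = input; }

    public String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase();
    }
}

// Filter: drops tokens found in a stop-word set.
class MiniStopFilter implements MiniTokenStream {
    private final MiniTokenStream input;
    private final Set<String> stopWords;

    MiniStopFilter(MiniTokenStream input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    public String next() {
        for (String t = input.next(); t != null; t = input.next()) {
            if (!stopWords.contains(t)) return t;
        }
        return null;
    }
}

// "Analyzer": builds the chain tokenizer -> lower-case filter -> stop filter.
class MiniStopAnalyzer {
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    static MiniTokenStream tokenStream(String text) {
        return new MiniStopFilter(new MiniLowerCaseFilter(new MiniLetterTokenizer(text)), STOP_WORDS);
    }

    // Convenience: drain the whole stream into a list.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        MiniTokenStream ts = tokenStream(text);
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }
}
```

<p>Each filter wraps the stream before it, so adding a processing stage means wrapping one more filter around the chain rather than changing the tokenizer. Lower-casing runs before the stop filter here so that "The" is caught by a lower-case stop list.</p>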
To use Lucene, an application should:
<ol>
<li>
Create <a href="org/apache/lucene/document/Document.html">Document</a>s by
adding <a href="org/apache/lucene/document/Field.html">Field</a>s;</li>
<li>
Create an <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>
and add documents to it with <a href="org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document)">addDocument()</a>;</li>
<li>
Call <a href="org/apache/lucene/queryParser/QueryParser.html#parse(java.lang.String)">QueryParser.parse()</a>
to build a query from a string; and</li>
<li>
Create an <a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
and pass the query to its <a href="org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query)">search()</a>
method.</li>
</ol>
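<p>The four steps above can be mirrored with a toy, in-memory inverted index to show what adding documents and running a single-term query amount to. This is a sketch under stated assumptions, not Lucene code: <tt>ToyDocument</tt> and <tt>ToyIndex</tt> are hypothetical classes, analysis is reduced to lower-casing and splitting on non-alphanumeric characters, and terms are keyed by field name so that a query like <tt>path:chowder</tt> only looks at the <tt>path</tt> field. Lucene itself stores its index through a Directory and ranks its Hits, which this toy does not.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy document: field name -> string value (step 1: build documents from fields).
class ToyDocument {
    final Map<String, String> fields = new LinkedHashMap<>();

    ToyDocument add(String name, String value) {
        fields.put(name, value);
        return this;
    }
}

// Hypothetical toy inverted index: "field:term" -> sorted ids of documents containing it.
class ToyIndex {
    private final List<ToyDocument> docs = new ArrayList<>();
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Step 2: add a document and record every field:term pair it contains.
    int addDocument(ToyDocument doc) {
        int id = docs.size();
        docs.add(doc);
        for (Map.Entry<String, String> field : doc.fields.entrySet()) {
            // Crude analysis: lower-case, then split on runs of non-alphanumeric characters.
            for (String term : field.getValue().toLowerCase().split("[^\\p{Alnum}]+")) {
                if (term.isEmpty()) continue; // a leading separator yields one empty string
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Steps 3 and 4, reduced to a single-term query such as path:chowder.
    List<ToyDocument> search(String field, String term) {
        List<ToyDocument> hits = new ArrayList<>();
        TreeSet<Integer> ids = postings.get(field + ":" + term.toLowerCase());
        if (ids != null) {
            for (int id : ids) hits.add(docs.get(id));
        }
        return hits;
    }
}
```

<p>The point of the inverted layout is that a term query becomes one lookup in the term map rather than a scan over every document.</p>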
Some simple examples of code which does this are:
<ul>
<li>
<a href="../demo/FileDocument.java">FileDocument.java</a> contains
code to create a Document for a file.</li>
<li>
<a href="../demo/IndexFiles.java">IndexFiles.java</a> creates an
index for all the files contained in a directory.</li>
<li>
<a href="../demo/DeleteFiles.java">DeleteFiles.java</a> deletes some
of these files from the index.</li>
<li>
<a href="../demo/SearchFiles.java">SearchFiles.java</a> prompts for
queries and searches an index.</li>
</ul>
To demonstrate these, try:
<blockquote><tt>F:\> <b>java demo.IndexFiles rec.food.recipes\soups</b></tt>
<br><tt>adding rec.food.recipes\soups\abalone-chowder</tt>
<br><tt> </tt>[ ... ]
<p><tt>F:\> <b>java demo.SearchFiles</b></tt>
<br><tt>Query: <b>chowder</b></tt>
<br><tt>Searching for: chowder</tt>
<br><tt>34 total matching documents</tt>
<br><tt>0. rec.food.recipes\soups\spam-chowder</tt>
<br><tt> </tt>[ ... thirty-four documents contain the word "chowder", with
"spam-chowder" having the greatest density. ]
<p><tt>Query: <b>path:chowder</b></tt>
<br><tt>Searching for: path:chowder</tt>
<br><tt>31 total matching documents</tt>
<br><tt>0. rec.food.recipes\soups\abalone-chowder</tt>
<br><tt> </tt>[ ... only thirty-one have "chowder" in the "path"
field. ]
<p><tt>Query: <b>path:"clam chowder"</b></tt>
<br><tt>Searching for: path:"clam chowder"</tt>
<br><tt>10 total matching documents</tt>
<br><tt>0. rec.food.recipes\soups\clam-chowder</tt>
<br><tt> </tt>[ ... only ten have "clam chowder" in the "path" field.
]
<p><tt>Query: <b>path:"clam chowder" AND manhattan</b></tt>
<br><tt>Searching for: +path:"clam chowder" +manhattan</tt>
<br><tt>2 total matching documents</tt>
<br><tt>0. rec.food.recipes\soups\clam-chowder</tt>
<br><tt> </tt>[ ... only two also have "manhattan" in the contents.
]
<br> [ Note: "+" and "-" are canonical, but "AND", "OR"
and "NOT" may be used. ]</blockquote>
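<p>The "+" and "-" notation can be illustrated by how a boolean combination selects documents. In this sketch each clause is reduced to the set of document ids it matches: if any clause is required (+), a document must match every required clause; otherwise it must match at least one optional clause; and prohibited (-) clauses exclude documents in either case. <tt>ToyClause</tt> and <tt>ToyBooleanQuery</tt> are hypothetical names, and ranking, which optional clauses still influence in Lucene, is ignored.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical clause: the ids a sub-query matches, plus its +/- marking.
class ToyClause {
    final Set<Integer> matches;
    final boolean required;   // "+" (AND)
    final boolean prohibited; // "-" (NOT)

    ToyClause(Set<Integer> matches, boolean required, boolean prohibited) {
        this.matches = matches;
        this.required = required;
        this.prohibited = prohibited;
    }
}

class ToyBooleanQuery {
    // Small helper for building id sets.
    static Set<Integer> ids(Integer... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    // Returns the ids of documents selected by the clauses (ranking is ignored).
    static Set<Integer> evaluate(List<ToyClause> clauses) {
        Set<Integer> required = null;              // running intersection of "+" clauses
        Set<Integer> optional = new HashSet<>();   // union of unmarked clauses
        Set<Integer> prohibited = new HashSet<>(); // union of "-" clauses
        for (ToyClause c : clauses) {
            if (c.required) {
                if (required == null) required = new HashSet<>(c.matches); // copy, never mutate input
                else required.retainAll(c.matches);
            } else if (c.prohibited) {
                prohibited.addAll(c.matches);
            } else {
                optional.addAll(c.matches);
            }
        }
        // With at least one "+" clause, optional clauses do not change which documents match.
        Set<Integer> result = (required != null) ? required : optional;
        result.removeAll(prohibited);
        return result;
    }
}
```

<p>Read this way, <tt>path:"clam chowder" AND manhattan</tt> is two required clauses, which is why the parser echoes it as <tt>+path:"clam chowder" +manhattan</tt>.</p>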
The <a href="../demo/IndexHTML.java">IndexHTML</a> demo is more sophisticated.
It incrementally maintains an index of HTML files, adding new files as
they appear, deleting old files as they disappear and re-indexing files
as they change.
<blockquote><tt>F:\><b>java demo.IndexHTML -create java\jdk1.1.6\docs\relnotes</b></tt>
<br><tt>adding java/jdk1.1.6/docs/relnotes/SMICopyright.html</tt>
<br><tt> </tt>[ ... create an index containing all the relnotes ]
<p><tt>F:\><b>del java\jdk1.1.6\docs\relnotes\smicopyright.html</b></tt>
<p><tt>F:\><b>java demo.IndexHTML java\jdk1.1.6\docs\relnotes</b></tt>
<br><tt>deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html</tt></blockquote>
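<p>The add, delete and re-index behavior described for IndexHTML comes down to comparing what is on disk with what was indexed. The sketch below assumes both sides are available as maps from path to last-modified time; the class <tt>UpdatePlan</tt> is hypothetical, and the demo itself may record and compare this information differently (for instance by storing an identifier for each file in the index).</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical result of comparing the file system with the index.
class UpdatePlan {
    final List<String> toAdd = new ArrayList<>();     // on disk, not yet indexed
    final List<String> toReindex = new ArrayList<>(); // indexed, but modified since
    final List<String> toDelete = new ArrayList<>();  // indexed, but no longer on disk

    // onDisk: path -> current last-modified time; indexed: path -> time recorded at indexing.
    static UpdatePlan compute(Map<String, Long> onDisk, Map<String, Long> indexed) {
        UpdatePlan plan = new UpdatePlan();
        // Iterate in sorted path order so the plan is deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(onDisk).entrySet()) {
            Long recorded = indexed.get(e.getKey());
            if (recorded == null) {
                plan.toAdd.add(e.getKey());
            } else if (!recorded.equals(e.getValue())) {
                plan.toReindex.add(e.getKey());
            }
        }
        for (String path : new TreeMap<>(indexed).keySet()) {
            if (!onDisk.containsKey(path)) plan.toDelete.add(path);
        }
        return plan;
    }
}
```

<p>In this version of Lucene, re-indexing a changed file would typically mean deleting its old document and adding a new one, since documents are not modified in place.</p>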
HTML indexes are searched using Sun's <a href="http://jserv.javasoft.com/products/webserver/index.html">JavaWebServer</a>
(JWS) and <a href="../demo/Search.jhtml">Search.jhtml</a>. To use
this:
<ul>
<li>
copy <tt>Search.html</tt> and <tt>Search.jhtml</tt> to JWS's <tt>public_html</tt>
directory;</li>
<li>
copy <tt>lucene.jar</tt> to JWS's <tt>lib</tt> directory;</li>
<li>
create and maintain your indexes with <tt>demo.IndexHTML</tt> in JWS's top-level
directory;</li>
<li>
launch JWS with the <tt>demo</tt> directory on the CLASSPATH (only one class
is actually needed);</li>
<li>
visit <a href="../demo/Search.html">Search.html</a>.</li>
</ul>
Note that indexes can be updated while searches are going on. <tt>Search.jhtml</tt>
will re-open the index when it is updated so that the latest version is
immediately available.
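<p>Re-opening the index when it changes can be sketched as a small holder that checks a version stamp before handing out a searcher. The names here are hypothetical and this is not necessarily how Search.jhtml is written: <tt>ReopeningHolder</tt> takes one function that reports the index's current version (for example, its last-modified time) and another that opens a fresh searcher.</p>

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder that re-opens a resource (e.g. a searcher) when a version stamp changes.
class ReopeningHolder<T> {
    private final LongSupplier currentVersion; // e.g. reports the index's last-modified time
    private final Supplier<T> opener;          // e.g. opens a new searcher on the index
    private long openedVersion;
    private T current;

    ReopeningHolder(LongSupplier currentVersion, Supplier<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        // Read the version before opening, so an update racing with the open is seen on the next get().
        this.openedVersion = currentVersion.getAsLong();
        this.current = opener.get();
    }

    // Returns the current resource, re-opening it first if the version has changed.
    synchronized T get() {
        long version = currentVersion.getAsLong();
        if (version != openedVersion) {
            current = opener.get();
            openedVersion = version;
        }
        return current;
    }
}
```

<p>A fuller version would also close the previous searcher once any searches still using it have finished; that bookkeeping is omitted here.</p>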
<br>
</body>
</html>