Commit

LUCENE-7438: Renovate benchmark module's support for highlighting
dsmiley committed Oct 7, 2016
1 parent 6aa28bd commit 5ef60af
Showing 20 changed files with 351 additions and 700 deletions.
2 changes: 2 additions & 0 deletions build.xml
@@ -192,6 +192,8 @@
// excludes:
exclude(name: '**/build/**')
exclude(name: '**/dist/**')
exclude(name: 'lucene/benchmark/work/**')
exclude(name: 'lucene/benchmark/temp/**')
exclude(name: '**/CheckLoggingConfiguration.java')
exclude(name: 'build.xml') // ourselves :-)
}
3 changes: 3 additions & 0 deletions lucene/CHANGES.txt
@@ -76,6 +76,9 @@ Other
* LUCENE-7452: Block join query exception suggests how to find a doc, which
violates orthogonality requirement. (Mikhail Khludnev)

* LUCENE-7438: Renovate the Benchmark module's support for benchmarking highlighting. All
highlighters are supported via SearchTravRetHighlight. (David Smiley)

Build

* LUCENE-7292: Fix build to use "--release 8" instead of "-release 8" on
4 changes: 2 additions & 2 deletions lucene/benchmark/.gitignore
@@ -1,2 +1,2 @@
temp/
work/
/temp
/work
11 changes: 7 additions & 4 deletions lucene/benchmark/README.enwiki
@@ -13,10 +13,13 @@ writing, there is a page file in
http://download.wikimedia.org/enwiki/20070402/. You can download this
file manually and put it in temp. Note that the file you download will
probably have the date in the name, e.g.,
http://download.wikimedia.org/enwiki/20070402/enwiki-20070402-pages-articles.xml.bz2. When
you put it in temp, rename it to enwiki-latest-pages-articles.xml.bz2.
http://download.wikimedia.org/enwiki/20070402/enwiki-20070402-pages-articles.xml.bz2.

If you use the EnwikiContentSource then the data will be decompressed on the fly
during the benchmark. If you want to benchmark indexing, you should probably decompress
it beforehand using the "enwiki" Ant target which will produce a work/enwiki.txt, after
which you can use LineDocSource in your benchmark.

After that, ant enwiki should process the data set and run a load
test. Ant targets get-enwiki, expand-enwiki, and extract-enwiki can
also be used to download, decompress, and extract (to individual files
test. Ant target enwiki will download, decompress, and extract (to individual files
in work/enwiki) the dataset, respectively.
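`LineDocSource` reads one document per line from a flat text file, such as the `work/enwiki.txt` that the README says the `enwiki` target produces. Assuming the three-field layout `title<TAB>date<TAB>body`, a minimal Java sketch of splitting one such line into its fields could look like this. `LineDocSketch` and `LineDoc` are hypothetical names, not classes from the benchmark module, and keeping any extra tabs inside the body is a choice of this sketch rather than documented behavior.

```java
import java.util.Objects;

public class LineDocSketch {
    /** Hypothetical holder for the three fields of one line-file document (assumed layout). */
    record LineDoc(String title, String date, String body) {}

    /**
     * Splits "title<TAB>date<TAB>body". The split limit of 3 keeps any further tabs
     * inside the body instead of treating them as extra fields.
     */
    static LineDoc parse(String line) {
        Objects.requireNonNull(line, "line");
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected title<TAB>date<TAB>body, got: " + line);
        }
        return new LineDoc(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        // Illustrative values only; not taken from a real enwiki.txt file.
        LineDoc doc = parse("Sample title\tsample date\tFirst sentence of the body.");
        System.out.println("title=" + doc.title());
        System.out.println("date=" + doc.date());
        System.out.println("body=" + doc.body());
    }
}
```

Reading a pre-decompressed line file this way avoids paying for bz2 decompression and XML parsing inside an indexing benchmark, which is the reason the README gives for preferring `LineDocSource` there.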
80 changes: 0 additions & 80 deletions lucene/benchmark/conf/highlight-vs-vector-highlight.alg

This file was deleted.

@@ -14,55 +14,52 @@
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
# -------------------------------------------------------------------------------------
# multi val params are iterated by NewRound's, added to reports, start with column name.

ram.flush.mb=flush:32:32
compound=cmpnd:true:false
# For postings-offsets with light term-vectors

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
work.dir=work/enwikiPostings
ram.flush.mb=64
compound=false

doc.stored=true
doc.tokenized=true
# offsets in postings:
doc.body.offsets=true
# term vector, but no positions/offsets with it
doc.term.vector=true
doc.term.vector.offsets=true
doc.term.vector.positions=true
log.step=2000

docs.dir=reuters-out
content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
docs.file=temp/enwiki-20070527-pages-articles.xml.bz2

content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource
query.maker=org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
file.query.maker.file=conf/query-phrases.txt
log.queries=false
log.step.SearchTravRetHighlight=-1

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
highlighter=HlImpl:NONE:SH_A:UH_A:PH_P:UH_P:UH_PV

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------
{ "Populate"
CreateIndex
{ "MAddDocs" AddDoc } : 20000
ForceMerge(1)
[{ "MAddDocs" AddDoc > : 50000] : 4
CloseIndex
}
{ "Rounds"
} : 0

ResetSystemSoft
{
"Rounds"

ResetSystemSoft

OpenReader
{ "SearchVecHlgtSameRdr" SearchTravRetVectorHighlight(maxFrags[10],fields[body]) > : 1000
OpenReader

CloseReader
{ "Warm" SearchTravRetHighlight > : 1000

RepSumByPref MAddDocs
{ "HL" SearchTravRetHighlight > : 500

NewRound
CloseReader

} : 4
NewRound
} : 6

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
RepSumByPrefRound HL
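The `highlighter=HlImpl:NONE:SH_A:UH_A:PH_P:UH_P:UH_PV` line relies on the convention stated in the file's header comment: multi-valued params start with a column name, are added to reports, and are iterated by `NewRound`. A minimal Java sketch of that round-by-round selection follows. `MultiValParamSketch` and `MultiVal` are hypothetical names, this is not the benchmark's own configuration class, and wrapping back to the first value when rounds outnumber values is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class MultiValParamSketch {
    /** A multi-valued .alg parameter: report column name plus one value per round. */
    record MultiVal(String column, List<String> values) {
        /** Parses "column:v1:v2:..."; the first token is the column name. */
        static MultiVal parse(String spec) {
            String[] parts = spec.split(":");
            if (parts.length < 2) {
                throw new IllegalArgumentException("expected column:value[:value...], got " + spec);
            }
            return new MultiVal(parts[0], List.copyOf(Arrays.asList(parts).subList(1, parts.length)));
        }

        /** Value for a zero-based round; wrapping past the last value is an assumption of this sketch. */
        String valueForRound(int round) {
            return values.get(round % values.size());
        }
    }

    public static void main(String[] args) {
        MultiVal hl = MultiVal.parse("HlImpl:NONE:SH_A:UH_A:PH_P:UH_P:UH_PV");
        // Each round uses the next value, so each round benchmarks one highlighter setting.
        for (int round = 0; round < hl.values().size(); round++) {
            System.out.println("round " + round + ": " + hl.column() + "=" + hl.valueForRound(round));
        }
    }
}
```

Because the column name travels with the values, per-round reports such as `RepSumByPrefRound HL` can label each row with the highlighter that produced it.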
@@ -14,55 +14,51 @@
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
# -------------------------------------------------------------------------------------
# multi val params are iterated by NewRound's, added to reports, start with column name.

ram.flush.mb=flush:32:32
compound=cmpnd:true:false
# This is a full-term vector configuration.

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
work.dir=work/enwikiTermVec
ram.flush.mb=64
compound=false

doc.stored=true
doc.tokenized=true
doc.term.vector=true
doc.term.vector.offsets=true
doc.term.vector.positions=true
log.step=2000

docs.dir=reuters-out
doc.term.vector.offsets=true

content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource
content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
docs.file=temp/enwiki-20070527-pages-articles.xml.bz2

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
file.query.maker.file=conf/query-terms.txt
log.queries=false
log.step.SearchTravRetHighlight=-1

# task at this depth or less would print when they start
task.max.depth.log=2
highlighter=HlImpl:NONE:SH_V:FVH_V:UH_V

log.queries=true
# -------------------------------------------------------------------------------------
{ "Populate"
CreateIndex
{ "MAddDocs" AddDoc } : 20000
ForceMerge(1)
[{ "MAddDocs" AddDoc > : 50000] : 4
CloseIndex
}
{ "Rounds"
} : 0

ResetSystemSoft
{
"Rounds"

ResetSystemSoft

OpenReader
{ "SearchHlgtSameRdr" SearchTravRetHighlight(maxFrags[10],fields[body]) > : 1000
OpenReader

CloseReader
{ "Warm" SearchTravRetHighlight > : 1000

RepSumByPref MAddDocs
{ "HL" SearchTravRetHighlight > : 500

NewRound
CloseReader

NewRound
} : 4

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
RepSumByPrefRound HL
@@ -54,7 +54,7 @@ log.queries=true
{ "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
CloseReader
OpenReader
{ "SearchHlgtSameRdr" SearchTravRetHighlight(size[10],highlight[10],mergeContiguous[true],maxFrags[3],fields[body]) > : 1000
{ "SearchHlgtSameRdr" SearchTravRetHighlight(type[UH]) > : 1000

CloseReader

10 changes: 10 additions & 0 deletions lucene/benchmark/conf/query-phrases.txt
@@ -0,0 +1,10 @@
"Abraham Lincoln"
"Union Wisconsin"
"court of law"
"Field Theory" OR "Set Theory"
"Top 100"
"red hot chili"
"greatest guitarists"
"Planes, Trains & Automobiles" OR ships
"international airport"
"Xbox 360"
10 changes: 10 additions & 0 deletions lucene/benchmark/conf/query-terms.txt
@@ -0,0 +1,10 @@
Abraham AND Lincoln
Union AND Wisconsin
court AND law
top AND 100
(field OR set) AND theory
red AND hot AND chili
greatest AND guitarists
(planes AND trains AND automobiles) OR ships
international AND airport
xbox AND 360
7 changes: 7 additions & 0 deletions lucene/benchmark/conf/query-wildcards.txt
@@ -0,0 +1,7 @@
abrah* AND linc*
court* AND law*
(field OR set) AND theor*
red AND hot AND chili*
great* AND guitar*
(plan* AND train* AND automob*) OR ship*
international AND airport*
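Each of the query files added in this commit (`query-phrases.txt`, `query-terms.txt`, `query-wildcards.txt`) holds one query per line in Lucene's query-parser syntax (phrases, `AND`/`OR`, trailing wildcards), and the .alg files point `FileBasedQueryMaker` at one of them via `file.query.maker.file`. The Java sketch below covers only the line-splitting step and does not parse query syntax. It assumes blank lines are skipped and lines starting with `#` are comments; `QueryFileSketch` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class QueryFileSketch {
    /** Returns one query string per non-blank, non-comment line ('#' comments are an assumption). */
    static List<String> readQueries(Reader in) {
        List<String> queries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue; // skip blank lines and comment lines
                }
                queries.add(trimmed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return queries;
    }

    public static void main(String[] args) {
        String file = "\"Abraham Lincoln\"\n\n# a comment\nabrah* AND linc*\n";
        for (String q : readQueries(new StringReader(file))) {
            System.out.println(q);
        }
    }
}
```

Keeping the phrase, term, and wildcard queries in separate files lets each .alg configuration pick the query style that exercises its highlighters, without changing the benchmark tasks themselves.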
69 changes: 0 additions & 69 deletions lucene/benchmark/conf/standard-highlights-tv.alg

This file was deleted.
