Terrier 4.0 mod.

README.txt

TTR

Rup Palchowdhury
rup.palchowdhury [at] gmail [dot] com

Terrier-4.0's README has been copied over to
README-Terrier-4.0.txt. Any rights, responsibilities and credits stem
from the contents of that file.

----------------------------------------------------------------------
DESCRIPTION

This is Terrier-4.0 with some additions and modifications for running
IR experiments on TREC data. This software is distributed to augment
Terrier with better documentation; see NOTES.txt.

To run the commands described below you will need the sample TREC data
from: http://kak.tx0.org/IR/

----------------------------------------------------------------------
COMPILING

Type "ant" in the shell.

----------------------------------------------------------------------
INDEXING

bin/trec_terrier.sh -i                                \
                    -Dcollection.spec=filelist.txt    \
                    -Dterrier.index.path=ap/AP        \
                    -Dstopwords.filename=ap/ser17.txt \
                    -Dtermpipelines=Stop,SStemmer     \
                    -DTrecDocTags.doctag=DOC          \
                    -DTrecDocTags.idtag=DOCNO         \
                    -DTrecDocTags.process=            \
                    -DTrecDocTags.skip=               \
                    -DTrecDocTags.casesensitive=false

filelist.txt - A file listing the paths of the corpus files, one per
line. It can be generated by typing this in the shell:

find -L corpus/* -type f >filelist.txt

ap/AP - This is a directory. In the sample test collection, ap.txt is
the only file in the corpus. It has been placed inside a directory
named 'AP' because the script expects a directory path in which to
look for the corpus.
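
As a minimal sketch of the layout these options assume, the following
builds a one-document corpus in a scratch directory and generates the
file list for it. The directory name 'corpus', the file name
sample.txt, the document id SAMPLE-0001 and the document text are all
hypothetical; only the DOC and DOCNO tag names come from the indexing
command above.

```shell
# Work in a scratch directory so the real checkout is left untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical one-document corpus in TREC markup:
# DOC delimits a document, DOCNO holds its identifier.
mkdir -p corpus
cat > corpus/sample.txt <<'EOF'
<DOC>
<DOCNO> SAMPLE-0001 </DOCNO>
<TEXT>
Terrier indexes the text found between the DOC tags.
</TEXT>
</DOC>
EOF

# Write one corpus file path per line, under the name that is
# passed to Terrier as collection.spec.
find -L corpus/* -type f > filelist.txt

cat filelist.txt
```

The resulting filelist.txt is what -Dcollection.spec points at.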

----------------------------------------------------------------------
RETRIEVAL

bin/trec_terrier.sh -r                                   \
                    -q                                   \
                    -c i                                 \
                    -Dterrier.index.path=ap/AP           \
                    -Dtrec.topics=ap/query.txt           \
                    -DTrecQueryTags.doctag=TOP           \
                    -DTrecQueryTags.idtag=NUM            \
                    -DTrecQueryTags.process=TOP,NUM,DESC \
                    -DTrecQueryTags.skip=TITLE,NARR      \
                    -DTrecQueryTags.casesensitive=false  \
                    -Dstopwords.filename=ap/ser17.txt    \
                    -Dtermpipelines=Stop,SStemmer        \
                    -Dtrec.model=TF_IDF                  \
                    -Dquerying.postprocesses.controls=qe:QueryExpansion            \
                    -Dquerying.postprocesses.order=QueryExpansion                  \
                    -Dtrec.qe.model=org.terrier.matching.models.queryexpansion.Bo1 \
                    -Dexpansion.terms=10                 \
                    -Dexpansion.documents=3              \
                    -Dtrec.results=./runs                \
                    -Dtrec.results.file=run.txt

The trec.results parameter points to a directory named 'runs'.

run.txt, written inside that directory, holds the retrieval output in
TREC format.
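
As a rough sketch of how such a run file can be inspected, the
following counts the retrieved documents per query. It assumes the
usual six-column TREC run layout (query id, the literal Q0, document
id, rank, score, run tag). The three sample lines, including the
document ids, scores and run tag, are made up for illustration and are
not actual output from the command above.

```shell
# Hypothetical run file in the six-column TREC layout:
# query-id Q0 document-id rank score run-tag
run=$(mktemp)
cat > "$run" <<'EOF'
51 Q0 AP880212-0001 0 7.25 TF_IDF
51 Q0 AP880213-0042 1 6.10 TF_IDF
52 Q0 AP880301-0007 0 5.80 TF_IDF
EOF

# Count the retrieved documents per query (column 1 is the query id).
counts=$(awk '{ n[$1]++ } END { for (q in n) print q, n[q] }' "$run" | sort -n)
echo "$counts"
```

To inspect a real run, replace "$run" with runs/run.txt.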