list-high-freq-terms

Name

index-list-high-freq-terms - Lists the top N most frequent terms by document frequency.

Synopsis

lucene index list-high-freq-terms [<INDEX_DIRECTORY>] [-t|--total-term-frequency] [-n|--number-of-terms <NUMBER>] [-f|--field <FIELD>] [?|-h|--help]

Description

Extracts the top N most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency.

Arguments

INDEX_DIRECTORY

The directory of the index. If omitted, it defaults to the current working directory.

Options

?|-h|--help

Prints a short help message for the command.

-t|--total-term-frequency

Specifies that both the document frequency and the total term frequency are reported, ordered by descending total term frequency.

-n|--number-of-terms <NUMBER>

The number of terms to consider. If omitted, defaults to 100.

-f|--field <FIELD>

The field to consider. If omitted, considers all fields.
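
The difference between the two orderings can be illustrated with a small, self-contained sketch. It is not the tool's implementation: it assumes a toy in-memory "index" in which each document is a list of tokens for a single field, and the names `high_freq_terms`, `number_of_terms`, and `by_total_term_frequency` are hypothetical stand-ins for the options above. Breaking ties alphabetically is also an assumption made only to keep the output deterministic.

```python
from collections import Counter


def high_freq_terms(docs, number_of_terms=100, by_total_term_frequency=False):
    """Rank terms of a toy corpus (hypothetical helper, not the CLI's code).

    docs: list of token lists, one list per document.
    Returns (term, document_frequency, total_term_frequency) tuples.
    """
    doc_freq = Counter()         # number of documents containing the term
    total_term_freq = Counter()  # number of occurrences across all documents

    for tokens in docs:
        total_term_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each term once per document

    # Choose the ranking statistic; ties are broken alphabetically (assumption).
    key = total_term_freq if by_total_term_frequency else doc_freq
    ranked = sorted(doc_freq, key=lambda term: (-key[term], term))

    return [
        (term, doc_freq[term], total_term_freq[term])
        for term in ranked[:number_of_terms]
    ]


docs = [
    ["red", "red", "red", "shoe"],
    ["blue", "shoe"],
    ["blue", "hat"],
]

# Default ordering: by document frequency.
print(high_freq_terms(docs, number_of_terms=2))
# Ordering used with the total-term-frequency option.
print(high_freq_terms(docs, number_of_terms=2, by_total_term_frequency=True))
```

In this toy corpus "red" occurs three times but in only one document, so it ranks first by total term frequency yet falls outside the top two by document frequency.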

Examples

List the high-frequency terms in the index located at F:\product-index\ on the description field, reporting both document frequency and total term frequency:

lucene index list-high-freq-terms F:\product-index --total-term-frequency --field description

List the 30 highest-frequency terms in the index located at C:\lucene-index\ on the name field:

lucene index list-high-freq-terms C:\lucene-index -f name -n 30