Added a simple readme.
dweiss committed Aug 8, 2014
1 parent 86252bd commit c60ca91
Showing 1 changed file with 33 additions and 0 deletions.
33 changes: 33 additions & 0 deletions README.txt
@@ -0,0 +1,33 @@

folder2index
------------

Converts PDF, TXT or HTML documents to a Lucene index (for use with the Carrot2 Clustering Workbench).

Quick usage guide
-----------------

- install Apache Maven.

- run:

mvn clean package

- cd target

- prepare a folder FOO with your PDF, HTML or plain text files. Prepare an empty folder BAR
for the index.

- run:

java -jar folder2index-0.0.2.jar --folder FOO --index BAR --use-tika
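The folder preparation and indexing steps above can be collected into a small shell script. This is only a sketch and not part of folder2index: the variable names INPUT_DIR, INDEX_DIR and JAR are assumptions made for illustration, and the script assumes it is run from the target folder after the Maven build. If the jar is not found, it only prints the command it would have run.

```shell
#!/bin/sh
# Sketch: prepare the input and index folders, then run the indexer.
# FOO and BAR are the placeholder folder names used in this guide;
# INPUT_DIR, INDEX_DIR and JAR are illustrative names, not tool options.
set -eu

INPUT_DIR="${INPUT_DIR:-FOO}"
INDEX_DIR="${INDEX_DIR:-BAR}"
JAR="${JAR:-folder2index-0.0.2.jar}"

# Create both folders if they do not exist yet.
mkdir -p "$INPUT_DIR" "$INDEX_DIR"

# The guide asks for an empty index folder, so stop if it has content.
if [ -n "$(ls -A "$INDEX_DIR")" ]; then
  echo "Index folder '$INDEX_DIR' is not empty" >&2
  exit 1
fi

if [ -f "$JAR" ]; then
  # Same command as in the guide, with the folder names substituted.
  java -jar "$JAR" --folder "$INPUT_DIR" --index "$INDEX_DIR" --use-tika
else
  echo "Jar not found; would run: java -jar $JAR --folder $INPUT_DIR --index $INDEX_DIR --use-tika"
fi
```

Copy your documents into the input folder before running the script; an empty input folder will presumably give you an empty index.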

The index will be created in BAR. Then download and open the Carrot2 Workbench:

http://project.carrot2.org/download.html

Select Lucene as the document source and pick the correct fields for the title, content and URL (pick file path as
the URL field).

http://download.carrot2.org/head/manual/index.html#section.getting-started.lucene

Select the other input options (how many results to cluster, and a query, or *:* to match all documents) and run your clustering.
