Switched to Markdown formatting
algoriffic committed Mar 31, 2010
1 parent 2b30cb4 commit ea4ee6a
Showing 1 changed file with 55 additions and 50 deletions.
105 changes: 55 additions & 50 deletions README.rdoc → README.md
lsa4solr
========

A clustering engine for Solr based on Latent Semantic Indexing. The engine
constructs a term frequency matrix which it stores in memory. When requests for
…
small document sets will work. Development goals include determining the optimal
number of clusters, interfacing with Apache Mahout matrix algebra packages, optimizing
the reduced rank, etc.
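
As a rough illustration of the idea (not lsa4solr's own code, which is Clojure built on the matrix libraries listed under Installing), the Python sketch below builds a term frequency matrix, projects the documents onto the top `k` singular vectors, and groups them into `nclusters` clusters. Several choices are assumptions made for the sketch: terms come from lowercased whitespace splitting rather than Solr's analyzers, the singular vectors are found by power iteration on AᵀA, and the final grouping uses plain k-means, which may not be the method lsa4solr applies. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def term_frequency_matrix(docs):
    """Return (terms, A) where A[t][d] is how often term t occurs in document d."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]


def top_eigenpairs(m, k, iters=300, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix,
    found by power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]  # copy: deflation modifies the matrix
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, mv))  # eigenvalue estimate v . (M v)
        pairs.append((lam, v))
        for i in range(n):  # deflate: M <- M - lam * v v^T
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, nclusters, iters=50):
    """Plain k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < nclusters:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(nclusters), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(nclusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def lsi_cluster(docs, k, nclusters):
    """Cluster documents in the rank-k LSI space."""
    _, a = term_frequency_matrix(docs)
    n = len(docs)
    # A^T A (documents x documents) has the right singular vectors of A as
    # eigenvectors and the squared singular values as eigenvalues.
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(ata, k)
    # Document j in the reduced space is row j of V_k * Sigma_k.
    points = [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]
    return kmeans(points, nclusters)


docs = [
    "solr lucene index search",
    "solr search index query",
    "lucene index query search",
    "guitar chord song music",
    "music song band guitar",
    "band guitar music chord",
]
print(lsi_cluster(docs, k=2, nclusters=2))  # → [0, 0, 0, 1, 1, 1]
```

Forming AᵀA is fine for a handful of documents but squares the condition number of the matrix; for larger sets a sparse truncated SVD solver is the usual choice.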


Building
--------

lsa4solr depends on the 3.1 development version of Solr and the
1.2 development version of Clojure. In order to build lsa4solr,
…
maven repository. Then

    lein deps
    lein jar

Installing
----------

Because of Clojure's classloader requirements, you need to install the
lsa4solr jar and its dependencies into the Solr webapp/WEB-INF/lib directory
rather than configuring the path to the lsa4solr dependencies in
solrconfig.xml. The dependencies that need to be on the webapp's
classloader include:

    arpack-combo-0.1.jar
    clojure-1.2.0.jar
    clojure-contrib-1.2.0-master-20100122.191106-1.jar
    incanter-io-1.0.0.jar
    incanter-full-1.0.0.jar
    incanter-core-1.0.0.jar
    incanter-chrono-1.0.0.jar
    incanter-charts-1.0.0.jar
    apache-solr-clustering-3.1-dev.jar
    parallelcolt-0.7.2.jar
    lsa4solr.jar
    netlib-java-0.9.1.jar
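
Since all of these jars must sit in WEB-INF/lib, a quick check for missing files can save a restart cycle. A minimal Python sketch follows; the helper names and the example path in the final comment are hypothetical, and the expected file names will differ if your build pulled other snapshot versions.

```python
from pathlib import Path

# Names copied from the list above; adjust them if your build pulled
# different snapshot versions.
REQUIRED_JARS = [
    "arpack-combo-0.1.jar",
    "clojure-1.2.0.jar",
    "clojure-contrib-1.2.0-master-20100122.191106-1.jar",
    "incanter-io-1.0.0.jar",
    "incanter-full-1.0.0.jar",
    "incanter-core-1.0.0.jar",
    "incanter-chrono-1.0.0.jar",
    "incanter-charts-1.0.0.jar",
    "apache-solr-clustering-3.1-dev.jar",
    "parallelcolt-0.7.2.jar",
    "lsa4solr.jar",
    "netlib-java-0.9.1.jar",
]


def jars_in(lib_dir):
    """Names of the .jar files directly inside lib_dir."""
    return {p.name for p in Path(lib_dir).glob("*.jar")}


def missing_jars(present, required=REQUIRED_JARS):
    """Required jar names absent from the `present` set, in list order."""
    return [name for name in required if name not in present]


# Hypothetical usage; point the path at your deployed Solr webapp:
# print(missing_jars(jars_in("solr/WEB-INF/lib")))
```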

Configuring Solr
----------------

Add the following to your solrconfig.xml:

    <searchComponent
        name="lsa4solr"
        enable="${solr.clustering.enabled:false}"
        class="org.apache.solr.handler.clustering.ClusteringComponent" >
      <lst name="engine">
        <str name="classname">lsa4solr.cluster.LSAClusteringEngine</str>
        <str name="name">lsa4solr</str>
        <str name="narrative-field">Summary</str>
      </lst>
    </searchComponent>
    <requestHandler name="/lsa4solr"
        enable="${solr.clustering.enabled:false}"
        class="solr.SearchHandler">
      <lst name="defaults">
        <bool name="clustering">true</bool>
        <str name="clustering.engine">lsa4solr</str>
        <bool name="clustering.results">true</bool>
      </lst>
      <arr name="last-components">
        <str>lsa4solr</str>
      </arr>
    </requestHandler>

Set the narrative-field parameter to the name of the text field in your
schema whose contents should be clustered (Summary in the example above).
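
A typo in narrative-field is easy to miss, so it can help to confirm that the name is declared in schema.xml. A minimal Python sketch, assuming the usual `<schema><fields><field name="..."/>` layout; it ignores `dynamicField` patterns, and the helper names and sample schema are hypothetical.

```python
import xml.etree.ElementTree as ET


def declared_fields(schema_xml):
    """Names of all <field> elements in a Solr schema.xml document."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field")}


def check_narrative_field(schema_xml, narrative_field="Summary"):
    """Raise ValueError if narrative_field is not declared as a <field>."""
    if narrative_field not in declared_fields(schema_xml):
        raise ValueError(f"narrative-field {narrative_field!r} is not declared in schema.xml")


# Hypothetical minimal schema; in practice read your own schema.xml.
schema = """<schema name="example" version="1.2">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="Summary" type="text" indexed="true" stored="true"/>
  </fields>
</schema>"""
check_narrative_field(schema)  # passes silently
```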

Using
-----

Start Solr with the -Dsolr.clustering.enabled=true option; both the search
component and the request handler above stay disabled unless that property
is set. Once the server has started, cluster your documents using a URL like

    http://localhost:8983/solr/lsa4solr?nclusters=2&q=Summary:.*&rows=100&k=10

where

    k         - the rank of the reduced SVD matrix
    nclusters - the number of clusters to group the documents into
    q         - the standard Solr query parameter
    rows      - the standard Solr rows parameter
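
The same request can be built programmatically. A minimal Python sketch, assuming the host, port and handler path of the example URL above; `urlencode` percent-escapes the `:` and `*` in the query string. The helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/lsa4solr"  # host, port and path from the example URL


def lsa4solr_url(q, nclusters, k, rows, base=BASE_URL):
    """Build a clustering request URL from the four parameters listed above."""
    params = {"q": q, "nclusters": nclusters, "k": k, "rows": rows}
    return base + "?" + urlencode(params)


url = lsa4solr_url("Summary:.*", nclusters=2, k=10, rows=100)
print(url)
# → http://localhost:8983/solr/lsa4solr?q=Summary%3A.%2A&nclusters=2&k=10&rows=100
```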

The cluster information will be at the bottom of the response.
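
Because the clusters come after the regular search results, the last top-level element of the response is where to look. A minimal Python sketch, assuming Solr's default XML response format; the element names lsa4solr uses for its clusters are not documented here, so the code just extracts that last section and prints an outline of it. Helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def last_section(xml_text):
    """Return the last top-level element of a Solr XML response, or None."""
    children = list(ET.fromstring(xml_text))
    return children[-1] if children else None


def outline(elem, depth=0):
    """Indented lines of tag, name attribute and text, for a quick look."""
    parts = [elem.tag, elem.get("name") or "", (elem.text or "").strip()]
    lines = ["  " * depth + " ".join(p for p in parts if p)]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical usage against a running server:
# from urllib.request import urlopen
# body = urlopen("http://localhost:8983/solr/lsa4solr?nclusters=2&q=Summary:.*&rows=100&k=10").read()
# print("\n".join(outline(last_section(body))))
```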

Testing
-------

On the Downloads page there is a Usenet dataset, which can be found [here](http://people.csail.mit.edu/jrennie/20Newsgroups/).
Import some documents from two or more of the newsgroups into your Solr instance, then access the lsa4solr URL.
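
One way to do the import is to post documents to Solr's XML update handler. A minimal Python sketch, assuming that the dataset has been unpacked into one directory per newsgroup with one message per file, that the schema's unique key is a field called `id`, that the update handler lives at `/solr/update` on the same server, and that each whole message goes into the `Summary` field configured as narrative-field above. All helper names, and the directory and group names in the final comment, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/update"  # assumed update handler location
# Control characters other than tab, newline and CR are illegal in XML 1.0.
_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def read_newsgroups(root, groups, per_group=50):
    """Yield (id, text) for up to per_group messages from each newsgroup directory."""
    for group in groups:
        files = sorted(p for p in (Path(root) / group).iterdir() if p.is_file())
        for path in files[:per_group]:
            # latin-1 decodes any byte sequence, so odd Usenet encodings never fail
            yield f"{group}/{path.name}", path.read_text(encoding="latin-1")


def add_message(docs, text_field="Summary"):
    """Build a Solr XML <add> message from (id, text) pairs."""
    add = ET.Element("add")
    for doc_id, text in docs:
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = doc_id
        ET.SubElement(doc, "field", name=text_field).text = _ILLEGAL.sub(" ", text)
    return ET.tostring(add, encoding="unicode")


def post_xml(body, url=UPDATE_URL):
    """POST an XML update message to Solr and return the raw response."""
    req = Request(url, data=body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.read()


# Hypothetical usage; directory and group names depend on the archive you unpacked:
# post_xml(add_message(read_newsgroups("20news-bydate-train", ["sci.space", "rec.autos"])))
# post_xml("<commit/>")
```

The group name is kept in each `id`, so the returned clusters can be compared against the newsgroups the documents came from.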
