Solr in Blacklight

Setting up Solr

Blacklight uses Solr as its "search engine". More information about Solr is available at the Solr web site (lucene.apache.org/solr/).

There are three sections to this document:

  • Getting Solr

  • Configuring Solr

      • schema.xml

      • solrconfig.xml

  • SolrMARC

Getting Solr

Create a directory for the Solr distribution and download a Solr nightly build (this example uses the build from 2009-01-27):
  cd <my-new-solr-dir>
  wget http://people.apache.org/builds/lucene/solr/nightly/solr-2009-01-27.tgz

Uncompress the file:
  tar -xzvf solr-2009-01-27.tgz

You now have a usable copy of Solr.

Configuring Solr

Solr schema.xml

Solr uses a schema.xml file to define document fields (among other things). These fields store data for searching and for result display. You can find the example/solr/conf/schema.xml file in the Solr distribution you just downloaded and uncompressed.

Documentation about the Solr schema.xml file is available at (wiki.apache.org/solr/SchemaXml).

The default schema.xml file comes with some preset fields made to work with
the example data. If you don't already have a schema.xml set up, we
recommend using a simplified "fields" section like this:

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
    <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
    <dynamicField name="*_s" type="string" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_l" type="slong" indexed="true" stored="true"/>
    <dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
    <dynamicField name="*_f" type="sfloat" indexed="true" stored="true"/>
    <dynamicField name="*_d" type="sdouble" indexed="true" stored="true"/>
    <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
    <dynamicField name="random*" type="random" />
    <dynamicField name="*_facet" type="string" indexed="true" stored="true" multiValued="true" />
    <dynamicField name="*_display" type="string" indexed="false" stored="true" />
  </fields>

Simply replace the "fields" section in the schema.xml with the block above.

Additionally, replace all of the tags after the "fields" section, and before 
the </schema> tag with this:

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>text</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
  <copyField source="*_facet" dest="text"/>

Now you have a basic schema.xml file ready. The only field required by the
built-in Blacklight Rails views is:
  id

Other fields are used, but they are specified in a configuration file: app/config/initializers/blacklight_config.rb

Fields that are "indexed" are searchable.

Fields that are "stored" can be viewed/displayed from the Solr search
results.

The fields with asterisks ('*') in their names are "dynamic" fields. These
allow you to create arbitrarily named fields at index time.

The *_facet field can be used for creating your facets. When you index, 
simply define a field with _facet on the end:
  category_facet

The *_display field can be used for storing text that doesn't need to be 
indexed. An example would be the raw MARC for a record's detail view:
  raw_marc_display

For text that will be queried (and possibly displayed), use a *_t field
for tokenized text (text broken into pieces/words) or a *_s field when
queries should match the field contents exactly:
  description_t
  url_s
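
As a rough illustration of the naming conventions above, the following Ruby sketch maps a field name to the dynamic field definition it would match in the simplified schema. The DYNAMIC_FIELDS table and the dynamic_field_for helper are hypothetical names made up for this example; they are not part of Blacklight or Solr, and Solr performs this matching itself at index time.

```ruby
# Suffix patterns transcribed from the <dynamicField> entries in the
# simplified "fields" section above (the "random*" prefix pattern is omitted).
DYNAMIC_FIELDS = {
  "_i"       => { type: "sint",    indexed: true,  stored: true },
  "_s"       => { type: "string",  indexed: true,  stored: true },
  "_l"       => { type: "slong",   indexed: true,  stored: true },
  "_t"       => { type: "text",    indexed: true,  stored: true },
  "_b"       => { type: "boolean", indexed: true,  stored: true },
  "_f"       => { type: "sfloat",  indexed: true,  stored: true },
  "_d"       => { type: "sdouble", indexed: true,  stored: true },
  "_dt"      => { type: "date",    indexed: true,  stored: true },
  "_facet"   => { type: "string",  indexed: true,  stored: true },
  "_display" => { type: "string",  indexed: false, stored: true }
}.freeze

# Hypothetical helper: returns the attributes of the dynamic field whose
# suffix matches the given field name, or nil if none matches.
def dynamic_field_for(name)
  match = DYNAMIC_FIELDS.find { |suffix, _attrs| name.end_with?(suffix) }
  match && match.last
end

# An example document using the field names mentioned above (values invented).
doc = {
  "id"               => "rec-001",
  "category_facet"   => ["Maps"],
  "raw_marc_display" => "00000nam a2200000 a 4500",
  "description_t"    => ["A folded road map"],
  "url_s"            => ["http://example.com/rec-001"]
}

doc.each_key do |field|
  attrs = dynamic_field_for(field)
  next unless attrs # "id" is a regular field, not a dynamic one
  puts "#{field}: type=#{attrs[:type]} indexed=#{attrs[:indexed]} stored=#{attrs[:stored]}"
end
```

Note that raw_marc_display comes back as stored but not indexed, which matches the intent described above: it can be shown in a record view but cannot be searched.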

The Blacklight application is generic enough to work with any Solr schema, but to
manipulate the search results and single record displays, you'll need to know the 
stored fields in your indexed documents.

For more information, refer to the Solr documentation: 
  http://wiki.apache.org/solr/SchemaXml

Solr solrconfig.xml

Solr uses the solrconfig.xml file to define searching configurations, set cache options, etc. You can find the example/solr/conf/solrconfig.xml file in the distribution directory you just uncompressed.

Documentation about the solrconfig.xml file is available at (wiki.apache.org/solr/SolrConfigXml).

Blacklight expects a few things to be set up in the solrconfig.xml file,
namely two special request handler definitions.

You MUST set up these two request handlers.

Solr Search Request Handlers

When Blacklight does a collection search, it sends a request to a Solr
request handler named "search". The most important settings in this handler 
definition are the "fl" param (field list) and the facet params.

The "fl" param specifies which fields are returned in a Solr response.
The facet related params set up the faceting mechanism.

Find out more about the basic params:
  http://wiki.apache.org/solr/DisMaxRequestHandler

Find out more about the faceting params:
  http://wiki.apache.org/solr/SimpleFacetParameters

How the “fl” param works in Blacklight’s request handlers

Blacklight comes with a set of "default" views for rendering each document
in a search results page. These views simply loop through all of the fields
returned in each document in the Solr response. The "fl" (field list) param
tells Solr which fields to include in the documents in the response, and
these are the fields rendered in the Blacklight default views. Thus, the
fields you want rendered must be specified in "fl". Note that only "stored"
fields are available: if you want a field to be rendered in the result, it
must be "stored" per the field definition in schema.xml.
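
To make the relationship between "fl" and the rendered output concrete, here is a minimal Ruby sketch of a view that loops over whatever fields come back for each document. It is not Blacklight's actual view code; render_document and the sample response hash are hypothetical, and the response is assumed to have already been parsed into Ruby hashes.

```ruby
# Hypothetical stand-in for a default view: renders every field that Solr
# returned for a document, one "name: value" line per field.
def render_document(doc)
  doc.map { |name, value| "#{name}: #{Array(value).join(', ')}" }.join("\n")
end

# Invented sample of a parsed Solr response for fl=id,title_display,score.
# A field that is not in "fl" (or not "stored") never appears in "docs",
# so the loop has nothing to render for it.
response = {
  "response" => {
    "numFound" => 1,
    "docs" => [
      { "id" => "rec-001", "title_display" => "Road map of Virginia", "score" => 1.0 }
    ]
  }
}

response["response"]["docs"].each do |doc|
  puts render_document(doc)
end
```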

The "fl" parameter definition in the "search" handler looks like this:
  <str name="fl">*,score</str>

The asterisk could be replaced by a list of specific field names:
  <str name="fl">id,title_display,score</str>

How the facet params work in Blacklight’s request handlers

In the search results view, Blacklight looks in the Solr response for
facets. Any facet.field params you specify in your "search" handler
are automatically displayed in the facets list:
  <str name="facet.field">format</str>
  <str name="facet.field">language_facet</str>
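
The facet values arrive in the "facet_counts" section of the Solr response. The sketch below shows one way to turn them into value/count pairs in Ruby, assuming the response was requested as JSON (wt=json) with Solr's default flat list layout, where each facet field holds alternating values and counts. The sample payload and the facet_counts_for helper are invented for illustration and are not Blacklight code.

```ruby
require "json"

# Invented sample response body, trimmed to the facet section.
body = <<~JSON
  {
    "facet_counts": {
      "facet_fields": {
        "format": ["Book", 12, "Map", 3],
        "language_facet": ["English", 14, "French", 1]
      }
    }
  }
JSON

# Hypothetical helper: converts Solr's flat [value, count, value, count, ...]
# list for one facet field into a { value => count } hash.
def facet_counts_for(response, field)
  flat = response.dig("facet_counts", "facet_fields", field) || []
  flat.each_slice(2).to_h
end

response = JSON.parse(body)
%w[format language_facet].each do |field|
  puts field
  facet_counts_for(response, field).each do |value, count|
    puts "  #{value} (#{count})"
  end
end
```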

Blacklight’s “search” request handler: for search results

When Blacklight displays a list of search results, it uses a Solr request
handler named "search". Thus, the field list ("fl" param) for the "search"
request handler should be tailored to what will be displayed in a search
results page. Generally, this will not include fields containing a large
quantity of text. The facet.field params should list the facets to be
displayed with the search results.

  <requestHandler name="search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <!-- list fields to be returned in the "fl" param -->
      <str name="fl">*,score</str>
      <str name="facet">on</str>
      <str name="facet.mincount">1</str>
      <str name="facet.limit">10</str>
      <!-- list fields to be displayed as facets here. -->
      <str name="facet.field">format</str>
      <str name="facet.field">language_facet</str>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>
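
For illustration, a request to this handler could be assembled in Ruby roughly as follows. The base URL (a local Solr on port 8983) and the search_url helper are assumptions for the example, not something Blacklight defines. Parameters passed in the request override the handler's "defaults", which is how a narrower "fl" can be requested without editing solrconfig.xml.

```ruby
require "uri"

# Assumed location of the Solr instance; adjust to your own setup.
SOLR_BASE_URL = "http://localhost:8983/solr"

# Hypothetical helper: builds a select URL that routes to the "search"
# request handler via the qt param. Extra params override handler defaults.
def search_url(query, overrides = {})
  params = { "qt" => "search", "q" => query }.merge(overrides)
  "#{SOLR_BASE_URL}/select?#{URI.encode_www_form(params)}"
end

# Uses the handler defaults for fl and facet.field.
puts search_url("history of maps")

# Overrides fl and asks for a single facet field on this request only.
puts search_url("history of maps",
                "fl" => "id,title_display,score",
                "facet.field" => ["format"])
```

An array value, as used for facet.field here, is encoded as a repeated parameter, which is the form Solr expects for multi-valued params.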

Blacklight’s “document” request handler: for a single record

When Blacklight displays a single record, it uses a Solr request handler
named "document". The "document" handler doesn't necessarily need to be
different from the "search" handler, but it can be used to control which
fields are available to display a single document. In the example below,
there is no faceting set (facets are not displayed with a single record)
and the "rows" param is set to 1 (since there will only be a single record).
Also, the field list ("fl" param) could include fields containing large
text values if they are desired for record display. It is acceptable to
include large amounts of data, because this handler should only be used
to query for one document:

  <requestHandler name="document" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="fl">*</str>
      <str name="rows">1</str>
      <str name="q">{!raw f=id v=$id}</str>
      <!-- use id=blah instead of q=id:blah -->
    </lst>
  </requestHandler>

A Solr query for a single record might look like this:
  http://(yourSolrBaseUrl)/solr/select?id=my_doc_id&qt=document
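
The same URL can be built programmatically. This Ruby sketch assumes a Solr instance at http://localhost:8983 (a placeholder for yourSolrBaseUrl) and uses a hypothetical document_url helper; form-encoding the id keeps identifiers containing spaces or slashes intact.

```ruby
require "uri"

# Hypothetical helper: builds the single-record URL shown above. The id is
# passed as its own param, which the "document" handler picks up via $id.
def document_url(base_url, id)
  query = URI.encode_www_form("id" => id, "qt" => "document")
  "#{base_url}/solr/select?#{query}"
end

puts document_url("http://localhost:8983", "my_doc_id")
# prints http://localhost:8983/solr/select?id=my_doc_id&qt=document
```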

Blacklight Solr Schema and Solrconfig File Templates

Blacklight provides schema.xml and solrconfig.xml files as starting points:

http://github.com/projectblacklight/blacklight-jetty/blob/master/solr/conf/schema.xml

http://github.com/projectblacklight/blacklight-jetty/blob/master/solr/conf/solrconfig.xml

SolrMARC: from MARC data to Solr documents

The SolrMARC project is designed to create a Solr index from raw MARC data.

It can be configured easily and used with the basic parsing and indexing supplied. It is also readily customized for a site’s unique requirements.

The project software and documentation are available at (code.google.com/p/solrmarc).

Blacklight comes with an embedded SolrMarc, with a default configuration that matches the default Blacklight setup, and provides rake tasks to index documents with SolrMarc according to your app’s environment. There is no need to install or configure SolrMarc manually. From your application’s home directory, simply run:

  rake solr:marc:index:info

to see the options. Run “rake solr:marc:index” to actually do the indexing. Like all rake tasks, this uses your ‘development’ environment by default; add “RAILS_ENV=production” to index instead to the Solr instance you’ve labelled production in your config/solr.yml file.

The SolrMarc config files are in your app’s config/SolrMarc directory; you can edit them there for local configuration.

If you’d like to use a different or more recent version of SolrMarc.jar, you can put it in your app at ./solr_marc/SolrMarc.jar, and the built-in rake tasks will use your local SolrMarc.jar instead of the one bundled with Blacklight.