  Lucene Search 2.0: extension for MediaWiki
  ==========================================

Requirements:

 - Java 5.0
 - Lucene 2.0dev (modified Lucene 2.0)
 - MediaWiki 1.9 with the MWSearch extension
 
 Optionally:
 - Rsync (for distributed architecture)
 - Apache XMLRPC 3.0 (for XMLRPC interface)
 - Apache Ant 1.6 (for building from source, etc.)

Setup:

 - Edit lsearch-global.conf and make it available at some URL
 - At each host: 
 	* properly set up the hostname (otherwise the Java VM gets confused)
 	* create a local directory for the indexes and set its permissions
 	* edit lsearch.conf (see the sketch after this list):
 	 	+ MWConfig.global to point to the URL of lsearch-global.conf
 	 	+ MWConfig.lib to point to the local library path (i.e. with unicode-data etc.)
 		+ Localization.url to point to the URL of the latest message files from MediaWiki
 		+ Indexes.path - base path where you want the daemon to store the indexes
 		+ Logging.logconfig - local path to the log4j configuration file, e.g.
 		  /etc/lsearch.log4j (the lsearch package ships a sample log4j file you can use)
   	* set up the rsync daemon (see rsyncd.conf-example)
  	* set up the log4j logging subsystem (see lsearch.log4j-example)
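
 For example, lsearch.conf might look like this (a minimal sketch with
 placeholder URLs and paths; it is assumed here to use simple key=value
 lines, so consult the sample lsearch.conf shipped with the package for
 the authoritative syntax):

 	MWConfig.global=http://example.com/lsearch-global.conf
 	MWConfig.lib=/usr/local/search/ls2/lib
 	Localization.url=http://example.com/mediawiki/messages
 	Indexes.path=/usr/local/search/indexes
 	Logging.logconfig=/etc/lsearch.log4j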
 	
Running:

 - start the rsync daemon (if using the distributed architecture)
 - run "./lsearchd" or "ant run" (set this host's name in the file "hostname" first)
 
Features: 

 - distributed architecture: indexes can be either a single file (single), 
   split between the main namespace and the rest (mainsplit), or split into 
   some number of subindexes (split). The indexer makes periodic snapshots 
   of the index, and searchers check for these snapshots to update their 
   local copies.
   
 - incremental updater using the OAI interface. It periodically checks wikis 
   for new updates and enqueues them on the indexer.
   
 - wiki syntax parser: articles are parsed for basic wiki syntax and are 
   stripped of accents. Localization for wiki syntax can be read from 
   MediaWiki message files. Categories are extracted and put into a 
   separate field. Additionally, template names (but not template 
   parameters), table parameters, and image parameters (except captions) 
   are not indexed.
   
 - query parser: faster search-query parsing that enables namespace prefixes, 
   e.g. 'help:editing pages' (prefixes are localized within MediaWiki), and 
   category searches, e.g. 'smoked category:cheeses'. Queries are rewritten 
   so that stemmed forms of the terms are present but add less to the 
   document score (see the sketch after this list).
   
 - (hopefully) robust architecture, with threads pinging hosts that are down, 
   and search daemons trying alternatives if the host holding a part of the 
   index is down.
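
 To illustrate the idea behind that query rewriting (this is only a sketch 
 in the Lucene 2.x API, not the daemon's actual code; the field name 
 "contents", the terms, and the boost value are made up):

 	import org.apache.lucene.index.Term;
 	import org.apache.lucene.search.BooleanClause;
 	import org.apache.lucene.search.BooleanQuery;
 	import org.apache.lucene.search.TermQuery;

 	public class StemBoostSketch {
 	    public static void main(String[] args) {
 	        // Match either form of the word in the "contents" field.
 	        BooleanQuery query = new BooleanQuery();

 	        // The exact term the user typed scores at full weight.
 	        query.add(new TermQuery(new Term("contents", "pages")),
 	                  BooleanClause.Occur.SHOULD);

 	        // The stemmed form still matches, but adds less to the score.
 	        TermQuery stemmed = new TermQuery(new Term("contents", "page"));
 	        stemmed.setBoost(0.5f);  // illustrative boost value
 	        query.add(stemmed, BooleanClause.Occur.SHOULD);

 	        // Prints: contents:pages contents:page^0.5
 	        System.out.println(query);
 	    }
 	}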