DDTH's utilities and extensions for Apache Lucene.
Project home: https://github.com/DDTH/ddth-lucext
ddth-lucext requires Java 11+ since v1.0.0; for Java 8, use v0.x.y.
Latest release version: 1.0.0. See RELEASE-NOTES.md.
Maven dependency: if only a subset of ddth-lucext functionality is used, choose the corresponding dependency artifact(s) to reduce the number of unused jar files.
ddth-lucext-core: (not much functionality at the moment)
<dependency>
<groupId>com.github.ddth</groupId>
<artifactId>ddth-lucext-core</artifactId>
<version>1.0.0</version>
</dependency>
ddth-lucext-cassandra: includes ddth-lucext-core and the Apache Cassandra dependencies.
<dependency>
<groupId>com.github.ddth</groupId>
<artifactId>ddth-lucext-cassandra</artifactId>
<version>1.0.0</version>
<type>pom</type>
</dependency>
ddth-lucext-redis: includes ddth-lucext-core and the Redis dependencies.
<dependency>
<groupId>com.github.ddth</groupId>
<artifactId>ddth-lucext-redis</artifactId>
<version>1.0.0</version>
<type>pom</type>
</dependency>
IndexManager helps manage the index objects (IndexWriter, IndexSearcher and DirectoryReader) associated with a Directory.
import com.github.ddth.lucext.directory.IndexManager;
// first open a Lucene directory
org.apache.lucene.store.Directory directory = FSDirectory.open(java.nio.file.Paths.get("./temp")); // FSDirectory.open takes a Path, not a String
// then create an IndexManager instance
IndexManager indexManager = new IndexManager(directory);
// customize the IndexManager
indexManager.setIndexWriterConfig(iwc)
.setScheduledExecutorService(ses) //supply a custom ScheduledExecutorService for background jobs
.setBackgroundRefreshIndexSearcherPeriodMs(10000) //automatically refresh DirectoryReader and IndexSearcher every 10 seconds
.setBackgroundCommitIndexPeriodMs(1000) //automatically call IndexWriter.commit() every second
.setNrtIndexSearcher(true) //enable near-real-time IndexSearcher
;
// remember to initialize the IndexManager
indexManager.init();
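The two period settings above describe jobs that run in the background at a fixed interval. As a rough illustration of that idea only (this is not IndexManager's actual implementation, and `BackgroundJobSketch` and `runPeriodically` are hypothetical names), a periodic job on a `ScheduledExecutorService` can be sketched with the JDK alone:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BackgroundJobSketch {
    /**
     * Runs {@code job} every {@code periodMs} milliseconds until it has run
     * {@code times} times (or 10 seconds have passed), then stops the scheduler.
     * Returns the number of times the job actually ran.
     */
    public static int runPeriodically(Runnable job, long periodMs, int times)
            throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);
        ScheduledFuture<?> handle = ses.scheduleWithFixedDelay(() -> {
            // Ignore extra ticks that fire after the requested number of runs.
            if (done.getCount() > 0) {
                job.run();
                runs.incrementAndGet();
                done.countDown();
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
        try {
            done.await(10, TimeUnit.SECONDS);
        } finally {
            handle.cancel(false); // stop scheduling further runs
            ses.shutdownNow();    // release the worker thread
        }
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a periodic "commit": here it only counts invocations.
        AtomicInteger commits = new AtomicInteger();
        int runs = runPeriodically(commits::incrementAndGet, 20, 3);
        System.out.println("background job ran " + runs + " times");
    }
}
```

In a real application the job body would be the work to repeat, and the executor would be shut down when the owning component is closed.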
From this point, the application obtains IndexWriter, IndexSearcher and DirectoryReader from the IndexManager and works with them as usual.
IndexWriter indexWriter = indexManager.getIndexWriter();
indexWriter.addDocument(...);
indexWriter.updateDocument(...);
indexWriter.deleteDocuments(...);
indexWriter.commit();
IndexSearcher indexSearcher = indexManager.getIndexSearcher();
indexSearcher.search(...);
Finally, do not forget to close the IndexManager when done:
indexManager.destroy(); // or indexManager.close();
Notes:
- Do not close the obtained IndexWriter or DirectoryReader! IndexManager.close() will take care of closing those instances.
- The application is free to call IndexWriter.commit(). In most cases, however, it is sufficient to let IndexManager do that in the background: IndexManager.setBackgroundCommitIndexPeriodMs(1000).
- In near-real-time mode (which is turned on by default), IndexManager.getDirectoryReader() and IndexManager.getIndexSearcher() always return the most up-to-date instances. If near-real-time mode is too costly for the application (a rare case), the application can turn it off (IndexManager.setNrtIndexSearcher(false)) and enable background refresh of IndexSearcher (and DirectoryReader) via IndexManager.setBackgroundRefreshIndexSearcherPeriodMs(...).
- Warning: if both near-real-time mode and background IndexSearcher refresh are turned off, index changes (documents added/deleted/updated) that occur after IndexManager.init() is called will not be read.
- After IndexManager.init() is invoked:
  - setIndexWriterConfig(IndexWriterConfig) will NOT take effect and a warning message will be logged.
  - setScheduledExecutorService(ScheduledExecutorService) will NOT take effect and a warning message will be logged.
  - setBackgroundRefreshIndexSearcherPeriodMs(long) will take effect on-the-fly.
  - setBackgroundCommitIndexPeriodMs(long) will take effect on-the-fly.
  - setNrtIndexSearcher(boolean) will take effect on-the-fly.
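The warning about unread changes follows from the fact that a Lucene reader is a point-in-time view of the index. The toy model below illustrates that behaviour with plain JDK collections; `StaleReaderSketch` and `ToyIndex` are hypothetical names and do not use Lucene or IndexManager at all:

```java
import java.util.ArrayList;
import java.util.List;

public class StaleReaderSketch {
    /** Toy stand-in for an index: a growing list of documents. */
    public static final class ToyIndex {
        private final List<String> docs = new ArrayList<>();

        public synchronized void add(String doc) {
            docs.add(doc);
        }

        /** A "reader" is an immutable copy taken at the moment it is opened. */
        public synchronized List<String> openReader() {
            return List.copyOf(docs);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc-1");
        List<String> reader = index.openReader(); // opened once, never refreshed
        index.add("doc-2");                       // change made afterwards

        System.out.println(reader.size());             // prints 1: the old reader misses doc-2
        System.out.println(index.openReader().size()); // prints 2: a refreshed reader sees it
    }
}
```

Near-real-time mode and background refresh both play the role of the second `openReader()` call: something has to obtain a fresh view before new documents become visible.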
RedisDirectory stores Lucene's data in Redis.
import com.github.ddth.commons.redis.*;
import com.github.ddth.lucext.directory.redis.*;
import redis.clients.jedis.*;
// 1. create a JedisConnector instance
JedisConnector jc = new JedisConnector();
jc.setRedisHostsAndPorts("localhost:6379").setRedisPassword("secret").init();
// 2. create RedisDirectory
Directory DIR = new RedisDirectory(jc).init();
// 3. use the directory normally with Lucene
IndexReader ir = DirectoryReader.open(DIR);
IndexSearcher is = new IndexSearcher(ir);
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser(null, analyzer);
Query q = parser.parse("...");
TopDocs result = is.search(q, 10);
System.out.println("Hits:" + result.totalHits);
for (ScoreDoc sDoc : result.scoreDocs) {
Document doc = is.doc(sDoc.doc);
System.out.println(doc);
}
ir.close();
// 4. close the directory when done
DIR.close();
(See more about JedisConnector here)
CassandraDirectory stores Lucene's data in Cassandra.
import com.github.ddth.cql.*;
import com.github.ddth.lucext.directory.cassandra.*;
import com.datastax.oss.driver.api.core.config.*;
// 1. create a SessionManager instance
ProgrammaticDriverConfigLoaderBuilder dclBuilder = DriverConfigLoader.programmaticBuilder();
dclBuilder.withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "datacenter1")
.withString(DefaultDriverOption.AUTH_PROVIDER_USER_NAME, "cassandra")
.withString(DefaultDriverOption.AUTH_PROVIDER_PASSWORD, "cassandra");
SessionManager sm = new SessionManager();
sm.setConfigLoader(dclBuilder.build());
sm.setDefaultHostsAndPorts("localhost");
sm.init();
// 2. create CassandraDirectory
Directory DIR = new CassandraDirectory(sm).setKeyspace("mykeyspace").init();
// 3. use the directory normally with Lucene
IndexReader ir = DirectoryReader.open(DIR);
IndexSearcher is = new IndexSearcher(ir);
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser(null, analyzer);
Query q = parser.parse("...");
TopDocs result = is.search(q, 10);
System.out.println("Hits:" + result.totalHits);
for (ScoreDoc sDoc : result.scoreDocs) {
Document doc = is.doc(sDoc.doc);
System.out.println(doc);
}
ir.close();
// 4. close the directory when done
DIR.close();
(See more about SessionManager here)
CassandraDirectory can cache data to boost performance. The following example caches data in Redis.
import com.github.ddth.cacheadapter.*;
import com.github.ddth.cacheadapter.cacheimpl.redis.*;
import com.github.ddth.commons.redis.*;
import com.github.ddth.cql.*;
import com.github.ddth.lucext.directory.cassandra.*;
import com.datastax.oss.driver.api.core.config.*;
// create a ICacheFactory
JedisConnector jc = new JedisConnector();
jc.setRedisHostsAndPorts("localhost:6379").setRedisPassword("secret").init();
ICacheFactory cf = new RedisCacheFactory().setJedisConnector(jc).init();
// CassandraDirectory with caching enabled
ProgrammaticDriverConfigLoaderBuilder dclBuilder = DriverConfigLoader.programmaticBuilder();
dclBuilder.withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "datacenter1")
.withString(DefaultDriverOption.AUTH_PROVIDER_USER_NAME, "cassandra")
.withString(DefaultDriverOption.AUTH_PROVIDER_PASSWORD, "cassandra");
SessionManager sm = new SessionManager();
sm.setConfigLoader(dclBuilder.build());
sm.setDefaultHostsAndPorts("localhost");
sm.init();
Directory DIR = new CassandraDirectory(sm)
.setKeyspace("mykeyspace")
.setCacheFactory(cf).setCacheName("cachename")
.init();
(See more about ICacheManager here)
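To show why a cache in front of a remote store can help, here is a generic read-through cache sketch using only the JDK. It is an assumption-laden illustration, not how CassandraDirectory or ddth-cache-adapter implement caching; `ReadThroughCacheSketch` and its methods are hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ReadThroughCacheSketch {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> backend;
    private final AtomicInteger backendReads = new AtomicInteger();

    /** @param backend loads a block of data by key, e.g. from a remote store */
    public ReadThroughCacheSketch(Function<String, byte[]> backend) {
        this.backend = backend;
    }

    /** Returns the cached value, loading it from the backend on a cache miss. */
    public byte[] read(String key) {
        return cache.computeIfAbsent(key, k -> {
            backendReads.incrementAndGet();
            return backend.apply(k);
        });
    }

    /** Drops a cached entry, e.g. after the underlying data was rewritten. */
    public void invalidate(String key) {
        cache.remove(key);
    }

    public int backendReads() {
        return backendReads.get();
    }

    public static void main(String[] args) {
        ReadThroughCacheSketch store =
                new ReadThroughCacheSketch(key -> key.getBytes());
        store.read("segments_1");
        store.read("segments_1"); // served from the cache
        System.out.println("backend reads: " + store.backendReads()); // prints "backend reads: 1"
    }
}
```

The trade-off is the usual one: repeated reads avoid a round trip to the store, while entries must be invalidated when the underlying data changes.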
See more examples here.
See LICENSE.txt for details. Copyright (c) 2018-2019 Thanh Ba Nguyen.
Third party libraries are distributed under their own licenses.