ddth-lucext

DDTH's utilities and extensions for Apache Lucene.

Project home: https://github.com/DDTH/ddth-lucext

ddth-lucext requires Java 11+ since v1.0.0. For Java 8, use v0.x.y.

Installation

Latest release version: 1.0.0. See RELEASE-NOTES.md.

Maven dependency: if only a subset of ddth-lucext's functionality is used, choose the corresponding dependency artifact(s) to avoid pulling in unused jar files.

ddth-lucext-core: (not much functionality at the moment)

<dependency>
    <groupId>com.github.ddth</groupId>
    <artifactId>ddth-lucext-core</artifactId>
    <version>1.0.0</version>
</dependency>

ddth-lucext-cassandra: includes ddth-lucext-core and the Apache Cassandra dependencies.

<dependency>
    <groupId>com.github.ddth</groupId>
    <artifactId>ddth-lucext-cassandra</artifactId>
    <version>1.0.0</version>
    <type>pom</type>
</dependency>

ddth-lucext-redis: includes ddth-lucext-core and the Redis dependencies.

<dependency>
    <groupId>com.github.ddth</groupId>
    <artifactId>ddth-lucext-redis</artifactId>
    <version>1.0.0</version>
    <type>pom</type>
</dependency>

Usage

IndexManager

Helps manage the index objects (IndexWriter, IndexSearcher and DirectoryReader) associated with a Directory.

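import com.github.ddth.lucext.directory.IndexManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

// first open a Lucene directory (FSDirectory.open takes a Path, not a String)
Directory directory = FSDirectory.open(Paths.get("./temp"));
// then create an IndexManager instance
IndexManager indexManager = new IndexManager(directory);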

// customize the IndexManager
indexManager.setIndexWriterConfig(iwc)
    .setScheduledExecutorService(ses)                    //supply a custom ScheduledExecutorService for background jobs
    .setBackgroundRefreshIndexSearcherPeriodMs(10000)    //automatically refresh DirectoryReader and IndexSearcher per 10 seconds
    .setBackgroundCommitIndexPeriodMs(1000)              //automatically call IndexWriter.commit() per 1 second
    .setNrtIndexSearcher(true)                           //enable near-real-time IndexSearcher
    ;

// remember to initialize the IndexManager
indexManager.init();

From this point on, the application can obtain the IndexWriter, IndexSearcher and DirectoryReader and work with them as usual.

IndexWriter indexWriter = indexManager.getIndexWriter();
indexWriter.addDocument(...);
indexWriter.updateDocument(...);
indexWriter.deleteDocuments(...);
indexWriter.commit();

IndexSearcher indexSearcher = indexManager.getIndexSearcher();
indexSearcher.search(...);

Finally, do not forget to close the IndexManager when done:

indexManager.destroy(); // or indexManager.close();
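
The configuration snippet earlier passes a variable named ses to setScheduledExecutorService(...) without showing where it comes from. The following is a minimal sketch of one way to build it using only the JDK. The class name BackgroundExecutors, its methods and the thread-name prefix are hypothetical and not part of ddth-lucext. Whether IndexManager shuts down an executor it did not create is not stated in this document, so the sketch assumes the application owns the executor and uses daemon threads so that an un-shut-down pool cannot keep the JVM alive.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of ddth-lucext): builds a scheduler that can
// be handed to IndexManager.setScheduledExecutorService(...).
final class BackgroundExecutors {
    private BackgroundExecutors() {
    }

    // Produces named daemon threads, so idle background workers never
    // prevent the JVM from exiting.
    static ThreadFactory daemonThreadFactory(String namePrefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, namePrefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // A small fixed-size scheduler for periodic jobs such as commit and refresh.
    static ScheduledExecutorService newDaemonScheduler(int threads, String namePrefix) {
        return Executors.newScheduledThreadPool(threads, daemonThreadFactory(namePrefix));
    }
}
```

With this helper, ses could be created as BackgroundExecutors.newDaemonScheduler(2, "lucext-bg") before configuring the IndexManager, and shut down with ses.shutdown() after the IndexManager has been closed.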

Notes:

  • Do not close the obtained IndexWriter or DirectoryReader! IndexManager.close() will take care of closing those instances.
  • The application is free to call IndexWriter.commit(). In most cases, however, it is simpler to let IndexManager commit in the background: IndexManager.setBackgroundCommitIndexPeriodMs(1000) should be sufficient.
  • In near-real-time mode (turned on by default), IndexManager.getDirectoryReader() and IndexManager.getIndexSearcher() always return the most up-to-date instances. If near-real-time mode is too costly for the application (a rare case), turn it off with IndexManager.setNrtIndexSearcher(false) and enable background refresh of the IndexSearcher (and DirectoryReader) via IndexManager.setBackgroundRefreshIndexSearcherPeriodMs(...).
  • Warning: if both near-real-time mode and background IndexSearcher refresh are turned off, index changes (documents added, deleted or updated) made after IndexManager.init() is called will not be visible to readers.
  • After IndexManager.init() is invoked:
    • setIndexWriterConfig(IndexWriterConfig) will NOT take effect and a warning message will be logged.
    • setScheduledExecutorService(ScheduledExecutorService) will NOT take effect and a warning message will be logged.
    • setBackgroundRefreshIndexSearcherPeriodMs(long) will take effect on-the-fly.
    • setBackgroundCommitIndexPeriodMs(long) will take effect on-the-fly.
    • setNrtIndexSearcher(boolean) will take effect on-the-fly.
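
The warning about stale reads follows from the fact that a Lucene reader is a point-in-time view of the index: it only sees changes made before it was opened or last refreshed. The toy model below illustrates that behaviour with plain Java collections. It is only an illustration under that assumption; the class SnapshotModel and its methods are hypothetical and do not reflect how ddth-lucext or Lucene is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: mimics the point-in-time behaviour of an index reader.
final class SnapshotModel {
    // What the writer has written so far.
    private final List<String> index = new ArrayList<>();
    // What the currently open searcher can see.
    private List<String> snapshot = Collections.emptyList();

    // Writer side: the change lands in the index, not in the open snapshot.
    synchronized void addDocument(String doc) {
        index.add(doc);
    }

    // Reader side: swap in a fresh point-in-time copy. In the real library
    // this role is played by near-real-time mode or the background refresh job.
    synchronized void refresh() {
        snapshot = Collections.unmodifiableList(new ArrayList<>(index));
    }

    // Searcher side: counts only documents present at the last refresh.
    synchronized int visibleCount() {
        return snapshot.size();
    }

    public static void main(String[] args) {
        SnapshotModel model = new SnapshotModel();
        model.refresh();                           // snapshot taken at "init" time
        model.addDocument("doc-1");
        System.out.println(model.visibleCount()); // 0: no refresh since the write
        model.refresh();
        System.out.println(model.visibleCount()); // 1: visible after the refresh
    }
}
```

If refresh() is never called again after the first snapshot, visibleCount() stays at its initial value no matter how many documents are added, which is the situation the warning describes.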

RedisDirectory

Store Lucene's data in Redis.

import com.github.ddth.commons.redis.*;
import com.github.ddth.lucext.directory.redis.*;
import redis.clients.jedis.*;

// 1. create a JedisConnector instance
JedisConnector jc = new JedisConnector();
jc.setRedisHostsAndPorts("localhost:6379").setRedisPassword("secret").init();

// 2. create RedisDirectory
Directory DIR = new RedisDirectory(jc).init();

// 3. use the directory normally with Lucene
IndexReader ir = DirectoryReader.open(DIR);
IndexSearcher is = new IndexSearcher(ir);
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser(null, analyzer);
Query q = parser.parse("...");
TopDocs result = is.search(q, 10);
System.out.println("Hits:" + result.totalHits);
for (ScoreDoc sDoc : result.scoreDocs) {
    Document doc = is.doc(sDoc.doc);
    System.out.println(doc);
}
ir.close();

// 4. close the directory when done
DIR.close();

(See more about JedisConnector here)

CassandraDirectory

Store Lucene's data in Cassandra.

import com.github.ddth.cql.*;
import com.github.ddth.lucext.directory.cassandra.*;
import com.datastax.oss.driver.api.core.config.*;

// 1. create a SessionManager instance
ProgrammaticDriverConfigLoaderBuilder dclBuilder = DriverConfigLoader.programmaticBuilder();
dclBuilder.withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "datacenter1")
    .withString(DefaultDriverOption.AUTH_PROVIDER_USER_NAME, "cassandra")
    .withString(DefaultDriverOption.AUTH_PROVIDER_PASSWORD, "cassandra");
SessionManager sm = new SessionManager();
sm.setConfigLoader(dclBuilder.build());
sm.setDefaultHostsAndPorts("localhost");
sm.init();

// 2. create CassandraDirectory
Directory DIR = new CassandraDirectory(sm).setKeyspace("mykeyspace").init();

// 3. use the directory normally with Lucene
IndexReader ir = DirectoryReader.open(DIR);
IndexSearcher is = new IndexSearcher(ir);
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser(null, analyzer);
Query q = parser.parse("...");
TopDocs result = is.search(q, 10);
System.out.println("Hits:" + result.totalHits);
for (ScoreDoc sDoc : result.scoreDocs) {
    Document doc = is.doc(sDoc.doc);
    System.out.println(doc);
}
ir.close();

// 4. close the directory when done
DIR.close();

(See more about SessionManager here)

CassandraDirectory can cache data to boost performance. The following example caches data in Redis.

import com.github.ddth.cacheadapter.*;
import com.github.ddth.cacheadapter.cacheimpl.redis.*;
import com.github.ddth.commons.redis.*;
import com.github.ddth.cql.*;
import com.github.ddth.lucext.directory.cassandra.*;
import com.datastax.oss.driver.api.core.config.*;

// create an ICacheFactory
JedisConnector jc = new JedisConnector();
jc.setRedisHostsAndPorts("localhost:6379").setRedisPassword("secret").init();
ICacheFactory cf = new RedisCacheFactory().setJedisConnector(jc).init();

// CassandraDirectory with caching enabled
ProgrammaticDriverConfigLoaderBuilder dclBuilder = DriverConfigLoader.programmaticBuilder();
dclBuilder.withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "datacenter1")
    .withString(DefaultDriverOption.AUTH_PROVIDER_USER_NAME, "cassandra")
    .withString(DefaultDriverOption.AUTH_PROVIDER_PASSWORD, "cassandra");
SessionManager sm = new SessionManager();
sm.setConfigLoader(dclBuilder.build());
sm.setDefaultHostsAndPorts("localhost");
sm.init();
Directory DIR = new CassandraDirectory(sm)
     .setKeyspace("mykeyspace")
     .setCacheFactory(cf).setCacheName("cachename")
     .init();

(See more about ICacheManager here)
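
For readers unfamiliar with why a cache in front of a remote store helps, the sketch below shows the general read-through pattern: look in a local cache first and only call the slow backend on a miss. It is a generic illustration, not the caching logic of CassandraDirectory or ddth-cacheadapter; the class ReadThroughCache, its capacity parameter and its loader function are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a read-through LRU cache in front of a slow lookup.
final class ReadThroughCache<K, V> {
    private final Function<K, V> loader; // stands in for the remote read
    private final Map<K, V> cache;
    private int loads;                   // number of times the loader was called

    ReadThroughCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // An access-ordered LinkedHashMap evicts the least recently used entry
        // once the map grows beyond the requested capacity.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value, loading and caching it on a miss.
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

Repeated reads of the same key hit the loader only once while the entry stays in the cache, which is the effect a cache is meant to have on repeated reads of the same index data.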

Examples

See more examples here.

License

See LICENSE.txt for details. Copyright (c) 2018-2019 Thanh Ba Nguyen.

Third party libraries are distributed under their own licenses.