Server documentation and examples.
coady committed Dec 29, 2017
1 parent 4f2ac31 commit de0c885
Showing 5 changed files with 45 additions and 21 deletions.
50 changes: 31 additions & 19 deletions README.rst
@@ -1,5 +1,3 @@
About Lupyne
==================
.. image:: https://img.shields.io/pypi/v/lupyne.svg
:target: https://pypi.org/project/lupyne/
.. image:: https://img.shields.io/pypi/pyversions/lupyne.svg
@@ -9,28 +7,42 @@ About Lupyne
.. image:: https://api.shippable.com/projects/56059e3e1895ca4474182ec3/coverageBadge?branch=master
:target: https://app.shippable.com/github/coady/lupyne

The core engine is a high level interface to `PyLucene`_, which is a Python extension for accessing the popular Java Lucene search engine.
Lucene has a reputation for being a relatively low-level toolkit, and the goal of PyLucene is to wrap it through automatic code generation.
So although PyLucene transforms Java idioms to Python idioms where possible, the resulting interface is far from Pythonic.
Lupyne is a search engine based on `PyLucene`_, the Python extension for accessing Java Lucene.
Lucene is a relatively low-level toolkit, and PyLucene wraps it through automatic code generation.
So although Java idioms are translated to Python idioms where possible, the resulting interface is far from Pythonic.
See ``./examples`` for comparisons with the Lucene API.

A RESTful JSON search server, based on `CherryPy`_.
Many python applications which require better search capabilities are migrating from using conventional client-server databases,
whereas Lucene is an embedded search library. Solr and Elasticsearch are popular options for remote searching and advanced features,
but then any customization beyond the REST API is difficult and coupled to Java.
Using a python web framework instead can provide the best of both worlds, e.g., batch indexing offline and remote searching live.
Lupyne also provides a RESTful JSON search server, based on `CherryPy`_.
Note Solr and Elasticsearch are popular options for Lucene-based search, if no further (Python) customization is needed.
So while the server is suitable for production usage, its primary motivation is to be an extensible example.

Not having to initially choose between an embedded library and a server not only provides greater flexibility,
it can provide better performance, e.g., batch indexing offline and remote searching live.
Additionally only lightweight wrappers with extended behavior are used wherever possible,
so falling back to using PyLucene directly is always an option, but should never be necessary for performance.

Usage
==================
PyLucene requires initializing the VM.

.. code-block:: python
import lucene
lucene.initVM()
Indexes are accessed through an `IndexSearcher` (read-only), `IndexWriter`, or the combined `Indexer`.

.. code-block:: python
from lupyne import engine
A simple client to make interacting with the server as convenient as an RPC interface.
It handles all of the HTTP interactions, with support for compression, json, and connection reuse.
searcher = engine.IndexSearcher('index/path')
hits = searcher.search('text:query')
Advanced search features:
Run the server. ::

* Automatic updating and syncing to support replication.
* Optimized faceted and grouped search.
* Optimized prefix and range queries.
* Geospatial support.
* Spellchecking.
* Near real-time indexing.
$ python -m lupyne.server

Read the `documentation`_.

1 change: 0 additions & 1 deletion docs/index.rst
@@ -8,7 +8,6 @@ Lupyne's documentation
Lupyne_ is:
* a high-level Pythonic search `engine <engine.html>`_ library, built on PyLucene_
* a RESTful_ JSON_ search `server <server.html>`_, built on CherryPy_
* a simple Python `client <client.html>`_ for interacting with the server

Quickstart
^^^^^^^^^^
10 changes: 10 additions & 0 deletions docs/server.rst
@@ -4,6 +4,15 @@ server
:cwd: ..
.. automodule:: lupyne.server
.. note:: Lucene doc ids are ephemeral; only use doc ids across requests for the same index version.
.. warning:: Autosyncing is not recommended for production.

Lucene index files are incremental, so synchronizing files and refreshing searchers is a viable replication strategy.
The `autoupdate` and `autosync` features demonstrate this, but are not meant to recommend HTTP for file syncing.
Autoupdating is considered production-ready; autosyncing is not.
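
The replication idea above can be sketched without Lucene at all. This is a minimal illustration, not Lupyne's implementation: ``sync_new_files`` is a hypothetical helper, and it assumes index files are write-once, so only files missing from the destination need copying. A real replica would also need to remove obsolete files and refresh its searcher afterwards.

```python
import shutil
from pathlib import Path


def sync_new_files(source: Path, destination: Path) -> list:
    """Copy files present in source but missing from destination.

    Assumption: index files are write-once, so a file that already
    exists in the destination never needs to be copied again.
    """
    destination.mkdir(parents=True, exist_ok=True)
    existing = {path.name for path in destination.iterdir()}
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.name not in existing:
            shutil.copy2(path, destination / path.name)
            copied.append(path.name)
    return copied
```

Calling the helper repeatedly transfers only what was added since the previous call, which is what makes file-level syncing cheap for incremental indexes.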

CherryPy was chosen because not only is it well suited to exposing APIs, but it includes a production multithreaded server.
Lucene caches heavily, and PyLucene is not bound by the `GIL`_ when in the Java VM.
So threads are a natural choice for a worker pool, even if a different concurrency model is used for HTTP.
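
A thread-based worker pool of the kind described above might look like the following sketch. It uses only the standard library: ``search`` and ``attach_thread`` are hypothetical stand-ins, not part of Lupyne or CherryPy. With PyLucene, the initializer is where each worker would attach itself to the Java VM.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

attached = set()  # names of worker threads that ran the initializer


def attach_thread():
    """Stand-in for per-thread VM attachment.

    With PyLucene, each worker would call
    lucene.getVMEnv().attachCurrentThread() here.
    """
    attached.add(threading.current_thread().name)


def search(query):
    """Hypothetical search; a real one would query a shared searcher."""
    return query.upper()


def run_queries(queries, workers=4):
    """Run queries on a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers, initializer=attach_thread) as pool:
        return list(pool.map(search, queries))
```

Because the workers share one process, they also share whatever the searcher has cached, which is the benefit the paragraph above points to.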

tools
-----------
@@ -44,3 +53,4 @@ start
.. autofunction:: start

.. _CherryPy tools: http://docs.cherrypy.org/en/latest/extend.html#tools
.. _GIL: https://docs.python.org/3/glossary.html#term-gil
2 changes: 1 addition & 1 deletion examples/indexers.py
@@ -1,5 +1,5 @@
"""
Basic indexing and searching example adapted from http://lucene.apache.org/core/4_10_1/core/index.html
Basic indexing and searching example adapted from http://lucene.apache.org/core/7_2_0/core/index.html
"""

import lucene
3 changes: 3 additions & 0 deletions lupyne/server.py
@@ -34,6 +34,7 @@
WebSearchers and WebIndexers can of course also be subclassed for custom interfaces.
CherryPy and Lucene VM integration issues:
* Monitors (such as autoreload) are not compatible with the VM unless threads are attached.
* WorkerThreads must also be attached to the VM.
* VM initialization must occur after daemonizing.
@@ -49,6 +50,7 @@
import os
import re
import time
import warnings
import lucene
import cherrypy
import clients
@@ -858,6 +860,7 @@ def start(root=None, path='', config=None, pidfile='', daemonize=False, autorelo
kwargs['urls'] = args.autosync.split(',')
if not (args.autoupdate and len(args.directories) == 1):
parser.error('autosync requires autoupdate and a single directory')
warnings.warn('autosync is not recommended for production usage')
if args.config and not os.path.exists(args.config):
args.config = {'global': json.loads(args.config)}
cls = WebSearcher if read_only else WebIndexer
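
The option handling in this hunk can be read in isolation with the sketch below. The function names ``load_config`` and ``check_autosync`` are hypothetical, and ``ValueError`` is swapped in for ``parser.error`` so the logic can run without an argument parser. The checks themselves mirror the lines shown above.

```python
import json
import os
import warnings


def load_config(value):
    """Return a file path unchanged, or parse inline JSON into a global config."""
    if value and not os.path.exists(value):
        # Not an existing file, so treat the value as a JSON object literal.
        return {'global': json.loads(value)}
    return value


def check_autosync(autosync, autoupdate, directories):
    """Validate the autosync options and return the list of urls."""
    if not (autoupdate and len(directories) == 1):
        raise ValueError('autosync requires autoupdate and a single directory')
    warnings.warn('autosync is not recommended for production usage')
    return autosync.split(',')
```

The warning is emitted only after validation succeeds, so an invalid combination fails fast without also warning.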