Server documentation and examples.

coady · Dec 29, 2017 · de0c885
1 parent 4f2ac31
commit de0c885

Showing 5 changed files with 45 additions and 21 deletions.
diff --git a/README.rst b/README.rst
@@ -1,5 +1,3 @@
-About Lupyne
-==================
 .. image:: https://img.shields.io/pypi/v/lupyne.svg
    :target: https://pypi.org/project/lupyne/
 .. image:: https://img.shields.io/pypi/pyversions/lupyne.svg
@@ -9,28 +7,42 @@ About Lupyne
 .. image:: https://api.shippable.com/projects/56059e3e1895ca4474182ec3/coverageBadge?branch=master
    :target: https://app.shippable.com/github/coady/lupyne
 
-The core engine is a high level interface to `PyLucene`_, which is a Python extension for accessing the popular Java Lucene search engine.
-Lucene has a reputation for being a relatively low-level toolkit, and the goal of PyLucene is to wrap it through automatic code generation.
-So although PyLucene transforms Java idioms to Python idioms where possible, the resulting interface is far from Pythonic.
+Lupyne is a search engine based on `PyLucene`_, the Python extension for accessing Java Lucene.
+Lucene is a relatively low-level toolkit, and PyLucene wraps it through automatic code generation.
+So although Java idioms are translated to Python idioms where possible, the resulting interface is far from Pythonic.
 See ``./examples`` for comparisons with the Lucene API.
 
-A RESTful JSON search server, based on `CherryPy`_.
-Many python applications which require better search capabilities are migrating from using conventional client-server databases,
-whereas Lucene is an embedded search library.  Solr and Elasticsearch are popular options for remote searching and advanced features,
-but then any customization beyond the REST API is difficult and coupled to Java.
-Using a python web framework instead can provide the best of both worlds, e.g., batch indexing offline and remote searching live.
+Lupyne also provides a RESTful JSON search server, based on `CherryPy`_.
+Note that Solr and Elasticsearch are popular options for Lucene-based search if no further (Python) customization is needed.
+So while the server is suitable for production usage, its primary motivation is to be an extensible example.
+
+Not having to initially choose between an embedded library and a server not only provides greater flexibility,
+but can also provide better performance, e.g., batch indexing offline and remote searching live.
+Additionally, only lightweight wrappers with extended behavior are used wherever possible,
+so falling back to using PyLucene directly is always an option, but should never be necessary for performance.
+
+Usage
+==================
+PyLucene requires initializing the VM.
+
+.. code-block:: python
+
+   import lucene
+
+   lucene.initVM()
+
+Indexes are accessed through an `IndexSearcher` (read-only), `IndexWriter`, or the combined `Indexer`.
+
+.. code-block:: python
+
+   from lupyne import engine
 
-A simple client to make interacting with the server as convenient as an RPC interface.
-It handles all of the HTTP interactions, with support for compression, json, and connection reuse.
+   searcher = engine.IndexSearcher('index/path')
+   hits = searcher.search('text:query')
 
-Advanced search features:
+Run the server. ::
 
-* Automatic updating and syncing to support replication.
-* Optimized faceted and grouped search.
-* Optimized prefix and range queries.
-* Geospatial support.
-* Spellchecking.
-* Near real-time indexing.
+   $ python -m lupyne.server
 
 Read the `documentation`_.
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -8,7 +8,6 @@ Lupyne's documentation
 Lupyne_ is:
  * a high-level Pythonic search `engine <engine.html>`_ library, built on PyLucene_
  * a RESTful_ JSON_ search `server <server.html>`_, built on CherryPy_
- * a simple Python `client <client.html>`_ for interacting with the server
 
 Quickstart
 ^^^^^^^^^^

diff --git a/docs/server.rst b/docs/server.rst
@@ -4,6 +4,15 @@ server
   :cwd: ..
 .. automodule:: lupyne.server
 .. note:: Lucene doc ids are ephemeral;  only use doc ids across requests for the same index version.
+.. warning:: Autosyncing is not recommended for production.
+
+Lucene index files are incremental, so synchronizing files and refreshing searchers is a viable replication strategy.
+The `autoupdate` and `autosync` features demonstrate this, but are not meant to recommend HTTP for file syncing.
+Autoupdating is considered production-ready; autosyncing is not.
+
+CherryPy was chosen because it is not only well suited to exposing APIs, but also includes a production multithreaded server.
+Lucene caches heavily, and PyLucene is not bound by the `GIL`_ when in the Java VM.
+So threads are a natural choice for a worker pool, even if a different concurrency model is used for HTTP.
 
 tools
 -----------
@@ -44,3 +53,4 @@ start
 .. autofunction:: start
 
 .. _CherryPy tools: http://docs.cherrypy.org/en/latest/extend.html#tools
+.. _GIL: https://docs.python.org/3/glossary.html#term-gil
diff --git a/examples/indexers.py b/examples/indexers.py
@@ -1,5 +1,5 @@
 """
-Basic indexing and searching example adapted from http://lucene.apache.org/core/4_10_1/core/index.html
+Basic indexing and searching example adapted from http://lucene.apache.org/core/7_2_0/core/index.html
 """
 
 import lucene

diff --git a/lupyne/server.py b/lupyne/server.py
@@ -34,6 +34,7 @@
 WebSearchers and WebIndexers can of course also be subclassed for custom interfaces.
 
 CherryPy and Lucene VM integration issues:
+
  * Monitors (such as autoreload) are not compatible with the VM unless threads are attached.
  * WorkerThreads must also be attached to the VM.
  * VM initialization must occur after daemonizing.
@@ -49,6 +50,7 @@
 import os
 import re
 import time
+import warnings
 import lucene
 import cherrypy
 import clients
@@ -858,6 +860,7 @@ def start(root=None, path='', config=None, pidfile='', daemonize=False, autorelo
         kwargs['urls'] = args.autosync.split(',')
         if not (args.autoupdate and len(args.directories) == 1):
             parser.error('autosync requires autoupdate and a single directory')
+        warnings.warn('autosync is not recommended for production usage')
     if args.config and not os.path.exists(args.config):
         args.config = {'global': json.loads(args.config)}
     cls = WebSearcher if read_only else WebIndexer
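
Note on the last hunk: the new ``warnings.warn`` call only fires after the existing autosync validation passes. A minimal standalone sketch of that flow follows, assuming a reduced parser. The option spellings (``--autoupdate``, ``--autosync``), the enclosing ``if args.autosync:`` test, and the ``check_autosync`` helper are assumptions for illustration, not the actual ``lupyne.server`` code.

```python
import argparse
import warnings

# Hypothetical, reduced parser: only the attributes the hunk reads from `args`.
# The option spellings are assumed; the diff shows only the attribute names.
parser = argparse.ArgumentParser()
parser.add_argument('directories', nargs='*')
parser.add_argument('--autoupdate', type=float)
parser.add_argument('--autosync')


def check_autosync(args):
    """Validate the autosync options as in the hunk and return the sync urls."""
    urls = []
    if args.autosync:  # assumed guard; the hunk starts inside this branch
        urls = args.autosync.split(',')
        if not (args.autoupdate and len(args.directories) == 1):
            parser.error('autosync requires autoupdate and a single directory')
        warnings.warn('autosync is not recommended for production usage')
    return urls


args = parser.parse_args(['index', '--autoupdate', '60', '--autosync', 'http://host1,http://host2'])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record the warning regardless of active filters
    urls = check_autosync(args)
```

Since no category is passed, the warning is a ``UserWarning``, so an operator who accepts the risk could silence it with the standard ``-W`` interpreter flag or a ``warnings.filterwarnings`` call rather than by patching the server.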