pysolr is a lightweight Python wrapper for Apache Solr. It provides an
interface that queries the server and returns results based on the query.
Features:
- Basic operations such as selecting, updating & deleting.
- Index optimization.
- "More Like This" support (if set up in Solr).
- Spelling correction (if set up in Solr).
- Timeout support.
- SolrCloud awareness.
Requirements:
- Python 2.7 - 3.6
- Requests 2.9.1+
- Optional - simplejson
- Optional - kazoo for SolrCloud mode
pysolr is on PyPI:
$ pip install pysolr
Or, if you want to install directly from the repository, run python setup.py install, or drop the pysolr.py file anywhere on your PYTHONPATH.
Basic usage looks like:
# If on Python 2.X
from __future__ import print_function
import pysolr
# Set up a Solr instance. The timeout is optional. To authenticate, also
# pass auth=<type of authentication>.
solr = pysolr.Solr('http://localhost:8983/solr/', timeout=10)
# How you'd index data.
solr.add([
    {
        "id": "doc_1",
        "title": "A test document",
    },
    {
        "id": "doc_2",
        "title": "The Banana: Tasty or Dangerous?",
        "_doc": [
            { "id": "child_doc_1", "title": "peel" },
            { "id": "child_doc_2", "title": "seed" },
        ]
    },
])
# Note that the add method has commit=True by default, so this is
# immediately committed to your index.
# You can index a parent/child document relationship by
# associating a list of child documents with the special key '_doc'. This
# is helpful for queries that join together conditions on children and parent
# documents.
# Later, searching is easy. In the simple case, just a plain Lucene-style
# query is fine.
results = solr.search('bananas')
# The ``Results`` object stores the total number of results found, the top
# ten most relevant results (by default) and any additional data such as
# facets, highlighting and spelling.
print("Saw {0} result(s).".format(len(results)))
# Just loop over it to access the results.
for result in results:
    print("The title is '{0}'.".format(result['title']))
# For a more advanced query, say involving highlighting, you can pass
# additional options to Solr.
results = solr.search('bananas', **{
    'hl': 'true',
    'hl.fragsize': 10,
})
# You can also perform More Like This searches, if your Solr is configured
# correctly.
similar = solr.more_like_this(q='id:doc_2', mltfl='text')
# Finally, you can delete either individual documents,
solr.delete(id='doc_1')
# also in batches...
solr.delete(id=['doc_1', 'doc_2'])
# ...or all documents.
solr.delete(q='*:*')
# For SolrCloud mode, initialize your Solr like this. To authenticate, also
# pass auth=<type of authentication> to SolrCloud.
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
solr = pysolr.SolrCloud(zookeeper, "collection1")
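The highlighting search in the usage example passes its options as `**{...}` rather than as ordinary keyword arguments. The reason is plain Python: Solr parameter names such as hl.fragsize contain a dot, so they are not valid keyword names, but they are valid dictionary keys. The sketch below shows this with `fake_search`, a hypothetical stand-in that is not part of pysolr and only echoes the parameters it receives.

```python
def fake_search(q, **kwargs):
    """Hypothetical stand-in for a search method; no Solr server is contacted."""
    params = {"q": q}
    params.update(kwargs)
    return params


# Dotted names are fine as dictionary keys, so they survive ** unpacking.
params = fake_search("bananas", **{"hl": "true", "hl.fragsize": 10})
print(params["hl.fragsize"])

# Writing fake_search("bananas", hl.fragsize=10) would be a syntax error,
# because "hl.fragsize" is not a valid keyword name.
```

Options whose names are plain identifiers can still be passed as normal keyword arguments.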
To use a multicore index, simply point the URL to the index core:
# Set up a Solr instance. The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)
# Set up a Solr instance with a custom request handler. The trailing slash is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', use_qt_param=False)
If use_qt_param is True, it is essential that the name of the handler is exactly what is configured in solrconfig.xml, including the leading slash if any (though with the qt parameter Solr does not require a leading slash). If use_qt_param is False (the default), the leading and trailing slashes can be omitted.
If search_handler is not specified, pysolr will default to /select.
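To make the two modes concrete, here is a minimal sketch of how a search URL could be put together in each case. `build_search_url` is a hypothetical helper written for this README, not pysolr's own code, and it assumes that with use_qt_param the request goes to the select handler with the configured name passed along as qt.

```python
from urllib.parse import urlencode


def build_search_url(base_url, handler="/select", use_qt_param=False, **params):
    """Return the URL a search request could be sent to (illustrative only)."""
    base = base_url.rstrip("/")
    if use_qt_param:
        # The handler name is passed through untouched as the qt parameter,
        # so it has to match the name in solrconfig.xml exactly.
        query = dict(params, qt=handler)
        return "{0}/select?{1}".format(base, urlencode(query))
    # Otherwise the handler becomes part of the path. Surrounding slashes are
    # stripped, so 'autocomplete' and '/autocomplete/' give the same URL.
    return "{0}/{1}?{2}".format(base, handler.strip("/"), urlencode(params))


core = "http://localhost:8983/solr/core_0/"
print(build_search_url(core, "/autocomplete", q="ban"))
print(build_search_url(core, "/autocomplete", use_qt_param=True, q="ban"))
```

In the path-based mode the slashes are only separators, which is why they can be omitted. In the qt mode the string is sent to Solr as given.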
The handlers for MoreLikeThis, Update, Terms etc. all default to the values set in the solrconfig.xml that Solr ships with: mlt, update, terms etc. The specific methods of pysolr's Solr class (like more_like_this, suggest_terms etc.) accept a handler kwarg to override that value. This includes the search method. Setting a handler in search explicitly overrides the search_handler setting (if any).
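The order of precedence described above can be sketched in a few lines. `DEFAULT_HANDLERS` and `resolve_handler` are hypothetical names used only for illustration; they are not part of pysolr, and the table lists only the defaults named in this README.

```python
# Hypothetical table of per-method defaults, using the handler names above.
DEFAULT_HANDLERS = {
    "search": "/select",
    "more_like_this": "mlt",
    "suggest_terms": "terms",
}


def resolve_handler(method, search_handler=None, handler=None):
    """Pick a handler: per-call value first, then instance setting, then default."""
    if handler is not None:
        return handler
    if method == "search" and search_handler is not None:
        return search_handler
    return DEFAULT_HANDLERS[method]


print(resolve_handler("search"))
print(resolve_handler("search", search_handler="/autocomplete"))
print(resolve_handler("search", search_handler="/autocomplete", handler="/spell"))
print(resolve_handler("more_like_this", handler="/mlt_custom"))
```

A per-call handler always wins, the instance-level search_handler applies only to searches, and everything else falls back to the shipped default.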
# Set up a Solr instance in a Kerberized environment
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)
solr = pysolr.Solr('http://localhost:8983/solr/', auth=kerberos_auth)
# Set up a SolrCloud instance in a Kerberized environment
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)
zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
solr = pysolr.SolrCloud(zookeeper, "collection", auth=kerberos_auth)
# Set up a Solr instance in an HTTPS environment
solr = pysolr.Solr('https://localhost:8983/solr/', verify='path/to/cert.pem')
# Set up a SolrCloud instance in an HTTPS environment
zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
solr = pysolr.SolrCloud(zookeeper, "collection", verify='path/to/cert.pem')
pysolr is licensed under the New BSD license.
The run-tests.py script will automatically perform the steps below and is recommended for testing by default unless you need more control.
Downloading, configuring and running Solr 4 looks like this:
./start-solr-test-server.sh
The test suite requires the unittest2 library:
Python 2:
python -m unittest2 tests
Python 3:
python3 -m unittest tests