Plone Search Integration: Addon to index content in Open-/ElasticSearch

collective.elastic.plone

OpenSearch or ElasticSearch Integration for Plone content.

It consists of these parts:

  • an indexer that passes content to a separately running collective.elastic.ingest service.
  • a catalog index that acts as a proxy to Open-/ElasticSearch and integrates with ZCatalog. For example, it can serve as a drop-in replacement for the SearchableText index.
  • custom plugins for plone.restapi that provide structural information to the ingestion service.
  • a REST API endpoint, @kitsearch, that accepts an Open-/ElasticSearch query and returns the results after a Plone permission check.

You need a working collective.elastic.ingest (version 2.x) service running. This implies a running Redis instance and a running instance of either OpenSearch or ElasticSearch.

Add collective.elastic.plone[redis,opensearch]>=2.0.0b11 to your requirements.txt (alternatively use a constraints.txt for version pinning).

The extra requirements depend on the queue server and index server in use; see below. Alternatively, add the package to the dependencies in your pyproject.toml (or, for legacy code, to setup.py or setup.cfg).

Provide an environment variable file (e.g. .env) in your backend directory and source it before Plone starts. It should contain:

export INDEX_SERVER=localhost:9200
export INDEX_USE_SSL=1
export INDEX_OPENSEARCH=1
export INDEX_LOGIN=admin
export INDEX_PASSWORD=admin
export INDEX_NAME=plone
export CELERY_BROKER=redis://localhost:6379/0

When using buildout instead, install collective.elastic.plone[redis,opensearch] by adding it to your buildout configuration. The extra requirements depend on the queue server and index server in use, and the environment variables may differ for your setup; see below for both.

[buildout]

# ...

eggs =
    # ...
    collective.elastic.plone[redis,opensearch]

environment-vars +=
    INDEX_SERVER=localhost:9200
    INDEX_USE_SSL=1
    INDEX_OPENSEARCH=1
    INDEX_LOGIN=admin
    INDEX_PASSWORD=admin
    INDEX_NAME=plone
    CELERY_BROKER=redis://localhost:6379/0

[versions]
collective.elastic.plone = 2.0.0

Then run bin/buildout.

Depending on the queue server and index server used, the extra requirements vary:

  • queue server: redis or rabbitmq.
  • index server: opensearch or elasticsearch.
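As a small illustration of how the two choices combine into one requirement string, here is a minimal sketch. The helper `requirement` and its `version_spec` parameter are hypothetical names used only for this example; only the extras names come from the list above.

```python
# Extras named in this README; the helper itself is illustrative only.
QUEUE_SERVERS = ("redis", "rabbitmq")
INDEX_SERVERS = ("opensearch", "elasticsearch")


def requirement(queue_server, index_server, version_spec=""):
    """Build the requirement string for the chosen queue and index server."""
    if queue_server not in QUEUE_SERVERS:
        raise ValueError(f"unknown queue server: {queue_server!r}")
    if index_server not in INDEX_SERVERS:
        raise ValueError(f"unknown index server: {index_server!r}")
    return f"collective.elastic.plone[{queue_server},{index_server}]{version_spec}"


print(requirement("rabbitmq", "elasticsearch"))
```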

After startup, install the addon in Plone via the Add-ons control panel. This replaces the SearchableText index with the proxy index and applies a minimal configuration. It is best to adapt the configuration to the project's needs.

To index all content in the catalog, append /@@update-index-server-index to the URL of your Plone site. This queues all content for indexing in Open-/ElasticSearch (but does not reindex it in the ZCatalog). Alternatively, reindexing the catalog (in the ZMI, under the Advanced tab) works too.
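The URL to call can be derived from the site URL as in the following sketch. The helper name `reindex_url` and the site URL `http://localhost:8080/Plone` are assumptions for illustration; only the view name comes from the text above.

```python
def reindex_url(site_url):
    """Return the URL of the view that queues all content for indexing."""
    # Strip a trailing slash so the result never contains "//@@".
    return site_url.rstrip("/") + "/@@update-index-server-index"


# Hypothetical local site URL, used only as an example.
print(reindex_url("http://localhost:8080/Plone/"))
```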

New or modified content is queued for indexing automatically.

The proxy index works out of the box in Volto.

However, in Volto a direct (and much faster) search is possible through the @kitsearch endpoint, which bypasses the catalog. The endpoint takes a native Open-/ElasticSearch query and returns the results after a Plone permission check.

The Volto add-on volto-searchkit-block provides a configurable block using this endpoint.

Remark: For security reasons, in collective.elastic.plone 2.0.0 the @kitsearch endpoint always overrides any "API URL" and "API index" settings with the configured values from the environment.
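To give a rough idea of how a client other than Volto might prepare a call to the endpoint, here is a hedged sketch that only builds the request and does not send it. Several things are assumptions not confirmed by this README: the POST method, the query being sent as the raw JSON body, the helper name `build_kitsearch_request`, and the example site URL. Check the endpoint's source or volto-searchkit-block for the payload the endpoint actually expects.

```python
import json
from urllib.request import Request


def build_kitsearch_request(site_url, query):
    """Prepare (but do not send) a request carrying a native query.

    Assumption: the endpoint accepts the query as a JSON body via POST.
    """
    url = site_url.rstrip("/") + "/@kitsearch"
    headers = {
        # plone.restapi services are selected via the JSON Accept header.
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    body = json.dumps(query).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


# A native Query DSL "match" query; site URL and field are illustrative.
request = build_kitsearch_request(
    "http://localhost:8080/Plone",
    {"query": {"match": {"title": "plone"}}},
)
print(request.get_method(), request.full_url)
```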

Global configuration is done via environment variables.

Each catalog proxy index has its own JSON configuration.

Environment variables are:

INDEX_SERVER

The address (host:port) of the ElasticSearch or OpenSearch server.

Default: localhost:9200

INDEX_NAME

The name of the index to use at the ElasticSearch or OpenSearch service.

Default: plone

INDEX_USE_SSL

Whether to use a secure (SSL/TLS) connection: 1 to enable, 0 to disable.

Default: 0

INDEX_OPENSEARCH

Whether to use OpenSearch (1) or ElasticSearch (0).

Default: 1

INDEX_LOGIN

Username for the ElasticSearch 8+ or OpenSearch 2 server. For the Plone addon, read access is sufficient.

Default: admin

INDEX_PASSWORD

Password of the above user.

Default: admin

CELERY_BROKER

The broker URL for Celery. See docs.celeryq.dev for details.

Default: redis://localhost:6379/0
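The variables and defaults listed above can be summarized in a short sketch. This is not the addon's own code: the names `DEFAULTS`, `load_settings` and `is_enabled` are hypothetical, and treating exactly the string "1" as "on" is an assumption based on the example values shown in this README.

```python
import os

# Documented variables and their documented defaults.
DEFAULTS = {
    "INDEX_SERVER": "localhost:9200",
    "INDEX_NAME": "plone",
    "INDEX_USE_SSL": "0",
    "INDEX_OPENSEARCH": "1",
    "INDEX_LOGIN": "admin",
    "INDEX_PASSWORD": "admin",
    "CELERY_BROKER": "redis://localhost:6379/0",
}


def load_settings(environ=None):
    """Return every documented setting, falling back to its default."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


def is_enabled(value):
    """Interpret a flag value; assumes "1" means enabled."""
    return value == "1"


settings = load_settings({"INDEX_NAME": "intranet", "INDEX_USE_SSL": "1"})
print(settings["INDEX_NAME"], is_enabled(settings["INDEX_USE_SSL"]))
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without touching the real environment.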

Through the web, the proxy index can be configured in the Zope Management Interface (ZMI): open portal_catalog and click the ElasticSearchProxyIndex (e.g. SearchableText).

On the file system, it can be configured like any other index in the portal_catalog tool, by placing a catalog.xml file in a GenericSetup profile. The index configuration looks like this:

<index meta_type="ElasticSearchProxyIndex"
        name="SearchableText"
>
    <querytemplate>
{
    "query": {
        "multi_match": {
            "query": "{{keys[0]}}",
            "fields": [
                "title*^1.9",
                "description*^1.5",
                "text.data*^1.2",
                "blocks_plaintext*^1.2",
                "file__extracted.content*"
            ],
            "analyzer": "{{analyzer}}",
            "operator": "or",
            "fuzziness": "AUTO",
            "prefix_length": 1,
            "type": "most_fields",
            "minimum_should_match": "75%"
        }
    }
}
    </querytemplate>
</index>

It uses Jinja2 templates to inject the search term into the query. Available variables are:

keys
a list of search terms, usually just one.
language
the current language of the portal.
analyzer
the name of the analyzer for the query, based on the language. The mapping is currently hardcoded. If there is no analyzer for the language, the standard analyzer is used.

After rendering, the query must be valid OpenSearch Query DSL or ElasticSearch Query DSL.
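To check a template before putting it into catalog.xml, it can be rendered outside Plone. The following sketch assumes the jinja2 package is installed; the helper `render_query`, the shortened template and the sample values are illustrative and not taken from the addon. It renders the template with the three documented variables and parses the result as JSON, which fails loudly if the template is not well-formed.

```python
import json

from jinja2 import Template

# Shortened variant of the query template, for illustration only.
QUERY_TEMPLATE = """
{
    "query": {
        "multi_match": {
            "query": "{{keys[0]}}",
            "fields": ["title*^1.9", "description*^1.5"],
            "analyzer": "{{analyzer}}",
            "operator": "or"
        }
    }
}
"""


def render_query(template_text, keys, language="en", analyzer="standard"):
    """Render the template and parse it; raises ValueError on invalid JSON."""
    rendered = Template(template_text).render(
        keys=keys, language=language, analyzer=analyzer
    )
    return json.loads(rendered)


query = render_query(QUERY_TEMPLATE, ["plone"])
print(query["query"]["multi_match"]["query"])
```

Note that this naive sketch inserts the search term verbatim, so a term containing a double quote would produce invalid JSON here.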

The sources are kept in a Git repository, with its main branches on GitHub. Issues can be reported there as well.

We'd be happy to see many forks and pull requests to make this addon even better.

Maintainers are Jens Klein, Peter Holzer and the BlueDynamics Alliance developer team. We appreciate any contribution; if a release on PyPI is needed, please contact one of us. We also offer commercial support if training, coaching, integration or adaptations are needed.

Releases are done using the GitHub Release feature and PyPI trusted publishing. Never use a different release process! If in doubt, ask Jens.

Idea and testing: Peter Holzer

Initial concept & code by Jens W. Klein (GitHub: @jensens)

Contributors:

  • Katja Süss (GitHub: @ksuess)

The project is licensed under the GPLv2.