Extending Dokang

Dokang currently supports a single backend, Whoosh, which is responsible for indexing and for the actual search. Dokang does not yet make it easy to plug in another backend such as Elasticsearch; contributions are welcome.

However, you may want to add your own harvester. A harvester is responsible for extracting data (title and content) from a document. Dokang provides a few harvesters, but you can implement your own.

A harvester should subclass dokang.harvesters.Harvester and implement a harvest_file(path) method that returns a dictionary with the following keys. All values should be text: a str in Python 3 or a unicode object in Python 2.

title: The title of the document.
content: The concatenated content of the document.
kind: The kind of document, such as HTML or PDF.
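
When writing a harvester, it can help to check its output against this contract in your own tests. The sketch below is a hypothetical helper, check_harvest_result(), that is not part of Dokang; it assumes only the three keys and the text requirement listed above, and it deliberately tolerates extra keys because the contract says nothing about them.

```python
# Hypothetical helper (not part of Dokang): checks that a dictionary returned
# by harvest_file() has the required keys and that every value is text.

REQUIRED_KEYS = ('title', 'content', 'kind')

try:
    text_type = unicode  # Python 2: values must be unicode objects
except NameError:
    text_type = str  # Python 3: values must be str


def check_harvest_result(result):
    """Return `result` unchanged, or raise ValueError if it breaks the contract."""
    if not isinstance(result, dict):
        raise ValueError('harvest_file() must return a dict, got %r' % type(result))
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise ValueError('missing keys: %s' % ', '.join(missing))
    for key in REQUIRED_KEYS:
        if not isinstance(result[key], text_type):
            raise ValueError('%r must be text, got %r' % (key, type(result[key])))
    return result
```

For example, check_harvest_result({'title': 'README', 'content': 'Hello', 'kind': 'TXT'}) passes, while passing b'Hello' as the content raises ValueError on Python 3, since bytes are not text there.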

Here is an example of a simple harvester for text files.

import codecs
import os

from dokang.harvesters import Harvester

class TextHarvester(Harvester):

    def harvest_file(self, path):
        # Decode the file as UTF-8 so that every returned value is text.
        with codecs.open(path, encoding='utf-8') as fp:
            return {
                'title': os.path.basename(path),  # Use the filename as the title
                'content': fp.read(),
                'kind': 'TXT',
            }
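
The example above uses the filename as the title. As a variation, a harvester could use the first non-blank line of the file instead, falling back to the filename for empty files. The sketch below is hypothetical: FirstLineHarvester is an illustrative name, and the Harvester class defined here is only a stand-in so the snippet runs without Dokang installed. In a real project, import Harvester from dokang.harvesters as shown above.

```python
import io
import os


class Harvester(object):
    """Stand-in for dokang.harvesters.Harvester so this sketch runs on its own.

    In a real project, use ``from dokang.harvesters import Harvester`` instead.
    """


class FirstLineHarvester(Harvester):
    """Hypothetical harvester: uses the first non-blank line as the title."""

    def harvest_file(self, path):
        # io.open decodes the file to text on both Python 2 and Python 3.
        with io.open(path, encoding='utf-8') as fp:
            content = fp.read()
        title = os.path.basename(path)  # Fallback when the file is blank
        for line in content.splitlines():
            if line.strip():
                title = line.strip()
                break
        return {
            'title': title,
            'content': content,
            'kind': 'TXT',
        }
```

Reading the whole file first keeps the title lookup and the content in sync, at the cost of holding the file in memory, which is what the filename-based example above does too.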
