Extending Dokang

Dokang currently supports a single backend: Whoosh. Whoosh handles both indexing and searching. For now, Dokang offers no easy way to use another backend such as Elasticsearch. Contributions are welcome.

However, you may want to add your own harvester. The harvester is responsible for retrieving data (title and content) from a document. Dokang provides a few harvesters but you may implement your own.

A harvester should be a subclass of dokang.harvesters.Harvester and implement a harvest_file(path) method that should return a dictionary with the following keys. All values should be text-like: a string (in Python 3) or a unicode object (in Python 2).

title

The title of the document.

content

The concatenated content of the document.

kind

The kind of document: HTML, PDF, etc.
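The contract above can be checked with a few lines of code. The following is only a sketch: `check_harvest_result` and `REQUIRED_KEYS` are hypothetical names that are not part of Dokang, and the text check assumes Python 3 (under Python 2 the comparison would be against `unicode`).

```python
# Keys that a harvester is expected to return, as listed above.
REQUIRED_KEYS = ('title', 'content', 'kind')


def check_harvest_result(result):
    """Return a list of problems found in a harvested dictionary.

    Hypothetical helper for illustration; Dokang does not provide it.
    An empty list means the dictionary has all three keys and that
    every one of those values is text.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if key not in result:
            problems.append('missing key: %s' % key)
        elif not isinstance(result[key], str):
            problems.append('value for %s is not text' % key)
    return problems
```

Such a check may be handy in the unit tests of a custom harvester, for instance to catch a harvester that returns raw bytes instead of decoded text.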

Here is an example of a simple harvester for text files.

import codecs
import os

from dokang.harvesters import Harvester

class TextHarvester(Harvester):

    def harvest_file(self, path):
        # Decode the file as UTF-8 so that 'content' is text, not bytes.
        with codecs.open(path, encoding='utf-8') as fp:
            return {
                'title': os.path.basename(path),  # Use the filename as the title
                'content': fp.read(),
                'kind': 'TXT',
            }
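
To try a harvester like this one without a full Dokang setup, it can be called directly on a temporary file. The sketch below makes two assumptions. First, it uses a stand-in `Harvester` base class so that it runs without Dokang installed; in a real project, import the class from `dokang.harvesters` instead. Second, it assumes the harvester can be instantiated without arguments. `harvest_sample` is a hypothetical helper, and how Dokang itself instantiates harvesters is not covered here.

```python
import codecs
import os
import tempfile


class Harvester(object):
    """Stand-in for dokang.harvesters.Harvester (assumption, see above)."""


class TextHarvester(Harvester):

    def harvest_file(self, path):
        with codecs.open(path, encoding='utf-8') as fp:
            return {
                'title': os.path.basename(path),
                'content': fp.read(),
                'kind': 'TXT',
            }


def harvest_sample(text):
    """Write text to a temporary .txt file, harvest it, return the result."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        # Write UTF-8 bytes so the harvester has something to decode.
        with os.fdopen(fd, 'wb') as fp:
            fp.write(text.encode('utf-8'))
        return TextHarvester().harvest_file(path)
    finally:
        # Clean up the temporary file whether or not harvesting succeeded.
        os.remove(path)
```

Calling `harvest_sample(u'some text')` should return a dictionary whose `content` is the original text, whose `kind` is `'TXT'`, and whose `title` is the name of the temporary file.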