
Support for 'attachment' data type #12

Closed
frague59 opened this issue Nov 3, 2015 · 8 comments

Comments

@frague59

frague59 commented Nov 3, 2015

Hi,

I'm trying to use Haystack + Elasticsearch to populate an index with documents (i.e. files), using the mapper-attachments Elasticsearch plugin.

My files are uploaded to the index as base64 streams; I've built a SearchField that provides this functionality. The files show up as strings in the Elasticsearch index, so the upload itself works.

My question is: how can I bind the field type 'attachment' into the mappings?
As far as I can see in the Haystack code, it uses a hard-coded set of rules and does not provide this kind of mapping. In elasticstack, show_mapping --detail uses the same mapping logic, so I'm stuck.

Do you have any idea how to provide this functionality? Thanks!

@bennylope
Owner

I can only give this a cursory answer at the moment, but the show_mapping management command here shows the search mapping you're using; it's primarily a debugging tool for building your own. You'd do that by defining your own mapping in the ELASTICSEARCH_INDEX_SETTINGS setting and then, to enable it, using the bundled backend.
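A minimal settings sketch of that setup, assuming the bundled backend is enabled through the engine path `elasticstack.backends.ConfigurableElasticSearchEngine` and that ELASTICSEARCH_INDEX_SETTINGS follows the same `{'settings': {'analysis': ...}}` shape as Haystack's default index settings. The URL, index name and the `french_custom` analyzer are placeholders, not values taken from this thread.

```python
# settings.py (sketch): enable elasticstack's bundled backend
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'elasticstack.backends.ConfigurableElasticSearchEngine',
        'URL': 'http://127.0.0.1:9200/',  # assumption: local Elasticsearch node
        'INDEX_NAME': 'haystack',          # assumption: any index name works here
    },
}

# Your own index settings (assumed shape); the analyzer below is a placeholder
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        'analysis': {
            'analyzer': {
                'french_custom': {  # hypothetical analyzer name
                    'type': 'custom',
                    'tokenizer': 'standard',
                    'filter': ['lowercase', 'asciifolding'],
                },
            },
        },
    },
}
```

With this in place, `./manage.py show_mapping` can be used to check what the backend will actually send to Elasticsearch.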

Does that answer your question?

@frague59
Author

frague59 commented Nov 5, 2015

OK, I've added my mapping:

    ELASTICSEARCH_INDEX_SETTINGS = {
        'mappings': {
            'document': {
                'properties': {'document_file': {'type': 'attachment'}}
            }
        },
        # ... the rest of the default mapping provided as default ...
    }

But when I run:

$ ./manage.py show_mapping --detail

document_file is still mapped as a string.
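Why this happens can be sketched from the Haystack mapping tables quoted later in this thread: the backend looks up each SearchField's `field_type` in a hard-coded FIELD_MAPPINGS dict, and any type that isn't listed, such as 'attachment', falls back to DEFAULT_FIELD_MAPPING. The tables below are trimmed copies for illustration, and `mapping_for` is a hypothetical helper that isolates that one lookup.

```python
# Trimmed copy of Haystack's mapping tables as quoted in this thread (no 'attachment' key)
DEFAULT_FIELD_MAPPING = {'type': 'string', 'analyzer': 'snowball'}
FIELD_MAPPINGS = {
    'date': {'type': 'date'},
    'boolean': {'type': 'boolean'},
    'long': {'type': 'long'},
}


def mapping_for(field_type):
    # Same lookup as build_schema: unknown field types fall back to the string mapping
    return FIELD_MAPPINGS.get(field_type, DEFAULT_FIELD_MAPPING).copy()


print(mapping_for('date'))        # {'type': 'date'}
print(mapping_for('attachment'))  # {'type': 'string', 'analyzer': 'snowball'}
```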

@frague59
Author

frague59 commented Nov 5, 2015

I also tried putting my mapping definition in the settings section, but I get the same result.

@frague59
Author

frague59 commented Nov 5, 2015

My default analyzer is "french":

ELASTICSEARCH_DEFAULT_ANALYZER = 'french'

@frague59
Author

frague59 commented Nov 5, 2015

Hi, I'm back with my attachments... and I've got it!

I MANUALLY updated FIELD_MAPPINGS in Haystack's elasticsearch_backend.py file, which works and gives the correct mapping from the server.

So I need to find a way to override the build_schema method in your ConfigurableElasticBackend to use this feature:

  • get the default mapping from settings
  • build the schema with this mapping

I'll post my code afterwards...

Thanks!

@frague59
Author

frague59 commented Nov 5, 2015

OK, I've got it:

  • This provides a way of getting mappings from settings:
from django.conf import settings

DEFAULT_FIELD_MAPPING = {'type': 'string', 'analyzer': 'snowball'}
FIELD_MAPPINGS = {
    'edge_ngram': {'type': 'string', 'analyzer': 'edgengram_analyzer'},
    'ngram': {'type': 'string', 'analyzer': 'ngram_analyzer'},
    'date': {'type': 'date'},
    'datetime': {'type': 'date'},
    'location': {'type': 'geo_point'},
    'boolean': {'type': 'boolean'},
    'float': {'type': 'float'},
    'long': {'type': 'long'},
    'integer': {'type': 'long'},
    'attachment': {'type': 'attachment'},  # added: maps the 'attachment' field type (mapper-attachments plugin)
}


def get_default_field_mappings():
    """
    Gets the default field mapping from settings `ELASTICSEARCH_DEFAULT_FIELD_MAPPINGS` if it exists,
    otherwise returns DEFAULT_FIELD_MAPPING.
    """
    return getattr(settings, 'ELASTICSEARCH_DEFAULT_FIELD_MAPPINGS', DEFAULT_FIELD_MAPPING)


def get_field_mappings():
    """
    Gets the field mappings from settings `ELASTICSEARCH_FIELD_MAPPINGS` if it exists,
    otherwise returns the FIELD_MAPPINGS dict.
    :return: dict of mappings from field types to properties
    """
    return getattr(settings, 'ELASTICSEARCH_FIELD_MAPPINGS', FIELD_MAPPINGS)
  • And the backend class:
from elasticstack.backends import ConfigurableElasticBackend, ConfigurableElasticSearchEngine
from haystack.constants import DJANGO_CT, DJANGO_ID


class ExtendedElasticsearchBackend(ConfigurableElasticBackend):
    """
    Adds `attachment` support for elasticsearch backend settings
    """

    def build_schema(self, fields):
        """
        Merge of the haystack and elasticstack elasticsearch backend `build_schema` methods.
        It provides an additional feature: custom field mappings, from settings or the default FIELD_MAPPINGS dict.
        :param fields: dict of index field names to SearchField instances
        :return: tuple (content_field_name, mapping)
        """
        content_field_name = ''
        mapping = {
            DJANGO_CT: {'type': 'string', 'index': 'not_analyzed', 'include_in_all': False},
            DJANGO_ID: {'type': 'string', 'index': 'not_analyzed', 'include_in_all': False},
        }
        field_mappings = get_field_mappings()
        default_field_mappings = get_default_field_mappings()

        for field_name, field_class in fields.items():
            # Unknown field types fall back to the default (string) mapping
            field_mapping = field_mappings.get(field_class.field_type, default_field_mappings).copy()
            if field_class.boost != 1.0:
                field_mapping['boost'] = field_class.boost

            if field_class.document is True:
                content_field_name = field_class.index_fieldname

            # Do this last to override `text` fields.
            if field_mapping['type'] == 'string' and field_class.indexed:
                if not hasattr(field_class, 'facet_for') and field_class.field_type not in ('ngram', 'edge_ngram'):
                    field_mapping['analyzer'] = getattr(field_class, 'analyzer', self.DEFAULT_ANALYZER)

            mapping[field_class.index_fieldname] = field_mapping

        return content_field_name, mapping


class ExtendedElasticSearchEngine(ConfigurableElasticSearchEngine):
    backend = ExtendedElasticsearchBackend

This class can be used as a backend; it's a quick-and-dirty merge of the Haystack and elasticstack Elasticsearch backends.

  • My AttachmentField class (I actually use Filer as my file management application):
import base64
import mimetypes

from filer.models import File as fi_File
from haystack.fields import SearchField


def get_content_type(name):
    # Stand-in for a helper not shown here: guess the MIME type from the file name
    return mimetypes.guess_type(name)[0] if name else None


class AttachmentField(SearchField):
    field_type = 'attachment'
    author_field = 'author'
    content_type_field = None

    def __init__(self, **kwargs):
        if 'content_type_field' in kwargs:
            self.content_type_field = kwargs.pop('content_type_field')
        if 'author_field' in kwargs:
            self.author_field = kwargs.pop('author_field')

        super(AttachmentField, self).__init__(**kwargs)

    def convert(self, value):
        if isinstance(value, fi_File):
            # Filer file: use the underlying file object and the Filer label as name
            field_file = value.file.file
            name = value.label
            content_type = get_content_type(name)
        else:  # plain Django File (or raw bytes)
            field_file = value
            name = None
            content_type = None

        content_length = len(field_file)
        try:
            content = base64.b64encode(field_file.read())
        except AttributeError:
            # raw bytes have no .read()
            content = base64.b64encode(field_file)

        return {'_language': 'fr',
                '_content': content.decode('ascii'),  # text, so the value is JSON-serializable
                '_content_type': content_type,
                '_name': name,
                '_title': name,
                '_content_length': content_length,
                }
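The payload that `convert()` hands to Elasticsearch can be checked without Django or Filer. A minimal sketch, assuming the raw file bytes and a file name are already at hand; `build_attachment_payload` is a hypothetical helper that mirrors the dict built above, and the MIME type is guessed with the standard `mimetypes` module.

```python
import base64
import mimetypes


def build_attachment_payload(name, data, language='fr'):
    """Hypothetical helper: the dict AttachmentField.convert() produces, built from raw bytes."""
    content_type = mimetypes.guess_type(name)[0] if name else None
    return {
        '_language': language,
        # b64encode returns bytes on Python 3; decode so the value is JSON-serializable
        '_content': base64.b64encode(data).decode('ascii'),
        '_content_type': content_type,
        '_name': name,
        '_title': name,
        '_content_length': len(data),
    }


payload = build_attachment_payload('report.pdf', b'%PDF-1.4 example bytes')
print(payload['_content_type'])               # application/pdf
print(base64.b64decode(payload['_content']))  # b'%PDF-1.4 example bytes'
```

Decoding `_content` back and comparing it with the input is a quick way to confirm the base64 step before looking at the index.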

... And it seems to work!
You can try this code and, if you can, adapt it to integrate it into elasticstack. It seems to work here...
Thanks!
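To see what the extended backend produces end to end, here is a stand-alone sketch of the field loop in `ExtendedElasticsearchBackend.build_schema`, assuming a `SimpleNamespace` in place of `django.conf.settings` and hypothetical stub objects in place of Haystack SearchFields (only the attributes the loop reads). The DJANGO_CT/DJANGO_ID entries are left out. Note that `get_field_mappings()` replaces the whole dict when ELASTICSEARCH_FIELD_MAPPINGS is set, so the sketch builds that setting from the defaults plus the 'attachment' entry.

```python
from types import SimpleNamespace

# Trimmed defaults (no 'attachment' entry here on purpose)
DEFAULT_FIELD_MAPPING = {'type': 'string', 'analyzer': 'snowball'}
FIELD_MAPPINGS = {'date': {'type': 'date'}, 'boolean': {'type': 'boolean'}}
DEFAULT_ANALYZER = 'french'  # the ELASTICSEARCH_DEFAULT_ANALYZER value from this thread

# Stand-in for django.conf.settings: extend the defaults rather than replace them
settings = SimpleNamespace(
    ELASTICSEARCH_FIELD_MAPPINGS=dict(FIELD_MAPPINGS, attachment={'type': 'attachment'}),
)


def build_schema(fields):
    """Field loop of ExtendedElasticsearchBackend.build_schema, without DJANGO_CT/DJANGO_ID."""
    field_mappings = getattr(settings, 'ELASTICSEARCH_FIELD_MAPPINGS', FIELD_MAPPINGS)
    default_mapping = getattr(settings, 'ELASTICSEARCH_DEFAULT_FIELD_MAPPINGS', DEFAULT_FIELD_MAPPING)
    content_field_name = ''
    mapping = {}
    for field_class in fields.values():
        field_mapping = field_mappings.get(field_class.field_type, default_mapping).copy()
        if field_class.boost != 1.0:
            field_mapping['boost'] = field_class.boost
        if field_class.document is True:
            content_field_name = field_class.index_fieldname
        # Only indexed string fields get the default analyzer
        if field_mapping['type'] == 'string' and field_class.indexed:
            if not hasattr(field_class, 'facet_for') and field_class.field_type not in ('ngram', 'edge_ngram'):
                field_mapping['analyzer'] = getattr(field_class, 'analyzer', DEFAULT_ANALYZER)
        mapping[field_class.index_fieldname] = field_mapping
    return content_field_name, mapping


def stub_field(name, field_type, document=False):
    # Hypothetical stand-in for a Haystack SearchField
    return SimpleNamespace(index_fieldname=name, field_type=field_type,
                           boost=1.0, document=document, indexed=True)


fields = {
    'text': stub_field('text', 'string', document=True),
    'document_file': stub_field('document_file', 'attachment'),
}
content_field, schema = build_schema(fields)
print(content_field)            # text
print(schema['text'])           # {'type': 'string', 'analyzer': 'french'}
print(schema['document_file'])  # {'type': 'attachment'}
```

Because the attachment mapping's type is not 'string', the default analyzer is never added to it; only the text field picks up "french".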

@frague59
Author

frague59 commented Nov 6, 2015

My final version:

https://gist.github.com/frague59/aab071f0bdce5b010ce4

Do WTF you want with it ;)

frague59 closed this as completed Nov 6, 2015
@cstrap

cstrap commented Nov 9, 2015

Thanks @frague59 for sharing! 👍
