Plugins::Attachments: Add an attachements plugin (support parsing various file formats)

Using the new plugins system, implement the `attachments` plugin, allow to add a mapping type called `attachment` which accepts a binary input (base64) of an attachment to index.

Installation is simple, just download the plugin zip file and place it under `plugins` directory within the installation. When building from source, the plugin will be under `build/distributions/plugins`. Once placed in the installation, the `attachment` mapper type will be automatically supported.

Using the `attachment` type is simple, in your mapping JSON, simply a certain JSON element as `attachment`, for example:

```
{
    person : {
        properties : {
            "myAttachment" : { type : "attachment" }
        }
    }
}
```

In this case, the JSON to index can be:

```
{
    myAttachment : "... base64 encoded attachment ..."
}
```

The `attachment` type not only indexes the content of the doc, but also automatically adds meta data on the attachment as well (when available). The metadata supported are: `date`, `title`, `author`, and `keywords`. They can be queries using the "dot notation", for example: `myAttachment.author`.

Both the meta data and the actual content are simple core type mappers (`string`, `date`, ...), thus, they can be controlled in the mappings. For example:

```
{
    person : {
        properties : {
            "file" : { 
                type : "attachment",
                fields : {
                    file : {index : "no"},
                    date : {store : "yes"},
                    author : {analyzer: "myAnalyzer"}
                }
            }
        }
    }
}
```

In the above example, the actual content indexed is mapped under `fields` name `file`, and we decide not to index it, so it will only be available in the `_all` field. The other `fields` map to their respective metadata names, but there is no need to specify the `type` (like `string` or `date`) since it is already known.

The plugin uses Apache Tika (http://lucene.apache.org/tika/) to parse it, so many formats are supported, listed here: http://lucene.apache.org/tika/0.6/formats.html.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Plugins::Attachments: Add an attachements plugin (support parsing various file formats) #92

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Plugins::Attachments: Add an attachements plugin (support parsing various file formats) #92

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions