Use _language field instead of language
 When we want to force a language instead of relying on Tika language detection, we set the `language` field in documents.

 To be consistent with the other forced fields, `_content_type` and `_name`, we should prefix the `language` field with an underscore `_`.

 So `language` becomes `_language`.

 We first deprecate `language` in version 2.1.0 and will remove it in 2.3.0.

 Closes elastic#68.

(cherry picked from commit 2f46343)
dadoonet committed Jun 3, 2014
1 parent 7c1c201 commit 94cf141
Showing 3 changed files with 17 additions and 4 deletions.
14 changes: 12 additions & 2 deletions README.md
@@ -46,13 +46,14 @@ In this case, the JSON to index can be:
 }
 ```

-Or it is possible to use more elaborated JSON if content type or resource name need to be set explicitly:
+Or it is possible to use more elaborated JSON if content type, resource name or language need to be set explicitly:

 ```javascript
 {
 "my_attachment" : {
 "_content_type" : "application/pdf",
 "_name" : "resource/name/of/my.pdf",
+"_language" : "en",
 "content" : "... base64 encoded attachment ..."
 }
 }
@@ -121,7 +122,16 @@ By default, language detection is disabled (`false`) as it could come with a cos
 This default value can be changed by setting the `index.mapping.attachment.detect_language` setting.
 It can also be provided on a per document indexed using the `_detect_language` parameter.

-Note, this feature is supported since `2.0.0` version.
+Note that you can force language using `_language` field when sending your actual document:
+
+```javascript
+{
+"my_attachment" : {
+"_language" : "en",
+"content" : "... base64 encoded attachment ..."
+}
+}
+```

 Highlighting attachments
 ------------------------
@@ -352,8 +352,11 @@ public void parse(ParseContext context) throws IOException {
 } else if ("_name".equals(currentFieldName)) {
 name = parser.text();
 } else if ("language".equals(currentFieldName)) {
-// TODO should be _language
+// TODO deprecated form. Will be removed in 2.3
 language = parser.text();
+logger.debug("`language` is now deprecated. Use `_language`. See https://github.com/elasticsearch/elasticsearch-mapper-attachments/issues/68");
+} else if ("_language".equals(currentFieldName)) {
+language = parser.text();
 }
 } else if (token == XContentParser.Token.VALUE_NUMBER) {
 if ("_indexed_chars".equals(currentFieldName) || "_indexedChars".equals(currentFieldName)) {
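The deprecation path in the hunk above can be tried in isolation. The following is only a minimal sketch, not the plugin's code: it assumes the document's string fields have already been read into a plain `Map` (the real mapper walks an `XContentParser`), and the names `ForcedLanguageSketch`, `resolveLanguage` and `warnings` are hypothetical. As in the diff, both spellings are accepted, and if a document carries both, the one encountered last wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ForcedLanguageSketch {
    // Hypothetical stand-in for the logger: collects deprecation messages.
    static final List<String> warnings = new ArrayList<>();

    // Returns the forced language, or null when neither field is present.
    static String resolveLanguage(Map<String, String> fields) {
        String language = null;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String currentFieldName = field.getKey();
            if ("language".equals(currentFieldName)) {
                // Deprecated form: still honoured, but flagged.
                language = field.getValue();
                warnings.add("`language` is now deprecated. Use `_language`.");
            } else if ("_language".equals(currentFieldName)) {
                language = field.getValue();
            }
        }
        return language;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("_language", "en");
        doc.put("content", "... base64 encoded attachment ...");
        System.out.println(resolveLanguage(doc));
        System.out.println(warnings.size() + " deprecation warning(s)");
    }
}
```

Keeping the old branch next to the new one is what lets existing clients continue to work during the deprecation window described in the commit message.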
@@ -74,7 +74,7 @@ private void testLanguage(String filename, String expected, String... forcedLang
 .field("content", html);

 if (forcedLanguage.length > 0) {
-xcb.field("language", forcedLanguage[0]);
+xcb.field("_language", forcedLanguage[0]);
 }

 xcb.endObject().endObject();
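For readers without the Elasticsearch test classes at hand, a document of the shape used in this test and in the README can be assembled with the JDK alone. This is a rough sketch under stated assumptions: `AttachmentDocumentSketch` and `buildDocument` are hypothetical names, the JSON is concatenated by hand, and the field name and language code are assumed to need no JSON escaping. A real client would use a JSON library or `XContentBuilder` as the test does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AttachmentDocumentSketch {
    // Builds {"<fieldName>":{"_language":"<lang>","content":"<base64>"}}.
    // Pass null as forcedLanguage to omit the `_language` field.
    static String buildDocument(String fieldName, byte[] content, String forcedLanguage) {
        StringBuilder json = new StringBuilder();
        json.append("{\"").append(fieldName).append("\":{");
        if (forcedLanguage != null) {
            json.append("\"_language\":\"").append(forcedLanguage).append("\",");
        }
        // Attachment content is sent base64 encoded, as in the README example.
        json.append("\"content\":\"")
            .append(Base64.getEncoder().encodeToString(content))
            .append("\"}}");
        return json.toString();
    }

    public static void main(String[] args) {
        byte[] content = "Hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(buildDocument("my_attachment", content, "en"));
    }
}
```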