Skip to content

Commit

Permalink
Tika may "hang up" client application
Browse files Browse the repository at this point in the history
Sometimes Tika may crash while parsing some files.  In this case it may generate just runtime errors (Throwable), not  TikaException.
But there is no “catch” clause for  Throwable in the AttachmentMapper.java :

        String parsedContent;
        try {
            // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
            parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
        } catch (TikaException e) {
            throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
        }

As a result,  tika() may “hang up” the whole application.
(we have some pdf-files that "hang up" Elastic client if you try to parse them using mapper-attahcment plugin)

We propose the following fix:

        String parsedContent;
        try {
            // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
            parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
        } catch (Throwable e) {
            throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
        }

(just replace “TikaException” with “Throwable” – it works for our cases)

Thank you!
Closes elastic#21.
  • Loading branch information
dadoonet committed Aug 20, 2013
1 parent 0fff26f commit d6aa2f0
Showing 1 changed file with 1 addition and 2 deletions.
Expand Up @@ -19,7 +19,6 @@

package org.elasticsearch.index.mapper.attachment;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.elasticsearch.common.io.stream.BytesStreamInput;
import org.elasticsearch.common.logging.ESLogger;
Expand Down Expand Up @@ -356,7 +355,7 @@ public void parse(ParseContext context) throws IOException {
try {
// Set the maximum length of strings returned by the parseToString method, -1 sets no limit
parsedContent = tika().parseToString(new BytesStreamInput(content, false), metadata, indexedChars);
} catch (TikaException e) {
} catch (Throwable e) {
throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
}

Expand Down

0 comments on commit d6aa2f0

Please sign in to comment.