Available as of Camel version 2.19
The Tika component provides the ability to detect and parse documents with Apache Tika, which it uses as its underlying library for working with documents.
In order to use the Tika component, Maven users will need to add the following dependency to their pom.xml:
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-tika</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>
The Tika component supports only producer endpoints.
The Tika component has no options.
The Tika endpoint is configured using URI syntax:
tika:operation
with the following path and query parameters:
Name | Description | Default | Type |
---|---|---|---|
operation | Required. The Tika operation: parse or detect. | | TikaOperation |
Name | Description | Default | Type |
---|---|---|---|
tikaConfig (producer) | Tika Config | | TikaConfig |
tikaConfigUri (producer) | Tika Config Uri: the URI of tika-config.xml | | String |
tikaParseOutputEncoding (producer) | Tika Parse Output Encoding: specifies the character encoding of the parsed output. Defaults to Charset.defaultCharset(). | | String |
tikaParseOutputFormat (producer) | Tika Output Format. Supported output formats: xml returns the parsed content as XML; html returns the parsed content as HTML; text returns the parsed content as text; textMain uses the boilerpipe library to automatically extract the main content from a web page. | xml | TikaParseOutputFormat |
synchronous (advanced) | Sets whether synchronous processing should be strictly used, or Camel is allowed to use asynchronous processing (if supported). | false | boolean |
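To illustrate how the path parameter and the query parameters combine into one endpoint URI, here is a minimal sketch in plain Java. The class TikaUriSketch and its buildUri method are hypothetical helpers written for this example and are not part of Camel; only the option names and the two operations come from the tables above. The sketch assumes option values are already URI-safe and does no encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Camel): assembles a Tika endpoint URI string.
final class TikaUriSketch {

    // Builds "tika:<operation>[?name=value&...]" from the operation and query options.
    // Assumption: option values are URI-safe, so no encoding is applied.
    static String buildUri(String operation, Map<String, String> options) {
        if (!"parse".equals(operation) && !"detect".equals(operation)) {
            throw new IllegalArgumentException("operation must be parse or detect: " + operation);
        }
        if (options.isEmpty()) {
            return "tika:" + operation;
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        options.forEach((name, value) -> query.add(name + "=" + value));
        return "tika:" + operation + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("tikaParseOutputFormat", "text");
        options.put("tikaParseOutputEncoding", "UTF-8");
        System.out.println(buildUri("parse", options));
        System.out.println(buildUri("detect", new LinkedHashMap<>()));
    }
}
```

The resulting string is what would be passed to a producer endpoint in a route, for example as the argument of .to(...).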
When used with Spring Boot, the component supports 2 auto-configuration options, which are listed below.
Name | Description | Default | Type |
---|---|---|---|
camel.component.tika.enabled | Enable the Tika component. | true | Boolean |
camel.component.tika.resolve-property-placeholders | Whether the component should resolve property placeholders on itself when starting. Only properties of String type can use property placeholders. | true | Boolean |
To detect a file's MIME type, place the file in the message body and send it to the detect operation:
from("direct:start")
.to("tika:detect");