Welcome to Apache Tika

Apache Tika(TM) is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Tika is a project of the Apache Software Foundation.

Apache Tika, Tika, Apache, the Apache feather logo, and the Apache Tika
project logo are trademarks of The Apache Software Foundation.

Getting Started

Tika is based on Java 5 and uses the Maven 2
build system. To build Tika, use the following command in this directory:

    mvn clean install

The build consists of a number of components, including a standalone runnable
jar that you can use to try out Tika features. You can run it like this:

    java -jar tika-app/target/tika-app-*.jar --help
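Besides the command-line app, the toolkit can be called directly from Java. The following is a minimal sketch, assuming the tika-app jar built above (which bundles tika-core and the parsers) is on the classpath. The class name `TikaQuickStart` and the helper `detectByName` are hypothetical; `org.apache.tika.Tika` and its `detect` and `parseToString` methods belong to the facade class shipped in tika-core.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical example class; assumes the tika-app jar is on the classpath.
public class TikaQuickStart {

    // Detects the media type from the file name only; no file is read.
    static String detectByName(String fileName) {
        return new Tika().detect(fileName);
    }

    public static void main(String[] args) throws IOException, TikaException {
        if (args.length != 1) {
            System.err.println("usage: java TikaQuickStart <file>");
            return;
        }
        Tika tika = new Tika();
        File file = new File(args[0]);
        // Detection combines the file name with the leading bytes of the file.
        System.out.println("Type: " + tika.detect(file));
        // Extraction picks a parser based on the detected type.
        System.out.println(tika.parseToString(file));
    }
}
```

Note that `parseToString` caps the returned text at a maximum length, which can be changed with `setMaxStringLength`. For longer documents, the streaming `Parser` interface is the better fit.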

License (see also LICENSE.txt)

Collective work: Copyright 2011 The Apache Software Foundation.

Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Apache Tika includes a number of subcomponents with separate copyright
notices and license terms. Your use of these subcomponents is subject to
the terms and conditions of the licenses listed in the LICENSE.txt file.

Export control

This distribution includes cryptographic software.  The country in which
you currently reside may have restrictions on the import, possession, use,
and/or re-export to another country, of encryption software.  BEFORE using
any encryption software, please check your country's laws, regulations and
policies concerning the import, possession, or use, and re-export of
encryption software, to see if this is permitted.  See
<> for more information.

The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity Control
Number (ECCN) 5D002.C.1, which includes information security software using
or performing cryptographic functions with asymmetric algorithms.  The form
and manner of this Apache Software Foundation distribution makes it eligible
for export under the License Exception ENC Technology Software Unrestricted
(TSU) exception (see the BIS Export Administration Regulations, Section
740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

    Apache Tika uses the Bouncy Castle generic encryption libraries for
    extracting text content and metadata from encrypted PDF files.
    See the Bouncy Castle website for more details.
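The following rough sketch shows how that decryption support might be reached from Java. It supplies the password for an encrypted PDF through the `ParseContext`. It assumes the Tika 1.x API: `AutoDetectParser`, `BodyContentHandler`, and the `PasswordProvider` interface that the PDF parser looks up in the context. It also requires Java 8 or later, because it uses a lambda and try-with-resources. The class `EncryptedPdfText` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.PasswordProvider;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class; assumes tika-core and tika-parsers on the classpath.
public class EncryptedPdfText {

    // Builds a context whose PasswordProvider always returns the given password.
    static ParseContext contextWithPassword(String password) {
        ParseContext context = new ParseContext();
        context.set(PasswordProvider.class, metadata -> password);
        return context;
    }

    // Extracts the body text of a (possibly encrypted) document.
    static String extractText(String path, String password)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        // -1 disables the default write limit on the extracted text.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        try (InputStream stream = Files.newInputStream(Paths.get(path))) {
            parser.parse(stream, handler, metadata, contextWithPassword(password));
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java EncryptedPdfText <file> <password>");
            return;
        }
        System.out.println(extractText(args[0], args[1]));
    }
}
```

Passing the password through the context, rather than through a parser-specific setter, keeps the calling code independent of which parser ends up handling the file.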

Mailing Lists

Discussion about Tika takes place on the following mailing lists:

    - About using Tika
    - About developing Tika

Notifications of all code changes are sent to the following mailing list:

The mailing lists are open to anyone and publicly archived.

You can subscribe to the mailing lists by sending a message to
<LIST>-subscribe@... (for example user-subscribe@...).
To unsubscribe, send a message to the <LIST>-unsubscribe address.
For more instructions, send a message to the <LIST>-help address.
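The list command addresses follow a fixed naming pattern, so they can be derived from the list name. This sketch is purely illustrative and makes several assumptions. The class `ListCommands` and its `address` method are hypothetical. The mail domain is passed in as a parameter because it is not given here. The `-unsubscribe` and `-help` suffixes are assumed to follow the same `<LIST>-command` pattern as the `user-subscribe@...` example.

```java
import java.util.Locale;

// Hypothetical helper: builds "<list>-<command>@<domain>" command addresses.
public class ListCommands {

    enum Command { SUBSCRIBE, UNSUBSCRIBE, HELP }

    static String address(String list, Command command, String domain) {
        if (list.isEmpty() || domain.isEmpty()) {
            throw new IllegalArgumentException("list and domain must be non-empty");
        }
        // The command suffix is the lower-case enum name, e.g. "subscribe".
        return list + "-" + command.name().toLowerCase(Locale.ROOT) + "@" + domain;
    }

    public static void main(String[] args) {
        // "example.org" is a placeholder domain, not the real list host.
        System.out.println(address("user", Command.SUBSCRIBE, "example.org"));
    }
}
```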

Issue Tracker

If you encounter errors in Tika or want to suggest an improvement or
a new feature, please visit the Tika issue tracker. There you can also find the
latest information on known issues and recent bug fixes and enhancements.