tika is a Dart wrapper around the Apache Tika command-line interface. It uses
dart:io to spawn a local Tika process, stream extracted text back as
Stream<String>, and optionally collect the entire document into a single
String.
This package assumes Apache Tika is already installed on the host machine or is
available through java -jar /path/to/tika-app.jar.
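
The spawn-and-stream approach described above can be sketched with dart:io alone. This is a minimal illustration under stated assumptions, not the package's actual implementation: streamProcessText is a hypothetical helper name, and the choice to surface a non-zero exit code as a ProcessException is made only for this sketch.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper (not part of the tika package): starts a local
/// process and yields its stdout as decoded UTF-8 text chunks.
Stream<String> streamProcessText(
  String executable,
  List<String> arguments,
) async* {
  final Process process = await Process.start(executable, arguments);

  // Collect stderr concurrently so the child never blocks on a full pipe.
  final Future<String> stderrText =
      process.stderr.transform(utf8.decoder).join();

  // Forward stdout chunks to the listener as they arrive.
  yield* process.stdout.transform(utf8.decoder);

  // Assumption for this sketch: a non-zero exit code is reported as an error.
  final int code = await process.exitCode;
  if (code != 0) {
    throw ProcessException(executable, arguments, await stderrText, code);
  }
}
```

Collecting the whole document is then a matter of joining the stream, for example streamProcessText('tika', ['--text', '/srv/documents/report.pdf']).join(), which buffers every chunk into one String.
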
- Extract document text through the tika CLI.
- Stream stdout as Stream<String> for server-side pipelines.
- Stream extracted text directly into a file without buffering the full payload in memory.
- Read the full extracted text with a single Future<String>.
- Support both PATH-based installs and explicit java -jar tika-app.jar execution.
Add the package to your project:
dart pub add tika

Your runtime environment must also have:
- A Java runtime.
- Apache Tika installed and available on PATH, or a downloaded tika-app.jar.
Use the default constructor when tika is already available on PATH:
import 'package:tika/tika.dart';
Future<void> main() async {
  TikaClient tika = TikaClient();
  String text = await tika.readText(
    documentPath: '/srv/documents/report.pdf',
  );
  print(text);
}

Stream text chunks directly from the Tika process:
import 'dart:io';
import 'package:tika/tika.dart';
Future<void> main() async {
  TikaClient tika = TikaClient();
  await for (String chunk in tika.streamText(
    documentPath: '/srv/documents/invoice.docx',
  )) {
    stdout.write(chunk);
  }
}

Write large extracted payloads straight to disk:
import 'dart:io';
import 'package:tika/tika.dart';
Future<void> main() async {
  TikaClient tika = TikaClient();
  await tika.streamToFile(
    documentPath: '/srv/documents/archive.pdf',
    file: File('/srv/output/archive.txt'),
  );
}

Use the jar constructor when you want to run an explicit Tika jar:
import 'package:tika/tika.dart';
Future<void> main() async {
  TikaClient tika = TikaClient.jar(
    jarPath: '/opt/tika/tika-app.jar',
  );
  String text = await tika.readText(
    documentPath: '/srv/documents/contract.pdf',
  );
  print(text);
}

Apache Tika 3.3.0 is published by the Apache project as tika-app-3.3.0.jar,
and the CLI documentation shows the supported java -jar tika-app.jar --text
command shape. A minimal Ubuntu-based Docker image needs:
- default-jre-headless
- curl
- ca-certificates
Example Dockerfile snippet:
FROM dart:stable
ARG TIKA_VERSION=3.3.0
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        default-jre-headless \
        curl \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /opt/tika \
    && curl -fsSL "https://archive.apache.org/dist/tika/${TIKA_VERSION}/tika-app-${TIKA_VERSION}.jar" \
        -o /opt/tika/tika-app.jar \
    && printf '#!/bin/sh\nexec java -jar /opt/tika/tika-app.jar "$@"\n' > /usr/local/bin/tika \
    && chmod +x /usr/local/bin/tika

That wrapper script makes tika --text /path/to/file.pdf available to your
Dart server, which means TikaClient() can use the default executable without
extra configuration.
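
Before relying on the default executable inside a container, it can help to confirm that the wrapper actually resolves on PATH. The following is a small sketch using only dart:io; canRun is a hypothetical helper name and is not provided by the tika package.

```dart
import 'dart:io';

/// Hypothetical helper (not part of the tika package): returns true when
/// the executable can be started and exits with code 0.
Future<bool> canRun(String executable, List<String> arguments) async {
  try {
    final ProcessResult result = await Process.run(executable, arguments);
    return result.exitCode == 0;
  } on ProcessException {
    // Process.run throws when the executable cannot be found or started.
    return false;
  }
}
```

A server could call canRun('tika', ['--version']) at startup and log a clear message when the result is false, rather than failing on the first extraction request.
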
If you prefer not to create a shell wrapper, point the package at the jar directly:
TikaClient tika = TikaClient.jar(
  jarPath: '/opt/tika/tika-app.jar',
);

For local development on macOS, the easiest path is Homebrew:
brew install tika
This installs the tika executable and pulls in the required Java dependency.
After installation, verify it is available:
tika --version
Then your local Dart app can use:
TikaClient tika = TikaClient();

- This package shells out to an installed binary and does not bundle Apache Tika itself.
- If the process cannot be started, or if Tika exits with a non-zero code, the package throws TikaException.
- streamToFile() uses streamText() internally, so very large extracted text can be written incrementally instead of building one giant in-memory string.
- streamLines() is available when line-by-line consumption is easier than raw text chunks.
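
The incremental-write and line-by-line notes above can be illustrated without the package itself. The sketch below is an assumption about how such behavior could be built on any Stream<String>, not the package's own code; writeChunksToFile and toLines are hypothetical helper names.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper: writes text chunks to [file] as they arrive,
/// so the full text is never held in one in-memory string.
Future<void> writeChunksToFile(Stream<String> chunks, File file) async {
  final IOSink sink = file.openWrite();
  try {
    // addStream waits for the sink to accept data before reading more.
    await sink.addStream(chunks.transform(utf8.encoder));
  } finally {
    await sink.close();
  }
}

/// Hypothetical helper: regroups arbitrary text chunks into whole lines,
/// joining lines that were split across chunk boundaries.
Stream<String> toLines(Stream<String> chunks) =>
    chunks.transform(const LineSplitter());
```

Chunk boundaries from a process pipe do not align with line breaks, which is why a line-oriented view needs a splitter that carries partial lines over to the next chunk.
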