Use the Java Tika text extraction library on the .NET platform

Tika on .NET

This project is a simple wrapper around Tika, the robust Java text extraction library. It produces two NuGet packages:

  • TikaOnDotNet - A straight IKVM-hosted port of the Java Tika project.

Install-Package TikaOnDotNet

  • TikaOnDotNet.TextExtractor - Use Tika to extract text from rich documents.

Install-Package TikaOnDotNet.TextExtractor

Getting Started

The best way to get started is to:

  • Add a NuGet dependency on TikaOnDotNet.TextExtractor.
  • Instantiate a new TextExtractor object and call one of the Extract methods.

Usage

using System; // for Uri
using TikaOnDotNet.TextExtractor;

var textExtractor = new TextExtractor();

// Extract from a document on disk.
var wordDocContents = textExtractor.Extract(@".\path\to\my favorite word.docx");
// Extract from a web page.
var webPageContents = textExtractor.Extract(new Uri("https://google.com"));

Take a look at our tests for more usage examples.

How To Contribute

Have an idea to make this project better? Great! Start out by taking a look at our Contributing Guide.

Having A Problem?

Search the Issues first, as your problem may be a common one. If you don't find it there, please create an issue. Contributors here will chime in when they can.