Tika on .NET
This project is a simple wrapper around the excellent and robust Apache Tika text extraction Java library. It produces two NuGet packages:
- TikaOnDotNet - A straight IKVM-hosted port of the Java Tika project.
- TikaOnDotNet.TextExtractor - Use Tika to extract text from rich documents.
The best way to get started is to:
- Add a NuGet dependency on TikaOnDotNet.TextExtractor.
- Instantiate a new TextExtractor object and call one of its Extract methods:
    // using TikaOnDotNet.TextExtractor;
    var textExtractor = new TextExtractor();
    var wordDocContents = textExtractor.Extract(@".\path\to\my favorite word.docx");
    var webPageContents = textExtractor.Extract(new Uri("https://google.com"));
Take a look at our tests for more usage examples.
How To Contribute
Have an idea to make this project better? Great! Start out by taking a look at our Contributing Guide.