Bengt/python-tika-with-deps

python-tika - Python bindings for Apache Tika

Requirements

  • Java >= 1.5
  • JCC

Installation

$ python setup.py build
$ python setup.py install

Usage

To use the AutoDetectParser, start the JVM first and then import the parser module:

import tika
tika.initVM()

from tika import parser

print(parser.from_buffer("<html><body>Hello World</body></html>"))
# Or directly from a file:
# print(parser.from_file("/tmp/foo.doc"))

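This returns a dict with the extracted text under 'content' and the document metadata under 'metadata', of this form (values are illustrative):

{'content': u'Hello Cruel World',
 'metadata': {u'Content-Encoding': u'ISO-8859-1',
              u'Content-Type': u'text/html',
              u'title': u'Hello world'}
}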
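
Because the result is a plain dict, it can be post-processed without any further Tika calls. The following is a minimal sketch, not part of python-tika: `summarize` is a hypothetical helper that assumes only the 'content' and 'metadata' keys described in this README, and it is fed a hand-written sample dict so that it runs without a JVM.

```python
def summarize(parsed):
    """Return a one-line summary of a parser result dict.

    Hypothetical helper: assumes a 'content' string and a 'metadata'
    dict, and falls back to placeholders when either is missing.
    """
    metadata = parsed.get('metadata') or {}
    content = parsed.get('content') or u''
    title = metadata.get('title', u'(untitled)')
    content_type = metadata.get('Content-Type', u'unknown')
    return u'%s [%s]: %d characters' % (title, content_type, len(content.strip()))


# Hand-written sample in the documented shape, not real Tika output.
sample = {
    'content': u'Hello Cruel World',
    'metadata': {u'Content-Type': u'text/html', u'title': u'Hello world'},
}
summary = summarize(sample)
print(summary)
```

In real use, the dict returned by `parser.from_buffer` or `parser.from_file` would take the place of `sample`. Which metadata keys are present depends on the document, so the helper reads them with `.get` instead of indexing directly.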

About

Python wrapper for Apache Tika, packaged so that it can be installed with easy_install
