From affea80c89c94dc8d7a333706b493677c7de6eb0 Mon Sep 17 00:00:00 2001
From: Latha Krishnamurthi
Date: Fri, 1 Sep 2017 13:23:11 -0700
Subject: [PATCH] Update README.md

added readme notes on how to output the extracted content using the parser interface.
---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index e5af9c7..3313ccc 100644
--- a/README.md
+++ b/README.md
@@ -58,6 +58,10 @@ The parser interface extracts text and metadata using the /rmeta interface. This
 is one of the better ways to get the internal XHTML content extracted.
 
+The parser interface needs the following environment variable set on the console for printing of the extracted content.
+
+export PYTHONIOENCODING=utf8
+
 ```
 #!/usr/bin/env python
 import tika
@@ -76,6 +80,10 @@ Specify Output Format To XHTML
 ---------------------
 The parser interface is optionally able to output the content as XHTML rather than plain text.
 
+The parser interface needs the following environment variable set on the console for printing of the extracted content.
+
+export PYTHONIOENCODING=utf8
+
 ```
 #!/usr/bin/env python
 import tika
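
Review note: the README lines added by this patch ask the reader to export `PYTHONIOENCODING=utf8` before printing extracted content. Below is a minimal sketch of what that variable does, assuming the underlying problem is Python choosing a non-UTF-8 encoding for stdout. It does not call tika at all; `run_child` and `CHILD_CODE` are hypothetical names used only for this illustration. A child interpreter is started with the variable set, reports the encoding it picked for `sys.stdout`, and prints a non-ASCII string.

```python
import codecs
import os
import subprocess
import sys

# Code run by the child interpreter (illustrative, not part of tika):
# report the stdout encoding, then print a string containing a non-ASCII character.
CHILD_CODE = "import sys; print(sys.stdout.encoding); print(u'caf\\u00e9')"


def run_child(io_encoding="utf8"):
    """Run CHILD_CODE in a fresh interpreter with PYTHONIOENCODING set.

    Returns a tuple of (normalized stdout encoding name, text the child printed).
    """
    # Copy the current environment and set the variable the README mentions.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    raw = subprocess.check_output([sys.executable, "-c", CHILD_CODE], env=env)
    # This sketch assumes a UTF-8 setting, so the child's bytes decode as UTF-8.
    encoding_line, text_line = raw.decode("utf-8").splitlines()[:2]
    # Normalize spellings such as "utf8" and "UTF-8" to one canonical name.
    return codecs.lookup(encoding_line.strip()).name, text_line
```

Calling `run_child("utf8")` should show the child using UTF-8 for stdout regardless of the console locale. Exporting the same variable in the shell has the equivalent effect on a script that prints the extracted content.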