- update based on README revs from branch-1.3 to bring up to date
git-svn-id: https://svn.apache.org/repos/asf/nutch/trunk@983324 13f79535-47bb-0310-9956-ffa450edef68
chrismattmann committed Aug 7, 2010
1 parent c2cef7c commit 4c87f33
Showing 1 changed file (README.txt) with 4 additions and 28 deletions (32 changes).
@@ -1,40 +1,16 @@
 Apache Nutch README

-Important note: Due to licensing issues we cannot provide two libraries that
-are normally provided with PDFBox (jai_core.jar, jai_codec.jar), the parser
-library we use for parsing PDF files. If you encounter unexpected problems when
-working with PDF files please
-
-1. download the two missing libraries from:
-http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/external/
-
-2. Put them to directory src/plugin/parse-pdf/lib
-3. follow the instructions in file src/plugin/parse-pdf/plugin.xml
-4. Rebuild nutch.
-
-
-
-Interesting files include:
-
-
-docs/api/index.html
-Javadocs for the Nutch software.
-
-CHANGES.txt
-Log of changes to Nutch.
-
-
 For the latest information about Nutch, please visit our website at:

-http://lucene.apache.org/nutch/
+http://nutch.apache.org

 and our wiki, at:

 http://wiki.apache.org/nutch/

 To get started using Nutch read Tutorial:

-http://lucene.apache.org/nutch/tutorial.html
+http://wiki.apache.org/nutch/NutchTutorial

 Export Control

@@ -55,6 +31,6 @@ Section 740.13) for both object code and source code.

 The following provides more details on the included cryptographic software:

-Apache Nutch uses the PDFBox API in its parse-pdf plugin for extracting textual content
-and metadata from encrypted PDF files. See http://incubator.apache.org/pdfbox/ for more
+Apache Nutch uses the PDFBox API in its parse-tika plugin for extracting textual content
+and metadata from encrypted PDF files. See http://pdfbox.apache.org for more
 details on PDFBox.
