- update based on README revs from branch-1.3 to bring up to date

git-svn-id: https://svn.apache.org/repos/asf/nutch/trunk@983324 13f79535-47bb-0310-9956-ffa450edef68
1 parent c2cef7c, commit 4c87f337336d5035681fc89a63e59384e2c51be5, chrismattmann committed Aug 7, 2010
Showing with 4 additions and 28 deletions.
  1. +4 −28 README.txt
@@ -1,40 +1,16 @@
Apache Nutch README
-Important note: Due to licensing issues we cannot provide two libraries that
-are normally provided with PDFBox (jai_core.jar, jai_codec.jar), the parser
-library we use for parsing PDF files. If you encounter unexpected problems when
-working with PDF files please
-
-1. download the two missing libraries from:
- http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/external/
-
-2. Put them to directory src/plugin/parse-pdf/lib
-3. follow the instructions in file src/plugin/parse-pdf/plugin.xml
-4. Rebuild nutch.
-
-
-
-Interesting files include:
-
-
- docs/api/index.html
- Javadocs for the Nutch software.
-
- CHANGES.txt
- Log of changes to Nutch.
-
-
For the latest information about Nutch, please visit our website at:
- http://lucene.apache.org/nutch/
+ http://nutch.apache.org
and our wiki, at:
http://wiki.apache.org/nutch/
To get started using Nutch read Tutorial:
- http://lucene.apache.org/nutch/tutorial.html
+ http://wiki.apache.org/nutch/NutchTutorial
Export Control
@@ -55,6 +31,6 @@ Section 740.13) for both object code and source code.
The following provides more details on the included cryptographic software:
-Apache Nutch uses the PDFBox API in its parse-pdf plugin for extracting textual content
-and metadata from encrypted PDF files. See http://incubator.apache.org/pdfbox/ for more
+Apache Nutch uses the PDFBox API in its parse-tika plugin for extracting textual content
+and metadata from encrypted PDF files. See http://pdfbox.apache.org for more
details on PDFBox.
