Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Mirror of Apache Nutch

This branch is 0 commits ahead and 0 commits behind nutch-gui

Fetching latest commit…

Octocat-spinner-32-eaf2f5

Cannot retrieve the latest commit at this time

Octocat-spinner-32 .settings
Octocat-spinner-32 bin
Octocat-spinner-32 conf
Octocat-spinner-32 docs
Octocat-spinner-32 lib
Octocat-spinner-32 site
Octocat-spinner-32 src
Octocat-spinner-32 .classpath
Octocat-spinner-32 .gitignore
Octocat-spinner-32 .project
Octocat-spinner-32 CHANGES.txt
Octocat-spinner-32 GUI-README.txt
Octocat-spinner-32 KEYS
Octocat-spinner-32 LICENSE.txt
Octocat-spinner-32 NOTICE.txt
Octocat-spinner-32 README.txt
Octocat-spinner-32 build.xml
Octocat-spinner-32 default.properties
Octocat-spinner-32 index.html
README.txt
Apache Nutch README

Important note: Due to licensing issues we cannot provide two libraries that
are normally provided with PDFBox (jai_core.jar, jai_codec.jar), the parser
library we use for parsing PDF files. If you encounter unexpected problems when
working with PDF files please

1. download the two missing libraries  from:
   http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/external/

2. Put them to directory src/plugin/parse-pdf/lib
3. follow the instructions in file src/plugin/parse-pdf/plugin.xml
4. Rebuild nutch.



Interesting files include:


  docs/api/index.html
      Javadocs for the Nutch software.

  CHANGES.txt
      Log of changes to Nutch.


For the latest information about Nutch, please visit our website at:

   http://lucene.apache.org/nutch/

and our wiki, at:

   http://wiki.apache.org/nutch/

To get started using Nutch read Tutorial:

   http://lucene.apache.org/nutch/tutorial.html
   
Export Control

This distribution includes cryptographic software.  The country in which you 
currently reside may have restrictions on the import, possession, use, and/or 
re-export to another country, of encryption software.  BEFORE using any encryption 
software, please check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to see if this is 
permitted.  See <http://www.wassenaar.org/> for more information. 

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has 
classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which 
includes information security software using or performing cryptographic functions with 
asymmetric algorithms.  The form and manner of this Apache Software Foundation 
distribution makes it eligible for export under the License Exception ENC Technology 
Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, 
Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Nutch uses the PDFBox API in its parse-pdf plugin for extracting textual content 
and metadata from encrypted PDF files. See http://incubator.apache.org/pdfbox/ for more 
details on PDFBox.
Something went wrong with that request. Please try again.