Fork of apache/nutch
Description: Mirror of Apache Nutch
Clone URL: git://github.com/101tec/nutch.git
type name date message [author]
file .classpath Wed Nov 18 06:12:37 -0800 2009 Added regular expression support for the black ... [rwe17]
file .gitignore Wed Nov 18 07:56:50 -0800 2009 Changed version from '0.3-dev' to '0.3'. [rwe17]
file .project Sun Aug 09 06:12:21 -0700 2009 + commit eclipse based files that a developer (... [Marko Bauhardt]
directory .settings/ Fri Aug 14 03:11:36 -0700 2009 simpler eclipse setup [Marko Bauhardt]
file CHANGES.txt Mon Mar 23 11:59:26 -0700 2009 update release date git-svn-id: https://svn.ap... [Sami Siren]
file GUI-README.txt Wed Aug 26 08:22:40 -0700 2009 test checkin aftre creating the tag v0.1 [Marko Bauhardt]
file KEYS Thu Mar 19 14:26:52 -0700 2009 copy keys to trunk git-svn-id: https://svn.apa... [Sami Siren]
file LICENSE.txt Thu Mar 19 14:09:56 -0700 2009 NUTCH-723 git-svn-id: https://svn.apache.org/r... [Sami Siren]
file NOTICE.txt Thu Mar 19 14:10:28 -0700 2009 NUTCH-725 git-svn-id: https://svn.apache.org/r... [Sami Siren]
file README.txt Sun Aug 09 06:32:25 -0700 2009 + revert first test commit [Marko Bauhardt]
directory bin/ Sun Aug 09 06:05:17 -0700 2009 + implement http server + implement a gui compo... [Marko Bauhardt]
file build.xml Sun Aug 09 06:05:17 -0700 2009 + implement http server + implement a gui compo... [Marko Bauhardt]
directory conf/ Tue Oct 06 01:33:00 -0700 2009 NUTCHGUI-28, NUTCHGUI-24, NUTCHGUI-23: fix secu... [Marko Bauhardt]
file default.properties Wed Nov 18 07:56:50 -0800 2009 Changed version from '0.3-dev' to '0.3'. [rwe17]
directory docs/ Tue May 09 06:43:52 -0700 2006 NUTCH-261 : Add an optional lang parameter for ... [Jerome Charron]
file index.html Tue Mar 01 14:04:46 -0800 2005 Initial import of Nutch to Apache. git-svn-id:... [Douglass Cutting]
directory lib/ Thu Sep 17 08:00:48 -0700 2009 NUTCHGUI-1: start to implement login/logout [Marko Bauhardt]
directory site/ Wed Feb 11 15:48:50 -0800 2009 fix link and name git-svn-id: https://svn.apac... [Sami Siren]
directory src/ Wed Nov 18 06:12:37 -0800 2009 Added regular expression support for the black ... [rwe17]
README.txt
Apache Nutch README

Important note: Due to licensing issues, we cannot provide two libraries
(jai_core.jar, jai_codec.jar) that are normally provided with PDFBox, the
library we use for parsing PDF files. If you encounter unexpected problems
when working with PDF files, please:

1. Download the two missing libraries from:
   http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/external/

2. Put them in the directory src/plugin/parse-pdf/lib.
3. Follow the instructions in the file src/plugin/parse-pdf/plugin.xml.
4. Rebuild Nutch.
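
Before rebuilding, it can help to confirm that both jars actually landed in
the plugin directory. The following is a minimal sketch, not part of Nutch:
the helper name missing_jars is hypothetical, and the script assumes it is
run from the top of the Nutch source tree so that the relative path from
step 2 resolves.

```python
from pathlib import Path

# Jar names are taken from the note above; the directory is the one named
# in step 2. Both are assumed to be relative to the Nutch source root.
REQUIRED_JARS = ("jai_core.jar", "jai_codec.jar")
PLUGIN_LIB_DIR = Path("src/plugin/parse-pdf/lib")


def missing_jars(lib_dir=PLUGIN_LIB_DIR):
    """Return the required jar names that are not present in lib_dir."""
    lib_dir = Path(lib_dir)
    return [name for name in REQUIRED_JARS if not (lib_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_jars()
    if missing:
        print("Missing from", PLUGIN_LIB_DIR, ":", ", ".join(missing))
    else:
        print("Both JAI jars are in place; continue with steps 3 and 4.")
```

The check only looks at file names; it does not verify that the jars are
the right versions or that plugin.xml has been updated.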



Interesting files include:


  docs/api/index.html
      Javadocs for the Nutch software.

  CHANGES.txt
      Log of changes to Nutch.


For the latest information about Nutch, please visit our website at:

   http://lucene.apache.org/nutch/

and our wiki, at:

   http://wiki.apache.org/nutch/

To get started using Nutch, read the tutorial:

   http://lucene.apache.org/nutch/tutorial.html
   
Export Control

This distribution includes cryptographic software.  The country in which you 
currently reside may have restrictions on the import, possession, use, and/or 
re-export to another country, of encryption software.  BEFORE using any encryption 
software, please check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to see if this is 
permitted.  See <http://www.wassenaar.org/> for more information. 

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has 
classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which 
includes information security software using or performing cryptographic functions with 
asymmetric algorithms.  The form and manner of this Apache Software Foundation 
distribution makes it eligible for export under the License Exception ENC Technology 
Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, 
Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Nutch uses the PDFBox API in its parse-pdf plugin for extracting textual content 
and metadata from encrypted PDF files. See http://incubator.apache.org/pdfbox/ for more 
details on PDFBox.