nutch /
| name | age |
|---|---|
| .classpath | Wed Nov 18 06:12:37 -0800 2009 |
| .gitignore | Wed Nov 18 07:56:50 -0800 2009 |
| .project | Sun Aug 09 06:12:21 -0700 2009 |
| .settings/ | Fri Aug 14 03:11:36 -0700 2009 |
| CHANGES.txt | Mon Mar 23 11:59:26 -0700 2009 |
| GUI-README.txt | Wed Aug 26 08:22:40 -0700 2009 |
| KEYS | Thu Mar 19 14:26:52 -0700 2009 |
| LICENSE.txt | Thu Mar 19 14:09:56 -0700 2009 |
| NOTICE.txt | Thu Mar 19 14:10:28 -0700 2009 |
| README.txt | Sun Aug 09 06:32:25 -0700 2009 |
| bin/ | Sun Aug 09 06:05:17 -0700 2009 |
| build.xml | Sun Aug 09 06:05:17 -0700 2009 |
| conf/ | Tue Oct 06 01:33:00 -0700 2009 |
| default.properties | Wed Nov 18 07:56:50 -0800 2009 |
| docs/ | Tue May 09 06:43:52 -0700 2006 |
| index.html | Tue Mar 01 14:04:46 -0800 2005 |
| lib/ | Thu Sep 17 08:00:48 -0700 2009 |
| site/ | Wed Feb 11 15:48:50 -0800 2009 |
| src/ | Wed Nov 18 06:12:37 -0800 2009 |

README.txt
Apache Nutch README

Important note: Due to licensing issues we cannot provide two libraries (jai_core.jar, jai_codec.jar) that are normally provided with PDFBox, the parser library we use for parsing PDF files. If you encounter unexpected problems when working with PDF files, please:

1. Download the two missing libraries from: http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/external/
2. Put them in the directory src/plugin/parse-pdf/lib
3. Follow the instructions in the file src/plugin/parse-pdf/plugin.xml
4. Rebuild Nutch.

Interesting files include:

  docs/api/index.html   Javadocs for the Nutch software.
  CHANGES.txt           Log of changes to Nutch.

For the latest information about Nutch, please visit our website at:

  http://lucene.apache.org/nutch/

and our wiki, at:

  http://wiki.apache.org/nutch/

To get started using Nutch, read the tutorial:

  http://lucene.apache.org/nutch/tutorial.html

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See <http://www.wassenaar.org/> for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Nutch uses the PDFBox API in its parse-pdf plugin for extracting textual content and metadata from encrypted PDF files. See http://incubator.apache.org/pdfbox/ for more details on PDFBox.
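
The second of the PDF library steps in the README's important note (placing the two downloaded jars in src/plugin/parse-pdf/lib) could be scripted roughly as follows. This is only a sketch: the jar names and the target directory come from the README, while the function name `install_jai_jars` and its two arguments are hypothetical, and it assumes the jars have already been downloaded by hand into a local directory.

```python
import shutil
from pathlib import Path

# Jar names and plugin directory are taken from the README.
# The function name and its arguments are hypothetical.
JAI_JARS = ("jai_core.jar", "jai_codec.jar")
PLUGIN_LIB = Path("src") / "plugin" / "parse-pdf" / "lib"


def install_jai_jars(download_dir, nutch_root):
    """Copy the two JAI jars from download_dir into the parse-pdf lib directory."""
    source = Path(download_dir)
    target = Path(nutch_root) / PLUGIN_LIB

    # Fail early if either jar is missing, before copying anything.
    missing = [name for name in JAI_JARS if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(
            "missing from %s: %s" % (source, ", ".join(missing))
        )

    target.mkdir(parents=True, exist_ok=True)
    # Return the destination paths so the caller can verify the result.
    return [Path(shutil.copy2(source / name, target / name)) for name in JAI_JARS]
```

Calling `install_jai_jars("downloads", ".")` from the checkout root would cover step 2 only. Steps 3 and 4 (following the instructions in plugin.xml and rebuilding) still have to be done by hand. The README does not name the build command, so none is assumed here.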







