Fork of emi/bixo
Description: A creepy crawler
Clone URL: git://github.com/cwensel/bixo.git
type       name        age                             message
file       .gitignore  Wed Apr 29 11:24:44 -0700 2009  update some ignores [joa23]
file       README      Mon Apr 06 11:31:04 -0700 2009  test commit [mbauhardt]
directory  bin/        Wed Apr 08 14:40:52 -0700 2009  checkpoint for ContentDB taps and schemes [cwensel]
file       build.xml   Fri May 01 11:58:45 -0700 2009  adding target to build hadoop job jar [joa23]
directory  doc/        Thu Apr 16 09:44:38 -0700 2009  Add def for main Hadoop class. Minor cleanup of... [Ken Krugler]
file       ivy.xml     Tue May 05 11:33:05 -0700 2009  Added args4j jar [Ken Krugler]
directory  ivy/        Tue Mar 31 19:11:54 -0700 2009  BIXO-1, ivy based build and simple project skel... [joa23]
directory  lib/        Tue May 05 11:35:26 -0700 2009  Fixed up test packages to match source. Use ar... [Ken Krugler]
directory  release/    Fri May 01 15:30:51 -0700 2009  Do 0.3.1 build [Ken Krugler]
directory  src/        Sat May 09 16:34:20 -0700 2009  test anchor exists in url [cwensel]
README
===============================
Introduction
===============================

Bixo is an open source Java crawler that runs as a series of Cascading
pipes. It is designed as a toolkit for building customized crawlers, so
each Cascading pipe implements one discrete operation. By assembling
these pipes into a custom Cascading pipe assembly, you can quickly create
specialized crawlers that are optimized for a particular use case.
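
The idea of composing a crawler from discrete operations can be
illustrated with a small, self-contained sketch. This is a conceptual
stand-in only: it uses plain java.util.function composition rather than
the Cascading API, and every name in it (CrawlPipelineSketch,
stripFragments, keepHttp, dedupe) is hypothetical and not taken from
Bixo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Conceptual sketch: NOT the Cascading API and NOT Bixo code.
// Each static method is one discrete operation; run() chains them.
final class CrawlPipelineSketch {

    // Stage 1: trim whitespace and drop any "#fragment" suffix.
    static List<String> stripFragments(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            String trimmed = url.trim();
            int hash = trimmed.indexOf('#');
            out.add(hash >= 0 ? trimmed.substring(0, hash) : trimmed);
        }
        return out;
    }

    // Stage 2: keep only http and https URLs.
    static List<String> keepHttp(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            if (url.startsWith("http://") || url.startsWith("https://")) {
                out.add(url);
            }
        }
        return out;
    }

    // Stage 3: remove duplicates while preserving first-seen order.
    static List<String> dedupe(List<String> urls) {
        return new ArrayList<>(new LinkedHashSet<>(urls));
    }

    // The "assembly": stages are composed in order, and swapping or
    // adding a stage yields a differently specialized pipeline.
    static List<String> run(List<String> urls) {
        Function<List<String>, List<String>> strip = CrawlPipelineSketch::stripFragments;
        Function<List<String>, List<String>> assembly =
            strip.andThen(CrawlPipelineSketch::keepHttp)
                 .andThen(CrawlPipelineSketch::dedupe);
        return assembly.apply(urls);
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList(
            "http://example.com/a#top",
            " http://example.com/a ",
            "ftp://example.com/b",
            "https://example.com/c");
        System.out.println(run(seeds));
    }
}
```

Because each stage has the same input and output type, a specialized
variant only needs a different chain in run(). A real Bixo crawler would
express the equivalent steps as Cascading pipes instead.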

Bixo borrows heavily from the Apache Nutch project, as well as many other
open source projects at Apache and elsewhere.

Bixo is released under the MIT license.

===============================
Building
===============================

You need Apache Ant 1.7 or higher.
To list the available build targets, type in the project root:
ant -p

To clean, run the unit and integration tests, and build a jar, type:
ant clean test it jar

To build a distribution, type:
ant dist

To generate an Eclipse project, type:
ant eclipse
Then choose "import existing project" in Eclipse.