Fork of emi/bixo
Description: A creepy crawler
Clone URL: git://github.com/cwensel/bixo.git
README
===============================
Introduction
===============================

Bixo is an open source Java crawler that runs as a series of Cascading
pipes. It is designed as a tool for building customized crawlers, so
each Cascading pipe implements a discrete operation. By combining these
pipes into your own Cascading pipe assembly, you can quickly create
specialized crawlers that are optimized for a particular use case.

Bixo borrows heavily from the Apache Nutch project, as well as many other
open source projects at Apache and elsewhere.

Bixo is released under the MIT license.

===============================
Building
===============================

You need Apache Ant 1.7 or higher.
To list the available build targets, type the following in the project root:
ant -p
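
The Ant 1.7 requirement can be checked before building. The following is
a minimal sketch, not part of the Bixo build: ant_version_ok is a
hypothetical helper name, and the commented usage assumes that
"ant -version" prints a line containing "version X.Y.Z".

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Bixo): succeeds when the dotted
# version string in $1 is 1.7 or higher. Only major and minor are compared.
ant_version_ok() {
    major=${1%%.*}      # text before the first dot, e.g. "1"
    rest=${1#*.}        # text after the first dot, e.g. "7.1"
    minor=${rest%%.*}   # text before the next dot, e.g. "7"
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }
}

# Example use (assumes "ant -version" output contains "version X.Y.Z"):
#   v=$(ant -version | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p')
#   ant_version_ok "$v" && ant -p
```

The helper takes the version as an argument rather than calling Ant
itself, so it can be tried out on a machine where Ant is not yet installed.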

To clean, run the unit tests and integration tests, and build a jar, type:
ant clean test it jar

To build a distribution, type:
ant dist

To generate an Eclipse project, type:
ant eclipse
Then choose "import existing project" in Eclipse.