Description: A creepy crawler
Clone URL: git://github.com/emi/bixo.git
README
===============================
Introduction
===============================

Bixo is an open source Java web mining toolkit that runs as a series of Cascading
pipes. It is designed to be used as a tool for creating customized web mining apps.
By building a customized Cascading pipe assembly, you can quickly create a workflow
that uses Bixo to fetch web content, parse and analyze it, and publish the results.
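
The fetch, parse, analyze, and publish stages described above can be sketched
in plain Java. This is only an illustration of the workflow shape: it does not
use the Bixo or Cascading APIs, and every name in it (PipelineSketch, fetch,
parse, analyze, publish, run) is hypothetical. The fetch stage returns canned
HTML instead of touching the network, and the "analysis" is a simple word count.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a fetch -> parse -> analyze -> publish workflow.
// None of these names come from Bixo or Cascading.
public class PipelineSketch {

    // Stand-in for fetching: returns canned HTML, no network access.
    static String fetch(String url) {
        return "<html><title>" + url + "</title><body>Hello  web mining</body></html>";
    }

    // Stand-in for parsing: strips tags and collapses whitespace.
    static String parse(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Stand-in for analysis: counts whitespace-separated words.
    static int analyze(String text) {
        return text.isEmpty() ? 0 : text.split(" ").length;
    }

    // Stand-in for publishing: formats one tab-separated result line.
    static String publish(String url, int wordCount) {
        return url + "\t" + wordCount;
    }

    // Runs each URL through the four stages in order.
    static List<String> run(List<String> urls) {
        return urls.stream()
                .map(url -> publish(url, analyze(parse(fetch(url)))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        run(List.of("http://example.com/")).forEach(System.out::println);
    }
}
```

In a real Bixo workflow each stage would be a step in a Cascading pipe
assembly rather than a chained method call, so the stages could be customized
or replaced independently.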

Bixo borrows heavily from the Apache Nutch project, as well as many other open source
projects at Apache and elsewhere.

Bixo is released under the MIT license.

===============================
Building
===============================

See http://bixo.101tec.com/documentation/building-bixo/ for full details.

You need Apache Ant 1.7 or higher. 

To get a list of valid targets:

% cd <project directory>
% ant -p

To clean, run the tests, and build a jar:

% ant clean test jar

To create Eclipse project files:

% ant eclipse

Then choose "Import existing project" in Eclipse, and select the Bixo project directory.