Description: A creepy crawler
Clone URL: git://github.com/emi/bixo.git
===============================
Introduction
===============================
 
Bixo is an open source Java web mining toolkit that runs as a series of Cascading
pipes. It is designed as a tool for creating customized web mining apps.
By building a customized Cascading pipe assembly, you can quickly create a Bixo
workflow that fetches web content, parses and analyzes it, and publishes the results.
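
As a rough illustration of the fetch, parse, analyze, publish flow described above,
here is a minimal sketch in plain Java. It does not use the Bixo or Cascading APIs:
the class PipelineSketch and its methods are hypothetical stand-ins, the fetch step
returns canned HTML instead of touching the network, and "publishing" is just a
print statement. It only shows the idea of chaining stages into one workflow.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch only: not Bixo or Cascading code.
public class PipelineSketch {

    // Stand-in for a fetch stage: returns canned HTML rather than doing real I/O.
    public static String fetch(String url) {
        return "<html><body>Hello from " + url + "</body></html>";
    }

    // Stand-in for a parse stage: strips tags with a naive regex.
    public static String parse(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    // Stand-in for an analyze stage: counts whitespace-separated words.
    public static int analyze(String text) {
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }

    // Chains the stages in order, loosely analogous to assembling pipes.
    public static Function<String, Integer> workflow() {
        Function<String, String> fetchStage = PipelineSketch::fetch;
        return fetchStage
                .andThen(PipelineSketch::parse)
                .andThen(PipelineSketch::analyze);
    }

    public static void main(String[] args) {
        Function<String, Integer> flow = workflow();
        // "Publish" stage: print one result line per URL.
        for (String url : Arrays.asList("http://example.com/a", "http://example.com/b")) {
            System.out.println(url + " -> " + flow.apply(url) + " words");
        }
    }
}
```

In a real Bixo workflow each stage would be a Cascading pipe operating on tuples,
and the chaining would be handled by the pipe assembly rather than by function
composition.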
 
Bixo borrows heavily from the Apache Nutch project, as well as many other open source
projects at Apache and elsewhere.
 
Bixo is released under the MIT license.
 
===============================
Building
===============================
 
See http://bixo.101tec.com/documentation/building-bixo/ for full details.
 
You need Apache Ant 1.7 or higher.
 
To get a list of valid targets:
 
% cd <project directory>
% ant -p
 
To clean, run the tests and build a jar:
 
% ant clean test jar
 
To create Eclipse project files:
 
% ant eclipse
 
Then choose "Import existing project" in Eclipse, and select the Bixo project directory.