cwensel / bixo forked from emi/bixo
Branch: ken
Ken Krugler (author)
Mon Mar 30 15:57:37 -0700 2009
commit 2b527bf633b053140c39d724dff4ef0c2e307cb5
tree 16d5d6ee7423934dc0eee7352bd0cc2c1e511972
parent b83b56e30308b3445f3e487723332b76b7ed5e37
README
===============================
Introduction
===============================

Bixo is an open source Java crawler that runs as a series of Cascading pipes.
It is designed to be used as a tool for creating customized crawlers, so each
Cascading pipe implements a discrete operation.

By building a customized Cascading pipe assembly, you can quickly create
specialized crawlers that are optimized for a particular use case.

Bixo borrows heavily from the Apache Nutch project, as well as many other open
source projects at Apache and elsewhere.

Bixo is released under the MIT license.

===============================
Building
===============================

To build, you first need:

1. A recent release of Cascading that works with Hadoop 0.19.0+

   http://cascading.googlecode.com/files/cascading-1.0.6-hadoop-0.19.0%2B.tgz

2. A recent release of Hadoop 0.19

   http://www.apache.org/dyn/closer.cgi/hadoop/core/

3. A build.properties file in the same directory as the build.xml file

   This file should contain:

   hadoop.home=<path to Hadoop you just downloaded>
   cascading.home=<path to Cascading you just downloaded>

   Note that you can't use user-relative paths here, e.g. ~/<path> won't work.
   Paths need to be either absolute or relative to the project directory.

Once these are in place, run "ant test" to compile and test the code.
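
As an illustration of step 3, a build.properties file might look like the sketch below. The two directory paths are hypothetical: they assume both archives were unpacked under /opt, and the directory names are guesses. Substitute the locations on your own machine.

```properties
# build.properties (sits next to build.xml)
# Both paths below are hypothetical examples, not required locations.
# Use absolute paths or paths relative to the project directory; ~/ is not expanded.
hadoop.home=/opt/hadoop-0.19
cascading.home=/opt/cascading-1.0.6
```

A path relative to the project directory, such as ../hadoop-0.19 (also hypothetical), should work as well, per the note in step 3.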

