Commits on Oct 26, 2015
  1. Clean up imports

    kkrugler committed Oct 26, 2015
Commits on Sep 1, 2015
  1. Re-enable parser test

    Meant finding the Heritrix jar in its new location, then finding out
    which of the many jars we actually need, and excluding all of the extra
    stuff it pulls in. Also fixed up dup slf4j bindings, due to bogus
    dependency in dsiutils.
    kkrugler committed Sep 1, 2015
  2. Put back in dep on crawler-commons

    Needed for robots.txt processing
    kkrugler committed Sep 1, 2015
  3. Backed out use of crawler-commons fetcher

    Their fetcher isn't serializable, which creates issues for Hadoop jobs.
    kkrugler committed Sep 1, 2015
  4. Remove main class from manifest

    So that you can run both the demo crawl and the web mining tool. Also
    fixed up README to include the numreducetasks parameter.
    kkrugler committed Sep 1, 2015
Commits on Aug 30, 2015
  1. Fix up dependencies

    But parser test depends on no-longer available (??) Heritrix jar, so
    that is commented out for now.
    kkrugler committed Aug 30, 2015
Commits on May 26, 2015
  1. Switched to crawler-commons for processing robots.txt.

    The robots handling code in crawler-commons was based on the Bixo code but has subsequently been improved.
    vivek committed May 26, 2015
Commits on May 8, 2015
  1. Added private constructor to utility classes to enforce that they shouldn't be instantiated

    vivek committed May 8, 2015
  2. Maintain SNAPSHOT version number in master

    vivek committed May 8, 2015
Commits on May 5, 2015
  1. Setting up release of distribution for 0.9.2

    When the 0.9.2 tag was created, the contrib components hadn’t been
    updated.
    vivek committed May 5, 2015
Commits on May 4, 2015
Commits on Apr 10, 2015
  1. Releasing version 0.9.2

    vivek committed Apr 10, 2015
  2. Get rid of the ec2 section in the dist target.

    vivek committed Apr 10, 2015
  3. Updated changes file for 0.9.2

    vivek committed Apr 10, 2015
  4. Deleted EC2 support files.

    With the move to Hadoop 2, we don't have an AMI that we can target (if there is a requirement to run using AWS then EMR can be used instead).
    vivek committed Apr 10, 2015
Commits on Apr 9, 2015
  1. Clean up warning in test - close input stream after we are done reading from it.

    vivek committed Apr 9, 2015
Commits on Apr 8, 2015
  1. Added required option for the number of reduce tasks.

    The current version of cascading.utils sets the number of reduce tasks to 1 (when BasePlatform.CLUSTER_REDUCER_COUNT is specified) with MapReduce2. Until that is resolved, users need to specify the number of reduce tasks.
    vivek committed Apr 8, 2015
Commits on Mar 25, 2015
  1. Reverted the addition of assertPathExists to BixoPlatform (instead the code should use assertExists from BasePath).

    vivek committed Mar 25, 2015
Commits on Mar 23, 2015
  1. Fixed bug in CreateUrlDatumFromOutlinksFunction where it wasn't setting up the fetched status and last fetched fields.

    Thanks to Al Hendry for finding this bug and providing the fix.
    vivek committed Mar 23, 2015
Commits on Mar 21, 2015
  1. Merge branch 'master' into version-0.9

    vivek committed Mar 21, 2015
  2. Updated the year in the copyright boilerplate.

    vivek committed Mar 21, 2015
  3. Set up the version-0.9 branch for doing a release.

    Switched to cascading.utils 2.6.0
    Added assertPathExists to BixoPlatform.
    vivek committed Mar 21, 2015
Commits on Mar 16, 2015
  1. Update for Hadoop 2.4.1

    kkrugler committed Mar 16, 2015
Commits on Apr 18, 2014
  1. Merge in pull request

    kkrugler committed Apr 18, 2014
  2. Merge with GitHub

    kkrugler committed Apr 18, 2014
Commits on Apr 8, 2014
  1. Merge pull request #1 from bixo/master

    Update
    rockwalrus committed Apr 8, 2014
Commits on Feb 10, 2014
  1. Update version of cascading to 2.2.1

    vivek committed Feb 10, 2014
  2. Merge pull request #61 from rockwalrus/master

    Declare SubAssembly input pipes.
    vmagotra committed Feb 10, 2014
  3. Minor cleanup to test page content

    vivek committed Feb 10, 2014