Commits on May 12, 2010
  1. Nutch a tlp, moving svn

    git-svn-id: https://svn.apache.org/repos/asf/nutch/tags/release-0.8.1@943363 13f79535-47bb-0310-9956-ffa450edef68
    gmcdonald committed May 12, 2010
Commits on Sep 24, 2006
  1. Nutch 0.8.1 release

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/tags/release-0.8.1@449369 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Sep 24, 2006
  2. preparing for 0.8.1 release

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@449365 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Sep 24, 2006
  3. preparing for 0.8.1 release

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@449364 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Sep 24, 2006
Commits on Sep 23, 2006
  1. NUTCH-336: differentiate between newly discovered pages (known value through

    inlink contributions) and newly injected pages (arbitrarily defined initial
    value).
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@449279 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Sep 23, 2006
Commits on Sep 22, 2006
  1. NUTCH-332: fix the problem of doubling scores caused by links pointing

    to the current page (e.g. anchors).
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@449100 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Sep 22, 2006
  2. Use a CombiningCollector when calculating readdb -stats. This drastically

    reduces the size of intermediate data, resulting in significant speedups
    for large databases.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@449097 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Sep 22, 2006
Commits on Sep 19, 2006
  1. NUTCH-105 - Network error during robots.txt fetch causes file to be ignored, contributed by Greg Kim

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@447867 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Sep 19, 2006
Commits on Sep 18, 2006
Commits on Aug 19, 2006
  1. NUTCH-338 - Remove the text parser as an option for parsing PDF files in parse-plugins.xml (Chris A. Mattmann)

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@432794 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Aug 19, 2006
Commits on Aug 18, 2006
  1. NUTCH-341 - if -workingdir is specified, always create a unique subdir.

    Also, use unique directory names to allow multiple IndexMergers to run
    simultaneously.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@432675 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Aug 18, 2006
Commits on Aug 17, 2006
  1. Update CHANGES.

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@432290 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Aug 17, 2006
  2. Apply patch in NUTCH-348 - Generator used the lowest score instead of

    the highest. Contributed by Chris Schneider and Stefan Groschupf.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@432287 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Aug 17, 2006
Commits on Aug 14, 2006
  1. Fix incorrect calculation of max and min scores in readdb -stats. Spotted

    by Chris Schneider.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@431368 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Aug 14, 2006
  2. Apply patches in rev 431364.

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@431366 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Aug 14, 2006
Commits on Aug 11, 2006
Commits on Aug 8, 2006
  1. NUTCH-260 - update hadoop.jar

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@429769 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Aug 8, 2006
Commits on Jul 27, 2006
  1. logging changes to 0.8 branch also

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@426118 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Jul 27, 2006
Commits on Jul 25, 2006
  1. Nutch 0.8 release maintenance branch.

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8@425492 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Jul 25, 2006
  2. Change the name of SegmentReader alias to 'readseg' for consistency with other

    reading-related commands. Keep the old 'segread' for compatibility, and
    give a deprecation message.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@425354 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Jul 25, 2006
  3. preparing 0.8 release

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@425324 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Jul 25, 2006
  4. preparing 0.8 release

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@425321 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Jul 25, 2006
Commits on Jul 24, 2006
  1. Even if a filter doesn't make any adjustments, each one should still return

    the input value, which other filters may have modified.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@425087 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Jul 24, 2006
  2. Expire all finished addresses. When sites request long crawl delays

    this quickly ties down all threads, and lock expiration happens
    rarely and proceeds too slowly to remove all expired entries.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@425071 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Jul 24, 2006
  3. Fix an NPE.

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@425042 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Jul 24, 2006
  4. Set job names (NUTCH-329).

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@424965 13f79535-47bb-0310-9956-ffa450edef68
    sigram committed Jul 24, 2006
Commits on Jul 23, 2006
  1. NUTCH-328 update commons-cli-2.0-SNAPSHOT.jar

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@424784 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Jul 23, 2006
  2. NUTCH-327 fix log path under cygwin

    git-svn-id: https://svn.apache.org/repos/asf/lucene/nutch/trunk@424779 13f79535-47bb-0310-9956-ffa450edef68
    siren committed Jul 23, 2006