Commits on Apr 12, 2018
  1. Merge pull request #312 from sebastian-nagel/NUTCH-2533-npe-inject-urldir-with-non-files-2x

    sebastian-nagel committed Apr 12, 2018
    
    NUTCH-2533 Injector: NullPointerException if seed URL dir contains non-file entries
Commits on Apr 11, 2018
  1. NUTCH-2533 Injector: NullPointerException if seed URL dir contains non-file entries

    sebastian-nagel committed Apr 11, 2018
    
    - read directory explicitly and log all non-file entries
    - exit early if no seed URL files are present
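
The two changes listed for NUTCH-2533 can be illustrated with a minimal sketch. It is not the Injector code: it uses `java.nio.file` as a stand-in for whatever file system API the Injector actually uses, and the class name `SeedDirScan` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch.
class SeedDirScan {

  /** Reads the directory explicitly; keeps regular files, logs everything else. */
  static List<Path> listSeedFiles(Path seedDir) throws IOException {
    List<Path> seedFiles = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(seedDir)) {
      for (Path entry : entries) {
        if (Files.isRegularFile(entry)) {
          seedFiles.add(entry);
        } else {
          // Sub-directories and other non-file entries are reported, not read.
          System.err.println("Skipping non-file entry in seed dir: " + entry);
        }
      }
    }
    return seedFiles;
  }

  /** Returns -1 without doing any work when there is nothing to inject. */
  static int run(Path seedDir) throws IOException {
    List<Path> seedFiles = listSeedFiles(seedDir);
    if (seedFiles.isEmpty()) {
      System.err.println("No seed files found in " + seedDir + ", nothing to inject");
      return -1;
    }
    // The injection job itself would be configured and started here.
    return 0;
  }
}
```

Filtering before the job starts means a stray sub-directory produces a log line instead of a failure later on.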
Commits on Apr 9, 2018
  1. Merge pull request #308 from rustyx/NUTCH-2548

    sebastian-nagel committed Apr 9, 2018
    fix for NUTCH-2548 contributed by rustyx
  2. NUTCH-2548 Compressed content skipped, contributed by Rustam

    sebastian-nagel committed Apr 9, 2018
    - do not store content length from HTTP header if content was compressed
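
A `Content-Length` header on a compressed response gives the size of the encoded bytes, not of the decoded content, so storing it next to uncompressed content is misleading. The sketch below shows one way to express the rule from this commit. It is not the Nutch protocol plugin code: `ContentLengthPolicy` is a hypothetical class, the header map is assumed to use canonical header names as keys, and the set of encodings treated as compressed is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not part of Nutch.
class ContentLengthPolicy {

  /** Assumption: these encodings count as "compressed" for this sketch. */
  static boolean isCompressed(String contentEncoding) {
    if (contentEncoding == null) {
      return false;
    }
    String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
    return enc.equals("gzip") || enc.equals("x-gzip") || enc.equals("deflate");
  }

  /** Returns the header length only when the body was not compressed. */
  static OptionalLong lengthToStore(Map<String, String> headers) {
    if (isCompressed(headers.get("Content-Encoding"))) {
      return OptionalLong.empty(); // header describes the compressed bytes
    }
    String raw = headers.get("Content-Length");
    if (raw == null) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(raw.trim()));
    } catch (NumberFormatException e) {
      return OptionalLong.empty(); // malformed header, nothing to store
    }
  }
}
```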
Commits on Mar 27, 2018
  1. Merge pull request #298 from benmvachon/NUTCH-2536

    lewismc committed Mar 27, 2018
    NUTCH-2536 change GeneratorReducer.count field to non-static variable…
Commits on Mar 16, 2018
  1. NUTCH-2536 change GeneratorReducer.count field to non-static variable for easier SDK experience

    Ben Vachon committed Mar 16, 2018
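
Why a static counter gets in the way when a class is embedded in other code can be seen in a small sketch. The classes below are hypothetical and only mirror the static versus instance distinction named in the commit title; they are not the GeneratorReducer code.

```java
// Hypothetical classes, not part of Nutch.
class StaticCounter {
  static long count = 0; // one value shared by every instance in the JVM

  void add() {
    count++;
  }
}

class InstanceCounter {
  long count = 0; // each instance keeps its own value

  void add() {
    count++;
  }
}
```

With the static field, two objects used in the same JVM increment the same number; with the instance field, each object starts from zero and stays independent.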
Commits on Dec 13, 2017
  1. Merge pull request #258 from lewismc/NUTCH-2438

    lewismc committed Dec 13, 2017
    NUTCH-2438 Upgrade Nutch 2.X to Gora 0.8
Commits on Dec 5, 2017
  1. NUTCH-2469 Documents not committed to Solr in Server mode

    sebastian-nagel committed Dec 5, 2017
    - applied patch contributed by Ninaad Joshi
  2. NUTCH-2468 should filter out invalid URLs by default

    sebastian-nagel committed Dec 5, 2017
    - enable plugin urlfilter-validate by default
Commits on Oct 24, 2017
  1. NUTCH-2448: Treat white-space http.agent.version as empty.

    YossiTamari authored and sebastian-nagel committed Oct 23, 2017
    (And do not append a slash to http.agent.name.)
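
The behaviour described in this commit can be sketched as follows. This is an illustration only: `AgentString.build` is a hypothetical helper, and the real agent string in Nutch may include further properties that are left out here.

```java
// Hypothetical helper, not part of Nutch.
class AgentString {

  /** Joins name and version with "/" unless the version is null or blank. */
  static String build(String agentName, String agentVersion) {
    String name = agentName == null ? "" : agentName.trim();
    String version = agentVersion == null ? "" : agentVersion.trim();
    if (version.isEmpty()) {
      return name; // no trailing slash when there is no version
    }
    return name + "/" + version;
  }
}
```

Under these assumptions, a version consisting only of spaces yields the bare agent name, the same result as an unset version.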
Commits on Oct 5, 2017
  1. fix for NUTCH-2438 contributed by tmzzngl

    Tulay Muezzinoglu committed Oct 5, 2017
Commits on Oct 4, 2017
  1. Merge pull request #228 from tulay/NUTCH-2437

    lewismc committed Oct 4, 2017
    fix for NUTCH-2437 contributed by tmzzngl
  2. fix for NUTCH-2437 contributed by tmzzngl

    Tulay Muezzinoglu committed Oct 4, 2017
Commits on Sep 11, 2017
  1. Merge pull request #198 from sebastian-nagel/NUTCH-2397-2x

    sebastian-nagel committed Sep 11, 2017
    NUTCH-2397: Parser to add paragraph line breaks
    - port solution from 1.x
Commits on Aug 18, 2017
  1. Merge pull request #214 from sebastian-nagel/NUTCH-2378-child-first-class-loader-2x

    sebastian-nagel committed Aug 18, 2017
    
    NUTCH-2378 ChildFirst plugin classloader
Commits on Aug 16, 2017
  1. NUTCH-2378 ChildFirst plugin classloader

    sebastian-nagel committed Aug 16, 2017
    - fix jsoup-extractor: all classes dynamically instantiated
      by JsoupDocumentReader and implementing an interface defined
      in the plugin must live in the plugin's class loader
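
For readers unfamiliar with the term, a child-first class loader looks in its own URLs before asking its parent, which reverses the default delegation order. The sketch below is a generic illustration of that technique, not the class loader added in NUTCH-2378. The class name is hypothetical, and delegating `java.*` names straight to the parent is an assumption of this sketch.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical class, not part of Nutch.
class ChildFirstClassLoader extends URLClassLoader {

  ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    // Core classes always come from the parent chain.
    if (name.startsWith("java.")) {
      return super.loadClass(name, resolve);
    }
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          c = findClass(name); // look in this loader's own URLs first
        } catch (ClassNotFoundException e) {
          c = super.loadClass(name, false); // then fall back to the parent
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
```

With this ordering, a class that exists both in the plugin's jars and on the parent class path is taken from the plugin, which is why classes implementing a plugin-defined interface need to be loaded by the same loader as that interface.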
Commits on Aug 9, 2017
  1. Merge pull request #209 from kaidul/2.x

    lewismc committed Aug 9, 2017
    Issues mentioned in NUTCH-2405 fixed
Commits on Aug 6, 2017
  1. NUTCH-2405 1. Missed root tag <extractor> added in jsoup-extractor.xml like jsoup-extractor-example.xml

    kaidul committed Aug 6, 2017
    
    2. jsoup API text() used instead of ownText() to get full contents under CSS selector
    3. <default> => <default-value> typo fixed
Commits on Jul 31, 2017
  1. Merge pull request #208 from kaidul/2.x

    lewismc committed Jul 31, 2017
    NUTCH-2404 Fix for failed Jenkins build #1588 after merging pull request #192 (NUTCH-2389).
Commits on Jul 30, 2017
  1. Merge pull request #192 from kaidul/NUTCH-2389

    lewismc committed Jul 30, 2017
    NUTCH-2389 Precise data extractor implemented for 2.x