NUTCH-2793 indexer-csv: make it work in distributed mode #534

Draft
wants to merge 1 commit into
base: master

Commits on Jun 10, 2020

  1. NUTCH-2793 indexer-csv: make it work in distributed mode

    Before the change, the output file name was hard-coded to "nutch.csv".
    When running in distributed mode, multiple reducers would clobber each
    other's output.
    
    After the change, the file name is taken from the first open(cfg, name)
    initialization call, where name is a unique file name generated by
    IndexerOutputFormat, derived from Hadoop FileOutputFormat. The CSV files
    are now named like part-r-000xx.
    pmezard committed Jun 10, 2020
    1680346
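To illustrate the idea in the commit message, here is a minimal, self-contained sketch of a writer that takes its output file name from the first `open` call instead of a hard-coded `nutch.csv`. It is not the Nutch implementation: the class `CsvPartWriterSketch`, its methods, and the local temp directory are all hypothetical stand-ins, and the part names are passed in by hand rather than generated by `IndexerOutputFormat`.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a CSV index writer; not the actual Nutch class. */
final class CsvPartWriterSketch implements AutoCloseable {
    private final Path outputDir;
    private Path file;   // set once, by the first open() call
    private Writer out;

    CsvPartWriterSketch(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Opens the output under the caller-supplied name; later calls are ignored. */
    void open(String name) throws IOException {
        if (out != null) {
            return; // the file name was already fixed by the first open()
        }
        Files.createDirectories(outputDir);
        file = outputDir.resolve(name);
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Writes one row; no CSV quoting or escaping is done in this sketch. */
    void write(String... fields) throws IOException {
        if (out == null) {
            throw new IllegalStateException("open() must be called before write()");
        }
        out.write(String.join(",", fields));
        out.write("\n");
    }

    Path file() {
        return file;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("csv-sketch");
        // Two "reducers" writing to the same directory under distinct names.
        try (CsvPartWriterSketch r0 = new CsvPartWriterSketch(dir);
             CsvPartWriterSketch r1 = new CsvPartWriterSketch(dir)) {
            r0.open("part-r-00000");
            r1.open("part-r-00001");
            r0.write("1", "http://example.org/a");
            r1.write("2", "http://example.org/b");
            System.out.println(r0.file());
            System.out.println(r1.file());
        }
    }
}
```

Because each writer resolves its own name, two writers sharing an output directory end up with separate files, which is the property the change relies on when several reducers run at once.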