NUTCH-2700 Indexchecker: improve command-line help#446

Merged
sebastian-nagel merged 1 commit into apache:master from sebastian-nagel:NUTCH-2700-indexchecker-cmd-line-help
Apr 11, 2019

@sebastian-nagel

... and add the option -doIndex to pass the "checked" document to the index writers (the property doIndex is kept to ensure backward compatibility):

% bin/nutch indexchecker
Usage:
  IndexingFiltersChecker [OPTIONS] <url>
    Fetch single URL and index it
  IndexingFiltersChecker [OPTIONS] -stdin
    Read URLs to be indexed from stdin
  IndexingFiltersChecker [OPTIONS] -listen <port> [-keepClientCnxOpen]
    Listen on <port> for URLs to be indexed
Options:
  -D<property>=<value>  set/overwrite Nutch/Hadoop properties
                        (a generic Hadoop option to be passed
                         before other command-specific options)
  -normalize            normalize URLs
  -followRedirects      follow redirects when fetching URL
  -dumpText             show the entire plain-text content,
                        not only the first 100 characters
  -doIndex              pass document to configured index writers
                        and let them index it
  -md <key>=<value>     metadata added to CrawlDatum before parsing
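
To illustrate how the new `-doIndex` option can coexist with the legacy `doIndex` property, here is a minimal, self-contained sketch. It is not the Nutch implementation: the class `DoIndexFlagSketch` and the method `resolveDoIndex` are hypothetical, `java.util.Properties` stands in for the Hadoop configuration, and the default of `false` for the property is an assumption.

```java
import java.util.Properties;

/** Hypothetical sketch: a command-line flag that complements a legacy property. */
public class DoIndexFlagSketch {

  /**
   * Decides whether the checked document should be passed to the index writers.
   * The legacy property "doIndex" is read first (assumed default: false), then
   * the command-line option "-doIndex" can switch indexing on.
   */
  static boolean resolveDoIndex(String[] args, Properties conf) {
    // Backward-compatible path: honor the property if it was set, e.g. via -DdoIndex=true
    boolean doIndex = Boolean.parseBoolean(conf.getProperty("doIndex", "false"));
    for (String arg : args) {
      // New path: the explicit command-line option
      if ("-doIndex".equals(arg)) {
        doIndex = true;
      }
    }
    return doIndex;
  }

  public static void main(String[] args) {
    Properties empty = new Properties();
    Properties legacy = new Properties();
    legacy.setProperty("doIndex", "true");

    // Option given on the command line: prints "true"
    System.out.println(resolveDoIndex(new String[] {"-doIndex", "https://example.org/"}, empty));
    // Only the legacy property is set: prints "true"
    System.out.println(resolveDoIndex(new String[] {"https://example.org/"}, legacy));
    // Neither is set: prints "false"
    System.out.println(resolveDoIndex(new String[] {"https://example.org/"}, empty));
  }
}
```

With this shape, an invocation that uses the new option and one that still sets the property before the command-specific options both lead to the document being indexed.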

- add option `-doIndex` to pass the "checked" document to the index writers
  (the property `doIndex` is kept to ensure backward compatibility)
sebastian-nagel merged commit 510a4ea into apache:master on Apr 11, 2019
sebastian-nagel deleted the NUTCH-2700-indexchecker-cmd-line-help branch on April 11, 2019 10:47