NUTCH-2184 Enable IndexingJob to function with no crawldb #95
* Implementation of {@link org.apache.hadoop.mapred.Reducer}
* which generates {@link org.apache.nutch.indexer.NutchIndexAction}'s
* from combinations of various Nutch data structures. Essentially
* teh result is a key representing a URL and a value representing a
Typo: teh -> the.
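The Javadoc above describes a reducer that combines several Nutch data structures per URL. As a rough illustration of why indexing can work without a CrawlDb, here is a minimal pure-Java sketch. It assumes hypothetical `Part` and `IndexAction` types standing in for Nutch's real `CrawlDatum`, `ParseData` and `NutchIndexAction` classes, and treats the crawldb entry as optional while fetch and parse data remain required. It is not the Nutch implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified sketch: Part and IndexAction are hypothetical stand-ins,
// NOT the real Nutch classes.
final class IndexCombiner {

  /** One input value for a URL: where it came from plus a payload. */
  record Part(String source, String payload) {}

  /** Hypothetical stand-in for an index action: URL plus merged fields. */
  record IndexAction(String url, Map<String, String> fields) {}

  /**
   * Combines all parts seen for one URL. The crawldb part is optional:
   * when it is missing we still index, as long as fetch and parse data exist.
   */
  static Optional<IndexAction> reduce(String url, List<Part> parts) {
    String db = null, fetch = null, parse = null;
    for (Part p : parts) {
      switch (p.source()) {
        case "crawldb" -> db = p.payload();
        case "fetch" -> fetch = p.payload();
        case "parse" -> parse = p.payload();
        default -> { } // unknown sources are ignored in this sketch
      }
    }
    if (fetch == null || parse == null) {
      return Optional.empty(); // nothing fetched or parsed: skip this URL
    }
    Map<String, String> fields = new HashMap<>();
    fields.put("fetch", fetch);
    fields.put("parse", parse);
    if (db != null) {
      fields.put("crawldb", db); // only present when a crawldb was supplied
    }
    return Optional.of(new IndexAction(url, fields));
  }

  public static void main(String[] args) {
    List<Part> noDb = List.of(new Part("fetch", "200"), new Part("parse", "title=Home"));
    System.out.println(reduce("http://example.com/", noDb).isPresent()); // true
  }
}
```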
@lewismc, what's the status of this patch? Are we close to merging?
Ping @lewismc, any status on this?
I think I will fire a PR up today.
Lewis
(In reply to Chris Mattmann, notifications@github.com, Sun, Mar 27, 2016 at 11:56 AM)
@lewismc, happy to review anything here later tonight if you have it. Cheers aye
@lewismc, ping me when you're ready on this; happy to get this sorted.
Also looking forward to this. Anything I can do to help get it into 1.12?
This patch is the one from January 2016, but it has been updated at apache#95, so once 1.12 is released, check whether the PR made it in.
Option noCommitOpt = OptionBuilder
    .withArgName("noCommit")
    .withDescription(
        "do the commits once and for all the reducers in one go (optional)")
The description for -noCommit is backward: this option tells the Indexer not to do a final commit after the job finishes.
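To make the intended -noCommit semantics concrete, here is a small sketch. It assumes a hypothetical `Writer` class and a simplified argument loop rather than Nutch's actual indexer plugin API: the job writes every document and then issues one final commit, unless -noCommit was passed.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of -noCommit semantics. Writer and the argument loop are
// hypothetical; they are not Nutch's IndexWriter API.
final class NoCommitSketch {

  /** Hypothetical index writer that just records the calls made on it. */
  static final class Writer {
    final List<String> calls = new ArrayList<>();
    void write(String doc) { calls.add("write:" + doc); }
    void commit() { calls.add("commit"); }
  }

  /** Help text matching what the flag does: it suppresses the final commit. */
  static final String NO_COMMIT_HELP =
      "do not commit to the index after the job finishes (optional)";

  /** Runs the "job" and commits once at the end unless -noCommit was given. */
  static Writer run(String[] args, List<String> docs) {
    boolean noCommit = false;
    for (String a : args) {
      if (a.equals("-noCommit")) {
        noCommit = true;
      }
    }
    Writer w = new Writer();
    for (String d : docs) {
      w.write(d);
    }
    if (!noCommit) {
      w.commit(); // single final commit after all documents are written
    }
    return w;
  }

  public static void main(String[] args) {
    System.out.println(run(new String[] {"-noCommit"}, List.of("a")).calls); // [write:a]
    System.out.println(run(new String[] {}, List.of("a")).calls); // [write:a, commit]
  }
}
```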
Hi @naegelejd and @chrismattmann, the PR has conflicts.
@naegelejd, are you able to rebase and provide a test? If not, I can go back to my branch.
I can't promise anything. I'm not familiar with MRUnit yet, but I may find time soon to continue work on this, in addition to a handful of other issues I'm hoping to fix or get merged soon.
- make the CrawlDb argument passed to the indexing job optional
- improve command-line help
- pick various improvements from PR apache#95
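The first item above (an optional CrawlDb argument) can be illustrated with a small argument-parsing sketch. It assumes a hypothetical `-crawldb <path>` option, so that every other non-flag argument is treated as a segment; the option syntax Nutch actually adopted may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch of an indexing command line where the CrawlDb is optional.
// "-crawldb <path>" is a hypothetical option name chosen for this sketch.
final class IndexArgs {
  final Optional<String> crawlDb;
  final List<String> segments;

  private IndexArgs(Optional<String> crawlDb, List<String> segments) {
    this.crawlDb = crawlDb;
    this.segments = segments;
  }

  static IndexArgs parse(String[] args) {
    String crawlDb = null;
    List<String> segments = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      if (args[i].equals("-crawldb")) {
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("-crawldb requires a path");
        }
        crawlDb = args[++i]; // consume the path that follows the flag
      } else if (args[i].startsWith("-")) {
        // other flags (e.g. -noCommit) would be handled here; ignored in this sketch
      } else {
        segments.add(args[i]); // any remaining positional argument is a segment
      }
    }
    if (segments.isEmpty()) {
      throw new IllegalArgumentException(
          "Usage: IndexArgs [-crawldb <crawldb>] <segment> ...");
    }
    return new IndexArgs(Optional.ofNullable(crawlDb), segments);
  }

  public static void main(String[] args) {
    IndexArgs a = parse(new String[] {"crawl/segments/s1", "-noCommit"});
    System.out.println(a.crawlDb.isPresent() + " " + a.segments); // false [crawl/segments/s1]
  }
}
```

Making the CrawlDb a named option rather than the first positional argument avoids having to guess whether a given path is a CrawlDb or a segment.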
Closed in favor of #486
OK folks, this issue addresses https://issues.apache.org/jira/browse/NUTCH-2184 by
If you have any questions, please let me know. I would really appreciate it if people could pull this code and try it out in their test or local environments.
Thanks, and thanks also to Markus for the original suggestions for tests, etc.