Skip to content

Conversation

@Omkar20895
Copy link
Contributor

  • Added a new method cleanupAfterFailure which was already existing in Injector.
  • I have added this only in one of the 25 instances of job.waitForCompletion(true), please go ahead and review it and I will incorporate the review comments in the upcoming commits/changes.

Confusion:

  • When a job failure happens do we need to do System.exit(non-zero status) or throw a runtime exception with a custom message specific to each class? I went ahead with the later one. Please suggest otherwise with reason.

Thanks.

Path outPath = FileOutputFormat.getOutputPath(job);
if (fs.exists(outPath))
fs.delete(outPath, true);
LOG.error("Crawl job failed ", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please always use parameterized logging e.g.
LOG.error("Crawl job failed {}", e);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK! I will incorporate this change and send a commit covering all the 25 instances. Thanks.

Copy link
Contributor

@sebastian-nagel sebastian-nagel Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

The CrawlDb cleanup method looks same as that from Injector: maybe move it to CrawlDb.java - there are already methods to lock and install the CrawlDb shared by various jobs (inject, update, merge, generate).

Copy link
Contributor Author

@Omkar20895 Omkar20895 Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not shift the sub routine cleanup to the class nutch/src/java/org/apache/nutch/util/NutchJob.java? This is a nutch job utils file, I think this would be more appropriate. I have cross checked that cleanup sub routine would be useful outside crawldb like in hostdb, indexer etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It's not specific to the CrawlDb because lock file and temp dir are passed explicitly.

@Omkar20895
Copy link
Contributor Author

I have added a new commit, please feel free to review it and let me know if any changes are required. I will squash the commits at the end.

Copy link
Contributor

@sebastian-nagel sebastian-nagel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 great job! One point needs a fix. Thanks!

+ job.getStatus().getFailureInfo();
LOG.error(message);
cleanupAfterFailure(tempCrawlDb, lock, fs);
CrawlDb.cleanupAfterFailure(tempCrawlDb, lock, fs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be NutchJob.cleanupAfterFailure(...)

}
LockUtil.removeLockFile(fs, lock);
} catch (IOException e) {
CrawlDb.cleanupAfterFailure(tempCrawlDb, lock, fs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this, Thanks.

+ job.getStatus().getFailureInfo();
LOG.error(message);
NutchJob.cleanupAfterFailure(tempDir, lock, fs);
NutchJob.cleanupAfterFailure(tempDir2, lock, fs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it's not an error to try to remove the lock file a second time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No @sebastian-nagel. AFAIK if I try to remove a lock file that does not exist, the function would just return False.

@sebastian-nagel
Copy link
Contributor

+1 Thanks, @Omkar20895! I'll step forward and commit the fix.

@Omkar20895
Copy link
Contributor Author

Thanks @sebastian-nagel :)

sebastian-nagel added a commit that referenced this pull request Apr 4, 2018
NUTCH-2518 Cleaning up the file system after a job failure.
@sebastian-nagel
Copy link
Contributor

Squashed, rebased and committed in 8682b96. Thanks again, @Omkar20895!

@lewismc
Copy link
Member

lewismc commented Apr 10, 2018

This is ready to be merged @sebastian-nagel ?

@sebastian-nagel
Copy link
Contributor

sebastian-nagel commented Apr 11, 2018

That's already merged but with a rebase and squash done in command-line. Sorry, that's not automatically detected by github and I forgot to manually close this PR. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants