
Bug - crawl job hangs #478

Closed
danizen opened this issue Mar 30, 2018 · 2 comments
Labels: stale (from automation, when inactive for too long)

Comments

@danizen

danizen commented Mar 30, 2018

In #477, I diagnosed a chain of problems: my crawl job hit a fatal OutOfMemoryError, and a later attempt to stop the collector failed because the JVM being stopped would not exit.

It seems likely that the crawler job had already reached a terminal state, but the shutdown code kept waiting for it to stop cleanly even though it had actually failed.

The exception that produced this state was:

FATAL [JobSuite] Fatal error occured in job: monitor_lessdepth_crawler
INFO  [JobSuite] Running monitor_lessdepth_crawler: END (Tue Feb 06 17:13:41 EST 2018)
Exception in thread "monitor_lessdepth_crawler" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.lang.StringCoding.safeTrim(StringCoding.java:79)
        at java.lang.StringCoding.encode(StringCoding.java:365)
        at java.lang.String.getBytes(String.java:941)
        at org.apache.http.entity.StringEntity.<init>(StringEntity.java:70)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:589)
        at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
        at com.norconex.committer.core.AbstractBatchCommitter.commitComplete(AbstractBatchCommitter.java:159)
        at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:233)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commit(ElasticsearchCommitter.java:537)
        at com.norconex.collector.core.crawler.AbstractCrawler.execute(AbstractCrawler.java:274)
        at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:228)
        at com.norconex.collector.core.crawler.AbstractCrawler.resumeExecution(AbstractCrawler.java:190)
        at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:51)
        at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:355)
        at com.norconex.jef4.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:119)
        at com.norconex.jef4.job.group.AsyncJobGroup.access$000(AsyncJobGroup.java:44)
        at com.norconex.jef4.job.group.AsyncJobGroup$1.run(AsyncJobGroup.java:86)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

For me, reducing the Elasticsearch committer's commitSize resolved the problem, but it is still worth preventing crawl jobs from hanging.

@essiembre
Contributor

OOM errors often can't be recovered from and as such cannot be handled reliably. The JVM application state is already compromised the moment you get one, so killing the process and restarting with more memory is usually the best approach.

Still, if you want to prevent hangs, the best option is likely the JVM's ability to run a kill command (or equivalent) when an OutOfMemoryError occurs, from the Oracle JVM documentation:

-XX:OnOutOfMemoryError="<cmd args>; <cmd args>"
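For example (a sketch only: the kill command, heap size, classpath, and main class below are illustrative rather than taken from the actual launch script; %p is replaced by the JVM with its own process id):

java -Xmx2g -XX:OnOutOfMemoryError="kill -9 %p" -cp "./lib/*:./classes" com.norconex.collector.http.HttpCollector "$@"

This forcibly terminates the JVM as soon as the error is raised, so a hung stop request never comes into play.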

As of Java 8u92, you can also use these JVM arguments (described here):

-XX:+ExitOnOutOfMemoryError
-XX:+CrashOnOutOfMemoryError

The next major release will require Java 8, so the launch scripts shipped with the collector may be modified to include one of the Java 8 arguments.
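For example, adding the Java 8 flag to the java invocation in a launch script might look like this (a sketch; only the -XX flag itself comes from the JVM documentation, the rest of the command line is a placeholder):

java -XX:+ExitOnOutOfMemoryError -cp "./lib/*:./classes" com.norconex.collector.http.HttpCollector "$@"

With -XX:+CrashOnOutOfMemoryError instead, the JVM also produces its usual crash artifacts (hs_err log and, where enabled, a core dump) before terminating, which can help with post-mortem analysis.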

@stale

stale bot commented Aug 1, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Aug 1, 2021
stale bot closed this as completed on Aug 8, 2021