
fscrawler does not recover when it lost communication with elasticsearch #255

Closed
twindragons1987 opened this issue Dec 27, 2016 · 6 comments


twindragons1987 commented Dec 27, 2016

Hi,
I am trying to run FSCrawler on my local machine to read data extracted by the readpst command and send it to Elasticsearch. It works fine for some iterations, then produces the following error and does not recover. Can someone suggest whether I am doing something wrong?

17:06:22,255 DEBUG [f.p.e.c.f.FsCrawlerImpl] Looking for removed files in [/MY_PATH/]...
17:06:22,255 TRACE [f.p.e.c.f.FsCrawlerImpl] Querying elasticsearch for files in dir [path.encoded:7fbc1f13665e5067aa6550ad5b4a6ba5]
17:06:22,255 WARN  [f.p.e.c.f.FsCrawlerImpl] Error while indexing content from [java.io.IOException: no active node found. Start an elasticsearch cluster first! Expecting something running at [localhost:9200], /MY_PATH]
17:06:22,255 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler is going to sleep for 15m
17:06:26,295 DEBUG [f.p.e.c.f.c.BulkProcessor] Going to execute new bulk composed of 14 actions
17:06:26,295 WARN  [f.p.e.c.f.c.BulkProcessor] Error executing bulk
java.io.IOException: no active node found. Start an elasticsearch cluster first! Expecting something running at [localhost:9200]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.findNextNode(ElasticsearchClient.java:114) ~[fscrawler-2.1.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.bulk(ElasticsearchClient.java:243) ~[fscrawler-2.1.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.BulkProcessor.execute(BulkProcessor.java:136) ~[fscrawler-2.1.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.BulkProcessor.executeWhenNeeded(BulkProcessor.java:123) ~[fscrawler-2.1.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_111]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_111]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_111]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
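The trace shows the bulk processor abandoning the batch on the first IOException instead of retrying. A minimal sketch of the kind of retry-with-backoff loop that lets a client survive a temporary outage; all names here (FlakyBulk, retryBulk) are illustrative, not FSCrawler's actual API:

```java
import java.io.IOException;

public class BulkRetrySketch {
    /** Simulates a bulk request that fails until the "cluster" comes back. */
    static class FlakyBulk {
        private int failuresLeft;
        FlakyBulk(int failures) { this.failuresLeft = failures; }
        void execute() throws IOException {
            if (failuresLeft-- > 0) {
                throw new IOException("no active node found. Expecting something running at [localhost:9200]");
            }
        }
    }

    /** Retries the bulk with capped exponential backoff; returns the attempt that succeeded. */
    static int retryBulk(FlakyBulk bulk, int maxAttempts) {
        long delayMs = 100;
        for (int attempt = 1; ; attempt++) {
            try {
                bulk.execute();
                return attempt;                       // success: stop retrying
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    // Only give up once the retry budget is exhausted
                    throw new RuntimeException("giving up after " + maxAttempts + " attempts", e);
                }
                try {
                    Thread.sleep(delayMs);            // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
                delayMs = Math.min(delayMs * 2, 5_000); // double the delay, capped at 5 s
            }
        }
    }

    public static void main(String[] args) {
        // Cluster is "down" for the first two bulks, then recovers.
        System.out.println("succeeded on attempt " + retryBulk(new FlakyBulk(2), 5));
    }
}
```

With two simulated failures and a budget of five attempts, the loop succeeds on the third try instead of dropping the batch.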
dadoonet (Owner)

In 2.2, I changed from the internal home-made REST client to the official REST client (#203).
It might have better exception handling.

Any chance you can try the 2.2-SNAPSHOT version?

dadoonet self-assigned this Dec 27, 2016
dadoonet added the bug (For confirmed bugs) label Dec 27, 2016
dadoonet added this to the 2.2 milestone Dec 27, 2016
dadoonet changed the title from "fscrawler crashes and does not recover" to "fscrawler does not recover when it lost communication with elasticsearch" Dec 27, 2016
dadoonet (Owner)

Closing for now. Feel free to reopen if you see this problem again with 2.2.

twindragons1987 (Author)

Can you please provide me with a link to 2.2?


c4tom commented Mar 23, 2017

I may have the same problem (I am using the latest version).

09:21:45,826 WARN  [f.p.e.c.f.c.BulkProcessor] Error executing bulk
java.io.IOException: listener timeout after waiting for [30000] ms
	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:617) ~[jar:rsrc:rest-5.2.2.jar!/:?]
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[jar:rsrc:rest-5.2.2.jar!/:?]
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:184) ~[jar:rsrc:rest-5.2.2.jar!/:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.bulk(ElasticsearchClient.java:149) ~[rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.client.BulkProcessor.execute(BulkProcessor.java:157) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.client.BulkProcessor.executeIfNeeded(BulkProcessor.java:137) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.client.BulkProcessor.internalAdd(BulkProcessor.java:130) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.client.BulkProcessor.add(BulkProcessor.java:118) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.client.BulkProcessor.add(BulkProcessor.java:86) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl$FSParser.esIndex(FsCrawlerImpl.java:719) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl$FSParser.esIndex(FsCrawlerImpl.java:703) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl$FSParser.indexFile(FsCrawlerImpl.java:596) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl$FSParser.addFilesRecursively(FsCrawlerImpl.java:386) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl$FSParser.addFilesRecursively(FsCrawlerImpl.java:405) [rsrc:./:?]
	at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl$FSParser.run(FsCrawlerImpl.java:273) [rsrc:./:?]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
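The [30000] ms figure in this trace is the default max retry timeout of the 5.x low-level Elasticsearch REST client. As a point of reference, that client's builder lets you raise it; this is a sketch of the underlying RestClient API, not an FSCrawler configuration option, since FSCrawler constructs its client internally:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

// Illustrative only: raise the listener timeout from the 30000 ms default
// so slow bulk requests are not abandoned while elasticsearch is busy.
RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http"))
        .setMaxRetryTimeoutMillis(60_000)  // default is 30000 ms, matching the log
        .build();
```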

dadoonet (Owner)

@candido1212 In which context is this happening? Was FSCrawler running and indexing fine, and then it suddenly started to fail?
Is there anything in the elasticsearch logs?

If elasticsearch restarts, is FSCrawler able to recover?

It would help a lot if you could provide more details. Feel free to open another issue.
