NPE in elasticsearch CollapsingSpout #595

Closed
noerw opened this Issue Aug 2, 2018 · 6 comments

noerw commented Aug 2, 2018

com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout throws an NPE because lastDate is null:

https://github.com/DigitalPebble/storm-crawler/blob/1.10/external/elasticsearch/src/main/java/com/digitalpebble/stormcrawler/elasticsearch/persistence/CollapsingSpout.java#L187

To reproduce, just configure an empty or nonexistent status index.

v1.9 is also affected. AggregationSpout might be affected as well.
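
For illustration, a minimal sketch of the kind of guard that would avoid the dereference; the field name, method and query construction below are assumptions based on the linked line, not the actual storm-crawler code:

```java
import java.util.Date;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

class LastDateGuardSketch {
    // Hypothetical guard, not the actual CollapsingSpout code: with an empty or
    // nonexistent status index lastDate was never set, so dereferencing it
    // (e.g. lastDate.getTime()) throws the NPE. Skip the date filter instead.
    static BoolQueryBuilder buildQuery(Date lastDate) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
        if (lastDate != null) {
            queryBuilder.filter(QueryBuilders.rangeQuery("nextFetchDate")
                    .lte(lastDate.getTime()));
        }
        return queryBuilder;
    }
}
```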

noerw added a commit to 52North/ecmwf-dataset-crawl that referenced this issue Aug 2, 2018

jnioche added this to the 1.11 milestone Aug 3, 2018

jnioche (Member) commented Aug 3, 2018

It happens with an empty status index. If the index does not exist at all we get

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index]

which is normal and expected. The init script must be called prior to running the topology.
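
If it helps, a quick way to verify that the status index actually exists before submitting the topology could look like this (just a sketch using the ES low-level REST client; the index name status and the host localhost:9200 are assumptions, not necessarily your configuration):

```java
import org.apache.http.HttpHost;

import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseException;
import org.elasticsearch.client.RestClient;

public class StatusIndexCheck {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            int code;
            try {
                // HEAD on the index name: 200 when it exists, 404 when it does not
                Response response = client.performRequest("HEAD", "/status");
                code = response.getStatusLine().getStatusCode();
            } catch (ResponseException e) {
                code = e.getResponse().getStatusLine().getStatusCode();
            }
            System.out.println(code == 200
                    ? "status index exists"
                    : "status index missing, run the init script first");
        }
    }
}
```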

The issue happens with the CollapsingSpout only, not with the AggregationSpout. I'll push a fix as soon as GitHub is working again; I can't pull or push at the moment.

jnioche added a commit that referenced this issue Aug 3, 2018

jnioche (Member) commented Aug 3, 2018

Thanks for reporting it @noerw.
Please give the master branch a try.

jnioche closed this Aug 3, 2018

noerw (Author) commented Aug 3, 2018

Thanks for the fix!
I used a wildcard for the status index (crawlstatus-*); in that case there is no index_not_found_exception when nothing matches, just a valid response with zero hits (ES 6.2).
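
For reference, the same behaviour can be reproduced outside the topology with the high-level client, e.g. along these lines (a sketch assuming ES 6.2 on localhost:9200 and a wildcard pattern that matches no index):

```java
import org.apache.http.HttpHost;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class WildcardSearchCheck {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // A wildcard that matches no index does not raise
            // index_not_found_exception; the search returns 200 with zero hits.
            SearchResponse response = client.search(new SearchRequest("crawlstatus-*"));
            System.out.println("hits: " + response.getHits().getTotalHits());
        }
    }
}
```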

jcruzmartini (Contributor) commented Aug 3, 2018

Hi @jnioche, we have been dealing with something similar in our project these last few days. It is the same scenario: like @noerw we are also using status*, and when no status index has been created yet we get an exception in StatusMetricsBolt:

87152 [Thread-62-status_metrics-executor[24 24]] INFO  c.d.s.e.m.StatusMetricsBolt - Multiquery returned in 279 msec
87164 [Thread-62-status_metrics-executor[24 24]] ERROR o.a.s.util - Async loop died!
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:522) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:487) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:74) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.daemon.executor$fn__10795$fn__10808$fn__10861.invoke(executor.clj:861) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) [storm-core-1.2.2.jar:1.2.2]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.lang.NullPointerException
    at com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt.execute(StatusMetricsBolt.java:145) ~[stormcrawler-1.1.0.jar:?]
    at org.apache.storm.daemon.executor$fn__10795$tuple_action_fn__10797.invoke(executor.clj:739) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.daemon.executor$mk_task_receiver$fn__10716.invoke(executor.clj:471) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.disruptor$clojure_handler$reify__10135.onEvent(disruptor.clj:41) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:509) ~[storm-core-1.2.2.jar:1.2.2]
    ... 6 more

If you think it would add value to the project, I can create a PR fixing the NPE that is thrown when no index has been created yet.
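
The idea would be a guard roughly like the one below (just a sketch of the kind of check such a PR could add, not the actual StatusMetricsBolt code; it assumes the bolt reads the per-status hit counts from a MultiSearchResponse):

```java
import org.elasticsearch.action.search.MultiSearchResponse;

class StatusCountGuardSketch {
    // Hypothetical guard, not the actual StatusMetricsBolt code: when no status
    // index exists yet, skip items whose response is missing or failed instead
    // of dereferencing them, which is what currently throws the NPE.
    static void countStatuses(MultiSearchResponse multiSearchResponse) {
        for (MultiSearchResponse.Item item : multiSearchResponse.getResponses()) {
            if (item == null || item.isFailure() || item.getResponse() == null) {
                continue; // no matching index yet, or this query failed
            }
            long count = item.getResponse().getHits().getTotalHits();
            // ... report the count for this status value
        }
    }
}
```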

To add more information to @noerw's scenario: we also get an NPE in AggregationSpout when using status*, but the spout does not crash, so it behaves more or less like a warning:

11018 [I/O dispatcher 2] ERROR c.d.s.e.p.AggregationSpout - Exception with ES query
java.io.IOException: Unable to parse response body for Response{requestLine=POST /status*/status/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A6&search_type=query_then_fetch&batched_reduce_size=512 HTTP/1.1, host=http://localhost:9200, response=HTTP/1.1 200 OK}
    at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:582) [stormcrawler-1.1.0.jar:?]
    at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:621) [stormcrawler-1.1.0.jar:?]
    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:375) [stormcrawler-1.1.0.jar:?]
    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:366) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:123) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [stormcrawler-1.1.0.jar:?]
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) [stormcrawler-1.1.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25]
Caused by: java.lang.NullPointerException
Thanks in advance

jnioche (Member) commented Aug 3, 2018

Hi @jcruzmartini
StatusMetricsBolt -> yes please, a PR would be welcome. I'll try to reproduce the problem.
AggregationSpout -> do you have the source of the NullPointerException further down in the logs? Could you please open a separate issue for it?
Thanks

jcruzmartini (Contributor) commented Aug 3, 2018

Sure! Thanks @jnioche
