River does not start #4089

jprante · 2013-11-05T15:58:54Z

In 0.90.6, creating a river seems not to start the river.

Example debug log (for JDBC river)

https://gist.github.com/jprante/7321028

What am I doing wrong?

richardwilly98 · 2013-11-05T16:03:09Z

More details here [1]

[1] - https://groups.google.com/forum/#!topic/elasticsearch/hkBqWisL4UI

imotov · 2013-11-07T01:24:02Z

I briefly looked at this issue and it looks like the problem is caused by race condition in creation of river's _meta document and propagation of cluster state with the new river mapping. Basically, river creation and startup fails when RiversRouter tries to load _meta record for the new river mapping and this _meta document doesn't exist yet.

I can only reproduce it if the _river index doesn't exists or exists but doesn't have any types in it. After I create the first dummy river, all other rivers seem to work just fine.

richardwilly98 · 2013-11-07T11:47:29Z

Even with your suggestion in my "@BeforeSuite" method:

    private void registerDummyRiver() {
        if (!node.client().admin().indices().prepareExists("_river").get().isExists()) {
            node.client().admin().indices().prepareCreate("_river").get();
        }
        if (!node.client().prepareGet("_river", "my_dummy", "_meta").get().isExists()) {
            node.client().prepareIndex("_river", "my_dummy", "_meta").setSource("{ \"type\": \"dummy\" }").get();
        }
        refreshIndex("_river");
        Assert.assertTrue(node.client().prepareGet("_river", "my_dummy", "_meta").get().isExists());
    }

I still have tests failing randomly.

…isn't With elastic#3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started. What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway. Closes elastic#4089, elastic#3840

…isn't With #3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started. What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway. Closes #4089, #3840

javanna · 2013-11-10T20:04:17Z

I pushed a fix for this. Could you confirm it fixes the issue you are experiencing?

richardwilly98 · 2013-11-11T15:04:32Z

@javanna it definitely help in my integration test.
All test are now passing after a small change when the river is created.

In 0.90.5 it looks like node.client().prepareIndex("_river", "my_dummy", "_meta") and wait for 1 second was enough to ensure that the river was registered and started.

So as workaround I am checking a flag attribute set by river.start() method.

Would it make sense to have such feature in ES api?

javanna · 2013-11-11T18:38:13Z

I'm not sure I got where you put the 1 second wait with 0.90.5.

Anyways, with 0.90.6 there is a different problem, as we send the mapping update to the master node (when you create the river type by indexing the _meta document) before actually indexing the documen. The mapping update on the master should trigger the river start, but it needs the _meta document to be available, which is not the case, thus in most cases rivers won't get started.

My fix addresses this for rivers by scheduling a retry (actually a few of them just in case) if the _meta document isn't found. I think this should be enough, could you confirm this?

richardwilly98 · 2013-11-11T20:27:18Z

As I said all test are now passing so your fix addresses the issue.

I am just trying to understand how I could make sure from my integration test when the river is really started. Right now (and it was already the case before I assume) there is no way to tell that from the api so it has to be built in the river.

Is that correct?

Thanks a lot for the fix.

javanna · 2013-11-11T21:08:57Z

Ok thanks @richardwilly98 , got it, I just wanted to make sure no other workarounds are needed.

I got what you are asking, you can have a wait and check if something happened (for instance check if an index is expected to be created is there). You are right, you need to handle this in the river itself. Would be even better to have different retries and a maximum wait time. We are going to package our test classes as a separate jar with the next release, so that those classes can easily be used to test plugins; the method that you'd need in this case is ElasticsearchTestCase#awaitBusy.

jprante · 2013-11-12T09:29:33Z

Tests with JDBC river now pass with ES 0.90.7-SNAPSHOT.

Thanks @javanna !

richardwilly98 · 2013-11-17T14:10:56Z

@javanna unfortunately I am still having issues with ES 0.90.7 where rivers are not started.

I have post a gist [1] with ES log level to DEBUG.
Starting line 6982 you can see the sequence after node.client().prepareIndex("_river", river, "_meta").setSource(settings).get(); I wait up to 30 seconds but the river does not start.

In my scenario rivers are started about 50 times and 14 of them fail.

[1] - https://gist.github.com/richardwilly98/7513822/raw/16e30de3f7c7e78a56b9e1bfa68c8fbbb48ad132/gistfile1.txt

javanna · 2013-11-18T09:57:15Z

@richardwilly98 that's odd. We currently retry registering the river for max 5 times, one attempt per second, in case the _meta document is not found when the master node gets the mapping update caused by the index operation of the _meta doc itself. That would mean that in your case, the _meta document takes more than 5 seconds to show up after its mapping update, which sounds strange.

Did anything change since you previously said that the issue was fixed? I'm curious on how you create the river. Is the river already registered in the cluster state or do you recreate everything from scratch in your tests?

richardwilly98 · 2013-11-18T11:44:43Z

@javanna last time I have only executed the test on a dummy river which register only 1 river but did not test it on the real river.

River is created using this code [1].
River is delete using this code [2].

[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb/blob/master/src/test/java/org/elasticsearch/river/mongodb/RiverMongoDBTestAbstract.java#L300
[2] - https://github.com/richardwilly98/elasticsearch-river-mongodb/blob/master/src/test/java/org/elasticsearch/river/mongodb/RiverMongoDBTestAbstract.java#L378

javanna · 2013-11-18T13:35:48Z

@richardwilly98 I spent some time looking at your log file. I do see that 5 attempts are not enough in some cases, but it's hard to understand why and if it's a test problem or not. Do you have failures on the same tests or random ones? Can you make logging more specific to rivers, log when the before test method starts and ends, same for the after method and run it again? Maybe that would help understanding what's going on.

@jprante everything ok with your river or do you have the same problem?

richardwilly98 · 2013-11-18T13:45:58Z

@javanna
The failures are random. The same tests work fine with ES 0.90.5.
What king of information should I log in before / after test method?

jprante · 2013-11-18T14:32:16Z

@javanna With 0.90.7, I can create JDBC river instances flawlessly (a manual test succeeds, also junit tests). If required, I could spend some time on setting up random test creations of river instances.

javanna · 2013-11-18T14:48:23Z

Thanks @jprante for your feedback! We do have a test for river creation now, which is part of our randomized tests that countinuously run. That test never failed since I added it, maybe it's just not nasty enough.

@richardwilly98 I just want to be able to read those logs properly. From the big log you attached it's hard to understand when a test method starts, what is part of the before test and what is part of the after test cleanup. We need to isolate the failures in order to understand what caused them.

richardwilly98 · 2013-11-18T14:59:36Z

@javanna I understand I will produce better logs and isolate a working test and a failed one.

richardwilly98 · 2013-11-18T16:19:31Z

@javanna
I have posted a new gist [1]. It contains 2 file:

RiverMongoInitialImportTest a successful test
RiverMongoInitialTimestampTest test where river is not started

[1] - https://gist.github.com/richardwilly98/7530444/edit

richardwilly98 · 2013-11-19T11:03:29Z

@javanna
Do not spend time on analyzing the logs. That might be due to the way river closing is handled on my side.
I will keep you updated.

richardwilly98 · 2013-11-19T13:48:46Z

@javanna
I had an issue on my side where one thread was not correctly stopped on closing the river.
I am not sure the reason tests were still successful with ES 0.90.5...

javanna · 2013-11-19T14:54:38Z

Ok, thanks @richardwilly98 . Does it mean that all your tests are consistently green now?

richardwilly98 · 2013-11-19T14:57:08Z

Yes

javanna · 2013-11-19T15:03:45Z

Cool, glad to hear that!

ryan1234 · 2014-01-21T06:08:36Z

I'm having similar problems with 0.90.9 and the JDBC river plugin. I have a simple two node cluster and the JDBC river seems to not start roughly 4 out of 5 times. I have another custom river running and that always starts and runs. The DEBUG log for my cluster is obviously very verbose. What does a retry look like in the logs?

From reading through the logs I can't seem to find any errors or warnings about starting the JDBC river. When I restart the master node it just seems to ignore that river altogether.

dadoonet · 2014-01-21T06:20:02Z

Are you using templates?

ryan1234 · 2014-01-21T06:22:24Z

Sorry for my ignorance, but what is a template?

dadoonet · 2014-01-21T06:24:38Z

Forget my previous comment. There is an issue #4577 which I think could produce this when you have more than one river running.

ryan1234 · 2014-01-21T06:28:55Z

We don't have any templates defined. I have seen an example where more than one river runs successfully one our cluster. Is it a rare use case to have more than one river running?

ryan1234 · 2014-01-21T19:36:30Z

Some additional notes:

I ended up scaling down from a cluster of two nodes to one node. The rivers start reliably on one node for whatever reason. For the time being we'll scale that node up and hopefully in a month or two we'll work through this issue.
In this function: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/river/routing/RiversRouter.java#L113 I see various logging lines as Elastic looks for the _meta documents. I can't find any of the _meta document found. or the failed to get/parse _meta for lines in my log file even though it is set to DEBUG. I would assume no matter what, when a master node is restarted for a cluster that one of those lines would be logged.
I keep seeing lines like this in my log:

[2014-01-21 19:27:59,091][DEBUG][river.cluster ] [Isaac] processing [reroute_rivers_node_changed]: execute
[2014-01-21 19:27:59,097][DEBUG][river.cluster ] [Isaac] processing [reroute_rivers_node_changed]: no change in cluster_state

The missing log lines plus the "no change in cluster_state" make me think the updateRiverClusterState function is never getting fired. Am I way off on that?

jprante · 2014-01-21T21:47:28Z

@ryan1234 is it possible you can try JDBC river 1.0.0.RC1.2 if the problem still persists? I just tested a simple MySQL river with two nodes in all combinations with 1.0.0.RC1.2 but I have no problems with starting rivers.

ryan1234 · 2014-01-22T00:22:12Z

@jprante I just removed the old JDBC plugin and installed 1.0.0.RC1.2. Does that require ES 1.0.0.RC1? I'm getting what looks like dependency injection errors in my log now (I'm doing this on ES 0.90.9):

org.elasticsearch.common.inject.CreationException: Guice creation errors:

Error injecting constructor, java.util.ServiceConfigurationError: org.xbib.elasticsearch.river.jdbc.RiverMouth: Provider org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth could not be instantiated: java.lang.NoClassDefFoundError: org/elasticsearch/ElasticsearchTimeoutException
at org.xbib.elasticsearch.river.jdbc.JDBCRiver.(Unknown Source)
while locating org.xbib.elasticsearch.river.jdbc.JDBCRiver
while locating org.elasticsearch.river.River

1 error
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:344)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:178)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:139)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:272)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:266)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.ServiceConfigurationError: org.xbib.elasticsearch.river.jdbc.RiverMouth: Provider org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth could not be instantiated: java.lang.NoClassDefFoundError: org/elasticsearch/ElasticsearchTimeoutException

Only one jar was placed in the /plugin/river-jdbc directory. That's normal right?

jprante · 2014-01-22T06:58:26Z

Yes, the version number says it all, JDBC river 1.0.0.RC1.2 is for ES 1.0.0.RC1.
You need to add a JDBC driver in the river-jdbc directory.

javanna · 2014-01-22T20:34:50Z

@ryan1234 the _meta document found log line was just added and will be released with the next version. On the other hand, you should find log lines that tell that the _meta document wasn't found if anything went wrong after registering the river.

Those log lines you mention (processing [reroute_rivers_node_changed]: execute and processing [reroute_rivers_node_changed]: no change in cluster_state) are normal, nothing you need to worry about.

Are you maybe deleting and re-registering the same river straight-away?

ryan1234 · 2014-01-22T20:39:00Z

@javanna I have 6 JDBC rivers and I register them once with the master node and then I leave them alone. When I restart the master node, that's when the race condition seems to happen.

I'm going to finish setting up my test cluster with 1.0.0.RC1 as suggested by @jprante later this week. We'll see if that fixes the problem. I think we just might be a little behind on the versions and that this is fixed with 1.0.0.RC1.

Everything is fine with one node. In other words, when I shut down the second node (the non-master node), the rivers start fine every time.

jprante · 2014-01-22T21:09:56Z

@ryan1234 maybe not the right place here under this issue, but I'd like to point out that with JDBC River 1.0.0.RC1.2, you can issue a series of SQL statements in a single JDBC river instance. Also, cron expressions can be defined for improved regular execution. This eases the river instance handling significantly.

javanna · 2014-01-23T14:09:13Z

@ryan1234 I think I found the issue you encountered. Have a look at #4864 , we also had a test that failed once in a while suffering from the very same problem.

Makes sense that on one node everything was working fine, as there was only a single shard for the _river index, no replicas. Makes sense also that you weren't seeing any log lines, as the missing _meta document was in another class (RiversService.ApplyRivers) and nothing was logged in that case.

ryan1234 · 2014-01-27T04:27:17Z

@jprante @javanna Sorry it took me a while to get back. Busy getting ready for a big production push.

Elastic and JDBC driver version 1.0.0RC1 resolved the issue. I was able to get two nodes up in a cluster and have the JDBC rivers start reliably every time. For our production launch we're going to go with a single node for a few months and then eventually move to 1.0.0.

Thanks for the help!

richardwilly98 · 2014-02-04T11:09:56Z

This issue has been reported by user with 0.90.11 (see richardwilly98/elasticsearch-river-mongodb#208)

Running mvn test on https://github.com/richardwilly98/my-dummy-river fail with ES 0.90.11

javanna · 2014-02-04T12:31:57Z

Hi @richardwilly98 ,
this issue is closed as it was solved, and test coverage was added around it, have a look at RiverTests to know more.

I just downloaded your dummy river and ran tests successfully, can you be more specific on why things fail for you and how? Maybe also open a new issue if the problem persists, adding your findings to it?

Thanks!

…isn't With elastic#3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started. What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway. Closes elastic#4089, elastic#3840

richardwilly98 mentioned this issue Nov 6, 2013

Upgrade to ElasticSearch 0.90.6 richardwilly98/elasticsearch-river-mongodb#166

Closed

ghost assigned javanna Nov 10, 2013

javanna mentioned this issue Nov 10, 2013

Make sure rivers get started #4143

Closed

javanna closed this as completed in b7cc378 Nov 10, 2013

javanna mentioned this issue Nov 10, 2013

river instance not created after successful creation of _meta document #3840

Closed

richardwilly98 mentioned this issue Nov 17, 2013

Upgrade to 0.90.7 richardwilly98/elasticsearch-river-mongodb#173

Closed

javanna mentioned this issue Jan 23, 2014

Rivers might not get started due to missing _meta document #4864

Closed

richardwilly98 mentioned this issue Mar 20, 2014

CreationException richardwilly98/elasticsearch-river-mongodb#237

Open

lukas-vlcek mentioned this issue Apr 8, 2014

Upgrade to Elasticsearch 1.3.0 searchisko/searchisko#48

Closed

River does not start #4089

River does not start #4089

Comments

jprante commented Nov 5, 2013

richardwilly98 commented Nov 5, 2013

imotov commented Nov 7, 2013

richardwilly98 commented Nov 7, 2013

javanna commented Nov 10, 2013

richardwilly98 commented Nov 11, 2013

javanna commented Nov 11, 2013

richardwilly98 commented Nov 11, 2013

javanna commented Nov 11, 2013

jprante commented Nov 12, 2013

richardwilly98 commented Nov 17, 2013

javanna commented Nov 18, 2013

richardwilly98 commented Nov 18, 2013

javanna commented Nov 18, 2013

richardwilly98 commented Nov 18, 2013

jprante commented Nov 18, 2013

javanna commented Nov 18, 2013

richardwilly98 commented Nov 18, 2013

richardwilly98 commented Nov 18, 2013

richardwilly98 commented Nov 19, 2013

richardwilly98 commented Nov 19, 2013

javanna commented Nov 19, 2013

richardwilly98 commented Nov 19, 2013

javanna commented Nov 19, 2013

ryan1234 commented Jan 21, 2014

dadoonet commented Jan 21, 2014

ryan1234 commented Jan 21, 2014

dadoonet commented Jan 21, 2014

ryan1234 commented Jan 21, 2014

ryan1234 commented Jan 21, 2014

jprante commented Jan 21, 2014

ryan1234 commented Jan 22, 2014

jprante commented Jan 22, 2014

javanna commented Jan 22, 2014

ryan1234 commented Jan 22, 2014

jprante commented Jan 22, 2014

javanna commented Jan 23, 2014

ryan1234 commented Jan 27, 2014

richardwilly98 commented Feb 4, 2014

javanna commented Feb 4, 2014