Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

River does not start #4089

Closed
jprante opened this issue Nov 5, 2013 · 39 comments
Closed

River does not start #4089

jprante opened this issue Nov 5, 2013 · 39 comments

Comments

@jprante
Copy link
Contributor

jprante commented Nov 5, 2013

In 0.90.6, creating a river seems not to start the river.

Example debug log (for JDBC river)

https://gist.github.com/jprante/7321028

What am I doing wrong?

@richardwilly98
Copy link
Contributor

More details here [1]

[1] - https://groups.google.com/forum/#!topic/elasticsearch/hkBqWisL4UI

@imotov
Copy link
Contributor

imotov commented Nov 7, 2013

I briefly looked at this issue and it looks like the problem is caused by race condition in creation of river's _meta document and propagation of cluster state with the new river mapping. Basically, river creation and startup fails when RiversRouter tries to load _meta record for the new river mapping and this _meta document doesn't exist yet.

I can only reproduce it if the _river index doesn't exists or exists but doesn't have any types in it. After I create the first dummy river, all other rivers seem to work just fine.

@richardwilly98
Copy link
Contributor

Even with your suggestion in my "@BeforeSuite" method:

    private void registerDummyRiver() {
        if (!node.client().admin().indices().prepareExists("_river").get().isExists()) {
            node.client().admin().indices().prepareCreate("_river").get();
        }
        if (!node.client().prepareGet("_river", "my_dummy", "_meta").get().isExists()) {
            node.client().prepareIndex("_river", "my_dummy", "_meta").setSource("{ \"type\": \"dummy\" }").get();
        }
        refreshIndex("_river");
        Assert.assertTrue(node.client().prepareGet("_river", "my_dummy", "_meta").get().isExists());
    }

I still have tests failing randomly.

@ghost ghost assigned javanna Nov 10, 2013
javanna added a commit to javanna/elasticsearch that referenced this issue Nov 10, 2013
…isn't

With elastic#3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started.
What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway.

Closes elastic#4089, elastic#3840
javanna added a commit that referenced this issue Nov 10, 2013
…isn't

With #3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started.
What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway.

Closes #4089, #3840
@javanna
Copy link
Member

javanna commented Nov 10, 2013

I pushed a fix for this. Could you confirm it fixes the issue you are experiencing?

@richardwilly98
Copy link
Contributor

@javanna it definitely help in my integration test.
All test are now passing after a small change when the river is created.

In 0.90.5 it looks like node.client().prepareIndex("_river", "my_dummy", "_meta") and wait for 1 second was enough to ensure that the river was registered and started.

So as workaround I am checking a flag attribute set by river.start() method.

Would it make sense to have such feature in ES api?

@javanna
Copy link
Member

javanna commented Nov 11, 2013

I'm not sure I got where you put the 1 second wait with 0.90.5.

Anyways, with 0.90.6 there is a different problem, as we send the mapping update to the master node (when you create the river type by indexing the _meta document) before actually indexing the documen. The mapping update on the master should trigger the river start, but it needs the _meta document to be available, which is not the case, thus in most cases rivers won't get started.

My fix addresses this for rivers by scheduling a retry (actually a few of them just in case) if the _meta document isn't found. I think this should be enough, could you confirm this?

@richardwilly98
Copy link
Contributor

As I said all test are now passing so your fix addresses the issue.

I am just trying to understand how I could make sure from my integration test when the river is really started. Right now (and it was already the case before I assume) there is no way to tell that from the api so it has to be built in the river.

Is that correct?

Thanks a lot for the fix.

@javanna
Copy link
Member

javanna commented Nov 11, 2013

Ok thanks @richardwilly98 , got it, I just wanted to make sure no other workarounds are needed.

I got what you are asking, you can have a wait and check if something happened (for instance check if an index is expected to be created is there). You are right, you need to handle this in the river itself. Would be even better to have different retries and a maximum wait time. We are going to package our test classes as a separate jar with the next release, so that those classes can easily be used to test plugins; the method that you'd need in this case is ElasticsearchTestCase#awaitBusy.

@jprante
Copy link
Contributor Author

jprante commented Nov 12, 2013

Tests with JDBC river now pass with ES 0.90.7-SNAPSHOT.

Thanks @javanna !

@richardwilly98
Copy link
Contributor

@javanna unfortunately I am still having issues with ES 0.90.7 where rivers are not started.

I have post a gist [1] with ES log level to DEBUG.
Starting line 6982 you can see the sequence after node.client().prepareIndex("_river", river, "_meta").setSource(settings).get(); I wait up to 30 seconds but the river does not start.

In my scenario rivers are started about 50 times and 14 of them fail.

[1] - https://gist.github.com/richardwilly98/7513822/raw/16e30de3f7c7e78a56b9e1bfa68c8fbbb48ad132/gistfile1.txt

@javanna
Copy link
Member

javanna commented Nov 18, 2013

@richardwilly98 that's odd. We currently retry registering the river for max 5 times, one attempt per second, in case the _meta document is not found when the master node gets the mapping update caused by the index operation of the _meta doc itself. That would mean that in your case, the _meta document takes more than 5 seconds to show up after its mapping update, which sounds strange.

Did anything change since you previously said that the issue was fixed? I'm curious on how you create the river. Is the river already registered in the cluster state or do you recreate everything from scratch in your tests?

@richardwilly98
Copy link
Contributor

@javanna last time I have only executed the test on a dummy river which register only 1 river but did not test it on the real river.

  • River is created using this code [1].
  • River is delete using this code [2].

[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb/blob/master/src/test/java/org/elasticsearch/river/mongodb/RiverMongoDBTestAbstract.java#L300
[2] - https://github.com/richardwilly98/elasticsearch-river-mongodb/blob/master/src/test/java/org/elasticsearch/river/mongodb/RiverMongoDBTestAbstract.java#L378

@javanna
Copy link
Member

javanna commented Nov 18, 2013

@richardwilly98 I spent some time looking at your log file. I do see that 5 attempts are not enough in some cases, but it's hard to understand why and if it's a test problem or not. Do you have failures on the same tests or random ones? Can you make logging more specific to rivers, log when the before test method starts and ends, same for the after method and run it again? Maybe that would help understanding what's going on.

@jprante everything ok with your river or do you have the same problem?

@richardwilly98
Copy link
Contributor

@javanna
The failures are random. The same tests work fine with ES 0.90.5.
What king of information should I log in before / after test method?

@jprante
Copy link
Contributor Author

jprante commented Nov 18, 2013

@javanna With 0.90.7, I can create JDBC river instances flawlessly (a manual test succeeds, also junit tests). If required, I could spend some time on setting up random test creations of river instances.

@javanna
Copy link
Member

javanna commented Nov 18, 2013

Thanks @jprante for your feedback! We do have a test for river creation now, which is part of our randomized tests that countinuously run. That test never failed since I added it, maybe it's just not nasty enough.

@richardwilly98 I just want to be able to read those logs properly. From the big log you attached it's hard to understand when a test method starts, what is part of the before test and what is part of the after test cleanup. We need to isolate the failures in order to understand what caused them.

@richardwilly98
Copy link
Contributor

@javanna I understand I will produce better logs and isolate a working test and a failed one.

@richardwilly98
Copy link
Contributor

@javanna
I have posted a new gist [1]. It contains 2 file:

  • RiverMongoInitialImportTest a successful test
  • RiverMongoInitialTimestampTest test where river is not started

[1] - https://gist.github.com/richardwilly98/7530444/edit

@richardwilly98
Copy link
Contributor

@javanna
Do not spend time on analyzing the logs. That might be due to the way river closing is handled on my side.
I will keep you updated.

@richardwilly98
Copy link
Contributor

@javanna
I had an issue on my side where one thread was not correctly stopped on closing the river.
I am not sure the reason tests were still successful with ES 0.90.5...

@javanna
Copy link
Member

javanna commented Nov 19, 2013

Ok, thanks @richardwilly98 . Does it mean that all your tests are consistently green now?

@richardwilly98
Copy link
Contributor

Yes

@javanna
Copy link
Member

javanna commented Nov 19, 2013

Cool, glad to hear that!

@ryan1234
Copy link

I'm having similar problems with 0.90.9 and the JDBC river plugin. I have a simple two node cluster and the JDBC river seems to not start roughly 4 out of 5 times. I have another custom river running and that always starts and runs. The DEBUG log for my cluster is obviously very verbose. What does a retry look like in the logs?

From reading through the logs I can't seem to find any errors or warnings about starting the JDBC river. When I restart the master node it just seems to ignore that river altogether.

@dadoonet
Copy link
Member

Are you using templates?

@ryan1234
Copy link

Sorry for my ignorance, but what is a template?

@dadoonet
Copy link
Member

Forget my previous comment. There is an issue #4577 which I think could produce this when you have more than one river running.

@ryan1234
Copy link

We don't have any templates defined. I have seen an example where more than one river runs successfully one our cluster. Is it a rare use case to have more than one river running?

@ryan1234
Copy link

Some additional notes:

  • I ended up scaling down from a cluster of two nodes to one node. The rivers start reliably on one node for whatever reason. For the time being we'll scale that node up and hopefully in a month or two we'll work through this issue.
  • In this function: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/river/routing/RiversRouter.java#L113 I see various logging lines as Elastic looks for the _meta documents. I can't find any of the _meta document found. or the failed to get/parse _meta for lines in my log file even though it is set to DEBUG. I would assume no matter what, when a master node is restarted for a cluster that one of those lines would be logged.
  • I keep seeing lines like this in my log:

[2014-01-21 19:27:59,091][DEBUG][river.cluster ] [Isaac] processing [reroute_rivers_node_changed]: execute
[2014-01-21 19:27:59,097][DEBUG][river.cluster ] [Isaac] processing [reroute_rivers_node_changed]: no change in cluster_state

  • The missing log lines plus the "no change in cluster_state" make me think the updateRiverClusterState function is never getting fired. Am I way off on that?

@jprante
Copy link
Contributor Author

jprante commented Jan 21, 2014

@ryan1234 is it possible you can try JDBC river 1.0.0.RC1.2 if the problem still persists? I just tested a simple MySQL river with two nodes in all combinations with 1.0.0.RC1.2 but I have no problems with starting rivers.

@ryan1234
Copy link

@jprante I just removed the old JDBC plugin and installed 1.0.0.RC1.2. Does that require ES 1.0.0.RC1? I'm getting what looks like dependency injection errors in my log now (I'm doing this on ES 0.90.9):

org.elasticsearch.common.inject.CreationException: Guice creation errors:

  1. Error injecting constructor, java.util.ServiceConfigurationError: org.xbib.elasticsearch.river.jdbc.RiverMouth: Provider org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth could not be instantiated: java.lang.NoClassDefFoundError: org/elasticsearch/ElasticsearchTimeoutException
    at org.xbib.elasticsearch.river.jdbc.JDBCRiver.(Unknown Source)
    while locating org.xbib.elasticsearch.river.jdbc.JDBCRiver
    while locating org.elasticsearch.river.River

1 error
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:344)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:178)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:139)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:272)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:266)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.ServiceConfigurationError: org.xbib.elasticsearch.river.jdbc.RiverMouth: Provider org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth could not be instantiated: java.lang.NoClassDefFoundError: org/elasticsearch/ElasticsearchTimeoutException

Only one jar was placed in the /plugin/river-jdbc directory. That's normal right?

@jprante
Copy link
Contributor Author

jprante commented Jan 22, 2014

Yes, the version number says it all, JDBC river 1.0.0.RC1.2 is for ES 1.0.0.RC1.
You need to add a JDBC driver in the river-jdbc directory.

@javanna
Copy link
Member

javanna commented Jan 22, 2014

@ryan1234 the _meta document found log line was just added and will be released with the next version. On the other hand, you should find log lines that tell that the _meta document wasn't found if anything went wrong after registering the river.

Those log lines you mention (processing [reroute_rivers_node_changed]: execute and processing [reroute_rivers_node_changed]: no change in cluster_state) are normal, nothing you need to worry about.

Are you maybe deleting and re-registering the same river straight-away?

@ryan1234
Copy link

@javanna I have 6 JDBC rivers and I register them once with the master node and then I leave them alone. When I restart the master node, that's when the race condition seems to happen.

I'm going to finish setting up my test cluster with 1.0.0.RC1 as suggested by @jprante later this week. We'll see if that fixes the problem. I think we just might be a little behind on the versions and that this is fixed with 1.0.0.RC1.

Everything is fine with one node. In other words, when I shut down the second node (the non-master node), the rivers start fine every time.

@jprante
Copy link
Contributor Author

jprante commented Jan 22, 2014

@ryan1234 maybe not the right place here under this issue, but I'd like to point out that with JDBC River 1.0.0.RC1.2, you can issue a series of SQL statements in a single JDBC river instance. Also, cron expressions can be defined for improved regular execution. This eases the river instance handling significantly.

@javanna
Copy link
Member

javanna commented Jan 23, 2014

@ryan1234 I think I found the issue you encountered. Have a look at #4864 , we also had a test that failed once in a while suffering from the very same problem.

Makes sense that on one node everything was working fine, as there was only a single shard for the _river index, no replicas. Makes sense also that you weren't seeing any log lines, as the missing _meta document was in another class (RiversService.ApplyRivers) and nothing was logged in that case.

@ryan1234
Copy link

@jprante @javanna Sorry it took me a while to get back. Busy getting ready for a big production push.

Elastic and JDBC driver version 1.0.0RC1 resolved the issue. I was able to get two nodes up in a cluster and have the JDBC rivers start reliably every time. For our production launch we're going to go with a single node for a few months and then eventually move to 1.0.0.

Thanks for the help!

@richardwilly98
Copy link
Contributor

This issue has been reported by user with 0.90.11 (see richardwilly98/elasticsearch-river-mongodb#208)

Running mvn test on https://github.com/richardwilly98/my-dummy-river fail with ES 0.90.11

@javanna
Copy link
Member

javanna commented Feb 4, 2014

Hi @richardwilly98 ,
this issue is closed as it was solved, and test coverage was added around it, have a look at RiverTests to know more.

I just downloaded your dummy river and ran tests successfully, can you be more specific on why things fail for you and how? Maybe also open a new issue if the problem persists, adding your findings to it?

Thanks!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…isn't

With elastic#3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started.
What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway.

Closes elastic#4089, elastic#3840
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants