River does not start #4089
More details here [1] [1] - https://groups.google.com/forum/#!topic/elasticsearch/hkBqWisL4UI |
I briefly looked at this issue and it looks like the problem is caused by race condition in creation of river's I can only reproduce it if the |
Even with your suggestion in my "@BeforeSuite" method:
private void registerDummyRiver() {
    if (!node.client().admin().indices().prepareExists("_river").get().isExists()) {
        node.client().admin().indices().prepareCreate("_river").get();
    }
    if (!node.client().prepareGet("_river", "my_dummy", "_meta").get().isExists()) {
        node.client().prepareIndex("_river", "my_dummy", "_meta").setSource("{ \"type\": \"dummy\" }").get();
    }
    refreshIndex("_river");
    Assert.assertTrue(node.client().prepareGet("_river", "my_dummy", "_meta").get().isExists());
}
I still have tests failing randomly. |
…isn't With elastic#3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started. What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway. Closes elastic#4089, elastic#3840
I pushed a fix for this. Could you confirm it fixes the issue you are experiencing? |
@javanna it definitely helps in my integration test. In 0.90.5 it looks like So as a workaround I am checking a flag attribute set by Would it make sense to have such a feature in the ES API? |
I'm not sure I got where you put the 1 second wait with 0.90.5. Anyways, with 0.90.6 there is a different problem, as we send the mapping update to the master node (when you create the river type by indexing the My fix addresses this for rivers by scheduling a retry (actually a few of them just in case) if the |
As I said, all tests are now passing, so your fix addresses the issue. I am just trying to understand how I could make sure from my integration test that the river is really started. Right now (and it was already the case before, I assume) there is no way to tell that from the API, so it has to be built into the river. Is that correct? Thanks a lot for the fix. |
Ok thanks @richardwilly98 , got it, I just wanted to make sure no other workarounds are needed. I got what you are asking: you can wait and check whether something happened (for instance, check whether an index that is expected to be created is there). You are right, you need to handle this in the river itself. It would be even better to have several retries and a maximum wait time. We are going to package our test classes as a separate jar with the next release, so that those classes can easily be used to test plugins; the method that you'd need in this case is |
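The "wait and check, with several retries and a maximum wait time" idea above can be sketched as a small polling helper. This is only an illustration: `WaitUntil` and `waitUntil` are hypothetical names, not the method from the Elasticsearch test jar mentioned in the comment, and the poll interval is an arbitrary choice.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper; not part of the Elasticsearch API. */
final class WaitUntil {
    private WaitUntil() {}

    /**
     * Polls the condition until it returns true or maxWaitMillis elapses.
     * Returns true if the condition was met in time, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier condition, long maxWaitMillis, long pollMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // maximum wait time exceeded
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In a river test, the condition could be a check like the one already used earlier in this thread, for example whether `node.client().admin().indices().prepareExists(...)` reports that the index the river is expected to create exists; which index to check depends on the river.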
Tests with JDBC river now pass with ES 0.90.7-SNAPSHOT. Thanks @javanna ! |
@javanna unfortunately I am still having issues with ES 0.90.7 where rivers are not started. I have posted a gist [1] with the ES log level set to DEBUG. In my scenario rivers are started about 50 times and 14 of them fail. |
@richardwilly98 that's odd. We currently retry registering the river for max 5 times, one attempt per second, in case the Did anything change since you previously said that the issue was fixed? I'm curious on how you create the river. Is the river already registered in the cluster state or do you recreate everything from scratch in your tests? |
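For readers following along, the retry behaviour described above (at most 5 attempts, one per second, when the `_meta` document is not found) can be sketched roughly as follows. This is not the actual Elasticsearch code: `MetaRetry`, `fetchWithRetry` and the `lookup` supplier are hypothetical names, and the sketch assumes the lookup returns `null` when the document is missing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of a bounded, scheduled retry; not the Elasticsearch implementation. */
final class MetaRetry {
    private MetaRetry() {}

    /**
     * Calls lookup up to maxAttempts times, waiting delayMillis between attempts.
     * Completes with the first non-null value, or with null once all attempts are used.
     */
    static <T> CompletableFuture<T> fetchWithRetry(ScheduledExecutorService scheduler,
            Supplier<T> lookup, int maxAttempts, long delayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(scheduler, lookup, maxAttempts, delayMillis, result);
        return result;
    }

    private static <T> void attempt(ScheduledExecutorService scheduler, Supplier<T> lookup,
            int attemptsLeft, long delayMillis, CompletableFuture<T> result) {
        T value;
        try {
            value = lookup.get(); // assumed to return null when the document is missing
        } catch (RuntimeException e) {
            result.completeExceptionally(e);
            return;
        }
        if (value != null) {
            result.complete(value);
        } else if (attemptsLeft <= 1) {
            result.complete(null); // give up: the river would not be started
        } else {
            // Schedule the next attempt instead of blocking the calling thread.
            scheduler.schedule(
                    () -> attempt(scheduler, lookup, attemptsLeft - 1, delayMillis, result),
                    delayMillis, TimeUnit.MILLISECONDS);
        }
    }
}
```

With `maxAttempts = 5` and `delayMillis = 1000` this mirrors the numbers quoted above. A fixed attempt count can still be too low if the index operation is delayed for longer than the whole retry window.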
@javanna last time I only ran the test on a dummy river, which registers only one river; I did not test it on the real river.
[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb/blob/master/src/test/java/org/elasticsearch/river/mongodb/RiverMongoDBTestAbstract.java#L300 |
@richardwilly98 I spent some time looking at your log file. I do see that 5 attempts are not enough in some cases, but it's hard to understand why and if it's a test problem or not. Do you have failures on the same tests or random ones? Can you make logging more specific to rivers, log when the before test method starts and ends, same for the after method and run it again? Maybe that would help understanding what's going on. @jprante everything ok with your river or do you have the same problem? |
@javanna |
@javanna With 0.90.7, I can create JDBC river instances flawlessly (a manual test succeeds, also junit tests). If required, I could spend some time on setting up random test creations of river instances. |
Thanks @jprante for your feedback! We do have a test for river creation now, which is part of our randomized tests that run continuously. That test has never failed since I added it; maybe it's just not nasty enough. @richardwilly98 I just want to be able to read those logs properly. From the big log you attached it's hard to understand when a test method starts, what is part of the before test and what is part of the after test cleanup. We need to isolate the failures in order to understand what caused them. |
@javanna I understand I will produce better logs and isolate a working test and a failed one. |
@javanna
|
@javanna |
@javanna |
Ok, thanks @richardwilly98 . Does it mean that all your tests are consistently green now? |
Yes |
Cool, glad to hear that! |
I'm having similar problems with 0.90.9 and the JDBC river plugin. I have a simple two node cluster and the JDBC river seems to not start roughly 4 out of 5 times. I have another custom river running and that always starts and runs. The DEBUG log for my cluster is obviously very verbose. What does a retry look like in the logs? From reading through the logs I can't seem to find any errors or warnings about starting the JDBC river. When I restart the master node it just seems to ignore that river altogether. |
Are you using templates? |
Sorry for my ignorance, but what is a template? |
Forget my previous comment. There is an issue #4577 which I think could produce this when you have more than one river running. |
We don't have any templates defined. I have seen an example where more than one river runs successfully on our cluster. Is it a rare use case to have more than one river running? |
Some additional notes:
[2014-01-21 19:27:59,091][DEBUG][river.cluster ] [Isaac] processing [reroute_rivers_node_changed]: execute
|
@ryan1234 is it possible you can try JDBC river 1.0.0.RC1.2 if the problem still persists? I just tested a simple MySQL river with two nodes in all combinations with 1.0.0.RC1.2 but I have no problems with starting rivers. |
@jprante I just removed the old JDBC plugin and installed 1.0.0.RC1.2. Does that require ES 1.0.0.RC1? I'm getting what looks like dependency injection errors in my log now (I'm doing this on ES 0.90.9):
org.elasticsearch.common.inject.CreationException: Guice creation errors:
1 error
Only one jar was placed in the /plugin/river-jdbc directory. That's normal, right? |
Yes, the version number says it all, JDBC river 1.0.0.RC1.2 is for ES 1.0.0.RC1. |
@ryan1234 the Those log lines you mention ( Are you maybe deleting and re-registering the same river straight-away? |
@javanna I have 6 JDBC rivers and I register them once with the master node and then I leave them alone. When I restart the master node, that's when the race condition seems to happen. I'm going to finish setting up my test cluster with 1.0.0.RC1 as suggested by @jprante later this week. We'll see if that fixes the problem. I think we just might be a little behind on the versions and that this is fixed with 1.0.0.RC1. Everything is fine with one node. In other words, when I shut down the second node (the non-master node), the rivers start fine every time. |
@ryan1234 maybe not the right place here under this issue, but I'd like to point out that with JDBC River 1.0.0.RC1.2, you can issue a series of SQL statements in a single JDBC river instance. Also, cron expressions can be defined for improved regular execution. This eases the river instance handling significantly. |
@ryan1234 I think I found the issue you encountered. Have a look at #4864 , we also had a test that failed once in a while suffering from the very same problem. Makes sense that on one node everything was working fine, as there was only a single shard for the _river index, no replicas. Makes sense also that you weren't seeing any log lines, as the missing _meta document was in another class ( |
@jprante @javanna Sorry it took me a while to get back. Busy getting ready for a big production push. Elastic and JDBC driver version 1.0.0RC1 resolved the issue. I was able to get two nodes up in a cluster and have the JDBC rivers start reliably every time. For our production launch we're going to go with a single node for a few months and then eventually move to 1.0.0. Thanks for the help! |
This issue has been reported by a user with 0.90.11 (see richardwilly98/elasticsearch-river-mongodb#208) Running |
Hi @richardwilly98 , I just downloaded your dummy river and ran tests successfully, can you be more specific on why things fail for you and how? Maybe also open a new issue if the problem persists, adding your findings to it? Thanks! |
In 0.90.6, creating a river seems not to start the river.
Example debug log (for JDBC river)
https://gist.github.com/jprante/7321028
What am I doing wrong?