HiveTap is not concurrency safe. #14
Comments
Doing the SQL error message check will not work, since the MetaStore can use a different DB backend such as MySQL: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastoreDatabase If your tables do not exist, you can call createResource() on the HivePartitionTap before running the flow, but that won't work if SinkMode == REPLACE, since it will delete the table again before starting the flow. A structural way to solve this would be to inject a FlowListener that reacts to onStarting and makes sure the DB is registered in the MetaStore before anything else happens. I have to experiment with that a bit.
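A minimal sketch of that FlowListener idea. The Flow and FlowListener types below are simplified stand-ins so the example is self-contained; the real interface is cascading.flow.FlowListener, and the listener names here are made up for illustration.

```java
// Stand-ins for Cascading's Flow and FlowListener so this compiles alone.
interface Flow {}

interface FlowListener {
    void onStarting(Flow flow);
    void onStopping(Flow flow);
    void onCompleted(Flow flow);
    boolean onThrowable(Flow flow, Throwable t);
}

// Registers the database in the MetaStore once, before any taps are
// opened, so concurrent taps never race on createDatabase.
class MetaStoreSetupListener implements FlowListener {
    private final Runnable registerDatabase; // would wrap a createDatabase call

    MetaStoreSetupListener(Runnable registerDatabase) {
        this.registerDatabase = registerDatabase;
    }

    @Override public void onStarting(Flow flow) { registerDatabase.run(); }
    @Override public void onStopping(Flow flow) {}
    @Override public void onCompleted(Flow flow) {}
    @Override public boolean onThrowable(Flow flow, Throwable t) { return false; }
}

public class Main {
    public static void main(String[] args) {
        FlowListener listener =
            new MetaStoreSetupListener(() -> System.out.println("db registered"));
        // In Cascading this would be wired up via flow.addListener(listener);
        // here we trigger it directly to show the effect.
        listener.onStarting(null);
    }
}
```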
Agreed, this "fix" leaves a lot to be desired. In our particular case this issue comes up when we do unit testing, where we run jobs locally against Hive's in-memory Derby DB. Hive has the ability to retry metastore commands/connections. However, at least when running this locally with our version of Hive (0.10-cdh4.6), this has another bug around a mismatch between started and closed transactions, which causes another failure. This issue will also occur when running on the cluster. However, in that case I think both Hive retrying connections and MR restarting tasks cause the overall job to succeed. On the cluster we mostly get failed tasks around https://github.com/CommBank/ebenezer/issues/34 which I think is caused by trying to register the same partition at the same time. Do you think you will be able to address both the issue of creating the same database only once across multiple taps and creating each partition only once across multiple MR tasks, without changes to Cascading?
What we could do is simply catch the exception and retry, which is somewhat shaky, but if we want to support Hive all the way back to 0.10, we will have to live with some sub-optimal solutions. I am going to implement that, and then you can give it a few spins in your environment.
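A rough sketch of that catch-and-retry idea. The helper name and structure here are made up for illustration, not part of cascading-hive: it retries a metastore operation a few times before giving up, as a stop-gap for pre-0.13 Hive versions without lock support.

```java
import java.util.concurrent.Callable;

public class Main {
    // Retry op up to maxAttempts times, rethrowing the last failure.
    static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // e.g. a transient MetaException from a concurrent tap
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Succeeds on the third attempt, simulating two transient
        // concurrent-access failures before success.
        int result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return 42;
        }, 5);
        System.out.println(result);
    }
}
```

The obvious caveat, as noted above, is that retrying masks the race rather than removing it; the 0.13 locking API is the structural fix.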
I did some more research, and since 0.13 the Hive metastore has a locking API which would solve all of these problems: https://issues.apache.org/jira/browse/HIVE-5843 Since you guys are still on 0.10 and are a big user of cascading-hive, I don't want to drop support for that just yet, but we should consider moving away from such old versions to get the new lock support and make this less brittle.
Agreed. We are hoping to move to CDH5.2 late this year or early next year, which will have Hive 0.13.
I'd like to open this discussion again. Now that Hive 1.0.0 is out, we should start thinking about dropping support for the ancient Hive versions w/o lock support. Would that work for you guys? |
Stale issue, closing. If this is still a problem, please re-open.
As part of our testing we are writing to two HivePartitionTaps in parallel locally. This will fail with the stack trace below because it will try to create the same database in parallel.
Our current fix is to catch that particular exception around the createDatabase call. We also get task failures (https://github.com/CommBank/ebenezer/issues/34) caused by a similar issue. I don't really have a good fix for this problem. We are probably going to wrap the createDatabase call in the following try/catch block to enable our tests to work:
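A self-contained sketch of such a try/catch (the MetaStoreClient interface and AlreadyExistsException class below are stand-ins so the snippet compiles on its own; the real types would be HiveMetaStoreClient and org.apache.hadoop.hive.metastore.api.AlreadyExistsException):

```java
// Stand-in types replacing the Hive metastore classes.
class AlreadyExistsException extends Exception {}

interface MetaStoreClient {
    void createDatabase(String name) throws AlreadyExistsException;
}

public class Main {
    // Treat "database already exists" as success: whichever tap wins the
    // race creates the database, and every other tap just carries on.
    static void ensureDatabase(MetaStoreClient client, String name) throws Exception {
        try {
            client.createDatabase(name);
        } catch (AlreadyExistsException e) {
            // a concurrent tap created it first; nothing left to do
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake client that throws on a duplicate create, like the metastore.
        java.util.Set<String> created = new java.util.HashSet<>();
        MetaStoreClient fake = name -> {
            if (!created.add(name)) throw new AlreadyExistsException();
        };
        ensureDatabase(fake, "test_db");
        ensureDatabase(fake, "test_db"); // would have thrown; now ignored
        System.out.println("ok");
    }
}
```

Note this only narrows the window for the database race; the partition-registration race across MR tasks would need the same treatment or, better, the metastore locking API.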
The stack trace: