Bulk import process can result in a lost file #800
In the 1.8 and 1.9 lines of code (unsure about 2.0), we can get into a situation where a major compaction starts failing because of a missing file. The scenario is as follows:
- master is asked to import a directory
There are mechanisms in Accumulo that are supposed to prevent this; I am not sure where those are falling down. I am going to create an IT that attempts to reproduce this, by making the IT directly call the RPC to load a file. If that does not yield anything, then I will review the code paths for preventing multiple loads.
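A rough shape for that IT might look like the sketch below. Every name in it is a placeholder, not a real Accumulo test or RPC API; the point is only to show the probe: call the load-file RPC directly, outside a FATE op, and see whether the stale-load protections reject it.

```java
import static org.junit.Assert.assertFalse;

import org.junit.Test;

// Hypothetical sketch of the planned IT; class and helper names are
// placeholders for the real client/Thrift plumbing.
public class DirectLoadRpcIT {

  @Test
  public void loadRpcAfterTxnDeleted() throws Exception {
    long tid = startBulkTransaction(); // create the FATE txid node in ZK

    deleteTransactionNode(tid);        // simulate the FATE op finishing

    // Directly invoke the RPC that an intermediate tserver would normally
    // make to the final tserver. If the protections work, this load must be
    // rejected; if it succeeds, the tablet now references a file that bulk
    // import cleanup will remove, i.e. the lost-file bug.
    boolean loaded = callLoadFileRpc(tid, "/tmp/bulk/test.rf");
    assertFalse("load RPC succeeded after txid was deleted", loaded);
  }

  // placeholder stubs standing in for the real plumbing
  private long startBulkTransaction() { return 42L; }
  private void deleteTransactionNode(long tid) {}
  private boolean callLoadFileRpc(long tid, String file) { return false; }
}
```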
Looking at the code I found a potential problem. Each bulk import has a unique FATE transaction id stored in ZooKeeper. There is code that prevents RPCs related to the bulk import from running if the transaction id is deleted from ZK, and there is also code to wait for all active RPCs to finish. So the bulk import FATE op will delete the id from ZK and then wait for all tservers to complete any RPCs that were active before it was deleted.
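A minimal sketch of that protection, with illustrative names (the real code lives in TransactionWatcher and the bulk import FATE op):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of the delete-then-drain protocol; all names are
// illustrative, not the actual Accumulo classes.
class BulkTxnGuard {
  // count of in-flight RPCs per transaction on this tserver; this is what
  // the FATE op's "wait for active RPCs" step drains
  private final ConcurrentHashMap<Long, AtomicLong> active = new ConcurrentHashMap<>();

  // stand-in for the ZK existence check on the transaction id node
  boolean transactionAlive(long tid) {
    return true; // the real code reads ZooKeeper
  }

  // tserver side: run an RPC body only if the FATE transaction still exists
  // in ZK, and track it so the drain step can wait for it
  <T> T runIfActive(long tid, Callable<T> body) throws Exception {
    if (!transactionAlive(tid))
      throw new IllegalStateException("bulk txn " + tid + " no longer active");
    active.computeIfAbsent(tid, k -> new AtomicLong()).incrementAndGet();
    try {
      return body.call();
    } finally {
      active.get(tid).decrementAndGet();
    }
  }

  // FATE op side, in order:
  //   1. delete the txid node -> transactionAlive() starts returning false
  //   2. for each tserver: wait until stillActive(tid) is false
  //   3. only then clean up the bulk import directory
  boolean stillActive(long tid) {
    AtomicLong count = active.get(tid);
    return count != null && count.get() > 0;
  }
}
```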
The bulk import FATE op makes RPCs to intermediate tservers that inspect files. Once an intermediate tserver determines where a file goes, it makes an RPC to the tserver that should load the file. The problem is that only the intermediate RPC checks whether the transaction id is still active; really, the final RPC should be doing this check.
Below are some places in the code where this is all happening.
I think the intermediate RPC should stop using TransactionWatcher and the final RPC should start using TransactionWatcher. I don't think there is a benefit to the intermediate one using it, and it puts more load on ZK.
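In terms of the earlier sketch, the proposed change amounts to something like this (again, illustrative names only, not the actual RPC handlers):

```java
import java.util.concurrent.Callable;

// Illustrative before/after for the proposed change; these are not the real
// RPC handlers, just the shape of where the liveness check sits.
class LoadPathSketch {
  interface Guard {
    <T> T runIfActive(long tid, Callable<T> body) throws Exception;
  }

  // Current shape: the ZK liveness check wraps the wrong hop.
  static void current(Guard guard, long tid) throws Exception {
    guard.runIfActive(tid, () -> inspectFile(tid)); // intermediate RPC: checked
    loadFile(tid);                                  // final RPC: NOT checked
  }

  // Proposed shape: check only on the hop that actually mutates the tablet.
  static void proposed(Guard guard, long tid) throws Exception {
    inspectFile(tid);                               // no ZK check, less ZK load
    guard.runIfActive(tid, () -> loadFile(tid));    // final RPC: checked
  }

  static Void inspectFile(long tid) { /* decide which tablet gets the file */ return null; }

  static Void loadFile(long tid) { /* add the file reference to the tablet */ return null; }
}
```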