Temporarily cache existence of recovery wals #1462
Conversation
For the case where a lot of tablet servers died, the master was frequently checking for the existence of a recovery log for each tablet. It's very likely that many tablets point to the same recovery logs, so the existence checks are redundant. This patch caches the result of existence checks for a short period.
I have not tested this yet. I plan to locally create a table with 10K to 100K tablets, kill the tservers, and examine timing with and without this change.
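For context, here is a minimal sketch of the caching idea (not the actual patch; the class name and the two-minute expiration are illustrative), assuming a Guava cache keyed by the recovery log path whose entries expire after a short period:

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Illustrative sketch only: cache the result of FileSystem.exists() for
// recovery logs so that many tablets pointing at the same log do not each
// trigger an HDFS round trip from the master.
public class RecoveryLogExistenceCache {

  // Entries expire after a short period so the master eventually re-checks HDFS.
  private final Cache<Path,Boolean> existenceCache =
      CacheBuilder.newBuilder().expireAfterWrite(2, TimeUnit.MINUTES).build();

  private final FileSystem fs;

  public RecoveryLogExistenceCache(FileSystem fs) {
    this.fs = fs;
  }

  public boolean exists(Path recoveryLog) throws IOException {
    Boolean cached = existenceCache.getIfPresent(recoveryLog);
    if (cached != null) {
      return cached;
    }
    boolean exists = fs.exists(recoveryLog);
    existenceCache.put(recoveryLog, exists);
    return exists;
  }
}

In this sketch both positive and negative results are cached; the short expiry bounds how stale either answer can be.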
Do you think this is low-risk enough to put in 1.9? Or should this behavior change only occur in 2.1?
There is certainly the risk that this change introduces new bugs. However, the problem it's trying to address is severe enough that I think it's worth the risk. The more reviews the better; it's a small change and easy to review.
I ran some performance tests using the commands below. These were run on a single VM with 16GB of RAM and 10 tservers (I need to run them again with more resources; I think the box was stressed).
# create 100K random 8 char split points
$ cat /dev/urandom | base32 -w 8 | head -100000 | sort -u > splits.txt
# create a table and split it
$ accumulo shell -u root -p secret
root@uno> createtable foo
root@uno foo> addsplits -t foo -sf splits.txt
root@uno foo> quit
# after tablets balance, kill all tservers
$ pkill -f tserver
# after waiting a bit, restart tservers
$ tup.sh
# I periodically ran the following command to see how many tablets were assigned but not loaded
$ accumulo shell -u root -p secret -e 'scan -t accumulo.metadata -c future -np' | grep future | wc
# I periodically ran the following command to see what the user tablet assignment thread is up to
$ jstack <master pid> | grep -A 30 'Watching Normal'
Running the test above using Accumulo 1.9.3, I saw the following in the master logs for the time period where it was doing log recovery. The logged scan time is how long it took to scan all metadata tablets and make assignment decisions.
With this change applied to 1.9.4-SNAPSHOT, I ran the same test and saw the following results. I am seeing a speed improvement. Looking at the logs, I think part of the 78 seconds in the first scan was spent writing future and/or suspend locations to the metadata table. Because it scans more quickly, it's able to complete more scans.
The following shows the progression of tablets under log recovery (I didn't look at this info for 1.9.3; it would have been useful). The count of tablets in the ASSIGNED state is the number of tablets awaiting log recovery on a tserver.
I would like to repeat this test on better hardware and explore more where the master spends time during recovery, watching the assignment thread in the master (using the jstack command above). When this bug was seen on a cluster, jstack mostly showed the master doing existence checks in HDFS; I did not see those with this change. Because my VM was under such stress, I did not add data to tables. I attempted to do this with the following commands, but abandoned that effort. I want to do this on the next round.
# create 1000 random entries and insert them into accumulo
$ echo table foo > inserts.txt
$ cat /dev/urandom | base32 -w 16 | head -1000 | xargs printf 'insert %s f q v\n' >> inserts.txt
$ cat inserts.txt | accumulo shell -u root -p secret
# could run the following command before and after log recovery to verify data
$ accumulo shell -u root -p secret -e 'scan -t foo -np' | grep 'f:q' | sha1sum