Temporarily cache existence of recovery wals #1462
Conversation
For the case where a lot of tablet servers died, the master was frequently checking for the existence of a recovery log for each tablet. It's very likely that many tablets point to the same recovery logs, so the existence checks are redundant. This patch caches the result of existence checks for a short period.
I have not tested this yet. I plan to locally create a table with 10K to 100K tablets, kill the tservers, and examine timing with and without this change.
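For context, here is a minimal sketch of the caching idea (not the actual patch; the class name and the two-minute expiration are illustrative), assuming a Guava cache keyed by the recovery log path whose entries expire after a short period:

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Illustrative sketch only: cache the result of FileSystem.exists() for
// recovery logs so that many tablets pointing at the same log do not each
// trigger an HDFS round trip from the master.
public class RecoveryLogExistenceCache {

  // Entries expire after a short period so the master eventually re-checks HDFS.
  private final Cache<Path,Boolean> existenceCache =
      CacheBuilder.newBuilder().expireAfterWrite(2, TimeUnit.MINUTES).build();

  private final FileSystem fs;

  public RecoveryLogExistenceCache(FileSystem fs) {
    this.fs = fs;
  }

  public boolean exists(Path recoveryLog) throws IOException {
    Boolean cached = existenceCache.getIfPresent(recoveryLog);
    if (cached != null) {
      return cached;
    }
    boolean exists = fs.exists(recoveryLog);
    existenceCache.put(recoveryLog, exists);
    return exists;
  }
}

In this sketch both positive and negative results are cached; the short expiry bounds how stale either answer can be.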
Do you think this is low-risk enough to put in 1.9? Or should this behavior change only occur in 2.1?
There is certainly the risk that this change introduces new bugs. However, the problem it's trying to address is severe enough that I think it's worth the risk. The more reviews the better; it's a small change and easy to review.
I ran some performance tests using the commands below. These were run on a single VM with 16GB of RAM and 10 tservers (I need to run them again with more resources; I think the box was stressed).
# create 100K random 8 char split points
$ cat /dev/urandom | base32 -w 8 | head -100000 | sort -u > splits.txt
# create a table and split it
$ accumulo shell -u root -p secret
root@uno> createtable foo
root@uno foo> addsplits -t foo -sf splits.txt
root@uno foo> quit
# after tablets balance, kill all tservers
$ pkill -f tserver
# after waiting a bit, restart tservers
$ tup.sh
# I periodically ran the following command to see how many tablets were assigned but not loaded
$ accumulo shell -u root -p secret -e 'scan -t accumulo.metadata -c future -np' | grep future | wc
# I periodically ran the following command to see what the user tablet assignment thread is up to
$ jstack <master pid> | grep -A 30 'Watching Normal'
Running the test above using Accumulo 1.9.3, I saw the following in the master logs for the time period where it was doing log recovery. The logged scan time is how long it took to scan all metadata tablets and make assignment decisions.
With this change applied to 1.9.4-SNAPSHOT, I ran the same test and saw the following results. I am seeing a speed improvement. Looking at the logs, I think part of the 78 seconds in the first scan was spent writing future and/or suspend locations to the metadata table. Because it scans more quickly, it's able to complete more scans.
The following shows the progression of tablets under log recovery (I didn't look at this info for 1.9.3; it would have been useful). The count of tablets in the ASSIGNED state is the number of tablets awaiting log recovery on a tserver.
I would like to repeat this test on better hardware and explore more where the master spends time during recovery, watching the assignment thread in the master (using the jstack command above). When this bug was seen on a cluster, jstack mostly showed the master doing existence checks in HDFS; I did not see those with this change. Because my VM was under such stress, I did not add data to tables. I attempted to do this with the following commands, but abandoned that effort. I want to do this on the next round.
# create 1000 random entries and insert them into accumulo
$ echo table foo > inserts.txt
$ cat /dev/urandom | base32 -w 16 | head -1000 | xargs printf 'insert %s f q v\n' >> inserts.txt
$ cat inserts.txt | accumulo shell -u root -p secret
# could run the following command before and after log recovery to verify data
$ accumulo shell -u root -p secret -e 'scan -t foo -np' | grep 'f:q' | sha1sum