
Temporarily cache existence of recovery wals #1462

Merged — 5 commits merged from cache-wal-exists into apache:1.9 on Jan 2, 2020

Conversation

keith-turner
Contributor

For the case where a lot of tablet servers died, the master was
frequently checking for the existence of a recovery log for each tablet.
It's very likely that many tablets point to the same recovery logs and
the existence checks are redundant. This patch caches the result of
existence checks for a short period.

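The caching approach described above can be sketched roughly as follows in plain Java. This is an illustrative sketch, not the actual patch: the class and names here (`WalExistenceCache`, `CACHE_TTL_MS`, and the `Predicate` standing in for the expensive HDFS existence check) are assumptions for demonstration. The key idea is that many tablets reference the same recovery WAL, so a result cached for a short period turns thousands of redundant filesystem calls into one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch: remember the result of each existence check for a
// short period so that tablets sharing a recovery WAL trigger only one
// filesystem call. Names are illustrative, not Accumulo's actual code.
class WalExistenceCache {
  private static final long CACHE_TTL_MS = 60_000; // cache results briefly

  private static final class Entry {
    final boolean exists;
    final long cachedAt;
    Entry(boolean exists, long cachedAt) {
      this.exists = exists;
      this.cachedAt = cachedAt;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final Predicate<String> fsCheck; // the real (expensive) HDFS check

  WalExistenceCache(Predicate<String> fsCheck) {
    this.fsCheck = fsCheck;
  }

  boolean exists(String walPath) {
    long now = System.currentTimeMillis();
    // compute() is atomic per key: concurrent callers for the same WAL
    // path do not race to issue duplicate filesystem checks.
    Entry e = cache.compute(walPath, (path, old) -> {
      if (old != null && now - old.cachedAt < CACHE_TTL_MS) {
        return old; // still fresh, reuse the cached answer
      }
      return new Entry(fsCheck.test(path), now); // expired or absent: re-check
    });
    return e.exists;
  }
}
```

With 100K tablets pointing at a handful of WALs, repeated calls to `exists()` for the same path within the TTL hit the cache instead of the filesystem, which is why the metadata scan times drop in the results below.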
@keith-turner
Contributor Author

I have not tested this yet. I plan to locally create a table with 10K to 100K tablets, kill the tservers, and examine timing w/ and w/o this change.

@ctubbsii
Member

Do you think this is low-risk enough to put in 1.9? Or should this behavior change only occur in 2.1?

@keith-turner
Contributor Author

> Do you think this is low-risk enough to put in 1.9? Or should this behavior change only occur in 2.1?

There is certainly a risk that this change introduces new bugs. However, the problem it's trying to address is severe enough that I think it's worth the risk. The more reviews the better; it's a small change and easy to review.

@keith-turner
Contributor Author

keith-turner commented Dec 18, 2019

I ran some performance tests using the commands below. These were run on a single VM with 16GB of RAM and 10 tservers (I need to run them again w/ more resources; I think the box was stressed).

# create 100K random 8 char split points
$ cat /dev/urandom | base32 -w 8 | head -100000 | sort -u > splits.txt

# create a table and split it
$ accumulo shell -u root -p secret
root@uno> createtable foo
root@uno foo> addsplits -t foo -sf splits.txt
root@uno> quit

# after tablets balance, kill all tservers
$ pkill -f tserver

# after waiting a bit, restart tservers
$ tup.sh

# I periodically ran the following command to see how many tablets were assigned but not loaded
$ accumulo shell -u root -p secret -e 'scan -t accumulo.metadata -c future -np' | grep future | wc

# I periodically ran the following command to see what the user tablet assignment thread is up to
$ jstack <master pid> | grep -A 30 'Watching Normal'

Running the test above using Accumulo 1.9.3, I saw the following in the master logs for the period when it was doing log recovery. The logged scan time is how long it took to scan all metadata tablets and make assignment decisions.

$ grep -a Normal master_localhost.debug.log | grep 'scan time'
2019-12-18 11:29:11,955 [master.Master] DEBUG: [Normal Tablets]: scan time 90.27 seconds
2019-12-18 11:30:22,197 [master.Master] DEBUG: [Normal Tablets]: scan time 70.14 seconds
2019-12-18 11:31:19,081 [master.Master] DEBUG: [Normal Tablets]: scan time 56.78 seconds
2019-12-18 11:32:11,901 [master.Master] DEBUG: [Normal Tablets]: scan time 52.71 seconds
2019-12-18 11:32:57,912 [master.Master] DEBUG: [Normal Tablets]: scan time 45.91 seconds
2019-12-18 11:33:41,018 [master.Master] DEBUG: [Normal Tablets]: scan time 43.00 seconds
2019-12-18 11:34:22,566 [master.Master] DEBUG: [Normal Tablets]: scan time 41.43 seconds
2019-12-18 11:34:55,499 [master.Master] DEBUG: [Normal Tablets]: scan time 32.83 seconds
2019-12-18 11:35:24,624 [master.Master] DEBUG: [Normal Tablets]: scan time 29.02 seconds
2019-12-18 11:35:46,747 [master.Master] DEBUG: [Normal Tablets]: scan time 22.02 seconds
2019-12-18 11:36:07,784 [master.Master] DEBUG: [Normal Tablets]: scan time 20.93 seconds
2019-12-18 11:36:25,454 [master.Master] DEBUG: [Normal Tablets]: scan time 17.56 seconds
2019-12-18 11:36:39,916 [master.Master] DEBUG: [Normal Tablets]: scan time 14.36 seconds
2019-12-18 11:36:54,231 [master.Master] DEBUG: [Normal Tablets]: scan time 14.21 seconds
2019-12-18 11:37:06,020 [master.Master] DEBUG: [Normal Tablets]: scan time 11.68 seconds
2019-12-18 11:37:12,430 [master.Master] DEBUG: [Normal Tablets]: scan time 6.30 seconds
2019-12-18 11:37:17,343 [master.Master] DEBUG: [Normal Tablets]: scan time 4.81 seconds
2019-12-18 11:37:20,489 [master.Master] DEBUG: [Normal Tablets]: scan time 3.05 seconds
2019-12-18 11:37:21,963 [master.Master] DEBUG: [Normal Tablets]: scan time 1.37 seconds
2019-12-18 11:37:23,460 [master.Master] DEBUG: [Normal Tablets]: scan time 1.39 seconds
2019-12-18 11:38:24,382 [master.Master] DEBUG: [Normal Tablets]: scan time 0.82 seconds

With this change applied to 1.9.4-SNAPSHOT, I ran the same test and saw the following results, showing a speed improvement. Looking at the logs, I think part of the 78 seconds in the first scan was spent writing future and/or suspend locations to the metadata table. Because it scans more quickly, it's able to complete more scans.

$ grep -a Normal master_localhost.debug.log | grep 'scan time'
2019-12-18 13:26:43,564 [master.Master] DEBUG: [Normal Tablets]: scan time 78.00 seconds
2019-12-18 13:27:22,673 [master.Master] DEBUG: [Normal Tablets]: scan time 39.00 seconds
2019-12-18 13:27:44,160 [master.Master] DEBUG: [Normal Tablets]: scan time 21.38 seconds
2019-12-18 13:28:04,659 [master.Master] DEBUG: [Normal Tablets]: scan time 20.39 seconds
2019-12-18 13:28:21,197 [master.Master] DEBUG: [Normal Tablets]: scan time 16.43 seconds
2019-12-18 13:28:40,954 [master.Master] DEBUG: [Normal Tablets]: scan time 19.65 seconds
2019-12-18 13:28:59,316 [master.Master] DEBUG: [Normal Tablets]: scan time 18.25 seconds
2019-12-18 13:29:16,469 [master.Master] DEBUG: [Normal Tablets]: scan time 17.05 seconds
2019-12-18 13:29:42,952 [master.Master] DEBUG: [Normal Tablets]: scan time 26.37 seconds
2019-12-18 13:29:58,854 [master.Master] DEBUG: [Normal Tablets]: scan time 15.80 seconds
2019-12-18 13:30:13,870 [master.Master] DEBUG: [Normal Tablets]: scan time 14.91 seconds
2019-12-18 13:30:37,890 [master.Master] DEBUG: [Normal Tablets]: scan time 23.92 seconds
2019-12-18 13:30:55,565 [master.Master] DEBUG: [Normal Tablets]: scan time 17.57 seconds
2019-12-18 13:31:11,776 [master.Master] DEBUG: [Normal Tablets]: scan time 16.11 seconds
2019-12-18 13:31:29,372 [master.Master] DEBUG: [Normal Tablets]: scan time 17.49 seconds
2019-12-18 13:31:45,990 [master.Master] DEBUG: [Normal Tablets]: scan time 16.51 seconds
2019-12-18 13:32:00,815 [master.Master] DEBUG: [Normal Tablets]: scan time 14.72 seconds
2019-12-18 13:32:12,827 [master.Master] DEBUG: [Normal Tablets]: scan time 11.91 seconds
2019-12-18 13:32:23,969 [master.Master] DEBUG: [Normal Tablets]: scan time 11.04 seconds
2019-12-18 13:32:35,357 [master.Master] DEBUG: [Normal Tablets]: scan time 11.28 seconds
2019-12-18 13:32:49,911 [master.Master] DEBUG: [Normal Tablets]: scan time 14.40 seconds
2019-12-18 13:33:02,303 [master.Master] DEBUG: [Normal Tablets]: scan time 12.28 seconds
2019-12-18 13:33:11,660 [master.Master] DEBUG: [Normal Tablets]: scan time 9.26 seconds
2019-12-18 13:33:21,043 [master.Master] DEBUG: [Normal Tablets]: scan time 9.28 seconds
2019-12-18 13:33:30,695 [master.Master] DEBUG: [Normal Tablets]: scan time 9.51 seconds
2019-12-18 13:33:40,911 [master.Master] DEBUG: [Normal Tablets]: scan time 10.11 seconds
2019-12-18 13:33:49,516 [master.Master] DEBUG: [Normal Tablets]: scan time 8.50 seconds
2019-12-18 13:33:59,547 [master.Master] DEBUG: [Normal Tablets]: scan time 9.93 seconds
2019-12-18 13:34:09,308 [master.Master] DEBUG: [Normal Tablets]: scan time 9.65 seconds
2019-12-18 13:34:18,776 [master.Master] DEBUG: [Normal Tablets]: scan time 9.37 seconds
2019-12-18 13:34:26,681 [master.Master] DEBUG: [Normal Tablets]: scan time 7.80 seconds
2019-12-18 13:34:34,827 [master.Master] DEBUG: [Normal Tablets]: scan time 8.04 seconds
2019-12-18 13:34:43,155 [master.Master] DEBUG: [Normal Tablets]: scan time 8.22 seconds
2019-12-18 13:34:50,803 [master.Master] DEBUG: [Normal Tablets]: scan time 7.55 seconds
2019-12-18 13:34:57,880 [master.Master] DEBUG: [Normal Tablets]: scan time 6.97 seconds
2019-12-18 13:35:05,446 [master.Master] DEBUG: [Normal Tablets]: scan time 7.46 seconds
2019-12-18 13:35:12,476 [master.Master] DEBUG: [Normal Tablets]: scan time 6.93 seconds
2019-12-18 13:35:19,125 [master.Master] DEBUG: [Normal Tablets]: scan time 6.54 seconds
2019-12-18 13:35:25,909 [master.Master] DEBUG: [Normal Tablets]: scan time 6.66 seconds
2019-12-18 13:35:31,793 [master.Master] DEBUG: [Normal Tablets]: scan time 5.75 seconds
2019-12-18 13:35:37,770 [master.Master] DEBUG: [Normal Tablets]: scan time 5.87 seconds
2019-12-18 13:35:41,815 [master.Master] DEBUG: [Normal Tablets]: scan time 3.94 seconds
2019-12-18 13:35:43,774 [master.Master] DEBUG: [Normal Tablets]: scan time 1.86 seconds
2019-12-18 13:35:45,194 [master.Master] DEBUG: [Normal Tablets]: scan time 1.32 seconds
2019-12-18 13:36:46,198 [master.Master] DEBUG: [Normal Tablets]: scan time 0.90 seconds

The following shows the progression of tablets under log recovery (I didn't capture this info for 1.9.3; it would have been useful). The count of ASSIGNED tablets is the number of tablets awaiting log recovery on a tserver.

$ grep -a 'Normal' master_localhost.debug.log | grep 'tablets are'
2019-12-18 13:25:25,454 [master.EventCoordinator] INFO : [Normal Tablets]: 100002 tablets are ASSIGNED_TO_DEAD_SERVER
2019-12-18 13:26:43,563 [master.EventCoordinator] INFO : [Normal Tablets]: 100002 tablets are SUSPENDED
2019-12-18 13:27:22,673 [master.EventCoordinator] INFO : [Normal Tablets]: 94646 tablets are ASSIGNED
2019-12-18 13:27:44,160 [master.EventCoordinator] INFO : [Normal Tablets]: 89778 tablets are ASSIGNED
2019-12-18 13:28:04,659 [master.EventCoordinator] INFO : [Normal Tablets]: 85679 tablets are ASSIGNED
2019-12-18 13:28:21,197 [master.EventCoordinator] INFO : [Normal Tablets]: 81756 tablets are ASSIGNED
2019-12-18 13:28:40,954 [master.EventCoordinator] INFO : [Normal Tablets]: 78558 tablets are ASSIGNED
2019-12-18 13:28:59,316 [master.EventCoordinator] INFO : [Normal Tablets]: 74943 tablets are ASSIGNED
2019-12-18 13:29:16,469 [master.EventCoordinator] INFO : [Normal Tablets]: 71495 tablets are ASSIGNED
2019-12-18 13:29:42,951 [master.EventCoordinator] INFO : [Normal Tablets]: 68224 tablets are ASSIGNED
2019-12-18 13:29:58,854 [master.EventCoordinator] INFO : [Normal Tablets]: 64142 tablets are ASSIGNED
2019-12-18 13:30:13,870 [master.EventCoordinator] INFO : [Normal Tablets]: 60757 tablets are ASSIGNED
2019-12-18 13:30:37,888 [master.EventCoordinator] INFO : [Normal Tablets]: 57529 tablets are ASSIGNED
2019-12-18 13:30:55,565 [master.EventCoordinator] INFO : [Normal Tablets]: 53843 tablets are ASSIGNED
2019-12-18 13:31:11,776 [master.EventCoordinator] INFO : [Normal Tablets]: 51577 tablets are ASSIGNED
2019-12-18 13:31:29,365 [master.EventCoordinator] INFO : [Normal Tablets]: 48807 tablets are ASSIGNED
2019-12-18 13:31:45,989 [master.EventCoordinator] INFO : [Normal Tablets]: 46176 tablets are ASSIGNED
2019-12-18 13:32:00,815 [master.EventCoordinator] INFO : [Normal Tablets]: 43651 tablets are ASSIGNED
2019-12-18 13:32:12,827 [master.EventCoordinator] INFO : [Normal Tablets]: 41391 tablets are ASSIGNED
2019-12-18 13:32:23,969 [master.EventCoordinator] INFO : [Normal Tablets]: 39225 tablets are ASSIGNED
2019-12-18 13:32:35,354 [master.EventCoordinator] INFO : [Normal Tablets]: 37255 tablets are ASSIGNED
2019-12-18 13:32:49,911 [master.EventCoordinator] INFO : [Normal Tablets]: 35103 tablets are ASSIGNED
2019-12-18 13:33:02,303 [master.EventCoordinator] INFO : [Normal Tablets]: 32662 tablets are ASSIGNED
2019-12-18 13:33:11,660 [master.EventCoordinator] INFO : [Normal Tablets]: 30964 tablets are ASSIGNED
2019-12-18 13:33:21,043 [master.EventCoordinator] INFO : [Normal Tablets]: 28915 tablets are ASSIGNED
2019-12-18 13:33:30,695 [master.EventCoordinator] INFO : [Normal Tablets]: 27149 tablets are ASSIGNED
2019-12-18 13:33:40,910 [master.EventCoordinator] INFO : [Normal Tablets]: 25228 tablets are ASSIGNED
2019-12-18 13:33:49,516 [master.EventCoordinator] INFO : [Normal Tablets]: 23434 tablets are ASSIGNED
2019-12-18 13:33:59,547 [master.EventCoordinator] INFO : [Normal Tablets]: 21655 tablets are ASSIGNED
2019-12-18 13:34:09,308 [master.EventCoordinator] INFO : [Normal Tablets]: 19822 tablets are ASSIGNED
2019-12-18 13:34:18,776 [master.EventCoordinator] INFO : [Normal Tablets]: 18168 tablets are ASSIGNED
2019-12-18 13:34:26,680 [master.EventCoordinator] INFO : [Normal Tablets]: 16561 tablets are ASSIGNED
2019-12-18 13:34:34,827 [master.EventCoordinator] INFO : [Normal Tablets]: 14927 tablets are ASSIGNED
2019-12-18 13:34:43,155 [master.EventCoordinator] INFO : [Normal Tablets]: 13438 tablets are ASSIGNED
2019-12-18 13:34:50,803 [master.EventCoordinator] INFO : [Normal Tablets]: 11763 tablets are ASSIGNED
2019-12-18 13:34:57,880 [master.EventCoordinator] INFO : [Normal Tablets]: 10240 tablets are ASSIGNED
2019-12-18 13:35:05,445 [master.EventCoordinator] INFO : [Normal Tablets]: 8774 tablets are ASSIGNED
2019-12-18 13:35:12,476 [master.EventCoordinator] INFO : [Normal Tablets]: 7218 tablets are ASSIGNED
2019-12-18 13:35:19,124 [master.EventCoordinator] INFO : [Normal Tablets]: 5830 tablets are ASSIGNED
2019-12-18 13:35:25,909 [master.EventCoordinator] INFO : [Normal Tablets]: 4352 tablets are ASSIGNED
2019-12-18 13:35:31,770 [master.EventCoordinator] INFO : [Normal Tablets]: 2848 tablets are ASSIGNED
2019-12-18 13:35:37,770 [master.EventCoordinator] INFO : [Normal Tablets]: 1775 tablets are ASSIGNED
2019-12-18 13:35:41,815 [master.EventCoordinator] INFO : [Normal Tablets]: 966 tablets are ASSIGNED
2019-12-18 13:35:43,774 [master.EventCoordinator] INFO : [Normal Tablets]: 114 tablets are ASSIGNED

I would like to repeat this test on better hardware and explore further where the master spends time during recovery. Watching the assignment thread in the master (using jstack <master pid> | grep -A 30 'Watching Normal'), I observed it spending time in the following places:

  • sending load tablet messages
  • logging debug message about tablet state (like 2019-12-18 13:26:45,819 [master.Master] DEBUG: Normal Tablets location State: 2;3C44AJIF;3C3KH4MT@(localhost:10006[10001b787ff0019],null,null))
  • reading from metadata table

When this bug was seen on a cluster, jstack mostly showed the master doing existence checks in HDFS. I did not see those with this change.

Because my VM was under such stress, I did not add data to the tables. I attempted to do this with the following commands but abandoned that effort. I want to do this in the next round.

# create 1000 random entries and insert them into accumulo
$ echo table foo > inserts.txt
$ cat /dev/urandom | base32 -w 16 | head -1000 | xargs printf 'insert %s f q v\n' >> inserts.txt
$ cat inserts.txt | accumulo shell -u root -p secret

# could run the following command before and after log recovery to verify data
accumulo shell -u root -p secret -e 'scan -t foo -np' | grep 'f:q' | sha1sum

@keith-turner keith-turner merged commit 828a321 into apache:1.9 Jan 2, 2020
keith-turner added a commit that referenced this pull request Jan 2, 2020
@keith-turner keith-turner deleted the cache-wal-exists branch January 2, 2020 21:03
keith-turner added a commit that referenced this pull request Jan 8, 2020
@ctubbsii ctubbsii mentioned this pull request Jan 28, 2020
@ctubbsii ctubbsii added this to the 1.10.0 milestone Jul 12, 2024