Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes #558 use copy to avoid deadlock in tserver #559

Merged
merged 4 commits into from Jul 13, 2018

Conversation

Projects
None yet
3 participants
@keith-turner
Copy link
Contributor

keith-turner commented Jul 13, 2018

No description provided.

@keith-turner keith-turner changed the base branch from master to 1.9 Jul 13, 2018

keith-turner added some commits Jul 13, 2018

}

Set<String> beginClearingUnusedLogs() {
Set<String> doomed = new HashSet<>();

This comment has been minimized.

@ctubbsii

ctubbsii Jul 13, 2018

Member

I know I raised this before, but hopefully I can make my objection more clear this time around:

"Doomed" is an extremely ambiguous concept (it conveys no information about the end state, only that it will be "unfortunate", whatever that means in any given context). Using it here, especially in already error prone WAL code, introduces unnecessary confusion to subsequent contributors/debuggers. It should either be clearly documented when it is declared, with a direct, complete, and unambiguous answer to the question "What does it mean to be 'doomed' here?" or avoided entirely. I prefer to avoid: If we mean "logs to clear" then let's just say what we mean and call the variable "logsToClear". If we mean something else, then let's use similarly simple language to convey what we mean.

This comment has been minimized.

@keith-turner

keith-turner Jul 13, 2018

Author Contributor

I reverted changes to this code since 1.9.1 in order to minimize what I had to consider. In this process I lost the doomed comment. I can make the variable name more clear.

@keith-turner

This comment has been minimized.

Copy link
Contributor Author

keith-turner commented Jul 13, 2018

It would be useful if someone else could double check my anaylsis of otherLogs and currentLogs. In Eclipse I looked for any place one of the sets was mutated. When mutated I rebuilt the immutable copy. If someone else could pull this patch into an IDE and make sure I didn't miss any mutations of the sets I would appreciate it.

@keith-turner

This comment has been minimized.

Copy link
Contributor Author

keith-turner commented Jul 13, 2018

All ITs passed for merge of ec3b7ed into 42a3534

@mikewalch
Copy link
Member

mikewalch left a comment

LGTM

@keith-turner

This comment has been minimized.

Copy link
Contributor Author

keith-turner commented Jul 13, 2018

I ran a ~1hr CI test w/ agitation against this branch and it went well. I can run longer test w/ and w/o agitation against RC2 starting Mon.

		REFERENCED=648712264
		UNREFERENCED=8000024

@keith-turner keith-turner merged commit 8519514 into apache:1.9 Jul 13, 2018

2 checks passed

Jenkins This pull request looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@keith-turner

This comment has been minimized.

Copy link
Contributor Author

keith-turner commented Jul 18, 2018

This is the proper fix to the issue described in #539. Also this change may fix a deadlock introduced in #441 / #443.

@keith-turner

This comment has been minimized.

Copy link
Contributor Author

keith-turner commented Jul 19, 2018

While looking into creating the release notes, I was getting confused trying to unravel and understand the #441 and #539 bugs fixes. Looking back, I have it all sorted out in my mind now. I am writing up a summary of my understanding in this comment.

For each tablet there are three different sets of active WALogs. While a WALog exist in anyone of these sets it must not be garbage collected. The following are the three sets.

  1. WALogs related to the tablets active in memory map
  2. WAlogs related to the tablets inactive in memory map that is in the process of being minor compacted
  3. WAlogs related to a newly minor compacted file that is in the process of being added to the tablets row in the metadata table.

From Accumulo [1.8.0,1.9.0] only set 1 was consulted to determine what WALogs were active for a tablet. In #443 the code was changed to also consider set 2, however this may have introduced a possible deadlock in the code. So 1.9.1 consults sets 1 and 2, but may have a deadlock. This PR also considers set 3 and fixes the deadlock. So 1.9.2 consults sets 1,2,and 3 w/o having a deadlock.

keith-turner added a commit to keith-turner/accumulo that referenced this pull request Jul 19, 2018

keith-turner added a commit that referenced this pull request Jul 19, 2018

@keith-turner keith-turner deleted the keith-turner:accumulo-558 branch Dec 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.