Skip to content

Conversation

@jmark99
Copy link
Contributor

@jmark99 jmark99 commented Jul 26, 2021

Updated Garbage Collection code to no longer use a continue point when processing deletion candidates. The GC now uses an iterator that lasts during the lifetime of a GC cycle.

The GarbageCollectionTest was updated to work with the update, as was the GC integration test.

Closes #1351

Updated Garbage Collection code to no longer use a continue point when
processing deletion candidates. The GC  now uses an iterator that lasts
during the lifetime of a GC cycle.

The GarbageCollectionTest was updated to work with the update, as was
the GC integration test.

Closes apache#1351
@jmark99 jmark99 requested a review from keith-turner July 26, 2021 16:35
jmark99 added 2 commits July 26, 2021 14:18
Updated Garbage Collection code to no longer use a continue point when
processing deletion candidates. The GC  now uses an iterator that lasts
during the lifetime of a GC cycle.

The GarbageCollectionTest was updated to work with the update, as was
the GC integration test.

Closes apache#1351
Correcting some formatting issues.
Copy link
Contributor

@milleruntime milleruntime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these changes look good but I had to spend a good chunk of time learning how the GC currently uses the continue point. I would like to look at the changes closer and do some testing but I would be OK if you merged before I get the chance.

@jmark99
Copy link
Contributor Author

jmark99 commented Jul 26, 2021

Thanks for taking a look @milleruntime. Looking more closely is a good idea since we definitely want the GC to work as expected. I've run the unit and IT tests as well as the sunny profile. Additionally, I ran the GC test in the accumulo-testing repo, although I'd like to run it for a longer period of time. I may wait and see if @keith-turner would like to take a look before merging since he had created the initial ticket.

*/
boolean getCandidates(String continuePoint, List<String> candidates)
throws TableNotFoundException;
void processCandidates() throws TableNotFoundException, IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this method should be split up. Previously we had one method to gather the candidates and then another that processed them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the processCandidates method is only about three lines, not counting the loop, I'm not sure it is worth it. I could re-create a method, getCandidates, that would be a one-liner returning the required Iterator and then gather the candidates and process them separately.

In the current code using a continue point, the getCandidates method does get potential candidates and then returns to the collect method to do the processing. With the updated code using the iterator there is less of a separation between those two actions. Passing the iterator around between classes could become confusing as well. I had tried to do something similar to what you are suggesting but I was having issues with the iterator not keeping track of the candidates properly after being passed between multiple methods and classes. Most likely coder error, but I eventually caved and went back to the current form that you see now. I could try again if you feel strongly about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is creating more complexity in code that is already overly complex. If we are going to change the interface, lets design it in a way that reduces complexity. It appears that the GC interfaces were done so users could potentially write their own but I don't think that ever happened. This then makes me think that if we want to make the GC pluggable, lets do it right (either as a part of the SPI or API). But this seems like much too daunting of a task and I don't think there is much for users to gain by writing their own. That brings me to the idea that we should at least refactor the GC to make it more readable and maintainable.

Unfortunately, I don't have a suggestion on how that could be done for just these few methods. I would have to spend a lot more time looking at the GC.

confirmDeletesTrace(gce, candidateMap);
gce.incrementInUseStat(origSize - candidateMap.size());
deleteConfirmed(gce, candidateMap);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous collect() method was confusing but it was the only public method of the GarbageCollectionAlgorithm so it was at least easier to follow from the one entry point. Since you are refactoring the methods, it might be a good opportunity to clean it up and make it easier to follow. The GC was already confusing but the way you split up the methods, is more confusing to me. It looks like you have the correct business logic between the methods but I think it could be organized better.

It seems the simplest solution would be to keep the same methods but just drop the continue point. Was there a reason you had to create new methods? I saw your comment in the tests about ConcurrentModificationException but I am not seeing where that could happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another look, but primarily I was following @keith-turner format suggestion in the inital ticket. Since the call to Ample takes place in the classes implementing the GarbageCollectionEnvironment code, the few short lines seemed to belong in a single method. Not sure if better naming of methods would clear things up.

Realistically the call to collect could be dropped altogether and just be replaced by a call to processCandidates (since that is all it calls anyway). I just kept it so the existing GC initial call would not be changed. Then the GCA could have a public method called' processCandidates' (or maybe even processDeletionCandidates). That seems clearer to me than using 'collect'. Then the the collectBatch call does what is implies. It collects a batch of potential deletion candidates and processes them. Maybe even get away from the 'collect' naming altogether and call it 'processBatch' or something similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the ConcurrentModificationException occurs in the test. I would suggest refactoring the test GCE so that this does not happen rather than the GCA. The GarbageCollectionAlgorithm was designed to pull the logic of collection into a single place where it could be unit tested. This change pushes some logic of collection into the GarbCollEnv, resulting in a need to change logic in the test which means the full algorithm is not being tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is what was causing the CME in the unit test

this.candidates.removeAll(candidateMap.values());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keith-turner that was exactly the cause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have not looked in depth, so this may not be workable. Thinking one possible way to avoid the CME would be to make the getCandidates() method in TestGCE return a snapshot. Something like :

   Iterator<String> getCandidates() {
        return List.copyOf(candidates).iterator();
   }

jmark99 and others added 2 commits July 27, 2021 09:47
…onTest.java


Remove an extra character in the comment.

Co-authored-by: Mike Miller <mmiller@apache.org>
…onAlgorithm.java


Typo correction.

Co-authored-by: Mike Miller <mmiller@apache.org>
@keith-turner
Copy link
Contributor

@jmark99 I just took an initial look at this. I plan to take a more in-depth look later this afternoon.

@EdColeman EdColeman requested a review from mjwall July 27, 2021 15:13
@EdColeman
Copy link
Contributor

@mjwall might have additional insight if this overlaps with the issue that he has been tracking down.

jmark99 added 2 commits July 27, 2021 11:59
Refactor after updating getCandidates to return an iterator.

Flow is now more closely linked to original code but without use of
continue point.
@jmark99
Copy link
Contributor Author

jmark99 commented Jul 27, 2021

@milleruntime , @keith-turner I did some re-factoring based upon @keith-turner suggestion concerning the getCandidates method. It now returns an iterator. With that change, the flow now more closely resembles the original flow of the code but without the continue point.

Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmark99 the latest changes look great. Made one small comment about making a method private, if that is possible.

Changed collectBatch access from public to private.
Copy link
Contributor

@milleruntime milleruntime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look better, thanks. Only other thing I had was the name of one method.

@mjwall
Copy link
Member

mjwall commented Jul 28, 2021

I can look at this tonight if it is still open

Renamed collectBatch method to deleteBatch.
@jmark99
Copy link
Contributor Author

jmark99 commented Jul 28, 2021

@mjwall I will wait until tomorrow to merge if you wish to take a look tonight.

@mjwall
Copy link
Member

mjwall commented Jul 29, 2021

This looks good to me @jmark99 . It is unrelated to what I am tracking in #1916. First I have looked at Ample, cool abstraction. Hope it is enough <pun intended>

@jmark99 jmark99 merged commit 485114a into apache:main Jul 29, 2021
Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit late to this PR, but overall it looks great. I had only one comment about how we're tracing the candidate retrieval. Also I saw ConcurrentDeleteTableIT time out after this was committed, but I haven't had a chance to run it again to see if it's a fluke. Occasional timeouts are normal, so it might not be an issue at all.

@jmark99 jmark99 deleted the gh1351 branch October 25, 2021 17:17
@ctubbsii ctubbsii added this to the 2.1.0 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add test to GarbageCollectorIT for continue point or remove continue point

6 participants