avoids creating path object for GC refs #3501
In apache#3500 it was observed that GC spends a lot of time creating Path objects for each Reference. This change avoids creating Path objects for the GC case.
ctubbsii
left a comment
The StoredTabletFile is still a somewhat nicer API than going back to using String again (especially when it's a collection of pairs of strings, which gets quite confusing). If the problem is that StoredTabletFile is doing unnecessary validation, can we have a specific kind of TabletFile that is lighter weight, but still more expressive than strings?
I ran the accumulo-jmh-test benchmark mentioned in #3500 with these changes and saw a noticeable difference. Against 2.1.0, I saw the following for the benchmark. With these changes, running the test against a locally installed 2.1.1-SNAPSHOT, I saw the following.
I looked into refactoring StoredTabletFile first and decided against it, as I could not see an easy way to do it. I think the changes @cshannon has made in the 3.0 branch would be a prerequisite for that refactoring. We could open a follow-on issue about making StoredTabletFile do lazy validation for 3.0. Once those changes are made in 3.0, these changes could be removed.
@keith-turner - I can open a follow-on issue for lazy validation. I agree with @ctubbsii that the new API is a lot nicer than strings, so being able to revert these changes in 3.0 would be nice.
I added a PR to accumulo-jmh-test so it can now be run from the command line.
cshannon
left a comment
Changes look good to me. I am working on a PR for main that will fix this issue inside StoredTabletFile; it will be up shortly.
At its core, Path is a URI. The Path constructors perform validation on the input, then create and set the URI. The exception is the constructor Path(URI), which skips that validation and so does not incur its cost. If we used the validating Path constructors when writing to the metadata table, but used Path(URI) when reading from the metadata table, then we could keep the Path object instead of going back to String.
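The write-versus-read split described above can be illustrated without Hadoop on the classpath. The sketch below uses a hypothetical `SketchPath` class (it is not Hadoop's `Path`) built on `java.net.URI`: one factory validates its string input, as would happen on write, while the other simply wraps an already-built URI, as would happen on read. The validation rules shown are assumptions chosen for illustration, not Hadoop's actual checks.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for a path type; NOT org.apache.hadoop.fs.Path.
final class SketchPath {
  private final URI uri;

  private SketchPath(URI uri) {
    this.uri = uri;
  }

  // Write side: validate the raw string before building the URI.
  // These checks are illustrative assumptions only.
  static SketchPath validated(String raw) {
    if (raw == null || raw.isEmpty()) {
      throw new IllegalArgumentException("path must not be null or empty");
    }
    try {
      URI parsed = new URI(raw).normalize();
      if (parsed.getScheme() == null || parsed.getPath() == null
          || !parsed.getPath().startsWith("/")) {
        throw new IllegalArgumentException("expected an absolute URI with a scheme: " + raw);
      }
      return new SketchPath(parsed);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("not a valid URI: " + raw, e);
    }
  }

  // Read side: trust a URI that was validated when it was written.
  static SketchPath trusted(URI alreadyValidated) {
    return new SketchPath(alreadyValidated);
  }

  URI toUri() {
    return uri;
  }

  public static void main(String[] args) {
    SketchPath written =
        SketchPath.validated("hdfs://nn1:8020/accumulo/tables/1/t-0001/F0001.rf");
    SketchPath read = SketchPath.trusted(written.toUri());
    System.out.println(read.toUri());
  }
}
```

The design choice here is that the cost of checking is paid once, by the writer, and the reader relies on what was stored being well formed.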
The primary bottleneck does not appear to be the actual creation of the Path. It turns out that the call to Path.getParent(), which is made only in our validation logic, is the main problem.
PR to fix this in main is up: #3502
Right, so we call Path.getParent(), which calls new Path(String, String, String). Here's Path.getParent: The issue is the last line of the method, right? Could we not create a URI from (String, String, String) and then call Path(URI)?
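One way to picture this suggestion, using only `java.net.URI`: compute the parent path with plain string operations, then build the URI directly from its scheme, authority, and path components. The helper name `parentOf` is hypothetical, and the sketch assumes a hierarchical URI with an absolute path; it ignores edge cases that Hadoop's `Path.getParent()` handles, and it does not establish whether passing the result to `Path(URI)` would actually be cheaper.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ParentUriSketch {
  // Hypothetical helper: returns the parent of a hierarchical URI,
  // or null when the path is the root (or empty).
  static URI parentOf(URI child) {
    String path = child.getPath();
    int lastSlash = path.lastIndexOf('/');
    if (path.length() <= 1 || lastSlash < 0) {
      return null; // "/" or an empty path has no parent
    }
    String parentPath = (lastSlash == 0) ? "/" : path.substring(0, lastSlash);
    try {
      // Five-argument constructor: scheme, authority, path, query, fragment.
      return new URI(child.getScheme(), child.getAuthority(), parentPath, null, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("could not build parent of " + child, e);
    }
  }

  public static void main(String[] args) {
    URI file = URI.create("hdfs://nn1:8020/accumulo/tables/1/t-0001/F0001.rf");
    System.out.println(parentOf(file));
  }
}
```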
Is the plan to merge this to 2.1, and omit this change when merging to 3.0 (
The only thing to merge forward would be the change to
Or just commit the change to Pair, because it's so trivial. If you want it "reviewed", consider this comment my +1 for the Pair changes.
I think I will pull the Pair changes out into their own commit and push them directly. As @ctubbsii mentioned, they are already reviewed, so I will not open another PR.
@dlmarion do you think we might be able to optimize the validation instead of avoiding it?
Pushed the pair changes to 2.1 and main in fbb54c3
You mean like override
accumulo/core/src/main/java/org/apache/accumulo/core/metadata/TabletFile.java Lines 61 to 78 in b759a14
Seems like we could just change the validation code to operate on the
I am not sure; I have not looked into optimizing what is there. When I made this PR, I was working under the assumption that the Path object in general was causing the problem. However, after I made this PR, it was discovered that the problem was the way Accumulo was using the Path object. So I was wondering if we should pursue optimizing the validation instead of avoiding it. If we are going to avoid validating on read in 2.1.2, then I am wondering if we should validate on write in 2.1.2, as proposed in #3504.
@keith-turner - see #3509
I removed the calls to
If we merged #3509, then I would not be opposed to reverting #3502, but I could see an argument either way. Lazy loading and lazy validation help performance if you don't need to validate, but there is always the risk of a mistake: you lose the immediate check on creation, so a runtime exception may surface later. Adding validation to MetadataConstraints in #3504 and #3506 would help prevent that, but there is no guarantee that future changes wouldn't let invalid metadata be passed and not caught until later. Reverting the lazy load would also make the changes in #3504 simpler, as I had to do some extra work to specifically not lazy load in order to ensure validation.
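The trade-off described above can be made concrete with a small sketch. The class below is hypothetical (it is not Accumulo's `StoredTabletFile`), and its validation rule is an assumption for illustration. Construction is cheap, validation runs on first demand, and a caller that needs the immediate check (for example on the write path) can use the `eager` factory instead.

```java
// Hypothetical sketch; this is NOT Accumulo's StoredTabletFile.
final class LazyFileRef {
  private final String metadataEntry;
  private volatile boolean validated = false;

  // Cheap construction: the raw metadata string is stored without checks.
  LazyFileRef(String metadataEntry) {
    this.metadataEntry = metadataEntry;
  }

  // Eager variant for callers that want the immediate check, e.g. on write.
  static LazyFileRef eager(String metadataEntry) {
    LazyFileRef ref = new LazyFileRef(metadataEntry);
    ref.validateNow();
    return ref;
  }

  // Runs the (assumed) validation; skipped once it has succeeded.
  void validateNow() {
    if (validated) {
      return;
    }
    // Illustrative rule only: require a '/' and an ".rf" suffix.
    if (metadataEntry == null || metadataEntry.indexOf('/') < 0
        || !metadataEntry.endsWith(".rf")) {
      throw new IllegalArgumentException("invalid file reference: " + metadataEntry);
    }
    validated = true;
  }

  // Callers that only need the raw string skip validation entirely.
  String getMetadataEntry() {
    return metadataEntry;
  }

  // Accessors that depend on a well-formed value validate on first use.
  String getFileName() {
    validateNow();
    return metadataEntry.substring(metadataEntry.lastIndexOf('/') + 1);
  }

  public static void main(String[] args) {
    LazyFileRef bad = new LazyFileRef("not-a-file"); // no exception yet
    System.out.println(bad.getMetadataEntry());
    try {
      bad.getFileName(); // the mistake only surfaces here
    } catch (IllegalArgumentException e) {
      System.out.println("caught late: " + e.getMessage());
    }
  }
}
```

With the lazy constructor, a malformed entry is accepted silently and fails only when a validating accessor is called, which is the delayed runtime exception the comment warns about.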
Closing based on last comment because #3509 is merged and the performance numbers look good on that PR. Can re-open to address this as follow-on, if still needed.
I have tested these changes without issue against tests matching the patterns `*GC*IT` and `*Garb*IT`. I am looking into doing some further performance testing to see if there is an observable difference.