Core: Deflake TestManifestCaching.testWeakFileIOReferenceCleanUp #5862
| int numGC = 0; | ||
| int maxGC = 100; | ||
| while (manifestCache.estimatedSize() > 2 && numGC < maxGC) { | ||
| System.gc(); |
In general, it's not good practice to depend on repeated (or even single) System.gc() calls to make something work.
I would suggest using https://github.com/google/guava/blob/master/guava-testlib/src/com/google/common/testing/GcFinalization.java#L297 once here instead of calling plain System.gc() in a loop. That's also what the Caffeine library uses for its testing.
Replaced it with GcFinalization.awaitFullGc in the rebased commit, a0204a3.
I understand that iceberg-bundled-guava relocates the Guava namespace from com.google.common to org.apache.iceberg.relocated.com.google.common. I don't know whether, or how, to treat guava-testlib the same way.
ca75b5a to a0204a3
| .create(); | ||
| } | ||
|
|
||
| protected HadoopCatalog hadoopCatalog(Map<String, String> catalogProperties) throws IOException { |
can probably be made private
versions.props
Outdated
| com.fasterxml.jackson.*:* = 2.11.4 | ||
| com.google.errorprone:error_prone_annotations = 2.3.3 | ||
| com.google.guava:guava = 31.1-jre | ||
| com.google.guava:guava-testlib = 31.1-jre |
nit: you should be able to replace the previous line with com.google.guava:* = 31.1-jre
| } | ||
|
|
||
| System.gc(); | ||
| GcFinalization.awaitFullGc(); |
The comment in L188 says "// Insert one more FileIO to trigger cache eviction.", but aren't we already triggering cache eviction here?
It seems the original version of the test expected things to be evicted only when IO_MANIFEST_CACHE_MAX_FILEIO_DEFAULT is reached or exceeded.
I think the comment I wrote is inherently flawed, since I use System.gc().
The main purpose of the test is to verify that cache entries with garbage-collected keys are removed from the cache, and GcFinalization.awaitFullGc() + manifestCache.cleanUp() does evict those entries. In 572f0ca, I simplified testWeakFileIOReferenceCleanUp by not exceeding IO_MANIFEST_CACHE_MAX_FILEIO_DEFAULT.
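For readers following along, the behavior under test can be sketched with the JDK alone. This is a minimal sketch, not the Iceberg or Caffeine implementation: WeakKeyCacheSketch, WeakKey, and the explicit cleanUp() method are hypothetical names that only mimic the idea of a weak-keyed cache whose stale entries are dropped during cleanup.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a weak-keyed cache; not Iceberg's or Caffeine's code.
final class WeakKeyCacheSketch<K, V> {

  // Weak reference to a key that remembers the key's identity hash, so the
  // map entry can still be located after the key itself has been collected.
  private static final class WeakKey<T> extends WeakReference<T> {
    private final int hash;

    WeakKey(T key, ReferenceQueue<T> queue) {
      super(key, queue);
      this.hash = System.identityHashCode(key);
    }

    @Override
    public int hashCode() {
      return hash;
    }

    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (!(other instanceof WeakKey)) {
        return false;
      }
      Object mine = get();
      return mine != null && mine == ((WeakKey<?>) other).get();
    }
  }

  private final ReferenceQueue<K> collectedKeys = new ReferenceQueue<>();
  private final Map<WeakKey<K>, V> entries = new ConcurrentHashMap<>();

  public void put(K key, V value) {
    entries.put(new WeakKey<>(key, collectedKeys), value);
  }

  public int size() {
    return entries.size();
  }

  // Drops every entry whose key the garbage collector has already enqueued.
  public void cleanUp() {
    Reference<? extends K> collected;
    while ((collected = collectedKeys.poll()) != null) {
      entries.remove(collected);
    }
  }

  public static void main(String[] args) {
    WeakKeyCacheSketch<Object, String> cache = new WeakKeyCacheSketch<>();
    Object liveKey = new Object();
    cache.put(liveKey, "kept");
    cache.put(new Object(), "collectable");
    System.gc(); // only a hint to the JVM, which is why the test was flaky
    cache.cleanUp();
    System.out.println("entries after cleanUp: " + cache.size());
    Reference.reachabilityFence(liveKey); // keep liveKey strongly reachable until here
  }
}
```

The entry for liveKey always survives because the key stays strongly reachable. Whether the second entry is gone after a single System.gc() depends on the JVM, which is the source of the flakiness discussed in this thread.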
…CACHE_MAX_FILEIO_DEFAULT
Got the following error:
I'm not sure how to include guava-testlib in iceberg-bundled-guava.
| CountingOutputStream.class.getName(); | ||
| Suppliers.class.getName(); | ||
| Stopwatch.class.getName(); | ||
| GcFinalization.class.getName(); |
@rdblue any thoughts on adding a test class to GuavaClasses vs relaxing checkstyle so that classes from com.google.common.testing are allowed in tests?
I'd probably allow the direct classes in tests.
@rizaon for this you'd want to add something to our checkstyle rules so that classes from com.google.common.testing can be used directly in tests, rather than having to be bundled via GuavaClasses.
Using the suppression like this would allow people to e.g. use com.google.common.base.Preconditions directly in tests, right? We still want to make sure that classes from com.google.common are used via org.apache.iceberg.relocated.* in tests as well.
You might need to play a bit with the checkstyle configuration so that:
- all Java code can't import com.google.common directly
- test Java code can import com.google.common.testing only
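To make the two requirements above concrete, here is a minimal Java sketch of the decision the checkstyle rules would need to encode. It is not checkstyle configuration and is not taken from the PR; ImportRuleSketch and isImportAllowed are hypothetical names used only for illustration.

```java
// Hypothetical illustration of the intended import policy; not checkstyle config.
final class ImportRuleSketch {
  private static final String GUAVA = "com.google.common";
  private static final String GUAVA_TESTING = "com.google.common.testing";

  private ImportRuleSketch() {}

  // Returns true if the fully qualified class name may be imported directly.
  public static boolean isImportAllowed(String importedClass, boolean isTestSource) {
    if (!isInPackageTree(importedClass, GUAVA)) {
      // Relocated Guava (org.apache.iceberg.relocated.*) and unrelated imports pass.
      return true;
    }
    // Unrelocated Guava is allowed only for the testing package, and only in tests.
    return isTestSource && isInPackageTree(importedClass, GUAVA_TESTING);
  }

  private static boolean isInPackageTree(String className, String packageName) {
    return className.equals(packageName) || className.startsWith(packageName + ".");
  }

  public static void main(String[] args) {
    String[] candidates = {
      "com.google.common.base.Preconditions",
      "com.google.common.testing.GcFinalization",
      "org.apache.iceberg.relocated.com.google.common.base.Preconditions"
    };
    for (String candidate : candidates) {
      System.out.println(candidate
          + " main=" + isImportAllowed(candidate, false)
          + " test=" + isImportAllowed(candidate, true));
    }
  }
}
```

The point of the breakdown is that a blanket suppression for test sources would also let com.google.common.base.Preconditions through, whereas the rule sketched here keeps that import banned everywhere.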
Please check if the rule breakdown in 3e06421 is OK.
@danielcweeks, can you help review this as well since you reviewed the original PR?
| FileIO lastIO = cacheEnabledHadoopFileIO(); | ||
| ContentCache lastCache = contentCache(manifestCache, lastIO); | ||
| System.gc(); | ||
| GcFinalization.awaitFullGc(); |
So this seems like a better approach, but it looks like the recommendation is to use awaitClear/awaitDone with some condition and timeout if possible. awaitFullGc appears to only wait for a single reference (probably something they construct as part of the test). Seems likely that we could still end up in a flaky test condition.
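As a rough illustration of the "condition plus timeout" shape suggested here, the following is a JDK-only sketch. It does not use or reproduce Guava's GcFinalization; GcAwaitSketch and awaitCondition are hypothetical names, and in the real test the condition would presumably be something like the cache's estimated size reaching the expected value.

```java
import java.lang.ref.WeakReference;
import java.util.function.BooleanSupplier;

// Hypothetical helper: request GC repeatedly until a condition holds or time runs out.
final class GcAwaitSketch {
  private GcAwaitSketch() {}

  // Returns true if the condition became true before the timeout, false otherwise.
  public static boolean awaitCondition(BooleanSupplier done, long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (done.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      System.gc(); // a hint only, so keep checking the condition itself
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) {
    WeakReference<Object> ref = new WeakReference<>(new Object());
    boolean cleared = awaitCondition(() -> ref.get() == null, 10_000);
    System.out.println("weak reference cleared within timeout: " + cleared);
  }
}
```

Waiting on the actual condition the test cares about, with an upper bound on the wait, avoids depending on how many collections a particular JVM needs.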
This attempts to deflake testWeakFileIOReferenceCleanUp by removing the failed assertion and changing the cache executor from ForkJoinPool#commonPool to Runnable#run.
Closes #5861
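The effect of the executor change can be sketched with the JDK alone. This is an illustration under assumptions, not the PR's code: SameThreadExecutorSketch and ranBeforeReturn are hypothetical names. An Executor written as Runnable::run executes each task on the calling thread before execute() returns, whereas ForkJoinPool.commonPool() may run it later on another thread.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo of a same-thread executor versus the common pool.
final class SameThreadExecutorSketch {
  private SameThreadExecutorSketch() {}

  // Reports whether the submitted task had already run when execute() returned.
  public static boolean ranBeforeReturn(Executor executor) {
    AtomicBoolean ran = new AtomicBoolean(false);
    executor.execute(() -> ran.set(true));
    return ran.get();
  }

  public static void main(String[] args) {
    Executor sameThread = Runnable::run;
    System.out.println("Runnable::run finished before return: "
        + ranBeforeReturn(sameThread));
    // For the common pool the answer depends on scheduling, so it can vary per run.
    System.out.println("commonPool finished before return: "
        + ranBeforeReturn(ForkJoinPool.commonPool()));
  }
}
```

With a same-thread executor, maintenance work scheduled by the cache has completed by the time the triggering call returns, so a test can assert on the cache state without racing a background thread.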