Skip to content

Core: Deflake TestManifestCaching.testWeakFileIOReferenceCleanUp#5862

Merged
danielcweeks merged 8 commits intoapache:masterfrom
rizaon:master-Deflake_testWeakFileIOReferenceCleanUp
Oct 10, 2022
Merged

Core: Deflake TestManifestCaching.testWeakFileIOReferenceCleanUp#5862
danielcweeks merged 8 commits intoapache:masterfrom
rizaon:master-Deflake_testWeakFileIOReferenceCleanUp

Conversation

@rizaon
Copy link
Contributor

@rizaon rizaon commented Sep 27, 2022

This attempt to deflake testWeakFileIOReferenceCleanUp by removing the failed assertion and changing the cache executor from ForkJoinPool#commonPool to Runnable#run.

Close #5861

@github-actions github-actions bot added the core label Sep 27, 2022
int numGC = 0;
int maxGC = 100;
while (manifestCache.estimatedSize() > 2 && numGC < maxGC) {
System.gc();
Copy link
Contributor

@nastra nastra Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general it's not a good practice to depend on repeated (or even single) System.gc() calls to make something work.
I would probably suggest to use https://github.com/google/guava/blob/master/guava-testlib/src/com/google/common/testing/GcFinalization.java#L297 once here instead of plain System.gc() calls in a loop. That's also what the Caffeine lib is using for their testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced it with GcFinalization.awaitFullGc in rebased commit, a0204a3.

I understand iceberg-bundled-guava relocate guava namespace from com.google.common to org.apache.iceberg.relocated.com.google.common. I don't know if and how to treat guava-testlib the same way.

@rizaon rizaon force-pushed the master-Deflake_testWeakFileIOReferenceCleanUp branch from ca75b5a to a0204a3 Compare September 27, 2022 12:18
.create();
}

protected HadoopCatalog hadoopCatalog(Map<String, String> catalogProperties) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can probably be made private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

versions.props Outdated
com.fasterxml.jackson.*:* = 2.11.4
com.google.errorprone:error_prone_annotations = 2.3.3
com.google.guava:guava = 31.1-jre
com.google.guava:guava-testlib = 31.1-jre
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you should be able to replace the previous line with com.google.guava:* = 31.1-jre

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

System.gc();
GcFinalization.awaitFullGc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment in L188 says // Insert one more FileIO to trigger cache eviction. but aren't we triggering cache eviction already here?
It seems the original version of the test was expecting things to be evicted only when IO_MANIFEST_CACHE_MAX_FILEIO_DEFAULT is reached/exceeded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that comment I made is inherently flawed since I use System.gc().

The main purpose of the test is to verify that cache entry with garbage collected keys will be removed from the cache, and GcFinalization.awaitFullGc() + manifestCache.cleanUp() does evict those entries. In 572f0ca, I simplified testWeakFileIOReferenceCleanUp by not exceeding IO_MANIFEST_CACHE_MAX_FILEIO_DEFAULT.

@github-actions github-actions bot added the build label Sep 27, 2022
@rizaon
Copy link
Contributor Author

rizaon commented Sep 27, 2022

Got the following error:

Error: eckstyle] [ERROR] /home/runner/work/iceberg/iceberg/core/src/test/java/org/apache/iceberg/TestManifestCaching.java:24:1: Use org.apache.iceberg.relocated.* classes from bundled-guava module instead. [BanUnrelocatedGuavaClasses]

I'm not sure how to include guava-testlib into iceberg-bundled-guava.

CountingOutputStream.class.getName();
Suppliers.class.getName();
Stopwatch.class.getName();
GcFinalization.class.getName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue any thoughts on adding a test class to GuavaClasses vs relaxing checkstyle so that classes from com.google.common.testing are allowed in tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably allow the direct classes in tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rizaon for this you'd want to add something to our checktyle rules so that classes from com.google.common.testing are allowed to be used in tests directly rather than having to bundle them via GuavaClasses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3063b1a

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the suppression like this would allow people to e.g. use com.google.common.base.Preconditions directly in tests right? We still want to make sure that classes from com.google.common are used via org.apache.iceberg.relocated.* in tests as well.
You might need to play a bit with the correct checkstyle configuration so that

  • all java code can't import com.google.common directly
  • test java code can import com.google.common.testing only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if the rule breakdown in 3e06421 is OK.

@rdblue
Copy link
Contributor

rdblue commented Oct 4, 2022

@danielcweeks, can you help review this as well since you reviewed the original PR?

@rdblue rdblue requested a review from danielcweeks October 4, 2022 17:46
@github-actions github-actions bot added the INFRA label Oct 5, 2022
Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this LGTM now, thanks for fixing @rizaon

FileIO lastIO = cacheEnabledHadoopFileIO();
ContentCache lastCache = contentCache(manifestCache, lastIO);
System.gc();
GcFinalization.awaitFullGc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this seems like a better approach, but it looks like the recommendation is to use awaitClear/awaitDone with some condition and timeout if possible. awaitFullGc appears to only wait for a single reference (probably something they construct as part of the test). Seems likely that we could still end up in a flaky test condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 713679b

@danielcweeks danielcweeks merged commit 5eb78a4 into apache:master Oct 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test flakiness: TestManifestCaching > testWeakFileIOReferenceCleanUp

4 participants