-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-33641][test] Suppress the DirectoryNotEmptyException in StreamingWithStateTestBase to prevent test failures #23914
Conversation
…ingWithStateTestBase to prevent test failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw. while going through the code and Jira once more: I second what @snuyanzin suggests in his comment in FLINK-33641. StreamingTestBase#tempFolder
is a class field which makes all tests use the same temporary folder. You might want to make it an object field, instead. Otherwise, we might run into the same issue in other tests later on if a contributor overlooks that the temporary folder is shared between test classes. That should fix FLINK-33641. WDYT?
@@ -22,10 +22,14 @@ | |||
import org.apache.flink.test.junit5.MiniClusterExtension; | |||
|
|||
import org.junit.jupiter.api.extension.RegisterExtension; | |||
import org.slf4j.Logger; | |||
import org.slf4j.LoggerFactory; | |||
|
|||
/** Base class for unit tests that run multiple tests and want to reuse the same Flink cluster. */ | |||
public class StreamAbstractTestBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StreamAbstractTestBase
is only used by StreamingTestBase
. Can't we merge the two? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some attempts to put MiniClusterExtension
in StreamingTestBase
, but it violated the rules of Arch Unit Test (the scala code could not be well recognized).
|
||
/** Base class for unit tests that run multiple tests and want to reuse the same Flink cluster. */ | ||
public class StreamAbstractTestBase { | ||
|
||
protected final Logger log = LoggerFactory.getLogger(getClass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The logger is not used anywhere else except for StreamingWithStateTestBase
. It feels like a premature optimization with the risk that other test classes will miss that there's a protected log field already present and will create their own logger, anyway. That's a proposal you can reject if you want, but to me it would be good enough to have the code change being done in StreamingWithStateTestBase
as a private field.
@@ -63,7 +64,7 @@ class StreamingWithStateTestBase(state: StateBackendMode) extends StreamingTestB | |||
override def before(): Unit = { | |||
super.before() | |||
// set state backend | |||
baseCheckpointPath = tempFolder.toFile | |||
baseCheckpointPath = Files.createTempDirectory("junit").toFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
baseCheckpointPath = Files.createTempDirectory("junit").toFile | |
baseCheckpointPath = Files.createTempDirectory(tempFolder, "junit").toFile |
maybe, we should still utilize the temporary folder that is created in the parent class to utilize JUnit5's temporary folder feature.
@Jiabao-Sun I would suggest to update JUnit #23917 where they improved error output for |
Thanks @XComp, @snuyanzin.
The |
Thanks for clarification. My lack of Scala knowledge made me mix up the class and companion object declaration.
That, I still don't get. Why would concurrent execution of tests cause this issue if each test class/instance has it's own temporary directory (because the tempFolder field is an object field)? Or do you mean if "test methods" are run concurrently? As far as I understand, that shouldn't happen: The concurrent test method execution has to be enabled explicitly using |
@Jiabao-Sun , @XComp junit5.10.1 makes it always failing and it is becoming a bit more clear there are 2 threads
suspect that in JUnit5 they made removal in Since junit 5.10.1 makes it failing even locally val baseCheckpointPath = Files.createTempDirectory(getClass.getCanonicalName)
Files.deleteIfExists(baseCheckpointPath); locally it helps |
Thanks for sharing this finding, @snuyanzin . That would mean that we should wait for the snapshot cleanup to finish before calling |
Thanks @snuyanzin @XComp, I'll try to fix it. |
ideally yes |
…it rules restrictions
Hi @snuyanzin, I think the I debugged it locally, and the directory structure cleared by SnapshotDirectory is start with
I tend to avoid too much CI failure through this PR until we find the root cause. |
I tested with junit5.10.1 and I think it also introduced some changes
May be they fixed something between these versions (They have a number of statements about I would propose to go with newer JUnit to avoid refixing this again after update |
I put commit with WA I've described above at #23917 moreover I've scheduled it both in flink ci and my own ci more than 5 times in total and there is no any failure... If there is no objections we can merge it to avoid too much ci failures and continue looking for more optimal solution in a calmer way https://dev.azure.com/snuyanzin/flink/_build/results?buildId=2678&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55493&view=results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting catch with the ArchUnit failure. Does it mean that ArchJunit is not able to recognize Scala? Because the MiniClusterExtension was still present; just in a different parent class (which is written in scala). 🤔
Anyway, the change looks reasonable. 👍 I created FLINK-33820 to cover the investigation efforts and added a in-code comment to your PR referring to it. But we should get the workaround into master rather sooner than later to make master
more stable again.
Thanks for the good collaboration @Jiabao-Sun and @snuyanzin . Much appreciated :-)
Thanks @XComp, the ArchUnit's rules require the object StreamingTestBase {
@RegisterExtension
private val _: MiniClusterExtension = new MiniClusterExtension(
() =>
new MiniClusterResourceConfiguration.Builder()
.setNumberTaskManagers(1)
.setNumberSlotsPerTaskManager(4)
.build())
} |
Sounds reasonable. Can you create a follow-up ticket for that one? |
@XComp , @Jiabao-Sun |
ArchUnit does not support scala, it is mentioned in readme flink/flink-architecture-tests/README.md Lines 62 to 66 in 1c884ab
|
Thanks @snuyanzin, it makes sense to me. |
What is the purpose of the change
[FLINK-33641][test] Suppress the DirectoryNotEmptyException in StreamingWithStateTestBase to prevent test failures
Brief change log
Suppress the DirectoryNotEmptyException in StreamingWithStateTestBase to prevent test failures
Verifying this change
This change is already covered by existing tests.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation